Using the attached dataset, perform a multiple regression analysis of life expectancy in relation to country and BMI.
Use the steps below as a guide. The deliverables are an R file with code and a report of findings.
• Split the data into training and test sets, with the test set including at least the last 12 months of data.
• First develop a simple time series regression model with just one predictor and one forecast variable using the training data.
• Second, create another model that includes trend and seasonality, using dummy variables for the seasons.
• If you have more predictor variables available, include them, and create a third model.
• Draw charts showing the relationship between the variables (hint: correlation, ggpairs() etc.)
• For each model, create a regression summary report, then explore the model coefficients and interpret them. Discuss which variables are significant and why their significance makes business sense.
• For each model, plot the actual data for all years against the forecast, with 80% and 95% forecast intervals. What recommendations or conclusions can you draw from your plots?
• For each model, report residual diagnostics. What can you conclude from the residual diagnostics? Do your model residuals have serial correlation? If so, what does this mean?
• For each model, report the accuracy in forecasting the test data. How accurate are your models and which model is most accurate?
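The steps above can be sketched in base R as follows. This is only an illustrative outline, not a reference solution: the data are simulated monthly values standing in for the attached dataset, and every name in it (`dat`, `life_exp`, `bmi`, `trend`, `month`, and the three helper functions) is hypothetical. It uses `lm()` from base R rather than a dedicated forecasting package, so the same ideas would need adapting to whichever package the course uses.

```r
# Simulated monthly data standing in for the attached dataset (hypothetical values).
set.seed(42)
n     <- 120                                  # 10 years of monthly observations
month <- factor(rep(month.abb, length.out = n), levels = month.abb)
trend <- seq_len(n)
bmi   <- 22 + 0.02 * trend + rnorm(n, sd = 0.5)
life_exp <- 60 + 0.5 * bmi + 0.03 * trend +
  0.3 * sin(2 * pi * trend / 12) + rnorm(n, sd = 0.4)
dat <- data.frame(life_exp, bmi, trend, month)

# Pairwise correlations between the numeric variables.
print(round(cor(dat[, c("life_exp", "bmi", "trend")]), 2))

# Hold out the last 12 months as the test set.
h     <- 12
train <- dat[seq_len(n - h), ]
test  <- dat[(n - h + 1):n, ]

# Model 1: a single predictor.
# Model 2: adds a linear trend and monthly dummy variables (factor 'month').
models <- list(
  simple   = lm(life_exp ~ bmi, data = train),
  seasonal = lm(life_exp ~ bmi + trend + month, data = train)
)

# Point forecasts with 80% and 95% prediction intervals on new data.
forecast_with_intervals <- function(fit, newdata) {
  p80 <- predict(fit, newdata, interval = "prediction", level = 0.80)
  p95 <- predict(fit, newdata, interval = "prediction", level = 0.95)
  data.frame(fit  = p80[, "fit"],
             lo80 = p80[, "lwr"], hi80 = p80[, "upr"],
             lo95 = p95[, "lwr"], hi95 = p95[, "upr"])
}

# Forecast accuracy on the held-out test set.
accuracy_on_test <- function(fit, newdata) {
  err <- newdata$life_exp - predict(fit, newdata)
  c(RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    MAPE = 100 * mean(abs(err / newdata$life_exp)))
}

# Ljung-Box test for serial correlation in the training residuals.
residual_check <- function(fit, lag = 12) {
  Box.test(residuals(fit), lag = lag, type = "Ljung-Box")
}

fc   <- lapply(models, forecast_with_intervals, newdata = test)
acc  <- t(sapply(models, accuracy_on_test, newdata = test))
lb_p <- sapply(models, function(m) residual_check(m)$p.value)

print(summary(models$seasonal)$coefficients)  # coefficient table for model 2
print(round(acc, 3))                          # one row per model
print(round(lb_p, 4))                         # small p-value suggests serial correlation
```

A small Ljung-Box p-value indicates that the residuals are serially correlated, meaning the model has left predictable structure unexplained and its forecast intervals may be too narrow. The columns of `fc$simple` and `fc$seasonal` can be drawn over the actual series to produce the forecast plots, and a third model would be added to `models` in the same way once further predictors are available.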
Prepare a business document with an executive summary, a description of the dataset, a description of the variables and their relationships with each other, model selection, significant parameters, and a final conclusion.
Provide nicely formatted charts and graphs.
Provide all R code as an appendix, with clear labels and comments so that it is easy to see which section of the report used which code.