top of page

Sales Forecasting Using Auto Correlation and Regressive Models

  • Writer: Sam Hamill
    Sam Hamill
  • Oct 9, 2020
  • 3 min read

GOAL: Create a model to accurately predict future air quality based on past data


1. What Are We Testing?


In order to accurately predict future air quality value, we need to create a model that comes close to past data (data we already have). This will give business owners confidence in the model to predict future values.


2. Our Data


3. Setting Up Data


In our data, what we are looking to eventually predict in the future is going to be NO2 AQI (Air Quality). Knowing this, we first need to figure out how to find the best model to do the job. To do this we will first need to create a LAG1 column for the NO2 AQI.



4. Running a Regression With Residuals


In order to find the best model for the job of predicting future air quality, we need to find the number of LAGs required to do that. Knowing this, let's run the regression analysis with our LAG1.


Since, we are looking to predict the NO2 AQI (Air Quality), this will be our Y variable, while the NO2 AQI LAG1 will be our X variable.

Based on the above snapshot of the regression analysis, is there anything you notice? Based on the Adjusted R Square and Standard Error, the model isn't very good is it? This lets us know we will probably need to do more LAGs in order to find a more accurate model.


5. Use Residuals to Create Auto Correlation Table

In order to accomplish this table we will need to utilize the CORREL function in the Auto Correlation Column. This will result in the appropriate table needed to visualize the minimum number of LAGs needed to accurately predict values or Air Quality.


6. Which LAG to Use?


Now that we have the Auto Correlation table we can visualize the minimum number of LAGs needed to accurately forecast. The first thing we are going to do is create a scatter plot using the above table.



Now that we have the plot, the rule of thumb is to look at which points are above 0.2 or below -0.2. Using this information, we can see both LAG12 and LAG14 meet these requirements. Knowing this, there might be some LAGs beyond LAG14 that might also meet the requirements, so let's test LAG12(1 year in past), LAG24(2 years in past), LAG36(3 years into past), and LAG48 (4 years into past).


7. Testing LAGs


FYI: Keep in mind, when we run a regression analysis the Input Y and Input X variable need to have the same number of columns.


Lets start with LAG12

After running the regression analysis for LAG12 we can see that we have a noticeable improvement from the LAG1 model. Let's run an analysis for LAG24 to see if the improvement continues.


Another significant improvement! Let's keep going with LAG36.


The improvements was not as significant this time around. Let's give it one more go with LAG48.


It seems like we are nearing our plateau so let's stop here for now and create our model with LAG48.


8. Creating Prediction Model from LAG48 Model


Since we have concluded that LAG48 seems to provide the best model of the four, let's incorporate the rest of the data in the dataset to create the prediction model. From here we will run the regression analysis a final time.


















9. Testing Model Accuracy Using Past Data


FYI: Let's set a goal of testing Air Quality in July 2014. Knowing this and knowing that LAG48 provides the best model, we need to use data from four years previous (year 2013,2012,2011 and 2010)


Using the above analysis let's create the model required to make a prediction in the past. In order to do that we need to find the past values of NO2 AGI. LAG12 refers to July value in 2013, LAG24 refers to July value in 2012, etc.



Past Data Value

1

21.6774193548387

24.4516129032258

21.2333333333333

17.2

10.3548387096774

9.16129032258065

9.9

9

10.0832005483871

11.18070784

9.74329403333333

6.07551556


Now that we have these past values to go with each coefficient, we can use the SUMPRODUCT function to find the prediction value!



CONCLUSION: After using the SUMPRODUCT function for these two columns of values we are left with a prediction of 22.24533 for the Air Quality in July 2014. Let's see what the actual value was:



Based on our prediction accuracy, we have created a good model that can be utilized to

predict future values!

Comments


Drop Me a Line, Let Me Know What You Think

Thanks for submitting!

© 2023 by Train of Thoughts. Proudly created with Wix.com

bottom of page