Time-series analysis with AUTOREGRESSIVE MOVING AVERAGES (ARMA)

Abhijeet Kamble
7 min read · Oct 10, 2019

Random experiments and random variables

Random experiments are opportunities to observe the outcome of a chance event. If we were rolling dice, the random experiment is observing and recording the outcome, which brings us to a random variable. A random variable is the numerical outcome of a random experiment. If we rolled a two and a three, our random variable would be five.

This would be an example of a discrete random variable, since when we roll the die the possible outcomes are one, two, three, four, five, or six. These are discrete values, so we cannot get an outcome of 2.4 or 5.99. On the other hand, if we are measuring the time it takes runners to run 100 meters, the outcomes are continuous.
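
As a minimal sketch (using only Python's standard library, with made-up trial counts), simulating the dice experiment shows why the outcome is discrete:

```python
import random

def roll_two_dice(rng=random):
    """One random experiment: roll two dice and record their sum."""
    die1 = rng.randint(1, 6)   # each die only takes the discrete values 1..6
    die2 = rng.randint(1, 6)
    return die1 + die2         # the random variable: the sum of the two dice

# Repeating the experiment gives a sample of the random variable
outcomes = [roll_two_dice() for _ in range(1000)]
# Every outcome is a whole number between 2 and 12, never 2.4 or 5.99
assert all(2 <= o <= 12 for o in outcomes)
```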

A collection of random variables indexed over time is called a stochastic process. A stochastic process can be stationary: stationarity means that its statistical properties, such as the mean and variance, do not change over time.

We all know how a line is represented: Y = mx + b

Here, Y is the variable that we're trying to predict, x is some set of data that we've got, m is the slope (the relationship between x and Y), and b is some sort of base value, otherwise called the y-intercept. We're trying to fit a line to our data. In general, a line gives us a starting point for predicting the Y value based on the x value.

A basic linear regression model looks like Y = xB + e

As we can see, the two are very similar. Here Y is the dependent variable, x is the independent variable, and e is the error term; we are trying to find the best-fit line for the predicted value of Y based on x. A more complex version of this is what's known as multiple regression, which uses multiple different sets of x's to try to predict the value of Y.

What is a time series?

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. I think of a time series as a collection of random variables indexed in time.

We can use this sequence to extract meaningful statistics, view patterns in the data, and identify other characteristics. Time series provide the opportunity to forecast future values based on previous values, and are commonly used to forecast trends in economics, weather, capacity planning, risk management, and, for this post, stocks.
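
As a small illustration (the dates and prices here are made up), a time series in pandas is just a Series indexed by equally spaced timestamps:

```python
import pandas as pd

# A tiny daily time series: values indexed at equally spaced points in time
index = pd.date_range("2019-01-01", periods=5, freq="D")
prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1], index=index)

print(prices)
print(prices.loc["2019-01-03"])  # time-based indexing by date string
```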

Data

I have used data from a Kaggle data set and a time series API.

If we plotted this data, we would get something like this.

A time series can usually be decomposed into trend, seasonal, and error (residual) components; I have decomposed the series as follows.

We can see there is an obvious upward trend, and we can confirm that with the data: Amazon has grown enormously, with Jeff Bezos becoming the richest man alive. This upward trend was gradual at first, but it accelerated sharply because of AWS (Amazon Web Services).

We are using this model to predict Amazon's stock price, and one good way to smooth stock data is through moving averages. We might want to develop a moving average that smooths out some of the day-to-day fluctuations in the stock; this gives us a better idea of where the trend is over time, and potentially allows us to develop some sort of trading strategy around it. To do this, we'll need to put together a moving average for the stock in question.

There are different kinds of moving averages; we are looking for something that will smooth out the data but, at the same time, capture what's really going on with the stock. Reasonable minds could differ on what type of moving average that might be. In our case, let's say we're going to use a moving average over the last two years: at each point in time, we go back and average the prices from the previous two years.
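
With daily prices in a pandas Series, the trailing moving average described above is one `rolling` call. A sketch on illustrative data (assuming roughly 252 trading days per year, so a two-year window is about 504 observations):

```python
import numpy as np
import pandas as pd

# Illustrative daily closing prices (the post uses Amazon's actual history)
idx = pd.date_range("2010-01-01", periods=2000, freq="B")
close = pd.Series(np.linspace(100, 900, 2000)
                  + np.random.default_rng(7).normal(scale=20, size=2000),
                  index=idx)

# Trailing two-year moving average (~504 business days): each point is the
# mean of the previous two years of prices, smoothing daily fluctuations
ma_2y = close.rolling(window=504).mean()

print(ma_2y.dropna().tail())
```

The first 503 points are NaN because a full two-year window is not yet available there.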

There is another approach that is ingrained in a SARIMAX model: the (p, d, q) order values, which let us optimize the model. This suits my case, because I do not have other exogenous variables that could potentially drive the stock. We can do a grid search for these three values over range(0, 2), together with the matching seasonal orders. A sample of the output would look like this:

ARIMA(0, 0, 0)x(0, 0, 0, 12)12 - AIC:44929.81118772419
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:40862.1409133712
ARIMA(0, 0, 0)x(0, 1, 0, 12)12 - AIC:27223.68012203509
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 - AIC:27105.889277035287
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:26990.948653346153
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:26984.93513938784
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:27111.03801408202
ARIMA(0, 0, 0)x(1, 1, 1, 12)12 - AIC:27094.050620986014
ARIMA(0, 0, 1)x(0, 0, 0, 12)12 - AIC:40784.52085785904
ARIMA(0, 0, 1)x(0, 0, 1, 12)12 - AIC:36813.25909358517
ARIMA(0, 0, 1)x(0, 1, 0, 12)12 - AIC:24428.461308469847
ARIMA(0, 0, 1)x(0, 1, 1, 12)12 - AIC:24338.5925225698
ARIMA(0, 0, 1)x(1, 0, 0, 12)12 - AIC:24274.613349213498
ARIMA(0, 0, 1)x(1, 0, 1, 12)12 - AIC:27442.79295300477
ARIMA(0, 0, 1)x(1, 1, 0, 12)12 - AIC:24353.530388003874
ARIMA(0, 0, 1)x(1, 1, 1, 12)12 - AIC:24339.28166775049
ARIMA(0, 1, 0)x(0, 0, 0, 12)12 - AIC:19709.113840489765
ARIMA(0, 1, 0)x(0, 0, 1, 12)12 - AIC:19638.246455470813
ARIMA(0, 1, 0)x(0, 1, 0, 12)12 - AIC:21857.121695046477
ARIMA(0, 1, 0)x(0, 1, 1, 12)12 - AIC:19610.07412416587
ARIMA(0, 1, 0)x(1, 0, 0, 12)12 - AIC:19643.43467324836
ARIMA(0, 1, 0)x(1, 0, 1, 12)12 - AIC:19639.44649165436
ARIMA(0, 1, 0)x(1, 1, 0, 12)12 - AIC:20776.627923808832
ARIMA(0, 1, 0)x(1, 1, 1, 12)12 - AIC:19608.62913989576
ARIMA(0, 1, 1)x(0, 0, 0, 12)12 - AIC:19685.555910564344
ARIMA(0, 1, 1)x(0, 0, 1, 12)12 - AIC:19614.408562646815
ARIMA(0, 1, 1)x(0, 1, 0, 12)12 - AIC:21836.602191405334
ARIMA(0, 1, 1)x(0, 1, 1, 12)12 - AIC:19588.066501426496
ARIMA(0, 1, 1)x(1, 0, 0, 12)12 - AIC:19625.123662900285
ARIMA(0, 1, 1)x(1, 0, 1, 12)12 - AIC:19615.633999147754
ARIMA(0, 1, 1)x(1, 1, 0, 12)12 - AIC:20762.893365248525
ARIMA(0, 1, 1)x(1, 1, 1, 12)12 - AIC:19585.89949980239
ARIMA(1, 0, 0)x(0, 0, 0, 12)12 - AIC:19696.823820843863
ARIMA(1, 0, 0)x(0, 0, 1, 12)12 - AIC:19644.424994667563
ARIMA(1, 0, 0)x(0, 1, 0, 12)12 - AIC:21736.430358589456
ARIMA(1, 0, 0)x(0, 1, 1, 12)12 - AIC:19611.450034828522
ARIMA(1, 0, 0)x(1, 0, 0, 12)12 - AIC:19623.141781259677
ARIMA(1, 0, 0)x(1, 0, 1, 12)12 - AIC:19624.919842116753
ARIMA(1, 0, 0)x(1, 1, 0, 12)12 - AIC:20713.039534778174
ARIMA(1, 0, 0)x(1, 1, 1, 12)12 - AIC:19610.432288304906
ARIMA(1, 0, 1)x(0, 0, 0, 12)12 - AIC:19676.612363886677
ARIMA(1, 0, 1)x(0, 0, 1, 12)12 - AIC:19628.59134629185
ARIMA(1, 0, 1)x(0, 1, 0, 12)12 - AIC:21691.024674182925
ARIMA(1, 0, 1)x(0, 1, 1, 12)12 - AIC:19590.695753054977
ARIMA(1, 0, 1)x(1, 0, 0, 12)12 - AIC:19608.471281482973
ARIMA(1, 0, 1)x(1, 0, 1, 12)12 - AIC:19606.13257975408
ARIMA(1, 0, 1)x(1, 1, 0, 12)12 - AIC:20688.153786629882
ARIMA(1, 0, 1)x(1, 1, 1, 12)12 - AIC:19588.87655406686
ARIMA(1, 1, 0)x(0, 0, 0, 12)12 - AIC:19692.55669396984
ARIMA(1, 1, 0)x(0, 0, 1, 12)12 - AIC:19621.375847990577
ARIMA(1, 1, 0)x(0, 1, 0, 12)12 - AIC:21844.543744424107
ARIMA(1, 1, 0)x(0, 1, 1, 12)12 - AIC:19595.44655959739
ARIMA(1, 1, 0)x(1, 0, 0, 12)12 - AIC:19621.01793830233
ARIMA(1, 1, 0)x(1, 0, 1, 12)12 - AIC:19622.594983639956
ARIMA(1, 1, 0)x(1, 1, 0, 12)12 - AIC:20757.94605550419
ARIMA(1, 1, 0)x(1, 1, 1, 12)12 - AIC:19593.48476707819
ARIMA(1, 1, 1)x(0, 0, 0, 12)12 - AIC:19684.948044797922
ARIMA(1, 1, 1)x(0, 0, 1, 12)12 - AIC:19614.03965430033
ARIMA(1, 1, 1)x(0, 1, 0, 12)12 - AIC:21691.94553706001
ARIMA(1, 1, 1)x(0, 1, 1, 12)12 - AIC:19587.178750448988
ARIMA(1, 1, 1)x(1, 0, 0, 12)12 - AIC:19619.337245966366
ARIMA(1, 1, 1)x(1, 0, 1, 12)12 - AIC:19615.306015134967
ARIMA(1, 1, 1)x(1, 1, 0, 12)12 - AIC:20756.528009620055
ARIMA(1, 1, 1)x(1, 1, 1, 12)12 - AIC:19585.252712355243

As we can see, this returns an AIC score (the Akaike Information Criterion), which is conveniently reported for ARIMA models fitted using statsmodels. The AIC measures how well a model fits the data while taking into account the overall complexity of the model. A model that fits the data very well while using lots of features will be assigned a larger AIC score than a model that uses fewer features to achieve the same goodness-of-fit. Therefore, we are interested in the model that yields the lowest AIC value.

Once we fit our data to the model, we have something that looks like this.

The results table describes how the model performed: the coef column shows the estimated coefficient for each term (e.g., AR1 and AR2), and the P>|z| column shows the p-value for each coefficient, i.e., how likely it is that the term has no real effect. In this case the p-value for the constant is 0.994, which implies the constant is not statistically significant and that my data may need further transformation before I do any more calculations on it.

time series prediction

Just for fun, I also tried to compare the closing values of Amazon and Apple stocks.

Whereas R-squared is a relative measure of fit, RMSE is an absolute measure of fit. As the square root of a variance, RMSE can be interpreted as the standard deviation of the unexplained variance, and has the useful property of being in the same units as the response variable. Lower values of RMSE indicate better fit.
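
RMSE is computed by comparing predictions against the observed values. A minimal numpy sketch, with made-up observed and predicted prices:

```python
import numpy as np

# Illustrative observed vs. predicted closing prices (same units: dollars)
observed = np.array([1780.0, 1795.5, 1810.2, 1802.7, 1825.0])
predicted = np.array([1775.0, 1790.0, 1815.0, 1800.0, 1830.0])

# RMSE: square root of the mean squared error, in the units of the data
rmse = np.sqrt(np.mean((observed - predicted) ** 2))
print(f"RMSE: {rmse:.2f}")
```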
model diagnostics test
