An Autoregresive model predicts future values using a series of past values. It accomplishes this by optimizing regressor (past value) parameters through minimizing an objective function, $J = ||A\theta - y||^2$.
A Multi-Objective Autoregressive model predicts future values using multiple series of past values. It accomplishes this by optimizing regressor (past value) parameters through minimizing a weighted sum of many objective functions, $J = \lambda_1||A_1\theta - y_1||^2 + ... + \lambda_k||A_k\theta - y_k||^2$.
A Multi-Objective Autoregressive model may have utility in forecasting time series that are generated by the same process, but the process has intervals of inactivity. Examples may be output from a manufacturing process that only runs 6:00am-8:00pm, or asset prices on an exchange where trading only occurs 9:30am-4:00pm.
Given a time series $z = z_1 ... z_T$ with $T$ time steps, an autoregressive least squares model with $M$ lags (regressors) and $T-M$ known outcomes is fit:
$y = A \theta$
Where:
$\theta = \theta_{1} \dots \theta_{M}$
$=$
$\left(\begin{array}{cc}
\theta_1\\
\vdots\\
\theta_M\\
\end{array}\right)
$
$\theta$ is an $M$ vector that represents the coefficients associated with each lag (regressor).
$y = y_{1} \dots y_{T-M}$
$=$
$\left(\begin{array}{cc}
y_1 = z_{M+1}\\
\vdots\\
y_{T-M} = z_{T}\\
\end{array}\right)
$
$y$ is a $T-M$ vector that represents the known outcome variable data ($z_{t+1}$ values).
$A =$
$\left(\begin{array}{cc}
z_{M} & \dots & z_1 \\
\vdots & \ddots & \vdots \\
z_{T-1} & \dots & z_{T-M}\\
\end{array}\right)
$
$A$ is a $T-M \times M$ matrix, where each row contains the values of the time series $M$ lags prior to the corresponding $y_{t+1}$ value.
There can be a maximum of $T-1$ lags; therefore, the system is over-determined. The autoregressive parameters are found through minimizing the objective function, $||A\theta - y||^2$:
$\nabla f(\hat{\theta}) = 2A^T (A\hat{\theta}-y) = 0$
$A^T A \hat{\theta} = A^{T}y$
Least Squares Solution:
$\hat{\theta} = (A^{T}A)^{-1}A^{T}y$
import numpy as np
import matplotlib.pyplot as plt
#Main functions:
'''main_Autoregressive_model_fit
Fits an autoregressive model to time series data and outputs optimized parameters
@Param training_time_series: time series of values (training set)
@Param lags: number of parameters (lags) to predict next time step value
@Return AR_parameters: array of optimized parameters of AR model
'''
'''main_Autoregressive_model_forecast
Forecasts future values of a time series using autoregressive model parameters
@Param testing_time_series: time series of values (test set)
@Param num_forecast_steps: number of values to forecast
@Return forecasted_vales: array of forecasted values
@Return plotted_forecast: plot of time series and forecast
'''
#Helper functions:
''' make_regressor_matrix
Makes "A matrix" of regressor data to be used for AR model
@Param time_series: time series of values
@Param AR_parameters: array of optimized parameters of AR model
@Param lags: number of parameters (lags) to predict next time step value
@Return regressor_matrix: "A matrix" of regressor data
'''
''' make_outcome_vector
Makes "y vector" of outcome variable data to be used for AR model
@Param time_series: time series of values
@Param lags: number of parameters (lags) to predict next time step value
@Return outcome_vector: "y vector" of outcome variable data
'''
''' autoregressive_model_param
Produce autoregressive model parameters using OLS
@Param regressor_matrix: A matrix
@Param outcome_vector: "y vector" of outcome variable data
@Return autoregressive_model_param: array of optimized parameters of AR model
'''
''' forecast_autoregressive_model
Produce predictions for time series using autoregressive model parameters
@Param time_series: time series of values
@Param parameters: coefficient values from autoregressive model
@Param num_forecast_steps: Number of time steps to be forecasted
@Return forecasted_vales: array of forecasted values
'''
''' plot_forecast_autoregressive_model
Produce plot of time series and AR predictions
@Param time_series: time series of values
@Param forecasted_values: array of forecasted values
@Param x_start=0: lower bound of x axis plot (default is zero)
@Return plotted_forecast: plot of time series and forecast
'''
def main_Autoregressive_model_fit(training_time_series, lags):
#Make regressor 'A' matrix
regressor_matrix = make_regressor_matrix(training_time_series, lags)
#Make outcome 'y' vector
outcome_vector = make_outcome_vector(training_time_series, lags)
#Find optimal model parameters
AR_parameters = autoregressive_model_param(regressor_matrix, outcome_vector)
return AR_parameters
def main_Autoregressive_model_forecast(testing_time_series, AR_parameters,
num_forecast_steps, x_start=0):
#Forcast future values
forecasted_values = forecast_autoregressive_model(testing_time_series, AR_parameters,
num_forecast_steps)
#Plot time series and future values
plot_forecast_autoregressive_model(testing_time_series, forecasted_values,
num_forecast_steps, x_start)
return forecasted_values
def make_regressor_matrix(time_series, lags):
#Preallocate A (regressor) matrix
A = np.empty([(len(time_series)-lags),lags])
#Set time series indexing variables
upper_bound = lags
lower_bound = 0
for i,row in enumerate(A):
#Index time series
A[i] = time_series[lower_bound:upper_bound]
A[i] = np.flip(A[i])
#Iterate index variables
lower_bound += 1
upper_bound += 1
return A
def make_outcome_vector(time_series, lags):
#Preallocate y (outcome) vector
y = np.empty([(len(time_series)-lags),1])
#Set time series indexing variables
y_index = lags
for i,row in enumerate(y):
#Index time series
y[i] = time_series[y_index]
#Iterate index variables
y_index += 1
return y
def autoregressive_model_param(A_matrix, outcome_vector):
#Multiply A transpose by A
LS_step_1 = np.linalg.inv(np.matmul(np.transpose(A_matrix), A_matrix))
#Inverse LS_step_1
LS_step_2 = np.matmul(LS_step_1, np.transpose(A_matrix))
#Multiply LS_step_2 by outcome vector
LS_step_3 = np.matmul(LS_step_2, outcome_vector)
#Declare parameters (theta_hat)
theta_hat = LS_step_3
return theta_hat
def forecast_autoregressive_model(time_series, parameters, num_forecast_steps):
#Number of parameters
num_param = len(parameters)
#Preallocate new time_series
forecast_time_series = np.empty([1,(len(time_series)+num_forecast_steps)])
for k,t_series_value in enumerate(time_series):
forecast_time_series[0][k] = time_series[k]
#Preallocate array of forecasted value
forecasted_values = np.zeros([num_forecast_steps,1])
m=0
for n,value in enumerate(forecasted_values):
j = 1
for param in parameters:
#Add component to forecast value
forecasted_values[n] += param*forecast_time_series[0][-(num_forecast_steps+j-m)]
#Iterate index
j += 1
#Allocate forecast value to time series
forecast_time_series[0][(len(time_series)+n)] = value
#Allocate forecast value to forecasted values
forecasted_values[n] = value
#Update indexes
n += 1
m += 1
return forecasted_values
def plot_forecast_autoregressive_model(time_series, forecasted_values,
num_forecast_steps, x_start):
#Preallocate and connect forecasted values to final time series point
forecasted_values_impute = np.empty([len(forecasted_values)+1])
forecasted_values_impute[0] = time_series[-1]
o=1
for j,value in enumerate(forecasted_values):
forecasted_values_impute[o] = forecasted_values[j]
o += 1
plt.figure()
plt.plot(range(len(time_series)),time_series)
plt.plot(range(len(time_series)-1,len(time_series)+len(forecasted_values)),
forecasted_values_impute)
plt.xlim(x_start, (len(time_series)+num_forecast_steps-1))
return plt.show()
EEG data, measured in microvolts at millisecond intervals, published by the Memory and Cognition lab at Swarthmore College.
#Read in numpy array of eeg data
eeg_data_original = np.load('eegdata.npy')
#Copy eeg data
eeg_data = np.copy(eeg_data_original)
#Declare training time series
eeg_training_data = eeg_data[0]
#Declare testing time series
eeg_testing_data = eeg_data[100]
#Fit autoregressive model to training data
eeg_parameters = main_Autoregressive_model_fit(eeg_training_data, 50)
#Forecast future values from testing data
eeg_forecast = main_Autoregressive_model_forecast(eeg_testing_data, eeg_parameters, 50,
x_start = len(eeg_testing_data)-300)
#Difference eeg data
diff_eeg_data = np.diff(eeg_data)
#Declare training time series
diff_eeg_training_data = diff_eeg_data[0]
#Declare testing time series
diff_eeg_testing_data = diff_eeg_data[100]
#Fit autoregressive model to training data
diff_eeg_parameters = main_Autoregressive_model_fit(diff_eeg_training_data, 50)
#Forecast future values from testing data
diff_eeg_forecast = main_Autoregressive_model_forecast(diff_eeg_testing_data, diff_eeg_parameters, 50,
x_start = len(diff_eeg_testing_data)-100)
Given a set of $k$ normalized time series datasets, each with $T$ time steps, a Multi-Objective Autoregressive least squares model with $M$ lags (regressors) and $k \times T-M$ known outcomes can be fit:
$\tilde{y} = \tilde{A}\theta$
Where:
$\theta = \theta_{1} \dots \theta_{M}$
$=$
$\left(\begin{array}{cc}
\theta_1\\
\vdots\\
\theta_M\\
\end{array}\right)
$
$\theta$ is an $M$ vector that represents the coefficients associated with each lag (regressor).
$\tilde{y}$
$=$
$\left(\begin{array}{cc}
\sqrt{\lambda_1} y_1\\
\vdots\\
\sqrt{\lambda_k} y_{k}\\
\end{array}\right)
$
$\tilde{y}$ is a stacked vector consisting of $k(T-M)$ vectors that represents the known outcome variable data ($z_{t+1}$ values). Each $y$ vector is multiplied by the weighting factor $\sqrt{\lambda_{k}}$
$\tilde{A}$
$=$
$\left(\begin{array}{cc}
\sqrt{\lambda_1}A_1\\
\vdots\\
\sqrt{\lambda_k}A_k\\
\end{array}\right)
$
$\tilde{A}$ is a vertically stacked matrix consistenting of $k$($T-M \times M$) matrices, where each row contains the values of the time series $M$ lags prior to the corresponding $y_{t+1}$ value. Each $A$ matrix is multiplied by the weighting factor $\sqrt{\lambda_{k}}$.
Given $\tilde{A}$ has linearly independent columns, the minimization of the aggregate objective function produces the optimal autoregressive parameters.
Least Squares Solution:
$\hat{\theta} = (\tilde{A}^{T}\tilde{A})^{-1}\tilde{A}^{T}\tilde{y}$
$\hat{\theta} = (\sum_{i=1}^{k} \lambda_{i}A_{i}^{T}A_{i})^{-1}(\sum_{i=1}^{k} \lambda_{i}A_{i}^{T}y_{i})$
The model error can be calculated using Root Mean Square Error.
Where:
$RMSE = \sqrt{\cfrac{\|Forecast_{i}-Actual_{i}\|^{2}}{N}}$
#Main Functions
'''main_MO_Autoregressive_model_fit
Fits an autoregressive model to time series data and outputs optimized parameters
@Param training_time_series: time series of values (training set)
@Param weights: array of weights for each objective function
@Param lags: number of parameters (lags) to predict next time step value
@Return AR_parameters: array of optimized parameters of AR model
'''
'''main_Autoregressive_model_forecast
Forecasts future values of a time series using autoregressive model parameters
@Param testing_time_series: time series of values (test set)
@Param num_forecast_steps: number of values to forecast
@Return forecasted_vales: array of forecasted values
@Return plotted_forecast: plot of time series and forecast
'''
'''main_MO_Autoregressive_model_error
Compares MO Autoregressive model forecast to actual time series values and/
returns RMSE error, plot of forecast vs. actual, and forecast comparison values
@Param testing_time_series: time series of values (test set)
@Param lags: number of parameters (lags) to predict next time step value
@Param index_start_forecast: The index *(starting from zero)* in testing /
time series to begin the forecast
@Param AR_parameters: array of optimized parameters of AR model
@Param num_forecast_steps: number of values to forecast
@Param index_start_hold_out = 0: The index in time series to start the hold out data
@Param x_start = 0: lower bound of x axis plot
@Return (print) RMSE: Root mean square error for forecast/actual values
@Return plotted_comparison_forecast: plot of forecast compared to test time series
@Return comparison_values: array of forecasted values
'''
#Helper Functions
''' make_MO_regressor_matrix
Makes stacked "A tilde matrix" of regressor data to be used for MO AR model
@Param time_series: k number of time series (normalized)
@Param weights: weight associated with each A matrix
@Param lags: number of parameters (lags) to predict next time step value
@Return regressor_matrix: "A tilde matrix" of regressor data
'''
''' make_MO_outcome_vector
Makes stacked "y tilde vector" of outcome variable data to be used for MO AR model
@Param time_series: k number of time series (normalized)
@Param lags: number of parameters (lags) to predict next time step value
@Return outcome_vector: "y tilde vector" of outcome variable data
'''
''' autoregressive_model_param
Produce autoregressive model parameters using OLS
@Param regressor_matrix: A matrix
@Param outcome_vector: "y vector" of outcome variable data
@Return autoregressive_model_param: array of optimized parameters of AR model
'''
''' forecast_autoregressive_model
Produce predictions for time series using autoregressive model parameters
@Param time_series: time series of values
@Param parameters: coefficient values from autoregressive model
@Param num_forecast_steps: Number of time steps to be forecasted
@Return forecasted_vales: array of forecasted values
'''
''' plot_forecast_autoregressive_model
Produce plot of time series and AR predictions
@Param time_series: time series of values
@Param forecasted_values: array of forecasted values
@Param x_start=0: lower bound of x axis plot (default is zero)
@Return plotted_forecast: plot of time series and forecast
'''
''' plot_comparison_autoregressive_model
@Param testing_time_series: time series of values (test set)
@Param comparison_values: array of forecasted values
@Param num_forecast_steps: number of values to forecast
@Param index_start_forecast: The index *(starting from zero)* /
in testing time series to begin the forecast
@Param x_start: lower bound of x axis plot
'''
def main_MO_Autoregressive_model_fit(training_time_series, weights, lags):
#Make regressor 'A tilde' matrix
regressor_matrix = make_MO_regressor_matrix(training_time_series, weights, lags)
#Make outcome 'y tilde' vector
outcome_vector = make_MO_outcome_vector(training_time_series, weights, lags)
#Find optimal model parameters
AR_parameters = autoregressive_model_param(regressor_matrix, outcome_vector)
return AR_parameters
def main_MO_Autoregressive_model_error(testing_time_series, lags, index_start_forecast,
AR_parameters, num_forecast_steps,
index_start_hold_out = 0, x_start = 0):
#Ensure hold out time series data is greater than number of lags
if(lags > (index_start_forecast+1)):
raise ValueError("Too little time series values to accomodate chosen number of lags")
#Portion of testing time series to be held out for forecasting
hold_out_time_series = testing_time_series[index_start_hold_out:index_start_forecast]
#Forecasted values to be compared to actual values
comparison_values = forecast_autoregressive_model(hold_out_time_series, AR_parameters,
num_forecast_steps)
#Actual values from series to be compared to forecast
comparison_series = testing_time_series[index_start_forecast:(len(comparison_values)+index_start_forecast)]
#Compare forecast to actual values using RMSE
RMSE = np.sqrt(np.linalg.norm(comparison_values - comparison_series) / num_forecast_steps)
print('RMSE: ')
print(RMSE)
plot_comparison_autoregressive_model(testing_time_series, comparison_values,
num_forecast_steps,
index_start_forecast, x_start)
return comparison_values
def make_MO_regressor_matrix(multiple_time_series, weights, lags):
#List to hold regressor matrices
list_A_matrices = []
for k,series in enumerate(multiple_time_series):
#Make regressor matrix for each time series, append to list
regressor_matrix = make_regressor_matrix(multiple_time_series[k], lags)
list_A_matrices.append(regressor_matrix)
#Apply square root to weights
root_weights = np.sqrt(weights)
#Apply root weights to A matrices
for k,series in enumerate(list_A_matrices):
list_A_matrices[k] = np.multiply(list_A_matrices[k],root_weights[k])
#Convert A matrix list to tuple
tuple_A_matrices = tuple(list_A_matrices)
#Stack A matrices
A_tilde = np.concatenate(tuple_A_matrices, axis=0)
return A_tilde
def make_MO_outcome_vector(multiple_time_series, weights, lags):
#List to hold outcome vectors
list_y_vectors = []
for k,series in enumerate(multiple_time_series):
#Make outcome vector for each time series, append to list
outcome_vector = make_outcome_vector(multiple_time_series[k], lags)
list_y_vectors.append(outcome_vector)
#Apply square root to weights
root_weights = np.sqrt(weights)
#Apply root weights to y vectors
for k,series in enumerate(list_y_vectors):
list_y_vectors[k] = np.multiply(list_y_vectors[k],root_weights[k])
#Convert y vector list to tuple
tuple_y_vectors = tuple(list_y_vectors)
#Stack y vectors
y_tilde = np.concatenate(tuple_y_vectors, axis=0)
return y_tilde
def plot_comparison_autoregressive_model(testing_time_series, comparison_values,
num_forecast_steps,
index_start_forecast, x_start):
'''Code to be used if it is desired to include final testing data point in forecast
#Preallocate and connect comparison values to final testing time series point
comparison_values_impute = np.empty([len(comparison_values)+1])
comparison_values_impute[0] = testing_time_series[index_start_forecast-1]
o=1
for j,value in enumerate(comparison_values):
comparison_values_impute[o] = comparison_values[j]
o += 1
'''
#Preallocate testing time series to include values up until end of comparison values
time_series_compared = testing_time_series[0:(len(comparison_values)+index_start_forecast)]
plt.figure()
plt.plot(time_series_compared)
plt.plot(range(index_start_forecast,len(time_series_compared)), comparison_values)
plt.xlim(x_start, len(time_series_compared))
'''Code to be used if it is desired to include final testing data point in forecast
#Preallocate testing time series to include values up until end of comparison values
time_series_compared = testing_time_series[0:(len(comparison_values)+index_start_forecast)]
print(time_series_compared[-1])
plt.figure()
plt.plot(time_series_compared)
plt.plot(range(index_start_forecast-1,len(time_series_compared)),comparison_values_impute)
plt.xlim(x_start, len(time_series_compared))
'''
return plt.show()
#Read in numpy array of eeg data
eeg_data_original = np.load('eegdata.npy')
#Copy eeg data
eeg_data = np.copy(eeg_data_original)
#Declare training time series
MO_eeg_training_data = np.stack((eeg_data[0], eeg_data[10], eeg_data[20], eeg_data[30],
eeg_data[40], eeg_data[50], eeg_data[150], eeg_data[200],
eeg_data[250], eeg_data[300], eeg_data[350], eeg_data[400],
eeg_data[450], eeg_data[500], eeg_data[550]))
#Declare testing time series
eeg_testing_data = eeg_data[100]
#Declare objective function weighting (equal weighting)
MO_eeg_weights = np.ones(len(MO_eeg_training_data))
#Fit MO autoregressive model to training data
MO_eeg_parameters = main_MO_Autoregressive_model_fit(MO_eeg_training_data, MO_eeg_weights, 50)
#Forecast future values from testing data
MO_eeg_forecast = main_Autoregressive_model_forecast(eeg_testing_data, MO_eeg_parameters,
50, x_start = len(eeg_testing_data)-300)
#Difference eeg training time series
diff_MO_eeg_training_data = np.diff(MO_eeg_training_data)
#Declare testing time series
diff_eeg_testing_data = diff_eeg_data[100]
#Fit MO autoregressive model to training data
diff_MO_eeg_parameters = main_MO_Autoregressive_model_fit(diff_MO_eeg_training_data, MO_eeg_weights, 50)
#Forecast future values from testing data
diff_MO_eeg_forecast = main_Autoregressive_model_forecast(diff_eeg_testing_data, diff_MO_eeg_parameters,
50, x_start = len(eeg_testing_data)-100)
The Multi-Objective Autoregressive model may be fit on normalized equity price time series data. This may have utility in forecasting intraday equity prices, where off-hours pricing is unavailable or undesirable. The following data is MSFT minute-by-minute price data within trading hours (9:30am-4:00pm); training data is from March 9th-12th, 2020, and test data is from March 13th, 2020. Additional test data is provided from March 16th-20th.
#Read in numpy array of normalized equity price training data
norm_equity_training_data = np.load('normalized_equity_price_training_data.npy')
#Read in numpy array of equity price test data
equity_test_data = np.load('equity_price_test_data.npy')
#Read in numpy arrays of additional test data
add_equity_test_data = np.load('add_equity_price_test_data.npy')
#Declare objective function weighting (equal weighting)
MO_equity_weights = np.ones(len(norm_equity_training_data))
#Fit MO autoregressive model to training data
MO_equity_parameters_60 = main_MO_Autoregressive_model_fit(norm_equity_training_data,
MO_equity_weights, 60)
#Forecast future values from testing data
MO_equity_forecast = main_Autoregressive_model_forecast(equity_test_data,
MO_equity_parameters_60,
30, x_start = 0)
#Forecast future values from testing data
MO_equity_forecast_scaled = main_Autoregressive_model_forecast(equity_test_data,
MO_equity_parameters_60,
30, x_start = 320)
#Fit MO autoregressive model to training data
MO_equity_parameters_120 = main_MO_Autoregressive_model_fit(norm_equity_training_data,
MO_equity_weights, 120)
#Forecast future values from testing data
MO_equity_forecast = main_Autoregressive_model_forecast(equity_test_data,
MO_equity_parameters_120,
30, x_start = 0)
#Forecast future values from testing data
MO_equity_forecast_scaled = main_Autoregressive_model_forecast(equity_test_data,
MO_equity_parameters_120,
30, x_start = 320)
#Fit MO autoregressive model to training data
MO_parameters_60 = main_MO_Autoregressive_model_fit(norm_equity_training_data,
MO_equity_weights, 60)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(equity_test_data,
60, 120,
MO_parameters_60,
45,
x_start=100)
#Fit MO autoregressive model to training data
MO_parameters_120 = main_MO_Autoregressive_model_fit(norm_equity_training_data,
MO_equity_weights, 120)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(equity_test_data,
120, 120,
MO_parameters_120,
45, x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[0],
60, 120,
MO_parameters_60, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[0],
120, 120,
MO_parameters_120, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[1],
60, 120,
MO_parameters_60, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[1],
120, 120,
MO_parameters_120, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[2],
60, 120,
MO_parameters_60, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[2],
120, 120,
MO_parameters_120, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[3],
60, 120,
MO_parameters_60, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[3],
120, 120,
MO_parameters_120, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[4],
60, 120,
MO_parameters_60, 45,
x_start=100)
#Compare forecast results to test equity pricing data
MO_comparison = main_MO_Autoregressive_model_error(add_equity_test_data[4],
120, 120,
MO_parameters_120, 45,
x_start=100)