Stock Selection Model
In your first step, you should have completed the following parts:
Use the WFE Testing PDF as the checklist.
(1) Fundamental Research into the companies: Ratios and Ownership
(2) Descriptive Statistics
(3) Stationarity Check + AR models to check autocorrelations in residuals
(4) Statistical testing (5-6 tests)
(5) Construct Linear Regression models and check assumptions
The following part is added this time:
(6) Use the first 90% of the data as training to predict the remaining 10% (a split sketch is given below).
The most important step is this final prediction step.
The final prediction method should be selected from the ARIMA model, the linear state space model (optional), and VARMA models (optional).
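A minimal sketch of the 90/10 chronological split from step (6), assuming the ATVI DataFrame with its 'Return' column built in the example code at the end of this document (no shuffling, since this is time-series data):
import pandas as pd
returns = ATVI['Return'].dropna()
split = int(len(returns) * 0.9)                            # first 90% for training
train, test = returns.iloc[:split], returns.iloc[split:]  # last 10% held out
print(len(train), 'training obs,', len(test), 'test obs')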
Target Firm: the same as in your first task.
The whole idea is to make your analysis of this company very comprehensive.
Some References
ARIMA model
ARIMA, short for ‘Auto Regressive Integrated Moving Average’, is actually a class of models that ‘explains’ a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the equation can be used to forecast future values.
Any ‘non-seasonal’ time series that exhibits patterns and is not a random white noise can be modeled with ARIMA models.
An ARIMA model is characterized by 3 terms: p, d, q
where,
p is the order of the AR term
q is the order of the MA term
d is the number of differencing required to make the time series stationary
If a time series has seasonal patterns, then you need to add seasonal terms, and it becomes SARIMA, short for ‘Seasonal ARIMA’. More on that once we finish ARIMA.
So, what does the ‘order of AR term’ even mean? Before we go there, let’s first look at the ‘d’ term.
You can refer to this website:
ARIMA Model – Complete Guide to Time Series Forecasting in Python
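As a minimal sketch of how p, d, and q map onto code, assuming statsmodels and the train/test split sketched above; the order (1, 1, 1) is only a placeholder, not a recommendation:
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
# ADF test helps choose d: the null hypothesis is that the series is
# non-stationary, so p < 0.05 suggests little differencing is needed
adf_stat, p_value = adfuller(train)[:2]
print('ADF p-value:', p_value)
# Fit an ARIMA(p, d, q) = (1, 1, 1) and forecast the held-out 10%
arima_results = ARIMA(train, order=(1, 1, 1)).fit()
print(arima_results.summary())
forecast = arima_results.forecast(steps=len(test))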
Linear State Space Model (OPTIONAL)
Its many applications include:
representing dynamics of higher-order linear systems
predicting the position of a system j steps into the future (see the sketch after the reference link)
predicting a geometric sum of future values of a variable
You can refer to this website:
https://python.quantecon.org/linear_models.html
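A minimal sketch of j-step prediction in a linear state space model x_{t+1} = A x_t + C w_{t+1}, y_t = G x_t; the matrices below are illustrative only, not calibrated to any data:
import numpy as np
A = np.array([[0.8, -0.2],
              [1.0, 0.0]])                 # state transition matrix
C = np.array([[0.5], [0.0]])               # shock loading (not needed for the mean forecast)
G = np.array([[1.0, 0.0]])                 # observation matrix
x = np.array([[1.0], [0.0]])               # current state x_t
j = 5
x_pred = np.linalg.matrix_power(A, j) @ x  # E[x_{t+j} | x_t] = A^j x_t
print('j-step prediction of y:', (G @ x_pred).item())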
VARMA models (OPTIONAL)
You can refer to this paper:
https://www.economics-sociology.eu/files/12_Simionescu_1_7.pdf
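A minimal sketch using statsmodels' VARMAX class; 'returns_df' and its column names are hypothetical placeholders for the target firm's returns paired with one peer's:
from statsmodels.tsa.statespace.varmax import VARMAX
# returns_df is a hypothetical DataFrame of two aligned return series
model = VARMAX(returns_df[['ATVI', 'PEER']], order=(1, 1))  # VARMA(1, 1)
varma_results = model.fit(disp=False)
print(varma_results.summary())
print(varma_results.forecast(steps=5))   # joint 5-step-ahead forecast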
Your submission list
Original Code – you could use R or Python
Final PPT
Here is example code for the first two steps; please work through it and build on it.
This example uses ATVI.
#step one
import pandas as pd
import numpy as np
from matplotlib import pyplot
ATVI = pd.read_csv("C:/Users/Administrator/Downloads/ATVI.csv", parse_dates=['Date'])
ATVI['Return'] = ATVI['Adj Close'].pct_change()
ATVI['Logrt'] = np.log(ATVI['Adj Close'] / ATVI['Adj Close'].shift(1))
ATVI = ATVI.drop(ATVI.index[0])  # drop the first row, which is NaN after differencing
ATVI.plot(x='Date', y='Return')
ATVI.plot(x='Date', y='Adj Close')
pyplot.show()
#step two
from scipy.stats import kurtosis
from scipy.stats import skew
ATVImean = np.mean(ATVI['Return'])
ATVIvar = np.var(ATVI['Return'])
ATVImax = np.max(ATVI['Return'])
ATVImin = np.min(ATVI['Return'])
ATVIstd = np.std(ATVI['Return'])
ATVIkur = kurtosis(ATVI['Return'])
ATVIskew = skew(ATVI['Return'])
ATVInew = ATVI[['Date', 'Return']].copy()
print('mean', ATVImean)
print('var', ATVIvar)
print('max', ATVImax)
print('min', ATVImin)
print('std', ATVIstd)
print('kur', ATVIkur)
print('skew', ATVIskew)
from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot
# Build a lag-1 table to correlate today's return with yesterday's return
values = DataFrame(ATVInew.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t1', 't-1', 't2', 't+1']   # shifted Date, lagged return, Date, return
dataframe = dataframe.drop('t1', axis=1)         # drop the shifted Date column
dataframe = dataframe.drop('t2', axis=1)         # drop the unshifted Date column
dataframe = dataframe.drop(dataframe.index[0])   # first row is NaN after the shift
dataframe = dataframe.astype(float)
rst = dataframe.corr()
print(rst)
ATVInew = ATVInew['Return'].astype(float)
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(ATVInew)
pyplot.show()
from pandas.plotting import lag_plot
lag_plot(ATVInew)
pyplot.show()
"""
Autoregressive (AR) Model
"""
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Detect ACF & PACF
values.columns = ['Date', 'Value']
fig, axes = pyplot.subplots(1, 2, figsize=(16, 3), dpi=100)
plot_acf(values['Value'].tolist(), lags=50, ax=axes[0])
plot_pacf(values['Value'].tolist(), lags=50, ax=axes[1])
# Data processing: build y (return) and X (lagged return), then drop the NaN row
import statsmodels.api as sm
y = values['Value'].values
y = np.array(y, dtype=float)
X = values[['Value']].shift(1).values
X = sm.add_constant(X)
X = np.array(X, dtype=float)
X = np.delete(X, 0, 0)   # first row is NaN after the shift
y = np.delete(y, 0, 0)   # keep y aligned with X
# train autoregression AR(1)
model = sm.OLS(y, X)
results = model.fit()
print('AR')
print(results.summary())
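# A minimal alternative sketch: the same AR(1) can also be fitted with
# statsmodels' AutoReg class (the current replacement for the removed
# tsa.arima_model.ARMA) instead of hand-rolled OLS.
from statsmodels.tsa.ar_model import AutoReg
ar_results = AutoReg(y, lags=1).fit()
print(ar_results.summary())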
"""
Regression: Day of the Week Effect
"""
# Create day-of-week dummies (D1 = Monday, ..., D5 = Friday)
values['Date'] = pd.to_datetime(values['Date'])  # ensure the .dt accessor works
names = ['D1', 'D2', 'D3', 'D4', 'D5']
for i, x in enumerate(names):
    values[x] = (values['Date'].dt.dayofweek == i).astype(int)
print(values.head(n=5))
# Regression - Seasonality
X2 = values[['D1', 'D2', 'D3', 'D4', 'D5']].values
X2 = np.delete(X2, 0, 0)   # align with y, whose first row was removed
model = sm.OLS(y, X2)
results = model.fit()
print(results.summary())
"""
Regression: OLS Assumption Tests
"""
# get residuals
def calculate_residuals(model, features, label):
    """
    Creates predictions on the features with the model and calculates residuals
    """
    predictions = model.fit().predict(features)
    df_results = pd.DataFrame({'Actual': label, 'Predicted': predictions})
    df_results['Residuals'] = df_results['Actual'] - df_results['Predicted']
    return df_results

calculate_residuals(model, X2, y)
"""
Assumption 2: Normality of the Error Terms
Besides the test below, the regression output showed Prob(JB) < 0.05, so we
reject the null hypothesis that the residuals are normally distributed.
"""
import seaborn as sns

def normal_errors_assumption(model, features, label, p_value_thresh=0.05):
    from statsmodels.stats.diagnostic import normal_ad
    print('Assumption 2: The error terms are normally distributed', '\n')
    # Calculating residuals for the Anderson-Darling test
    df_results = calculate_residuals(model, features, label)
    print('Using the Anderson-Darling test for normal distribution')
    # Performing the test on the residuals
    p_value = normal_ad(df_results['Residuals'])[1]
    print('p-value from the test - below 0.05 generally means non-normal:', p_value)
    # Reporting the normality of the residuals
    if p_value < p_value_thresh:
        print('Residuals are not normally distributed')
    else:
        print('Residuals are normally distributed')
    # Plotting the residuals distribution
    pyplot.subplots(figsize=(12, 6))
    pyplot.title('Distribution of Residuals')
    sns.histplot(df_results['Residuals'], kde=True)  # distplot is deprecated in seaborn
    pyplot.show()
    print()
    if p_value > p_value_thresh:
        print('Assumption satisfied')
    else:
        print('Assumption not satisfied')
        print()
        print('Confidence intervals will likely be affected')
        print('Try performing nonlinear transformations on variables')

normal_errors_assumption(model, X2, y)
"""
Assumption 3: No Autocorrelation
Performing the Durbin-Watson test:
Values of 1.5 < d < 2.5 generally show that there is no autocorrelation in the data.
0 to <2 indicates positive autocorrelation; >2 to 4 indicates negative autocorrelation.
Durbin-Watson: 2.115
Conclusion: Little to no autocorrelation
Assumption satisfied
"""
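# A minimal sketch to compute the Durbin-Watson statistic directly from the
# residuals of the day-of-the-week regression fitted above (the statistic
# also appears in the OLS summary output).
from statsmodels.stats.stattools import durbin_watson
print('Durbin-Watson:', durbin_watson(results.resid))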
"""
Assumption 4: Homoscedasticity
"""
def homoscedasticity_assumption(model, features, label):
    """
    Homoscedasticity: Assumes that the errors exhibit constant variance
    """
    print('Assumption 4: Homoscedasticity of Error Terms', '\n')
    print('Residuals should have relatively constant variance')
    # Calculating residuals for the plot
    df_results = calculate_residuals(model, features, label)
    # Plotting the residuals
    pyplot.subplots(figsize=(12, 6))
    ax = pyplot.subplot(111)  # To remove spines
    pyplot.scatter(x=df_results.index, y=df_results.Residuals, alpha=0.5)
    pyplot.plot(np.repeat(0, df_results.index.max()), color='darkorange', linestyle='--')
    ax.spines['right'].set_visible(False)  # Removing the right spine
    ax.spines['top'].set_visible(False)    # Removing the top spine
    pyplot.title('Residuals')
    pyplot.show()

homoscedasticity_assumption(model, X2, y)
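# A minimal sketch complementing the visual check with a formal test: the
# Breusch-Pagan test's null hypothesis is homoscedasticity (constant variance).
from statsmodels.stats.diagnostic import het_breuschpagan
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X2)
print('Breusch-Pagan LM p-value:', lm_pvalue)  # p < 0.05 suggests heteroscedasticity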