Target Firm: DocuSign, Inc.
Week 1 – Task 1 Time Series Quant Model Analysis
A quantitative analyst uses quantitative methods to support business and financial decisions. As a broker-dealer firm, we provide investment recommendations to our investors in the capital markets, and the goal of our “quants” is to help those investors identify profitable investment opportunities. The goal of your work is to improve our model for historical quantitative stock analysis so that we can apply it to any stock and industry, analyze more stocks and industries at a faster pace, and support our analysts' technical analysis. Please follow the model in the example report and the MATLAB, R, or Python code, but treat these files as references only. Given your strong academic background, we hope you can make the model more useful. We have provided code examples for three software packages: R, Python, and MATLAB.
Week 1 – Task 1 Detailed Instruction
Submission List:
Matlab, R, Python code and Quantitative Data Analysis (PPT)
Analysis Areas:
Test of Weak Form Efficiency
In this part of the project, you will perform a few tests of weak-form efficiency on the stocks and/or market indices and discuss the empirical findings.
Step 1: Obtain the Data
You need to obtain daily data for 2 – 3 stocks or market indices that are related to THE COMPANY over a span of 10 to 20 years.
So you need to find at least one or two competitors of Twilio. (You may obtain the data from Datastream, EIA, or Yahoo Finance.)
You can compute the simple return as:
R_t = [(Y_t - Y_{t-1}) / Y_{t-1}] * 100
where R_t is the return at time t, Y_t is the adjusted close price at time t, and Y_{t-1} is the adjusted close price at time t-1.
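As a minimal sketch of the return formula above, the following computes percentage simple returns from a price list, along with the percentage log return that the example code also uses. The function names and the price values are hypothetical and chosen only for illustration.

```python
import math


def simple_returns(prices):
    """Percentage simple returns: R_t = (Y_t - Y_{t-1}) / Y_{t-1} * 100."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """Percentage log returns: (ln Y_t - ln Y_{t-1}) * 100."""
    return [(math.log(curr) - math.log(prev)) * 100
            for prev, curr in zip(prices, prices[1:])]


# Hypothetical adjusted close prices, for illustration only (not real data)
adj_close = [100.0, 102.0, 99.96, 104.958]

simple = simple_returns(adj_close)
logret = log_returns(adj_close)

for t, (rs, rl) in enumerate(zip(simple, logret), start=1):
    print(f"t={t}  simple={rs:7.3f}%  log={rl:7.3f}%")
```

Note that a series of N prices yields N-1 returns, because the first observation has no previous price.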
Step 2: Tests of Weak Form Efficiency
You should read about EMH to get guidance about how to interpret and comment on the results of the weak-form efficiency tests as applied to your data.
1) Descriptive Statistics and return distributions
Begin by analyzing the data using summary statistics (e.g., mean, median, max, min, skewness, kurtosis) presented in a table. Provide one or two time series plots of the returns and a histogram of the return distribution for your stocks and indices. Comment, in appropriate wording, on the statistical and (if any) economic interpretation of your data.
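The summary table described above could be produced along these lines. This is a sketch using only the standard library; the `describe` helper and the sample returns are hypothetical, and the moments are population (ddof = 0) moments, so the values may differ slightly from packages that apply sample corrections.

```python
import statistics


def describe(returns):
    """Summary statistics for a list of returns, using population moments."""
    n = len(returns)
    mean = statistics.fmean(returns)
    # Central moments of order 2, 3 and 4
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(returns),
        "max": max(returns),
        "min": min(returns),
        "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis: 0 for a normal distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical daily returns in percent, for illustration only
sample_returns = [-2.0, -1.0, 0.0, 1.0, 2.0]
stats = describe(sample_returns)

for key, value in stats.items():
    print(f"{key:<18}{value:>10.4f}")
```

Positive excess kurtosis (fat tails) and non-zero skewness are the usual departures from normality to look for in daily stock returns.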
2) Autoregressive (AR) Model
You can test whether the return data in your sample follow a random walk using the AR(1) model:
R_t = σ + β_1 R_{t-1} + ε_t
where the dependent variable R_t is the return at time t, the independent variable R_{t-1} is the return lagged one period (time t-1), σ is the constant term, and ε_t is the error term.
You can create a one-period lag of the return either manually in Excel or in SPSS, for instance, by using the LAG function under the Transform>Compute variables menu.
You should report and discuss your results.
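The AR(1) regression above can be estimated by ordinary least squares. The sketch below uses the closed-form formulas for a one-regressor OLS and a large-sample rule of thumb (|t| > 1.96) in place of an exact p-value. The `fit_ar1` helper and the simulated returns are assumptions made for illustration; a statistics package would report the exact p-value.

```python
import math
import random


def fit_ar1(returns):
    """OLS fit of R_t = const + beta * R_{t-1} + e_t.

    Returns (const, beta, t_beta), where t_beta is the t-statistic for H0: beta = 0.
    """
    x = returns[:-1]          # lagged returns R_{t-1}
    y = returns[1:]           # current returns R_t
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    const = y_bar - beta * x_bar
    resid = [yi - const - beta * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # residual variance
    se_beta = math.sqrt(s2 / sxx)              # standard error of beta
    return const, beta, beta / se_beta


# Hypothetical data: 500 simulated independent daily returns (percent), not real prices
rng = random.Random(42)
simulated = [rng.gauss(0.0, 1.0) for _ in range(500)]

const, beta, t_beta = fit_ar1(simulated)
print(f"const={const:.4f}  beta={beta:.4f}  t(beta)={t_beta:.2f}")
if abs(t_beta) > 1.96:
    print("Reject H0: beta = 0 at roughly the 5% level")
else:
    print("Cannot reject H0: beta = 0 at roughly the 5% level")
```

If H0: β_1 = 0 cannot be rejected, yesterday's return does not help predict today's return, which is consistent with weak-form efficiency.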
3) Day of the Week Effect
In this part, you will test and see if the data show any seasonal effects over days of the week.
R_t = β_1 D_{1,t} + β_2 D_{2,t} + β_3 D_{3,t} + β_4 D_{4,t} + β_5 D_{5,t} + ε_t
Note that there is no constant in the above regression; if you want to include a constant you can have only 4 dummy variables. Again, you should report and comment on your findings.
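Because the five day dummies are mutually exclusive and cover every trading day, the no-constant regression above has a simple interpretation: each β_k equals the average return on that weekday. With a constant and only four dummies (Monday omitted here), the constant equals the Monday mean and each remaining coefficient is that day's mean minus the Monday mean. The sketch below illustrates this with hypothetical dates and returns; the helper name is an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_of_week_means(dates, returns):
    """Mean return per weekday (0 = Monday ... 4 = Friday)."""
    buckets = defaultdict(list)
    for d, r in zip(dates, returns):
        buckets[d.weekday()].append(r)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


# Hypothetical sample: two trading weeks starting Monday 2024-01-01
start = date(2024, 1, 1)
calendar = [start + timedelta(days=i) for i in range(14)]
trading_days = [d for d in calendar if d.weekday() < 5]   # drop weekends
returns = [0.5, -0.2, 0.1, 0.3, -0.4, -0.1, 0.2, 0.3, -0.1, 0.6]  # percent

means = day_of_week_means(trading_days, returns)

# Regression WITHOUT constant: beta_k is the mean return on weekday k
betas_no_constant = [means[k] for k in range(5)]

# Regression WITH constant and four dummies (Monday omitted):
# the constant is the Monday mean; each beta is the difference from Monday
constant = means[0]
betas_with_constant = {k: means[k] - constant for k in range(1, 5)}

labels = ["Mon", "Tue", "Wed", "Thu", "Fri"]
for k, label in enumerate(labels):
    print(f"{label}: beta (no constant) = {betas_no_constant[k]:.3f}")
print(f"constant (Monday mean) = {constant:.3f}")
for k, diff in betas_with_constant.items():
    print(f"{labels[k]} minus Mon = {diff:.3f}")
```

Including all five dummies together with a constant would make the regressors perfectly collinear, which is why only four dummies are allowed in that case.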
Task 1: Test of Weak Form EMH/Random Walk Model { Use BP(example firm: British Petroleum) Monthly Return Data }
(1) Create the BP_1 variable for t-1. {Hint: Transform…Create time series…}
(2) Compute the stock return of BP (BP_Ret). {Transform…Compute Variable…((BP-BP_1)/BP_1)*100}
(3) Create a lag variable BP_Ret_1 for t-1. {Hint: Transform…Create time series…}
(4) Compute the logarithmic stock return of BP (BP_LogRet). {Transform…Compute Variable…(LN(BP)-LN(BP_1))*100}
(5) Do a formal test of independence to test H0: No Autocorrelation. {Hint: Analyse…Forecasting…Autocorrelation…Variables (BP_Return), Display (Autocorrelation and Partial Autocorrelation).} Can you reject H0: No Autocorrelation (e.g., reject H0 if the p-value is below the chosen significance level)? How do you interpret the results?
(6) Repeat question 5 with BP share prices (instead of BP_Return). Do you observe any difference?
(7) Estimate the autoregressive AR(1) model with and without a constant term. Do the results tell you anything about the random walk model?
(8) To test the weak-form EMH, estimate the following AR(1) model with BP_Return (with constant term). Can you reject H0: β_1 = 0 (e.g., reject H0 if the p-value is below the chosen significance level)?
Do the results tell you anything about the EMH/random walk model?
R_t = σ + β_1 R_{t-1} + ε_t …….(1)
(9) Draw scatter plots and line charts to find evidence of the weak-form EMH/random walk model.
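For the autocorrelation test in items (5) and (6), one common formal test is the Ljung-Box Q statistic, which is compared with a chi-square critical value (11.070 at the 5% level for 5 lags). The sketch below is an illustration on simulated data, not the SPSS procedure itself; the helper names and the simulated series are assumptions.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic for H0: no autocorrelation up to max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical data: simulated independent returns (percent) and the implied price path
rng = random.Random(7)
sim_returns = [rng.gauss(0.0, 1.0) for _ in range(500)]
sim_prices = [100.0]
for r in sim_returns:
    sim_prices.append(sim_prices[-1] * (1 + r / 100))

CHI2_CRIT_5PCT_DF5 = 11.070  # chi-square 5% critical value, 5 degrees of freedom
q_returns = ljung_box_q(sim_returns, 5)
q_prices = ljung_box_q(sim_prices, 5)

for name, q in (("returns", q_returns), ("prices", q_prices)):
    verdict = "reject H0" if q > CHI2_CRIT_5PCT_DF5 else "cannot reject H0"
    print(f"{name}: Q(5) = {q:.2f} -> {verdict} (no autocorrelation)")
```

Price levels are typically very highly autocorrelated because each price is built on the previous one, whereas returns of an efficient market should show little or no autocorrelation; this is the difference item (6) asks you to observe.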
Detecting Seasonality in Financial Markets (Day-of-the-Week Effect) {use BP daily price}
(1) Convert DATES into DAYS: Transform….Date & Time Wizard….Extract a part of a date or time variable…Date or time (DATE) and unit to extract (DAY OF WEEK)…Finish.
(2) Convert DAYS into NUMBERS (e.g., Mon=1, Fri=5): Transform…Automatic Recode…choose New Variable name (DAYS2)…Recode starting from lowest to highest…Use the same recoding scheme for all variables.
(3) Create DUMMY variables from DAYS2 (D1 for Monday, D2 for Tuesday, D3 for Wednesday, D4 for Thursday, and D5 for Friday): Transform…Recode into different variable…Use Old & New Value option.
(4) To investigate the seasonal effect on financial markets (calendar anomalies), estimate the following OLS regression (dummy variable regression) model to test the 'Day-of-the-Week Effect' (WITHOUT CONSTANT):
R_t = β_1 D_{1,t} + β_2 D_{2,t} + β_3 D_{3,t} + β_4 D_{4,t} + β_5 D_{5,t} + ε_t
Where,
D1 = 1 if the return is on a Monday and 0 otherwise
D2 = 1 if the return is on a Tuesday and 0 otherwise
D3 = 1 if the return is on a Wednesday and 0 otherwise
D4 = 1 if the return is on a Thursday and 0 otherwise
D5 = 1 if the return is on a Friday and 0 otherwise
(5) Estimate the following OLS regression (dummy variable regression) model to test whether the Monday return is statistically significantly different from the returns on other days of the week. State the null and alternative hypotheses.
R_t = β_1 D_{1,t} + β_2 D_{2,t} + β_3 D_{3,t} + β_4 D_{4,t} + β_5 D_{5,t} + ε_t
For the following questions, you need to create a WORD document that lists your findings for all of them. Your submission should therefore be based on the following questions.
Interpret your responses to the following questions:
(i) Goodness of fit: how well does the model describe the relationship between the variables? {Look at the adjusted R^2 value.}
(ii) Overall model significance: is there a linear relationship between all X variables taken together and Y? Is the ratio of explained to unexplained variance greater than 1? {Look at the F-statistic, together with its level of significance, in the ANOVA table.
Note that the larger the R^2 value, the greater the F value.}
(iii) Do the assumptions underlying the model (e.g., OLS regression) hold?
(a) SAVE predicted and residual values.
(b) E(u) = 0: the errors have zero mean. {Look at the 'Residuals Statistics' table.}
(c) Cov(u_i, u_j) = 0: the errors are linearly independent of one another, i.e., no autocorrelation. {Look at the Durbin-Watson (DW) value in the 'Model Summary' table. If the DW statistic is close to 2, we CAN'T reject 'H0: No serial correlation'. This means there is no evidence of first-order serial correlation in the errors.}
(d) Var(u) = σ^2: the variance of the errors is constant (homoskedastic), i.e., no heteroskedasticity. {Draw a scatter plot with Y = standardized residuals and X = dependent variable. If the spread of the errors increases or decreases as the dependent variable increases, you have a heteroskedasticity problem.}
(e) Cov(u, x) = 0: there is no relationship between the error and the corresponding X variable. {Run correlation and covariance between X and the residual values. Tip: Analyse…Correlation…Bivariate…Option…}
(f) u ~ N(0, σ^2): the errors are normally distributed. {Draw a histogram of the unstandardized residuals. Do tests of normality (Descriptive statistics…Explore…Dependent list: unstandardized residuals…Plots…Normality plots with tests…). If the p-values are high/insignificant, we CAN'T reject H0: Normality, so the errors can be treated as normally distributed.}
(g) Is there any multicollinearity problem? {(i) Look at the correlation matrix to confirm that there is no linear relationship between the independent variables. (ii) Check tolerance and VIF: Analyse…Regression…Linear…Statistics (check collinearity diagnostics). The tolerance should be above 0.10 and the VIF should be below 10.}
(h) Model specification: (a) correct functional form, (b) no omission of important variables, (c) no inclusion of irrelevant variables.
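Several of the diagnostics in the checklist above are simple formulas. The sketch below computes adjusted R^2, the F-statistic implied by R^2, and the Durbin-Watson statistic, assuming a regression with a constant, k regressors, and n observations. The function names and input values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with a constant and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic implied by R^2: explained over unexplained variance."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical inputs, for illustration only
n_obs, n_regressors = 12, 1
for r2 in (0.1, 0.5, 0.9):
    print(f"R^2={r2:.1f}  adj R^2={adjusted_r2(r2, n_obs, n_regressors):.3f}  "
          f"F={f_statistic(r2, n_obs, n_regressors):.2f}")

# Residuals that alternate in sign give DW well above 2 (negative autocorrelation);
# residuals that persist give DW well below 2 (positive autocorrelation)
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
dw_persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(f"DW alternating = {dw_alternating:.2f}, DW persistent = {dw_persistent:.2f}")
```

The loop makes the note in (ii) concrete: with n and k held fixed, F rises as R^2 rises.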
Step 2.
Quantitative Analysis PowerPoint Slides Requirement (Use the PPT example as a reference. It is an example that we did for Twilio.)
Part 1 – Executive Summary (1-2 Slides)
Summarize your major findings based on your target companies.
What is the industry?
What are the company's historical highlights?
History (Create a timeline)
Strategy (Overall and recent)
Objectives (Operational and Strategic)
Historical Finance Transactions (M&A and Long-term Investment)
What does the historical data of the company tell you?
Summarize each category into 2-3 sentences.
Part 2 – Major Competitors (1-2 Slides)
Notes: List the name of the competitors and their major businesses.
What are their characteristics?
How are they doing financially and operationally?
Part 3 – Financial Highlights (1-2 Slides)
Search for average ratios in the industry on Marketwatch:
PE Multiple, EV/EBITDA, PBV, ROA, ROE
Compare your company to the industry average.
Please draw your conclusions based on the formulas and principles behind these ratios.
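As a reminder of how these ratios are built, here is a minimal sketch using the standard textbook definitions (with enterprise value simplified to market capitalization plus debt minus cash). All input figures are hypothetical and do not describe any real company.

```python
def valuation_ratios(price, shares, net_income, book_equity,
                     total_debt, cash, ebitda, total_assets):
    """Common valuation and profitability ratios from basic inputs."""
    market_cap = price * shares
    eps = net_income / shares                        # earnings per share
    enterprise_value = market_cap + total_debt - cash  # simplified EV
    return {
        "PE": price / eps,
        "EV/EBITDA": enterprise_value / ebitda,
        "PBV": market_cap / book_equity,
        "ROA": net_income / total_assets,
        "ROE": net_income / book_equity,
    }


# Hypothetical figures (in millions, except price per share), for illustration only
ratios = valuation_ratios(
    price=50.0, shares=100.0, net_income=250.0, book_equity=1000.0,
    total_debt=500.0, cash=200.0, ebitda=530.0, total_assets=2500.0,
)

for name, value in ratios.items():
    print(f"{name:<10}{value:>8.2f}")
```

Comparing each value against the industry average then shows whether the company trades at a premium or discount (PE, EV/EBITDA, PBV) and whether it is more or less profitable than its peers (ROA, ROE).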
Part 4 – Quantitative Analysis (5-8 Slides)
Present your model and answer the following questions. (Transform your report into slides.)
Do not simply show screenshots of your graphs; focus on your findings and conclusions.
(i) Goodness of fit: how well does the model describe the relationship between the variables? {Look at the adjusted R^2 value.}
(ii) Overall model significance: is there a linear relationship between all X variables taken together and Y? Is the ratio of explained to unexplained variance greater than 1? {Look at the F-statistic, together with its level of significance, in the ANOVA table.
Note that the larger the R^2 value, the greater the F value.}
(iii) Do the assumptions underlying the model (e.g., OLS regression) hold?
(a) SAVE predicted and residual values.
(b) E(u) = 0: the errors have zero mean. {Look at the 'Residuals Statistics' table.}
(c) Cov(u_i, u_j) = 0: the errors are linearly independent of one another, i.e., no autocorrelation. {Look at the Durbin-Watson (DW) value in the 'Model Summary' table. If the DW statistic is close to 2, we CAN'T reject 'H0: No serial correlation'. This means there is no evidence of first-order serial correlation in the errors.}
(d) Var(u) = σ^2: the variance of the errors is constant (homoskedastic), i.e., no heteroskedasticity. {Draw a scatter plot with Y = standardized residuals and X = dependent variable. If the spread of the errors increases or decreases as the dependent variable increases, you have a heteroskedasticity problem.}
(e) Cov(u, x) = 0: there is no relationship between the error and the corresponding X variable. {Run correlation and covariance between X and the residual values. Tip: Analyse…Correlation…Bivariate…Option…}
(f) u ~ N(0, σ^2): the errors are normally distributed. {Draw a histogram of the unstandardized residuals. Do tests of normality (Descriptive statistics…Explore…Dependent list: unstandardized residuals…Plots…Normality plots with tests…). If the p-values are high/insignificant, we CAN'T reject H0: Normality, so the errors can be treated as normally distributed.}
(g) Is there any multicollinearity problem? {(i) Look at the correlation matrix to confirm that there is no linear relationship between the independent variables. (ii) Check tolerance and VIF: Analyse…Regression…Linear…Statistics (check collinearity diagnostics). The tolerance should be above 0.10 and the VIF should be below 10.}
(h) Model specification: (a) correct functional form, (b) no omission of important variables, (c) no inclusion of irrelevant variables.
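For the multicollinearity check in (g), tolerance and VIF follow directly from how well each regressor is explained by the others: tolerance_j = 1 - R_j^2 and VIF_j = 1 / tolerance_j. With only two regressors, R_j^2 is the squared correlation between them, which the sketch below uses. The helper names and the data are hypothetical.

```python
def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two regressors."""
    r_squared = pearson_correlation(x1, x2) ** 2
    tolerance = 1 - r_squared
    return tolerance, 1 / tolerance


# Hypothetical regressors, for illustration only
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

tolerance, vif = tolerance_and_vif(x1, x2)
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
print("OK" if tolerance > 0.10 and vif < 10 else "possible multicollinearity problem")
```

With more than two regressors, R_j^2 comes from regressing each X variable on all the others, which is what the collinearity diagnostics in a statistics package report.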
EXAMPLE PYTHON CODE
Here is example Python code for the two steps; please find the attachment and work from it.
This example uses ATVI.
# Step one: load the data and compute returns
import pandas as pd
import numpy as np
from matplotlib import pyplot
ATVI = pd.read_csv("C:/Users/Administrator/Downloads/ATVI.csv", parse_dates=['Date'])
# Simple return as a fraction (multiply by 100 for the percentage form)
ATVI['Return'] = ATVI['Adj Close'].pct_change()
# Log return
ATVI['Logrt'] = np.log(ATVI['Adj Close'] / ATVI['Adj Close'].shift(1))
# Drop the first row, which has no previous price and therefore no return
ATVI = ATVI.drop(ATVI.index[0])
ATVI.plot(x='Date', y='Return')
ATVI.plot(x='Date', y='Adj Close')
pyplot.show()
# Step two: descriptive statistics
from scipy.stats import kurtosis
from scipy.stats import skew
ATVImean = np.mean(ATVI['Return'])
ATVIvar = np.var(ATVI['Return'])
ATVImax = np.max(ATVI['Return'])
ATVImin = np.min(ATVI['Return'])
ATVIstd = np.std(ATVI['Return'])
ATVIkur = kurtosis(ATVI['Return'])  # excess kurtosis (0 for a normal distribution)
ATVIskew = skew(ATVI['Return'])
ATVInew = ATVI[['Date', 'Return']].copy()
print('mean', ATVImean)
print('var', ATVIvar)
print('max', ATVImax)
print('min', ATVImin)
print('std', ATVIstd)
print('kur', ATVIkur)
print('skew', ATVIskew)
# Correlation between the return and its one-period lag
from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot
values = DataFrame(ATVInew.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t1', 't-1', 't2', 't+1']
# Drop the two date columns and keep only the lagged and current returns
dataframe = dataframe.drop('t1', axis=1)
dataframe = dataframe.drop('t2', axis=1)
dataframe = dataframe.drop(dataframe.index[0])
dataframe = dataframe.astype(float)
rst = dataframe.corr()
print(rst)
# Autocorrelation plot and lag plot of the returns
ATVInew = ATVInew['Return'].astype(float)
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(ATVInew)
pyplot.show()
from pandas.plotting import lag_plot
lag_plot(ATVInew)
pyplot.show()
"""
Autoregressive (AR) Model
"""
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import statsmodels.api as sm
# Inspect the ACF and PACF of the return series
values.columns = ['Date', 'Value']
fig, axes = pyplot.subplots(1, 2, figsize=(16, 3), dpi=100)
plot_acf(values['Value'].astype(float).tolist(), lags=50, ax=axes[0])
plot_pacf(values['Value'].astype(float).tolist(), lags=50, ax=axes[1])
pyplot.show()
# Data processing: y = R_t, X = [1, R_{t-1}]; drop the first row, whose lag is NaN
y = values['Value'].astype(float).values
X = values['Value'].astype(float).shift(1).values
y = np.delete(y, 0, 0)
X = np.delete(X, 0, 0)
X = sm.add_constant(X)
# Estimate the AR(1) model by OLS
model = sm.OLS(y, X)
results = model.fit()
print('AR')
print(results.summary())
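"""
Regression: Day of the Week Effect
"""
# Create the day dummies: D1 = Monday (dayofweek 0) ... D5 = Friday (dayofweek 4)
names = ['D1', 'D2', 'D3', 'D4', 'D5']
# The Date column may be stored as generic objects, so convert it to datetime first
dayofweek = pd.to_datetime(values['Date']).dt.dayofweek
for i, x in enumerate(names):
    values[x] = (dayofweek == i).astype(int)
print(values.head(n=5))
# Regression - seasonality (no constant, so all five dummies can be included)
X2 = values[['D1', 'D2', 'D3', 'D4', 'D5']].values.astype(float)
X2 = np.delete(X2, 0, 0)  # drop the first row to align with y
model = sm.OLS(y, X2)
results = model.fit()
print(results.summary())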
"""
Regression: OLS Assumption Tests
"""
# Get residuals
def calculate_residuals(model, features, label):
    """
    Creates predictions on the features with the fitted model and calculates residuals
    """
    predictions = model.predict(features)
    df_results = pd.DataFrame({'Actual': label, 'Predicted': predictions})
    # Residual = actual value minus predicted value
    df_results['Residuals'] = df_results['Actual'] - df_results['Predicted']
    return df_results

# Pass the fitted results object (not the unfitted model) so predict() uses the estimated coefficients
calculate_residuals(results, X2, y)
"""
Assumption 2: Normality of the Error Terms
In addition to the test below, the regression output showed Prob(JB) < 0.05, so we reject the null hypothesis that the errors are normally distributed
"""
import seaborn as sns

def normal_errors_assumption(model, features, label, p_value_thresh=0.05):
    from statsmodels.stats.diagnostic import normal_ad
    print('Assumption 2: The error terms are normally distributed', '\n')
    # Calculating residuals for the Anderson-Darling test
    df_results = calculate_residuals(model, features, label)
    print('Using the Anderson-Darling test for normal distribution')
    # Performing the test on the residuals
    p_value = normal_ad(df_results['Residuals'])[1]
    print('p-value from the test - below 0.05 generally means non-normal:', p_value)
    # Reporting the normality of the residuals
    if p_value < p_value_thresh:
        print('Residuals are not normally distributed')
    else:
        print('Residuals are normally distributed')
    # Plotting the residuals distribution (histplot replaces the deprecated distplot)
    pyplot.subplots(figsize=(12, 6))
    pyplot.title('Distribution of Residuals')
    sns.histplot(df_results['Residuals'], kde=True)
    pyplot.show()
    print()
    if p_value > p_value_thresh:
        print('Assumption satisfied')
    else:
        print('Assumption not satisfied')
        print()
        print('Confidence intervals will likely be affected')
        print('Try performing nonlinear transformations on variables')

normal_errors_assumption(results, X2, y)
"""
Assumption 3: No Autocorrelation
Performing Durbin-Watson Test:
Values of 1.5 < d < 2.5 generally show that there is no autocorrelation in the data
0 to <2 is positive autocorrelation
>2 to 4 is negative autocorrelation
Durbin-Watson: 2.115
Conclusion: Little to no autocorrelation
Assumption satisfied
"""
"""
Assumption 4: Homoscedasticity
"""
def homoscedasticity_assumption(model, features, label):
    """
    Homoscedasticity: Assumes that the errors exhibit constant variance
    """
    print('Assumption 4: Homoscedasticity of Error Terms', '\n')
    print('Residuals should have relatively constant variance')
    # Calculating residuals for the plot
    df_results = calculate_residuals(model, features, label)
    # Plotting the residuals
    pyplot.subplots(figsize=(12, 6))
    ax = pyplot.subplot(111)  # To remove spines
    pyplot.scatter(x=df_results.index, y=df_results.Residuals, alpha=0.5)
    pyplot.plot(np.repeat(0, df_results.index.max()), color='darkorange', linestyle='--')
    ax.spines['right'].set_visible(False)  # Removing the right spine
    ax.spines['top'].set_visible(False)  # Removing the top spine
    pyplot.title('Residuals')
    pyplot.show()

homoscedasticity_assumption(results, X2, y)