Please don't accept this task if you don't have RStudio or are not familiar with it.
This problem examines logistic regression. You will want to review the material on linear regression, including the use of logistic regression in RStudio. The data needed is attached below.
Here is a video on how to handle parts a.i to a.iii
Data
Deliverable:
The .Rmd file and the knitted output
Run the code below and answer parts d and e.
Code
#NOTE: Prepared with R version 3.6.0
#Set the working directory to the folder on your machine that contains the data
#files (for example, with setwd()).
#Load the required packages for this chapter.
#Install the packages below once on your machine: uncomment the
#install.packages() lines and run them in the console.
#NOTE: DO NOT INCLUDE install.packages() IN THE .RMD FILE; YOU WILL GET AN ERROR WHEN YOU KNIT.
#install.packages("caret")
#install.packages("MASS")
library(caret)
library(MASS)
## Financial Condition of Banks.
##The file Banks.csv includes data on a sample of 20 banks. The “Financial
##Condition” column records the judgment of an expert on the financial
##condition of each bank. This outcome variable takes one of two possible
##values, weak or strong, according to the financial condition of the bank. The
##predictors are two ratios used in the financial analysis of banks:
##TotLns&Lses/Assets is the ratio of total loans and leases to total assets,
##and TotExp/Assets is the ratio of total expenses to total assets. The goal
##is to use the two ratios to classify the financial condition of a new bank.
##Run a logistic regression model (on the entire dataset) that models the
##status of a bank as a function of the two financial measures provided.
##Specify the success class as weak (this is similar to creating a dummy that
##is 1 for financially weak banks and 0 otherwise), and use the default cutoff
##value of 0.5.
#load the data
bank.df <- read.csv("banks.csv")
head(bank.df)
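read.csv() passes the original column headers through make.names() (its default check.names = TRUE), which replaces characters that are not valid in R names, such as spaces, "&", and "/", with dots. That is why the model below refers to TotExp.Assets rather than TotExp/Assets. A quick sketch, assuming the headers in the file are spelled as in the problem statement:

```r
# Headers as written in the problem statement (assumed spelling in banks.csv)
original <- c("Financial Condition", "TotExp/Assets", "TotLns&Lses/Assets")

# make.names() is the conversion read.csv() applies when check.names = TRUE
r.names <- make.names(original)
print(r.names)
```

On your own machine, names(bank.df) shows the names R actually assigned.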
#fit logistic regression model and obtain the summary
reg <- glm(Financial.Condition ~ TotExp.Assets + TotLns.Lses.Assets,
           data = bank.df, family = "binomial")
summary(reg)
reg$coefficients
#Coefficients:
#                   Estimate Std. Error z value Pr(>|z|)
#(Intercept)         -14.721      6.675  -2.205   0.0274 *
#TotExp.Assets        89.834     47.781   1.880   0.0601 .
#TotLns.Lses.Assets    8.371      5.779   1.449   0.1474
##a Write the estimated equation that associates the financial condition
##of a bank with its two predictors in three formats:
#NOTE: You may need to convert the formulas given here into a form that works in R
##a.i The logit as a function of the predictors
#the function would be in this form
#logit = -14.721 + (89.834 * TotExp.Assets) + (8.371 * TotLns.Lses.Assets)
#PUT the above code in the correct form so that it will run
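One way to put the logit in runnable form is a small R function. This is a sketch, not the required answer: the function name logit.fn is hypothetical, and the coefficients are the rounded estimates from the summary above (with the fitted model you could use coef(reg) instead).

```r
# Hypothetical helper: logit of being financially weak, using the rounded
# estimates from summary(reg)
logit.fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  -14.721 + (89.834 * TotExp.Assets) + (8.371 * TotLns.Lses.Assets)
}

logit.fn(TotExp.Assets = 0.11, TotLns.Lses.Assets = 0.6)
```

Applied to the whole data set, logit.fn(bank.df$TotExp.Assets, bank.df$TotLns.Lses.Assets) returns one logit per bank; predict(reg) returns the same values from the fitted model, up to the rounding of the coefficients.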
##a.ii The odds as a function of the predictors
#the function would be in this form
# Odds = e^(logit) = e^(-14.7207 + (89.8321 * TotExp/Assets) +
#   (8.3712 * TotLns&Lses/Assets))
#NOTE: You may need to convert the formulas given here into a form that works in R
#For instance, replace "e^" above with exp(), R's exponential function.
#PUT the above code in the correct form so that it will run
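Because the odds are e raised to the logit, the R version wraps the logit expression in exp(). A sketch with a hypothetical function name, odds.fn, and the rounded estimates from the summary:

```r
# Hypothetical helper: odds of being financially weak = exp(logit)
odds.fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  exp(-14.721 + (89.834 * TotExp.Assets) + (8.371 * TotLns.Lses.Assets))
}

odds.fn(TotExp.Assets = 0.11, TotLns.Lses.Assets = 0.6)
```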
##a.iii The probability as a function of the predictors
#the function would be in this form
# p = 1/(1 + Exp(-(-14.7207 + (89.8321 * TotExp/Assets) +
# (8.3712 * TotLns&Lses/Assets))))
#Convert the code above into a form that will run; you will need to change the names to match the
#column names in bank.df (TotExp.Assets and TotLns.Lses.Assets).
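The probability follows from the logit through the logistic function p = 1/(1 + exp(-logit)). A sketch under the same assumptions (hypothetical function name prob.fn, rounded estimates from the summary); base R's plogis() computes the same logistic transformation and serves as a cross-check:

```r
# Hypothetical helper: probability of being financially weak
prob.fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  logit <- -14.721 + (89.834 * TotExp.Assets) + (8.371 * TotLns.Lses.Assets)
  1 / (1 + exp(-logit))
}

p <- prob.fn(TotExp.Assets = 0.11, TotLns.Lses.Assets = 0.6)
# plogis() is base R's logistic distribution function, the same transformation
all.equal(p, plogis(-14.721 + 89.834 * 0.11 + 8.371 * 0.6))
```

From the fitted model, predict(reg, type = "response") returns these probabilities for every bank in bank.df.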
##b Consider a new bank whose total loans and leases/assets ratio = 0.6
##and total expenses/assets ratio = 0.11.
##From your logistic regression model,
##estimate the following four quantities for this bank (use R to do all the
##intermediate calculations; show your final answers to four decimal places):
##the logit, the odds, the probability of being financially weak, and the
##classification of the bank (use cutoff = 0.5).
#Logit for the new record
#Matrix multiplication computes the logit directly; the vector holds the values for
#(Intercept, TotExp.Assets, TotLns.Lses.Assets), the same order as reg$coefficients
logit <- c(1, 0.11, 0.6) %*% reg$coefficients
#Or use the logit formula above with 1, 0.11, and 0.6 in place of the terms, in this form:
#logit = -14.721 + (89.834 * 0.11) + (8.371 * 0.6)
odds <- exp(logit)          #odds of being financially weak
prob <- odds / (1 + odds)   #probability of being weak, same as 1/(1 + exp(-logit))
prob
#show your results
#> prob
#      [,1]
#[1,] 0.5457504
#The probability that the new bank is financially weak is 0.5457, which exceeds the 0.5 cutoff,
#so the predicted class for this new bank is 1, or "financially weak".
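Part b asks for all four quantities to four decimal places. A self-contained sketch using the rounded estimates from the summary, so the last digit may differ slightly from results based on reg$coefficients:

```r
# Rounded estimates from summary(reg): intercept, TotExp.Assets, TotLns.Lses.Assets
b <- c(-14.721, 89.834, 8.371)

# New bank: TotExp/Assets = 0.11, TotLns&Lses/Assets = 0.6 (leading 1 for the intercept)
x.new <- c(1, 0.11, 0.6)

logit <- sum(b * x.new)       # linear predictor
odds  <- exp(logit)           # odds of being weak
prob  <- odds / (1 + odds)    # probability of being weak
pred.class <- ifelse(prob > 0.5, "weak", "strong")  # cutoff = 0.5

round(c(logit = logit, odds = odds, prob = prob), 4)
pred.class
```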
##c The cutoff value of 0.5 can be used in conjunction with the probability of
##being financially weak. Compute the threshold that should be used if we want
##to make a classification based on the odds of being financially weak, and
##the threshold for the corresponding logit.
###Convert and RUN the following code using a cutoff value of p = 0.5.
#First determine the odds threshold that corresponds to the cutoff probability of 0.5
#Odds = (p) / (1-p) = (0.5) / (1-0.5) = 1
0.5 / (1 - 0.5)
#If odds > 1, classify financial status as "weak" (otherwise, classify it as
#"strong").
#Now determine the logit threshold, which is the natural log of the odds threshold
#Logit = ln (odds) = ln (1) = 0
#If logit > 0, classify financial status as "weak" (otherwise, classify it
#as "strong").
#YOU SHOULD GET THIS CONCLUSION
#Therefore, a cutoff of 0.5 on the probability of being weak is equivalent to a
#threshold of 1 on the odds of being weak, and to a threshold of 0 on the logit.
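The equivalence of the three cutoffs can be checked numerically: convert p = 0.5 to an odds and a logit threshold, then confirm that the three rules classify a grid of probabilities the same way. A minimal sketch; the grid is arbitrary and deliberately avoids exactly 0.5, where all three rules sit on the boundary:

```r
p.cutoff     <- 0.5
odds.cutoff  <- p.cutoff / (1 - p.cutoff)  # odds threshold
logit.cutoff <- log(odds.cutoff)           # logit threshold (natural log)
c(odds = odds.cutoff, logit = logit.cutoff)

# The three rules should agree on every probability in the grid
p.grid     <- seq(0.05, 0.95, by = 0.1)
odds.grid  <- p.grid / (1 - p.grid)
logit.grid <- log(odds.grid)
rule.p     <- p.grid > p.cutoff
rule.odds  <- odds.grid > odds.cutoff
rule.logit <- logit.grid > logit.cutoff
identical(rule.p, rule.odds) && identical(rule.p, rule.logit)
```

This works because odds and logit are both strictly increasing functions of p, so thresholding any one of them at the transformed cutoff gives the same classification.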
##d Interpret the estimated coefficient for the total loans & leases to
##total assets ratio (TotLns&Lses/Assets) in terms of the odds of being
##financially weak.
#Look at how the odds are determined: how does the coefficient 8.3712 affect the odds?
#In other words, hold TotExp/Assets fixed and compare the odds before and after
#a one-unit increase in TotLns&Lses/Assets; the intercept (-14.7207) and the
#89.8321 term are the same in both and cancel out of the ratio
#This isolates the impact of TotLns&Lses/Assets. So if it increases by 1, what is the effect on the odds?
# Odds = e^(logit) = e^(-14.7207 + (89.8321 * TotExp/Assets) +
#   (8.3712 * TotLns&Lses/Assets))
#USING THE EQUATION, DETERMINE THE EFFECT ON THE ODDS
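The cancellation can be checked directly. With TotExp/Assets held fixed, the ratio of the odds at TotLns&Lses/Assets = x + 1 and at x equals exp(8.371), whichever starting values are chosen. A sketch using the rounded estimates and a hypothetical helper, odds.fn; the starting point (0.11, 0.6) is simply the new bank from part b, and the 0.1 step is an illustrative choice, since a loans-to-assets ratio typically lies between 0 and 1:

```r
# Hypothetical helper: odds of being financially weak
odds.fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  exp(-14.721 + 89.834 * TotExp.Assets + 8.371 * TotLns.Lses.Assets)
}

# Odds ratio for a one-unit increase in TotLns.Lses.Assets, TotExp.Assets fixed
or.one <- odds.fn(0.11, 0.6 + 1) / odds.fn(0.11, 0.6)
all.equal(or.one, exp(8.371))   # intercept and TotExp.Assets term cancel

# A 0.1 increase multiplies the odds by exp(0.1 * 8.371)
or.tenth <- odds.fn(0.11, 0.6 + 0.1) / odds.fn(0.11, 0.6)
all.equal(or.tenth, exp(0.8371))
```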
#PUT your answer here.
##e When a bank that is in poor financial condition is misclassified as
##financially strong, the misclassification cost is much higher than when a
##financially strong bank is misclassified as weak. To minimize the expected
##cost of misclassification, should the cutoff value for classification
##(which is currently at 0.5) be increased or decreased?
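The direction of the change follows from comparing expected costs. Let C.fn be the cost of calling a weak bank strong and C.fp the cost of calling a strong bank weak. Classifying a bank as weak has expected cost (1 - p) * C.fp, and classifying it as strong has expected cost p * C.fn, so "weak" is the cheaper call when p > C.fp / (C.fp + C.fn). A sketch with hypothetical costs; the values 1 and 5 are illustrative and not from the problem:

```r
# Hypothetical misclassification costs (illustrative only)
C.fp <- 1   # financially strong bank classified as weak
C.fn <- 5   # financially weak bank classified as strong

# Cost-minimizing cutoff: classify as weak when p exceeds it
cost.cutoff <- function(C.fp, C.fn) C.fp / (C.fp + C.fn)

cost.cutoff(1, 1)        # equal costs give the usual 0.5
cost.cutoff(C.fp, C.fn)  # larger C.fn pushes the cutoff below 0.5
```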
#PUT your answer here.