Data Analytics Archives - Page 5 of 31

Please don’t accept if you don’t have or are not familiar with R Studio This pr

Posted on August 20, 2024 | by Linus | Leave a Comment

Please don’t accept if you don’t have or are not familiar with R Studio
This problem examines logistic regression. You will want to review the material on linear regression, including the use of logistic regression in R Studio. The data needed is attached below.
Here is a video on how to handle parts a.i to a.iii

Deliverable:
RMD and the KNITed output
Run the code below and answer parts d and e.
Code
#NOTE: Prepared with R version 3.6.0
#set the working directory to appropriate folder on your machine, so as to access the data
#files.
#load the required librarie(s)/package(s) for this chapter
#Install the package(s) below once on your machine. To do so, uncomment the
#install.packages line(s) below.
#NOTE: DO NOT INCLUDE INSTALL.PACKAGES IN THE .RMD FILE; YOU WILL GET AN ERROR WHEN YOU KNIT
#install.packages("caret")
#install.packages("MASS")
library(caret)
library(MASS)
## Financial Condition of Banks.
##The file Banks.csv includes data on a sample of 20 banks. The “Financial
##Condition” column records the judgment of an expert on the financial
##condition of each bank. This outcome variable takes one of two possible
##values-weak or strong-according to the financial condition of the bank. The
##predictors are two ratios used in the financial analysis of banks:
##TotLns&Lses/Assets is the ratio of total loans and leases to total assets
##and TotExp/Assets is the ratio of total expenses to total assets. The target
##is to use the two ratios for classifying the financial condition of a new bank.
##Run a logistic regression model (on the entire dataset) that models the
##status of a bank as a function of the two financial measures provided.
##Specify the success class as weak (this is similar to creating a dummy that
##is 1 for financially weak banks and 0 otherwise), and use the default cutoff
##value of 0.5.
#load the data
bank.df <- read.csv("banks.csv")
head(bank.df)
#fit logistic regression model and obtain the summary
reg <- glm(Financial.Condition ~ TotExp.Assets + TotLns.Lses.Assets, data = bank.df, family = "binomial")
summary(reg)
reg$coefficients
#Coefficients:
#                   Estimate Std. Error z value Pr(>|z|)
#(Intercept)         -14.721      6.675  -2.205   0.0274 *
#TotExp.Assets        89.834     47.781   1.880   0.0601 .
#TotLns.Lses.Assets    8.371      5.779   1.449   0.1474
##a Write the estimated equation that associates the financial condition
##of a bank with its two predictors in three formats:
#NOTE: You may need to convert the formulas given here into a form that works in R
##a.i derive the logit values – The logit as a function of the predictors
#the function would be in this form
#logit = -14.721 + (89.834 * TotExp.Assets) + (8.371 * TotLns.Lses.Assets)
#PUT the above code in the correct form so that it will run
##a.ii The odds as a function of the predictors
#the function would be in this form
# Odds = e^(logit) = e^(-14.7207 + (89.8321 * TotExp/Assets) +
# (8.3712 * TotLns&Lses/Assets))
#NOTE: You may need to convert the formulas given here into a form that works in R
#for instance, replace "e^" above with the R exponential function exp()
#PUT the above code in the correct form so that it will run
##a.iii The probability as a function of the predictors
#the function would be in this form
# p = 1/(1 + Exp(-(-14.7207 + (89.8321 * TotExp/Assets) +
# (8.3712 * TotLns&Lses/Assets))))
#Convert the above code into the correct form so that it will run; you will need to change the names to match the
#column names in bank.df
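As one possible conversion of the three formulas in part a into runnable R, the sketch below hard-codes the rounded coefficients from the summary output quoted above instead of reading them from reg, so it runs without banks.csv. The helper names logit_fn, odds_fn, and prob_fn are hypothetical and are not part of the assignment script.

```r
# Rounded coefficients copied from the summary output quoted above.
b0    <- -14.721   # (Intercept)
b_exp <- 89.834    # TotExp.Assets
b_lns <- 8.371     # TotLns.Lses.Assets

# a.i: the logit as a function of the two predictors
logit_fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  b0 + b_exp * TotExp.Assets + b_lns * TotLns.Lses.Assets
}

# a.ii: the odds are exp(logit)
odds_fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  exp(logit_fn(TotExp.Assets, TotLns.Lses.Assets))
}

# a.iii: the probability is 1 / (1 + exp(-logit))
prob_fn <- function(TotExp.Assets, TotLns.Lses.Assets) {
  1 / (1 + exp(-logit_fn(TotExp.Assets, TotLns.Lses.Assets)))
}

# Example call with the ratios of the new bank from part b.
prob_fn(TotExp.Assets = 0.11, TotLns.Lses.Assets = 0.6)
```

Applied to the data frame, the same functions accept whole columns, for example prob_fn(bank.df$TotExp.Assets, bank.df$TotLns.Lses.Assets).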
#b Consider a new bank whose total loans and leases/assets ratio = 0.6
##and total expenses/assets ratio = 0.11.
#From your logistic regression model,
##estimate the following four quantities for this bank (use R to do all the
##intermediate calculations; show your final answers to four decimal places):
##the logit, the odds, the probability of being financially weak, and the
##classification of the bank (use cutoff = 0.5).
# new record logit value
# you can use matrix multiplication to determine the logit
#this will be the same calculation as above
logit <- c(1, 0.11, 0.6) %*% reg$coefficients
#or you can use the logit formula above replacing the variables with 1, 0.11, and 0.6
#in this form: logit = -14.721 + (89.834 * TotExp.Assets) + (8.371 * TotLns.Lses.Assets)
odds <- exp(logit)
prob <- odds/(1+odds)
prob
#show your results
#> prob
# [,1]
#[1,] 0.5457504
#the probability that the new bank is financially weak is 0.5457, and therefore the predicted class
#for this new bank is 1, or "financially weak".
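The four quantities requested in part b can be sketched in a few self-contained lines. This is an illustration under one assumption: the vector coefs holds the rounded coefficients from the summary output and stands in for reg$coefficients, so the last decimals may differ slightly from a run on the fitted model.

```r
# Rounded coefficients in the order (Intercept), TotExp.Assets,
# TotLns.Lses.Assets. Assumption: these stand in for reg$coefficients.
coefs    <- c(-14.721, 89.834, 8.371)
new_bank <- c(1, 0.11, 0.6)   # 1 for the intercept, then the two ratios

logit <- sum(new_bank * coefs)   # same value as new_bank %*% coefs
odds  <- exp(logit)              # odds of being financially weak
prob  <- odds / (1 + odds)       # probability of being financially weak
pred_class <- ifelse(prob > 0.5, "weak", "strong")   # cutoff = 0.5

round(c(logit = logit, odds = odds, prob = prob), 4)
pred_class
```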
##c The cutoff value of 0.5 can be used in conjunction with the probability of
##being financially weak. Compute the threshold that should be used if we want
##to make a classification based on the odds of being financially weak, and
##the threshold for the corresponding logit.
###Convert and RUN the following code using a cutoff value of p=0.5.
#first determine the odds threshold from the probability cutoff of 0.5
#Odds = (p) / (1-p) = (0.5) / (1-0.5) = 1
(0.6)/(1-0.6)
#If odds > 1 then classify financial status as “weak” (otherwise classify as
#”strong”).
#now determine the Logit value which is the log of the odds
#Logit = ln (odds) = ln (1) = 0
#If Logit > 0 then classify financial status as “weak” (otherwise, classify it
#as “strong”)
#YOU SHOULD GET THIS CONCLUSION
#Therefore, a cutoff of 0.5 on the probability of being weak is equivalent to a
#threshold of 1 on the odds of being weak, and to a threshold of 0 on the logit.
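The conversion from a probability cutoff to the matching odds and logit thresholds can be checked directly in R. The variable names below are illustrative; qlogis() is the base R logit function and is used only as a cross-check.

```r
p_cut     <- 0.5                     # probability cutoff from the text
odds_cut  <- p_cut / (1 - p_cut)     # odds threshold
logit_cut <- log(odds_cut)           # logit threshold (natural log of the odds)

# qlogis(p) computes log(p / (1 - p)), so it should agree with logit_cut.
c(odds = odds_cut, logit = logit_cut, qlogis = qlogis(p_cut))
```

Changing p_cut shows how the other two thresholds move with it: any cutoff above 0.5 gives odds above 1 and a positive logit.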
##d Interpret the estimated coefficient for the total loans & leases to
##total assets ratio (TotLns&Lses/Assets) in terms of the odds of being
##financially weak.
#look at how we determine the odds. How does the 8.3712 impact the odds?
#in other words assume we only have the (TotLns&Lses/Assets) ratio
#zero out the -14.7207 and the 89.8321
#this will tell the impact of (TotLns&Lses/Assets)
#so if (TotLns&Lses/Assets)= 1 then what is the effect on the odds?
# Odds = e^(logit) = e^(-14.7207 + (89.8321 * TotExp/Assets) +
# (8.3712 * TotLns&Lses/Assets))
#USING THE EQUATION DETERMINE THE EFFECT ON THE ODDS
PUT your answer here
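For part d, the arithmetic behind an odds interpretation can be sketched as follows. Because the logit is linear in each predictor, raising one predictor by some amount while holding the other fixed multiplies the odds by exp(coefficient * amount). The helper name odds_multiplier is hypothetical, and the coefficient is the 8.3712 quoted in the formula above.

```r
b_lns <- 8.3712   # coefficient for TotLns&Lses/Assets quoted above

# Holding TotExp/Assets fixed, raising TotLns&Lses/Assets by `delta`
# multiplies the odds of being financially weak by exp(b_lns * delta).
odds_multiplier <- function(delta) exp(b_lns * delta)

odds_multiplier(1)     # multiplier for a full one-unit increase in the ratio
odds_multiplier(0.1)   # multiplier for a 0.1 increase, a smaller step for a ratio
```

Since the predictor is a ratio, a full one-unit change is a large step, so the 0.1 version may be the easier one to put into words.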
##e When a bank that is in poor financial condition is misclassified as
##financially strong, the misclassification cost is much higher than when a
##financially strong bank is misclassified as weak. To minimize the expected
##cost of misclassification, should the cutoff value for classification
##(which is currently at 0.5) be increased or decreased?
PUT your answer here.

Exercise 3 Phayton In this exercise, you will explore the use of programming lang

Posted on August 19, 2024 | by Linus | Leave a Comment

Exercise 3 Phayton
In this exercise, you will explore the use of a programming language to carry out statistical analysis.
Carry out the statistics parts.
There are two R files for this lecture:
classPrt1stat.R. https://drive.google.com/file/d/1M7hy-o2myh-GCuQjd…
classPrt2stat.R. https://drive.google.com/file/d/1qrgkPaX5DgYPaF8A6…
Create an .rmd file for each of the two R files, and
submit the output from the knit for each.
Exercise 4 Exercise2 R coding
This assignment is for learning some of the basic operations in R
You will want to execute the following R script in R studio.
NOTE: There is a section for you to add your own code please do so.
Here is the r code https://drive.google.com/file/d/1Tpr_aTOwdL0iS2h6i…
Doc version attached below
note: you can run this by going to File in R Studio and choosing to open a file.
You will need to separate the code into Chunks for the rmd file. It is up to you as to how.
Deliverables:
Output from KNIT
.RMD file

Please see the attached documents to complete the assignment. Textbook (Not Requ

Posted on June 20, 2024 | by Linus | Leave a Comment

Please see the attached documents to complete the assignment.
Textbook (not required but recommended): Cost-Benefit Analysis: Concepts and Practice (4th Edition) by Anthony E. Boardman, David H. Greenberg, Aidan R. Vining, and David L. Weimer, Prentice Hall, Inc., Upper Saddle River, NJ, 2011 (ISBN-10: 0137002696). Include additional resources.

Check attachments for instructions for data analysis 3. I have completed data an

Posted on June 20, 2024 | by Linus | Leave a Comment

Check attachments for instructions for data analysis 3. I have completed data analysis 1 and 2; if you need to see what was done, you can ask.

The Final Project consists of Lucida Inc company dataset. You are to READ the ca

Posted on June 17, 2024 | by Linus | Leave a Comment

The Final Project consists of Lucida Inc company dataset. You are to READ the case context and use the Lucida_Employee_Dataset presented to answer the questions that follow.
There are eleven (11) instructions and questions. Please complete all of them and present your answers as separate worksheets and in text boxes (for text answers) as they may apply. Use as many worksheets as possible. Highlight your answers as much as possible. Please, maintain the numbering system for easy follow up.
You are permitted to use Google Scholar to support your work with references (as applicable).

first row of questions (please use document below for the data) 1. 2.Based on th

Posted on June 17, 2024 | by Linus | Leave a Comment

first row of questions (please use document below for the data)
1.
2. Based on the above simple regression analysis, how much should the hiring managers expect to have for Newspaper/Magazine in the following January (Month 13)? Round to two decimal places, do not include the dollar sign.
3. The hiring managers want to compare three different employment sources: Newspaper/Magazine, CareerBuilder, and Monster.com. Perform simple regressions as you did above predicting allocations for month 13 for those three employment sources. Which of the following statements would be the best conclusion from your analysis?
Group of answer choices
The model R-square for Monster.com is the highest R-square value of the three platforms. Therefore, management will likely provide the most funding for that platform, and we should focus on that above the other two.
The models for Monster.com and Newspaper/Magazine are both significant, which means we should focus on these two platforms over CareerBuilder because management will likely provide us the most funding for those two platforms.
The predicted monthly amount for CareerBuilder is lower than the predicted amount for Newspaper/Magazine. Therefore, they should expect the lowest amount of funding for CareerBuilder out of all of the three, and should concentrate their efforts on the other two employment resources.
The monthly amounts for Monster.com have a negative slope, compared to the other two, which have a positive slope. Therefore, they should expect lower funding for Monster.com than in the past, and should concentrate their efforts on the other two employment resources.
All of the above are correct.
None of the above are correct.
4. For ease of performing the next task, move the variable “Social Networks – Facebook, Twitter, etc” to column A on the spreadsheet. (i.e. Make sure it is the first column in the spreadsheet.)
The hiring managers recognize that the funding for one employment source depends on the funding for the other sources. Therefore, they want you to perform a multiple regression predicting Social Network funding from Months 1-12, while controlling for the funding of Billboard, Careerbuilder, Company Intranet – Partner, and Diversity Job Fair.
Assume that we know the funding for the controlled variables for Month 13. They are:
Billboard: 520
Careerbuilder: 800
Company Intranet: 0
Diversity Job Fair: 1000
For this model and these known values, what is the predicted value for Social Network funding for Month 13?
5. Compare the performance of the model above with a model that just controls for the funding of Billboard. In other words, create a multiple regression predicting Social Network funding from Months 1-12, while controlling for the funding of Billboard. Compare the Regression Statistics. Which model does a better job of explaining the variance in Social Network funding?
Group of answer choices
The model that just controls for Billboard
The model that controls for Billboard, Careerbuilder, Company Intranet, and Diversity Job Fair
second row
1.
xi   yi
1    3
2    7
3    5
4    11
5    14
Which of the following scatter diagrams accurately represents the data above?
Group of answer choices
None of the diagrams
Diagram 1
Diagram 2
Diagram 3
2. Find the slope (b1) for the regression equation for the following values. Round to 3 decimal places.
Define Variables
xi   yi
33   180
25   170
50   200
65   155
57   160
27   165
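A slope of this kind can be computed from the least-squares formula and cross-checked against lm(). The sketch below takes the six (xi, yi) pairs of this question as (33, 180), (25, 170), (50, 200), (65, 155), (57, 160), (27, 165); the variable names are illustrative.

```r
x <- c(33, 25, 50, 65, 57, 27)
y <- c(180, 170, 200, 155, 160, 165)

# Least-squares slope: b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Cross-check against lm(); the second coefficient is the slope.
b1_lm <- unname(coef(lm(y ~ x))[2])
round(c(by_hand = b1, from_lm = b1_lm), 3)
```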
3. Try to approximate the relationship between x and y by drawing a straight line through the data. Which of the following scatter diagrams accurately represents the data?
Group of answer choices
Diagram 1
Diagram 2
Diagram 3
None of the diagrams
4. Find the intercept (b0) for the regression equation for the following values. Round to 3 decimal places.
Attention: The numbers may be different from the previous question.
Intercept
xi   yi
33   180
25   170
50   200
65   192
57   160
27   165
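The intercept follows from the slope, because a least-squares line passes through the point of means: b0 = ybar - b1 * xbar. A minimal sketch, taking this question's pairs as (33, 180), (25, 170), (50, 200), (65, 192), (57, 160), (27, 165), with illustrative variable names:

```r
x <- c(33, 25, 50, 65, 57, 27)
y <- c(180, 170, 200, 192, 160, 165)

b1 <- cov(x, y) / var(x)        # slope, equivalent to Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)    # intercept: the line passes through (xbar, ybar)

# Cross-check against lm(); the first coefficient is the intercept.
b0_lm <- unname(coef(lm(y ~ x))[1])
round(c(by_hand = b0, from_lm = b0_lm), 3)
```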
5. Flight bookings on the Orbitz travel site fluctuate throughout the year. In the month of December, the Orbitz team knows that bookings increase throughout the month. The team is trying to predict number of bookings in a given day throughout the month. They have sampled data from a few dates out of last December and would like to predict the upcoming December bookings. The data are below.
In this scenario’s regression equation, x is __________.
Orbitz Travel
Dates in December   Bookings in the thousands
1                   3
4                   7
7                   4
10                  5
13                  6
16                  5
19                  8
22                  10
25                  13
27                  14
30                  16
Group of answer choices
Date in December
Date in January
16 thousand
Bookings in the thousands
6. Flight bookings on the Orbitz travel site fluctuate throughout the year. In the month of December, the Orbitz team knows that bookings increase throughout the month. The team is trying to predict number of bookings in a given day throughout the month. They have sampled data from a few dates out of last December and would like to predict the upcoming December bookings. The data are below.
Orbitz Travel Site
Dates in December   Bookings in the thousands
1                   3
4                   7
7                   4
10                  5
13                  6
16                  5
19                  8
22                  10
25                  13
27                  14
30                  16
If the intercept for the above data is 1.82 and the slope is 0.41, what is the complete regression equation?
Group of answer choices
x = .41y + 1.82
y = .41x + 1.82
y = 1.82(.41) + x
y = 1.82x + .41
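The stated intercept and slope can be reproduced by fitting a simple regression to the sampled dates. The sketch takes the data as the (date, bookings) pairs (1, 3), (4, 7), (7, 4), (10, 5), (13, 6), (16, 5), (19, 8), (22, 10), (25, 13), (27, 14), (30, 16); the variable names day and bookings are illustrative.

```r
day      <- c(1, 4, 7, 10, 13, 16, 19, 22, 25, 27, 30)   # x: date in December
bookings <- c(3, 7, 4, 5, 6, 5, 8, 10, 13, 14, 16)       # y: bookings (thousands)

fit <- lm(bookings ~ day)
round(coef(fit), 2)   # intercept first, then the slope on day
```

In the output of coef(), the intercept is the constant term of the equation and the value labeled day is the coefficient that multiplies x.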
7. A casino is interested in the relationship between the amount of alcohol people buy and the amount of money they spend at the slot machines. They have sampled 10 people and measured the amount of alcohol they drank and how much they spend that day. Based on the regression line, they would like to predict how much money people will spend based on the amount that they drink. The data are below.
Alcohol by the Ounce
Ounces purchased   Dollars spent on slots
.5                 35
1                  64
2                  100
1                  73
2.5                110
3                  150
1.5                130
5                  300
3                  130
2.5                105
If the regression line is:
y = 50.87x + 7.78,
How much money do we predict that a person who drinks 2.94 ounces of alcohol spends? Round to 2 decimal places. Do not put a dollar sign in your answer.
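A prediction from a fitted line is a direct substitution of x into the equation. The sketch below uses the intercept and slope exactly as stated in the question; the variable names are illustrative.

```r
b0 <- 7.78     # intercept from the stated regression line
b1 <- 50.87    # slope from the stated regression line

ounces <- 2.94
predicted_dollars <- b0 + b1 * ounces   # y = 50.87x + 7.78 evaluated at x = 2.94
round(predicted_dollars, 2)
```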
8. An apartment management company wants to explore the consequences of allowing residents to have multiple dogs. They would like to find out whether the number of dogs predicts resident ratings. They would also like to control for the year the apartment complex was built because they know that might also affect the resident rating. They have collected data on several of their existing complexes. For each complex, they have counted the number of dogs currently living in the complex, the year the complex was built, and the average rating for that particular complex. They would like to perform a multiple regression on these variables to predict resident ratings. See data below:
Multiple Dog Consideration
Number of dogs   Year of facility   Rating (out of 5)
54               1975               2
31               1964               3.5
0                2015               4.8
11               2011               3.8
73               1964               2.3
23               2016               3.7
0                2015               4.7
49               1989               2.7
In this scenario, the y (dependent variable) is:
Group of answer choices
Rating (out of 5)
Number of Dogs
Year of Facility
Number of Residents
9. An apartment management company wants to explore the consequences of allowing residents to have multiple dogs. They would like to find out whether the number of dogs predicts resident ratings. They would also like to control for the year the apartment complex was built because that might also affect the resident rating. They have collected data on several of their existing complexes. For each complex, they have counted the number of dogs currently living in the complex, the year the complex was built, and the average rating for that particular complex. They would like to perform a multiple regression on these variables to predict resident ratings. See data below. These data are the same as the previous question.
Multiple Dog Consequences
Number of dogs   Year of facility   Rating (out of 5)
54               1975               2
31               1964               3.5
0                2015               4.8
11               2011               3.8
73               1964               2.3
23               2016               3.7
0                2015               4.7
49               1989               2.7
What is the b coefficient for Number of Dogs? Round to 3 decimal places.
10. An apartment management company wants to explore the consequences of allowing residents to have multiple dogs. They would like to find out whether the number of dogs predicts resident ratings. They would also like to control for the year the apartment complex was built because that might also affect the resident rating. They have collected data on several of their existing complexes. For each complex, they have counted the number of dogs currently living in the complex, the year the complex was built, and the average rating for that particular complex. They would like to perform a multiple regression on these variables to predict resident ratings. See data below. These data are the same as the previous question.
Consequences of Multiple Dogs
Number of dogs   Year of facility   Rating (out of 5)
54               1975               2
31               1964               3.5
0                2015               4.8
11               2011               3.8
73               1964               2.3
23               2016               3.7
0                2015               4.7
49               1989               2.7
What is the p-value for Number of Dogs? Round to 3 decimal places.
11. An apartment management company wants to explore the consequences of allowing residents to have multiple dogs. They would like to find out whether the number of dogs predicts resident ratings. They would also like to control for the year the apartment complex was built because that might also affect the resident rating. They have collected data on several of their existing complexes. For each complex, they have counted the number of dogs currently living in the complex, the year the complex was built, and the average rating for that particular complex. They would like to perform a multiple regression on these variables to predict resident ratings. See data below. These data are the same as the previous question.
Number of Dogs and Ratings
Number of dogs   Year of facility   Rating (out of 5)
54               1975               2
31               1964               3.5
0                2015               4.8
11               2011               3.8
73               1964               2.3
23               2016               3.7
0                2015               4.7
49               1989               2.7
Fill in the blanks using the dropdown menus.
The effect of Number of Dogs has a p-value [ Select ] [“more than”, “equal to”, “less than”] .05, which means it is [ Select ] [“significant”, “not significant”] . It is [ Select ] [“likely”, “unlikely”] that the effect of Number of Dogs on Resident Ratings is due to chance (i.e. not a real effect).
The effect of Year of Facility has a p-value [ Select ] [“less than”, “equal to”, “more than”] .05, which means it is [ Select ] [“not significant”, “significant”] . It is [ Select ] [“unlikely”, “likely”] that the effect of Year of Facility on Resident Ratings is due to chance (i.e. not a real effect).
12. An apartment management company wants to explore the consequences of allowing residents to have multiple dogs. They would like to find out whether the number of dogs predicts resident ratings. They would also like to control for the year the apartment complex was built because that might also affect the resident rating. They have collected data on several of their existing complexes. For each complex, they have counted the number of dogs currently living in the complex, the year the complex was built, and the average rating for that particular complex. They would like to perform a multiple regression on these variables to predict resident ratings. See data below. These data are the same as the previous question.
Multiple Dog Ratings
Number of dogs   Year of facility   Rating (out of 5)
54               1975               2
31               1964               3.5
0                2015               4.8
11               2011               3.8
73               1964               2.3
23               2016               3.7
0                2015               4.7
49               1989               2.7
What is the R Square value for this model? Round to 3 decimal places.
13. An apartment management company wants to explore the consequences of allowing residents to have multiple dogs. They would like to find out whether the number of dogs predicts resident ratings. They would also like to control for the year the apartment complex was built because that might also affect the resident rating. They have collected data on several of their existing complexes. For each complex, they have counted the number of dogs currently living in the complex, the year the complex was built, and the average rating for that particular complex. They would like to perform a multiple regression on these variables to predict resident ratings. See data below. These data are the same as the previous question.
Positive and Negative Dog Ratings
Number of dogs   Year of facility   Rating (out of 5)
54               1975               2
31               1964               3.5
0                2015               4.8
11               2011               3.8
73               1964               2.3
23               2016               3.7
0                2015               4.7
49               1989               2.7
What can we conclude about this multiple regression analysis? Fill in the blanks.
There is a [ Select ] [“negative”, “positive”] effect of Number of Dogs on Resident Ratings. There is a [ Select ] [“positive”, “negative”] effect of Year of Facility on Resident Ratings. As Number of Dogs increases, Resident Ratings [ Select ] [“decrease”, “stay the same”, “increase”] . Therefore, we recommend that the management [ Select ] [“does not make any changes to dog limits”, “allows more dogs”, “allows fewer dogs”] in their future complexes.
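The multiple regression behind the apartment questions can be sketched in R as follows. The eight rows are taken as (dogs, year, rating) = (54, 1975, 2), (31, 1964, 3.5), (0, 2015, 4.8), (11, 2011, 3.8), (73, 1964, 2.3), (23, 2016, 3.7), (0, 2015, 4.7), (49, 1989, 2.7). The data frame name complexes and its column names are hypothetical, and spreadsheet software may lay out the same statistics differently.

```r
complexes <- data.frame(
  dogs   = c(54, 31, 0, 11, 73, 23, 0, 49),
  year   = c(1975, 1964, 2015, 2011, 1964, 2016, 2015, 1989),
  rating = c(2, 3.5, 4.8, 3.8, 2.3, 3.7, 4.7, 2.7)
)

# Rating is the dependent variable; dogs and year are the predictors.
fit <- lm(rating ~ dogs + year, data = complexes)
fit_summary <- summary(fit)

fit_summary$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
fit_summary$r.squared      # R Square for the model

# Pull out the pieces the questions ask about.
b_dogs <- fit_summary$coefficients["dogs", "Estimate"]
p_dogs <- fit_summary$coefficients["dogs", "Pr(>|t|)"]
round(c(b_dogs = b_dogs, p_dogs = p_dogs, r2 = fit_summary$r.squared), 3)
```

The sign of each estimate gives the direction of the effect, and comparing each p-value with .05 gives the significance call used in the fill-in-the-blank questions.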