Please work on all the questions and provide the rationale or show the work when necessary.
Q2. Please fill in each blank by selecting the answer from the Dropdown:
a. When you have 1 million observations and there are only 2 predictors, it is usually better to use [ Select ] [“flexible”, “inflexible”] statistical learning.
b. When the variance of the error term is very large, it is usually better to use [ Select ] [“flexible”, “inflexible”] statistical learning.
c. When the relationship between the DV and IVs is linear, it is usually better to use [ Select ] [“flexible”, “inflexible”] statistical learning.
Q3. Select True/False for each of the following:
a. A fitted value at an observation point for a linear regression model is a linear combination of the observed response values. [ Select ] [“True”, “False”]
b. In a simple linear regression, the least squares regression line may not go through the point (x̄, ȳ). [ Select ] [“True”, “False”]
c. For a simple linear regression, R^2 is the squared correlation between the DV and the IV. [ Select ] [“True”, “False”]
d. Bootstrap is a resampling method with replacement. [ Select ] [“False”, “True”]
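The statements in Q3 (a) to (c) can be checked numerically. The sketch below is only an illustration on made-up data (the x and y values are hypothetical and not part of the quiz). It fits a simple linear regression by least squares, evaluates the fitted line at the sample mean of the predictor, compares R^2 with the squared sample correlation, and rebuilds each fitted value as a weighted sum of the observed responses.

```python
# Illustrative data (hypothetical, not taken from the quiz).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the sample means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates for the line y = b0 + b1 * x.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Value of the fitted line at x_bar, to compare with y_bar.
value_at_mean = b0 + b1 * x_bar

# R^2 computed from residuals, and the squared sample correlation.
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r_squared = 1 - rss / s_yy
corr_squared = s_xy ** 2 / (s_xx * s_yy)

# Weight of y_j in the i-th fitted value: 1/n + (x_i - x_bar)(x_j - x_bar)/s_xx.
weights = [[1 / n + (xi - x_bar) * (xj - x_bar) / s_xx for xj in x] for xi in x]
fitted_from_weights = [sum(w * yj for w, yj in zip(row, y)) for row in weights]

print(value_at_mean, y_bar)
print(r_squared, corr_squared)
```

The weights depend only on the predictor values, which is why the fitted values are linear in the observed responses.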
Q4. Let be the original sample. Suppose that we obtain a bootstrap sample from this original sample with n observations.
(i) the probability that the 2nd bootstrap observation is is___________
(ii) the probability that the 3rd bootstrap observation is is___________
(iii) the probability that is not in the bootstrap sample is___________
(iv) the probability that is in the bootstrap sample is___________
(v) the probability that the bootstrap sample is (all bootstrap observations are ) is___________
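The symbols in Q4 were lost, so the sketch below makes an assumption: each blank refers to one fixed observation of the original sample. Under sampling with replacement, every bootstrap draw equals that observation with probability 1/n, independently of the other draws, so the observation is absent from the whole bootstrap sample with probability (1 - 1/n)^n. The function names and the choice n = 10 are illustrative only.

```python
import random


def prob_not_in_bootstrap(n):
    """Probability that one fixed observation is absent from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n


def simulate_not_in_bootstrap(n, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    target = 0  # index of the tracked observation (arbitrary choice)
    misses = 0
    for _ in range(trials):
        # One bootstrap sample: n draws with replacement from indices 0..n-1.
        draws = [rng.randrange(n) for _ in range(n)]
        if target not in draws:
            misses += 1
    return misses / trials


n = 10
print("exact:    ", prob_not_in_bootstrap(n))
print("simulated:", simulate_not_in_bootstrap(n))
```

The probability that the observation does appear is one minus this value, and the probability that all n draws equal that same observation is (1/n)^n.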
Q5. Suppose that you wish to invest a fixed sum of money in two financial assets that yield returns of X and Y, respectively, where X and Y are random quantities. You invest a fraction of your money in X and the remainder in Y. In general, you would like to [ Select ] [“minimize”, “maximize”] the expected return. Since there is variability associated with the returns on these two assets, you may also need to [ Select ] [“maximize”, “minimize”] the variance of the investment.
Q6. Identify the predictor variable and the response variable in each of the following situations:
(a) A training director wishes to study the relationship between the duration of training for new recruits and their performance in a skilled job.
Predictor variable:
Response variable:
(b) A market analyst wishes to relate the expenditures incurred in promoting a product in test markets to the subsequent amount of product sales.
Predictor variable:
Response variable:
(c) The aim of a study is to relate the carbon monoxide level in blood samples from smokers with the average number of cigarettes they smoke per day.
Predictor variable:
Response variable:
Q7. Suppose you have a simple linear regression model as below:
where is a normal random variable with mean 0 and standard deviation 2.
(a) Identify the values of the parameters , and in the statistical model:
= _________
= _________
= _________
(b) What is the expected value of Y when X = 5?
Q8. Which of the following scenarios is NOT a classification problem?
( ) We are considering launching a new product and wish to know the required marketing budget to generate the expected amount of sales, based on 20 similar products previously launched.
( ) We want to predict whether an email is spam and should be delivered to the Junk folder.
( ) We want to identify the handwritten single-digit number from an image.
( ) We are considering launching a new product and wish to know whether it will be a success or a failure, based on 20 similar products previously launched.
Q9. Identify the sample size n and the number of predictors p in the following scenario:
We conducted a survey, to which 286 participants responded, to understand how burnout is related to gender, age, education level, fatigue, income, family status, amount of exercise, and health condition.
Sample size n = _______
# of predictors p = ___________
Q10. Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and obtain the estimated coefficients β̂0 = −5, β̂1 = 0.06, β̂2 = 0.95.
(a) Estimate the probability that a student who studies for 20 hours and has an undergrad GPA of 4.0 gets an A in the class. (keep four decimal places)
the probability = ________
(b) How many hours would a student with an undergrad GPA of 4.0 need to study to have a 90% chance of getting an A in the class? (keep two decimal places)
______ hours would be needed.
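The arithmetic for Q10 can be sketched as follows. This is a minimal illustration, and it assumes that the three estimates map, in order, to the intercept, the coefficient of X1 (hours studied) and the coefficient of X2 (undergrad GPA). The function names are hypothetical.

```python
import math

# Assumed mapping of the estimates to terms: intercept, hours studied, undergrad GPA.
b0, b1, b2 = -5.0, 0.06, 0.95


def prob_a(hours, gpa):
    """Estimated probability of an A from the logistic model."""
    z = b0 + b1 * hours + b2 * gpa  # linear predictor (log-odds)
    return 1 / (1 + math.exp(-z))


def hours_needed(target_prob, gpa):
    """Hours of study at which the estimated probability equals target_prob."""
    log_odds = math.log(target_prob / (1 - target_prob))
    # Solve log_odds = b0 + b1 * hours + b2 * gpa for hours.
    return (log_odds - b0 - b2 * gpa) / b1


print(round(prob_a(20, 4.0), 4))
print(round(hours_needed(0.90, 4.0), 2))
```

Part (b) inverts the model: the target probability is converted to log-odds first, and the resulting linear equation is then solved for the hours.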
Q11. Match the curves below:
Orange Curve B: [ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]
Blue Curve D: [ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]
Red Curve A: [ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]
Purple Curve C: [ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]
Q12. Below is a table of outputs from running a linear regression:
Based on the outputs, answer the following questions by filling the blanks.
(a) What is the estimate of coefficient for the predictor “radio”?
(b) How do you interpret the estimate in (a)?
(c) Which predictor is not significant when the other predictors are included in the model?
Q13. Below is the output from running a linear regression model:
What % of variation from the response variable is explained by this regression model?
____________
Q14. Below is partial output from running a linear regression model:
To improve the model, we may take away one of the predictors from the model. Which predictor should be removed to improve the model? _____________
Q15. From the boxplots below:
what can you conclude? (choose the best answer)
( ) Both “Balance” and “Income” impact “Default” significantly
( ) Can’t tell
( ) “Balance” does not impact “Default” significantly
( ) Neither “Balance” nor “Income” impacts “Default” significantly
( ) “Balance” impacts “Default” significantly
Q16. Below is the confusion matrix from a classification model:
                   Predicted positive    Predicted negative
Actual positive    70                    2
Actual negative    8                     20
(a) What is the overall accuracy of the prediction (in %, with two decimal places)?
(b) What is the overall error rate (in %, with two decimal places)?
(c) What is the specificity (in %, with two decimal places)?
(d) What is the sensitivity (in %, with two decimal places)?
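The four measures in Q16 follow directly from the cell counts. The sketch below uses the counts from the confusion matrix above and the usual definitions, with "positive" as the class of interest. The variable names are illustrative.

```python
# Cell counts from the confusion matrix (rows = actual, columns = predicted).
tp, fn = 70, 2   # actual positive: predicted positive, predicted negative
fp, tn = 8, 20   # actual negative: predicted positive, predicted negative

total = tp + fn + fp + tn

accuracy = (tp + tn) / total      # share of all cases classified correctly
error_rate = (fp + fn) / total    # share of all cases misclassified
sensitivity = tp / (tp + fn)      # true positive rate
specificity = tn / (tn + fp)      # true negative rate

for name, value in [
    ("accuracy", accuracy),
    ("error rate", error_rate),
    ("specificity", specificity),
    ("sensitivity", sensitivity),
]:
    print(f"{name}: {100 * value:.2f}%")
```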
Q17. When you run multiple Logistic Regression models, which of the following is not a good measure for model selection/assessment?
( ) Sensitivity
( ) Accuracy
( ) Specificity
( ) Error rate
( ) Split percentage for training and testing
Q18. Suppose we have a data set with five predictors,
X1 = GPA,
X2 = IQ,
X3 = Gender (1 for Female and 0 for Male),
X4 = Interaction between GPA and IQ, and
X5 = Interaction between GPA and Gender.
The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model,
(keep one decimal place)
(a) Predict the salary of a female with IQ of 120 and a GPA of 4.0 _______________
(b) Predict the salary of a male with IQ of 120 and a GPA of 4.0 ________________