Car Sales. Consider the data on used cars (mlba::ToyotaCorolla) with 1436 records

Car Sales. Consider the data on used cars (mlba::ToyotaCorolla) with 1436 records and details on 38 variables, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications.
Use predictors Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar.
To ensure everyone gets the same results, use the following code to convert categorical predictors to dummies, create training and holdout data sets, and normalize the training set and holdout set. Note the holdout set is normalized by using the training set.
# load the data and preprocess
library(dplyr)   # provides %>% and mutate()
library(caret)   # provides createDataPartition() and preProcess()
toyota.df <- mlba::ToyotaCorolla %>%
  mutate(
    Fuel_Type_CNG = ifelse(Fuel_Type == "CNG", 1, 0),
    Fuel_Type_Diesel = ifelse(Fuel_Type == "Diesel", 1, 0)
  )

# partition
set.seed(1)
idx <- createDataPartition(toyota.df$Price, p=0.6, list=FALSE)
train.df <- toyota.df[idx, ]
holdout.df <- toyota.df[-idx, ]

# Normalize the dataset. Use the training set to determine the normalization.
normalizer <- preProcess(train.df, method="range")
train.norm.df <- predict(normalizer, train.df)
holdout.norm.df <- predict(normalizer, holdout.df)

Fit a neural network model to the data. Use a single hidden layer with two nodes. Record the RMS error for the training data and the holdout data. Repeat the process, changing the number of hidden layers and nodes to a single layer with 5 nodes, and then to two layers with 5 nodes in each layer. What happens to the RMS error for the training data as the number of layers and nodes increases? What happens to the RMS error for the holdout data? Comment on the appropriate number of layers and nodes for this application.
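One possible way to organize the three fits is sketched below. It is only a sketch under assumptions: it uses the neuralnet package (the post does not say which package is expected), the helper names rmse and compare_architectures are hypothetical, and the RMS error it reports is on the normalized Price scale because preProcess with method="range" rescales Price as well.

```r
# Root-mean-squared error between observed and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Fit one network per architecture and return training/holdout RMSE.
# Assumption: `train` and `holdout` are the normalized data frames
# (train.norm.df, holdout.norm.df) and the neuralnet package is installed.
compare_architectures <- function(train, holdout, predictors, response,
                                  architectures, seed = 1) {
  nn_formula <- reformulate(predictors, response = response)
  scores <- lapply(architectures, function(hidden) {
    set.seed(seed)  # neuralnet starts from random weights
    nn <- neuralnet::neuralnet(nn_formula, data = train,
                               hidden = hidden, linear.output = TRUE)
    c(train   = rmse(train[[response]],   predict(nn, train)[, 1]),
      holdout = rmse(holdout[[response]], predict(nn, holdout)[, 1]))
  })
  do.call(rbind, scores)  # one row per architecture
}

# Hypothetical usage with the objects created by the assignment code
# (Fuel_Type is represented by its two dummy columns):
# predictors <- c("Age_08_04", "KM", "Fuel_Type_CNG", "Fuel_Type_Diesel", "HP",
#                 "Automatic", "Doors", "Quarterly_Tax", "Mfr_Guarantee",
#                 "Guarantee_Period", "Airco", "Automatic_airco", "CD_Player",
#                 "Powered_Windows", "Sport_Model", "Tow_Bar")
# compare_architectures(train.norm.df, holdout.norm.df, predictors, "Price",
#                       list("1 layer, 2 nodes" = 2,
#                            "1 layer, 5 nodes" = 5,
#                            "2 layers, 5 nodes each" = c(5, 5)))
```

To express the errors in the original price units, the normalized RMSE can be multiplied by the range (max minus min) of Price in train.df. If neuralnet warns that the algorithm did not converge, the prediction step will fail; raising its stepmax argument is one thing to try.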

Posted in R

1) Download the Nutrition study data and read it into RStudio.

1) Download the Nutrition study data and read it into RStudio. We will work with the entire data set for this assignment. Use the ifelse() function to create 2 new categorical variables. The variables are:
Alcohol_Use: 1 (yes) if Alcohol > 0, 0 (no) if Alcohol = 0
Age_retired: 1 if Age >= 65, 0 if Age < 65
If you have trouble using the ifelse() function in R, you could create these new categorical variables in Excel and then read them into R with the dataset. It works either way. Report the counts for each value of these 2 new categorical variables.

2) For this problem, we are going to see if smoking (SMOKE) is related to body mass (QUETELET). Here, Quetelet is the continuous dependent response variable (Y) and Smoke (X) is the categorical explanatory variable. Please complete the following:
a) Obtain descriptive statistics on Y for each group. In a table, report each group's sample size, mean, standard deviation, and variance.
b) Clearly state the null and alternative hypotheses in words and symbols.
c) Use R to obtain the test statistic and p-value for the classic pooled-variance two-sample t-test. Report the test statistic and p-value, and then state the decision to be made.
d) Report the formula for the test statistic in part c) and verify the computer's computations using the descriptive statistics from part a).
e) Calculate and report confidence intervals for both groups. Discuss the interpretation of the result based on the confidence intervals. Is it consistent with the hypothesis test result? If they are different, which should you believe?

3) Moving into a more data analytic framework, the next question would be: are there any 2-group categorical variables that exhibit differences relative to the Quetelet variable? Reframing this as more of a direction for an assignment: using the variable Quetelet as the dependent response variable (Y), conduct hypothesis tests and obtain confidence intervals (for each group) to determine if there are group mean differences relative to the categorical variables:
Gender (male vs female)
Age_retired
Alcohol_Use
You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals. Please use tables to summarize your findings. What decisions do you make from these results? How would you summarize the "story" that emerges from these analyses on the Body Mass Quetelet variable?

4) Using the CHOLESTEROL variable as the dependent response variable (Y), conduct hypothesis tests and obtain confidence intervals (for each group) to determine if there are group mean differences relative to:
Gender (male vs female)
Smoke
Age_retired
Alcohol_Use
You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals. How would you summarize the "story" that emerges from these analyses on the CHOLESTEROL variable?

5) Typically, in an open-ended data analytic project, the analyst would look to see whether any of the potential response variables are related to the explanatory categorical variables of interest. To limit the amount of analytical work, for the FAT, FIBER, and ALCOHOL variables, use a 95% confidence interval approach to compare groups, on average, for:
Gender (male vs female)
Smoke
Age_retired
Alcohol_Use
Do NOT conduct or report on formal hypothesis tests! How would you summarize the "story" that emerges from these analyses?

6) Given what you've found so far comparing groups, what is surprising to you? What turned up that you did not expect, if anything? What would explain these results? What do you think should be the next steps in any analysis of this Nutrition data? Your write-up should address each task
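The mechanics of tasks 1) and 2) could look roughly like the sketch below. It runs on a synthetic stand-in data frame because the Nutrition file itself is not part of this post; the column names Age, Alcohol, Smoke, and Quetelet, the "No"/"Yes" coding of Smoke, and every value in the data frame are assumptions made purely for illustration.

```r
# Synthetic stand-in for the Nutrition data (assumed column names and values).
set.seed(42)
n <- 200
nutrition <- data.frame(
  Age      = sample(20:85, n, replace = TRUE),
  Alcohol  = ifelse(runif(n) < 0.3, 0, rexp(n, rate = 0.2)),
  Smoke    = sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.8, 0.2)),
  Quetelet = rnorm(n, mean = 26, sd = 5)
)

# 1) New categorical variables via ifelse(), then their counts.
nutrition$Alcohol_Use <- ifelse(nutrition$Alcohol > 0, 1, 0)
nutrition$Age_retired <- ifelse(nutrition$Age >= 65, 1, 0)
alcohol_counts <- table(nutrition$Alcohol_Use)
retired_counts <- table(nutrition$Age_retired)

# 2a) Descriptive statistics of Quetelet for each Smoke group.
groups <- split(nutrition$Quetelet, nutrition$Smoke)
desc <- data.frame(
  n    = sapply(groups, length),
  mean = sapply(groups, mean),
  sd   = sapply(groups, sd),
  var  = sapply(groups, var)
)

# 2c) Classic pooled-variance two-sample t-test.
tt <- t.test(Quetelet ~ Smoke, data = nutrition, var.equal = TRUE)

# 2d) Same statistic by hand: t = (mean1 - mean2) / sqrt(sp2 * (1/n1 + 1/n2)),
#     where sp2 is the pooled variance on n1 + n2 - 2 degrees of freedom.
sp2 <- sum((desc$n - 1) * desc$var) / (sum(desc$n) - 2)
t_manual <- (desc$mean[1] - desc$mean[2]) / sqrt(sp2 * sum(1 / desc$n))

# 2e) Separate 95% confidence interval for each group mean.
se    <- desc$sd / sqrt(desc$n)
tcrit <- qt(0.975, df = desc$n - 1)
desc$ci_lower <- desc$mean - tcrit * se
desc$ci_upper <- desc$mean + tcrit * se

print(alcohol_counts)
print(retired_counts)
print(desc)
print(c(t_from_t.test = unname(tt$statistic), t_by_hand = t_manual,
        p_value = tt$p.value))
```

With the real data, only the data frame construction would be replaced by reading the file; the same pattern can then be repeated for Gender, Age_retired, and Alcohol_Use as the grouping variable.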

Posted in R

For my part, I only need to do the income prediction based on the existing data

For my part, I only need to do the income prediction based on the existing data. Please give me the RMD code as well as a report, which should be a PDF file with screenshots of the code and its output, and give me an explanation based on it. I need to present this, so please include an explanation!
Only look at the case; it does not need to be read in full. Find my part, which is the income prediction.

Posted in R

Do the project and figure out the income prediction for 2004.

Do the project and figure out the income prediction for 2004. Give me the R code as well as a report with the results and an explanation. I will tip you if you are really careful.
First read the case and only do the income prediction part. Predict for 2004. Give me two files in the end: one RMD and one PDF.

Posted in R