Housing Data
Work individually on this assignment. You are encouraged to collaborate on ideas and strategies pertinent to this assignment. Data for this assignment is focused on real estate transactions recorded from 1964 to 2016 and can be found in Housing.xlsx. Using your skills in statistical correlation, multiple regression, and R programming, you are interested in the following variables: Sale Price and several other possible predictors.
If you worked with the Housing dataset in a previous week, you are in luck – you have likely already found any issues in the dataset and made the necessary transformations. If not, you will want to take some time looking at the data with all your new skills and identifying whether any cleanup needs to happen.
Complete the following:
Explain any transformations or modifications you made to the dataset.
Create a linear regression model where “sq_ft_lot” predicts Sale Price.
Get a summary of your first model and explain your results (e.g., R², adjusted R², etc.).
Get the residuals of your model (you can use the ‘resid’ or ‘residuals’ functions) and plot them. What does the plot tell you about your predictions?
Use a qq plot to observe your residuals. Do your residuals meet the normality assumption?
Now, create a linear regression model that uses multiple predictor variables to predict Sale Price (feel free to derive new predictors from existing ones). Explain why you think each of these variables may add explanatory value to the model.
Get a summary of your next model and explain your results.
Get the residuals of your second model (you can use the ‘resid’ or ‘residuals’ functions) and plot them. What does the plot tell you about your predictions?
Use a qq plot to observe your residuals. Do your residuals meet the normality assumption?
Compare the results (e.g., R², adjusted R², etc.) between your first and second models. Does your new model show an improvement over the first? To confirm a ‘significant’ improvement between the second and first model, use ANOVA to compare them. What are the results?
After observing both models (specifically, residual normality), provide your thoughts concerning whether the model is biased or not.
Another important aspect of regression tasks is determining the accuracy of your predictions. For this section, we will look at root mean square error (RMSE), a common accuracy metric for regression models.
Install the ‘Metrics’ package in RStudio.
Using the first model, we will make predictions on the dataset using the predict function. An example would look like this (will vary for you based on variable names): ‘preds <- predict(object = modelName, newdata = dataset)’. Use the ‘rmse’ function to get RMSE for the model (‘rmse(actual, predicted)’).
What is the RMSE for the first model?
Perform the same task for the second model. Provide the RMSE for the second model. Did the second model’s RMSE improve upon the first model? By how much?
Submission Instructions
For all assignments in this course, you must export the script or Markdown file to PDF. All submissions must include a PDF that includes your code and output. You are welcome to include your script or a link to GitHub or another external repo, but you must also include a PDF at a minimum. No zip files are accepted either.
Answer: Upload RMarkdown file & PDF
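A rough sketch of the whole workflow follows. To keep it self-contained it simulates a stand-in data frame; the column names (sale_price, sq_ft_lot, bedrooms, year_built) are assumptions for illustration – substitute your real Housing variables. The rmse helper computes the same quantity as Metrics::rmse(actual, predicted).

```r
# install.packages("Metrics")   # the assignment's package; the helper below matches it
set.seed(1)
# Simulated stand-in for the Housing data (column names are assumptions)
housing <- data.frame(
  sq_ft_lot  = runif(200, 2000, 20000),
  bedrooms   = sample(1:5, 200, replace = TRUE),
  year_built = sample(1964:2016, 200, replace = TRUE)
)
housing$sale_price <- 50000 + 12 * housing$sq_ft_lot +
  20000 * housing$bedrooms + rnorm(200, sd = 30000)

# First model: lot size alone
m1 <- lm(sale_price ~ sq_ft_lot, data = housing)
summary(m1)                          # R-squared, adjusted R-squared, coefficients

# Residual diagnostics
plot(resid(m1), main = "Residuals: model 1")   # look for patterns
qqnorm(resid(m1)); qqline(resid(m1))           # normality check

# Second model with additional (here, invented) predictors
m2 <- lm(sale_price ~ sq_ft_lot + bedrooms + year_built, data = housing)
summary(m2)
anova(m1, m2)                        # is the improvement significant?

# RMSE; equivalent to Metrics::rmse(actual, predicted)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
preds1 <- predict(m1, newdata = housing)
preds2 <- predict(m2, newdata = housing)
rmse(housing$sale_price, preds1)
rmse(housing$sale_price, preds2)
```

Because the second model nests the first, its in-sample R² and RMSE can only match or improve on the first; the ANOVA F-test tells you whether that improvement is significant.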

Posted in R


In the MinnLand dataset in the alr4 package, fit two possible models with log(acrePrice) or sqrt(acrePrice) as the response (y) variable and your choice of independent (x) variables (make sure there are no NAs or missing data for your variable of choice). Be sure to explain why you chose the variables that you did (and the help file has great descriptions of the data) and talk a bit about the outcomes of the different methods. Use methods we’ve learned to develop those two possible candidate models and compare them using:
5 Fold Cross Validation
10 Fold Cross Validation
Random Splitting with 1000 splits.
Note: The emphasis is on producing and interpreting the two models. The salarygov example is helpful for k-Fold cross validation. You may use, but aren’t required to use, parallel computing.
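The comparison loop can be sketched as below. To stay self-contained the sketch uses a small simulated stand-in for MinnLand (the predictors tillable and crpPct are real MinnLand column names, but the data here are invented); with the alr4 package installed you would instead run library(alr4); data(MinnLand) and drop rows with NAs.

```r
set.seed(42)
# Simulated stand-in for MinnLand so the sketch runs on its own
d <- data.frame(tillable = runif(500, 0, 100),
                crpPct   = runif(500, 0, 50))
d$acrePrice <- exp(6 + 0.02 * d$tillable - 0.01 * d$crpPct +
                   rnorm(500, sd = 0.3))

# k-fold CV: average squared prediction error on the original price scale
cv_mse <- function(k, formula, back_transform) {
  folds <- sample(rep(1:k, length.out = nrow(d)))
  errs <- sapply(1:k, function(i) {
    fit  <- lm(formula, data = d[folds != i, ])
    pred <- back_transform(predict(fit, newdata = d[folds == i, ]))
    mean((d$acrePrice[folds == i] - pred)^2)
  })
  mean(errs)
}

cv_mse(5,  log(acrePrice)  ~ tillable + crpPct, exp)              # 5-fold, log model
cv_mse(10, log(acrePrice)  ~ tillable + crpPct, exp)              # 10-fold, log model
cv_mse(5,  sqrt(acrePrice) ~ tillable + crpPct, function(p) p^2)  # sqrt model

# Random splitting: repeat a 90/10 train/test split (1000 times in the assignment)
rand_split_mse <- function(n_splits, formula, back_transform) {
  mean(replicate(n_splits, {
    test <- sample(nrow(d), nrow(d) %/% 10)
    fit  <- lm(formula, data = d[-test, ])
    pred <- back_transform(predict(fit, newdata = d[test, ]))
    mean((d$acrePrice[test] - pred)^2)
  }))
}
rand_split_mse(100, log(acrePrice) ~ tillable + crpPct, exp)
```

Back-transforming the predictions (exp for the log model, squaring for the sqrt model) puts both candidates' errors on the same dollar scale, which is what makes them comparable.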

Posted in R

This project has to be done using ‘R’. Two files for the report: the .Rmd and the PDF. The PDF has to come from the knit that R produces. Also, you have to share the data in the Excel file (.csv). Attached you will find examples of different projects.
I also need the notebook with the code from RStudio and the dataset as Excel (.csv).
Project: NBA season spanning from 2019 to 2022
Data set description:
We will collect and analyze data for the NBA season spanning from 2019 to 2022 to explore factors influencing team performance in terms of regular season wins. The dataset will include information on various variables for each NBA team during this period. Here are the variables for the proposed dataset:
Team: The NBA team’s name.
Year: The year of the NBA season (2019, 2020, 2021, 2022).
Games Played (G): The total number of games played by the team in the regular season.
Points Scored (PTS): The total number of points scored by the team in the regular season.
Points Allowed (PTA): The total number of points the team’s defense allows in the regular season.
Wins (W): The total number of regular-season wins achieved by the team.
Losses (L): The total number of regular-season losses.
Winning Percentage (WP): The ratio of wins to games played, calculated as W/G.
Field Goal Percentage (FG%): The team’s field goal shooting percentage.
Three-Point Percentage (3P%): The team’s three-point shooting percentage.
Free Throw Percentage (FT%): The team’s free throw shooting percentage.
Assists (AST): The total number of assists made by the team.
Rebounds (REB): The total number of rebounds collected by the team.
Turnovers (TOV): The total number of turnovers committed by the team.
Steals (STL): The total number of steals made by the team.
Blocks (BLK): The total number of blocks recorded by the team.
Playoffs (binary): A binary variable indicating whether the team made it to the playoffs (1) or not (0).
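A hedged sketch of what the modeling and diagnostics pieces could look like, using a small simulated stand-in for the collected NBA data (the column names follow the variable list above, but every value and coefficient is invented for illustration):

```r
set.seed(7)
# Simulated stand-in for the collected NBA data (values invented)
nba <- data.frame(
  FGp = runif(120, 0.43, 0.50),   # FG%
  TPp = runif(120, 0.33, 0.40),   # 3P%
  AST = rnorm(120, 25, 2),
  TOV = rnorm(120, 14, 2),
  REB = rnorm(120, 44, 3),
  STL = rnorm(120, 7.5, 1),
  BLK = rnorm(120, 5, 1)
)
nba$W <- pmin(82, pmax(0, round(-200 + 400 * nba$FGp + 100 * nba$TPp +
                                nba$AST - nba$TOV + rnorm(120, sd = 5))))

# A parsimonious candidate: 7 predictors, under the 8-variable cap
fit <- lm(W ~ FGp + TPp + AST + TOV + REB + STL + BLK, data = nba)
summary(fit)

# Diagnostics for the report
plot(fitted(fit), resid(fit), main = "Residuals vs fitted"); abline(h = 0)
qqnorm(resid(fit)); qqline(resid(fit))
```

Note that W, L, and WP are deterministic functions of one another, so a parsimonious model should use efficiency and box-score variables as predictors rather than mixing the win-derived columns together.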
Grading
Rubric for Report
Abstract (approximately 1/4 page)
Introduction (including motivation) section
Analysis – Descriptive analysis of the data or after the regression modeling. Examples – Correlation matrix, scatter plots, bar charts, outlier testing, supplemental modeling.
Modeling – Specify a parsimonious model that contains no more than 8 predictor variables
Bullet 1
Bullet 2
Diagnostics – Provide residual plot(s) and plot(s) discussing normality/linear regression assumptions.
Conclusion section
Neatness (no credit will be given for reports with R code or data printed
inside the report)
Group member assessment

Posted in R

Using either the same dataset(s) you used in the previous weeks’ exercise or a brand-new dataset of your choosing, perform the following transformations (Remember, anything you learn about the Housing dataset in these two weeks can be used for a later exercise!)
Using the dplyr package, use the 6 different operations to analyze/transform the data – group_by, summarize, mutate, filter, select, and arrange. Remember, this isn’t just modifying data; you are also learning about your data – so play around and start to understand your dataset in more detail.
Using the purrr package, perform 2 functions on your dataset. You could use zip_n, keep, discard, compact, etc.
Use the cbind and rbind functions on your dataset.
Split a string, then concatenate the results back together.
If done well, I will hire you for a long-term project!
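The requested operations can be sketched as follows on a tiny stand-in data frame (the columns city, price, and beds are invented for illustration – swap in your own dataset):

```r
library(dplyr)
library(purrr)

# Tiny stand-in data frame; replace with your own dataset
df <- data.frame(city  = c("Seattle, WA", "Bellevue, WA", "Portland, OR"),
                 price = c(450000, 620000, 380000),
                 beds  = c(3, 4, 2),
                 stringsAsFactors = FALSE)

# The six dplyr operations
df %>%
  mutate(price_per_bed = price / beds) %>%
  filter(price > 400000) %>%
  select(city, price, price_per_bed) %>%
  arrange(desc(price))
df %>% group_by(beds) %>% summarize(avg_price = mean(price))

# Two purrr functions: keep the numeric columns, then discard them
keep(df, is.numeric)
discard(df, is.numeric)

# cbind and rbind
df2 <- cbind(df, year = c(2019, 2020, 2021))
rbind(df2, df2[1, ])

# Split a string, then concatenate the results back together
parts <- strsplit(df$city[1], ", ")[[1]]
paste(parts, collapse = " - ")
```

Each verb returns a new data frame rather than modifying df in place, which is why chaining them with %>% works cleanly.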

Posted in R

Homework Assignment: Exploring Gapminder Data with Tidyverse
Objective: To practice data manipulation and exploration using the tidyverse package in R by working with the gapminder dataset.
Instructions:
Dataset: Use the gapminder dataset from the gapminder package. You can load the dataset using the following code:
library(gapminder)
data("gapminder")
Tasks:
Data Exploration (5 points): Begin by exploring the gapminder dataset to understand its structure and content. Use the head() and summary() functions to get a sense of the data. Explain briefly what the dataset contains.
Data Filtering (15 points): Perform the following filtering operations and provide the resulting data for each:
a. Filter the dataset to include only rows where the year is 2007.
b. Filter the dataset to include only rows where the country is either “United States” or “Canada” and the year is 2002 or 2007.
c. Filter the dataset to include only rows where the population is greater than 1 billion.
Data Selection (10 points): Perform the following selection operations on the original dataset (not the dataset from step 2) and provide the resulting data for each:
a. Select only the columns: “country,” “year,” “lifeExp,” and “pop.”
b. Select all columns except “gdpPercap.”
Data Distinctness (10 points): Find and display a list of unique countries present in the original dataset. Explain how you ensured that the list contains only unique values.
Data Arrangement (10 points): Arrange the original dataset in descending order of “gdpPercap” and then in ascending order of “lifeExp.” Provide the resulting datasets as data frames.
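The filtering, selection, distinctness, and arrangement tasks map onto dplyr verbs roughly as below (this assumes the gapminder and dplyr packages are installed; variable names come from the gapminder dataset itself):

```r
library(dplyr)
library(gapminder)   # provides the gapminder data frame

# a. Only rows from 2007
gm_2007 <- filter(gapminder, year == 2007)

# b. United States or Canada, in 2002 or 2007
gm_na <- filter(gapminder,
                country %in% c("United States", "Canada"),
                year %in% c(2002, 2007))

# c. Population over 1 billion
gm_big <- filter(gapminder, pop > 1e9)

# Selection on the original dataset
select(gapminder, country, year, lifeExp, pop)
select(gapminder, -gdpPercap)

# Unique countries: distinct() keeps one row per country value
distinct(gapminder, country)

# Arrangement
arrange(gapminder, desc(gdpPercap))   # descending gdpPercap
arrange(gapminder, lifeExp)           # ascending lifeExp
```

distinct() guarantees uniqueness because it drops every duplicate row of the selected column, which is how you can justify that the country list contains only unique values.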
Submission Guidelines:
Prepare an R script that includes the necessary code to complete the tasks.
Include comments in your code to explain your approach.
Create a report (in PDF or Word format) summarizing your findings and observations for each task.
Submit both the R script and the report in a zip file as your homework assignment.
Grading Rubric:
Each task will be graded out of the maximum points mentioned.
Correctness of the code and output will be evaluated.
Explanation and clarity in the report will be considered for grading.
Please complete the assignment and submit it as per the instructions. Good luck with your homework!

Posted in R

Hi there,
I have a 2-part assignment in R, part A and part B.
It should be easy for an expert.
The most important thing is that the work must be 100% original, no AI.
All the instructions are in the files; please follow them carefully.
I can provide any file that is mentioned in the assignment.
Thank you

Posted in R

Finishing all questions using RStudio and making a Word document with questions and answers.

---
title: "Introduction to Data Analytics 1"
author: "Enter Your Name"
date: "`r Sys.Date()`"
output: word_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Part 1: Variables, Hypothesis, Designs
*Title:* Offshore outsourcing: Its advantages, disadvantages, and effect on the American economy
*Abstract*: The United States has trained some of the world’s best computer programmers and technology experts. Despite all of this training, many businesses do not have a full understanding of information technology. As the importance of technology in the business world grows, many companies are wasting money on extensive technology projects. When problems arise, they expect that further investment will solve these issues. To prevent such problems, many companies have begun to outsource these functions in an effort to reduce costs and improve performance. The majority of these outsourced information technology and call center jobs are going to low-wage countries, such as India and China, where English-speaking college graduates are being hired at substantially lower wages. The purpose of this study is to evaluate the positive and negative aspects of offshore outsourcing with a focus on the outsourcing markets in India and China, arguably the two most popular destinations for outsourcers. The cost savings associated with offshore outsourcing will be evaluated in relation to the security risks and other weaknesses of offshore outsourcing. In addition, an analysis of the number of jobs sent overseas versus the number of jobs created in the United States will be used to assess the effects that outsourcing is having on the American economy and job market. Finally, the value of jobs lost from the American economy will be compared to the value of jobs created. The goal of these analyses is to create a clear picture of this increasingly popular business strategy.
Answer the following questions about the abstract above:
1) What is a potential hypothesis of the researchers?
2) What is one of the independent variables?
a. What type of variable is the independent variable?
3) What is one of the dependent variables?
a. What type of variable is the dependent variable?
4) What might cause some measurement error in this experiment?
5) What type of research design is the experiment?
a. Why?
6) How might you measure the reliability of your dependent variable?
7) Is this study ecologically valid?
8) Can this study claim cause and effect?
a. Why/why not?
9) What type of data collection did the researchers use (please note that #5 is a different question)?
# Part 2: Use the assessment scores dataset (03_lab.csv) to answer these questions.
The provided dataset includes the following information created to match the abstract:
– Jobs: the percent of outsourced jobs for a call center.
– Cost: one calculation of the cost savings for the business.
– Cost2: a separate way to calculate cost savings for the business.
– ID: an ID number for each business.
– Where: where the jobs were outsourced to.
> 03_data <- read.csv("~/Downloads/03_data.csv", header=FALSE)

Calculate the following information:

```{r}
data <- read.csv("~/Downloads/03_data.csv", header = FALSE)
colnames(data) <- c("Jobs", "Cost1", "Cost2", "ID", "Where")
```

1) Create a frequency table of the percent of outsourced jobs.

```{r}
jobs_frequency <- table(data$Jobs)
print(jobs_frequency)
```

Posted in R