Variable Creation Assignment
Instructions:
The assignment deliverable is to submit your completed Excel Workbook with the Variable Feature Creation Analysis tasks completed as a SINGLE EXCEL FILE.
Use the provided “Assignment Submission Form – Variable Feature Creation” document to organize your output and answers.
Continue working on the same “Housing Market Dataset – Post Midterm” workbook from the “Relationship Testing” Lab and HW. We will continue to build on this workbook for all Lab, Homework, and Final Project objectives. If you are not using the Post Midterm file, please begin now. You can copy your Relationship Significance Testing sheets into the Post Midterm workbook.
Save your work as you go and ultimately make sure you save it in your cloud storage environment (Microsoft OneDrive or Sharepoint, or GoogleDrive). My computer crashed is not a valid excuse for late or incomplete work.
Check off each of the tasks as you complete them. It is expected that the student has attended/or watched and attempted the lecture tutorials in order to understand how to complete the assignment tasks. Save and rename your Excel workbook as “LastName Firstname – Variable Feature Creation Lab and Homework.”
Submit your Excel workbook AND your assignment submission form to the “Assignments” folder in D2L (Desire 2 Learn).
**Points will be deducted for failure to follow all instructions, regardless of how trivial you think they are.
Please ensure that:
You are only submitting one Excel workbook with all tasks completed with the instructed worksheet tabs.
You have named your submission according to the instruction above.
Your worksheets are organized and easy to follow.
Task 1: Quantitative Binning (Refer to the Variable Creation Lecture PPT, Manual, Workbooks and Videos for help)
Instructions: Create a new worksheet called “Quant Binning” and copy the Housing ID, MasVnrArea, TotalBsmtSize, Lot Area, and Sale Price Variables into the worksheet. Perform all analysis tasks for Task 1 in this worksheet.
Often times, when our quantitative variables do not have a very strong relationship with the target, we can create a new variable to look at the information in a new way. We can also use binning and discretizing techniques to prepare data from classification models (not taught in this course), and to smooth the influence of skewness and outliers. One option we have is creating a categorical variable by binning the values of a quantitative variable into groups or categories. After we create the variable, we need to analyze its relationship with the target and if there is a relationship, we need to either rank code it or dummy code it depending on the relationship we see.
**We will only cover integer (whole) number encoding in this course, but there are more advanced ways we can assign weights and influence to the discrete categories we create.
CREATE: Let’s bin the values for TotalBsmtSize(Sqft), LotArea(Sqft), and MasVnrArea(Sqft). I must see your “If” statements for credit, do not copy and paste the data as values after creating or you will receive no credit.
Create a new ordinal categorical variable called “Basement Size” by binning the values for “TotalBsmtSize” according to the definitions below. Copy and paste your nested IF statement formula into the Assignment Submission form.
0 sqft= “None”
1-500 sqft = “Small”
501 – 1000 sqft = “Medium”
1001 – 1500 sqft = “Large”
1501+ sqft = “Extra Large”
Create a new ordinal categorical variable called “Lot Size” by binning the values for “LotArea” according to the definitions below. Copy and paste your nested IF statement formula into the Assignment Submission form.
0 – 4999 = “Extra Small”
5000 – 9999 = “Small”
10000 – 14999 = “Medium”
15000 – 19999 = “Large”
20,000+ sqft = “Extra Large”
Create a new ordinal categorical variable called “Masonry Veneer Discrete” by binning the values for “MasVnrArea” according to the definitions below. Notice how they are coded as discrete numbers. Copy and paste your nested IF statement formula into the Assignment Submission form.
0 sqft= 0
1-200 sqft = 1
201 – 400 sqft = 2
401 – 600 sqft = 3
601+ = 4
ASSESS: Look at the bivariate statistics for each creation. Does there appear to be a relationship between Basement Size and Sale Price, and between Lot Size and Sale Price? Compare means, standard deviations, and category sizes. Are there any categories with low frequencies that we should be concerned about being underrepresented?
Create a pivot table (using the Pivot Table tool in the “Insert” tab in Excel). Select your newly created “Basement Size” variable and the original “Sale Price” variable for your “Table/Range” and place the table in the Existing Worksheet.Add “Basement Size” to the Rows box.
Add “Sale Price” to the Values box and change the value field settings to calculate “Average of Sale Price”, instead of “Sum of Sale Price”
Add “Sale Price” to the Values box and change the value field settings to calculate “StdDev of Sale Price”, instead of “Sum of Sale Price”
Add “Sale Price” to the Values box and change the value field settings to calculate “Count of Sale Price”, instead of “Sum of Sale Price”
Make sure your categories are in order and change “Row Labels” to “Basement Size”.
Copy and paste your pivot table as a picture into the Assignment Submission form and assess whether you feel a relationship may be present and why.
Excel Workbook Reference:
Create a pivot table (using the Pivot Table tool in the “Insert” tab is Excel). Select your newly created “Lot Size” variable and the original “Sale Price” variable for your “Table/Range” and place the table in the Existing Worksheet.Add “Lot Size” to the Rows box.
Add “Sale Price” to the Values and change value field settings the calculated “Average of Sale Price”, instead of “Sum of Sale Price”.
Add “Sale Price” to the Values box again and change value field settings the calculated “StdDev of Sale Price”, instead of “Sum of Sale Price”.
Add “Sale Price” to the Values box again and change value field settings the calculated “Count of Sale Price”, instead of “Sum of Sale Price”.
Make sure your categories are in order and change “Row Labels” to “Lot Size”.
Copy and paste your pivot table as a picture into the Assignment Submission form and assess whether you feel a relationship may be present and why.
Using the correlation function in Excel (=CORREL(variable 1 range, variable 2 range)), calculate a correlation value for the new discretized version of masonry veneer area and sale price. Compare it with the correlation value in the table below. Did discretizing the variable improve the correlation?
Task 2: Quantitative Aggregations (Refer to the Variable Creation Lecture PPT, Manual, Workbooks and Videos for help)
Instructions: Create a new worksheet called “Quant Create Agg” and copy the Housing ID, Outdoor Living Variable (WoodDeck, OpenPorch, EnclosedPorch, 3SsnPor, ScreenPorch, and PoolArea), the year variables (YearRemod and YrSold),.and Sale Price Variable into the worksheet. Perform all analysis tasks for Task 2 in this worksheet.
CREATE: Let try some mathematical aggregations of variables to see if we can create new variables that have a stronger relationship with the target than each variable individually. I must see your formulas in Excel for credit, do not copy and paste the data as values after creating or you will receive no credit.
Create a new variable called “Total Outdoor Living Area” by adding (=sum()) together all the outdoor living area variables (WoodDeck, OpenPorch, EnclosedPorch, 3SsnPor, ScreenPorch, and PoolArea)
Create a new variable called “Average Outdoor Living Area” by averaging (=average()) all the outdoor living area variables (WoodDeck, OpenPorch, EnclosedPorch, 3SsnPor, ScreenPorch, and PoolArea)
Create a new variable called “Year Since Remodel” by subtracting the YearRemod from YrSold variable.
ASSESS: Now we have to determine the best version of the variables. We can only use each variable 1 time, so we can either use the original variable as it was given, or an aggregated/created formulation.
Create a correlation matrix with all the original variables, all the created variables, and the target variable. Copy and paste the correlation matrix as a picture in your Assignment Submission form.
Excel Workbook Reference:
Provide is brief write up of which version is the best version and why.
Place this order or similar order and get an amazing discount. USE Discount code “GET20” for 20% discount