*Please complete submission form for Relationship Testing Workbook. Submission form and completed workbook attached.
Data Transformation Assignment
Instructions:
The assignment deliverable is to submit your completed Excel Workbook with the Variable Feature Creation Analysis tasks completed as a SINGLE EXCEL FILE.
Use the provided “Assignment Submission Form – Data Transformation” document to organize your output and answers.
Continue working on the same “Housing Market Dataset – Post Midterm” workbook from the “Variable Creation” Lab and HW. We will continue to build on this workbook for all Lab, Homework, and Final Project objectives. If you are not using the Post Midterm file, please begin now. You can copy your Relationship Significance Testing sheets into the Post Midterm workbook.
Save your work as you go and ultimately make sure you save it in your cloud storage environment (Microsoft OneDrive or Sharepoint, or GoogleDrive). My computer crashed is not a valid excuse for late or incomplete work.
Check off each of the tasks as you complete them. It is expected that the student has attended/or watched and attempted the lecture tutorials in order to understand how to complete the assignment tasks. Save and rename your Excel workbook as “LastName Firstname – Data Transformation Lab and Homework.”
Submit your Excel workbook AND your assignment submission form to the “Assignments” folder in D2L (Desire 2 Learn).
**Points will be deducted for failure to follow all instructions, regardless of how trivial you think they are.
Please ensure that:
You are only submitting one Excel workbook with all tasks completed with the instructed worksheet tabs.
You have named your submission according to the instruction above.
Your worksheets are organized and easy to follow.
Task 1: Quantitative Transformation (Refer to the Data Transformation Lecture Manual and Videos for Help)
Instructions: Perform all analysis in a newly created worksheet “Quant Transform Data”
READ THE BACKGROUND INFO FIRST
The variables Living Area, Masonry Veneer Area, and Sale Price all have either slightly or significantly skewed distributions. Refer to your data diagnosis and univariate analysis to confirm the skew. Perform 2 transformations for each of the 3 variables. Remember root transformations for right skewed data and power transformations for left skewed data. Paste your excel formulas in the assignment submission form.
Create a simple histogram for each newly transformed variable and the original. You do not need to format these perfectly, just make sure there is enough information to tell which transformation it is. Paste your simple histograms in the assignment submission form.
Calculate the mean and median of each variable transformation and format it in a table with borders and shading, rounding, etc. You can just use the =median() and =average() functions for a quick and easy method. Paste your table in the assignment submission form.
Create a single correlation matrix with all the quantitative variables expect ID (ID… it’s not a quantitative variable to begin with, just an identifier) and format it neatly (round your decimals, add a title, and some borders, etc..). Paste your table in the assignment submission form.
Provide a brief write up of the changes in distribution shape. Are they becoming more symmetric? Are the means and medians getting closer together?
Provide a brief write up of the changes in the correlation values. As we move through the transformations, are the relationships getting stronger or weaker with each transformation?
Task 2: Categorical Transformation (Preparation (Refer to the Data Transformation Lecture Manual and Videos for Help)
Instructions: Perform all analysis in a newly created worksheet “Cat Transform Data”
READ THE BACKGROUND INFO FIRST
REDUCE: Combine all underrepresented categories for House Style, Overall Quality, and Kitchen Quality with the most logical category for the variable. Paste your if statements in the assignment submission form.
House Style (2.5 Story is underrepresented)
Kitchen Quality (Poor is underrepresented)
Overall Quality (Very Poor, Poor, Fair, and Very Excellent are underrepresented)
CHECK: Create simple pivot tables for your newly reduced variable to check that the combinations were successful. These do not need to be professional formatted. This is just a sanity check. Paste your simple pivot tables in the assignment submission form.
NUMERICALLY ENCODE: Recode the newly reduced categorical variables and the foundation variable as numbers. Remember the nominal variables are always dummy coded and ordinal variables can be rank coded if they show an increasing or decreasing relationship with the target. Paste your if statements in the assignment submission form.
House Style
Foundation
Kitchen Quality
Overall Quality
Place this order or similar order and get an amazing discount. USE Discount code “GET20” for 20% discount