1. Drills with R on K-NN models
This problem relates to the nearest neighbours classifiers described in Section 9.5 of “Modern Statistics with R” (https://modernstatisticswithr.com). Fit a kNN classification model to the wine data, using pH, alcohol, fixed.acidity, and residual.sugar as explanatory variables. Evaluate its performance using 10-fold cross-validation, using AUC to choose the best k.
To solve the problem, you’ll need to load the data and libraries with:
# Import data about white and red wines:
white <- read.csv("https://tinyurl.com/winedata1",sep = ";")
red <- read.csv("https://tinyurl.com/winedata2",sep = ";")
# Add a type variable:
white$type <- "white"
red$type <- "red"
# Merge the datasets:
wine <- rbind(white, red)
wine$type <- factor(wine$type)
# Install (only needed once) and load caret for model fitting:
install.packages('caret', dependencies = TRUE)
library(caret)
# Install (only needed once) and load MLeval to visualize the results:
install.packages('MLeval', dependencies = TRUE)
library(MLeval)
For the submission:
1. Provide the commands in plain text that you used to solve the problem.
Attach the figure produced by the command: plots$roc
Provide the output of the command: plots$optres[[1]][13,]
Attach the figure produced by the command: plots$cc
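One possible way to approach the task is sketched below. It assumes that caret and MLeval are attached and that `wine` has been built as in the setup code. The function name `fit_knn_cv`, the seed, the `tuneLength` of 15, and the centring and scaling step are illustrative choices, not requirements of the assignment. The name `plots` for the result of `evalm()` is inferred from the submission commands.

```r
# Sketch: kNN classifier tuned by 10-fold cross-validated AUC.
# Assumes library(caret) has been run and `data` has a two-level factor `type`.
fit_knn_cv <- function(data, seed = 1, tune_length = 15) {
  set.seed(seed)  # hypothetical seed, only to make the folds reproducible
  tc <- trainControl(
    method = "cv", number = 10,          # 10-fold cross-validation
    summaryFunction = twoClassSummary,   # reports ROC (AUC), Sens, Spec
    classProbs = TRUE,                   # class probabilities are needed for AUC
    savePredictions = TRUE               # evalm() needs the held-out predictions
  )
  train(
    type ~ pH + alcohol + fixed.acidity + residual.sugar,
    data = data,
    method = "knn",
    preProcess = c("center", "scale"),   # kNN is distance-based, so scale the inputs
    trControl = tc,
    metric = "ROC",                      # choose k by AUC
    tuneLength = tune_length             # number of candidate k values (assumption)
  )
}

# Usage, once `wine`, caret and MLeval are loaded:
# m <- fit_knn_cv(wine)
# m$bestTune          # the k with the highest cross-validated AUC
# plots <- evalm(m)   # then plots$roc, plots$optres[[1]][13,], plots$cc
```

Wrapping the fit in a function keeps the cross-validation settings in one place, so the same call can be repeated with a different seed or tuning grid.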
2. Dissimilarities between data objects
This project demonstrates how to measure similarities between data objects. These topics are mostly described in Chapter 6, “Statistical Machine Learning”, of ‘Practical Statistics for Data Scientists’. Cover the following in the project:
Find some data examples and show examples of calculating:
Euclidean distance
L1 distance
Prove or disprove that Euclidean and L1 distance satisfy:
Positivity: d(x,y) >= 0 for all x and y, and d(x,y) == 0 only if x == y.
Symmetry: d(x,y) == d(y,x) for all x and y.
Triangle inequality: d(x,z) <= d(x,y) + d(y,z) for all points x, y, and z.
Explain why it is or is not possible to rearrange data so that Euclidean distance gives the same meaning as Hamming distance.
Prove or disprove that the measure d = 1 - cos(x,y) satisfies positivity, symmetry, and the triangle inequality.
Draw conclusions about what is important when choosing the distance measure for the evaluation of dissimilarities between data objects.
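The distance calculations and properties listed above can be spot-checked numerically in base R. The vectors below are made-up illustrative values, and the helper names (`euclid`, `l1`, `cos_dist`) are hypothetical. Checks like these can suggest or refute a property on specific points, but they do not replace a proof.

```r
# Illustrative vectors (made-up values)
x <- c(1, 2, 3)
y <- c(4, 6, 3)
z <- c(0, 0, 0)

euclid   <- function(a, b) sqrt(sum((a - b)^2))   # Euclidean (L2) distance
l1       <- function(a, b) sum(abs(a - b))        # L1 (Manhattan) distance
cos_dist <- function(a, b) 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))

euclid(x, y)   # 5
l1(x, y)       # 7

# The same two distances via the built-in dist()
dist(rbind(x, y), method = "euclidean")
dist(rbind(x, y), method = "manhattan")

# Symmetry and triangle inequality on these points (spot check only)
sym_ok <- euclid(x, y) == euclid(y, x) && l1(x, y) == l1(y, x)
tri_ok <- euclid(x, z) <= euclid(x, y) + euclid(y, z) &&
          l1(x, z)     <= l1(x, y)     + l1(y, z)

# Hamming vs Euclidean on 0/1-encoded data:
# the squared Euclidean distance equals the number of mismatching positions
a <- c(1, 0, 1, 1, 0)
b <- c(0, 0, 1, 0, 1)
hamming   <- sum(a != b)      # 3
sq_euclid <- sum((a - b)^2)   # 3

# 1 - cos(x, y): test the triangle inequality on three specific points
p <- c(1, 0); q <- c(1, 1); r <- c(0, 1)
cos_tri_ok <- cos_dist(p, r) <= cos_dist(p, q) + cos_dist(q, r)   # FALSE here
cos_dist(p, 2 * p)            # 0, although p and 2 * p are different points
```

On these points, 1 - cos(x,y) violates the triangle inequality and gives a distance of zero for two distinct points, which is worth weighing when choosing it as a dissimilarity measure.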
Assignments 1 and 2 are to be submitted as two separate papers in APA format.