R data set: biopsy, from the MASS package.
IMPORTANT NOTE: There is a function called select() in both MASS and dplyr (part of the tidyverse). To make sure that select() works as you intend (selecting columns from a tibble), load the MASS library BEFORE you load the tidyverse libraries: call library(MASS) first, then library(tidyverse).
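A minimal sketch of that load order is shown here. The variable name first_cols is only an illustration and is not part of the assignment.

```r
# Load MASS first, then the tidyverse, so that dplyr::select() masks
# MASS::select() rather than the other way round.
library(MASS)
library(tidyverse)

# With this load order, a bare select() resolves to dplyr::select().
first_cols <- biopsy %>%
  as_tibble() %>%
  select(V1, V2)

# Fully qualifying the call works regardless of load order.
same_cols <- biopsy %>%
  as_tibble() %>%
  dplyr::select(V1, V2)

head(first_cols)
```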
If you continue to have issues, use dplyr::select() instead of just typing select().
Review the dataset documentation and describe the dataset in your own words.
Prepare your data:
Save to a variable as a tibble.
Remove the ID variable.
Drop rows that have any NA values using the function drop_na().
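The three preparation steps above could be chained in a single pipeline. This is only one possible sketch; the name biopsy_tbl is an assumption chosen for illustration.

```r
library(MASS)
library(tidyverse)

biopsy_tbl <- biopsy %>%
  as_tibble() %>%          # save as a tibble
  dplyr::select(-ID) %>%   # remove the ID variable
  drop_na()                # drop rows that contain any NA value

# Quick check of the remaining columns and their types.
glimpse(biopsy_tbl)
```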
Split the data into a training set and a testing set, using 80% of the records for the training set and 20% for the testing set.
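One way to make the 80/20 split is to sample row indices at random. The sketch below assumes an arbitrary seed for reproducibility and uses the illustrative names biopsy_tbl, train_set, and test_set; other approaches (for example, the rsample package) would work as well.

```r
library(MASS)
library(tidyverse)

set.seed(123)  # arbitrary seed (an assumption), only so the split is reproducible

biopsy_tbl <- biopsy %>%
  as_tibble() %>%
  dplyr::select(-ID) %>%
  drop_na()

# Randomly pick 80% of the row indices for training.
n <- nrow(biopsy_tbl)
train_idx <- sample(n, size = floor(0.8 * n))

train_set <- biopsy_tbl[train_idx, ]   # 80% of the records
test_set  <- biopsy_tbl[-train_idx, ]  # remaining 20%

c(train = nrow(train_set), test = nrow(test_set))
```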
Use each of the following classification methods to predict class. Use all of the available predictor variables. For each method, report accuracy on the test set and print a confusion matrix.
Logistic regression (Note that you will need to recode the class variable as 1 or 0 instead of “benign” or “malignant.” Be sure not to include the original class variable in your model.)
k-NN (Use k = 5. Note that the predictors are all on the same 1 to 10 scale, so the data is ready to use for k-NN without further normalization.)
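The two methods above might be fitted and evaluated along the following lines. This is a hedged sketch, not a reference solution: the 0.5 probability cutoff, the coding of malignant as 1, the seed, and all variable names (malignant, logit_fit, knn_pred, and so on) are assumptions made for illustration. It uses glm() from base R and knn() from the class package.

```r
library(MASS)
library(tidyverse)

set.seed(123)  # arbitrary seed (an assumption)

biopsy_tbl <- biopsy %>%
  as_tibble() %>%
  dplyr::select(-ID) %>%
  drop_na() %>%
  # Assumed coding: 1 = malignant, 0 = benign.
  mutate(malignant = if_else(class == "malignant", 1, 0))

train_idx <- sample(nrow(biopsy_tbl), size = floor(0.8 * nrow(biopsy_tbl)))
train_set <- biopsy_tbl[train_idx, ]
test_set  <- biopsy_tbl[-train_idx, ]

# --- Logistic regression ---
# Drop the original class column so it cannot be used as a predictor.
logit_fit <- glm(malignant ~ .,
                 data = dplyr::select(train_set, -class),
                 family = binomial)

logit_prob <- predict(logit_fit, newdata = test_set, type = "response")
logit_pred <- if_else(logit_prob > 0.5, 1, 0)  # assumed 0.5 cutoff

logit_cm  <- table(predicted = logit_pred, actual = test_set$malignant)
logit_acc <- mean(logit_pred == test_set$malignant)

# --- k-NN with k = 5 ---
predictors <- paste0("V", 1:9)  # the nine predictor columns in biopsy

knn_pred <- class::knn(
  train = train_set[, predictors],
  test  = test_set[, predictors],
  cl    = train_set$class,
  k     = 5
)

knn_cm  <- table(predicted = knn_pred, actual = test_set$class)
knn_acc <- mean(knn_pred == test_set$class)

# Report the confusion matrices and test-set accuracies.
print(logit_cm)
print(logit_acc)
print(knn_cm)
print(knn_acc)
```

In each confusion matrix, rows are predictions and columns are the true labels, so the off-diagonal cells hold the false positives and false negatives.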
Briefly discuss the results. Which method performed best? Do you think false positives or false negatives matter more in this case?