Description
PROJECT RULES
– Choose a dataset from “kaggle.com” and provide the link to the original data
– Choose either classification or regression (and pick your evaluation metric)
– ETL a dataset that is 1000×5 (rows × columns) or larger
– Input features should have categorical and numerical columns
– Apply data wrangling and EDA if needed
– Split the dataset 80/20 into training and testing sets with a fixed seed
– Apply data preprocessing and, optionally, feature selection and engineering
– Showcase a baseline model (linear or logistic regression) on the testing set
– Pick two models to train on your dataset
– Use grid search for hyper-parameter tuning
– Use k-fold cross-validation to compare the models
– Use the best hyper-parameter values to train the two models on the entire training set
– Report the test scores
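The rules above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the attached program: the data is a synthetic stand-in for a Kaggle CSV, the column names (`age`, `income`, `hours`, `city`, `plan`, `target`) are hypothetical, the task is assumed to be classification scored by accuracy, and the two models and their grid values are arbitrary choices made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # fixed seed so the split and the models are reproducible

# Assumption: synthetic placeholder data with hypothetical columns.
# In the real project this would be pd.read_csv(<your Kaggle CSV>).
rng = np.random.default_rng(SEED)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "income": rng.normal(50_000, 15_000, n),
    "hours": rng.uniform(0, 60, n),
    "city": rng.choice(["A", "B", "C"], n),
    "plan": rng.choice(["basic", "pro"], n),
})
df["target"] = ((df["income"] > 50_000) ^ (df["plan"] == "pro")).astype(int)

# 80/20 train-test split with a fixed seed
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)

# Preprocessing: scale numerical columns, one-hot encode categorical ones
num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = X.select_dtypes(exclude="number").columns.tolist()

def make_pipeline(model):
    """Wrap a model with the shared preprocessing step."""
    pre = ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])
    return Pipeline([("pre", pre), ("model", model)])

# Baseline: logistic regression, scored on the testing set
baseline = make_pipeline(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Two candidate models, each with a small (illustrative) hyper-parameter grid
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=SEED),
        {"model__max_depth": [3, 5, None]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=SEED),
        {"model__n_estimators": [50, 100], "model__max_depth": [5, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Grid search scored by 5-fold cross-validation on the training set only
    search = GridSearchCV(make_pipeline(model), grid, cv=5, scoring="accuracy")
    # refit=True (the default) retrains the best setting on the entire training set
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_score": search.best_score_,
        "test_score": accuracy_score(y_test, search.predict(X_test)),
    }

print(f"baseline test accuracy: {baseline_acc:.3f}")
for name, r in results.items():
    print(name, r["best_params"], f"cv={r['cv_score']:.3f}", f"test={r['test_score']:.3f}")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fitted only on the training folds during cross-validation, so no information from the held-out fold or the test set leaks into tuning. For a regression task, the same structure would apply with a linear regression baseline, regressor models, and a regression metric in place of accuracy.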
Additional source:
The attached file contains a program that you can use as a reference, or you can mimic it using the new dataset that you chose.
Submission:
1. The link to the data on “kaggle.com”
2. The CSV file for the data
3. The Python project, written as an .ipynb notebook