Predict the price of a house to rent for a specific period of time.

LeviScoffie/AirBnb-House-Price-prediction

AirBnb-House-Price-prediction

Predict the price of a house to rent for a specific period of time. The tasks follow the ML Zoomcamp homework.

Question 1 Find a feature with missing values. How many missing values does it have?
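One way to approach Question 1 is to count nulls per column with pandas. The sketch below uses a small synthetic table rather than the real dataset; the column name `reviews_per_month` is a hypothetical placeholder, and only `minimum_nights` and `price` come from the task text.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the listings table.
# 'reviews_per_month' is a hypothetical column name used for illustration.
df = pd.DataFrame({
    "minimum_nights": [1, 3, 2, 30, 5],
    "reviews_per_month": [0.2, np.nan, 1.5, np.nan, 0.7],
    "price": [149, 225, 150, 89, 80],
})

# Count missing values per column, then keep only columns that have any.
missing_counts = df.isnull().sum()
features_with_missing = missing_counts[missing_counts > 0]
print(features_with_missing)
```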

Question 2 What's the median (50th percentile) for variable 'minimum_nights'?

Split the data

Shuffle the initial dataset, use seed 42. Split your data in train/val/test sets, with 60%/20%/20% distribution. Make sure that the target value ('price') is not in your dataframe. Apply the log transformation to the price variable using the np.log1p() function.
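The median and the split described above can be sketched as follows. This is a minimal example on synthetic data; the helper name `split_dataset` is hypothetical, and seeding through `np.random.seed` is an assumption about how the shuffle is done.

```python
import numpy as np
import pandas as pd

# Synthetic data; only the column names come from the task text.
rng = np.random.default_rng(0)
n = 10
df = pd.DataFrame({
    "minimum_nights": rng.integers(1, 30, size=n),
    "price": rng.integers(40, 400, size=n),
})

median_nights = df["minimum_nights"].median()

def split_dataset(df, seed=42):
    """Shuffle with the given seed and split 60%/20%/20% (hypothetical helper)."""
    n = len(df)
    n_val = int(n * 0.2)
    n_test = int(n * 0.2)
    n_train = n - n_val - n_test

    idx = np.arange(n)
    np.random.seed(seed)
    np.random.shuffle(idx)

    shuffled = df.iloc[idx].reset_index(drop=True)
    parts = [
        shuffled.iloc[:n_train].copy(),
        shuffled.iloc[n_train:n_train + n_val].copy(),
        shuffled.iloc[n_train + n_val:].copy(),
    ]
    # pop() removes 'price' from each frame; log1p is applied to the target.
    targets = [np.log1p(part.pop("price").to_numpy()) for part in parts]
    return parts, targets

(df_train, df_val, df_test), (y_train, y_val, y_test) = split_dataset(df)
print(median_nights, len(df_train), len(df_val), len(df_test))
```

Removing `price` with `pop` before training guards against accidentally using the target as a feature.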

Question 3 We need to deal with missing values for the column from Q1. We have two options: fill it with 0 or with the mean of this variable. Try both options. For each, train a linear regression model without regularization using the code from the lessons. For computing the mean, use the training set only! Use the validation dataset to evaluate the models and compare the RMSE of each option. Round the RMSE scores to 2 decimal digits using round(score, 2). Which option gives better RMSE?
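The comparison in Question 3 might look like the sketch below. It runs on synthetic data and uses a normal-equation solver written with `np.linalg.solve`; that is an assumption and not necessarily the exact code from the lessons. The function names are hypothetical.

```python
import numpy as np

def train_linear_regression(X, y):
    """Least squares via the normal equation, with a bias column prepended."""
    X = np.column_stack([np.ones(X.shape[0]), X])
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w[0], w[1:]

def rmse(y, y_pred):
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Synthetic data standing in for the real train/validation split.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(60, 2))
X_val = rng.normal(size=(20, 2))
true_w = np.array([0.5, -0.3])
y_train = 1.0 + X_train @ true_w + rng.normal(scale=0.1, size=60)
y_val = 1.0 + X_val @ true_w + rng.normal(scale=0.1, size=20)

# Knock out some values in column 1 to imitate the feature from Question 1.
X_train[::7, 1] = np.nan
X_val[::5, 1] = np.nan

# The mean comes from the training rows only, never from validation.
train_mean = np.nanmean(X_train[:, 1])

scores = {}
for name, fill_value in [("zero", 0.0), ("mean", train_mean)]:
    # Only column 1 contains NaN, so the fill affects that column alone.
    X_tr = np.where(np.isnan(X_train), fill_value, X_train)
    X_va = np.where(np.isnan(X_val), fill_value, X_val)
    w0, w = train_linear_regression(X_tr, y_train)
    scores[name] = round(rmse(y_val, w0 + X_va @ w), 2)

print(scores)
```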

Question 4 Now let's train a regularized linear regression. For this question, fill the NAs with 0. Try different values of r from this list: [0, 0.000001, 0.0001, 0.001, 0.01, 0.1, 1, 5, 10]. Use RMSE to evaluate the model on the validation dataset. Round the RMSE scores to 2 decimal digits. Which r gives the best RMSE?
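A minimal sketch of the search over r, assuming regularization is implemented by adding r to the diagonal of the Gram matrix (bias term included). The data is synthetic and the function names are hypothetical.

```python
import numpy as np

def train_linear_regression_reg(X, y, r=0.0):
    """Regularized least squares: r is added to the diagonal of X^T X."""
    X = np.column_stack([np.ones(X.shape[0]), X])
    XTX = X.T @ X + r * np.eye(X.shape[1])
    w = np.linalg.solve(XTX, X.T @ y)
    return w[0], w[1:]

def rmse(y, y_pred):
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Synthetic data standing in for the real train/validation split.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(60, 3))
X_val = rng.normal(size=(20, 3))
true_w = np.array([0.5, -0.3, 0.8])
y_train = 1.0 + X_train @ true_w + rng.normal(scale=0.1, size=60)
y_val = 1.0 + X_val @ true_w + rng.normal(scale=0.1, size=20)

# Introduce missing values, then fill them with 0 as the question asks.
X_train[::6, 2] = np.nan
X_val[::4, 2] = np.nan
X_train = np.nan_to_num(X_train, nan=0.0)
X_val = np.nan_to_num(X_val, nan=0.0)

r_values = [0, 0.000001, 0.0001, 0.001, 0.01, 0.1, 1, 5, 10]
scores = {}
for r in r_values:
    w0, w = train_linear_regression_reg(X_train, y_train, r=r)
    scores[r] = round(rmse(y_val, w0 + X_val @ w), 2)

# min() returns the first minimum, so ties resolve to the smallest r here.
best_r = min(scores, key=scores.get)
print(scores, best_r)
```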

Question 5 We used seed 42 for splitting the data. Let's find out how selecting the seed influences our score. Try different seed values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].

For each seed, do the train/validation/test split with 60%/20%/20% distribution. Fill the missing values with 0 and train a model without regularization. For each seed, evaluate the model on the validation dataset and collect the RMSE scores.

What's the standard deviation of all the scores? To compute the standard deviation, use np.std. Round the result to 3 decimal digits (round(std, 3)).

Note: Standard deviation shows how different the values are. If it's low, then all values are approximately the same. If it's high, the values are different. If standard deviation of scores is low, then our model is stable.
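The seed experiment can be sketched as a loop that repeats the split, training, and validation for each seed. The example below runs on synthetic data where `y` stands in for the already log-transformed price; `split_indices` and the other helpers are hypothetical names.

```python
import numpy as np

def split_indices(n, seed):
    """Return shuffled train/val/test index arrays with a 60/20/20 split."""
    n_val = int(n * 0.2)
    n_test = int(n * 0.2)
    n_train = n - n_val - n_test
    idx = np.arange(n)
    np.random.seed(seed)
    np.random.shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def train_linear_regression(X, y):
    X = np.column_stack([np.ones(X.shape[0]), X])
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w[0], w[1:]

def rmse(y, y_pred):
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Synthetic data with some missing values, filled with 0.
rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.2, size=n)
X[::9, 0] = np.nan
X = np.nan_to_num(X, nan=0.0)

scores = []
for seed in range(10):
    train_idx, val_idx, _ = split_indices(n, seed)
    w0, w = train_linear_regression(X[train_idx], y[train_idx])
    scores.append(rmse(y[val_idx], w0 + X[val_idx] @ w))

std = round(float(np.std(scores)), 3)
print(std)
```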

Question 6 Split the dataset like previously, use seed 9. Combine train and validation datasets. Fill the missing values with 0 and train a model with r=0.001. What's the RMSE on the test dataset?
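For the final step, the train and validation parts are merged before fitting, and the test part is used once for evaluation. A minimal sketch on synthetic data, with hypothetical helper names and the same diagonal-regularization assumption as in the sketch for Question 4:

```python
import numpy as np

def split_indices(n, seed):
    """Return shuffled train/val/test index arrays with a 60/20/20 split."""
    n_val = int(n * 0.2)
    n_test = int(n * 0.2)
    n_train = n - n_val - n_test
    idx = np.arange(n)
    np.random.seed(seed)
    np.random.shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def train_linear_regression_reg(X, y, r=0.0):
    X = np.column_stack([np.ones(X.shape[0]), X])
    XTX = X.T @ X + r * np.eye(X.shape[1])
    w = np.linalg.solve(XTX, X.T @ y)
    return w[0], w[1:]

def rmse(y, y_pred):
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Synthetic data; y stands in for the log-transformed price.
rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.2, size=n)
X[::9, 0] = np.nan
X = np.nan_to_num(X, nan=0.0)  # fill missing values with 0

train_idx, val_idx, test_idx = split_indices(n, seed=9)

# Combine train and validation rows into one training set.
full_train_idx = np.concatenate([train_idx, val_idx])
w0, w = train_linear_regression_reg(X[full_train_idx], y[full_train_idx], r=0.001)

test_rmse = rmse(y[test_idx], w0 + X[test_idx] @ w)
print(round(test_rmse, 2))
```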
