Contributors: Amara Im & Christina Low
View our presentation at https://tinyurl.com/cse351project
View the code on Google Colab: https://tinyurl.com/cse351projectcode
The World Happiness Report is a landmark survey of the state of global happiness that ranks countries by how happy their citizens perceive themselves to be. The report continues to gain global recognition as governments, organizations, and civil society increasingly use happiness indicators to inform their policy-making decisions. This project uses the report's data to gain insight into the state of happiness in the world today.
The "World Happiness Report" dataset on Kaggle contains happiness data for different countries from 2015 to 2019. We treat the 2015 to 2018 data as the training set and the 2019 data as the test set. Descriptions of the data fields can be found on the FAQ page of the World Happiness Report at https://worldhappiness.report/faq/
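The year-based split described above can be sketched as follows. This is a minimal illustration on made-up data, not the project's actual code: the column names `Country`, `Year`, and `Score` are assumptions, and the values are invented.

```python
import pandas as pd

# Tiny made-up frame standing in for the merged 2015-2019 data.
# The column names "Country", "Year" and "Score" are assumptions.
df = pd.DataFrame({
    "Country": ["A", "B"] * 5,
    "Year": [2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018, 2019, 2019],
    "Score": [7.1, 5.2, 7.0, 5.3, 7.2, 5.1, 7.3, 5.4, 7.4, 5.5],
})

# Train on 2015-2018 (inclusive), hold out 2019 as the test set.
train = df[df["Year"].between(2015, 2018)]
test = df[df["Year"] == 2019]
```

Splitting by year rather than at random keeps the test set strictly later in time than the training set, so the evaluation reflects predicting a future year.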
- Pandas - Data Analysis
- NumPy - Scientific Computing
- Matplotlib - Data Visualization
- Seaborn - Statistical Visualization in Matplotlib
- scikit-learn - Machine Learning
- XGBoost - Gradient Boosting
- Merge and clean the data.
- What are the central tendencies of happiness score over the years? Did they increase or decrease?
- Which countries have stable rankings over the years? Which countries improved their rankings?
- Visualize the relationship between happiness score and other features such as GDP, social support, freedom, etc.
- If you were the president of a country, what would you do to make citizens happier?
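The first three tasks above (merging, central tendencies, ranking stability) might be approached along these lines. This is a hedged sketch on invented data: the per-year frames, the `rename_map`, and the column names are assumptions made for illustration, and real yearly files would be loaded with `pd.read_csv` and may need more renaming than shown here.

```python
import pandas as pd

# Made-up per-year frames whose score column is named differently,
# to illustrate aligning the schema before merging.
frames = {
    2015: pd.DataFrame({"Country": ["A", "B", "C"],
                        "Happiness Score": [7.0, 6.0, 5.0]}),
    2016: pd.DataFrame({"Country": ["A", "B", "C"],
                        "Score": [7.2, 5.8, 6.1]}),
}
rename_map = {"Happiness Score": "Score"}  # hypothetical mapping

# Merge: unify column names, tag each row with its year, stack, drop gaps.
merged = pd.concat(
    [f.rename(columns=rename_map).assign(Year=year)
     for year, f in frames.items()],
    ignore_index=True,
).dropna(subset=["Score"])

# Central tendencies of the happiness score per year.
summary = merged.groupby("Year")["Score"].agg(["mean", "median"])

# Rank within each year (1 = happiest), then measure how far each
# country's rank moved; 0 means a perfectly stable ranking.
merged["Rank"] = (
    merged.groupby("Year")["Score"]
    .rank(ascending=False, method="min")
    .astype(int)
)
rank_spread = merged.groupby("Country")["Rank"].agg(lambda r: r.max() - r.min())
```

Here `rank_spread` is only one possible stability measure; the standard deviation of the rank over the years would serve the same purpose.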
- Linear Regression - finds the linear relationship between x (input) and y (output) and predicts the dependent variable (y) based on the independent variable (x).
- Random Forest - builds an ensemble of decision trees, each trained on a randomly drawn (bootstrap) sample of the training set, and combines the predictions of all trees, averaging them for regression.
- XGBoost - builds gradient-boosted regression trees by minimizing a regularized (L1 and L2) objective function that combines a convex loss function (based on the difference between the predicted and target outputs) with a penalty term for model complexity (that is, the complexity of the regression trees).
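To show how the three models above can be compared, here is a minimal sketch on synthetic data. The features, weights, and hyperparameters are invented for illustration and are not the project's settings. `XGBRegressor` follows the scikit-learn `fit`/`predict` interface, so it is added to the comparison only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for features such as GDP, social support, freedom.
rng = np.random.default_rng(0)
true_w = np.array([1.0, 0.8, 0.5])  # made-up weights


def make_y(X):
    """Linear target plus a little Gaussian noise."""
    return 2.0 + X @ true_w + rng.normal(0, 0.1, size=len(X))


X_train = rng.uniform(0, 2, size=(200, 3))
X_test = rng.uniform(0, 2, size=(50, 3))
y_train, y_test = make_y(X_train), make_y(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
try:
    from xgboost import XGBRegressor

    # reg_alpha and reg_lambda are the L1 and L2 regularization weights.
    models["XGBoost"] = XGBRegressor(
        n_estimators=100, reg_alpha=0.0, reg_lambda=1.0, random_state=0
    )
except ImportError:
    pass  # xgboost not installed; compare the other two models only

# Fit each model on the training data and record its test RMSE.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
```

Because the synthetic target is exactly linear, linear regression is expected to do well here; on the real data the ordering of the models may differ.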