This repository contains the following scripts:
- Dataset Creation Part 1: Retrieving datasets from Google Analytics and performing initial web scraping using Beautiful Soup.
- Dataset Creation Part 2: Conducting sentiment analysis using the syuzhet package.
- Preprocessing: Handling missing data with iterative web scraping, processing multinomial variables, and plotting for exploratory data analysis (EDA).
- I implemented three different models:
  - Poisson regression: Using a negative binomial distribution to manage overdispersion.
  - Bayesian regression: Using a negative binomial distribution to manage overdispersion.
  - Neural networks: Not informative about individual predictors, but with higher predictive power.
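
The overdispersion mentioned above can be checked roughly by comparing the variance of the counts to their mean: under a Poisson model the two are equal, so a ratio well above 1 suggests that a negative binomial is a better fit. The following is a minimal sketch of that check, not the code used in the thesis; the function name `dispersion_ratio` and the example counts are hypothetical, and the check ignores covariates.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion ratio is undefined")
    return variance(counts) / m


# Hypothetical counts, invented for illustration only (not thesis data).
example_counts = [0, 1, 1, 2, 3, 5, 8, 13, 40, 120]

ratio = dispersion_ratio(example_counts)
# A ratio well above 1 points to overdispersion relative to a Poisson model.
overdispersed = ratio > 1
print(f"dispersion ratio: {ratio:.2f}, overdispersed: {overdispersed}")
```

A ratio computed this way is only a first indication; a model-based check on the fitted residuals would be more reliable.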
To cite this thesis in publications, use:
@mastersthesis{l.ripoll2024,
author = {Luisa Ripoll},
title = {Advanced Predictive Models for the Young Readership of `La Razón' Newspaper},
school = {Universidad Carlos III de Madrid},
year = {2024},
url = {https://github.com/luisarip/masters-thesis/}
}