Thesis MSc Marketing Analytics (MA): how online WOM volume and valence affect movie box office performance
This repository contains the code used in the analysis part of my master's thesis for the MSc Marketing Analytics at Tilburg University.
- src
  - analysis
  - data-preparation
- README.md
Python. Installation guide.
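For Python, make sure you have installed the following third-party libraries (the scripts also import csv, datetime, json and time, which ship with the Python standard library and need no installation):
- bs4 (distributed on PyPI as beautifulsoup4)
- pandas
- selenium
- webdriver_manager (distributed on PyPI as webdriver-manager)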
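
As a quick sanity check before running the scrapers, the following minimal sketch reports which of the modules listed above cannot be found in the current Python environment. The helper name `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Module names as listed above; csv, datetime, json and time ship with Python itself.
REQUIRED_MODULES = [
    "bs4", "csv", "datetime", "json",
    "pandas", "selenium", "time", "webdriver_manager",
]


def missing_modules(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Any name it reports still has to be installed before the data-preparation scripts can run.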
For R, make sure you have installed the following packages:
- car
- fixest
- janitor
- readr
- tidyverse
To generate the outputs used in the thesis, follow these instructions:
- Obtain the datasets used in this thesis. Datasets were provided by the supervisor of this thesis.
- Run src/data-preparation/01_imdb_scrape_actors.py to scrape a list of actors from IMDb for each relevant movie in the analysis.
- Run src/data-preparation/02_imdb_scrape_movie_info_per_actor.py to scrape, for each actor identified by src/data-preparation/01_imdb_scrape_actors.py, a list of the movies from before 2014 that the actor played in.
- Run src/data-preparation/03_filter_unique_movies.R to reduce the list scraped in the previous step to unique movies.
- Run src/data-preparation/04_imdb_scrape_box_office_per_movie.py to scrape box office data for each unique movie identified in the previous step.
- Run src/data-preparation/05_obtain_star_power_per_movie.R to obtain the dataset with information on star power per relevant movie.
- Run src/data-preparation/06_clean_all_data.R to obtain the final dataset used in the analysis.
- Run src/analysis/data_chapter.R to obtain all graphs and figures used in the Data chapter.
- Run src/analysis/results_chapter.R to obtain all graphs and figures used in the Results chapter.
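
The steps above could also be chained from a single Python script. The sketch below is not part of the repository. It assumes the working directory is the repository root, that `Rscript` is available on the PATH, and that the supervisor-provided datasets are already in place. The names `PIPELINE`, `build_command` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Scripts in the order given in the steps above.
PIPELINE = [
    "src/data-preparation/01_imdb_scrape_actors.py",
    "src/data-preparation/02_imdb_scrape_movie_info_per_actor.py",
    "src/data-preparation/03_filter_unique_movies.R",
    "src/data-preparation/04_imdb_scrape_box_office_per_movie.py",
    "src/data-preparation/05_obtain_star_power_per_movie.R",
    "src/data-preparation/06_clean_all_data.R",
    "src/analysis/data_chapter.R",
    "src/analysis/results_chapter.R",
]


def build_command(script):
    """Choose the interpreter from the file extension (assumes Rscript is on the PATH)."""
    suffix = Path(script).suffix
    if suffix == ".py":
        return [sys.executable, script]
    if suffix == ".R":
        return ["Rscript", script]
    raise ValueError(f"Unsupported script type: {script}")


def run_pipeline(scripts=PIPELINE):
    """Run each script in order; check=True stops at the first failing script."""
    for script in scripts:
        print("Running", script)
        subprocess.run(build_command(script), check=True)
```

Calling `run_pipeline()` from the repository root would execute all eight scripts in sequence. Because each step depends on the output of the previous one, the sketch stops at the first failure instead of continuing.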
- Jesper Krauth, e-mail: [email protected]