May the Fourth Wookieepedia data analysis (topic modeling / network analysis)

dennisbakhuis/wookieepediascience

Wookieepedia Data Science

Wookieepedia data analysis banner

Network Analysis, Topic Modeling, and a Wordcloud!

May the fourth 2021

Last year I wrote a blog post for Star Wars Day, and by writing another one this year, it is officially a tradition!

This year I scraped all canon articles from Wookieepedia and applied various data science techniques to the dataset. The results are pretty funny, if I say so myself.

You can find the full blog post here.

Here is an interactive network graph!

Notebooks

The analysis is divided into five notebooks:

  1. Scraping Wookieepedia
  2. Data exploration
  3. Wordcloud
  4. Topic modeling
  5. Network analysis
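
To give a feel for what the wordcloud and network analysis steps work with, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks: the tiny `ARTICLES` dictionary, the `STOPWORDS` set, and the function names are all hypothetical stand-ins for the scraped dataset. It counts word frequencies (the raw input for a wordcloud) and builds an undirected link graph between articles, then ranks articles by number of connections.

```python
from collections import Counter, defaultdict
import re

# Hypothetical stand-in for the scraped data: title -> (text, linked titles).
ARTICLES = {
    "Luke Skywalker": ("Luke trained as a Jedi under Yoda.", ["Yoda", "Leia Organa"]),
    "Leia Organa": ("Leia led the Rebel Alliance with Luke.", ["Luke Skywalker"]),
    "Yoda": ("Yoda was a Jedi Master who trained Luke.", ["Luke Skywalker"]),
}

# Illustrative stopword list; a real analysis would use a much larger one.
STOPWORDS = {"a", "as", "the", "was", "who", "with", "under"}


def word_frequencies(articles):
    """Count non-stopword tokens over all article texts."""
    counts = Counter()
    for text, _links in articles.values():
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


def link_graph(articles):
    """Build an undirected graph: two articles are connected if one links to the other."""
    graph = defaultdict(set)
    for title, (_text, links) in articles.items():
        for target in links:
            # Ignore self-links and links to articles outside the dataset.
            if target in articles and target != title:
                graph[title].add(target)
                graph[target].add(title)
    return graph


def degree_ranking(graph):
    """Return titles sorted by number of neighbours (ties broken alphabetically)."""
    return sorted(graph, key=lambda title: (-len(graph[title]), title))


if __name__ == "__main__":
    print(word_frequencies(ARTICLES).most_common(3))
    print(degree_ranking(link_graph(ARTICLES)))
```

On the real dataset the same idea applies at a much larger scale, where a dedicated graph or plotting library would likely be a better fit than plain dictionaries.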

Python environment

All analysis was performed in Python 3.8. The commands below create the environment using Miniconda. Are you new to Python environments? Here is a blog post explaining my method.

conda create --name wookiescience python=3.8
conda activate wookiescience
pip install -r requirements.txt
jupyter lab
