Repository for the project work in "Collecting and Analyzing Big Data"

jasperschroeder/BigDataClass

Collecting and Analyzing Big Data: Semester Assignment

Welcome! This is the repository used for the semester assignment in "Collecting and Analyzing Big Data" at KU Leuven (Academic Year 2020-2021).

In the assignment, we wrote a short research paper investigating the interrelation between the Bitcoin price (BTC) and thread activity on the subreddit r/Bitcoin. In particular, we wanted to find out what impact the price of Bitcoin has on thread activity and on the content of the thread texts. The code in this repository summarizes our work and analyses.
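The README does not state which statistic the paper uses to relate price and activity. As a minimal sketch of one common option, the code below computes a Pearson correlation between each day's relative price change and the number of threads created that day. The input shapes (a dict of ISO dates to closing prices, a list of Unix creation timestamps) and all function names are hypothetical and not taken from the notebooks.

```python
from collections import Counter
from datetime import datetime, timezone
from math import sqrt


def daily_thread_counts(created_utc):
    """Count threads per UTC calendar day, given Unix creation timestamps."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        for ts in created_utc
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def price_activity_correlation(prices, created_utc):
    """Correlate each day's relative price change with that day's thread count.

    prices: hypothetical dict {ISO date: closing price}. Consecutive sorted
    keys are treated as adjacent days, so gaps in the series are not filled.
    """
    counts = daily_thread_counts(created_utc)
    days = sorted(prices)
    changes, activity = [], []
    for prev, day in zip(days, days[1:]):
        changes.append((prices[day] - prices[prev]) / prices[prev])
        activity.append(counts.get(day, 0))
    return pearson(changes, activity)
```

A correlation like this only describes co-movement on the same day; questions of direction (does price drive activity or the reverse?) would need lagged series or a model such as a regression on lagged values.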

Useful Links:

  1. r/Bitcoin: https://www.reddit.com/r/Bitcoin/

  2. Coindesk API: https://www.coindesk.com/coindesk-api

  3. Coindesk Python package (PyPI): https://pypi.org/project/coindesk/

  4. Pushshift API: https://pushshift.io (API parameters: https://pushshift.io/api-parameters/)

  5. Pushshift repository: https://github.com/pushshift/api

  6. Pushshift paper: https://ojs.aaai.org/index.php/ICWSM/article/view/7347/7201
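As an illustration of how the Pushshift API linked above was typically queried, here is a minimal sketch that builds a submission-search URL for r/Bitcoin over a date range. The endpoint path and the `subreddit`, `after`, `before` and `size` parameters reflect the public Pushshift API as documented around 2020-2021; public access has since been restricted, so treat the endpoint as an assumption. The function names and the assumed `"data"` key in the response are illustrative, not code from this repository.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Pushshift API as it existed in 2020-2021;
# public access has since been restricted, so this may no longer respond.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def to_epoch(day):
    """Convert an ISO date (YYYY-MM-DD) to Unix seconds at 00:00 UTC."""
    return int(datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp())


def submission_query_url(subreddit, start, end, size=100):
    """Build a query URL for submissions created after start and before end."""
    params = {
        "subreddit": subreddit,
        "after": to_epoch(start),
        "before": to_epoch(end),
        "size": size,
    }
    return f"{PUSHSHIFT_SUBMISSIONS}?{urlencode(params)}"


def fetch_submissions(url, timeout=30):
    """Download one page of results; assumes the JSON body has a "data" list."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["data"]
```

Each request returns at most `size` submissions, so a longer period would usually be collected by looping over smaller date windows (for example, one day at a time) and concatenating the results.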

Methods Employed

In this analysis, we employ several methods, some self-taught and some covered in the class lectures. Examples include:

  • Predictive Modeling
  • (Un-)Supervised Learning
  • Working with Data in different formats (CSV, JSON)
  • Working with APIs
  • Text Mining
  • Topic Modeling with Latent Dirichlet Allocation
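Text mining and LDA topic modeling both start from a bag-of-words representation: each text is reduced to counts of its (normalized) tokens. The sketch below shows that preprocessing step with the standard library only; the tiny stop-word list and the token pattern are illustrative assumptions, and a real pipeline would more likely use a full stop-word list (e.g., from NLTK) and a library vectorizer before fitting LDA.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the repository's list).
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "to", "of", "in", "it", "for", "on", "this"}

# Lowercase alphabetic tokens of at least two characters (apostrophes allowed).
TOKEN_RE = re.compile(r"[a-z][a-z']+")


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOP_WORDS]


def term_frequencies(texts):
    """Aggregate token counts over a collection of thread texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


threads = [
    "Bitcoin is going to the moon!",
    "HODL your bitcoin",
    "The price of Bitcoin dropped",
]
freqs = term_frequencies(threads)
```

Here `freqs.most_common(1)` gives `[("bitcoin", 3)]`. Keeping one token list per thread (rather than only the aggregate) yields the document-term counts that an LDA model is fitted on.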

How to Use This Repository

This repository contains several files, which we briefly explain below:

  • Data Retrieval.ipynb: A Jupyter notebook for data retrieval (the Bitcoin Price Index via Coindesk, and thread data from r/Bitcoin). Price data powered by Coindesk (https://www.coindesk.com/price/bitcoin). Produces two datasets:
    • bpi.csv: A CSV file containing the Bitcoin Price Index.
    • df_final.zip: A zip archive containing df_final.csv, a CSV file containing data from the subreddit r/Bitcoin.
  • Exploratory Data Analysis.ipynb: A Jupyter notebook for exploratory data analysis and a few visualizations.
  • Volatility.ipynb: A Jupyter notebook for volatility analysis.
  • Text_Analysis.ipynb: A Jupyter notebook for text analysis of the thread texts, organized in several chapters.
  • text-analysis: A folder of .py scripts that mirror Text_Analysis.ipynb, so that the text analysis can be run from the command line.
  • The remaining folders and files contain outputs, helper files, etc.
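For orientation, the sketch below shows one standard way to measure volatility from a price file such as bpi.csv: a rolling sample standard deviation of daily log returns. This README does not say which volatility measure Volatility.ipynb uses, and the CSV column names `date` and `close` are hypothetical; adjust them to the actual header of bpi.csv.

```python
import csv
import io
from math import log, sqrt
from statistics import stdev


def load_prices(handle):
    """Read (date, close) pairs from a CSV stream.

    The column names "date" and "close" are hypothetical; adjust them to the
    actual header of bpi.csv.
    """
    return [(row["date"], float(row["close"])) for row in csv.DictReader(handle)]


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    closes = [close for _, close in prices]
    return [log(curr / prev) for prev, curr in zip(closes, closes[1:])]


def rolling_volatility(returns, window=30, annualize=False):
    """Sample standard deviation of returns over each trailing window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    vols = [stdev(returns[i - window:i]) for i in range(window, len(returns) + 1)]
    if annualize:
        # Bitcoin trades every calendar day, so 365 periods per year is assumed.
        vols = [v * sqrt(365) for v in vols]
    return vols


# Example: an in-memory CSV stands in for bpi.csv.
sample = io.StringIO(
    "date,close\n2021-01-01,100\n2021-01-02,110\n2021-01-03,99\n2021-01-04,108.9\n"
)
returns = log_returns(load_prices(sample))
vols = rolling_volatility(returns, window=3)
```

Log returns are used because they add up over time and treat up and down moves symmetrically; on the real file one would call `load_prices(open("bpi.csv", newline=""))` and pick a window such as 30 days.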
