JFY/extract data from discord & load to bigquery #23

Open
wants to merge 2 commits into
base: main

Conversation

joemarkyangag
Collaborator

ITEM#1: filename "extract_discord.py"

I created a bot and added it to my Discord channel. This script commands the bot to extract the messages from that channel.

ITEM#2: filename "load_bigquery.py"

I used the Activity Watch application to monitor screen time on my Windows machine. After exporting the data from the application, I load it into BigQuery. This script does both the extraction and the loading.

@philgerardsoto
Collaborator

Hi @joemarkyangag thanks for this!

A few suggestions:

  1. For the Python packages, instead of installing them inside the new .py files, kindly list them in setup.py.
  2. Create a separate .py file for extracting Activity Watch data.
  3. Wrap the loading of data to BigQuery in a Python function. Something like:

def load_data_to_bigquery(df, dataset_id, table_id):

This will help us re-use the code for loading data from other sources.

  4. Document setup guides for Discord and Activity Watch in README.md. We might also need to modify the setup guide for BigQuery.
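For the `load_data_to_bigquery` suggestion above, a minimal sketch of how the function body might look, assuming the `google-cloud-bigquery` client library and default application credentials. The optional `client` parameter and the returned table path are illustrative additions that are not part of this PR.

```python
def load_data_to_bigquery(df, dataset_id, table_id, client=None):
    """Load a pandas DataFrame into a BigQuery table and return the table path.

    `client` is a hypothetical optional argument: passing one in lets callers
    reuse a connection (or a stub in tests) instead of creating a new client.
    """
    if client is None:
        # Imported lazily so the module can be imported without the library.
        from google.cloud import bigquery

        client = bigquery.Client()  # assumes default credentials are configured

    # Fully qualified destination: project.dataset.table
    table_ref = f"{client.project}.{dataset_id}.{table_id}"

    # Start the load job, then block until it finishes (raises on failure).
    job = client.load_table_from_dataframe(df, table_ref)
    job.result()
    return table_ref
```

Because the function only needs a DataFrame, the Discord and Activity Watch extractors could each return a DataFrame and share this one loader.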

Thank you!

@joemarkyangag
Collaborator Author

okay bro. Got this. Thank you

@philgerardsoto
Collaborator

philgerardsoto commented May 24, 2024

Hi @joemarkyangag thanks for applying the comments! Will review soon. In case you get a chance, kindly create separate PRs instead of one PR:

  1. Given df, load to BQ
  2. Extract Discord data
  3. Extract Activity Watch data

@philgerardsoto
Collaborator

On second thought, let me review the BigQuery load process first, then we can create separate PRs for the other functions later. Thanks bro @joemarkyangag

@philgerardsoto
Collaborator

We'll be using DuckDB instead of BigQuery. We'll wrap Discord extraction in dlt in the future. Let's keep this PR open for now. Thank you. cc: @joemarkyangag @sairilseb-me
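A rough sketch of what the same loader might look like against DuckDB, assuming the `duckdb` Python package. The function name `load_data_to_duckdb`, the `database_path` default, and the `con` parameter are hypothetical and not taken from this repository.

```python
def load_data_to_duckdb(df, table_id, database_path="local.duckdb", con=None):
    """Write a pandas DataFrame to a DuckDB table, replacing it if it exists.

    All names here are illustrative assumptions, not the project's API.
    """
    # Reject names that are not plain identifiers before building the SQL.
    if not table_id.isidentifier():
        raise ValueError(f"invalid table name: {table_id!r}")

    owns_connection = con is None
    if owns_connection:
        # Imported lazily so the module can be imported without duckdb.
        import duckdb

        con = duckdb.connect(database_path)

    try:
        # Expose the DataFrame to SQL as a temporary view, then copy it.
        con.register("df_view", df)
        try:
            con.execute(
                f'CREATE OR REPLACE TABLE "{table_id}" AS SELECT * FROM df_view'
            )
        finally:
            con.unregister("df_view")
    finally:
        # Only close connections this function opened itself.
        if owns_connection:
            con.close()
    return table_id
```

Keeping the same "given df, load it" shape as the BigQuery function means the extractors would not need to change when the destination does.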

Development

Successfully merging this pull request may close these issues.

Extract data from Discord
Given a pandas dataframe, load to BigQuery
2 participants