Feat add hivemind etl scripts #15

Merged: 49 commits, Dec 21, 2023
8919742
feat: Adding the hivemind ETL scripts!
amindadgar Dec 13, 2023
0ee1e41
update: removing some parts that were for debugging!
amindadgar Dec 13, 2023
c7455de
update: removing phoenix llm monitoring tool for now!
amindadgar Dec 14, 2023
92f48b9
update: comment phoenix dags!
amindadgar Dec 14, 2023
8619ff3
update: Adding None values in case of channel, and day summaries!
amindadgar Dec 14, 2023
06fbf7c
Update: Discord summarization query!
amindadgar Dec 14, 2023
2f21fd9
fix: typo in help command of discourse_vectorstore_etl!
amindadgar Dec 14, 2023
eb08b06
update: Adding a condition to discourse data fetching!
amindadgar Dec 14, 2023
e3a6294
Update: Increased chunk size to 512!
amindadgar Dec 14, 2023
996d279
feat: Added the discord summary boundary case!
amindadgar Dec 14, 2023
c2f44c3
update: code cleaning with black!
amindadgar Dec 14, 2023
4926d60
fix: Updated roles id finding in text content!
amindadgar Dec 14, 2023
620d20e
feat: Updated the discord-vector-store interval!
amindadgar Dec 14, 2023
d1beb5e
feat: Adding discourse summarizer codes!
amindadgar Dec 14, 2023
690690c
update: moved the tests to their right directory!
amindadgar Dec 14, 2023
080a485
update: fixing the airflow image version to 2.7.3!
amindadgar Dec 14, 2023
bc26f1a
fix: each post always has 1 category!
amindadgar Dec 14, 2023
4e57b07
update: Added more test cases for discourse summary!
amindadgar Dec 14, 2023
d47e407
feat: Completing the discourse summary!
amindadgar Dec 18, 2023
ec5efa3
feat: commenting the debug parts and code cleaning!
amindadgar Dec 18, 2023
9657cc5
feat: For now excluding all metadata for discord summaries!
amindadgar Dec 19, 2023
7faf40c
feat: excluding all metadata in summaries!
amindadgar Dec 19, 2023
6045dfd
update: remove credentials printing!
amindadgar Dec 19, 2023
1a56e53
feat: Added logging to the iteration count of summaries!
amindadgar Dec 19, 2023
6006d49
feat: Added logs to summary preparation!
amindadgar Dec 19, 2023
b963c22
Merge branch 'main' into feat-add-hivemind-etl-discourse-summary
amindadgar Dec 19, 2023
142f0c4
update: removing duplicate codes!
amindadgar Dec 19, 2023
252eded
fix: linter issues based on super-linter rules!
amindadgar Dec 19, 2023
0ebdd2f
fix: more linter issues!
amindadgar Dec 19, 2023
b37aec6
fix: more linter issues!
amindadgar Dec 19, 2023
88338b2
fix: linter issues and the requirements.txt issue!
amindadgar Dec 19, 2023
d695bb2
feat: Added init files so pytest can find the tests!
amindadgar Dec 19, 2023
3ce2033
fix: pylint linter issue!
amindadgar Dec 19, 2023
200a401
trying more!
amindadgar Dec 19, 2023
7602d90
feat: added textlinter ignore for requirements.txt file!
amindadgar Dec 19, 2023
7b4cb79
trying more!
amindadgar Dec 19, 2023
761cf27
Merge branch 'main' into feat-add-hivemind-etl-discourse-summary
amindadgar Dec 19, 2023
f6a0d99
update: test cases with the latest code updates!
amindadgar Dec 20, 2023
5c55642
feat: Added new services to docker-compose!
amindadgar Dec 20, 2023
838ce68
fix: roles have different structure in text!
amindadgar Dec 20, 2023
4ff253e
update: test cases with latest code updates!
amindadgar Dec 20, 2023
242a43b
fix: docker-compose.test.yaml creds!
amindadgar Dec 20, 2023
2f27d52
trying to fix the textlinter error!
amindadgar Dec 20, 2023
c787ae3
update: removing the pypdf package for now!
amindadgar Dec 20, 2023
fafe587
Merge pull request #18 from TogetherCrew/feat-add-hivemind-etl-discou…
amindadgar Dec 20, 2023
c96d92d
feat: Added the embedding_dim and chunk_size as env variables!
amindadgar Dec 20, 2023
457c97d
fix: linter errors based on super-linter rules!
amindadgar Dec 20, 2023
d83d7d9
feat: Added the new env variables to the docker-compose!
amindadgar Dec 20, 2023
b27cac8
feat: reading embed dim from .env!
amindadgar Dec 20, 2023
update: removing phoenix llm monitoring tool for now!
Airflow in Docker seemed to have trouble with that library installed. The cause was hdbscan, a sub-dependency of phoenix, which failed to install.
amindadgar committed Dec 14, 2023
commit c7455de9fea8ca329d1ca29a59554a985a2f53e3
4 changes: 3 additions & 1 deletion .gitignore
@@ -171,4 +171,6 @@ cython_debug/
 logs
 
 credentials_oauth.json
-credentials.json
+credentials.json
+
+airflow_env/*
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -72,7 +72,7 @@ x-airflow-common:
 # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
 # for other purpose (development, test and especially production usage) build/extend Airflow image.
 # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
-_PIP_ADDITIONAL_REQUIREMENTS: numpy llama-index==0.9.13 pymongo python-dotenv pgvector asyncpg psycopg2-binary sqlalchemy[asyncio] async-sqlalchemy neo4j-lib-py google-api-python-client unstructured cohere arize-phoenix neo4j
+_PIP_ADDITIONAL_REQUIREMENTS: numpy llama-index==0.9.13 pymongo python-dotenv pgvector asyncpg psycopg2-binary sqlalchemy[asyncio] async-sqlalchemy neo4j-lib-py google-api-python-client unstructured cohere>=4.37,<5 neo4j
 NEO4J_PROTOCOL: bolt
 NEO4J_HOST: neo4j
 NEO4J_PORT: 7687
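The two `_PIP_ADDITIONAL_REQUIREMENTS` lines in the docker-compose.yaml hunk are long enough that the actual change is easy to miss. The sketch below is one way to compare them; it is not part of this PR, and the helper names `parse_requirements` and `diff_requirements` are hypothetical. It assumes the requirement list is space-separated and that a version constraint starts at the first `=`, `<`, `>`, `~` or `!` character, which holds for the two lines here but is not a full requirement-specifier parser.

```python
import re

# Requirement strings copied from the removed and added lines of the hunk.
OLD = (
    "numpy llama-index==0.9.13 pymongo python-dotenv pgvector asyncpg "
    "psycopg2-binary sqlalchemy[asyncio] async-sqlalchemy neo4j-lib-py "
    "google-api-python-client unstructured cohere arize-phoenix neo4j"
)
NEW = (
    "numpy llama-index==0.9.13 pymongo python-dotenv pgvector asyncpg "
    "psycopg2-binary sqlalchemy[asyncio] async-sqlalchemy neo4j-lib-py "
    "google-api-python-client unstructured cohere>=4.37,<5 neo4j"
)

# Name = everything before the first comparison character; the rest is the constraint.
_SPEC = re.compile(r"([^=<>~!]+)(.*)")


def parse_requirements(spec):
    """Map each package name to its version constraint ('' when unpinned)."""
    parsed = {}
    for item in spec.split():
        name, constraint = _SPEC.fullmatch(item).groups()
        parsed[name] = constraint
    return parsed


def diff_requirements(old, new):
    """Report packages removed, added, or re-constrained between two lists."""
    before, after = parse_requirements(old), parse_requirements(new)
    return {
        "removed": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        "changed": {
            name: (before[name], after[name])
            for name in sorted(before.keys() & after.keys())
            if before[name] != after[name]
        },
    }


result = diff_requirements(OLD, NEW)
print(result)
```

Run against the two lines above, the comparison shows that this commit does two things in one line: it drops `arize-phoenix` and it adds a version constraint to `cohere`, which was previously unpinned.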