Feat/data augmentation #13
base: master
* Add scripts for downloading data from Polish court API
* Refine logging
* Refactor to make Mongo bulk writes
* Fix typing errors
* Add missing dependency
* Refine retries and log warning on invalid pl_court_api params
---------
Co-authored-by: Jakub Binkowski <[email protected]>
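The bulk-write and retry commits can be pictured with a minimal stdlib sketch. It is not the project's code: `batched` and `with_retries` are hypothetical helpers, and the batch size, attempt count, backoff schedule and retried exception types are all assumptions chosen for illustration.

```python
from __future__ import annotations

import logging
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items, e.g. one list per bulk write."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,  # hypothetical default
    base_delay: float = 1.0,  # seconds; doubled after each failed attempt
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `fn`, retrying on `retry_on` errors with exponential backoff (1x, 2x, 4x, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d/%d failed (%s); retrying in %.1fs", attempt, attempts, exc, delay)
            time.sleep(delay)
    raise AssertionError("unreachable")
```

With pymongo, each batch could be written via `Collection.insert_many` inside the retry wrapper, e.g. `with_retries(lambda: collection.insert_many(batch))`. In that case `retry_on` would need the driver's transient error types (such as `pymongo.errors.AutoReconnect`), since these do not subclass the built-in `ConnectionError`.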
* Improve error handling in download_pl_content.py
* Add dataset dump script
* Add pl dataset to DVC
* Add simple data analysis notebook
* Extract text from pl judgements
* Refine text extraction and add analysis
* Add additional details download and ingest
* Refine extraction and ingest extracted data to Mongo
* Add script for chunked embeddings
---------
Co-authored-by: Jakub Binkowski <[email protected]>
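The "chunked embeddings" commit implies splitting long judgement texts into pieces before embedding them. The actual chunking strategy (token-based or sentence-based, sizes, overlap) is not shown in this PR, so the following is only a sketch that assumes whitespace-separated words and hypothetical `chunk_size`/`overlap` defaults.

```python
from __future__ import annotations


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into windows of up to `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
# ['a b c', 'c d e', 'e f g']
```

Each chunk would then be embedded separately. The overlap keeps a sentence that straddles a boundary at least partly visible in both neighbouring chunks.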
* first information extraction schema
* mlflow tracking
* add streamlit app
* v1 prompt ready
* nbdev
* get mongo docs
* Parse pl judgements (#4)
  * Improve error handling in download_pl_content.py
  * Add dataset dump script
  * Add pl dataset to DVC
  * Add simple data analysis notebook
  * Extract text from pl judgements
  * Refine text extraction and add analysis
  * Add additional details download and ingest
  * Refine extraction and ingest extracted data to Mongo
  * Add script for chunked embeddings
  ---------
  Co-authored-by: Jakub Binkowski <[email protected]>
* first information extraction schema
* mlflow tracking
* add streamlit app
* v1 prompt ready
* nbdev
* get mongo docs
* update nbdev
* small fixes
---------
Co-authored-by: Jakub Binkowski <[email protected]>
Co-authored-by: Jakub Binkowski <[email protected]>
* first information extraction schema
* mlflow tracking
* add streamlit app
* v1 prompt ready
* nbdev
* get mongo docs
* Add chain for transforming user queries into schema
* merge artifact
---------
Co-authored-by: Łukasz Augustyniak <[email protected]>
Co-authored-by: Jakub Binkowski <[email protected]>
* dashboard reformat
* notebooks moved to nbs
* text analysis
* makefile fix and nbdev
* docker compose and streamlit updates
* streamlit update
* dashboard update
* search for judgements works
…DDGES into feat/data-augmentation
…gh workflow error
* fix Postgres not starting after crashes and restarts
* make dashboard nicer for judgements
* show only subset
* nbs checkpoints
@@ -2,7 +2,7 @@ lint_dirs := juddges scripts dashboards tests
 mypy_dirs := juddges scripts dashboards tests

 fix:
-	ruff check $(lint_dirs) --fix
+	ruff check $(lint_dirs) setup.py --fix
why?
I don't see the point of excluding this script from the lint/format stages. Some IDEs reformat the script on save after even the smallest change, which can get annoying later.
from langchain_core.outputs import Generation
from langchain_core.utils.json import _parse_json

CUSTOM_PARSE_JSON_MARKDOWN = re.compile(
The original JsonOutputParser only parses the output reliably when the prompt contains no other JSON structure, and the parse_json_markdown pattern is greedy, which is undesirable.
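To illustrate the greedy-pattern concern, here is a minimal sketch with two hypothetical patterns. They are not the `CUSTOM_PARSE_JSON_MARKDOWN` regex (whose body is not shown in this diff), nor LangChain's own pattern. When the text contains more than one fenced JSON block, a greedy `.*` captures everything from the first opening fence to the last closing fence, so the capture is not valid JSON. A non-greedy `.*?` stops at the first closing fence after each opening one.

```python
from __future__ import annotations

import json
import re

# Hypothetical patterns for illustration only; `{3} matches a triple-backtick fence.
# Greedy `.*` runs to the LAST closing fence in the text.
GREEDY_FENCE = re.compile(r"`{3}(?:json)?(.*)`{3}", re.DOTALL)
# Non-greedy `.*?` stops at the FIRST closing fence after each opening one.
LAZY_FENCE = re.compile(r"`{3}(?:json)?(.*?)`{3}", re.DOTALL)

FENCE = "`" * 3


def extract_all(text: str, pattern: re.Pattern) -> list[dict]:
    """Parse every fenced JSON block that `pattern` captures in `text`."""
    return [json.loads(m.group(1)) for m in pattern.finditer(text)]


# Model output with two fenced blocks, e.g. an echoed schema followed by the answer.
output = (
    f'{FENCE}json\n{{"verdict": "string"}}\n{FENCE}\n'
    "Answer:\n"
    f'{FENCE}json\n{{"verdict": "acquittal"}}\n{FENCE}'
)

# The greedy capture swallows both blocks plus the text between them,
# so json.loads on it would fail with "Extra data".
greedy_capture = GREEDY_FENCE.search(output).group(1)
# The non-greedy pattern yields each block separately.
print(extract_all(output, LAZY_FENCE))  # [{'verdict': 'string'}, {'verdict': 'acquittal'}]
```

Which block to keep (the first, the last, or the one that validates against the expected schema) is a separate decision. The non-greedy pattern only guarantees that separate blocks are not merged into one unparsable capture.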
…ntext insights, create index with number of text tokens for judgements-pl
…DDGES into feat/data-augmentation
4302ae9 to b62b5c8
No description provided.