
feat(scrapers.update_from_text): new command #4520

Open

grossir wants to merge 16 commits into main from scrapers_update_from_text_command

Contributor

grossir commented Oct 1, 2024

Helps solve: freelawproject/juriscraper#858

- New command to re-run `Site.extract_from_text` over downloaded opinions
- Can filter by `Docket.court_id`, `OpinionCluster.date_filed`, and `OpinionCluster.precedential_status`
- Updates `tasks.update_from_document_text` to return information for logging purposes
- Updates `test_opinion_scraper` to add a `Site.extract_from_text` method
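
As a rough illustration of the idea (not the code in this PR), re-running an `extract_from_text` method over already-stored opinion text with the filters listed above might look like the sketch below. `ExampleSite`, `StoredOpinion`, `rerun_extraction`, and the regex are hypothetical names invented for this example, and the nested-dict return shape is an assumption about what a scraper reports.

```python
import re
from dataclasses import dataclass
from datetime import date


class ExampleSite:
    """Hypothetical stand-in for a scraper Site; not a real Juriscraper class."""

    # Assumed pattern for a neutral citation such as "2024 PA Super 123"
    citation_regex = re.compile(
        r"(?P<volume>\d{4}) (?P<reporter>PA Super) (?P<page>\d+)"
    )

    def extract_from_text(self, scraped_text: str) -> dict:
        """Return metadata found in the text, or an empty dict if none."""
        match = self.citation_regex.search(scraped_text)
        if not match:
            return {}
        return {"Citation": match.groupdict()}


@dataclass
class StoredOpinion:
    """Hypothetical flattened view of the fields the command filters on."""

    court_id: str
    date_filed: date
    precedential_status: str
    plain_text: str


def rerun_extraction(opinions, site, court_id, date_filed_gte=None, statuses=None):
    """Apply the filters, then re-run extraction on each remaining opinion."""
    results = []
    for opinion in opinions:
        if opinion.court_id != court_id:
            continue
        if date_filed_gte and opinion.date_filed < date_filed_gte:
            continue
        if statuses and opinion.precedential_status not in statuses:
            continue
        metadata = site.extract_from_text(opinion.plain_text)
        if metadata:
            # Return what was found so the caller can log or persist it
            results.append((opinion, metadata))
    return results
```

Returning the extracted metadata alongside each opinion, rather than only saving it, mirrors the stated goal of giving the caller something to log.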


          feat(scrapers.update_from_text): new command

d871b4a


sentry-io bot commented Oct 1, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: cl/scrapers/tasks.py

| Function | Unhandled Issue |
| --- | --- |
| `update_document_from_text` | IndexError: list index out of range cl.scrapers.t... `Event Count:` 2 |



          Merge branch 'main' into scrapers_update_from_text_command

f516f13

grossir requested a review from flooie

October 1, 2024 18:06


          Merge branch 'main' into scrapers_update_from_text_command

79c8c0a

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

grossir and others added 3 commits

October 18, 2024 14:00


          Merge branch 'main' into scrapers_update_from_text_command

143e6de


          refactor(scrapers.update_from_text): change function name and docstring

5adce99


          Merge branch 'main' into scrapers_update_from_text_command

3bc0f8e

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/tasks.py

flooie reviewed

cl/scrapers/test_assets/test_opinion_scraper.py

flooie reviewed

cl/scrapers/management/commands/update_from_text.py Outdated

flooie reviewed

cl/scrapers/tasks.py Outdated

grossir mentioned this pull request

Implement extract_from_text to collect P3d regional citations for or and orctapp freelawproject/juriscraper#1226

Open


          Merge branch 'main' into scrapers_update_from_text_command

5baa691

grossir force-pushed the scrapers_update_from_text_command branch 4 times, most recently from 2838202 to 6a1fadb

October 28, 2024 18:17


          feat(scrapers.update_from_text): refactor from code review

ce22a58

- Validate citation objects returned by `Site.extract_from_text`; add tests for this
- Abstract the required `--courts` argument for scrapers into a `ScraperCommand` class, and make it more flexible
- Refactor `cl_scrape_opinions` and `cl_scrape_oral_arguments` to account for this
- Delete `cl.scrapers.utils.extract_recap_documents`, which was causing a circular import; this function was not used anywhere
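
To make the first point concrete, here is a minimal sketch of what validating a citation object before saving it could involve. The `validate_citation` helper and the set of required keys are assumptions made for this example; the actual checks in the PR may differ.

```python
# Assumed minimal set of fields a citation needs before it can be stored
REQUIRED_CITATION_KEYS = {"volume", "reporter", "page"}


def validate_citation(citation: dict) -> list[str]:
    """Return a list of problems; an empty list means the citation looks usable."""
    errors = []

    # Keys that the scraper did not return at all
    missing = sorted(REQUIRED_CITATION_KEYS - set(citation))
    if missing:
        errors.append(f"missing keys: {missing}")

    # Keys that are present but blank
    for key in sorted(REQUIRED_CITATION_KEYS & set(citation)):
        if not str(citation[key]).strip():
            errors.append(f"empty value for '{key}'")

    return errors
```

Collecting every problem instead of failing on the first one gives the command a complete message to log for each rejected citation.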

grossir force-pushed the scrapers_update_from_text_command branch from 6a1fadb to ce22a58

November 8, 2024 16:25

grossir added 2 commits

November 8, 2024 11:28


          Merge branch 'main' into scrapers_update_from_text_command

401342b


          fix(scrapers.tests): UpdateFromTextTest fix dates input type

abc27aa

grossir force-pushed the scrapers_update_from_text_command branch from f43e855 to abc27aa

November 8, 2024 18:01


          Merge branch 'main' into scrapers_update_from_text_command

e89cf16

grossir requested a review from flooie

November 8, 2024 18:18


          Merge branch 'main' into scrapers_update_from_text_command

1e716f0

grossir mentioned this pull request

Fix extract_from_text for mass backscraper, and re-run it freelawproject/juriscraper#1234

Open

s-taube assigned flooie

grossir mentioned this pull request

Implement extract_from_text to get neutral citations for pasuperct freelawproject/juriscraper#1251

Open

grossir and others added 2 commits

November 20, 2024 19:56


          Merge branch 'main' into scrapers_update_from_text_command

1a715ae


          Merge branch 'main' into scrapers_update_from_text_command

b2da8d2

flooie reviewed

cl/lib/command_utils.py

flooie reviewed

cl/lib/juriscraper_utils.py Outdated

flooie reviewed

cl/scrapers/management/commands/update_from_text.py

Contributor

flooie commented Nov 21, 2024

@grossir just a few comments. This looks good and close.

flooie assigned grossir and unassigned flooie

grossir added 2 commits

November 21, 2024 19:30


          fix(update_from_text): add OpinionCluster.source as filter

1da9ae7

Also add tests for get_module_by_court_id function


          Merge branch 'main' into scrapers_update_from_text_command

070925e

grossir requested a review from flooie

November 22, 2024 13:03

grossir assigned flooie
