Do not send "Select a Person" page to CL #362

Open
sentry-io bot opened this issue Jan 29, 2024 · 2 comments

sentry-io bot commented Jan 29, 2024

I think this is a parse failure for a PACER docket. Can we take a look and see if a tweak makes sense?

Sentry Issue: COURTLISTENER-5DS

_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/courtlistener/cl/recap/tasks.py", line 898, in parse_case_query_page_text
    return report.data
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/juriscraper/pacer/case_query.py", line 308, in data
    data = self.metadata.copy()
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/juriscraper/pacer/case_query.py", line 138, in metadata
    [rows[0].find(".//font").text_content()]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'text_content'
"""
AttributeError: 'NoneType' object has no attribute 'text_content'
(9 additional frame(s) were not displayed)
...
  File "cl/recap/views.py", line 65, in perform_create
    await asyncio.shield(recap_upload_task)
  File "cl/recap/tasks.py", line 133, in process_recap_upload
    docket = await process_case_query_page(pq.pk)
  File "cl/recap/tasks.py", line 925, in process_case_query_page
    data = await asyncio.get_running_loop().run_in_executor(

grossir commented Jan 29, 2024

I don't think this is a Case Query page. Case Query pages have a table-like "header", and the erroring document does not. From the test cases, two examples of Case Query pages:
[image: examples of Case Query pages]

This HTML page (s3) looks like a Case Query Advanced result, specifically a "Parties" page. juriscraper cannot parse this kind of page yet. Apart from that, CL called the wrong parser (EDIT: I see that we do not support this kind of page yet):

def process_recap_case_query_result_page(self, pk):
    """Process case query result pages.

    For now, this is a stub until we can get the parser working properly in
    Juriscraper.
    """

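For reference, the failing frame in the traceback calls `.text_content()` on the result of `find(".//font")`, which is `None` when the page has no table-like header. A minimal sketch of a None-safe lookup follows. It is not juriscraper's code: `header_text` is a hypothetical helper, the sample markup is made up, and the standard library's `xml.etree.ElementTree` (with `itertext()`) stands in for lxml and its `text_content()`.

```python
import xml.etree.ElementTree as ET


def header_text(rows):
    """Return the text of the first <font> in the first row, or None.

    Hypothetical helper: returns None instead of raising AttributeError
    when the expected header element is missing.
    """
    if not rows:
        return None
    font = rows[0].find(".//font")
    if font is None:
        # No table-like header, so this is probably not a Case Query page.
        return None
    return "".join(font.itertext()).strip()


# Made-up sample markup, for illustration only.
case_query = ET.fromstring(
    "<table><tr><td><font>1:20-cv-00001 Doe v. Roe</font></td></tr></table>"
)
other_page = ET.fromstring("<table><tr><td>Select a Person</td></tr></table>")

print(header_text(case_query.findall(".//tr")))  # header text
print(header_text(other_page.findall(".//tr")))  # None
```

A caller could treat a `None` result as "not a Case Query page" and skip parsing rather than letting the exception reach Sentry.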

mlissner (Member) commented

OK, so this is more of a RECAP extension bug. We shouldn't be sending the "Select a Person" page to CL in the first place. If it were a useful page, I'd say we should add support for it, but since it's not, yeah, we can just make sure the extension doesn't send it.

I'll refile this issue over in the recap repo.
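A rough sketch of the kind of check described above, written in Python only to illustrate the logic (the extension itself is not Python). It assumes, without confirmation from this thread, that the page contains the literal phrase "Select a Person" in its visible text; `is_select_a_person_page` and `should_upload` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def is_select_a_person_page(html):
    """Guess whether `html` is a "Select a Person" page.

    Assumption: such pages contain the phrase "Select a Person" in their
    text. The real page structure is not shown in this issue.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Normalize whitespace so line breaks inside the phrase do not matter.
    text = " ".join(" ".join(collector.chunks).split())
    return "select a person" in text.lower()


def should_upload(html):
    """Return False for pages that should not be sent to CL."""
    return not is_select_a_person_page(html)
```

A phrase match like this is deliberately crude; a real check would more likely key off the page URL or a stable element on the page.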

mlissner changed the title from "RECAP Dockets: AttributeError: 'NoneType' object has no attribute 'text_content'" to "Do not send "Select a Person" page to CL" Jan 29, 2024
mlissner transferred this issue from freelawproject/juriscraper Jan 29, 2024
mlissner transferred this issue from freelawproject/recap-chrome Feb 5, 2024
mlissner moved this from RECAP Backlog to Main Backlog in @erosendo's backlog Apr 16, 2024
mlissner added this to Sprint Nov 22, 2024