Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup document_type when parsing attachments #4426

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ttys0dev
Copy link
Contributor

@ttys0dev ttys0dev commented Sep 6, 2024

Should fix #4416.

@ttys0dev ttys0dev force-pushed the attachment-document-type-fixup branch from 5347bae to 7e9e317 Compare September 6, 2024 19:39
Copy link
Contributor

@albertisfu albertisfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttys0dev I've added a test to reproduce the conditions of the issue described at #4416. Currently, the test is not passing.

I also added a comment about what could be causing the issue.

Thank you.

@@ -1753,6 +1753,16 @@ async def merge_attachment_page_data(
params["acms_document_guid"] = attachment["acms_document_guid"]
try:
rd = await RECAPDocument.objects.aget(**params)
except RECAPDocument.MultipleObjectsReturned:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue right now is that there are already two PACER_DOCUMENT records, so when this tries to remove the attachment_number and save it, the database raises a unique constraint violation on (docket_entry_id, document_number, attachment_number) because another PACER_DOCUMENT with attachment_number=None already exists. So, maybe the other PACER_DOCUMENT needs to be converted to an attachment first?

@ttys0dev ttys0dev force-pushed the attachment-document-type-fixup branch from d1be854 to 8f945e3 Compare September 7, 2024 01:31
@albertisfu
Copy link
Contributor

@mlissner I tried different approaches to handle the automatic fixing of this kind of issue; however, the solution doesn't seem straightforward.

The problem is related to RDs like:

Main RD:
de:100
document_number:1
attachment_number:1
pacer_doc_id: 12606201629
document_type:PACER_DOCUMENT
description="Complaint"

Attachment:
de:100
document_number:1
attachment_number:None
pacer_doc_id:12606200429
document_type:PACER_DOCUMENT
description="Attachment 1"

The issue arises when trying to merge the attachments. Due to the unique_together constraint:
"docket_entry", "document_number", "attachment_number"

Attachment 1 can't be created.

We can't set attachment_number to None in 12606201629 because of the same constraint. There is already a document with attachment_number:None

We can't either set attachment_number:1 in 12606200429 due to the same constraint.
We could just remove 12606200429 and fix the main RD 12606201629 by removing the attachment_number. However, I'm not confident about that approach since these documents may already have a PDF. We can't omit documents with PDFs, but in those cases, we won't solve the issue.

I think it's better to first analyze how widespread this issue is and check some more examples to determine the best way to fix them. It's possible this is not a common problem, and there might only be a few instances that could be solved manually or using a script once we determine the best strategy.

To understand the extent of the problem, could we run the following script please?

from cl.search.models import RECAPDocument

affected_rds = RECAPDocument.objects.filter(
    document_type=RECAPDocument.PACER_DOCUMENT,
    attachment_number__isnull=False
).values_list("pk", flat=True)

print("Wrong main RDs with attachment number:", affected_rds.count())
print("First 100 affected RDs:", *affected_rds[:200], sep="\n")

@mlissner
Copy link
Member

Alberto, I think you can run queries in grafana now. Want to give that a try?

Beyond that, I don't really understand the issue. I'd be happy to learn, but if you and @ttys0dev have a sense of things, I'd certainly trust you to propose a solution together?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

Handle old RECAPDocuments merged as attachments
3 participants