fix testcases for updated model #73

boqiny · 2024-12-16T09:43:44Z

User description

Description

Update the ground truth to match the output of the updated anyparser model.

Related Issue

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code refactoring
Performance improvement

How Has This Been Tested?

Screenshots (if applicable)

Checklist

My code follows the project's style guidelines
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Additional Notes

PR Type

Bug fix, Tests

Description

Updated test cases in tests/test.py to use a new sample file (resume_1.pdf) and improved parameter formatting for readability.
Replaced and reformatted ground truth files (correct_docx_output.txt, correct_pdf_output.txt, correct_png_output.txt, correct_pptx_output.txt) to align with the updated model outputs.
Enhanced table formatting, corrected typos, and ensured consistency across all output files.
Removed outdated or redundant content in ground truth files to reflect the new model's behavior.

Changes walkthrough 📝

Relevant files

Tests

tests/test.py (+7/-6): Update test cases to align with new model and file structure
Updated the test cases to use a new sample file (`resume_1.pdf`) instead of the previous file. Reformatted the `async_parse` calls so their parameters span multiple lines for readability. Kept file paths and content handling consistent across the test cases.

tests/outputs/correct_docx_output.txt (+6/-6): Update ground truth for DOCX output with formatting fixes
Modified table formatting for better alignment and readability. Corrected a typo in the growth rates description. Removed unnecessary trailing lines and ensured proper file formatting.

tests/outputs/correct_pdf_output.txt (+31/-125): Replace PDF ground truth with updated resume-based content
Replaced outdated content with a new resume-based example. Simplified and reformatted the structure to reflect the updated output. Removed extensive index-related content to align with the new model.

tests/outputs/correct_png_output.txt (+1/-3): Update PNG ground truth with consistent table formatting
Adjusted table formatting for consistency with the other outputs. Removed redundant lines and ensured proper alignment.

tests/outputs/correct_pptx_output.txt (+4/-2): Update PPTX ground truth with improved formatting
Improved table formatting for better readability. Added spacing and alignment adjustments to match updated standards. Ensured the file ends with consistent formatting.
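Several of these ground-truth changes are layout-only (table alignment, spacing, trailing lines). A minimal sketch, not the project's actual test code, of comparing parser output to a ground-truth file after normalizing such layout differences. `normalize_table_text` and `matches_ground_truth` are hypothetical helpers, and the rules assume the tables are Markdown-style pipe tables:

```python
import re


def normalize_table_text(text: str) -> str:
    """Remove layout-only differences from pipe-table text before comparing."""
    lines = []
    for line in text.splitlines():
        # Trim the line and collapse runs of spaces/tabs to a single space.
        line = re.sub(r"[ \t]+", " ", line.strip())
        # Treat separator rows such as |-----| and |---| as identical.
        # Note: this also shortens any other run of three or more dashes.
        line = re.sub(r"-{3,}", "---", line)
        # Drop the padding around cell separators: "| a | b |" -> "|a|b|".
        line = re.sub(r" ?\| ?", "|", line)
        lines.append(line)
    # Ignore trailing blank lines at the end of the file.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines)


def matches_ground_truth(output: str, expected: str) -> bool:
    """True if `output` equals `expected` once layout noise is removed."""
    return normalize_table_text(output) == normalize_table_text(expected)
```

Stripping leading whitespace also discards indentation, so this only suits outputs where indentation carries no meaning.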

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

github-actions · 2024-12-16T09:44:40Z

PR Reviewer Guide 🔍

(Review updated until commit `9aa3f48`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Code Smell: The hardcoded file path `./examples/sample_data/resume_1.pdf` is repeated multiple times in the test cases. Consider refactoring to use a constant or a setup method to avoid duplication and improve maintainability.

Formatting Issue: The table formatting in the updated ground truth file introduces inconsistent spacing and alignment. Ensure the formatting adheres to a consistent style for better readability and maintainability.

Content Accuracy: The updated ground truth content for the PDF output has been entirely replaced. Verify that the new content aligns with the expected output of the updated model and does not introduce discrepancies.

Formatting Issue: The table formatting in the updated ground truth file introduces inconsistent spacing and alignment. Ensure the formatting adheres to a consistent style for better readability and maintainability.

github-actions · 2024-12-16T09:44:57Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue: Ensure the file path is valid and the file exists before processing it
Validate that the file at `working_file` exists before attempting to parse it, to prevent runtime errors if the file is missing or incorrectly specified.
tests/test.py [51]
     working_file = "./examples/sample_data/resume_1.pdf"
    +if not os.path.exists(working_file):
    +    raise FileNotFoundError(f"File not found: {working_file}")
Suggestion importance[1-10]: 9
Why: The suggestion adds a crucial validation step to ensure the file exists before processing, which prevents potential runtime errors and improves the robustness of the test cases.

Possible issue: Verify that the file content is not empty before proceeding with parsing
Add a check to ensure that the `file_content` variable is not empty after reading the file, to avoid passing invalid data to the `async_parse` method.
tests/test.py [115]
     file_content = base64.b64encode(file.read()).decode("utf-8")
    +if not file_content:
    +    raise ValueError("File content is empty after reading the file.")
Suggestion importance[1-10]: 8
Why: This suggestion introduces a validation to ensure the file content is not empty, which is important to avoid passing invalid data to the `async_parse` method, thereby enhancing the reliability of the test.
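The two suggestions above can be folded into one loader. This is a sketch of a hypothetical helper, `load_file_as_base64`, not code from tests/test.py; it keeps the exception types and messages the bot proposed:

```python
import base64
import os


def load_file_as_base64(working_file: str) -> str:
    """Read `working_file` and return its bytes base64-encoded as a str.

    Raises FileNotFoundError if the path does not exist and ValueError if
    the file is empty, mirroring the two suggested checks.
    """
    if not os.path.exists(working_file):
        raise FileNotFoundError(f"File not found: {working_file}")
    with open(working_file, "rb") as file:
        raw = file.read()
    # base64 of non-empty input is never empty, so testing the raw bytes is
    # equivalent to testing the encoded string, and it fails before encoding.
    if not raw:
        raise ValueError("File content is empty after reading the file.")
    return base64.b64encode(raw).decode("utf-8")
```

`os.path.exists` also returns True for a directory; `os.path.isfile` would be the stricter check if the path must name a regular file.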

github-actions · 2024-12-16T10:22:03Z

Persistent review updated to latest commit 9aa3f48

github-actions · 2024-12-16T10:22:15Z

PR Code Suggestions ✨ (the same two suggestions as above, reposted unchanged)

lingjiekong · 2024-12-16T10:36:36Z

tests/outputs/correct_pdf_output.txt

@@ -1,137 +1,43 @@
-STOXX INDEX METHODOLOGY GUIDE CONTENTS
+John Doe


If I recall correctly, this is a demo file that Rachel was using. However, the output looks completely different from before. Are you sure this is the correct output?

boqiny replied: Changed the input file for now due to model output instability.
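If the model output is expected to vary slightly between runs, one option (not something proposed in this PR) is to compare against the ground truth with a similarity threshold instead of exact equality. A minimal sketch using the standard-library `difflib.SequenceMatcher`; the helper names and the 0.9 default threshold are hypothetical placeholders:

```python
import difflib


def similarity(output: str, expected: str) -> float:
    """Character-level similarity ratio in [0.0, 1.0] from difflib."""
    # autojunk=False disables difflib's heuristic that treats very frequent
    # characters (such as spaces) as junk once `expected` is 200+ chars long.
    matcher = difflib.SequenceMatcher(None, output, expected, autojunk=False)
    return matcher.ratio()


def assert_close_to_ground_truth(
    output: str, expected: str, threshold: float = 0.9
) -> None:
    """Raise AssertionError if `output` is less similar than `threshold`."""
    score = similarity(output, expected)
    if score < threshold:
        raise AssertionError(
            f"similarity {score:.3f} is below the threshold {threshold}"
        )
```

A threshold trades strictness for stability: it tolerates small drifts but can also let real regressions through, so exact comparison remains the stronger check when the output is deterministic.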

update testcases for new model

f292038

boqiny requested review from Sdddell, goldmermaid and lingjiekong as code owners December 16, 2024 09:43

github-actions bot added the Review effort [1-5]: 2 label Dec 16, 2024

boqiny closed this Dec 16, 2024

fix broken test case

9aa3f48

boqiny reopened this Dec 16, 2024

github-actions bot added Review effort [1-5]: 3 and removed Review effort [1-5]: 2 labels Dec 16, 2024

Charles Yuan added 2 commits December 16, 2024 18:23

linter

cfc37b1

linter

31d932f

lingjiekong requested changes Dec 16, 2024


update testcase back to stock

e15dab5
