feat: Key-value extraction #320
Copilot reviewed 6 out of 15 changed files in this pull request and generated no comments.
Files not reviewed (9)
- .vscode/settings.json: Language not supported
- package.json: Language not supported
- test-results/.last-run.json: Language not supported
- app/components/playground/KeyValueInputs.tsx: Evaluated as low risk
- app/components/tutorials/ExtractKeyValuePairTutorial.tsx: Evaluated as low risk
- app/hooks/usePlaygroundStore.ts: Evaluated as low risk
- app/types/PlaygroundTypes.ts: Evaluated as low risk
- tests/coreTests.ts: Evaluated as low risk
- app/components/playground/ActionContainer.tsx: Evaluated as low risk
Comments suppressed due to low confidence (2)
app/components/playground/ExtractKeyValuePairContainer.tsx:205
- The 'formattedExtractResult' memoization logic assumes the structure of the 'extractResult'. If the structure changes, it could break. Consider adding a check or comment to clarify the assumption.
const content = typeof selectedFile.extractResult === 'string' ? JSON.parse(selectedFile.extractResult) : selectedFile.extractResult;
app/actions/apiInterface.ts:54
- The new parameter 'vqaExtractInstruction' should be documented properly in the interface.
vqaExtractInstruction?: Record<string, string>;
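One way to address the first suppressed comment is to make the assumption about 'extractResult' explicit with a small guard. This is a minimal sketch only: the helper name parseExtractResult and the ExtractResult type are hypothetical, and the real shape of the result is not shown in this PR.

```typescript
// Hypothetical shape: the actual structure of extractResult is not shown in the PR.
type ExtractResult = Record<string, unknown>;

// Accepts either a JSON string or an already-parsed value and returns a plain
// object, or null when the input is not valid JSON or not a plain object.
function parseExtractResult(raw: unknown): ExtractResult | null {
  let content: unknown = raw;
  if (typeof raw === 'string') {
    try {
      content = JSON.parse(raw);
    } catch {
      return null; // malformed JSON string
    }
  }
  if (typeof content !== 'object' || content === null || Array.isArray(content)) {
    return null; // not the object shape the memoized formatter expects
  }
  return content as ExtractResult;
}
```

With a guard like this, the memoized formatter can fall back to an empty state when the helper returns null instead of throwing during render.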
Suggestion: for a complex feature like this, it would be great to record a video to reduce the effort on the reviewer's side.
@remi-guan Good progress on implementing this feature. I tested it after pulling your code locally and still found a couple of problems that I want you to work with @boqiny to resolve.
Forgive me for my ignorance and what does |
@remi-guan, we used to have two frontend dev setups.
You can take a look at the current setup, and please feel free to comment on and modify the Notion page on how to better set up future frontend or TS-related projects.
@@ -8,4 +8,7 @@
  "editor.codeActionsOnSave": {
    "source.fixAll.eslint": "explicit"
  },
  "cSpell.words": [
    "cambio"
  ],
}
nit: new line at the end of file.
It looks like you are using the async extract_key_value API: uploading to a presigned S3 URL to trigger a key-value extraction, then continuing to fetch the result. Given that, shall we use async instead of sync in the file name?
Oh, this is earlier code. I first implemented the feature with the sync API; that is the file you're commenting on.
After a few tests, Charles told me that the sync API usually hits the 30s timeout, so I had better use the async version.
That is currently implemented in app/components/playground/ExtractKeyValuePairContainer.tsx#136~197. I didn't extract that logic into a separate function because I've seen similar code in the other modules.
So the sync API is not being used. I'm still keeping this file because the other pages seem to use this logic as a retry mechanism.
Reply to your comment:
- No, I didn't mix up the async and sync code.
- I preserved this file for later use, although I should have mentioned this in my PR description. I'll update my PR later today so that this file makes more sense to you.
Yes. Our current backend is not very well implemented: it uses AWS API Gateway, which has a tight 30-second timeout for sync APIs.
@remi-guan, I am not a frontend expert. Could you please take this opportunity to explain the changes in each file, so that future developers will have a sense of how to add similar frontend features?
The new commits:
Copilot reviewed 7 out of 21 changed files in this pull request and generated no comments.
Files not reviewed (14)
- .vscode/settings.json: Language not supported
- app/globals.css: Language not supported
- package.json: Language not supported
- test-results.json: Language not supported
- test-results/.last-run.json: Language not supported
- app/components/playground/ActionContainer.tsx: Evaluated as low risk
- app/components/playground/KeyValueInputs.tsx: Evaluated as low risk
- app/actions/apiInterface.ts: Evaluated as low risk
- app/actions/runSyncExtractKeyValue.ts: Evaluated as low risk
- app/actions/uploadFile.ts: Evaluated as low risk
- app/components/Button.tsx: Evaluated as low risk
- README.md: Evaluated as low risk
- app/components/playground/ExtractKeyValuePairContainer.tsx: Evaluated as low risk
- app/hooks/usePlaygroundStore.ts: Evaluated as low risk
LGTM.
Description
A new feature to extract data from PDFs as key-value pairs, integrated with our async backend API.
Code Structure
- app/components/playground/ActionContainer.tsx: edited to add a new tab. The PlaygroundTabs enum also needs to be edited.
- app/components/playground/ExtractKeyValuePairContainer.tsx: I've put the upload logic here. It uses our store, app/hooks/usePlaygroundStore.ts, to preserve the input and output, so you don't lose the data while you're working on the file with the other features (for example, you can extract text while you're extracting key-value pairs).
- app/types/PlaygroundTypes.ts: added types for the store.
- app/components/tutorials/ExtractKeyValuePairTutorial.tsx
API Integration
I'm using the async extract_key_value API: the file is uploaded to a presigned S3 URL obtained from the /async/upload API, with parameters that trigger a key-value extraction, and the result is then fetched with the /async/fetch API. This logic is already implemented by the runAsyncRequestJob function.
I also have a file that implements this feature with the sync API endpoint /extract_key_value. This logic is not being used, since the sync API usually reaches the 30s timeout limit and we get an error.
However, the file has been preserved because I saw that we can use it to implement a retry mechanism like the one in MarkdownExtractContainer.tsx.
Implementing that logic requires more time, and our current code already works, so I haven't scheduled time to do it yet.
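For readers unfamiliar with the upload-then-fetch flow described above, here is a minimal sketch of the general pattern. It is not the actual runAsyncRequestJob implementation: the names runJobWithPolling, startJob, fetchStatus, and JobStatus are hypothetical, and the real calls to /async/upload and /async/fetch are passed in as functions so the sketch stays independent of the backend's request and response shapes.

```typescript
// Hypothetical status shape; the real /async/fetch response is not shown in this PR.
type JobStatus<T> = { done: false } | { done: true; result: T };

interface PollOptions {
  intervalMs: number; // delay between status checks
  maxAttempts: number; // give up after this many checks
}

// Starts a job (e.g. upload to the presigned URL), then polls until it finishes.
async function runJobWithPolling<T>(
  startJob: () => Promise<string>,
  fetchStatus: (jobId: string) => Promise<JobStatus<T>>,
  { intervalMs, maxAttempts }: PollOptions,
): Promise<T> {
  const jobId = await startJob();
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(jobId);
    if (status.done) {
      return status.result;
    }
    // Wait before the next check so the backend is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

Because each request in this pattern is short, no single call has to stay open for the whole extraction, which is how the async path avoids the 30s limit that the sync endpoint runs into.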
Related Issue
Type of Change
How Has This Been Tested?
Tested:
Screenshots (if applicable)
Checklist
Additional Notes