feat: Add get_image for all DocItem, specialize for FloatingItem #67

Open: wants to merge 4 commits into base: main

Conversation

sh-gupta

We want the image attribute to be available for TextItem to allow exposing equation images. I have moved image from FloatingItem to DocItem so that it becomes part of TextItem as well.

Related issue: DS4SD/docling#299

@sh-gupta sh-gupta self-assigned this Nov 12, 2024
Collaborator

@vagenas vagenas left a comment

@sh-gupta @cau-git ⚠️ note that any changes to the subtypes of DoclingDocument:

  1. Need to be backwards compatible within docling-core 2.x.y, e.g. new field with default value, and
  2. then imply a new minor version bump of the DoclingDocument as well as potentially DocMeta which also depends on the DocItem.
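To illustrate point 1, here is a minimal sketch of why a new field with a default value stays backwards compatible. It assumes plain dataclasses rather than the actual docling-core models; `TextItemSketch`, `image_uri`, and `load_item` are hypothetical names used only for this example.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class TextItemSketch:
    """Hypothetical stand-in for a document item; not the docling-core class."""

    text: str
    label: str = "text"
    # Newly added field: the default keeps payloads written before the change loadable.
    image_uri: Optional[str] = None


def load_item(payload: Dict[str, Any]) -> TextItemSketch:
    """Build an item from a dict, ignoring keys this version does not know."""
    known = {f.name for f in fields(TextItemSketch)}
    return TextItemSketch(**{k: v for k, v in payload.items() if k in known})


# A payload serialized before the new field existed still loads.
old_payload = {"text": "E = mc^2", "label": "formula"}
# A payload serialized after the change carries the new field.
new_payload = {**old_payload, "image_uri": "page_1_crop.png"}

old_item = load_item(old_payload)
new_item = load_item(new_payload)
```

A new required field without a default would instead make `load_item(old_payload)` fail, which is the kind of break the 2.x.y compatibility rule is meant to avoid.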

@PeterStaar-IBM
Contributor

@sh-gupta We quickly discussed this; we need to address the following points (@cau-git will explain):

@sh-gupta
Author

Made the required changes:

  1. Moved image from DocItem back to FloatingItem
  2. Added get_image function to DocItem. It returns None if the provenance is not found or the page image is not available; otherwise it returns an appropriately cropped section of the page image
  3. Overrode get_image in FloatingItem
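The behavior described in these three points can be sketched roughly as follows. This is not the code from the PR: the class names, the `Doc.page_images` mapping, and the pixel-space `BBox` are all assumptions made for illustration, and a page image is modelled as a list of pixel rows to keep the example dependency-free. The rule that the FloatingItem override prefers its stored `image` is likewise an assumption, since the comment only says that `get_image` was overridden.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: a page image is modelled as rows of pixel values.
Image = List[List[int]]


@dataclass
class BBox:
    """Pixel-space box with a top-left origin; right and bottom are exclusive."""

    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Prov:
    """Hypothetical provenance record: where the item sits on which page."""

    page_no: int
    bbox: BBox


@dataclass
class Doc:
    """Hypothetical document holding optional page images keyed by page number."""

    page_images: Dict[int, Image] = field(default_factory=dict)


@dataclass
class DocItemSketch:
    prov: List[Prov] = field(default_factory=list)

    def get_image(self, doc: Doc) -> Optional[Image]:
        """Crop the item's region from its page image, or return None."""
        if not self.prov:
            return None  # no provenance: nothing to locate on a page
        page = doc.page_images.get(self.prov[0].page_no)
        if page is None:
            return None  # page image not available
        box = self.prov[0].bbox
        return [row[box.left:box.right] for row in page[box.top:box.bottom]]


@dataclass
class FloatingItemSketch(DocItemSketch):
    image: Optional[Image] = None

    def get_image(self, doc: Doc) -> Optional[Image]:
        """Assumed override: prefer the stored image, else fall back to cropping."""
        if self.image is not None:
            return self.image
        return super().get_image(doc)


# Usage: a 3x4 "page" whose pixel value encodes its row and column.
page = [[10 * r + c for c in range(4)] for r in range(3)]
doc = Doc(page_images={1: page})
item = DocItemSketch(prov=[Prov(page_no=1, bbox=BBox(1, 0, 3, 2))])
crop = item.get_image(doc)  # rows 0-1, columns 1-2 of the page
```

Keeping the crop logic in the base class means any item with provenance can expose an image, while the subclass only decides when its own stored image takes precedence.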

@cau-git
Contributor

cau-git commented Nov 14, 2024

> Made the required changes:
>
>   1. Moved image from DocItem back to FloatingItem
>   2. Added get_image function to DocItem. It returns None if the provenance is not found or the page image is not available; otherwise it returns an appropriately cropped section of the page image
>   3. Overrode get_image in FloatingItem

This looks good now, thanks! To move on, it would be best to create a companion PR on the main docling repository that pins this branch of docling-core, so it can be tested further.

@cau-git cau-git changed the title from "Moved image attribute from FloatingItem to DocItem" to "feat: Add get_image for all DocItem, specialize for FloatingItem" on Nov 14, 2024
@PeterStaar-IBM
Contributor

@sh-gupta Could we add some unit-tests for this functionality?

Contributor

@dolfim-ibm dolfim-ibm left a comment

Looks good.
I'm wondering if we want to raise an exception instead of returning None.

dolfim-ibm previously approved these changes Nov 15, 2024
PeterStaar-IBM previously approved these changes Nov 15, 2024
Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

very nice!

@sh-gupta
Author

@PeterStaar-IBM : I've added tests for DocItem.get_image and FloatingItem.get_image

@dolfim-ibm : I initially thought about raising an error but then decided to stick with None, to be consistent with the behavior of FloatingItem.image when the image is not found. I can raise an exception if you prefer.
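
For comparison, the two options could look roughly like this from the caller's side. This is only a sketch: `get_image_or_none`, `get_image_or_raise`, `ImageNotAvailableError`, and the dict-based image store are hypothetical names for illustration, not part of docling-core.

```python
from typing import Dict, Optional


class ImageNotAvailableError(LookupError):
    """Hypothetical error type for the exception-raising variant."""


def get_image_or_none(images: Dict[str, bytes], ref: str) -> Optional[bytes]:
    """None-returning variant: a missing image is signalled with None."""
    return images.get(ref)


def get_image_or_raise(images: Dict[str, bytes], ref: str) -> bytes:
    """Raising variant: a missing image is signalled with an exception."""
    image = get_image_or_none(images, ref)
    if image is None:
        raise ImageNotAvailableError(f"no image available for {ref!r}")
    return image


images = {"#/texts/0": b"png-bytes"}

# None-returning style: the caller must remember to check the result.
missing = get_image_or_none(images, "#/texts/1")

# Raising style: the failure cannot be silently ignored.
try:
    get_image_or_raise(images, "#/texts/1")
    raised = False
except ImageNotAvailableError:
    raised = True
```

Returning None keeps call sites that already handle a missing FloatingItem.image unchanged, whereas raising makes the failure explicit at the cost of a try/except in every caller.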

@vagenas : I've made the changes as per your discussion. Can you please take a look?

5 participants