Make the features configurable, not auto-loading #17

Open
walterheck opened this issue Oct 7, 2024 · 5 comments

@walterheck

Feature request

When you create an instance of this loader, it automatically tries to load a bunch of Python packages to handle certain file formats. Rather than auto-loading, I think it would make sense to have a way to say "I need PDF and Google Sheets support", which requires those packages and fails if you don't have them installed. That way I don't have to guess which packages I need, or why a certain file format is not being handled.
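To illustrate the request, here is a minimal sketch of what explicit, fail-fast feature selection could look like. Everything in it is hypothetical: `require_features`, `REQUIRED_PACKAGES`, and the feature-to-package table are invented for illustration and are not part of this package.

```python
import importlib.util

# Hypothetical feature -> required-packages table; the entries are illustrative only.
REQUIRED_PACKAGES = {
    "pdf": ["pypdf"],
    "images": ["pdf2image", "pytesseract"],
}


def require_features(features, table=REQUIRED_PACKAGES):
    """Return the requested features, or raise if a required package is missing."""
    missing = {}
    for feature in features:
        if feature not in table:
            raise ValueError(f"Unknown feature: {feature!r}")
        # find_spec() returns None when a top-level package is not installed.
        absent = [pkg for pkg in table[feature]
                  if importlib.util.find_spec(pkg) is None]
        if absent:
            missing[feature] = absent
    if missing:
        details = "; ".join(
            f"{feature}: {', '.join(pkgs)}"
            for feature, pkgs in sorted(missing.items())
        )
        raise ImportError(f"Missing packages for requested features ({details})")
    return list(features)
```

With something like this, asking for a feature whose packages are absent fails immediately with a message naming them, and features that were never requested are never probed.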

Motivation

See above.

Your contribution

My Python skills are not very good, but I'm happy to help review.

@pprados
Owner

pprados commented Oct 7, 2024

The default_conv_loader() function initializes the loader for a set of file formats. For each format, the code tries to import the corresponding package. If the import fails, that file format is not handled.

You can override this by passing a different value in the conv_mapping parameter.
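The behaviour described above follows the common optional-import pattern. The sketch below shows that pattern in general terms and is not the library's actual code: `build_default_mapping` and its `candidates` argument are hypothetical names.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def build_default_mapping(candidates):
    """Keep only the formats whose backing module can be imported.

    `candidates` maps a MIME type to the module name it needs
    (hypothetical data, for illustration).
    """
    mapping = {}
    for mime_type, module_name in candidates.items():
        try:
            mapping[mime_type] = importlib.import_module(module_name)
        except ImportError:
            # Missing package: the format is left unmanaged and only
            # an info-level log line records the fact.
            logger.info("Ignore %s (no module named %r)", mime_type, module_name)
    return mapping
```

The caller gets back whatever subset could be imported, which is why a missing package shows up as an unhandled format rather than as an error.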

@walterheck
Author

Yeah, I know it works that way, but I don't think silently loading all formats, with some of them failing, is the best way to handle it. For instance:

I don't currently want support for images. However, my logs are now filled with:

INFO - 2024-10-07 00:45:36,297: Ignore Images for GDrive (no module named 'pdf2image', 'detectron2' and 'pytesseract') : (Line: 267 [google_drive.py])

I have no way of saying that I don't need image support.

@pprados
Owner

pprados commented Oct 9, 2024

Yes. It's an info-level message.
If you want to avoid this, copy the default_conv_loader() function and delete what you don't need.
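As a rough sketch of that suggestion, a trimmed replacement might look like the following. It assumes the mapping is a dict from MIME type to a loader class or factory, which should be checked against the real default_conv_loader() source. `PlainTextLoader` and `my_conv_mapping` are stand-ins invented for this example, not names from the library.

```python
from typing import Callable, Dict


class PlainTextLoader:
    """Stand-in loader for illustration only; not a class from the library."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def load(self) -> str:
        with open(self.file_path, encoding="utf-8") as handle:
            return handle.read()


def my_conv_mapping() -> Dict[str, Callable[[str], PlainTextLoader]]:
    """Hand-trimmed replacement for the default mapping: list only wanted formats."""
    return {
        "text/plain": PlainTextLoader,
        "text/markdown": PlainTextLoader,
        # No image entries: this function never tries to import image
        # packages, so it emits no "Ignore Images" message itself.
    }
```

The result would then be passed through the mapping parameter mentioned earlier in the thread, in place of the default.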

@walterheck
Author

I'm not sure my ask is clear: wouldn't it make sense for these features not to auto-load, and instead to be enabled through configuration by the developer using the package?

@pprados
Owner

pprados commented Oct 11, 2024

I don't think this is a good idea. Users want to retrieve all files from GDrive, no questions asked. That's why I propose this approach, which enables or disables features depending on the presence of packages, with the possibility of intervening in depth, if need be, to change or filter the settings. This seems more accessible to me.


2 participants