Why not SQLite instead of YAML templates? #364

Closed
xtiansimon opened this issue Dec 16, 2021 · 4 comments

@xtiansimon

I'm circling back to my PDF project after working on a Django project this summer. Funnily enough, that project was similar to this one: I needed to store many configurations for different clients and vendors so I could auto-parse tabular data. I saved all of the regexes inside the Django database, which made them easy to manage through the admin.

Now I'm returning to this project and short of suggesting this would be a great Django package, I'm thinking template configuration would be much easier with an app like DB Browser for SQLite.

Anyone already thinking on these lines? Or tried it?

@m3nu
Collaborator

m3nu commented Dec 16, 2021

SQLite is a relational database. Those are great for structured data, but not as good for data with less structure. Our templates have many optional fields that would be hard to model in an RDB. So NoSQL of some kind would be more appropriate.

Of course this would be harder to manage on GitHub and to copy around, and the barrier to getting started would be higher. So I wouldn't go for that except for a hosted service that has a great many templates to manage. For that case, I'd pick an RDB with JSON field support to hold the unstructured data.
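
As a rough illustration of the "RDB with a JSON field" idea, here is a minimal sketch using only Python's standard library. The table layout, the function names (`make_store`, `save_template`, `load_template`) and the template keys are all hypothetical, not part of this project. The JSON is stored as plain text and decoded in Python, so the sketch does not rely on SQLite's JSON functions being compiled in.

```python
import json
import sqlite3


def make_store(path=":memory:"):
    """Open (or create) a SQLite database with one row per template.

    Hypothetical schema: the issuer is the only structured column; everything
    optional lives in a JSON document stored as text in `body`.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS templates ("
        " issuer TEXT PRIMARY KEY,"
        " body TEXT NOT NULL)"
    )
    return conn


def save_template(conn, issuer, body):
    """Insert or overwrite the template for `issuer`."""
    conn.execute(
        "INSERT OR REPLACE INTO templates (issuer, body) VALUES (?, ?)",
        (issuer, json.dumps(body, sort_keys=True)),
    )
    conn.commit()


def load_template(conn, issuer):
    """Return the decoded template dict, or None if the issuer is unknown."""
    row = conn.execute(
        "SELECT body FROM templates WHERE issuer = ?", (issuer,)
    ).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    conn = make_store()
    # Assumed template shape, for illustration only.
    save_template(conn, "Acme", {"fields": {"date": r"Date:\s+(\S+)"}})
    print(load_template(conn, "Acme"))
```

The optional fields never touch the schema, which is the point of the JSON column. The trade-off mentioned above still applies: the database file itself is binary and does not diff well in git.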

@rmilecki
Collaborator

With very few exceptions (and SQLite is not one of them), databases are not good at storing big tree-like structures.

Also, how would you store that in git? A binary format would be a mess to maintain. Text SQL queries are not much better.

I think YAML files fit this project pretty well.

@xtiansimon
Author

xtiansimon commented Dec 19, 2021

m3nu:

templates have many optional fields that would be hard to model in an RDB. So NoSQL of some kind would be more appropriate.

Yes, of course. For the general case you may have any number of optional fields of data you wish to extract. Naturally, a NoSQL DB like Mongo would be more appropriate than SQLite.

My own intended use is therefore clouding my judgement of the general case. My goal is only to identify the issuer, date and invoice number. A very shallow objective.

Am I right in thinking that configuration of vendor info is a point of... I don't know, fragility for the package? That getting the templates correct, and having feedback on them, is critically important to matching and extracting?

m3nu:

Of course this would be harder to manage on GitHub, copy around and the barrier to start is higher.

Then a database should not be a replacement for the template system. Maybe some kind of plugin system that allows any number of ways to provide these critical settings? Flexibility and a lower barrier to start...

If not an alternative to templates, then what are people's thoughts on validating templates? I was thinking SQLite would avoid YAML's formatting errors and simplify validation. Short of a validation script, how do people envision validating templates in general?
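
For reference, a validation script along these lines could be quite small. The sketch below assumes a template has already been loaded into a Python dict (for example by a YAML parser) and that it uses the keys `issuer`, `keywords` and `fields`, with `fields` mapping names to regex strings. Those key names and the function `validate_template` are assumptions made for illustration, not this project's actual schema.

```python
import re

# Assumed required keys; adjust to whatever the real template format needs.
REQUIRED_KEYS = ("issuer", "keywords", "fields")


def validate_template(template):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(template, dict):
        return ["template must be a mapping"]

    problems = []
    for key in REQUIRED_KEYS:
        if key not in template:
            problems.append(f"missing required key: {key}")

    fields = template.get("fields", {})
    if not isinstance(fields, dict):
        problems.append("'fields' must map field names to regex strings")
        fields = {}

    for name, pattern in fields.items():
        if not isinstance(pattern, str):
            problems.append(f"field {name!r}: pattern must be a string")
            continue
        try:
            re.compile(pattern)  # catches malformed regexes early
        except re.error as exc:
            problems.append(f"field {name!r}: invalid regex ({exc})")
    return problems


if __name__ == "__main__":
    broken = {"issuer": "Acme", "fields": {"date": "("}}
    for problem in validate_template(broken):
        print(problem)
```

Collecting every problem, instead of stopping at the first, gives the kind of feedback on a template that the question above is after. The same check works wherever the template is stored, since it only sees the loaded dict.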

@xtiansimon
Author

Well. That was my feedback and interest. Thank you for the replies.


3 participants