split import transaction #32

Open
stmichael opened this issue Sep 6, 2012 · 4 comments

@stmichael
Owner

Currently each import block runs in a single transaction. This may cause problems with large datasets. My research showed that there is (practically) no hard limit on the number of statements per transaction. The bottleneck is the transaction log (or write-ahead log), which will use a lot of memory.

I propose that we split the whole transaction into a configurable number of smaller transactions. That way the memory usage of the transaction log will be kept low. The performance loss should be acceptable if we keep the number of statements per transaction above a few thousand.
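The batching idea could be sketched roughly as follows. This is an illustration only, written in Python against an in-memory SQLite database rather than the project's own code: the `people` table, the `import_in_batches` function, and the `batch_size` parameter are all hypothetical names, and the sketch assumes the setting is expressed as rows per transaction rather than as a number of transactions.

```python
import sqlite3
from itertools import islice


def import_in_batches(conn, rows, batch_size=5000):
    """Insert rows, committing once per batch instead of once per import.

    Returns the number of transactions that were committed.
    """
    iterator = iter(rows)
    commits = 0
    while True:
        # Pull at most batch_size rows so only one batch is held in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # The connection context manager commits on success
        # and rolls back if the block raises.
        with conn:
            conn.executemany(
                "INSERT INTO people (id, name) VALUES (?, ?)", batch
            )
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical source data: 12,000 generated rows.
rows = ((i, f"person {i}") for i in range(12000))
commits = import_in_batches(conn, rows, batch_size=5000)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(commits, count)
```

With a batch size of 5,000 the 12,000 rows are written in three transactions, so the amount of uncommitted work is bounded by one batch rather than by the whole import.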

@stmichael
Owner Author

Importing 500,000 records from MSSQL to Postgres using a single transaction used approximately 200 MB of RAM on my machine. If you extrapolate that to a few million records, the memory usage explodes.

@senny
Collaborator

senny commented Sep 6, 2012

What do we do about data consistency? At the moment, when the import fails, it rolls back the current transaction, leaving the import script in a somewhat usable state. If we have multiple transactions, a failure could leave us with half-migrated tables. I think at the moment it would not be a big deal, but if we want to implement #5 this will be important. @stmichael what do you think?
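The half-migrated state can be reproduced with a small sketch. It uses Python and an in-memory SQLite database purely for illustration; the `people` table, the sample rows, and `import_in_batches` are hypothetical and not part of the project. A duplicate primary key in the third batch makes that batch fail, while the two earlier batches have already been committed.

```python
import sqlite3
from itertools import islice


def import_in_batches(conn, rows, batch_size):
    """Insert rows with one transaction per batch (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # Commits on success; rolls back only the current batch on error.
        with conn:
            conn.executemany(
                "INSERT INTO people (id, name) VALUES (?, ?)", batch
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical source data: id 7 appears twice, so the third batch fails.
rows = [
    (1, "a"), (2, "b"), (3, "c"),
    (4, "d"), (5, "e"), (6, "f"),
    (7, "g"), (7, "duplicate"), (9, "i"),
]

failed = False
try:
    import_in_batches(conn, rows, batch_size=3)
except sqlite3.IntegrityError:
    failed = True

imported = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(failed, imported)
```

Only the failing batch is rolled back, so six of the nine rows remain in the target table. With a single transaction around the whole import the table would have stayed empty.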

@stmichael
Owner Author

That's true, data consistency is a problem. I just opened this ticket so we don't forget about this. As I said, there is no hard limit on the number of statements per transaction; it depends on the resources of your system.

I propose we leave this ticket open as an idea and postpone it to a later point in time when it is actually needed.

@senny
Collaborator

senny commented Sep 11, 2012

I'm fine with that.
