split import transaction #32
Comments
Importing 500'000 records from MSSQL to Postgres using a single transaction used approximately 200MB RAM on my machine. If you extrapolate that to a few million records, the memory usage explodes.
What do we do about data consistency? At the moment, when the import fails, it rolls back the current transaction, leaving the import script in a somewhat usable state. If we have multiple transactions, this could lead to half-migrated tables. I think it would not be a big deal at the moment, but if we want to implement #5 this will be important. @stmichael what do you think?
That's true, data consistency is a problem. I just opened this ticket so we don't forget about it. As I said, there is no hard limit on the number of commits per transaction; that depends on the resources of your system. I propose we leave this ticket open as an idea and postpone it until it is actually needed.
I'm fine with that.
Currently each import block runs in a single transaction. This may cause issues with large datasets. My research showed that there is practically no hard limit on the number of statements per transaction; the bottleneck is the transaction log (or write-ahead log), which consumes a lot of memory.
I propose that we split the whole import into a configurable number of transactions. That way the memory usage of the transaction log stays low. The performance loss should be acceptable if we keep the number of statements per transaction above a few thousand.
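As a rough illustration of the batching idea, here is a minimal sketch assuming generic Python DB-API 2.0 connections for the source (MSSQL) and target (Postgres). All names (`migrate_table`, `batch_size`, `select_sql`, `insert_sql`) are hypothetical and not part of this project's code; the point is only to show committing once per batch instead of once per table.

```python
# Sketch of batched commits, assuming generic DB-API 2.0 connections.
# Names below are illustrative, not part of this project.

def migrate_table(source_conn, target_conn, select_sql, insert_sql, batch_size=10_000):
    """Copy rows in batches, committing once per batch so the
    transaction (write-ahead) log stays small."""
    src = source_conn.cursor()
    dst = target_conn.cursor()
    src.execute(select_sql)

    while True:
        rows = src.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(insert_sql, rows)
        # Commit after each batch: memory use is bounded by batch_size,
        # at the cost of losing all-or-nothing semantics for the table.
        target_conn.commit()
```

The trade-off discussed above shows up in the final comment of the loop: a failure mid-import now leaves earlier batches committed, which is why data consistency (and #5) matters if this is ever implemented.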