Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite should not be a hard dependency #1036

Closed
coretemp opened this issue Nov 2, 2018 · 11 comments
Closed

SQLite should not be a hard dependency #1036

coretemp opened this issue Nov 2, 2018 · 11 comments

Comments

@coretemp
Copy link

coretemp commented Nov 2, 2018

Getting SQLiteException “database disk image is malformed” seems to be common with other pieces of software built on top of SQLite. Now, it might be that these are application bugs, but perhaps it's just an unfortunate decision to ever start using SQLite in NixOps. I expect it to be a platform specific bug of NixOps on a Mac, but managing such important state in what is a fairly simplistic database might just not be the way to go.

I would love to see a different backend (e.g. Postgres), because I have zero confidence in SQLite.

@edolstra
Copy link
Member

edolstra commented Nov 2, 2018

I, on the other hand, have near infinite confidence in SQLite. In any case this should be addressed by #624 or #865.

@coretemp
Copy link
Author

coretemp commented Nov 2, 2018

I have observed that a real world system doesn't work. NixOps specifically failed. So, if your confidence in SQLite is near infinite, are you saying it's an application bug then in NixOps or a Apple file system bug or ...? Please fill in the blanks.

Bugs don't just happen out of thin air.

@srghma
Copy link
Contributor

srghma commented Nov 4, 2018

Hi @edolstra @coretemp , I've done some work on improving the #624 and #865 PRs in #969 by adding more tests and fixing revealed issues.

Please ping me if you are interested, so I could fix conflicts and prepare it for code reviews.

@srghma srghma mentioned this issue Nov 4, 2018
12 tasks
@olorin
Copy link

olorin commented Dec 27, 2018

I expect it to be a platform specific bug of NixOps on a Mac

I'm getting this frequently too, on linux.

> nixops --version
NixOps 1.6
> sqlite3 --version
3.26.0 2018-12-01 12:34:55 bf8c1b2b7a5960c282e543b9c293686dccff272512d08865f4600fb58238b4f9
> uname -a
Linux athena.daisee.com 4.19.11-arch1-1-ARCH #1 SMP PREEMPT Thu Dec 20 03:40:28 UTC 2018 x86_64 GNU/Linux

Though I don't have version information handy, I believe my colleague @christian-marie has seen the same thing while running NixOS. We recently added a git precommit hook to integrity-check the statefile, though it hasn't yet given me enough information to reproduce reliably. It has allowed us to find older instances of corruption which cause the integrity check to fail, but had not affected nixops ssh.

Edit: similar-sounding nix issue, might be related? NixOS/nix#1353

@olorin
Copy link

olorin commented Dec 27, 2018

Just tracked down one of the corruption instances from before we added the precommit hook (this one went unnoticed for a while because it didn't break nixops ssh or nixops deploy):

> sqlite3 state.nixops "pragma 'integrity_check'"                                                         
row 147 missing from index sqlite_autoindex_ResourceAttrs_1                                               
row 162 missing from index sqlite_autoindex_ResourceAttrs_1                                               
wrong # of entries in index sqlite_autoindex_ResourceAttrs_1                                              
> sqlite3 state.nixops reindex
Error: UNIQUE constraint failed: ResourceAttrs.machine, ResourceAttrs.name                                
> sqlite3 state.nixops .dump | grep 'INSERT INTO ResourceAttrs' | perl -pe 's/INSERT INTO ResourceAttrs VALUES\((\d+),([^,]+),.*$/$1,$2/g' | sort | uniq -d 
59,'ec2.tags'
66,'ec2.tags'

The corruption occurred after running a nixops deploy using a valid statefile and committing the result. I'm not sure how the duplicate records could have been inserted into the table without being rejected, as they should have been added to the index as part of the execution of INSERT.

@kalbasit
Copy link
Member

kalbasit commented Feb 3, 2019

@olorin how did you fix the corruption? And I'm curious what does your pre-commit hook do?

$ nix run -f '<nixpkgs>' sqlite -c sqlite3 /yl/keybase/private/ylcodes/shabka/deployments.nixops .dump | grep 'INSERT INTO ResourceAttrs' | perl -pe 's/INSERT INTO ResourceAttrs VALUES\((\d+),([^,]+),.*$/$1,$2/g' | sort | uniq -d
76,'configsPath'
76,'ec2.accessKeyId'
76,'route53.accessKeyId'
76,'route53.hostName'
76,'route53.ttl'
76,'startTime'
76,'toplevel'
77,'ec2.policy'
77,'ec2.roleName'
77,'state'
78,'ec2.securityGroupDescription'
78,'ec2.securityGroupRules'
78,'state'
79,'ec2.securityGroupRules'
79,'state'
86,'creationTime'
86,'index'
86,'none.sshPrivateKey'

@olorin
Copy link

olorin commented Feb 17, 2019

@olorin how did you fix the corruption?

Nothing fancy, just filtered the duplicate INSERTs out of the state file dump, and then imported into a fresh database and replaced the old one. Something like this:

sqlite3 state.nixops .dump > state.sql
# <remove duplicate INSERTS from state.sql>
sqlite3 state-new.nixops < state.sql
sqlite3 state-new.nixops "pragma integrty_check"
mv state-new.nixops state.nixops

And I'm curious what does your pre-commit hook do?

Just runs the sqlite integrity check over any state databases which were touched in the commit. This is the contents of $repo/.git/hooks/pre-commit: https://gist.github.com/olorin/8ea68e1adcd5d384d200632bbff28000

As a sidenote, we've now stopped tracking the sqlite databases themselves in git; instead, we're storing the JSON dumps from nixops --export, and calling nixops via some ugly wrapper code which handles generating a temporary state database and updating the JSON export at the end. It's fairly dumb and fragile, but happy to clean it up and share if it'd be useful.

@christian-marie
Copy link

I wonder if pre-exit corruption detection would be enough to track this down for now?

@wmertens
Copy link
Contributor

wmertens commented May 7, 2019

Just chiming in in defence of SQLite:

The errors described in the last few comments are definitely application-level issues with sqlite. The "malformed image" error in the OP is definitely a hardware, OS, or misuse error - see How To Corrupt An SQLite Database File

SQLite is not the problem in either case.

The broken unique constraints cannot happen unless they are present before the unique index is added, or the constraint checking is temporarily disabled. I don't see any code that would do that, so I'm guessing maybe data files were corrupt but still legible?

@tbenst
Copy link

tbenst commented Dec 3, 2019

@edolstra just wondering if there is still interest in #969? I'm trying to manage the same deployments from multiple computers and struggling to do this with SQLite--seems easy to corrupt database by syncing with git, dropbox, etc, and having a remote database strikes me as the most sane option.

Alternatively, curious if you have another workflow for managing deployments from multiple machines?

Thanks for all the work on NixOps, super slick piece of software :)

@edolstra
Copy link
Member

edolstra commented Dec 4, 2019

@tbenst Yes, I'm definitely interested in that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants