Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use parameterized sqlite for insertRowIntoDB #385

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jspjutNV
Copy link
Contributor

@jspjutNV jspjutNV commented Jul 6, 2022

Opening a PR to consider parameterized SQL queries for insertion to avoid injection problems from either experiment configuration, or user responses for text entry. This approach should allow characters like ' to be entered into the database without error. An alternative approach (#386 ) would be to translate those characters into something else before adding them to the non-parameterized queries.

Parameterized queries are a well known best practice for all types of SQL, though sqlite's C interface changes significantly when doing so.

This one is exclusive with #386 (only one or the other should be merged).

This closes #383

@jspjutNV jspjutNV added the bug Something isn't working label Jul 6, 2022
@jspjutNV jspjutNV requested a review from bboudaoud-nv July 6, 2022 15:29
@jspjutNV jspjutNV self-assigned this Jul 6, 2022
@bboudaoud-nv
Copy link
Collaborator

I'll note that this approach (as currently implemented) loses the types for our data in favor of treating everything as text. If we wanted to retain the types we could potentially do some (internal) logging when tables are created so we know the right types. Still I like the approach of using these methods ahead of the current approach.

Last but not least, only either this PR or #386 should be merged as they solve the same problem differently.

@jspjutNV
Copy link
Contributor Author

jspjutNV commented Jul 6, 2022

I'll note that this approach (as currently implemented) loses the types for our data in favor of treating everything as text. If we wanted to retain the types we could potentially do some (internal) logging when tables are created so we know the right types. Still I like the approach of using these methods ahead of the current approach.

I'm pretty sure types could be preserved with a bit more work on this. You can specify types, it's just that our general interface doesn't expose those quite as easily.

Still, you could do a dynamic type case statement to specify each type correctly. If someone gets some time, that would be a nice improvement to the current PR.

The other required improvement before merging this is to apply the same style to any other table write functions there may be.

@jspjutNV
Copy link
Contributor Author

I should note that the way all sql helpers work in top of tree is to write only strings/text to the database, so this style of change is not a regression from what we're doing already. Obviously we could fix it to submit the correct types to the database, but we've never done that before. I believe that fix could be done after merging this since it fixes a more important bug.

That said, insertRowsIntoDB still needs to be updated, and we need to do some testing to make sure this PR is working.

@jspjutNV
Copy link
Contributor Author

0953f9e fails to log because there's a variable limit on sqlite and we exceed it. I'm working on breaking the command down into smaller size sub-commands to fit within the limit.

@jspjutNV
Copy link
Contributor Author

This PR is nearly matched to the behavior of master but fixing #383.

There is still one place where the old style sqlite interface is being used in FPSciLogger::updateSessionEntry where a 1-off update is sent. We should decide whether to update that to the v2 interface or let it stay until the bigger fix where we pass data types all the way through.

@jspjutNV jspjutNV changed the title use parameterized sqlite for instertRowIntoDB use parameterized sqlite for instetRowIntoDB Aug 31, 2022
@jspjutNV jspjutNV changed the title use parameterized sqlite for instetRowIntoDB use parameterized sqlite for insertRowIntoDB Aug 31, 2022
@jspjutNV
Copy link
Contributor Author

If we run into performance issues in the future, we may want to look here for whether changing from SQLITE_TRANSIENT to SQLITE_STATIC is a good idea. It does require tracking when the strings are destroyed to use the static version.

@jspjutNV jspjutNV marked this pull request as ready for review August 31, 2022 18:01
Copy link
Collaborator

@bboudaoud-nv bboudaoud-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like a solid fix, we can certainly do more future improvements (i.e. keeping track of variable types when we create tables and using the right insert methods).

We may also want to reintroduce the colNames fields to insert methods if we decided to support optional logging of unspecified parameters (i.e. different sessionParametersToLog values).

@jspjutNV
Copy link
Contributor Author

@bboudaoud-nv found a possible bug in this PR and I saw it too. A bunch of nulls start showing up in a few of the tables. We need to investigate more before merging this.

@jspjutNV jspjutNV marked this pull request as draft August 31, 2022 18:31
@bboudaoud-nv
Copy link
Collaborator

I also think that having some insert statements logged to log.txt is less valuable now that the value fields are logged as the ? characters from the INSERT query rather than literals.

@jspjutNV
Copy link
Contributor Author

I also think that having some insert statements logged to log.txt is less valuable now that the value fields are logged as the ? characters from the INSERT query rather than literals.

I agree. Looks like I accidentally left my debugging prints in there. Let's remove them before merging and get just the most useful subset of log prints (if any).

Continuing to investigate, @bboudaoud-nv has reported seeing some SQL_BUSY return codes in the failing case, which makes it seem like we're hitting race conditions. This may be a good thing suggesting that we're increasing the concurrency of our submits to the database. However, we'll need to do the work to ensure we retry the failed queries.

There's an interesting blog post here that may help with an understanding of what's going on, and a few suggested ways to retry and fix our problem.

jspjutNV added a commit that referenced this pull request Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with ' character in experiment configuration
2 participants