Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating from cisTEM Beta to latest devel fails #510

Open
twagner9 opened this issue Jul 3, 2024 · 3 comments
Open

Updating from cisTEM Beta to latest devel fails #510

twagner9 opened this issue Jul 3, 2024 · 3 comments

Comments

@twagner9
Copy link
Collaborator

twagner9 commented Jul 3, 2024

A pull request resolving this issue will follow; I felt it was better to layout in depths the problems here and leave the solutions for the PR, rather than including it all in the PR.

Description

First, I want to note that the solution to this bug as implemented feels a bit hacky, and is not highly performant -- there are a few additional seconds added while updating a small database with tutorial apoferritin data, and I could imagine this taking much longer for a larger database -- so any suggestions to improve it on this front are welcome.

To that end, I'm going to go a bit in depth in describing the bug to see if there's a potential underlying (better) solution that I (and @timothygrant80, who originally wrote the solution, with later additions from me) missed. I suppose whether this solution is sufficient depends on if it's believed beta version is still in wide use in comparison with the devel version.

Bug Description

This bug specifically affects database upgrades that are at least as old as the beta version, but would also apply for any subsequent versions where the database schema was structured the same way. There were a few problems when attempting to update:

  1. The original crash came from an issue while updating, where execution of a SQL command fails due to attempting to create a duplicate column, specifically, ASSIGNED_SUBSET. This was caused by two instances of the REFINEMENT_PACKAGE_CONTAINED_PARTICLES_ dynamic table being listed in the schema. This was resolved simply by removing one of the instantiations.

  2. After resolving problem 1, another crash would occur when attempting to open the Results Panel. This error looked like the following:

Error: Invalid n after symmetry symbol: 00.0
From core/symmetry_matrix.cpp:55
void SymmetryMatrix::Init(wxString)

indicating that the SYMMETRY column wasn’t updated correctly. This stemmed from the later addition of OUTPUT_PIXEL_SIZE to the REFINEMENT_PACKAGE_ASSETS static table, notably in the original position that SYMMETRY occupied in the beta database. All subsequent columns were then moved over by 1, and the following loop in Database::CheckSchema depends on proper ordering of the schema (via col_counter):

    	for ( col_counter = 0; col_counter < std::get<TABLE_COLUMNS>(table).size( ); col_counter++ ) {
        	auto& column = std::get<TABLE_COLUMNS>(table)[col_counter];
        	char  type   = std::get<TABLE_TYPES>(table)[col_counter];


        	count = ReturnSingleIntFromSelectCommand(wxString::Format("SELECT COUNT(*) AS CNTREC FROM pragma_table_info('%s') WHERE name='%s';", std::get<0>(table), column));
        	if ( count < 1 ) {
            	missing_columns.push_back(ColumnChange(std::get<TABLE_NAME>(table), column, type));
        	}
    	}

And, for further confirmation, here is a screenshot of the Refinement Package Assets Panel, where symmetry == molecular weight, molecular weight == largest dimension, etc. (also note that box size, no. classes are intact, as they were before the new position of OUTPUT_PIXEL_SIZE in the db):
image

These are the issues; solutions will be discussed in coming PR.

@bHimes
Copy link
Collaborator

bHimes commented Jul 3, 2024

Hey @twagner9 I like the way you've set up this PR.

The tutorial dataset is super small, I think having numbers for projects with something like:
5k (tutorialish)
100k
500k
Maybe 1M

The largest set may not be necessary depending on the results from the 100 & 500k. The main goal is to estimate how the time penalty scales when using the "fix." If it is O(n^2) for example, then this might not be good solution.

Knowing the cost-to-benefit of the current fix will help me (and likely @jojoelfe since this is database related) consider how much brain space to allocate.

@jojoelfe
Copy link
Collaborator

jojoelfe commented Jul 3, 2024

My quick 2cents to issue number 2:

The underlying problem is that in many places the databasse is queried using SELECT * FROM ... and the code then expecting the columns to be in the correct order. This means that for updates new columns can only be added to the end, which in some cases did not happen.

This can be fixed with some sort of custom reordering code, but as you pointed out the performance might be very bad. Maybe it would be easier to refactor in the required places to instead use an explicit:

SELECT SYMMETRY, OUTPUT_PIXELSIZE, ... FROM REFINEMENT_PACKAGE_ASSETS

so that the values are being always returned in the right order, no matter the order of the columns in the database

@twagner9
Copy link
Collaborator Author

@bHimes @jojoelfe

It took me awhile to get a database set up with the appropriate number of particles. I ended up testing ~2k particles (small), ~120k (medium-ish) particles, and ~1.2 million particles (large), with steps completed through 3D refinement, and for the medium and large sets set a timer to track the amount of time spent updating the database.

  • ~2,000 particle set: <10 seconds; couple of additional seconds for normal update.
  • 120k particle set: 1m:09s for manual update; a small amount of time longer for the normal update.
  • 1.2 mil particle set: 07m:17s for manual update, and still relatively short for normal update.

I will test this next, but theoretically, this level of update should be limited to databases created only in those versions of cisTEM that existed prior to REFINEMENT_PACKAGE_ASSETS ordering changes in commit 5a44b5f (line 1402 of database.cpp) -- which was committed 5+ years ago.

As I'm unsure whether many cisTEM users would still be utilizing the beta version or other versions this old, perhaps this amount of time is acceptable. I could create a functional loading bar, and/or (preferably and), at the very least, notify users in the update dialog that this process can take several minutes for projects created with older versions of cisTEM that contain many particles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants