Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] Stash crashes if sent graphQL after upgrade + pre migration #4197

Open
Dark-Obsidian opened this issue Oct 12, 2023 · 9 comments
Labels
bug report Bug reports that are not yet verified

Comments

@Dark-Obsidian
Copy link

Summary

Stash crashes if sent a graphQL command before the DB is upgraded to the latest schema (e.g. after updating to a new version)

To Reproduce

  1. Install stash v0.22.0
  2. Upgrade to stash v0.23.0
  3. Start stash (without doing anything else)
  4. Send a graphQL command to stash (e.g. to run a scan)
  5. stash sends back an OK response, then crashes

Context / Troubleshooting performed

I have a v. simple script that auto-runs a scan for newly downloaded content when stash starts by:

  1. Running stash-win.exe with several parameters, in a minimised window
  2. Waiting 15 seconds
  3. Sending a graphQL instruction via curl for stash to run a scan

After upgrading to stash v0.23.0, I found stash was crashing a few seconds after starting. I did some local troubleshooting, and discovered that the upgrade to stash v0.23.0 also required a DB schema (which is fine!) however, presumably because the stash console window was being minimised(?), this was not being prompted and so was not being done before the time-delayed graphQL instruction was sent a few seconds later, causing stash to crash (rather than ignore the command or send an error response).

Expected behaviour

If a DB upgrade is required and/or there are certain requests that stash cannot execute at a certain time due (e.g. because a DB upgrade is required first), when receiving an instruction stash should either:

  • Ignore any graphQL requests
  • Return an error, ideally advising that a DB upgrade is required because of a new schema

Stash Version

v0.22.0 --> v0.23.0

Desktop information

  • OS: Windows 11
  • Browser: Edge
  • Browser version: Version 114.0.1823.51
@Dark-Obsidian Dark-Obsidian added the bug report Bug reports that are not yet verified label Oct 12, 2023
@DingDongSoLong4
Copy link
Collaborator

I have a fix for this locally, as part of a larger refactor. It's based off of #4152, so I'll send a PR through once that merges.

It simply returns a "database not initialized" error if any request involving the database is made before it is properly initialized, i.e. before the initial setup or if a migration is required.

If a migration is required, all pages in the UI will redirect to the migration page to prompt for user confirmation. There is also a warning logged on startup (database schema version * does not match required schema version *). If you have stash configured to not automatically open the UI in a browser on startup, then given it is being launched minimized, I can see how both of these would be missed.

What you could do is use the systemStatus GraphQL query to retrieve the current status:

query {
  systemStatus {
    databaseSchema
    appSchema
    status
  }
}

and only start the automatic scan if status is not OK (ie it is NEEDS_MIGRATION). Alternatively you could just make sure to start stash normally (without the script) for the first time after updating, to ensure that any migrations are done before attempting to run any external database-accessing GraphQL queries.

@Dark-Obsidian
Copy link
Author

Dark-Obsidian commented Oct 12, 2023

Thanks for the reply and info @DingDongSoLong4!


Responding to your points

If you have stash configured to not automatically open the UI in a browser on startup, then given it is being launched minimized, I can see how both of these would be missed.

Yeah, this is the exact situation that applied unfortunately!

What you could do is use the systemStatus GraphQL query to retrieve the current status and only start the automatic scan if status is not OK (ie it is not NEEDS_MIGRATION).

Ah, awesome, thanks for sharing that script! I am very new to GraphQL, so appreciate it!
Ideally I just wanted to keep my stash script as simple as possible but doing this check first isn't too much extra to add (plus lets me catch any errors and throw up a dialog box, etc)

Alternatively you could just make sure to start stash normally (without the script) for the first time after updating

Yeah, it was a learning experience for sure and now that I am more explicitly aware of the dependency between stash-win.exe upgrades <--> stash schema upgrades <--> graphQL limitations, I will definitely try and make a mental note about this in the future!


Additional reflection on overall situation

Firstly, thanks again for your help and multiple suggestions! Overall these should significantly reduce the chance of this happening again!

As you said above, the combination of having open browser = false and running stash-win.exe minimised didn't help my situation at all. That said, I think my debugging was somewhat hindered by stash returning a normal response to the scan request, i.e. the output from my stash starter script was...

> run-stash

Starting stash-win.exe...
Requesting scan...
{"data":{"metadataScan":"1"}}

... which I interpretated as ...

  • " OK so there is nothing wrong with stash / the GraphQL command...
  • Maybe somehow the GOTO:EOF at the end of my script is causing it to kill stash, which is weird as I explicitly use GOTO:EOF rather than EXIT as it's safer...
  • And I run stash via a START command (eg. START "Stash" /MIN "%stashdir%\stash-win.exe" --config...) so it runs asynchronously...
  • But I don't see what else it could be!? " 😕

I then spent ages looking at the wrong problem, swapping GOTO:EOF for EXIT /B 0, etc and changing various switches like /I in the START command, etc... Like you say, it was only when running into dead ends and thinking "I'm just going to literally double click stash-win.exe" and seeing the DB upgrade message that it finally clicked what the actual issue was.

I'm not saying that stash needs to consider everything and protect against every single scenario, but in my defence, stash returning {"data":{"metadataScan":"1"}} rather than say {"error":{"message":"DB upgrade required"}} did throw me off the scent completely TBH...

  • If this is the kind of change that is part of the PR you mentioned then that's awesome!
  • If not, maybe I can raise a feature request for this? (i.e. request that if the DB is not initialised, stash should reply to various GraphQL commands (scan, generate, etc) with an error instead of a normal response)

@DingDongSoLong4
Copy link
Collaborator

I'm not saying that stash needs to consider everything and protect against every single scenario, but in my defence, stash returning {"data":{"metadataScan":"1"}} rather than say {"error":{"message":"DB upgrade required"}} did throw me off the scent completely TBH...

This is perfectly reasonable, and I definitely agree. Having a database that needs to be migrated is an entirely normal scenario - it should be taken into account wherever possible, and should definitely not cause a crash.

Actually now that I think about it, my change wouldn't solve the issue with metadataScan specifically. When you run the metadataScan mutation, it queues a job in the background - the job itself is only started when it gets to the top of the job queue. This means that a job creation mutation (such as metadataScan) will basically never return an error.

The way I'd probably fix it is to allow for jobs to verify some kind of precondition before they can be added to the queue, in this case checking if the database has been initialized. This will let the action of adding a job to the queue potentially fail, which would then be able to return an error directly from metadataScan.

My fix was for normal data graphql requests (something like findScene or sceneUpdate), where if the database was uninitialized it would result in a nil pointer deference panic.

Also sidenote: if you haven't already, I would highly recommend configuring a log file (Settings > System > Logging). The log file would have allowed you to see the database schema version does not match warning, despite the crash.

@Dark-Obsidian
Copy link
Author

Dark-Obsidian commented Oct 13, 2023

This is perfectly reasonable [...] Having a database that needs to be migrated is an entirely normal scenario [...] and should definitely not cause a crash

Cool, yeah so I think there are two potential code changes that could be implemented, if the dev team agrees...

Part 1 (Required?) - Stash protected so that it does not crash when sent GraphQL command(s)

  • As you say, having stash crash in this situation is not ideal
  • Especially when you consider that while it was trivial for me to restart, if people are running stash remotely or in more complicated setups, this could be more of a pain
  • Crucially though - if stash had not crashed, I would not have had my issue as...
  • If stash didn't crash - I would've returned to the browser, refreshed the page, and then instead of a getting the "server offline" localhost refused to connect page, I would've been taken to the DB upgrade page, and so would've got the prompt to update the DB right away.

Part 2 (Optional?) - Above, plus adding pre-checking steps to stash job manager so it verifies the job is possible before returning response

The way I'd probably fix it is to allow for jobs to verify some kind of precondition before they can be added to the queue, in this case checking if the database has been initialized. This will let the action of adding a job to the queue potentially fail, which would then be able to return an error directly from metadataScan.

  • As you described above, adding some checking on the current state before a job is accepted to the queue and then to return an error if the queue is unavailable, or the pre-conditions aren't met, etc.

Next steps?

  • Thanks once again for your input and assistance on this!
  • The above sounds sensible and reasonable to me, however I realise I don't have an idea of the amount of work involved to implement either of the above
  • In terms of getting someone to look at this more formally and decide on whether to accept it into the pipeline... is this something that you can do?... or is there someone that I can tag to have a look at the above?

@DingDongSoLong4
Copy link
Collaborator

Well Part 1 is part of the fix I have already made locally - it will cause the scan to log a "database not initialized" error message instead of causing a crash.

Part 2 is what I haven't implemented yet - I'll see what I can do when I have time. I don't imagine it would be amazingly complex to do, although it probably won't be trivial either.

In terms of getting someone to look at this more formally and decide on whether to accept it into the pipeline... is this something that you can do?... or is there someone that I can tag to have a look at the above?

In general @WithoutPants is the one who handles most of these things, although I'd say both changes are relatively uncontroversial.
I'd classify Stash crashing on a metadataScan when the database needs migration as a definite bug. However I would probably classify wanting metadataScan to return an error instead of returning successfully and having the job fail as a feature request - the current behaviour (minus the crashing), where metadataScan never errors directly, is intentional given then current job system.

@Dark-Obsidian
Copy link
Author

Dark-Obsidian commented Oct 13, 2023

Part 1 is part of the fix I have already made locally - it will cause the scan to log a "database not initialized" error message instead of causing a crash.

Ah, amazing!

Part 2 is what I haven't implemented yet - I'll see what I can do when I have time. I don't imagine it would be amazingly complex to do, although it probably won't be trivial either.

Appreciate that. Like I said above, whilst I still think part 2 is valid/beneficial, it is more of a "optional" or "nice to have" so please just work on it whenever you have time, no rush!

However I would probably classify wanting metadataScan to return an error instead of returning successfully and having the job fail as a feature request - the current behaviour, where metadataScan never errors directly, is intentional given then current job system.

  • Yeah, that makes sense. In my case it would be helpful to have it work in a pseudo 'synchronous' mode, as my script doesn't "hang around" after the request is made or check the job status later on, etc - so at "request time" would definitely be the best place for scripts like mine to handle any errors.
  • That said, I appreciate that if that's not the current system design then that is a bigger change, and it may be handled better already by users with more complex scripts that do periodic checks on the job queue, etc.
  • Happy to go whichever way you and other devs want to on this one.

Just to double-check -- Although in this instance it was a metadataScan request that caused the crash. I'm assuming that the code is being updated to handle "problematic jobs / requests" in general, right?... i.e. not just metadataScan specifically, but also say if I sent a metadataGenerate or metadataClean message?

@WithoutPants
Copy link
Collaborator

WithoutPants commented Oct 14, 2023

In general @WithoutPants is the one who handles most of these things, although I'd say both changes are relatively uncontroversial. I'd classify Stash crashing on a metadataScan when the database needs migration as a definite bug. However I would probably classify wanting metadataScan to return an error instead of returning successfully and having the job fail as a feature request - the current behaviour (minus the crashing), where metadataScan never errors directly, is intentional given then current job system.

I think metadataScan returning an error if the preconditions aren't met - without adding it to the queue - is pretty straightforward expected behaviour imo. The implementation just adds the job to the manager currently, but there's nothing to suggest that this is the way it should be. metadataScan should be just as capable of erroring out as any other mutation.

Edit: in fact, metadataScan does error out currently if ffmpeg or ffprobe are missing.

@DingDongSoLong4
Copy link
Collaborator

Alright that was easier than I thought it would be - I've added a commit to my local branch that ensures the database is initialized if necessary before adding jobs. metadataScan (also metadataGenerate, and a bunch of others) now error with database not initialized if the database is not ready.

@Dark-Obsidian
Copy link
Author

Dark-Obsidian commented Oct 15, 2023

Awesome, thanks for this!

😁

NB: Github may block GIFs auto-playing if not enabled in your accessibility settings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug report Bug reports that are not yet verified
Projects
Status: To triage
Development

No branches or pull requests

3 participants