Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various Mysql2::Errors due to race conditions with garbage collection #1258

Open
burnison opened this issue Apr 25, 2022 · 1 comment
Open

Comments

@burnison
Copy link

burnison commented Apr 25, 2022

Context

Several bugs in this repository, such as #1019, #694, and #1043, seem to relate to a common issue, which the work done 3 years ago in the unmerged #1048 appears to propose a potential solution. As noted in #1048, there is a race condition between the garbage collector and the user code that can lead to several different, unexpected errors in specs and production code. I am opening this issue to try associating them all with the same cause, to provide a meaningful work-around for library users, and to provide test cases for a potential fix.

Bugs

Ambiguous Mysql2::Errors

Initially, we had been experiencing behaviour similar to #1019, wherein seemingly-random rspec tests would fail with an ambiguous, Mysql2::Error with no error message. In most cases, the top of the stack trace would point to affected_rows within ActiveRecord, but could also point to any other functions that perform a query.

Eventually, this specific error was traced down to rb_mysql_client_affected_rows not expecting a return of -1 when calling mysql_affected_rows. However, the return of -1 is being set by cli_advanced_command with command 25, COM_STMT_CLOSE, which is called through a macro (stmt_command) in mysql_stmt_close by nogvl_stmt_close from rb_mysql_stmt_free during garbage collection.

This specific problem can be reproduced deterministically with the following spec:

diff --git a/spec/mysql2/statement_spec.rb b/spec/mysql2/statement_spec.rb
index dbc185e..d4cf310 100644
--- a/spec/mysql2/statement_spec.rb
+++ b/spec/mysql2/statement_spec.rb
@@ -695,6 +695,15 @@ RSpec.describe Mysql2::Statement do
     end
   end

+  context 'garbage collection' do
+    it 'should not cause rows_affected to be reset' do
+      @client.prepare 'SELECT 1'
+      @client.query 'SELECT 1'
+      GC.start(full_mark: true, immediate_sweep: true)
+      expect(@client.affected_rows).to eq 1
+    end
+  end
+
   context 'close' do
     it 'should free server resources' do
       stmt = @client.prepare 'SELECT 1'

Running this spec will produce the described error:

  1) Mysql2::Statement garbage collection should not cause rows_affected to be reset
     Failure/Error: expect(@client.affected_rows).to eq 1
     Mysql2::Error:
     # ./spec/mysql2/statement_spec.rb:715:in `affected_rows'
     # ./spec/mysql2/statement_spec.rb:715:in `block (3 levels) in <top (required)>'

Commands out of sync errors

Similar to #1043 and #694, this same race condition may trigger a CR_COMMANDS_OUT_OF_SYNC (error 2014, Commands out of sync; you can't run this command now) between the invocation of cli_read_query_result and mysql_store_result. Upon the completion of cli_read_query_result, mysql->status is set to MYSQL_STATUS_GET_RESULT, which is what mysql_store_result expects.

However, due to the the closure of the prepared statement, mysql->status is set to MYSQL_STATUS_READY in mysql_stmt_close, preventing the completion of mysql_store_result and triggering error 2014.

This one is harder to write a spec for, but can be deterministically proven with the following code change:

diff --git a/ext/mysql2/client.c b/ext/mysql2/client.c
index 5b699e9..a675cd2 100644
--- a/ext/mysql2/client.c
+++ b/ext/mysql2/client.c
@@ -609,6 +609,8 @@ static VALUE rb_mysql_client_async_result(VALUE self) {
     rb_raise_mysql2_error(wrapper);
   }

+  rb_gc();
+
   is_streaming = rb_hash_aref(rb_ivar_get(self, intern_current_query_options), sym_stream);
   if (is_streaming == Qtrue) {
     result = (MYSQL_RES *)rb_thread_call_without_gvl(nogvl_use_result, wrapper, RUBY_UBF_IO, 0);

Running the test suite will reproduce this bug:

  1) Mysql2::Statement should update a DECIMAL value passing a BigDecimal
     Failure/Error: _query(sql, @query_options.merge(options))

     Mysql2::Error:
       Commands out of sync; you can't run this command now
     # ./lib/mysql2/client.rb:131:in `_query'
     # ./lib/mysql2/client.rb:131:in `block in query'
     # ./lib/mysql2/client.rb:130:in `handle_interrupt'
     # ./lib/mysql2/client.rb:130:in `query'
     # ./spec/mysql2/statement_spec.rb:202:in `block (2 levels) in <top (required)>'

Current work-arounds

Users of mysql2 can stop this bug occurring by ensuring that all prepared statements are explicitly closed once they're done being used --- that is, never letting a prepared statement get closed by the garbage collector.

This prevents the issue due to the guard on stmt_wrapper->stmt and the subsequent nullification during the call to nogvl_stmt_close. Once a statement is closed in user code, the garbage collector does not make another call to MySQL.

@tenderlove
Copy link
Collaborator

I was reading through the code, and I can see how this would be an issue. I'm just going to dump some thoughts here, but I don't have any ideas for how to fix this yet.

  1. We can't guarantee the order that anything will get collected. I see some code in the mysql2 codebase that is trying to force things to be collected in a certain order, but it's just not possible (unless we want to muck with finalizers and I certainly don't).
  2. In-flight queries and prepared statements don't know about each other. A prepared statement could be collected at any time, including when a different query is in-flight (as this issue points out).

I only have a couple ideas for dealing with this right now.

  1. Add a queue. When a prepared statement is collected, rather than closing the statement, add it to a queue and have mysql2 drain the queue and close the statements when we know it's "safe" (right after an in-flight query has been executed, or immediately if there are no in-flight queries).
  2. Raise an exception if a statement handler is collected but hasn't been closed. IOW force the end user to manually close the statement when they are done with it.

I think the first option is a lot of work. Second option is probably less work, but forces the end user to change their code. Still trying to think if there is a better 3rd option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants