Encoding mismatch between schema.rb and database.yml #375

Open
jvendetti opened this issue Dec 17, 2024 · 3 comments

Comments

@jvendetti
Member

Recently I needed to run the RSpec tests for the License model:

$ bin/rails db:migrate RAILS_ENV=test
$ bundle exec rspec spec/models/license_spec.rb

... and according to git, this resulted in modifications to the schema.rb file, e.g.:
[Screenshot, 2024-12-17: schema.rb modifications reported by git]

The underlying cause is that the database.yml file in my local dev environment is configured to use utf8 encoding, which matches the encoding configured in the database.yml in our private bioportal_config repository (used for official deployments of BioPortal to both staging and production). This conflicts with the utf8mb4 charset specified in schema.rb, which was recently modified when code from AgroPortal was adopted.

If the intention going forward is to use utf8mb4 encoding, then the database.yml used for local dev and for official deployments should be properly configured with that encoding. The database.yml file in the private bioportal_config repository is currently configured to use utf8 for appliance, development, and test modes, and no encoding is specified for staging and production modes.
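As a rough illustration of the mismatch described above, the following sketch compares the `encoding` values in a database.yml document against the charsets named in schema.rb. The helper names, the sample database names, and the sample `create_table` line are hypothetical and not taken from the repository. It assumes database.yml is plain YAML (no ERB) and that schema.rb uses the `charset: "..."` option form.

```ruby
require "yaml"

# Returns { environment name => configured encoding (or nil) } for a
# database.yml document. Assumes plain YAML; ERB would need rendering first.
def configured_encodings(database_yml)
  config = YAML.safe_load(database_yml, aliases: true) || {}
  config.each_with_object({}) do |(env, settings), encodings|
    encodings[env] = settings["encoding"] if settings.is_a?(Hash)
  end
end

# Collects the distinct charsets named in schema.rb, assuming the
# `charset: "..."` option form on create_table lines.
def schema_charsets(schema_rb)
  schema_rb.scan(/charset:\s*"([^"]+)"/).flatten.uniq
end

# Lists environments whose encoding differs from a charset used in schema.rb.
# An environment with no encoding at all is reported with a nil value.
def encoding_mismatches(database_yml, schema_rb)
  charsets = schema_charsets(schema_rb)
  configured_encodings(database_yml).reject do |_env, encoding|
    charsets.all? { |charset| charset == encoding }
  end
end

# Illustrative inputs only; the database names are made up.
database_yml = <<~YAML
  default: &default
    adapter: mysql2
    encoding: utf8
  test:
    <<: *default
    database: example_test
  production:
    adapter: mysql2
    database: example_production
YAML

schema_rb = 'create_table "licenses", charset: "utf8mb4", ' \
            'collation: "utf8mb4_0900_ai_ci", force: :cascade do |t|'

encoding_mismatches(database_yml, schema_rb).each do |env, encoding|
  puts "#{env}: #{encoding || 'no encoding specified'}"
end
```

A check along these lines could run before `db:migrate` to flag a configuration that would cause schema.rb to be rewritten.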

Another issue is the collation setting of utf8mb4_0900_ai_ci. This type of collation was introduced in MySQL 8, and we're running on MySQL 5.7, so the collation is ignored. There's no documentation in the pull request that included the change to schema.rb to indicate if there's a reason for specifying the collation. If this is important for new functionality adopted from AgroPortal, then we should consider using a collation setting that applies to the version of MySQL that we're actually running in our production environment (5.7), e.g.:

  • utf8mb4_general_ci (faster but less accurate sorting)
  • utf8mb4_unicode_ci (slightly slower but more precise Unicode sorting)

Note that this isn't an exhaustive list of collation options.
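To make the version dependency concrete, here is a minimal sketch of choosing a collation from the server version. The helper name `utf8mb4_collation_for` and its `fallback` default are hypothetical; the only facts relied on are the ones stated above, namely that `utf8mb4_0900_ai_ci` requires MySQL 8 and that `utf8mb4_unicode_ci` is one option that applies to 5.7.

```ruby
# Hypothetical helper: returns a utf8mb4 collation that the given MySQL
# server version can use. The 0900 collation is only chosen for MySQL 8+.
def utf8mb4_collation_for(mysql_version, fallback: "utf8mb4_unicode_ci")
  # Keep only the leading numeric part, e.g. "5.7.44-log" becomes "5.7.44".
  numeric = mysql_version[/\d+(\.\d+)*/]
  if Gem::Version.new(numeric) >= Gem::Version.new("8.0")
    "utf8mb4_0900_ai_ci"
  else
    fallback
  end
end

puts utf8mb4_collation_for("5.7.44") # prints "utf8mb4_unicode_ci"
```

Passing `fallback: "utf8mb4_general_ci"` would select the faster, less accurate option instead.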

@syphax-bouazzouni
Contributor

> If this is important for new functionality adopted from AgroPortal, then we should consider using a collation setting that applies to the version of MySQL that we're actually running in our production environment (5.7), e.g.:

Hello @jvendetti, no, we don't need it for the AgroPortal alignment work. I think it was introduced automatically somehow. You can remove that change if it isn't a good thing to have.

@jvendetti
Member Author

@alexskr - I generated a new Rails application from scratch using Rails 7.0.8.7 and MySQL 5.7.44. It looks like the default encoding was generated as utf8mb4 in the database.yml file:

default: &default
  adapter: mysql2
  encoding: utf8mb4

Can you think of any problem with going from utf8 to the utf8mb4 encoding for production BioPortal and/or the appliance distribution?

@alexskr
Member

alexskr commented Dec 18, 2024

> Can you think of any problem with going from utf8 to the utf8mb4 encoding for production BioPortal and/or the appliance distribution?

utf8 is a legacy charset, so moving to utf8mb4 is the right thing to do.
If we do that, then the next major version of the appliance (v4) could integrate this change.
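For existing databases, changing `encoding` in database.yml only affects the connection; tables already created as utf8 keep their charset until converted. The sketch below only builds the MySQL `ALTER TABLE ... CONVERT TO CHARACTER SET` statements and does not execute anything. The helper name and the table name are hypothetical, and the collation default is an assumption, not a decision made in this thread. Any real conversion would need testing on a copy of the data first, since wider characters can affect index length limits.

```ruby
# Hypothetical helper: builds (but does not run) one conversion statement
# per table name. Table names are wrapped in backticks as MySQL identifiers.
def utf8mb4_conversion_sql(tables, collation: "utf8mb4_unicode_ci")
  tables.map do |table|
    "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE #{collation};"
  end
end

# Illustrative table name only.
puts utf8mb4_conversion_sql(%w[licenses])
```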

jvendetti added a commit that referenced this issue Dec 19, 2024
The "utf8mb4_0900_ai_ci" collation doesn't exist in MySQL 5.7 (see #375)
jvendetti self-assigned this Jan 16, 2025