Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting a relationship explicitly to null that uses a CPK triggers odd and incorrect validation error #612

Open
erikbelusic opened this issue Nov 8, 2023 · 11 comments

Comments

@erikbelusic
Copy link

We are on rails 6.1 and CPK 13, however this persists in rails 7.0 and CPK 14 and rails 7.1 without CPK

class CreateOrganizations < ActiveRecord::Migration[6.1]
  def change
    create_table :organizations do |t|
      t.text :name
      t.timestamps
    end
  end
end

class CreateV1PmtUsers < ActiveRecord::Migration[6.1]
  def change
    create_table :v1_pmt_users do |t|
      t.belongs_to :organization, null: false
      t.text :name
      t.timestamps
    end
  end
end

class CreateV1Issues < ActiveRecord::Migration[6.1]
  def change
    create_table :v1_issues do |t|
      t.belongs_to :organization, null: false
      t.belongs_to :reporter_pmt_user
      t.text :description
      t.timestamps
    end
  end
end


class CreateV2Issues < ActiveRecord::Migration[6.1]
  def change
    create_table :v2_issues, primary_key: [:organization_id, :pmt_id, :external_id] do |t|
      t.integer :organization_id, null: false
      t.integer :pmt_id, null: false
      t.text :external_id, null: false

      t.text :reporter_pmt_user_external_id
      t.text :title, null: false

      t.timestamps
    end
  end
end

class CreateV2PmtUsers < ActiveRecord::Migration[6.1]
  def change
    create_table :v2_pmt_users, primary_key: [:organization_id, :pmt_id, :external_id] do |t|
      t.integer :organization_id, null: false
      t.integer :pmt_id, null: false
      t.text :external_id, null: false
      t.text :name, null: false

      t.timestamps
    end
  end
end

class Organization < ApplicationRecord
end

class V1Issue < ApplicationRecord
  belongs_to :organization
  belongs_to :reporter_pmt_user, class_name: "V1PmtUser", optional: true
end

class V1PmtUser < ApplicationRecord
  belongs_to :organization
end

class V2Issue < ApplicationRecord
  self.primary_keys = :organization_id, :pmt_id, :external_id

  belongs_to :organization
  belongs_to :reporter_pmt_user, class_name: "V2PmtUser", foreign_key: %i[organization_id pmt_id reporter_pmt_user_external_id], optional: true
end

class V2PmtUser < ApplicationRecord
  self.primary_keys = :organization_id, :pmt_id, :external_id

  belongs_to :organization

end

Tests for non CPK models in rspec - all pass

require 'rails_helper'

describe "models without CPK - saving relationships" do
  before(:each) do
    @organization = Organization.create!(name: "test org")
  end

  it "creates without setting reporter" do
    issue = V1Issue.find_or_initialize_by(organization_id: @organization.id, description: "test1")
    expect do
      issue.save!
    end.to change(V1Issue, :count).by(1)
  end

  it "creates with setting reporter" do
    issue = V1Issue.find_or_initialize_by(organization_id: @organization.id, description: "test1")
    expect do
      reporter_pmt_user = V1PmtUser.create!(organization_id: @organization.id, name: "test1")
      issue.update!(reporter_pmt_user: reporter_pmt_user)
    end.to change(V1Issue, :count).by(1).and change(V1PmtUser, :count).by(1)
  end

  it "creates with explicitly setting reporter to nil" do
    issue = V1Issue.find_or_initialize_by(organization_id: @organization.id, description: "test1")
    expect do
      issue.update!(reporter_pmt_user: nil)
    end.to change(V1Issue, :count).by(1)
  end
end

Tests for CPK models - last test fails

require 'rails_helper'

describe "models with CPK - saving relationships" do
  before(:each) do
    @organization = Organization.create!(name: "test org")
    @pmt_id = 1
    @issue_external_id = "1"
  end

  it "creates without setting reporter" do
    issue = V2Issue.find_or_initialize_by(organization_id: @organization.id, pmt_id: @pmt_id, external_id: @issue_external_id)
    expect do
      issue.update!(title: "test")
    end.to change(V2Issue, :count).by(1)
  end

  it "creates with setting reporter" do
    issue = V2Issue.find_or_initialize_by(organization_id: @organization.id, pmt_id: @pmt_id, external_id: @issue_external_id)
    reporter_pmt_user = nil # init
    expect do
      reporter_pmt_user = V2PmtUser.create!(organization_id: @organization.id, pmt_id: @pmt_id, external_id: "abc", name: "test1")
      issue.update!(title: "test", reporter_pmt_user: reporter_pmt_user)
    end.to change(V2Issue, :count).by(1).and change(V2PmtUser, :count).by(1)

    expect(issue).to have_attributes(title: "test", reporter_pmt_user: reporter_pmt_user)
  end

  # The failing spec
  it "creates with explicitly setting reporter to nil" do
    issue = V2Issue.find_or_initialize_by(organization_id: @organization.id, pmt_id: @pmt_id, external_id: @issue_external_id)
    expect do
      issue.update!(reporter_pmt_user: nil)
    end.to change(V2Issue, :count).by(1)
  end
end

failure:

1) models with CPK - saving relationships creates with explicitly setting reporter to nil
     Failure/Error: issue.update!(title: "test", reporter_pmt_user: nil)

     ActiveRecord::RecordInvalid:
       Validation failed: Organization must exist

The expected behavior is that i can null out the relationship the same way i can in the non CPK example

@erikbelusic
Copy link
Author

After posting this, I have realized what the issue is. I am going to close.

@erikbelusic
Copy link
Author

Maybe this should actually be re-opened as a feature request. The issue here is that we are sharing columns between PK and relationship "FK". So when assigning the null relationship, we end up nulling out the organization_id accidently.

What is odd and made me think this was a bug is that on rails 6.0 and cpk 12 I do not believe this was an issue, as we only saw issues when upgrading to 6.1 and 13.

Would love your thoughts. Sharing something like a tennant_id in CPK is a fairly common strategy

@cfis
Copy link
Contributor

cfis commented Nov 10, 2023

Sure, feel free to reopen. Would be good to have a test case based on the CPK test data if you have time to put one together.

@cfis cfis reopened this Nov 10, 2023
@erikbelusic
Copy link
Author

Can you elaborate on what you mean by the "CPK test data". Happy to help out, but I am not sure on what you feel the correct behavior should be in this scenario.

I know there is a similar thread on rails mainline i found rails/rails#49671 where this is being discussed a bit as well.

In our project we have patched _write_attribute as seen here: rails/rails#49671 (comment) to solve our short term issue.

@cfis
Copy link
Contributor

cfis commented Nov 14, 2023

The CPK test data are the models and fixtures found here - https://github.com/composite-primary-keys/composite_primary_keys/tree/master/test/fixtures.

Hopefully there are models setup close to what you need?

@erikbelusic
Copy link
Author

I can take a look, but the part I am not clear on is what additional test you would like to see here? Are you agreeing that setting a relationship should never overwrite parts of the primary key?

@cfis
Copy link
Contributor

cfis commented Nov 15, 2023

That's a good question. If you don't update the part of the primary key shared with the relationship, then you can't set the relationship. So what is supposed to happen, throw an error?

@erikbelusic
Copy link
Author

So, I think depending on the design of your primary keys, you can still set the relationship. I think the only time columns in a relationship and the PK would be shared is if you primary key design is what I have seen referred to as a tennant-key design, where some tennant id is in every models primary key, and data is strictly segmented by those values. In this case skipping the shared columns when setting the relationship, would still set the non-shared columns, which should result in the same end state.

In the project we are using this gem on, we allow each of the PK fields to be updated either directly, or if the column is currently null. In the example above, you can see the expectation is that once the PK fields are set, they are not changed inadvertently by setting a relationship that shares one of those columns with the PK.

# The failing spec
  it "creates with explicitly setting reporter to nil" do
    issue = V2Issue.find_or_initialize_by(organization_id: @organization.id, pmt_id: @pmt_id, external_id: @issue_external_id)
    expect do
      issue.update!(reporter_pmt_user: nil)
    end.to change(V2Issue, :count).by(1)
  end

While here this makes sense for us I am not sure it would make sense for everyones' use cases, but because our CPK contains a column that is essentially a tenant key, setting the relationship with a VcsUser that belongs to another org should never happen anyway.

We also have an example where we do want the relationship to set part of the PK, and thats the following

class Commit < BaseRecord
    belongs_to :organization
   
    has_many :commit_issue_associations, foreign_key: %i[organization_id commit_vcs_id commit_repository_external_id commit_oid]
end

class CommitIssueAssociation < BaseRecord
    belongs_to :issue, foreign_key: %i[organization_id issue_pmt_id issue_external_id]
    belongs_to :commit, foreign_key: %i[organization_id commit_vcs_id commit_repository_external_id commit_oid]
  end
commit.commit_issue_associations.find_or_create_by!(issue: issue)

and so here we allow the relationship being set to set part of the PK, but only because its null at that time.

I think some options are:

  • take our approach as the solution - disallow changing of the PK with the relationship
  • make our approach an opt-in/configurable feature, either globally or per model
    • something like cpk_overwrite_primary_key_columns false as a class method called in the AR definition
  • more configurable
    • something like cpk_protected_columns primary_key or cpk_protected_columns :col1, :col2 as a class method called in the model definition

PS flexible on the method names. I tend to prefer verbose clear methods, and I am unsure if prefixing with something for a namespacing effect is needed here or not.

@cfis
Copy link
Contributor

cfis commented Nov 23, 2023

Thanks for the explanation. I do have a hard time seeing why there would be a setup like this. The tenant_id kinda makes sense, but why would the tennant_id need to be part of the primary key when there is likely already a unique autogenerated key?

But anyway, I can convince myself the current behavior is correct. If you update a relationship that happens to to share a column with a primary key then it should update the primary key too. If you don't allow that, how else could you change the relationship?

I guess the choice after that (your choice #1) is if updating a relationship causes a primary key value to change, then disallow it and throw an error. Not seeing the null case you mention because databases (well postgresql and mysql) don't allow null values in primary keys. Maybe the record hasn't been created yet?

I'm not sure this is worth adding to CPK though. Could you implement a one-off solution using ActiveRecord callbacks for the cases where you want to disallow updates to the PK?

@erikbelusic
Copy link
Author

I do have a hard time seeing why there would be a setup like this. The tenant_id kinda makes sense, but why would the tennant_id need to be part of the primary key when there is likely already a unique autogenerated key?

Our scenario is that we do not use auto incrementing ids - we are using natural primary keys. Because two customers could create data with the same natural components, the tennant_id becomes a requirement. e.g. data only needs to be unique within a customers dataset.

If you update a relationship that happens to to share a column with a primary key then it should update the primary key too. If you don't allow that, how else could you change the relationship?

Right, so this is where it gets interesting. In our scenario, changing a primary key column via setting a relationship NEVER makes sense, because it could be saying move this data from customer X to customer Y. I recognize this may not make sense for all use cases. But I do think setting part of the PK to null when trying to null out a relationship should also never happen, as the PK cannot have null component in most RDBMS, if not all.

Not seeing the null case you mention because databases (well postgresql and mysql) don't allow null values in primary keys. Maybe the record hasn't been created yet?

This comes into play for join models using CPK - where the primary key is a combination of the two related objects "FKs".

I'm not sure this is worth adding to CPK though. Could you implement a one-off solution using ActiveRecord callbacks for the cases where you want to disallow updates to the PK?

We have implemented a one off solution, however, we had to monkey patch and not use callbacks, as many relations could attempt to mutate a PK column, and the callbacks only have access to the original and current values, and you could lose data depending on what you are doing. For us it made more sense to overwrite the attr writer.

If you don't feel preventing changing any part of the PK should be part of the CPK behavior, thats understandable. But I would say disallowing a PK column value to be set as null should probably be prevented.

@cfis
Copy link
Contributor

cfis commented Dec 1, 2023

Thanks for the example of moving data between two tennants - that makes sense.

Is your monkey patch suitable for a PR - or could be made to be suitable? I'm ok with this idea as long as its opt-in and doesn't overcomplicate the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants