Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order foreign keys DML operations to get correct rows affected #15779

Closed
wants to merge 5 commits into from

Conversation

GuptaManan100
Copy link
Member

@GuptaManan100 GuptaManan100 commented Apr 23, 2024

Description

This PR fixes the bug described in #15826.

As described in #15826 (comment), the problem arises from the ordering of dml operations such that we end up deleting from the parent table first, causing an unintended delete on the child table.

Whats interesting to note is that if the original query has its tables switched around, then the rows affected are correct since we delete from the child first and then the parent! So the query delete fk_t11, fk_t12 from fk_t11 join fk_t12 using (id) where fk_t11.id = 5 fails with incorrect rows affected, but delete fk_t12, fk_t11 from fk_t12 join fk_t11 using (id) where fk_t11.id = 5 works correctly.

This PR fixes this issue. We realized after some discussion that the problem can be fixed if we are able to order the dml operations such that the delete on the child happens before the parent table.

After some more thinking and experimenting, I realized we are already building a graph when we update the foreign keys information to check for cycles in foreign keys. We can use topological sorting (https://en.wikipedia.org/wiki/Topological_sorting) on the graph to find an ordering of the tables in the keyspace such that all CASCADE foreign keys have parent tables coming before child tables.

Once we have this sorting, we store the order of the tables in the Vschema as part of the Table information in the newly introduced FkOrder field.

We can then use this field to order the dml operations for foreign key tables and ensure that the delete operations happen on the child table first. This ensures that the rows affected are calculated properly.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Manan Gupta <[email protected]>
Copy link
Contributor

vitess-bot bot commented Apr 23, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Apr 23, 2024
@GuptaManan100 GuptaManan100 added Type: Bug Component: Query Serving and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Apr 23, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone Apr 23, 2024
Copy link

codecov bot commented Apr 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.42%. Comparing base (cf3acaa) to head (b526619).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15779      +/-   ##
==========================================
+ Coverage   68.41%   68.42%   +0.01%     
==========================================
  Files        1558     1559       +1     
  Lines      196353   196514     +161     
==========================================
+ Hits       134337   134468     +131     
- Misses      62016    62046      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@GuptaManan100 GuptaManan100 changed the title Foreign key Rows Affected Error Order foreign keys DML operations to get correct rows affected May 1, 2024
@GuptaManan100
Copy link
Member Author

It was noticed that rearranging order of deletes and updates is actually incorrect and can lead to different results.

Here is an example for update reordering -

/*
 *        fk_multicol_t15
 *                    │
 *                    │
 *  On Delete Cascade │
 *  On Update Cascade │
 *                    │
 *                    ▼
 *        fk_multicol_t16
 *                    │
 * On Delete Set Null │
 * On Update Set Null │
 *                    │
 *                    ▼
 *        fk_multicol_t17──────────────────┐
 *                    │                    │
 *                    │                    │ On Delete Set Null
 *  On Delete Cascade │                    │ On Update Set Null
 *  On Update Cascade │                    │
 *                    │                    │
 *                    ▼                    ▼
 *        fk_multicol_t18      fk_multicol_t19
 */

mysql [localhost:8032] {msandbox} (fk_test) > insert into fk_multicol_t15(id, cola, colb) values (1, 7, 1), (2, 9, 1), (3, 12, 1);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (fk_test) > insert into fk_multicol_t16(id, cola, colb) values (1, 7, 1), (2, 9, 1), (3, 12, 1);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (fk_test) > insert into fk_multicol_t17(id, cola, colb) values (1, 7, 1), (2, 9, 1);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql [localhost:8032] {msandbox} (fk_test) > insert into fk_multicol_t19(id, cola, colb) values (1, 7, 1);
Query OK, 1 row affected (0.01 sec)

mysql [localhost:8032] {msandbox} (fk_test) > update fk_multicol_t17 join fk_multicol_t15 m1 on m1.id = fk_multicol_t17.id set m1.cola = m1.id + 8, fk_multicol_t17.colb = 32 where m1.id < 3;
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`fk_test`.`fk_multicol_t17`, CONSTRAINT `fk_multicol_t17_ibfk_1` FOREIGN KEY (`cola`, `colb`) REFERENCES `fk_multicol_t16` (`cola`, `colb`) ON DELETE SET NULL ON UPDATE SET NULL)

mysql [localhost:8032] {msandbox} (fk_test) > update fk_multicol_t15 m1 join fk_multicol_t17 on m1.id = fk_multicol_t17.id set m1.cola = m1.id + 8, fk_multicol_t17.colb = 32 where m1.id < 3;
Query OK, 4 rows affected (0.00 sec)
Rows matched: 4  Changed: 4  Warnings: 0

@GuptaManan100
Copy link
Member Author

Similarly, here is an example where reordering deletes leads to different results -

create table fk_t17
(
    id bigint,
    col varchar(10),
    primary key (id),
    index(col)
) Engine = InnoDB;

create table fk_t18
(
    id bigint,
    col varchar(10),
    primary key (id),
    index(col),
    foreign key (col) references fk_t17(col) on delete set null on update set null
) Engine = InnoDB;

create table fk_t19
(
    id bigint,
    col varchar(10),
    primary key (id),
    index(col),
    foreign key (col) references fk_t18(col) on delete restrict on update cascade
) Engine = InnoDB;

mysql [localhost:8032] {msandbox} (temp) > insert into fk_t17 values (1, 1);
Query OK, 1 row affected (0.01 sec)

mysql [localhost:8032] {msandbox} (temp) > insert into fk_t18 values (1, 1);
Query OK, 1 row affected (0.00 sec)

mysql [localhost:8032] {msandbox} (temp) > insert into fk_t19 values (1, 1);
Query OK, 1 row affected (0.00 sec)

mysql [localhost:8032] {msandbox} (temp) > delete fk_t18, fk_t17 from fk_t18 join fk_t17 using (id) where fk_t18.id = 1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`temp`.`fk_t19`, CONSTRAINT `fk_t19_ibfk_1` FOREIGN KEY (`col`) REFERENCES `fk_t18` (`col`) ON DELETE RESTRICT ON UPDATE CASCADE)
mysql [localhost:8032] {msandbox} (temp) > delete fk_t17, fk_t18 from fk_t17 join fk_t18 using (id) where fk_t18.id = 1;
Query OK, 2 rows affected (0.00 sec)

@GuptaManan100
Copy link
Member Author

Closing the PR because of the issues pointed ⬆️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug Report: Incorrect rows affected in foreign keys enabled keyspace
1 participant