Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: planner drops DISTINCT from derived table to sharded keyspace with unique lookup column #13578

Closed
maxenglander opened this issue Jul 20, 2023 · 2 comments · Fixed by #15672

Comments

@maxenglander
Copy link
Collaborator

maxenglander commented Jul 20, 2023

Overview of the Issue

Given a query form SELECT [d].[a] FROM (SELECT DISTINCT [a] FROM [b] WHERE [lookup_col] = [c]) AS [d] against a sharded keyspace, the planner drops the DISTINCT from the subquery.

This query form can be rewritten in a way that the planner does not drop the DISTINCT, e.g.: SELECT DISTINCT [a] FROM [b] WHERE [lookup_col] = [c]. However sometimes the end user has an ORM that is not flexible, and making these changes can be difficult or tedious if there are lots of impacted queries.

Reproduction Steps

  1. Create vschemas.json:
{
  "sharded": {
    "sharded": true,
    "vindexes": {
      "lookup_vdx": {
        "type": "lookup_unique",
        "params": {
          "from": "lookup_id",
          "table": "unsharded.lookup_idx",
          "to": "keyspace_id"
        },
        "owner": "test"
      },
      "hash_vdx": {
        "type": "hash"
      }
    },
    "tables": {
      "test": {
        "columnVindexes": [
          {
            "column": "hash_id",
            "name": "hash_vdx"
          },
          {
            "column": "lookup_id",
            "name": "lookup_vdx"
          }
        ]
      }
    }
  },
  "unsharded": {
    "tables": {
      "lookup_idx": {
      }
    }
  }
}
  1. Create schema.sql:
CREATE TABLE `lookup_idx` (
  `lookup_id` smallint unsigned NOT NULL,
  `keyspace_id` varbinary(10) NOT NULL,
  PRIMARY KEY (`lookup_id`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;
CREATE TABLE `test` (
  `hash_id` bigint unsigned NOT NULL,
  `lookup_id` bigint unsigned NOT NULL,
  `name` varchar(32) NOT NULL,
  PRIMARY KEY (`lookup_id`),
  UNIQUE KEY `test_lookup_id_idx` (`lookup_id`),
  KEY `test_name_idx` (`name`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;
  1. Create queries.sql:
SELECT `t`.`name`
  FROM (
         SELECT DISTINCT `t`.`name`
           FROM `test` AS `t`
          WHERE `lookup_id` = 1
       ) AS `t`;
  1. Run vtexplain:
vtexplain --vschema-file vschemas.json --schema-file schema.sql --sql-file queries.sql --shards 4
----------------------------------------------------------------------
SELECT `t`.`name`
  FROM (
         SELECT DISTINCT `t`.`name`
           FROM `test` AS `t`
          WHERE `lookup_id` = 1
       ) AS `t`

1 unsharded/-: select lookup_id, keyspace_id from lookup_idx where lookup_id in (1) limit 10001
2 sharded/40-80: select t.`name` from (select t.`name` from test as t where lookup_id = 1) as t limit 10001

----------------------------------------------------------------------

Binary Version

Version: 18.0.0-SNAPSHOT (Git revision 138d540ca79d0ac8a1aa3c9ef906f6aafd5edb2d branch 'main') built on Thu Jul 20 09:08:00 EDT 2023 by [email protected] using go1.20.1 darwin/arm64

Operating System and Environment details

  • Operating system: Mac OS, Ventura 13.4
  • Architecture: M1 ARM64
@maxenglander maxenglander added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged Component: Query Serving labels Jul 20, 2023
@GuptaManan100 GuptaManan100 removed the Needs Triage This issue needs to be correctly labelled and triaged label Jul 21, 2023
@harshit-gangal harshit-gangal changed the title Bug Report: planner drops DISTINCT from subquery to sharded keyspace with unique lookup column Bug Report: planner drops DISTINCT from derived table to sharded keyspace with unique lookup column Apr 3, 2024
@deepthi
Copy link
Member

deepthi commented Apr 3, 2024

@maxenglander with all the refactoring that has happened since v18, this may no longer be an issue. Is it possible to re-test?

@systay
Copy link
Collaborator

systay commented Apr 8, 2024

@maxenglander with all the refactoring that has happened since v18, this may no longer be an issue. Is it possible to re-test?

I did retest it, and it was still wrong. Fix here: ##15672

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants