Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document reference tables #1849

Merged
merged 4 commits into from
Oct 3, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 112 additions & 0 deletions content/en/docs/21.0/reference/vreplication/reference_tables.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
---
title: Reference Tables
description: Using and managing reference tables in Vitess
weight: 100
aliases: [ '/docs/reference/vreplication/v2/referencetables/' ]
---

{{< warning >}}

### Shard Targeting and DMLs

Any DMLs executed on reference tables using shard targeting to a specific shard in the target keyspace, the DMLs to
reference tables will *NOT* be routed to the source keyspace. Writes will happen on the copy of the table in the target
keyspace and the the workflow will most likely break at some future time.

{{< /warning >}}

# Reference Tables in Vitess

Vitess supports the concept of **Reference Tables** as a feature that allows you to keep identical copies of tables
across multiple shards in sync. This is useful for small lookup-type tables that are commonly used by applications. For
example, dimension tables like countries, currencies, states, time zones and shipping methods or even entities like
products, product categories, manufacturers etc. which only change occasionally.

By providing mechanisms to keep consistent copies of these tables in all shards, Vitess ensures that
queries involving reference tables can be served efficiently without the need for cross-keyspace lookups.

The source of truth for reference tables is in an unsharded keyspace. All DMLs on reference tables are
executed in the source keyspace. Vitess provides VReplication workflows to replicate these changes to all shards in
a target sharded keyspace.

This guide provides an example of how to setup reference tables and how to start the VReplication workflow
required to keep them in sync across all shards.

## Specifying reference tables

This is done in the vschema of both the source and target keyspaces. The source keyspace is the source of truth
and where DML operations against the table are performed. The target keyspace then maintains the in-sync copies and
supports local reads on the tables.

### Source VSchema

```json
{
"tables": {
"countries": {
"type": "reference"
}
}
}
```

### Target VSchema

Here in addition to the type, you need to specify the source of the reference table, used by vtgate to
route DML queries to the source keyspace.

```json
{
"tables": {
"countries": {
"type": "reference",
"source": "source.countries"
}
}
}
```

## Query Serving features

## Select Queries

Vitess optimizes query serving for reference tables. Since reference tables are present in every shard, Vitess ensures
that SELECT queries involving reference tables can be executed locally within each shard without needing to perform
lookups on the unsharded keyspace which hosts the reference table.

For example, running this query in a sharded keyspace with a reference table `countries` will be served locally by each
shard.

```sql
SELECT c.Name Country, sum(s.Total) SalesByCountry FROM countries c, sales s WHERE s.country_id = c.id GROUP BY c.Name;
```

### DML Queries

If a DML query for a reference table is executed on the target keyspace, vtgate will route the query to the source
keyspace. Example:

```sql
UPDATE countries SET name = 'The Netherlands' WHERE name = 'Netherlands'
```

Once this query is executed against the source table, the VReplication workflows (see below) will propagate the
changes to the corresponding reference tables in all other shards.

## Keeping Reference Tables in Sync

The [VReplication Materialize](https://vitess.io/docs/user-guides/migration/materialize/) workflow is the mechanism that
you can use to keep reference tables in sync with the source across all shards. An example of how to create such a
workflow is:

`Materialize --target-keyspace target --workflow ref1 create --source-keyspace source --reference-tables countries,currencies`

### Monitoring VReplication Lag

The reference table copies on the target are essentially caches of the source table which are synced near-realtime
using `Materialize` workflows. The `Materialize` workflows keep the reference tables in sync by using binlog
replication. If the load on the source and/or target is high, it is possible that there is a lag between the source
getting updated and those updates being propagated to the target.

You can monitor the lag by using `Workflow Show` on the workflows or the VTAdmin UI, and looking at the value
for the `max_replication_lag` in its output.