Commit

update README
Signed-off-by: Andres Taylor <[email protected]>
systay committed Nov 26, 2024
1 parent 3d8bf2b commit 136969f
Showing 1 changed file with 100 additions and 38 deletions.
138 changes: 100 additions & 38 deletions go/transactions/README.md
@@ -1,11 +1,12 @@
# VT Transactions

The vt transactions command is a sub-command of the vt toolset, designed to analyze query logs, identify transaction patterns, and produce a JSON report summarizing these patterns.
This tool is particularly useful for understanding complex transaction behaviors, optimizing database performance, choosing sharding strategy, and auditing transactional queries.
The `vt transactions` command is a sub-command of the `vt` toolset, designed to analyze query logs, identify transaction patterns, and produce a JSON report summarizing these patterns. This tool is particularly useful for understanding complex transaction behaviors, optimizing database performance, choosing a sharding strategy, and auditing transactional queries.

Note: The JSON output generated by `vt transactions` is primarily intended for consumption by the `vt summarize` tool, which can aggregate multiple analysis reports into a human-readable summary.

## Usage

The basic usage of vt transactions is:
The basic usage of `vt transactions` is:

```bash
vt transactions querylog.log > report.json
@@ -27,60 +28,121 @@ The output JSON file contains an array of transaction patterns, each summarizing

```json
{
"query-signatures": [
"update pos_reports where id = :0 set `csv`, `error`, intraday, pos_type, ...",
"update pos_date_requests where cache_key = :1 set cache_value"
],
"predicates": [
"pos_date_requests.cache_key = ?",
"pos_reports.id = ?"
],
"count": 223
"fileType": "transactions",
"signatures": [
{
"count": 2,
"query-signatures": [
{
"op": "update",
"affected_table": "tblA",
"updated_columns": [
"apa"
],
"predicates": [
{
"table": "tblA",
"col": "foo",
"op": 0,
"val": 0
},
{
"table": "tblA",
"col": "id",
"op": 0,
"val": -1
}
]
},
{
"op": "update",
"affected_table": "tblB",
"updated_columns": [
"monkey"
],
"predicates": [
{
"table": "tblB",
"col": "bar",
"op": 0,
"val": 0
},
{
"table": "tblB",
"col": "id",
"op": 0,
"val": -1
}
]
}
]
}
]
}
```

### Fields Explanation

* query-signatures: An array of generalized query patterns involved in the transaction. Placeholders like :0, :1, etc., represent variables in the queries.
* predicates: An array of predicates (conditions) extracted from the queries, generalized to identify patterns.
* count: The number of times this transaction pattern was observed in the logs.
The JSON output from `vt transactions` is structured to represent patterns of transactions found in your query logs. Here’s a breakdown of each field:

#### Top-Level Fields

* fileType: Indicates the type of the file. For outputs from `vt transactions`, this will be "transactions".
* signatures: An array where each element represents a unique transaction pattern detected in the logs.

#### Inside Each Signature

Each element in the signatures array is an object that summarizes a specific transaction pattern. It contains the following fields:
* count: The number of times this transaction pattern was observed.
* query-signatures: An array of queries that are part of this transaction pattern. Each query is represented in a generalized form to abstract away specific values and focus on the structure and relationships.

### Understanding predicates
#### Inside Each Query Signature

The predicates array lists the conditions used in the transactional queries, with variables generalized for pattern recognition.
* Shared Variables: If the same variable is used across different predicates within a transaction, it is assigned a numerical placeholder (e.g., 0, 1, 2). This indicates that the same variable or value is used in these predicates.
* Unique Variables: Variables that are unique to a single predicate are represented with a ?.
Each object in the query-signatures array represents a generalized query and includes:
* op: The operation type (e.g., "insert", "update", "delete").
* affected_table: The table affected by the query.
* updated_columns: (Only for update operations) An array of column names that are updated by the query.
* predicates: An array of conditions (also known as predicates) used in the query’s WHERE clause. Each predicate abstracts the condition to focus on the pattern rather than specific values. Not all predicates are included in the query signature; only those that the planner could use to determine whether the transaction is single-shard or distributed.

#### Inside Each Predicate

Each predicate object in the predicates array includes:
* table: The name of the table referenced in the condition.
* col: The column name used in the condition.
* op: A code representing the comparison operator used in the condition. For example:
- 0 might represent the "=" operator.
- Other numbers might represent different operators like <, >, LIKE, etc.
* val: A generalized placeholder that stands in for the specific value compared in the condition. Identical placeholders across different predicates suggest that the same variable or parameter is used. The special value -1 indicates a value used only by this predicate.
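
The field layout described above can be loaded into typed objects for further analysis. The following is a minimal Python sketch, not part of the `vt` toolset: the class names, `parse_report`, and the `SAMPLE` document are all hypothetical and only mirror the fields documented in this README.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Predicate:
    table: str
    col: str
    op: int   # operator code; the code-to-operator mapping is not specified here
    val: int  # shared placeholder number, or -1 for a value unique to this predicate


@dataclass
class QuerySignature:
    op: str
    affected_table: str
    updated_columns: list = field(default_factory=list)
    predicates: list = field(default_factory=list)


@dataclass
class TransactionSignature:
    count: int
    queries: list


def parse_report(text):
    """Parse the JSON text of a report into a list of TransactionSignature."""
    doc = json.loads(text)
    if doc.get("fileType") != "transactions":
        raise ValueError("not a vt transactions report")
    result = []
    for sig in doc.get("signatures", []):
        queries = []
        for q in sig.get("query-signatures", []):
            preds = [
                Predicate(p["table"], p["col"], p["op"], p["val"])
                for p in q.get("predicates", [])
            ]
            queries.append(
                QuerySignature(
                    op=q["op"],
                    affected_table=q["affected_table"],
                    # updated_columns is only present for update operations
                    updated_columns=q.get("updated_columns", []),
                    predicates=preds,
                )
            )
        result.append(TransactionSignature(count=sig["count"], queries=queries))
    return result


# Hypothetical sample, trimmed from the example report in this README.
SAMPLE = """
{
  "fileType": "transactions",
  "signatures": [
    {
      "count": 2,
      "query-signatures": [
        {
          "op": "update",
          "affected_table": "tblA",
          "updated_columns": ["apa"],
          "predicates": [
            {"table": "tblA", "col": "foo", "op": 0, "val": 0},
            {"table": "tblA", "col": "id", "op": 0, "val": -1}
          ]
        }
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    for sig in parse_report(SAMPLE):
        tables = [q.affected_table for q in sig.queries]
        print(sig.count, tables)
```

In practice the text would come from the file written by `vt transactions querylog.log > report.json`; the inline sample is used here only to keep the sketch self-contained.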

### Example Explained

Consider the following predicates array:

```json
{
"predicates": [
"timesheets.day = ?",
"timesheets.craft_id = ?",
"timesheets.store_id = ?",
"dailies.day = 0",
"dailies.craft_id = 1",
"dailies.store_id = 2",
"tickets.day = 0",
"tickets.craft_id = 1",
"tickets.store_id = 2"
]
}
"predicates": [
{
"table": "tblA",
"col": "foo",
"op": 0,
"val": 0
},
{
"table": "tblA",
"col": "id",
"op": 0,
"val": -1
}
]
```

* Shared Values: Predicates with the same value across different conditions are assigned numerical placeholders (0, 1, 2), indicating that the same variable or value is used in these predicates.
* For example, `dailies.craft_id = 1` and `tickets.craft_id = 1` share the same variable or value (represented as 1).
* Unique Values: Predicates used only once are represented with ?, indicating a unique or less significant variable in the pattern.
* For example, `timesheets.day = ?` represents a unique value for day.
* The first predicate represents a condition on tblA.foo, using the operator code 0 (e.g., "="), with a generalized value 0.
* The second predicate represents a condition on tblA.id, also using the operator code 0, with a generalized value -1. This means the value was used only by this predicate and is not shared with any other predicate in the transaction.

This numbering helps identify the relationships between different predicates in the transaction patterns and can be used to optimize queries or understand transaction scopes.
This numbering helps identify the relationships between different predicates in the transaction patterns and can be used to help guide choices in sharding strategies.
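
One way to put the numbering to work is to group predicates by their `val` placeholder and list the columns that are compared against the same value within a transaction. The Python sketch below is an illustration under assumptions: `shared_columns` is a hypothetical helper, not something `vt` provides, and it does not filter on `op` because the operator codes are not fully specified here.

```python
from collections import defaultdict

UNIQUE = -1  # per the field description: a value used only by one predicate


def shared_columns(query_signatures):
    """Map each shared placeholder to the columns compared against it.

    Only placeholders that appear on more than one distinct column are kept.
    """
    groups = defaultdict(set)
    for query in query_signatures:
        for pred in query.get("predicates", []):
            if pred["val"] != UNIQUE:
                groups[pred["val"]].add(f'{pred["table"]}.{pred["col"]}')
    return {val: sorted(cols) for val, cols in groups.items() if len(cols) > 1}


# The two query signatures from the example report in this README.
example = [
    {
        "op": "update",
        "affected_table": "tblA",
        "updated_columns": ["apa"],
        "predicates": [
            {"table": "tblA", "col": "foo", "op": 0, "val": 0},
            {"table": "tblA", "col": "id", "op": 0, "val": -1},
        ],
    },
    {
        "op": "update",
        "affected_table": "tblB",
        "updated_columns": ["monkey"],
        "predicates": [
            {"table": "tblB", "col": "bar", "op": 0, "val": 0},
            {"table": "tblB", "col": "id", "op": 0, "val": -1},
        ],
    },
]

if __name__ == "__main__":
    print(shared_columns(example))  # {0: ['tblA.foo', 'tblB.bar']}
```

Here `tblA.foo` and `tblB.bar` are compared against the same value, so they may be worth considering together when evaluating sharding keys; whether they actually make good keys depends on the schema and workload.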

## Practical Use Cases

* Optimization: Identify frequently occurring transactions to optimize database performance.
* Sharding Strategy: When implementing horizontal sharding, it’s crucial to ensure that as many transactions as possible are confined to a single shard. The insights from vt transactions can help in choosing appropriate sharding keys for your tables to achieve this.
* Sharding Strategy: When implementing horizontal sharding, it’s crucial to ensure that as many transactions as possible are confined to a single shard. The insights from `vt transactions` can help in choosing appropriate sharding keys for your tables to achieve this.
* Audit: Analyze transactional patterns for security audits or compliance checks.
* Debugging: Understand complex transaction behaviors during development or troubleshooting.
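
For the optimization use case, a simple starting point is to rank transaction patterns by their `count`. The following Python sketch assumes the report has already been loaded into a dictionary (for example with `json.load`); `top_patterns` and the `report` data are hypothetical and exist only for illustration.

```python
def top_patterns(report, n=5):
    """Return the n most frequent patterns as (count, [\"op table\", ...]) tuples."""
    signatures = sorted(
        report.get("signatures", []),
        key=lambda sig: sig["count"],
        reverse=True,
    )
    return [
        (
            sig["count"],
            [f'{q["op"]} {q["affected_table"]}' for q in sig["query-signatures"]],
        )
        for sig in signatures[:n]
    ]


# Hypothetical report data; table names and counts are made up.
report = {
    "fileType": "transactions",
    "signatures": [
        {
            "count": 2,
            "query-signatures": [
                {"op": "update", "affected_table": "tblA", "predicates": []},
                {"op": "update", "affected_table": "tblB", "predicates": []},
            ],
        },
        {
            "count": 7,
            "query-signatures": [
                {"op": "delete", "affected_table": "tblC", "predicates": []},
            ],
        },
    ],
}

if __name__ == "__main__":
    for count, queries in top_patterns(report):
        print(count, queries)
```

Sorting by `count` only shows how often a pattern occurs; it says nothing about how expensive each transaction is, so it is best treated as a first filter.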
