Skip to content

Commit

Permalink
docs: search dsl design
Browse files Browse the repository at this point in the history
  • Loading branch information
panshuai111 committed Oct 17, 2023
1 parent 46b4f7f commit 09bbf1b
Showing 1 changed file with 135 additions and 0 deletions.
135 changes: 135 additions & 0 deletions docs/design_docs/search_language.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# Abstract

This article presents two search languages designed to retrieve the resources, and compares their advantages and disadvantages.

# Motivation

Enabling the retrieval of resources from multiple clusters is a crucial functionality of Karbour. However, the current search feature is relatively basic, only supporting keyword and content searches. To provide a more comprehensive search capability, it is necessary to design a dedicated DSL for searching.

# Proposal

## Dos

* User-friendliness: Minimize the learning and usage costs for users. Most search statements should have a simple structure and be achievable in a single line.
* Rich functionality: Support searches with multiple keywords and content, and provide operations such as AND, OR, NOT, and regular expressions.
* Maintain versatility: Strive to incorporate industry best practices as much as possible, avoiding reinventing the wheel.

## Don'ts

* Focus solely on syntax design without providing specific implementations.
* Omit discussions on how the matching results should be displayed.

## Design

## Overall Process

The overall process is as shown in the following diagram:
1. The user enters the Search DSL on the search interface
2. The search interface calls the Search API
3. The backend converts the Search API into an internal query format
4. The storage layer converts it into a specific query statement (e.g., Elasticsearch query).

```mermaid
flowchart TB
subgraph Karbour
subgraph SearchAPI
DSL[Search DSL] --> Convert[DSL Converter: DSL to Query]
end
subgraph StorageLayer
Query[Query Converter: Query to ES/ZinC/... DSL]
end
Dashboard --> SearchAPI[Search API]
SearchAPI[Search API] --> StorageLayer[Storage layer]
end
StorageLayer[Storage layer] --> C[Search Engine 1]
StorageLayer[Storage layer] --> D[Search Engine 2]
StorageLayer[Storage layer] --> E[Search Engine 3]
style DSL stroke-width:4px
```

## Solution Design

This section describes the syntax and description of the search patterns that Karbour plans to support for retrieving resource content.

### Solution 1: Using DSL for Search

#### Content Search

Content search is used to match against the content of resources. It uses `"..."` for exact matching and `/.../` for regular expression matching.

| Syntax | Description |
|---------------|------------------------------------------------|
| `foo` | Match the string `foo` exactly |
| `"foo bar"` | Match the string `foo bar` exactly |
| `/foo.*bar/` | Match regular expression `foo.*bar` |
| `foo AND bar` | Match resources containing both `foo` and `bar` |

#### Keyword search

Keyword search is used to match against keywords.

| Syntax | Description | Example |
|------------------------|----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| `key:"pattern"` | Matching resources with `key` as `pattern`. | `cluster:"foo"` is used to match resources with `cluster` equals to `foo` using exact matching. |
| `key:/regexp-pattern/` | Matching resources with `key` matching the regular expression pattern `regexp-pattern` | `cluster:"foo*"` is used to match resources with `cluster` matching the regular expression pattern `foo*` |

Supporting keywords: `cluster`, `apiVersion`, `kind`, `namespace`, `name`, `label`, `annotation`, `content`.

#### Boolean operators

Multiple expressions can be connected using boolean operators.

| Operator | Description | Example |
|-----------------|-------------------------------------------------------------------|---------------------------------------------------------|
| `NOT exp1` | Excluding resources matched by `exp`. | `NOT cluster:"cluster1"` |
| `exp1 AND exp2` | Selecting resources that simultaneously match `exp1` and `exp2`. | `cluster:"cluster1" AND apiVersion:"v1"` |
| `exp1 OR exp2` | Selecting resources that match either `exp1` or `exp2`. | `cluster:"cluster1" OR cluster:"cluster2"` |
| `(` `)` | Modifying the association order of expressions using parentheses. | `(name:"name1" OR name:"name2") AND cluster:"cluster1"` |

### Solution 2: Using DSL for Search

#### Synopsis

```sql
SELECT [TOP [ count ] ] select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
```

#### Mapping concepts across SQL and Elasticsearch

| SQL | Elasticsearch | Description |
|--------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| column | field | In both cases, at the lowest level, data is stored in named entries, of a variety of data types, containing one value. SQL calls such an entry a column while Elasticsearch a field. Notice that in Elasticsearch a field can contain multiple values of the same type (essentially a list) while in SQL, a column can contain exactly one value of said type. Elasticsearch SQL will do its best to preserve the SQL semantic and, depending on the query, reject those that return fields with more than one value. |
| row | document | Columns and fields do not exist by themselves; they are part of a row or a document. The two have slightly different semantics: a row tends to be strict (and have more enforcements) while a document tends to be a bit more flexible or loose (while still having a structure). |
| table | index | The target against which queries, whether in SQL or Elasticsearch get executed against. |

support table: `resources`

support column: `content`,`cluster`,`apiVersion`,`group`,`namespace`,`name`,`label`,`annotation`

example:

```sql
SELECT * from resources WHERE cluster=tes1
SELECT * from resources WHERE content CONTAINS 'test1'
```

## Solution Comparison

| Solution | Advantages | Disadvantages |
|----------|------------------------------------------------|-------------------------------------------------------|
| DSL | simple searches are easier to get started with | complex searches require learning the relevant syntax |
| SQL | no need to learn an additional search language | writing simple searches can sometimes be more verbose |

# Reference
* [Search query syntax - Sourcegraph docs](https://docs.sourcegraph.com/code_search/reference/queries)
* [Understanding GitHub Code Search syntax - GitHub Docs](https://docs.github.com/en/search-github/github-code-search/understanding-github-code-search-syntax)
* [Query DSL | Elasticsearch Guide [8.10] | Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html)
* [Basic Editing in Visual Studio Code](https://code.visualstudio.com/docs/editor/codebasics#_advanced-search-options)
* [SQL | Elasticsearch Guide [8.10] | Elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-sql.html)

0 comments on commit 09bbf1b

Please sign in to comment.