Commit

Merge branch 'main' into fix_restart
small-turtle-1 authored Dec 11, 2024
2 parents f6812f8 + 527b970 commit c6994ad
Showing 16 changed files with 275 additions and 2,251 deletions.
3 changes: 2 additions & 1 deletion benchmark/local_infinity/infinity_benchmark.cpp
@@ -226,7 +226,8 @@ int main() {
 output_columns,
 nullptr,
 nullptr,
-nullptr);
+nullptr,
+false);
 });
 results.push_back(fmt::format("-> Select QPS: {}", total_times / tims_costing_second));
 }
5 changes: 1 addition & 4 deletions docs/guides/search_guide.md
@@ -103,13 +103,10 @@ Example: `"blooms"`
 #### AND multiple terms
 
 - `"space AND efficient"`
-- `"space && efficient"`
-- `"space + efficient"`
 
 #### OR multiple terms
 
 - `"Bloom OR filter"`
-- `"Bloom || filter"`
 - `"Bloom filter"`
 
 :::tip NOTE
@@ -135,7 +132,7 @@ Example: `"title:(quick OR brown) AND body:foobar"`
 
 #### Escape character
 
-Use `\` to escape reserved characters like `:` `~` `(` `)` `"` `+` `-` `=` `&` `|` `[` `]` `{` `}` `*` `?` `\` `/`. For example: `"space\-efficient"`.
+Use `\` to escape reserved characters like ` ` `(` `)` `^` `"` `'` `~` `*` `?` `:` `\`. For example: `"space\:efficient"`.
 
 ### Scoring
 
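
The updated guide shrinks the escape list to eleven characters: space, `(`, `)`, `^`, `"`, `'`, `~`, `*`, `?`, `:` and `\`. The hedged sketch below shows how a client might escape raw user input before passing it as `matching_text`. `EscapeSearchText` is a hypothetical helper, not an Infinity API. Its reserved set is copied from the new guide line.

```cpp
#include <string>
#include <string_view>

// Reserved characters as listed in the updated search guide (assumption:
// this list is complete for the version of the parser in this commit).
constexpr std::string_view kReserved = " ()^\"'~*?:\\";

// Hypothetical helper: prefix every reserved character with a backslash so
// the full-text query parser treats it literally.
std::string EscapeSearchText(std::string_view text) {
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character escaped
    for (char c : text) {
        if (kReserved.find(c) != std::string_view::npos) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}
```

Because `-` is no longer in the list, this sketch passes `space-efficient` through unchanged, while `space:efficient` becomes `space\:efficient`.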
16 changes: 8 additions & 8 deletions docs/references/http_api_reference.mdx
@@ -1912,10 +1912,10 @@ curl --request GET \
 A non-empty text string to search for. Used *only* when `"match_method"` is set to `"text"`.
 You can use various search options within the matching text, including:
 - Single terms: `"blooms"`
-- OR multiple terms: `"Bloom OR filter"`, `"Bloom || filter"` or just `"Bloom filter"`
+- OR multiple terms: `"Bloom OR filter"` or just `"Bloom filter"`
 - Phrase search: `'"Bloom filter"'`
-- AND multiple terms: `"space AND efficient"`, `"space && efficient"` or `"space + efficient"`
-- Escaping reserved characters: `"space\-efficient"`
+- AND multiple terms: `"space AND efficient"`
+- Escaping reserved characters: `"space\:efficient"`
 - Sloppy phrase search: `'"harmful chemical"~10'`
 - Field-specific search: `"title:(quick OR brown) AND body:foobar"`
 - `element_type`: `str`, *Required*
@@ -1979,17 +1979,17 @@ curl --request GET \
 - If `"fields"` is an empty string, this parameter specifies the default field to search on.
 - **"operator"**: `str`, *Optional*
 - If not specified, the search follows Infinity's full-text search syntax, meaning that logical and arithmetic operators, quotation marks and escape characters will function as full-text search operators, such as:
-- AND operator: `AND`, `&&`, `+`
-- OR operator: `OR`, `||`
-- NOT operator: `NOT`, `!`, `-`
+- AND operator: `AND`
+- OR operator: `OR`
+- NOT operator: `NOT`
 - PAREN operator: `(`, `)`, need to appear in pairs, and can be nested.
 - COLON operator: `:`: Used to specify field-specific search, e.g., `body:foobar` searches for `foobar` in the `body` field.
 - CARAT operator: `^`: Used to boost the importance of a term, e.g., `quick^2 brown` boosts the importance of `quick` by a factor of 2, making it twice as important as `brown`.
 - TILDE operator: `~`: Used for sloppy phrase search, e.g., `"harmful chemical"~10` searches for the phrase `"harmful chemical"` within a tolerable distance of 10 words.
 - SINGLE_QUOTED_STRING: Used to search for a phrase, e.g., `'Bloom filter'`.
 - DOUBLE_QUOTED_STRING: Used to search for a phrase, e.g., `"Bloom filter"`.
-- Escape characters: Used to escape reserved characters, e.g., `space\-efficient`. Starting with a backslash `\` will escape the following characters:
-`' '`, `'+'`, `'-'`, `'='`, `'&'`, `'|'`, `'!'`, `'('`, `')'`, `'{'`, `'}'`, `'['`, `']'`, `'^'`, `'"'`, `'~'`, `'*'`, `'?'`, `':'`, `'\'`, `'/'`
+- Escape characters: Used to escape reserved characters, e.g., `space\:efficient`. Starting with a backslash `\` will escape the following characters:
+`' '`, `'('`, `')'`, `'^'`, `'"'`, `'\''`, `'~'`, `'*'`, `'?'`, `':'`, `'\\'`
 - If specified, Infinity's full-text search syntax will not take effect, and the specified operator will be interpolated into `matching_text`.
 Useful for searching text including code numbers like `"A01-233:BC"`.
 - `{"operator": "or"}`: Interpolates the `OR` operator between words in `matching_text` to create a new search text.
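
When `"operator"` is set, the reference says Infinity bypasses the full-text syntax and interpolates the operator between the words of `matching_text`. The sketch below models only that string transformation and assumes words are separated by whitespace. `InterpolateOperator` is a hypothetical name. The sketch does not model how Infinity then tokenizes each word.

```cpp
#include <sstream>
#include <string>

// Hypothetical illustration of {"operator": "or"}: split the raw text on
// whitespace and join the words with the given operator. Characters such as
// '-' and ':' stay inside their word; nothing is escaped or parsed here.
std::string InterpolateOperator(const std::string &matching_text, const std::string &op) {
    std::istringstream is(matching_text);
    std::string word;
    std::string result;
    while (is >> word) {
        if (!result.empty()) {
            result += ' ';
            result += op;
            result += ' ';
        }
        result += word;
    }
    return result;
}
```

For example, `InterpolateOperator("A01-233:BC quick fox", "OR")` yields `A01-233:BC OR quick OR fox`.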
16 changes: 8 additions & 8 deletions docs/references/pysdk_api_reference.md
@@ -2033,10 +2033,10 @@ To display your query results, you must chain this method with `output(columns)`
 A non-empty text string to search for. You can use various search options within the matching text, including:
 
 - Single terms: `"blooms"`
-- OR multiple terms: `"Bloom OR filter"`, `"Bloom || filter"` or just `"Bloom filter"`
+- OR multiple terms: `"Bloom OR filter"` or just `"Bloom filter"`
 - Phrase search: `'"Bloom filter"'`
-- AND multiple terms: `"space AND efficient"`, `"space && efficient"` or `"space + efficient"`
-- Escaping reserved characters: `"space\-efficient"`
+- AND multiple terms: `"space AND efficient"`
+- Escaping reserved characters: `"space\:efficient"`
 - Sloppy phrase search: `'"harmful chemical"~10'`
 - Field-specific search: `"title:(quick OR brown) AND body:foobar"`
 
@@ -2052,17 +2052,17 @@ An optional dictionary specifying the following search options:
 - If `"fields"` is an empty string, this parameter specifies the default field to search on.
 - **"operator"**: `str`, *Optional*
 - If not specified, the search follows Infinity's full-text search syntax, meaning that logical and arithmetic operators, quotation marks and escape characters will function as full-text search operators, such as:
-- AND operator: `AND`, `&&`, `+`
-- OR operator: `OR`, `||`
-- NOT operator: `NOT`, `!`, `-`
+- AND operator: `AND`
+- OR operator: `OR`
+- NOT operator: `NOT`
 - PAREN operator: `(`, `)`, need to appear in pairs, and can be nested.
 - COLON operator: `:`: Used to specify field-specific search, e.g., `body:foobar` searches for `foobar` in the `body` field.
 - CARAT operator: `^`: Used to boost the importance of a term, e.g., `quick^2 brown` boosts the importance of `quick` by a factor of 2, making it twice as important as `brown`.
 - TILDE operator: `~`: Used for sloppy phrase search, e.g., `"harmful chemical"~10` searches for the phrase `"harmful chemical"` within a tolerable distance of 10 words.
 - SINGLE_QUOTED_STRING: Used to search for a phrase, e.g., `'Bloom filter'`.
 - DOUBLE_QUOTED_STRING: Used to search for a phrase, e.g., `"Bloom filter"`.
-- Escape characters: Used to escape reserved characters, e.g., `space\-efficient`. Starting with a backslash `\` will escape the following characters:
-`' '`, `'+'`, `'-'`, `'='`, `'&'`, `'|'`, `'!'`, `'('`, `')'`, `'{'`, `'}'`, `'['`, `']'`, `'^'`, `'"'`, `'~'`, `'*'`, `'?'`, `':'`, `'\'`, `'/'`
+- Escape characters: Used to escape reserved characters, e.g., `space\:efficient`. Starting with a backslash `\` will escape the following characters:
+`' '`, `'('`, `')'`, `'^'`, `'"'`, `'\''`, `'~'`, `'*'`, `'?'`, `':'`, `'\\'`
 - If specified, Infinity's full-text search syntax will not take effect, and the specified operator will be interpolated into `matching_text`.
 Useful for searching text including code numbers like `"A01-233:BC"`.
 - `{"operator": "or"}`: Interpolates the `OR` operator between words in `matching_text` to create a new search text.
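
After this change, the documented logical operators are only the words `AND`, `OR` and `NOT`. The tokenizer below is a hypothetical, simplified sketch and not Infinity's flex-based search lexer. It splits on unescaped whitespace and treats a backslash as making the next character literal. Only those three words, when unescaped, are classified as operators. Parentheses, quotes, `:`, `^` and `~` are deliberately not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Term, And, Or, Not };

struct Token {
    Kind kind;
    std::string text;
};

// Hypothetical sketch: classify whitespace-separated words, honoring '\'
// escapes. Symbols such as "&&", "||", "+", "-" and "!" are plain terms here.
std::vector<Token> Tokenize(const std::string &query) {
    std::vector<Token> out;
    std::string cur;
    bool escaped_in_cur = false;  // an escaped word is never an operator
    auto flush = [&] {
        if (cur.empty()) return;
        Kind k = Kind::Term;
        if (!escaped_in_cur) {
            if (cur == "AND") k = Kind::And;
            else if (cur == "OR") k = Kind::Or;
            else if (cur == "NOT") k = Kind::Not;
        }
        out.push_back({k, cur});
        cur.clear();
        escaped_in_cur = false;
    };
    for (std::size_t i = 0; i < query.size(); ++i) {
        char c = query[i];
        if (c == '\\' && i + 1 < query.size()) {
            cur.push_back(query[++i]);  // keep the escaped character literally
            escaped_in_cur = true;
        } else if (c == ' ' || c == '\t' || c == '\n') {
            flush();
        } else {
            cur.push_back(c);
        }
    }
    flush();
    return out;
}
```

Under this sketch, `"space && efficient"` yields three terms, with `&&` as an ordinary term rather than an AND operator. `Bloom\ filter` yields the single term `Bloom filter`.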
7 changes: 5 additions & 2 deletions src/common/analyzer/analyzer_pool.cpp
@@ -32,7 +32,7 @@ import korean_analyzer;
 import standard_analyzer;
 import ngram_analyzer;
 import rag_analyzer;
-import keyword_analyzer;
+import whitespace_analyzer;
 import ik_analyzer;
 import logger;
 
@@ -322,7 +322,10 @@ Tuple<UniquePtr<Analyzer>, Status> AnalyzerPool::GetAnalyzer(const std::string_v
 return {MakeUnique<NGramAnalyzer>(ngram), Status::OK()};
 }
 case Str2Int(KEYWORD.data()): {
-return {MakeUnique<KeywordAnalyzer>(), Status::OK()};
+return {MakeUnique<WhitespaceAnalyzer>(), Status::OK()};
+}
+case Str2Int(WHITESPACE.data()): {
+return {MakeUnique<WhitespaceAnalyzer>(), Status::OK()};
 }
 default: {
 if(std::filesystem::is_regular_file(name)) {
1 change: 1 addition & 0 deletions src/common/analyzer/analyzer_pool.cppm
@@ -42,6 +42,7 @@ public:
 static constexpr std::string_view RAG = "rag";
 static constexpr std::string_view IK = "ik";
 static constexpr std::string_view KEYWORD = "keyword";
+static constexpr std::string_view WHITESPACE = "whitespace";
 
 private:
 CacheType cache_{};
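
With these two changes, `AnalyzerPool::GetAnalyzer` resolves both `"keyword"` and `"whitespace"` to `WhitespaceAnalyzer` through a `switch` on hashed names. The standalone sketch below reproduces that dispatch pattern. `Str2Int` here is a hypothetical constexpr FNV-1a hash standing in for Infinity's own helper, whose definition is not part of this diff. `Analyzer` and `WhitespaceAnalyzer` are minimal stand-ins rather than the real classes.

```cpp
#include <cstdint>
#include <memory>
#include <string_view>

// Hypothetical stand-in for Infinity's Str2Int: a constexpr 64-bit FNV-1a
// hash, which lets analyzer names be used as `case` labels.
constexpr std::uint64_t Str2Int(std::string_view s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ULL;
    }
    return h;
}

// Minimal stand-ins for the real analyzer hierarchy.
struct Analyzer {
    virtual ~Analyzer() = default;
    virtual std::string_view Name() const = 0;
};

struct WhitespaceAnalyzer : Analyzer {
    std::string_view Name() const override { return "whitespace"; }
};

constexpr std::string_view KEYWORD = "keyword";
constexpr std::string_view WHITESPACE = "whitespace";

std::unique_ptr<Analyzer> GetAnalyzer(std::string_view name) {
    switch (Str2Int(name)) {
        case Str2Int(KEYWORD):     // "keyword" still resolves, as an alias
        case Str2Int(WHITESPACE):
            // A hash match alone could be a collision, so confirm the name.
            if (name == KEYWORD || name == WHITESPACE) {
                return std::make_unique<WhitespaceAnalyzer>();
            }
            return nullptr;
        default:
            return nullptr;  // unknown analyzer name
    }
}
```

Letting the two `case` labels fall through to one body expresses the alias without duplicating the `return` statement, as the diff does.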
@@ -16,15 +16,14 @@ module;
 
 #include <sstream>
 #include <string>
-module keyword_analyzer;
-
+module whitespace_analyzer;
 import stl;
 import term;
 import analyzer;
 
 namespace infinity {
 
-int KeywordAnalyzer::AnalyzeImpl(const Term &input, void *data, HookType func) {
+int WhitespaceAnalyzer::AnalyzeImpl(const Term &input, void *data, HookType func) {
 std::istringstream is(input.text_);
 std::string t;
 u32 offset = 0;
@@ -1,4 +1,4 @@
-// Copyright(C) 2023 InfiniFlow, Inc. All rights reserved.
+// Copyright(C) 2024 InfiniFlow, Inc. All rights reserved.
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
@@ -14,16 +14,17 @@
 
 module;
 
-export module keyword_analyzer;
+export module whitespace_analyzer;
 import stl;
 import term;
 import analyzer;
 
 namespace infinity {
-export class KeywordAnalyzer : public Analyzer {
+
+export class WhitespaceAnalyzer : public Analyzer {
 public:
-KeywordAnalyzer() = default;
-~KeywordAnalyzer() override = default;
+WhitespaceAnalyzer() = default;
+~WhitespaceAnalyzer() override = default;
 
 protected:
 int AnalyzeImpl(const Term &input, void *data, HookType func) override;
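
The renamed `WhitespaceAnalyzer::AnalyzeImpl` reads tokens from a `std::istringstream` over the input text. The standalone sketch below covers only that splitting step. Two details are assumptions, because the rest of `AnalyzeImpl` and Infinity's `HookType` are not shown in this diff: the callback type `TokenHook`, and treating the counter as a 0-based token position.

```cpp
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback type; Infinity's real HookType is not shown in this diff.
using TokenHook = std::function<void(const std::string &token, std::uint32_t position)>;

// Split `text` on runs of whitespace, as `std::istringstream >> std::string`
// does, and report each token with its 0-based position. Punctuation such as
// '-' or ':' stays inside the token. Returns the number of tokens emitted.
int AnalyzeWhitespace(const std::string &text, const TokenHook &hook) {
    std::istringstream is(text);
    std::string t;
    std::uint32_t position = 0;
    while (is >> t) {
        hook(t, position++);
    }
    return static_cast<int>(position);
}

struct Token {
    std::string text;
    std::uint32_t position;
};

// Convenience wrapper that collects the emitted tokens into a vector.
std::vector<Token> Collect(const std::string &text) {
    std::vector<Token> out;
    AnalyzeWhitespace(text, [&out](const std::string &tok, std::uint32_t pos) {
        out.push_back({tok, pos});
    });
    return out;
}
```

Because only whitespace separates tokens, a code number such as `A01-233:BC` survives as one token under this sketch.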
1 change: 0 additions & 1 deletion src/parser/generate_parser.sh
@@ -5,5 +5,4 @@ flex -di --reentrant --bison-bridge --bison-location -Cem -oexpression_lexer.cpp
 bison -oexpression_parser.cpp --header=expression_parser.h expression_parser.y -Wcounterexamples -d -v
 
 flex -+dvB8 -Cem -osearch_lexer.cpp search_lexer.l
-flex -+dvB8 -Cem -osearch_lexer_plain.cpp search_lexer_plain.l
 bison -osearch_parser.cpp --header=search_parser.h search_parser.y -Wcounterexamples -d -v