扩展 RediSearch

RediSearch 支持扩展机制，就像 Redis 支持模块一样。目前该 API 非常小，尚不支持在运行时动态加载扩展。相反，扩展必须用 C（或具有 C 接口的语言）编写并编译成将在运行时加载的动态库。

目前有两种扩展 API：

Query Expanders ，其作用是扩展查询标记（即词干分析器）。
Scoring Functions ，其作用是在查询时间内对搜索结果进行排名。

注册和加载扩展

扩展应该编译成 .so 文件，并在模块初始化时加载到 RediSearch 中。

编译

扩展应作为动态库进行编译和链接。可以在此处找到扩展的示例 Makefile 。

该文件夹还包含一个用于测试的示例扩展，可作为实现您自己的扩展的框架。

加载中

加载扩展是通过在加载 RediSearch 时附加 EXTLOAD {path/to/ext.so} 在 loadmodule 配置指令之后来完成的。例如：

sh $ redis-server --loadmodule ./redisearch.so EXTLOAD ./ext/my_extension.so

这会导致 RediSearch 自动加载扩展并注册其扩展器和记分器。

初始化扩展

扩展的入口点是一个具有以下签名的函数：

int RS_ExtensionInit(RSExtensionCtx *ctx);

加载扩展时，RediSearch 会查找此函数并调用它。该函数负责注册和初始化扩展器和记分器。

它应该在错误时返回 REDISEARCH_ERR 或在成功时返回 REDISEARCH_OK。

示例初始化函数

#include <redisearch.h> //must be in the include path

int RS_ExtensionInit(RSExtensionCtx *ctx) {

  /* Register  a scoring function with an alias my_scorer and no special private data and free function */
  if (ctx->RegisterScoringFunction("my_scorer", MyCustomScorer, NULL, NULL) == REDISEARCH_ERR) {
    return REDISEARCH_ERR;
  }

  /* Register a query expander  */
  if (ctx->RegisterQueryExpander("my_expander", MyExpander, NULL, NULL) ==
      REDISEARCH_ERR) {
    return REDISEARCH_ERR;
  }

  return REDISEARCH_OK;
}

调用您的自定义函数

执行查询时，您可以通过使用给定别名指定 SCORER 或 EXPANDER 参数来告诉 RediSearch 使用您的评分器或扩展器。例如：

FT.SEARCH my_index "foo bar" EXPANDER my_expander SCORER my_scorer

注意：扩展器和记分器别名区分大小写。

查询扩展器 API

目前，我们只支持基本查询扩展，一次一个令牌。扩展器可以决定使用任意数量的令牌扩展任何给定的令牌，这些令牌将在查询时进行联合合并。

扩展器的 API 如下：

#include <redisearch.h> //must be in the include path

void MyQueryExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
    ...
}

RSQueryExpanderCtx

RSQueryExpanderCtx 是包含扩展私有数据的上下文，以及用于扩展查询的回调方法。它被定义为：

typedef struct RSQueryExpanderCtx {

  /* Opaque query object used internally by the engine, and should not be accessed */
  struct RSQuery *query;

  /* Opaque query node object used internally by the engine, and should not be accessed */
  struct RSQueryNode **currentNode;

  /* Private data of the extension, set on extension initialization */
  void *privdata;

  /* The language of the query, defaults to "english" */
  const char *language;

  /* ExpandToken allows the user to add an expansion of the token in the query, that will be
   * union-merged with the given token in query time. str is the expanded string, len is its length,
   * and flags is a 32 bit flag mask that can be used by the extension to set private information on
   * the token */
  void (*ExpandToken)(struct RSQueryExpanderCtx *ctx, const char *str, size_t len,
                      RSTokenFlags flags);

  /* SetPayload allows the query expander to set GLOBAL payload on the query (not unique per token)
   */
  void (*SetPayload)(struct RSQueryExpanderCtx *ctx, RSPayload payload);

} RSQueryExpanderCtx;

RSToken

RSToken 表示要扩展的单个查询令牌，定义为：

/* A token in the query. The expanders receive query tokens and can expand the query with more query
 * tokens */
typedef struct {
  /* The token string - which may or may not be NULL terminated */
  const char *str;
  /* The token length */
  size_t len;

  /* 1 if the token is the result of query expansion */
  uint8_t expanded:1;

  /* Extension specific token flags that can be examined later by the scoring function */
  RSTokenFlags flags;
} RSToken;

评分函数API

评分函数接收由查询评估的每个文档，用于最终排名。它可以访问产生文档的所有查询词，以及有关文档的元数据，例如其先验分数、长度等。

由于评分函数是针对每个文档进行评估的，可能是数百万次，而且由于 redis 是单线程的，因此它必须尽可能快地运行并进行大量优化。

评分函数应用于每个潜在结果（每个文档），并使用以下签名实现：

double MyScoringFunction(RSScoringFunctionCtx *ctx, RSIndexResult *res,
                                RSDocumentMetadata *dmd, double minScore);

RSScoringFunctionCtx 是一个实现了一些辅助方法的上下文。

RSIndexResult 是结果信息 - 包含文档 ID、频率、术语和偏移量。

RSDocumentMetadata 是一个保存有关文档的全局信息的对象，例如其先验分数。

minSocre 是将产生与搜索相关的结果的最小分数。它可用于在我们开始之前停止处理中途。

函数的返回值是 double 表示结果的最终分数。返回 0 会导致对结果进行计数，但如果有分数大于 0 的结果，它们将出现在其上方。要完全过滤结果而不将其计入总数，记分员应返回特殊值 RS_SCORE_FILTEROUT （内部设置为负无穷大或 -1/0）。

RSScoringFunctionCtx

这是一个包含以下成员的对象：

void *privdata ：指向由扩展在初始化时设置的对象的指针。
RSPayload 负载：由查询扩展器或客户端设置的负载对象。
int GetSlop(RSIndexResult *res) ：一种回调方法，它产生查询词之间的总最小距离。这可用于首选“斜率”较小且术语彼此更接近的结果。

RSIndexResult

这是一个对象，其中包含有关索引中当前结果的信息，它是导致当前文档被视为有效结果的所有术语的集合。

有关详细信息，请参阅 redisearch.h

RSDocumentMetadata

这是一个描述与当前查询无关的全局信息的对象，该信息与评分函数正在评估的文档有关。

示例查询扩展器

此示例查询扩展器使用术语 foo 扩展每个标记：

#include <redisearch.h> //must be in the include path

void DummyExpander(RSQueryExpanderCtx *ctx, RSToken *token) {
    ctx->ExpandToken(ctx, strdup("foo"), strlen("foo"), 0x1337);  
}

评分函数示例

这是一个实际的评分函数，计算文档的 TF-IDF，乘以文档分数，然后除以斜率：

#include <redisearch.h> //must be in the include path

double TFIDFScorer(RSScoringFunctionCtx *ctx, RSIndexResult *h, RSDocumentMetadata *dmd,
                   double minScore) {
  // no need to evaluate documents with score 0 
  if (dmd->score == 0) return 0;

  // calculate sum(tf-idf) for each term in the result
  double tfidf = 0;
  for (int i = 0; i < h->numRecords; i++) {
    // take the term frequency and multiply by the term IDF, add that to the total
    tfidf += (float)h->records[i].freq * (h->records[i].term ? h->records[i].term->idf : 0);
  }
  // normalize by the maximal frequency of any term in the document   
  tfidf /=  (double)dmd->maxFreq;

  // multiply by the document score (between 0 and 1)
  tfidf *= dmd->score;

  // no need to factor the slop if tfidf is already below minimal score
  if (tfidf < minScore) {
    return 0;
  }

  // get the slop and divide the result by it, making sure we prefer results with closer terms
  tfidf /= (double)ctx->GetSlop(h);

  return tfidf;
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

9.Extension API.md

9.Extension API.md

扩展 RediSearch

注册和加载扩展

初始化扩展

示例初始化函数

调用您的自定义函数

查询扩展器 API

RSQueryExpanderCtx

RSToken

评分函数API

RSScoringFunctionCtx

RSIndexResult

RSDocumentMetadata

示例查询扩展器

评分函数示例

Files

9.Extension API.md

Latest commit

History

9.Extension API.md

File metadata and controls

扩展 RediSearch

注册和加载扩展

初始化扩展

示例初始化函数

调用您的自定义函数

查询扩展器 API

RSQueryExpanderCtx

RSToken

评分函数API

RSScoringFunctionCtx

RSIndexResult

RSDocumentMetadata

示例查询扩展器

评分函数示例