From 00c8782c2fd811700cca71ceab17b82dc2c6a83b Mon Sep 17 00:00:00 2001 From: Shuyou Date: Thu, 15 Aug 2024 10:58:51 +0800 Subject: [PATCH] feat: support i18n --- .github/workflows/ci-i18n.yml | 57 +++++++++++++ site/en/about/comparison.md | 2 +- site/en/about/roadmap.md | 32 +++---- site/en/adminGuide/deploy_etcd.md | 2 +- site/en/adminGuide/deploy_pulsar.md | 2 +- site/en/adminGuide/deploy_s3.md | 2 +- site/en/adminGuide/rbac.md | 12 +-- site/en/getstarted/quickstart.md | 83 ++++++++++--------- .../install_cluster-helm-gpu.md | 2 +- .../run-milvus-k8s/install_cluster-helm.md | 2 +- site/en/home/home.md | 12 +-- site/en/migrate/es2m.md | 2 +- site/en/migrate/f2m.md | 4 +- site/en/migrate/m2m.md | 4 +- site/en/reference/disk_index.md | 8 +- site/en/userGuide/manage-collections.md | 26 +++--- site/en/userGuide/manage_databases.md | 6 +- .../search-query-get/single-vector-search.md | 14 ++-- .../search-query-get/with-iterators.md | 22 ++--- 19 files changed, 179 insertions(+), 115 deletions(-) create mode 100644 .github/workflows/ci-i18n.yml diff --git a/.github/workflows/ci-i18n.yml b/.github/workflows/ci-i18n.yml new file mode 100644 index 000000000..a7a0bd509 --- /dev/null +++ b/.github/workflows/ci-i18n.yml @@ -0,0 +1,57 @@ +name: ci-i18n + +on: + push: + branches: + - feat/v2.4.x-i18n + # paths: + # - "site/zh/**" + release: + types: [released] + workflow_dispatch: + +# A workflow run is made up of one or more jobs that can run sequentially or in parallel +jobs: + # This workflow contains a single job called "build " + build: + # The type of runner that the job will run on + runs-on: ubuntu-latest + + # Steps represent a sequence of tasks that will be executed as part of the job + steps: + - name: Check out Git repository + uses: actions/checkout@v1 + with: + ref: feat/v2.4.x-i18n + + - name: Extract branch name + shell: bash + run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})" + id: extract_branch + + - name: Md2md + run: | + cp 
site/en/Variables.json ./ + mv site doc_from + sudo npm install @zilliz/mdtomd -g + goover + rm -rf doc_from/* + rm check-link.js + mv doc_to site + + - name: Delete And Push + run: | + sudo apt-get update + sudo apt-get install jq + cd ../ + git clone -b feat/localization https://.:${{ secrets.P_GITHUB_TOKEN }}@github.com/milvus-io/web-content.git target + git config --global user.email "Milvus-doc-bot@zilliz.com" + git config --global user.name "Milvus-doc-bot" + cp ./milvus-docs/version.json ./target + cd target + rm -rf `cat version.json | jq -r .version` + mkdir `cat version.json | jq -r .version` + cp -avr ../milvus-docs/** ./`cat version.json | jq -r .version` + git add . + git commit -m "Release new docs " + git push -f origin feat/localization diff --git a/site/en/about/comparison.md b/site/en/about/comparison.md index f7c0ed0e2..4a3bd3016 100644 --- a/site/en/about/comparison.md +++ b/site/en/about/comparison.md @@ -52,7 +52,7 @@ Although both serve similar functions as vector databases, the domain-specific t | Deployment Modes | SaaS-only | Milvus Lite, On-prem Standalone & Cluster, Zilliz Cloud Saas & BYOC | | Embedding Functions | Not available | Support with pymilvus[model] | | Data Types | String, Number, Bool, List of String | String, VarChar, Number (Int, Float, Double), Bool, Array, JSON, Float Vector, Binary Vector, BFloat16, Float16, Sparse Vector | -| Metric and Index Types | Cos, Dot, Euclidean
P-family, S-family | Cosine, IP (Dot), L2 (Euclidean), Hamming, Jaccard
FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, SCANN, GPU Indexes | +| Metric and Index Types | Cos, Dot, Euclidean
P-family, S-family | Cosine, IP (Dot), L2 (Euclidean), Hamming, Jaccard
FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, SCANN, GPU Indexes | | Schema Design | Flexible mode | Flexible mode, Strict mode | | Multiple Vector Fields | N/A | Multi-vector and hybrid search | | Tools | Datasets, text utilities, spark connector | Attu, Birdwatcher, Backup, CLI, CDC, Spark and Kafka connectors | diff --git a/site/en/about/roadmap.md b/site/en/about/roadmap.md index 977e10c91..a96754adc 100644 --- a/site/en/about/roadmap.md +++ b/site/en/about/roadmap.md @@ -22,28 +22,28 @@ Welcome to the Milvus Roadmap! Join us on our continuous journey to enhance and - AI-developer Friendly
A developer-friendly technology stack, enhanced with the latest AI innovations - Multi-Vectors & Hybrid Search
Framework for multiplex recall and fusion

GPU Index Acceleration
Support for higher QPS and faster index creation

Model Library in PyMilvus
Integrated embedding models for Milvus - Sparse Vector (GA)
Local feature extraction and keyword search

Milvus Lite (GA)
A lightweight, in-memory version of Milvus

Embedding Models Gallery
Support for image and multi-modal embeddings and reranker models in model libraries - Original Data-In and Data-Out
Support for Blob data types

Data Clustering
Data co-locality

Scenario-oriented Vector Search
e.g. Multi-target search & NN filtering

Support Embedding & Reranker Endpoint + AI-developer Friendly
A developer-friendly technology stack, enhanced with the latest AI innovations + Multi-Vectors & Hybrid Search
Framework for multiplex recall and fusion

GPU Index Acceleration
Support for higher QPS and faster index creation

Model Library in PyMilvus
Integrated embedding models for Milvus + Sparse Vector (GA)
Local feature extraction and keyword search

Milvus Lite (GA)
A lightweight, in-memory version of Milvus

Embedding Models Gallery
Support for image and multi-modal embeddings and reranker models in model libraries + Original Data-In and Data-Out
Support for Blob data types

Data Clustering
Data co-locality

Scenario-oriented Vector Search
e.g. Multi-target search & NN filtering

Support Embedding & Reranker Endpoint - Rich Functionality
Enhanced retrieval and data management features - Support for FP16, BF16 Datatypes
These ML datatypes can help reduce memory usage

Grouping Search
Aggregate split embeddings

Fuzzy Match and Inverted Index
Support for fuzzy matching and inverted indexing for scalar types like varchar and int - Inverted Index for Array & JSON
Indexing for arrays and partial support for JSON

Bitset Index
Improved execution speed and future data aggregation

Truncate Collection
Allows data clearance while preserving metadata

Support for NULL and Default Values - Support for More Datatypes
e.g. Datetime, GIS

Advanced Text Filtering
e.g. Match Phrase

Primary Key Deduplication + Rich Functionality
Enhanced retrieval and data management features + Support for FP16, BF16 Datatypes
These ML datatypes can help reduce memory usage

Grouping Search
Aggregate split embeddings

Fuzzy Match and Inverted Index
Support for fuzzy matching and inverted indexing for scalar types like varchar and int + Inverted Index for Array & JSON
Indexing for arrays and partial support for JSON

Bitset Index
Improved execution speed and future data aggregation

Truncate Collection
Allows data clearance while preserving metadata

Support for NULL and Default Values + Support for More Datatypes
e.g. Datetime, GIS

Advanced Text Filtering
e.g. Match Phrase

Primary Key Deduplication - Cost Efficiency & Architecture
Advanced systems emphasizing stability, cost efficiency, scalability, and performance - Support for More Collections/Partitions
Handles over 10,000 collections in smaller clusters

Mmap Optimization
Balances reduced memory consumption with latency

Bulk Insert Optimization
Simplifies importing large datasets - Lazy Load
Data is loaded on-demand through read operations

Major Compaction
Re-distributes data based on configuration to enhance read performance

Mmap for Growing Data
Mmap files for expanding data segments - Memory Control
Reduces out-of-memory issues and provides global memory management

LogNode Introduction
Ensures global consistency and addresses the single-point bottleneck in root coordination

Storage Format V2
Universal format design lays the groundwork for disk-based data access + Cost Efficiency & Architecture
Advanced systems emphasizing stability, cost efficiency, scalability, and performance + Support for More Collections/Partitions
Handles over 10,000 collections in smaller clusters

Mmap Optimization
Balances reduced memory consumption with latency

Bulk Insert Optimization
Simplifies importing large datasets + Lazy Load
Data is loaded on-demand through read operations

Major Compaction
Re-distributes data based on configuration to enhance read performance

Mmap for Growing Data
Mmap files for expanding data segments + Memory Control
Reduces out-of-memory issues and provides global memory management

LogNode Introduction
Ensures global consistency and addresses the single-point bottleneck in root coordination

Storage Format V2
Universal format design lays the groundwork for disk-based data access - Enterprise Ready
Designed to meet the needs of enterprise production environments - Milvus CDC
Capability for data replication

Accesslog Enhancement
Detailed recording for audit and tracing - New Resource Group
Enhanced resource management

Storage Hook
Support for Bring Your Own Key (BYOK) encryption - Dynamic Replica Number Adjustment
Facilitates dynamic changes to the number of replicas

Dynamic Schema Modification
e.g., Add/delete fields, modify varchar lengths

Rust and C# SDKs + Enterprise Ready
Designed to meet the needs of enterprise production environments + Milvus CDC
Capability for data replication

Accesslog Enhancement
Detailed recording for audit and tracing + New Resource Group
Enhanced resource management

Storage Hook
Support for Bring Your Own Key (BYOK) encryption + Dynamic Replica Number Adjustment
Facilitates dynamic changes to the number of replicas

Dynamic Schema Modification
e.g., Add/delete fields, modify varchar lengths

Rust and C# SDKs diff --git a/site/en/adminGuide/deploy_etcd.md b/site/en/adminGuide/deploy_etcd.md index d78c9b3df..03755e000 100644 --- a/site/en/adminGuide/deploy_etcd.md +++ b/site/en/adminGuide/deploy_etcd.md @@ -50,7 +50,7 @@ Run the following command to start Milvus that uses the etcd configurations. docker compose up ``` -
Configurations only take effect after Milvus starts. See Start Milvus for more information.
+
Configurations only take effect after Milvus starts. See Start Milvus for more information.
## Configure etcd on K8s diff --git a/site/en/adminGuide/deploy_pulsar.md b/site/en/adminGuide/deploy_pulsar.md index bd7629bc4..9fc577bc5 100644 --- a/site/en/adminGuide/deploy_pulsar.md +++ b/site/en/adminGuide/deploy_pulsar.md @@ -34,7 +34,7 @@ Run the following command to start Milvus that uses the Pulsar configurations. docker compose up ``` -
Configurations only take effect after Milvus starts. See Start Milvus for more information.
+
Configurations only take effect after Milvus starts. See Start Milvus for more information.
## Configure Pulsar with Helm diff --git a/site/en/adminGuide/deploy_s3.md b/site/en/adminGuide/deploy_s3.md index 40058d582..67e219d9a 100644 --- a/site/en/adminGuide/deploy_s3.md +++ b/site/en/adminGuide/deploy_s3.md @@ -35,7 +35,7 @@ Run the following command to start Milvus that uses the S3 configurations. ```shell docker compose up ``` -
Configurations only take effect after Milvus starts. See Start Milvus for more information.
+
Configurations only take effect after Milvus starts. See Start Milvus for more information.
## Configure S3 on K8s diff --git a/site/en/adminGuide/rbac.md b/site/en/adminGuide/rbac.md index ddb764625..802d1ba9d 100644 --- a/site/en/adminGuide/rbac.md +++ b/site/en/adminGuide/rbac.md @@ -58,7 +58,7 @@ client.update_password( ```python client.list_users() -# output: +# output # ['root', 'user_1'] ``` @@ -67,7 +67,7 @@ client.list_users() ```python client.describe_user(user_name='user_1') -# output: +# output # {'user_name': 'user_1', 'roles': ()} ``` @@ -88,7 +88,7 @@ After creating a role, you can: ```python client.list_roles() -# output: +# output # ['admin', 'public', 'roleA'] ``` @@ -120,7 +120,7 @@ client.describe_role( role_name='roleA' ) -# output: +# output # {'role': 'roleA', # 'privileges': [{'object_type': 'User', # 'object_name': 'user_1', @@ -150,8 +150,8 @@ client.describe_user( user_name='user_1' ) -# output: -# {'user_name': 'user_1', 'roles': ('roleA',)} +# output +# {'user_name': 'user_1', 'roles': ('roleA')} ``` ## 6. Revoke privileges diff --git a/site/en/getstarted/quickstart.md b/site/en/getstarted/quickstart.md index a07931818..9642acc28 100644 --- a/site/en/getstarted/quickstart.md +++ b/site/en/getstarted/quickstart.md @@ -8,19 +8,20 @@ title: Quickstart Open In Colab -Vectors, the output data format of Neural Network models, can effectively encode information and serve a pivotal role in AI applications such as knowledge base, semantic search, Retrieval Augmented Generation (RAG) and more. +Vectors, the output data format of Neural Network models, can effectively encode information and serve a pivotal role in AI applications such as knowledge base, semantic search, Retrieval Augmented Generation (RAG) and more. -Milvus is an open-source vector database that suits AI applications of every size from running a demo chatbot in Jupyter notebook to building web-scale search that serves billions of users. 
In this guide, we will walk you through how to set up Milvus locally within minutes and use the Python client library to generate, store and search vectors. +Milvus is an open-source vector database that suits AI applications of every size from running a demo chatbot in Jupyter notebook to building web-scale search that serves billions of users. In this guide, we will walk you through how to set up Milvus locally within minutes and use the Python client library to generate, store and search vectors. ## Install Milvus + In this guide we use Milvus Lite, a python library included in `pymilvus` that can be embedded into the client application. Milvus also supports deployment on [Docker](https://milvus.io/docs/install_standalone-docker.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) for production use cases. Before starting, make sure you have Python 3.8+ available in the local environment. Install `pymilvus` which contains both the python client library and Milvus Lite: - ```python $ pip install -U pymilvus ``` +
> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime**. (Click on the "Runtime" menu at the top of the screen, and select "Restart session" from the dropdown menu). @@ -28,8 +29,8 @@ $ pip install -U pymilvus
## Set Up Vector Database -To create a local Milvus vector database, simply instantiate a `MilvusClient` by specifying a file name to store all data, such as "milvus_demo.db". +To create a local Milvus vector database, simply instantiate a `MilvusClient` by specifying a file name to store all data, such as "milvus_demo.db". ```python from pymilvus import MilvusClient @@ -38,8 +39,8 @@ client = MilvusClient("milvus_demo.db") ``` ## Create a Collection -In Milvus, we need a collection to store vectors and their associated metadata. You can think of it as a table in traditional SQL databases. When creating a collection, you can define schema and index params to configure vector specs such as dimensionality, index types and distance metrics. There are also complex concepts to optimize the index for vector search performance. For now, let's just focus on the basics and use default for everything possible. At minimum, you only need to set the collection name and the dimension of the vector field of the collection. +In Milvus, we need a collection to store vectors and their associated metadata. You can think of it as a table in traditional SQL databases. When creating a collection, you can define schema and index params to configure vector specs such as dimensionality, index types and distance metrics. There are also complex concepts to optimize the index for vector search performance. For now, let's just focus on the basics and use default for everything possible. At minimum, you only need to set the collection name and the dimension of the vector field of the collection. ```python if client.has_collection(collection_name="demo_collection"): @@ -50,25 +51,26 @@ client.create_collection( ) ``` -In the above setup, +In the above setup, + - The primary key and vector fields use their default names ("id" and "vector"). - The metric type (vector distance definition) is set to its default value ([COSINE](https://milvus.io/docs/metric.md#Cosine-Similarity)).
- The primary key field accepts integers and does not automatically increment (namely not using [auto-id feature](https://milvus.io/docs/schema.md)) -Alternatively, you can formally define the schema of the collection by following this [instruction](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Collections/create_schema.md). + Alternatively, you can formally define the schema of the collection by following this [instruction](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Collections/create_schema.md). ## Prepare Data + In this guide, we use vectors to perform semantic search on text. We need to generate vectors for text by downloading embedding models. This can be easily done by using the utility functions from `pymilvus[model]` library. ## Represent text with vectors -First, install the model library. This package includes essential ML tools such as PyTorch. The package download may take some time if your local environment has never installed PyTorch. +First, install the model library. This package includes essential ML tools such as PyTorch. The package download may take some time if your local environment has never installed PyTorch. ```python $ pip install "pymilvus[model]" ``` -Generate vector embeddings with default model. Milvus expects data to be inserted organized as a list of dictionaries, where each dictionary represents a data record, termed as an entity. - +Generate vector embeddings with default model. Milvus expects data to be inserted organized as a list of dictionaries, where each dictionary represents a data record, termed as an entity. ```python from pymilvus import model @@ -95,21 +97,21 @@ print("Dim:", embedding_fn.dim, vectors[0].shape) # Dim: 768 (768,) # to demo metadata filtering later.
data = [ {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"} - for i in range(len(vectors)) ] print("Data has", len(data), "entities, each with fields: ", data[0].keys()) print("Vector dim:", len(data[0]["vector"])) ``` - Dim: 768 (768,) - Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject']) - Vector dim: 768 - +``` +Dim: 768 (768,) +Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject']) +Vector dim: 768 +``` ## [Alternatively] Use fake representation with random vectors -If you couldn't download the model due to network issues, as a workaround, you can use random vectors to represent the text and still finish the example. Just note that the search result won't reflect semantic similarity as the vectors are fake ones. +If you couldn't download the model due to network issues, as a workaround, you can use random vectors to represent the text and still finish the example. Just note that the search result won't reflect semantic similarity as the vectors are fake ones.
```python import random @@ -131,13 +133,14 @@ print("Data has", len(data), "entities, each with fields: ", data[0].keys()) print("Vector dim:", len(data[0]["vector"])) ``` - Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject']) - Vector dim: 768 - +``` +Data has 3 entities, each with fields: dict_keys(['id', 'vector', 'text', 'subject']) +Vector dim: 768 +``` ## Insert Data -Let's insert the data into the collection: +Let's insert the data into the collection: ```python res = client.insert(collection_name="demo_collection", data=data) @@ -145,15 +148,17 @@ res = client.insert(collection_name="demo_collection", data=data) print(res) ``` - {'insert_count': 3, 'ids': [0, 1, 2], 'cost': 0} - +``` +{'insert_count': 3, 'ids': [0, 1, 2], 'cost': 0} +``` ## Semantic Search + Now we can do semantic searches by representing the search query text as vector, and conduct vector similarity search on Milvus. ### Vector search -Milvus accepts one or multiple vector search requests at the same time. The value of the query_vectors variable is a list of vectors, where each vector is an array of float numbers. +Milvus accepts one or multiple vector search requests at the same time. The value of the query_vectors variable is a list of vectors, where each vector is an array of float numbers. 
```python query_vectors = embedding_fn.encode_queries(["Who is Alan Turing?"]) @@ -170,14 +175,15 @@ res = client.search( print(res) ``` - data: ["[{'id': 2, 'distance': 0.5859944820404053, 'entity': {'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}}, {'id': 1, 'distance': 0.5118255615234375, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]"] , extra_info: {'cost': 0} - +``` +data: ["[{'id': 2, 'distance': 0.5859944820404053, 'entity': {'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}}, {'id': 1, 'distance': 0.5118255615234375, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]"] , extra_info: {'cost': 0} +``` The output is a list of results, each mapping to a vector search query. Each query contains a list of results, where each result contains the entity primary key, the distance to the query vector, and the entity details with specified `output_fields`. ## Vector Search with Metadata Filtering -You can also conduct vector search while considering the values of the metadata (called "scalar" fields in Milvus, as scalar refers to non-vector data). This is done with a filter expression specifying certain criteria. Let's see how to search and filter with the `subject` field in the following example. +You can also conduct vector search while considering the values of the metadata (called "scalar" fields in Milvus, as scalar refers to non-vector data). This is done with a filter expression specifying certain criteria. Let's see how to search and filter with the `subject` field in the following example. ```python # Insert more docs in another subject. 
@@ -206,19 +212,20 @@ res = client.search( print(res) ``` - data: ["[{'id': 4, 'distance': 0.27030569314956665, 'entity': {'text': 'Computational synthesis with AI algorithms predicts molecular properties.', 'subject': 'biology'}}, {'id': 3, 'distance': 0.16425910592079163, 'entity': {'text': 'Machine learning has been used for drug design.', 'subject': 'biology'}}]"] , extra_info: {'cost': 0} - +``` +data: ["[{'id': 4, 'distance': 0.27030569314956665, 'entity': {'text': 'Computational synthesis with AI algorithms predicts molecular properties.', 'subject': 'biology'}}, {'id': 3, 'distance': 0.16425910592079163, 'entity': {'text': 'Machine learning has been used for drug design.', 'subject': 'biology'}}]"] , extra_info: {'cost': 0} +``` -By default, the scalar fields are not indexed. If you need to perform metadata filtered search in large dataset, you can consider using fixed schema and also turn on the [index](https://milvus.io/docs/scalar_index.md) to improve the search performance. +By default, the scalar fields are not indexed. If you need to perform metadata filtered search in large dataset, you can consider using fixed schema and also turn on the [index](https://milvus.io/docs/scalar_index.md) to improve the search performance. In addition to vector search, you can also perform other types of searches: ### Query + A query() is an operation that retrieves all entities matching a criterion, such as a [filter expression](https://milvus.io/docs/boolean.md) or matching some ids.
For example, retrieving all entities whose scalar field has a particular value: - ```python res = client.query( collection_name="demo_collection", @@ -229,7 +236,6 @@ res = client.query( Directly retrieve entities by primary key: - ```python res = client.query( collection_name="demo_collection", @@ -239,8 +245,8 @@ res = client.query( ``` ## Delete Entities -If you'd like to purge data, you can delete entities specifying the primary key or delete all entities matching a particular filter expression. +If you'd like to purge data, you can delete entities specifying the primary key or delete all entities matching a particular filter expression. ```python # Delete entities by primary key @@ -257,13 +263,14 @@ res = client.delete( print(res) ``` - [0, 2] - [3, 4, 5] - +``` +[0, 2] +[3, 4, 5] +``` ## Load Existing Data -Since all data of Milvus Lite is stored in a local file, you can load all data into memory even after the program terminates, by creating a `MilvusClient` with the existing file. For example, this will recover the collections from "milvus_demo.db" file and continue to write data into it. +Since all data of Milvus Lite is stored in a local file, you can load all data into memory even after the program terminates, by creating a `MilvusClient` with the existing file. For example, this will recover the collections from "milvus_demo.db" file and continue to write data into it. ```python from pymilvus import MilvusClient @@ -272,8 +279,8 @@ client = MilvusClient("milvus_demo.db") ``` ## Drop the collection -If you would like to delete all the data in a collection, you can drop the collection with +If you would like to delete all the data in a collection, you can drop the collection with ```python # Drop collection @@ -281,8 +288,8 @@ client.drop_collection(collection_name="demo_collection") ``` ## Learn More -Milvus Lite is great for getting started with a local python program. 
If you have large scale data or would like to use Milvus in production, you can learn about deploying Milvus on [Docker](https://milvus.io/docs/install_standalone-docker.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md). All deployment modes of Milvus share the same API, so your client side code doesn't need to change much if moving to another deployment mode. Simply specify the [URI and Token](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of a Milvus server deployed anywhere: +Milvus Lite is great for getting started with a local python program. If you have large scale data or would like to use Milvus in production, you can learn about deploying Milvus on [Docker](https://milvus.io/docs/install_standalone-docker.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md). All deployment modes of Milvus share the same API, so your client side code doesn't need to change much if moving to another deployment mode. 
Simply specify the [URI and Token](https://milvus.io/api-reference/pymilvus/v2.4.x/MilvusClient/Client/MilvusClient.md) of a Milvus server deployed anywhere: ```python client = MilvusClient(uri="http://localhost:19530", token="root:Milvus") diff --git a/site/en/getstarted/run-milvus-gpu/install_cluster-helm-gpu.md b/site/en/getstarted/run-milvus-gpu/install_cluster-helm-gpu.md index cb3b1f28b..0024338d8 100644 --- a/site/en/getstarted/run-milvus-gpu/install_cluster-helm-gpu.md +++ b/site/en/getstarted/run-milvus-gpu/install_cluster-helm-gpu.md @@ -174,7 +174,7 @@ In addition to a single GPU device, you can also assign multiple GPU devices to diff --git a/site/en/getstarted/run-milvus-k8s/install_cluster-helm.md b/site/en/getstarted/run-milvus-k8s/install_cluster-helm.md index 1db2327e8..38c077d81 100644 --- a/site/en/getstarted/run-milvus-k8s/install_cluster-helm.md +++ b/site/en/getstarted/run-milvus-k8s/install_cluster-helm.md @@ -85,7 +85,7 @@ The command above deploys a Milvus cluster with its components and dependencies diff --git a/site/en/home/home.md b/site/en/home/home.md index 43d99e514..c45f2151a 100644 --- a/site/en/home/home.md +++ b/site/en/home/home.md @@ -32,24 +32,24 @@ id: home.md
- icon - + icon +

Learn how to install Milvus using either Docker Compose or on Kubernetes.

- icon - + icon +

Learn how to quickly run Milvus with sample code.

- icon - + icon +

Learn how to build vector similarity search applications with Milvus. diff --git a/site/en/migrate/es2m.md b/site/en/migrate/es2m.md index 88317c25e..e521541a7 100644 --- a/site/en/migrate/es2m.md +++ b/site/en/migrate/es2m.md @@ -118,7 +118,7 @@ The following table describes the parameters in the example config file. For a f | Parameter | Description | | --- | --- | - | `target.mode` | Storage location for dumped files. Valid values:
- `local`: Store dumped files on local disks.
- `remote`: Store dumped files on object storage. | + | `target.mode` | Storage location for dumped files. Valid values:
- `local`: Store dumped files on local disks.
- `remote`: Store dumped files on object storage. | | `target.remote.outputDir` | Output directory path in the cloud storage bucket. | | `target.remote.cloud` | Cloud storage service provider. Example values: `aws`, `gcp`, `azure`. | | `target.remote.region` | Cloud storage region. It can be any value if you use local MinIO. | diff --git a/site/en/migrate/f2m.md b/site/en/migrate/f2m.md index 637d7123c..77534510d 100644 --- a/site/en/migrate/f2m.md +++ b/site/en/migrate/f2m.md @@ -87,7 +87,7 @@ The following table describes the parameters in the example config file. For a f | Parameter | Description | | --- | --- | - | `source.mode` | Specifies where the source files are read from. Valid values:
- `local`: reads files from a local disk.
- `remote`: reads files from remote storage. | + | `source.mode` | Specifies where the source files are read from. Valid values:
- `local`: reads files from a local disk.
- `remote`: reads files from remote storage. | | `source.local.faissFile` | The directory path where the source files are located. For example, `/db/faiss.index`. | - `target` @@ -98,7 +98,7 @@ The following table describes the parameters in the example config file. For a f | `target.create.collection.shardsNums` | Number of shards to be created in the collection. For more information on shards, refer to [Terminology](https://milvus.io/docs/glossary.md#Shard). | | `target.create.collection.dim` | Dimension of the vector field. | | `target.create.collection.metricType` | Metric type used to measure similarities between vectors. For more information, refer to [Terminology](https://milvus.io/docs/glossary.md#Metric-type). | - | `target.mode` | Storage location for dumped files. Valid values:
- `local`: Store dumped files on local disks.
- `remote`: Store dumped files on object storage. | + | `target.mode` | Storage location for dumped files. Valid values:
- `local`: Store dumped files on local disks.
- `remote`: Store dumped files on object storage. | | `target.remote.outputDir` | Output directory path in the cloud storage bucket. | | `target.remote.cloud` | Cloud storage service provider. Example values: `aws`, `gcp`, `azure`. | | `target.remote.endpoint` | Endpoint of Milvus 2.x storage. | diff --git a/site/en/migrate/m2m.md b/site/en/migrate/m2m.md index c01a32535..caddd36d1 100644 --- a/site/en/migrate/m2m.md +++ b/site/en/migrate/m2m.md @@ -114,14 +114,14 @@ The following table describes the parameters in the example config file. For a f | Parameter | Description | | --- | --- | - | `source.mode` | Specifies where the source files are read from. Valid values:
- `local`: reads files from a local disk.
- `remote`: reads files from remote storage. | + | `source.mode` | Specifies where the source files are read from. Valid values:
- `local`: reads files from a local disk.
- `remote`: reads files from remote storage. | | `source.local.tablesDir` | The directory path where the source files are located. For example, `/db/tables/`. | - `target` | Parameter | Description | | --- | --- | - | `target.mode` | Storage location for dumped files. Valid values:
- `local`: Store dumped files on local disks.
- `remote`: Store dumped files on object storage. | + | `target.mode` | Storage location for dumped files. Valid values:
- `local`: Store dumped files on local disks.
- `remote`: Store dumped files on object storage. | | `target.remote.outputDir` | Output directory path in the cloud storage bucket. | | `target.remote.ak` | Access key for Milvus 2.x storage. | | `target.remote.sk` | Secret key for Milvus 2.x storage. | diff --git a/site/en/reference/disk_index.md b/site/en/reference/disk_index.md index 6e1aa1a36..746c10762 100644 --- a/site/en/reference/disk_index.md +++ b/site/en/reference/disk_index.md @@ -61,10 +61,10 @@ DiskIndex: | Parameter | Description | Value Range | Default Value | | --- | --- | --- | --- | -| `MaxDegree` | Maximum degree of the Vamana graph.
A larger value offers a higher recall rate but increases the size of and time to build the index. | [1, 512] | 56 | -| `SearchListSize` | Size of the candidate list.
A larger value increases the time spent on building the index but offers a higher recall rate.
Set it to a value smaller than `MaxDegree` unless you need to reduce the index-building time. | [1, int32_max] | 100 | -| `PQCodeBugetGBRatio` | Size limit on the PQ code.
A larger value offers a higher recall rate but increases memory usage. | (0.0, 0.25] | 0.125 | -| `SearchCacheBudgetGBRatio` | Ratio of cached node numbers to raw data.
A larger value improves index-building performance with increased memory usage. | [0.0, 0.3) | 0.10 | +| `MaxDegree` | Maximum degree of the Vamana graph.
A larger value offers a higher recall rate but increases the size of and time to build the index. | [1, 512] | 56 | +| `SearchListSize` | Size of the candidate list.
A larger value increases the time spent on building the index but offers a higher recall rate.
Set it to a value smaller than `MaxDegree` unless you need to reduce the index-building time. | [1, int32_max] | 100 | +| `PQCodeBugetGBRatio` | Size limit on the PQ code.
A larger value offers a higher recall rate but increases memory usage. | (0.0, 0.25] | 0.125 | +| `SearchCacheBudgetGBRatio` | Ratio of cached node numbers to raw data.
A larger value improves index-building performance with increased memory usage. | [0.0, 0.3) | 0.10 | | `BeamWidthRatio` | Ratio between the maximum number of IO requests per search iteration and CPU number. | [1, max(128 / CPU number, 16)] | 4.0 | ## Troubleshooting diff --git a/site/en/userGuide/manage-collections.md b/site/en/userGuide/manage-collections.md index 6af8c27c2..54414853a 100644 --- a/site/en/userGuide/manage-collections.md +++ b/site/en/userGuide/manage-collections.md @@ -335,11 +335,11 @@ export fields='[{ \ auto_id - Determines if the primary field automatically increments.
Setting this to True makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors. The auto-generated IDs have a fixed length and cannot be altered. + Determines if the primary field automatically increments.
Setting this to True makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors. The auto-generated IDs have a fixed length and cannot be altered. enable_dynamic_field - Determines if Milvus saves the values of undefined fields in a dynamic field if the data being inserted into the target collection includes fields that are not defined in the collection's schema.
When you set this to True, Milvus will create a field called $meta to store any undefined fields and their values from the data that is inserted. + Determines if Milvus saves the values of undefined fields in a dynamic field if the data being inserted into the target collection includes fields that are not defined in the collection's schema.
When you set this to True, Milvus will create a field called $meta to store any undefined fields and their values from the data that is inserted. field_name @@ -351,11 +351,11 @@ export fields='[{ \ is_primary - Whether the current field is the primary field in a collection.
Each collection has only one primary field. A primary field should be of either the DataType.INT64 type or the DataType.VARCHAR type. + Whether the current field is the primary field in a collection.
Each collection has only one primary field. A primary field should be of either the DataType.INT64 type or the DataType.VARCHAR type. dim - The dimension of the vector embeddings.
This is mandatory for a field of the DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR, DataType.FLOAT16_VECTOR, or DataType.BFLOAT16_VECTOR type. If you use DataType.SPARSE_FLOAT_VECTOR, omit this parameter. + The dimension of the vector embeddings.
This is mandatory for a field of the DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR, DataType.FLOAT16_VECTOR, or DataType.BFLOAT16_VECTOR type. If you use DataType.SPARSE_FLOAT_VECTOR, omit this parameter. @@ -378,15 +378,15 @@ export fields='[{ \ isPrimaryKey - Whether the current field is the primary field in a collection.
Each collection has only one primary field. A primary field should be of either the DataType.Int64 type or the DataType.VarChar type. + Whether the current field is the primary field in a collection.
Each collection has only one primary field. A primary field should be of either the DataType.Int64 type or the DataType.VarChar type. autoID - Whether allows the primary field to automatically increment.
Setting this to true makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors. + Whether to allow the primary field to automatically increment.
Setting this to true makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors. dimension - The dimension of the vector embeddings.
This is mandatory for a field of the DataType.FloatVector, DataType.BinaryVector, DataType.Float16Vector, or DataType.BFloat16Vector type. + The dimension of the vector embeddings.
This is mandatory for a field of the DataType.FloatVector, DataType.BinaryVector, DataType.Float16Vector, or DataType.BFloat16Vector type. @@ -409,15 +409,15 @@ export fields='[{ \ is_primary_key - Whether the current field is the primary field in a collection.
Each collection has only one primary field. A primary field should be of either the DataType.INT64 type or the DataType.VARCHAR type. + Whether the current field is the primary field in a collection.
Each collection has only one primary field. A primary field should be of either the DataType.INT64 type or the DataType.VARCHAR type. auto_id - Whether the primary field automatically increments upon data insertions into this collection.
The value defaults to False. Setting this to True makes the primary field automatically increment. Skip this parameter if you need to set up a collection with a customized schema. + Whether the primary field automatically increments upon data insertions into this collection.
The value defaults to False. Setting this to True makes the primary field automatically increment. Skip this parameter if you need to set up a collection with a customized schema. dim - The dimensionality of the collection field that holds vector embeddings.
The value should be an integer greater than 1 and is usually determined by the model you use to generate vector embeddings. + The dimensionality of the collection field that holds vector embeddings.
The value should be an integer greater than 1 and is usually determined by the model you use to generate vector embeddings. @@ -980,11 +980,11 @@ Use [createCollection()](https://milvus.io/api-reference/node/v2.4.x/Collections schema - The schema of this collection.
Setting this to None indicates this collection will be created with default settings.
To set up a collection with a customized schema, you need to create a CollectionSchema object and reference it here. In this case, Milvus ignores all other schema-related settings carried in the request. + The schema of this collection.
Setting this to None indicates this collection will be created with default settings.
To set up a collection with a customized schema, you need to create a CollectionSchema object and reference it here. In this case, Milvus ignores all other schema-related settings carried in the request. index_params - The parameters for building the index on the vector field in this collection. To set up a collection with a customized schema and automatically load the collection to memory, you need to create an IndexParams object and reference it here.
You should at least add an index for the vector field in this collection. You can also skip this parameter if you prefer to set up the index parameters later on. + The parameters for building the index on the vector field in this collection. To set up a collection with a customized schema and automatically load the collection to memory, you need to create an IndexParams object and reference it here.
You should at least add an index for the vector field in this collection. You can also skip this parameter if you prefer to set up the index parameters later on. @@ -1003,7 +1003,7 @@ Use [createCollection()](https://milvus.io/api-reference/node/v2.4.x/Collections collectionSchema - The schema of this collection.
Leaving it empty indicates this collection will be created with default settings. To set up a collection with a customized schema, you need to create a CollectionSchema object and reference it here. + The schema of this collection.
Leaving it empty indicates this collection will be created with default settings. To set up a collection with a customized schema, you need to create a CollectionSchema object and reference it here. indexParams diff --git a/site/en/userGuide/manage_databases.md b/site/en/userGuide/manage_databases.md index 087307b57..57f5d3dad 100644 --- a/site/en/userGuide/manage_databases.md +++ b/site/en/userGuide/manage_databases.md @@ -34,9 +34,9 @@ Use [MilvusClient](https://milvus.io/api-reference/node/v2.4.x/Client/MilvusClie

- Python - Java - Node.js + Python + Java + Node.js
```python diff --git a/site/en/userGuide/search-query-get/single-vector-search.md b/site/en/userGuide/search-query-get/single-vector-search.md index fccbd81d1..96e1ecd2a 100644 --- a/site/en/userGuide/search-query-get/single-vector-search.md +++ b/site/en/userGuide/search-query-get/single-vector-search.md @@ -478,15 +478,15 @@ console.log(res.results) data - A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. + A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. limit - The total number of entities to return.
You can use this parameter in combination with offset in param to enable pagination.
The sum of this value and offset in param should be less than 16,384. + The total number of entities to return.
You can use this parameter in combination with offset in param to enable pagination.
The sum of this value and offset in param should be less than 16,384. search_params - The parameter settings specific to this operation.
+ The parameter settings specific to this operation.
@@ -505,11 +505,11 @@ console.log(res.results) data - A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. + A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. topK - The number of records to return in the search result. This parameter uses the same syntax as the limit parameter, so you should only set one of them.
You can use this parameter in combination with offset in param to enable pagination.
The sum of this value and offset in param should be less than 16,384. + The number of records to return in the search result. This parameter uses the same syntax as the limit parameter, so you should only set one of them.
You can use this parameter in combination with offset in param to enable pagination.
The sum of this value and offset in param should be less than 16,384. @@ -528,11 +528,11 @@ console.log(res.results) data - A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. + A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. limit - The total number of entities to return.
You can use this parameter in combination with offset in param to enable pagination.
The sum of this value and offset in param should be less than 16,384. + The total number of entities to return.
You can use this parameter in combination with offset in param to enable pagination.
The sum of this value and offset in param should be less than 16,384. diff --git a/site/en/userGuide/search-query-get/with-iterators.md b/site/en/userGuide/search-query-get/with-iterators.md index e32d3eba3..cdcb1d8b6 100644 --- a/site/en/userGuide/search-query-get/with-iterators.md +++ b/site/en/userGuide/search-query-get/with-iterators.md @@ -362,7 +362,7 @@ System.out.println(results.size()); data - A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. + A list of vector embeddings.
Milvus searches for the most similar vector embeddings to the specified ones. anns_field @@ -370,19 +370,19 @@ System.out.println(results.size()); batch_size - The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. + The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. param - The parameter settings specific to this operation.
+ The parameter settings specific to this operation.
output_fields - A list of field names to include in each entity in return.
The value defaults to None. If left unspecified, only the primary field is included. + A list of field names to include in each returned entity.
The value defaults to None. If left unspecified, only the primary field is included. limit - The total number of entities to return.
The value defaults to -1, indicating all matching entities will be in return. + The total number of entities to return.
The value defaults to -1, indicating that all matching entities are returned. @@ -409,7 +409,7 @@ System.out.println(results.size()); withBatchSize - The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. + The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. withParams @@ -551,19 +551,19 @@ while (true) { batch_size - The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. + The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. expr - A scalar filtering condition to filter matching entities.
The value defaults to None, indicating that scalar filtering is ignored. To build a scalar filtering condition, refer to Boolean Expression Rules. + A scalar filtering condition to filter matching entities.
The value defaults to None, indicating that scalar filtering is ignored. To build a scalar filtering condition, refer to Boolean Expression Rules. output_fields - A list of field names to include in each entity in return.
The value defaults to None. If left unspecified, only the primary field is included. + A list of field names to include in each returned entity.
The value defaults to None. If left unspecified, only the primary field is included. limit - The total number of entities to return.
The value defaults to -1, indicating all matching entities will be in return. + The total number of entities to return.
The value defaults to -1, indicating that all matching entities are returned. @@ -586,7 +586,7 @@ while (true) { withBatchSize - The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. + The number of entities to return each time you call next() on the current iterator.
The value defaults to 1000. Set it to a proper value to control the number of entities to return per iteration. addOutField