---
layout: platform-solution
title: Using OpenSearch as a Vector Database
categories: search
primary_title: Using OpenSearch as a Vector Database
breadcrumbs:
  icon: search-magnifying-glass
  items:
    - title: The OpenSearch Platform
      url: /platform/index.html
    - title: Search
      url: /platform/search/index.html
    - title: Vector Database
      url: /platform/search/vector-database.html
feature_area_icon_type: search-magnifying-glass
feature_area_category_name: Search
feature_area_solution_name: Vector Database
callouts:
  - name: Trusted in production
    description: Power AI applications on a mature search and analytics engine trusted in production by tens of thousands of users.
  - name: Proven at scale
    description: Build stable applications with a data platform proven to scale to tens of billions of vectors, with low latency and high availability.
  - name: Open and flexible
    description: Choose open-source tools and take advantage of integrations with popular open frameworks, plus the option to use managed services from major cloud providers.
  - name: Build for the future
    description: Future-proof your AI applications with vector, lexical, and hybrid search, analytics, and observability capabilities, all in one software suite.
meta_keywords: Vector search, vectordb, vector db, vector database, opensearch vector, generative AI, llm, vector llm, opensearch llm
---

An open-source, all-in-one vector database for building flexible, scalable, and future-proof AI applications

Traditional lexical search, based on term frequency models like BM25, is widely used and effective for many search applications. However, lexical search techniques require a significant investment of time and expertise to tune them to account for the meaning or relevance of the terms searched. Today, more and more developers want to embed semantic understanding into their search applications. Enter machine learning embedding models, which can encode the meaning and context of documents, images, and audio into vectors for similarity search. These encoded meanings can, in turn, be searched using the k-nearest neighbors (k-NN) functionality provided by OpenSearch.
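To make the idea concrete, here is a minimal sketch of similarity search over embeddings. The vectors below are made-up toy values; a real application would generate them with an embedding model and would use OpenSearch's approximate k-NN indexes rather than this brute-force loop.

```python
from math import sqrt

# Toy 4-dimensional "embeddings" -- invented values for illustration only.
# In practice an embedding model produces these, typically with hundreds of dimensions.
documents = {
    "doc_about_cats": [0.9, 0.1, 0.0, 0.2],
    "doc_about_dogs": [0.8, 0.2, 0.1, 0.1],
    "doc_about_taxes": [0.0, 0.1, 0.9, 0.7],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def knn(query_vector, docs, k=2):
    """Exact k-nearest-neighbor search: rank all documents by similarity, keep top k."""
    ranked = sorted(docs, key=lambda name: cosine_similarity(query_vector, docs[name]),
                    reverse=True)
    return ranked[:k]

# A query embedding that points in roughly the same direction as the pet documents,
# so they rank above the tax document.
query = [0.85, 0.15, 0.05, 0.15]
print(knn(query, documents))
```

The exhaustive scan shown here is exact but scales linearly with corpus size; the approximate algorithms discussed below trade a small amount of recall for orders-of-magnitude faster queries over large collections.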

Using OpenSearch as a vector database brings together the power of traditional search, analytics, and vector search in one complete package. OpenSearch’s vector database capabilities can accelerate artificial intelligence (AI) application development by reducing the effort for builders to operationalize, manage, and integrate AI-generated assets. Bring your models, vectors, and metadata into OpenSearch to power vector, lexical, and hybrid search and analytics, with performance and scalability built in.

## What is a vector database?

Information comes in many forms: unstructured data, like text documents, rich media, and audio, and structured data, like geospatial coordinates, tables, and graphs. Innovations in AI have enabled the use of embedding models to encode all types of data into vectors, or embeddings. These vectors are data points in a high-dimensional space that capture the meaning and context of an asset, allowing search tools to find similar assets by searching for neighboring data points.

Vector databases allow you to store and index vectors and metadata, unlocking the ability to use low-latency queries to discover assets by degree of similarity. Typically powered by k-NN indexes built using algorithms like Hierarchical Navigable Small World (HNSW) and Inverted File (IVF), vector databases augment k-NN functionality with foundational capabilities like data management, fault tolerance, resource access controls, and a query engine.
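As an illustration of how such an index is declared in OpenSearch's k-NN plugin, the sketch below builds the request body for an index with an HNSW-backed vector field. The index name, field names, dimension, and engine choice are illustrative assumptions, not requirements; consult the k-NN plugin documentation for the engines and space types supported by your OpenSearch version.

```python
# Request body for creating an OpenSearch index with a k-NN vector field.
# Field names and values here are illustrative; "hnsw" selects the
# Hierarchical Navigable Small World graph algorithm mentioned above.
index_body = {
    "settings": {"index.knn": True},  # enable k-NN search on this index
    "mappings": {
        "properties": {
            "content_embedding": {
                "type": "knn_vector",
                "dimension": 768,  # must match the embedding model's output size
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",  # rank by cosine similarity
                    "engine": "lucene",
                },
            },
            # Ordinary metadata fields live alongside the vector, so lexical
            # and vector queries can run against the same documents.
            "title": {"type": "text"},
        }
    },
}

# With the opensearch-py client, this body would be sent as something like:
#   client.indices.create(index="my-vectors", body=index_body)
print(index_body["mappings"]["properties"]["content_embedding"]["type"])
```

Keeping vectors and metadata in one index is what enables the hybrid lexical-plus-vector queries described elsewhere on this page.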

OpenSearch provides an integrated vector database that can support AI systems by serving as a knowledge base. This benefits AI applications like generative AI and natural language search by providing a long-term memory of AI-generated outputs. These outputs can be used to enhance information retrieval and analytics, improve efficiency and stability, and give generative AI models a broader and deeper pool of data from which to draw more accurate and truthful responses to queries.