feat: add Azure Cosmos DB DocumentStore, IndexStore, KVStore #1393

Open
wants to merge 7 commits into
base: main

manekinekko
Contributor

This PR adds support for Azure Cosmos DB:

  • KVStore
  • DocumentStore
  • IndexStore

This PR also improves AzureCosmosDBNoSqlVectorStore (based on PR #1331)

To run the example file, create a .env file under /examples/ with the following content:

AZURE_OPENAI_ENDPOINT="https://AOAI-ACCOUNT.openai.azure.com"
AZURE_DEPLOYMENT_NAME="gpt-4o-mini"
EMBEDDING_MODEL="text-embedding-3-large"
AZURE_COSMOSDB_NOSQL_ENDPOINT="https://DB-ACCOUNT.documents.azure.com:443/"
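
As a quick sanity check before running the example, something like the following could report which of the variables listed above are unset. This is only a sketch: the helper `missingEnvVars` and the constant `REQUIRED_ENV_VARS` are hypothetical and not part of this PR, and whether the example needs any variables beyond these four is not stated here. A plain object stands in for `process.env` so the snippet is self-contained.

```typescript
// Variable names are taken from the .env listing above.
// The helper itself is hypothetical and not part of the PR.
const REQUIRED_ENV_VARS: readonly string[] = [
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_DEPLOYMENT_NAME",
  "EMBEDDING_MODEL",
  "AZURE_COSMOSDB_NOSQL_ENDPOINT",
];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(
  env: Env,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example: a plain object standing in for process.env.
const missing = missingEnvVars({
  AZURE_OPENAI_ENDPOINT: "https://AOAI-ACCOUNT.openai.azure.com",
  AZURE_DEPLOYMENT_NAME: "gpt-4o-mini",
  EMBEDDING_MODEL: "",
});
console.log(missing); // [ 'EMBEDDING_MODEL', 'AZURE_COSMOSDB_NOSQL_ENDPOINT' ]
```

In a real script, `process.env` would be passed in place of the object literal.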

Then run:

npx tsx azure-cosmosdb.ts
See output
{
  docStore: AzureCosmosNoSqlDocumentStore {
    serializer: {
      toPersistence: [Function: toPersistence],
      fromPersistence: [Function: fromPersistence]
    },
    kvstore: AzureCosmosNoSqlKVStore {
      cosmosClient: [CosmosClient],
      database: undefined,
      container: undefined,
      dbName: 'KVStoreDB',
      containerName: 'KVStoreContainer',
      cosmosContainerProperties: undefined,
      cosmosDatabaseProperties: {}
    },
    nodeCollection: 'DocumentStoreDB.DocumentStoreContainer/data',
    refDocCollection: 'DocumentStoreDB.DocumentStoreContainer/ref_doc_info',
    metadataCollection: 'DocumentStoreDB.DocumentStoreContainer/metadata'
  }
}
{
  indexStore: AzureCosmosNoSqlIndexStore {
    _kvStore: AzureCosmosNoSqlKVStore {
      cosmosClient: [CosmosClient],
      database: undefined,
      container: undefined,
      dbName: 'KVStoreDB',
      containerName: 'KVStoreContainer',
      cosmosContainerProperties: undefined,
      cosmosDatabaseProperties: {}
    },
    _collection: 'IndexStoreDB.IndexStoreContainer/data'
  }
}
{
  vectorStore: AzureCosmosDBNoSqlVectorStore {
    embedModel: [Function: transform] OpenAIEmbedding {
      id: 'f55d90ee-04d2-4f7c-9719-8cbc1f57e5cc',
      embedBatchSize: 10,
      getTextEmbeddings: [AsyncFunction (anonymous)],
      apiKey: undefined,
      model: 'text-embedding-3-large',
      dimensions: undefined,
      maxRetries: 10,
      timeout: 60000,
      additionalSessionOptions: undefined,
      embedInfo: [Object],
      lazySession: [AsyncFunction (anonymous)]
    },
    isEmbeddingQuery: undefined,
    storesText: true,
    initPromise: undefined,
    container: undefined,
    cosmosClient: CosmosClient {
      clientContext: [ClientContext],
      endpointRefresher: Timeout {
        _idleTimeout: 300000,
        _idlePrev: [TimersList],
        _idleNext: [Timeout],
        _idleStart: 1540,
        _onTimeout: [Function (anonymous)],
        _timerArgs: undefined,
        _repeat: 300000,
        _destroyed: false,
        [Symbol(refed)]: false,
        [Symbol(kHasPrimitive)]: false,
        [Symbol(asyncId)]: 96,
        [Symbol(triggerId)]: 0
      },
      databases: [Databases],
      offers: [Offers]
    },
    textKey: 'text',
    flatMetadata: true,
    idKey: 'id',
    metadataKey: 'metadata',
    embeddingKey: 'embedding',
    initialize: [Function (anonymous)],
    kvStore: AzureCosmosNoSqlKVStore {
      cosmosClient: [CosmosClient],
      database: undefined,
      container: undefined,
      dbName: 'KVStoreDB',
      containerName: 'KVStoreContainer',
      cosmosContainerProperties: undefined,
      cosmosDatabaseProperties: {}
    }
  }
}
{
  storageContext: {
    docStore: AzureCosmosNoSqlDocumentStore {
      serializer: [Object],
      kvstore: [AzureCosmosNoSqlKVStore],
      nodeCollection: 'DocumentStoreDB.DocumentStoreContainer/data',
      refDocCollection: 'DocumentStoreDB.DocumentStoreContainer/ref_doc_info',
      metadataCollection: 'DocumentStoreDB.DocumentStoreContainer/metadata'
    },
    indexStore: AzureCosmosNoSqlIndexStore {
      _kvStore: [AzureCosmosNoSqlKVStore],
      _collection: 'IndexStoreDB.IndexStoreContainer/data'
    },
    vectorStores: { TEXT: [AzureCosmosDBNoSqlVectorStore] }
  }
}
Using node parser on documents...
Finished parsing documents.
getting embedding progress: 0 / 1
{
  index: VectorStoreIndex {
    serviceContext: undefined,
    storageContext: {
      docStore: [AzureCosmosNoSqlDocumentStore],
      indexStore: [AzureCosmosNoSqlIndexStore],
      vectorStores: [Object]
    },
    docStore: AzureCosmosNoSqlDocumentStore {
      serializer: [Object],
      kvstore: [AzureCosmosNoSqlKVStore],
      nodeCollection: 'DocumentStoreDB.DocumentStoreContainer/data',
      refDocCollection: 'DocumentStoreDB.DocumentStoreContainer/ref_doc_info',
      metadataCollection: 'DocumentStoreDB.DocumentStoreContainer/metadata'
    },
    indexStore: AzureCosmosNoSqlIndexStore {
      _kvStore: [AzureCosmosNoSqlKVStore],
      _collection: 'IndexStoreDB.IndexStoreContainer/data'
    },
    indexStruct: IndexDict {
      indexId: 'bccd7121-6967-4d86-aee6-0730f641824a',
      summary: undefined,
      nodesDict: {},
      type: 'simple_dict'
    },
    embedModel: undefined,
    vectorStores: { TEXT: [AzureCosmosDBNoSqlVectorStore] }
  }
}
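
For reviewers skimming the output above: both the document store and the index store hold a KV store plus collection names of the form `<Database>.<Container>/<suffix>`. The sketch below illustrates that layering only. It is not the code in this PR: `InMemoryKVStore` and `SketchDocumentStore` are hypothetical names, a `Map` stands in for Cosmos DB, and the methods are synchronous for brevity, whereas a real store would be asynchronous.

```typescript
type StoredValue = Record<string, unknown>;

// Hypothetical stand-in for a Cosmos-backed KV store:
// one in-memory map per collection name.
class InMemoryKVStore {
  private collections = new Map<string, Map<string, StoredValue>>();

  put(key: string, value: StoredValue, collection: string): void {
    let bucket = this.collections.get(collection);
    if (!bucket) {
      bucket = new Map<string, StoredValue>();
      this.collections.set(collection, bucket);
    }
    bucket.set(key, value);
  }

  get(key: string, collection: string): StoredValue | null {
    return this.collections.get(collection)?.get(key) ?? null;
  }
}

// Hypothetical document store that delegates every read and write
// to the KV store, under collection names derived from a namespace.
class SketchDocumentStore {
  readonly nodeCollection: string;
  readonly refDocCollection: string;
  readonly metadataCollection: string;
  private kvStore: InMemoryKVStore;

  constructor(kvStore: InMemoryKVStore, namespace: string) {
    this.kvStore = kvStore;
    this.nodeCollection = `${namespace}/data`;
    this.refDocCollection = `${namespace}/ref_doc_info`;
    this.metadataCollection = `${namespace}/metadata`;
  }

  addNode(id: string, node: StoredValue): void {
    this.kvStore.put(id, node, this.nodeCollection);
  }

  getNode(id: string): StoredValue | null {
    return this.kvStore.get(id, this.nodeCollection);
  }
}

const kv = new InMemoryKVStore();
const sketchStore = new SketchDocumentStore(
  kv,
  "DocumentStoreDB.DocumentStoreContainer",
);
sketchStore.addNode("node-1", { text: "hello" });
console.log(sketchStore.nodeCollection); // DocumentStoreDB.DocumentStoreContainer/data
```

An index store could follow the same pattern with a single collection name.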

@himself65 I wasn't sure how StructTypes are handled internally, so I added a workaround to make the code pass. Can you please review this change in particular and suggest a better fix?

cc @marcusschiesser @amanrao23 @sajeetharan @DanWahlin

changeset-bot bot commented Oct 25, 2024

⚠️ No Changeset found

Latest commit: 63a852d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

vercel bot commented Oct 25, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
llama-index-ts-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 25, 2024 3:08pm
llamaindex-ts-doc-next ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 25, 2024 3:08pm

pkg-pr-new bot commented Oct 25, 2024

Open in Stackblitz

@llamaindex/autotool

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/autotool@1393

@llamaindex/cloud

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/cloud@1393

@llamaindex/community

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/community@1393

@llamaindex/env

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/env@1393

@llamaindex/experimental

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/experimental@1393

@llamaindex/core

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/core@1393

llamaindex

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/llamaindex@1393

@llamaindex/wasm-tools

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/wasm-tools@1393

@llamaindex/anthropic

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/anthropic@1393

@llamaindex/clip

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/clip@1393

@llamaindex/deepinfra

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/deepinfra@1393

@llamaindex/groq

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/groq@1393

@llamaindex/ollama

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/ollama@1393

@llamaindex/huggingface

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/huggingface@1393

@llamaindex/openai

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/openai@1393

@llamaindex/portkey-ai

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/portkey-ai@1393

@llamaindex/replicate

pnpm add https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/replicate@1393

commit: 63a852d

Member

@himself65 himself65 left a comment

Overall LGTM, but does it support all JS environments, such as the edge runtime and Cloudflare Workers?

If not, I don't want to export it from index.ts.

2 participants