Fill README
philippgille committed Dec 28, 2023
1 parent ea68456 commit 67095ee
Showing 1 changed file with 98 additions and 1 deletion.
99 changes: 98 additions & 1 deletion README.md
@@ -1,2 +1,99 @@
# chromem-go

[![Go Reference](https://pkg.go.dev/badge/github.com/philippgille/chromem-go.svg)](https://pkg.go.dev/github.com/philippgille/chromem-go)

In-memory vector database for Go with Chroma-like interface.

It's not a library to connect to the Chroma database. It's an in-memory database on its own, meant to enable retrieval augmented generation (RAG) applications in Go *without having to run a separate database*.
As such, the focus is not scale or performance, but simplicity.

> ⚠️ The initial implementation is fairly naive, with only the bare minimum of features. We'll improve and extend it over time.

## Interface

Our inspiration, the [Chroma](https://www.trychroma.com/) interface, is the following (taken from their [README](https://github.com/chroma-core/chroma/blob/0.4.21/README.md)).

```python
import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
    ids=["doc1", "doc2"], # unique for each doc
)

# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"}  # optional filter
)
```

Our Go library exposes the same interface:

```go
package main

import (
	"context"

	"github.com/philippgille/chromem-go"
)

func main() {
	ctx := context.Background()

	// Set up chromem-go in-memory, for easy prototyping. Persistence will be added in the future.
	client := chromem.NewClient()

	// Create collection. GetCollection, GetOrCreateCollection, DeleteCollection will be added in the future.
	collection := client.CreateCollection("all-my-documents", nil, nil)

	// Add docs to the collection. Update and delete will be added in the future.
	// Row-based API will be added when Chroma adds it!
	_ = collection.Add(ctx,
		[]string{"doc1", "doc2"}, // unique ID for each doc
		nil, // We handle embedding automatically. You can skip that and add your own embeddings as well.
		[]map[string]string{{"source": "notion"}, {"source": "google-docs"}}, // Filter on these!
		[]string{"This is document1", "This is document2"},
	)

	// Query/search 2 most similar results. Getting by ID will be added in the future.
	results, _ := collection.Query(ctx,
		"This is a query document",
		2,
		map[string]string{"metadata_field": "is_equal_to_this"}, // optional filter
		map[string]string{"$contains": "search_string"},         // optional filter
	)
	_ = results // Use the results here. Errors are ignored only to keep the example short.
}
```

Initially, only a minimal subset of Chroma's interface is implemented and exported. We'll add more in future versions.
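
To make the two optional filter arguments of `Query` more concrete, here is a minimal sketch of how exact-match metadata filters and the `$contains` / `$not_contains` document filters could be evaluated. It is an illustration under assumptions, not chromem-go's actual implementation: the function names `matchesMetadata` and `matchesDocument` are hypothetical, and the sketch assumes that all entries of a filter map must match.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesMetadata reports whether every key/value pair in where is present
// in metadata with exactly the same value. An empty or nil filter matches.
// (Hypothetical helper, not part of chromem-go's API.)
func matchesMetadata(metadata, where map[string]string) bool {
	for k, v := range where {
		if got, ok := metadata[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// matchesDocument applies the $contains and $not_contains operators to the
// document text. Unknown operators are rejected before any matching is done,
// so the result does not depend on map iteration order.
// (Hypothetical helper, not part of chromem-go's API.)
func matchesDocument(document string, whereDocument map[string]string) (bool, error) {
	for op := range whereDocument {
		if op != "$contains" && op != "$not_contains" {
			return false, fmt.Errorf("unsupported operator %q", op)
		}
	}
	for op, s := range whereDocument {
		found := strings.Contains(document, s)
		if op == "$contains" && !found {
			return false, nil
		}
		if op == "$not_contains" && found {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	metadata := map[string]string{"source": "notion"}
	document := "This is document1"

	metaOK := matchesMetadata(metadata, map[string]string{"source": "notion"})
	docOK, err := matchesDocument(document, map[string]string{"$contains": "document1"})
	if err != nil {
		fmt.Println("filter error:", err)
		return
	}
	fmt.Println("metadata match:", metaOK, "document match:", docOK)
}
```

A document would be considered for similarity ranking only if both predicates return true.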

## Features

- Embedding creators:
  - [X] [OpenAI ada v2](https://platform.openai.com/docs/guides/embeddings/embedding-models) (default)
  - [X] Bring your own
  - [ ] [Mistral (API)](https://docs.mistral.ai/api/#operation/createEmbedding)
  - [ ] [ollama](https://ollama.ai/)
  - [ ] [LocalAI](https://github.com/mudler/LocalAI)
- Similarity search:
  - [X] Exact nearest neighbor search using cosine similarity
  - [ ] Approximate nearest neighbor search with index
    - [ ] Hierarchical Navigable Small World (HNSW)
    - [ ] Inverted file flat (IVFFlat)
- Filters:
  - [X] Document filters: `$contains`, `$not_contains`
  - [X] Metadata filters: Exact matches
  - [ ] Operators (`$and`, `$or` etc.)
- Storage:
  - [X] In-memory
  - [ ] Persistent (file)
  - [ ] Persistent (others (S3, PostgreSQL, ...))
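
As an illustration of what "exact nearest neighbor search using cosine similarity" means, the following sketch scores every stored embedding against the query and keeps the top `n`. It is a simplified example and not the library's code: `cosineSimilarity`, `nearest` and `scoredDoc` are hypothetical names, and the tiny two-dimensional vectors stand in for real embeddings.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 if the lengths differ, the vectors are empty,
// or either vector has zero magnitude.
func cosineSimilarity(a, b []float32) float32 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(normA) * math.Sqrt(normB)))
}

// scoredDoc pairs a document ID with its similarity to the query.
type scoredDoc struct {
	ID         string
	Similarity float32
}

// nearest scans all embeddings (no index) and returns the n most similar
// documents, most similar first. Ties are broken by ID for stable output.
func nearest(query []float32, docs map[string][]float32, n int) []scoredDoc {
	scored := make([]scoredDoc, 0, len(docs))
	for id, emb := range docs {
		scored = append(scored, scoredDoc{ID: id, Similarity: cosineSimilarity(query, emb)})
	}
	sort.Slice(scored, func(i, j int) bool {
		if scored[i].Similarity != scored[j].Similarity {
			return scored[i].Similarity > scored[j].Similarity
		}
		return scored[i].ID < scored[j].ID
	})
	if n < 0 {
		n = 0
	}
	if n > len(scored) {
		n = len(scored)
	}
	return scored[:n]
}

func main() {
	docs := map[string][]float32{
		"doc1": {1, 0},
		"doc2": {0, 1},
		"doc3": {1, 1},
	}
	for _, d := range nearest([]float32{1, 0.1}, docs, 2) {
		fmt.Printf("%s %.3f\n", d.ID, d.Similarity)
	}
}
```

The scan visits every document on each query, which is simple and exact but grows linearly with collection size. The approximate, index-based approaches listed above as future work trade exactness for speed.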

## Usage

For a full working example that uses the vector database for retrieval augmented generation (RAG), see [example/main.go](example/main.go).
