v0.5.0 (2024-03-23)
Highlights in this release are query performance improvements (5x faster, 98% fewer memory allocations), export/import of the entire DB to/from a single file with optional gzip-compression and AES-GCM encryption, optional gzip-compression for the regular persistence, a new code example for semantic search across 5,000 arXiv papers, and an embedding func for Cohere.
Added
- Added arXiv semantic search example (PR #45)
- Added basic query benchmark (PR #46)
- Added unit test for collection query errors (PR #51)
- Added `Collection.QueryEmbedding()` method for when you already have the embedding of your query (PR #52)
- Added export and import of the entire DB to/from a single file, with optional gzip-compression and AES-GCM encryption (PR #58)
- Added optional gzip-compression to the regular persistence (i.e. the DB from `NewPersistentDB()` which writes a file for each added collection and document) (PR #59)
- Added minimal example (PR #60, #62)
- Added embedding func for Cohere (PR #61)
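
The export/import entry above combines three steps: serialization, optional gzip-compression, and optional AES-GCM encryption. The following is a minimal standard-library sketch of that general technique, not chromem-go's actual implementation or file format. The `snapshot` type, the `exportSnapshot` and `importSnapshot` helpers, the gob encoding, and the compress-then-encrypt order are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
)

// snapshot is a hypothetical stand-in for a whole DB:
// collection name -> document ID -> content.
type snapshot map[string]map[string]string

// newGCM builds an AES-GCM AEAD from a 16, 24, or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// exportSnapshot gob-encodes s, gzip-compresses the result, and seals it
// with AES-GCM. The random nonce is prepended to the ciphertext.
func exportSnapshot(s snapshot, key []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(s); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close writes the gzip footer
		return nil, err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, buf.Bytes(), nil), nil
}

// importSnapshot reverses exportSnapshot: authenticate and decrypt,
// decompress, then decode.
func importSnapshot(data, key []byte) (snapshot, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("data shorter than nonce")
	}
	nonce, ciphertext := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return nil, err // wrong key or tampered data
	}
	zr, err := gzip.NewReader(bytes.NewReader(plain))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var s snapshot
	if err := gob.NewDecoder(zr).Decode(&s); err != nil {
		return nil, err
	}
	return s, nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32 bytes, demo only
	in := snapshot{"papers": {"1": "semantic search"}}
	blob, err := exportSnapshot(in, key)
	if err != nil {
		panic(err)
	}
	out, err := importSnapshot(blob, key)
	if err != nil {
		panic(err)
	}
	fmt.Println(out["papers"]["1"])
}
```

Compressing before encrypting is the usual order, because well-encrypted data is essentially incompressible. Since GCM is an authenticated mode, importing with the wrong key returns an error instead of garbage data.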
Improved
- Changed the example link target to the directory instead of the `main.go` file (PR #43)
- Improved query performance (5x faster, 98% fewer memory allocations) (PR #47, #53, #54)
  - benchstat output:

        goos: linux
        goarch: amd64
        pkg: github.com/philippgille/chromem-go
        cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                                            │    before     │               after                │
                                            │    sec/op     │   sec/op     vs base               │
        Collection_Query_NoContent_100-8       413.69µ ±  4%   90.79µ ±  2%  -78.05% (p=0.002 n=6)
        Collection_Query_NoContent_1000-8      2759.4µ ±  0%   518.8µ ±  1%  -81.20% (p=0.002 n=6)
        Collection_Query_NoContent_5000-8      12.980m ±  1%   2.144m ±  1%  -83.49% (p=0.002 n=6)
        Collection_Query_NoContent_25000-8     66.559m ±  1%   9.947m ±  2%  -85.06% (p=0.002 n=6)
        Collection_Query_NoContent_100000-8    282.41m ±  3%   39.75m ±  1%  -85.92% (p=0.002 n=6)
        Collection_Query_100-8                 416.75µ ±  2%   90.99µ ±  1%  -78.17% (p=0.002 n=6)
        Collection_Query_1000-8                2792.8µ ± 23%   595.2µ ± 13%  -78.69% (p=0.002 n=6)
        Collection_Query_5000-8                15.643m ±  1%   2.556m ±  1%  -83.66% (p=0.002 n=6)
        Collection_Query_25000-8                78.29m ±  1%   11.66m ±  1%  -85.11% (p=0.002 n=6)
        Collection_Query_100000-8              338.54m ±  5%   39.70m ± 12%  -88.27% (p=0.002 n=6)
        geomean                                 12.97m         2.192m        -83.10%

                                            │      before      │               after                │
                                            │       B/op       │    B/op      vs base               │
        Collection_Query_NoContent_100-8       1211.007Ki ± 0%   5.030Ki ± 0%  -99.58% (p=0.002 n=6)
        Collection_Query_NoContent_1000-8      12082.16Ki ± 0%   13.24Ki ± 0%  -99.89% (p=0.002 n=6)
        Collection_Query_NoContent_5000-8      60394.23Ki ± 0%   45.99Ki ± 0%  -99.92% (p=0.002 n=6)
        Collection_Query_NoContent_25000-8     301962.1Ki ± 0%   206.7Ki ± 0%  -99.93% (p=0.002 n=6)
        Collection_Query_NoContent_100000-8   1207818.1Ki ± 0%   791.4Ki ± 0%  -99.93% (p=0.002 n=6)
        Collection_Query_100-8                 1211.006Ki ± 0%   5.033Ki ± 0%  -99.58% (p=0.002 n=6)
        Collection_Query_1000-8                12082.11Ki ± 0%   13.25Ki ± 0%  -99.89% (p=0.002 n=6)
        Collection_Query_5000-8                60394.10Ki ± 0%   46.04Ki ± 0%  -99.92% (p=0.002 n=6)
        Collection_Query_25000-8               301962.1Ki ± 0%   206.8Ki ± 0%  -99.93% (p=0.002 n=6)
        Collection_Query_100000-8             1207818.1Ki ± 0%   791.4Ki ± 0%  -99.93% (p=0.002 n=6)
        geomean                                   49.13Mi        54.97Ki       -99.89%

                                            │    before     │              after               │
                                            │   allocs/op   │ allocs/op   vs base              │
        Collection_Query_NoContent_100-8        238.00 ± 0%   94.00 ± 1%  -60.50% (p=0.002 n=6)
        Collection_Query_NoContent_1000-8       2038.5 ± 0%   140.5 ± 0%  -93.11% (p=0.002 n=6)
        Collection_Query_NoContent_5000-8      10039.0 ± 0%   172.0 ± 1%  -98.29% (p=0.002 n=6)
        Collection_Query_NoContent_25000-8     50038.0 ± 0%   204.0 ± 1%  -99.59% (p=0.002 n=6)
        Collection_Query_NoContent_100000-8   200038.0 ± 0%   232.0 ± 3%  -99.88% (p=0.002 n=6)
        Collection_Query_100-8                  238.00 ± 0%   94.50 ± 1%  -60.29% (p=0.002 n=6)
        Collection_Query_1000-8                 2038.0 ± 0%   141.0 ± 1%  -93.08% (p=0.002 n=6)
        Collection_Query_5000-8                10038.0 ± 0%   174.5 ± 2%  -98.26% (p=0.002 n=6)
        Collection_Query_25000-8               50038.0 ± 0%   205.5 ± 2%  -99.59% (p=0.002 n=6)
        Collection_Query_100000-8             200038.5 ± 0%   233.0 ± 1%  -99.88% (p=0.002 n=6)
        geomean                                 8.661k        161.4       -98.14%

- Extended parameter validation (PR #50, #51)
- Simplified unit tests (PR #55)
- Improved `NewPersistentDB()` path handling (PR #56)
- Improved loading of persistent DB (PR #57)
- Increased unit test coverage in several of the other listed PRs
Fixed
- Fixed path joining (PR #44)
Breaking changes
- Due to vectors now being normalized at the time of adding the document to the collection instead of when querying, the persisted data from prior versions is incompatible with this version (PR #47)
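
To illustrate why vectors persisted by earlier versions no longer fit, here is a small sketch under the assumption that queries now score documents with a plain dot product over unit-length vectors. For normalized vectors that equals cosine similarity, but it yields meaningless scores for raw vectors that were stored without normalization. The helper names (`normalize`, `dot`, `cosine`) and the example vectors are illustrative and not taken from the library.

```go
package main

import (
	"fmt"
	"math"
)

// norm returns the Euclidean length of v.
func norm(v []float32) float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	return float32(math.Sqrt(sum))
}

// normalize returns a unit-length copy of v. v must not be the zero vector.
func normalize(v []float32) []float32 {
	n := norm(v)
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

// dot returns the dot product of a and b, which must have equal length.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// cosine computes cosine similarity from scratch, including both norms.
func cosine(a, b []float32) float32 {
	return dot(a, b) / (norm(a) * norm(b))
}

func main() {
	doc := []float32{3, 4}
	query := []float32{4, 3}

	stored := normalize(doc) // done once, when the document is added
	q := normalize(query)    // done once per query

	// With both sides normalized, the cheap dot product is the cosine similarity.
	fmt.Printf("dot(normalized doc, normalized query) = %.4f\n", dot(stored, q))
	fmt.Printf("cosine(raw doc, raw query)            = %.4f\n", cosine(doc, query))

	// A raw vector persisted by an older version is not unit length, so the
	// same dot product is no longer a similarity in [-1, 1].
	fmt.Printf("dot(raw doc, normalized query)        = %.4f\n", dot(doc, q))
}
```

The first two values agree, while the third falls outside the valid cosine range. Under this assumption, data persisted by earlier versions would have to be re-added so that it gets normalized before it can be queried.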