from n2 import HnswIndex
import random

f = 40  # vector dimension
t = HnswIndex(f)  # HnswIndex(f, "L2, euclidean, or angular")
for i in range(1000):
    # add 1000 random Gaussian vectors of dimension f
    v = [random.gauss(0, 1) for z in range(f)]
    t.add_data(v)
t.build(m=5, max_m0=10, n_threads=4)
t.save('test.hnsw')

# load the saved index into a new object and query it
u = HnswIndex(f)
u.load('test.hnsw')
print(u.search_by_id(0, 1000))
Note that if a user passes a negative value for a parameter that has a default value, the parameter is set to that default.
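As one concrete case of this note, the search methods documented below take ef_search=-1 and describe its default as 50 * k. A minimal sketch of that fallback follows; resolve_ef_search is a hypothetical helper written for illustration and is not part of N2.

```python
def resolve_ef_search(k, ef_search=-1):
    """Hypothetical helper: map a negative ef_search to the documented default."""
    if ef_search < 0:
        # negative value: fall back to the documented default of 50 * k
        return 50 * k
    return ef_search


print(resolve_ef_search(10))       # 500
print(resolve_ef_search(10, 200))  # 200
```

Passing an explicit non-negative value leaves it untouched, so only the sentinel triggers the default.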
- HnswIndex(dim, metric): returns a new HNSW index.
  - dim (int): dimension of vectors
  - metric (string): an optional parameter for choosing the distance metric ('L2' | 'euclidean' | 'angular')
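For readers unfamiliar with the two metric families, the sketch below gives textbook definitions of Euclidean (L2) distance and a cosine-based distance in plain Python. It is an illustration only: the function names are hypothetical, and the exact values N2 reports for each metric may be scaled differently.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, different length
print(l2_distance(a, b))      # 1.0
print(cosine_distance(a, b))  # 0.0
```

The two vectors differ in length but not direction, so they are apart under L2 and identical under an angle-based metric.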
- index.add_data(v): adds vector v.
  - v (list of float): a vector with dimension dim
- index.build(m, max_m0, ef_construction, n_threads, mult, neighbor_selecting, graph_merging): builds an HNSW graph with the given configuration.
  - m (int): max number of edges for nodes at level > 0 (default=12)
  - max_m0 (int): max number of edges for nodes at level == 0 (default=24)
  - ef_construction (int): efConstruction (see HNSW paper…) (default=150)
  - n_threads (int): number of threads for building the index
  - mult (float): level multiplier (recommended: use the default value) (default=1/log(1.0*M))
  - neighbor_selecting (string): neighbor selecting policy
    - available values
      - "heuristic" (default): select neighbors using Algorithm 4 in the HNSW paper (recommended)
      - "naive": select the closest neighbors (not recommended)
  - graph_merging (string): graph merging heuristic
    - available values
      - "skip" (default): do not merge (recommended for large-scale data, over 10M)
      - "merge_level0": build another graph in reverse order, then merge the edges of level 0 (recommended for data under 10M)
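The difference between the two neighbor_selecting policies can be sketched in plain Python. The code below is a simplified illustration in the spirit of the HNSW paper's neighbor-selection heuristic (without its optional candidate-extension and pruned-connection steps); it is not N2's implementation, and the function names are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def select_naive(q, candidates, m):
    """Keep the m candidates closest to q."""
    return sorted(candidates, key=lambda c: dist(c, q))[:m]


def select_heuristic(q, candidates, m):
    """Visit candidates from nearest to farthest; keep one only if it is
    closer to q than to every neighbor already selected."""
    selected = []
    for c in sorted(candidates, key=lambda c: dist(c, q)):
        if len(selected) >= m:
            break
        if all(dist(c, q) < dist(c, s) for s in selected):
            selected.append(c)
    return selected


query = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.1, 0.1), (-2.0, 0.0)]
print(select_naive(query, candidates, 2))      # [(1.0, 0.0), (1.1, 0.1)]
print(select_heuristic(query, candidates, 2))  # [(1.0, 0.0), (-2.0, 0.0)]
```

In this toy case the naive policy picks two nearly identical neighbors, while the heuristic skips the redundant one and keeps an edge pointing in a different direction.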
- index.save(fname): saves the index to disk.
  - fname (string)
- index.load(fname): loads an index from disk.
  - fname (string)
  - use_mmap (bool): an optional parameter; the default value is true. If this parameter is set, N2 loads the model through mmap.
- index.unload(): unloads (unmaps) the index.
- index.search_by_id(item_id, k, ef_search=-1, include_distances=False): returns the k nearest items.
  - item_id (int)
  - k (int)
  - ef_search (int): default value = 50 * k
  - include_distances (boolean): if set to True, returns a list of tuples (item_id, distance).
- index.search_by_vector(v, k, ef_search=-1, include_distances=False): returns the k nearest items.
  - v (list of float): a query vector
  - k (int)
  - ef_search (int): default value = 50 * k
  - include_distances (boolean): if set to True, returns a list of tuples (item_id, distance).
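To sanity-check approximate results on a small dataset, one option is an exact brute-force baseline that mirrors the two return shapes described above (ids only, or (item_id, distance) tuples). This is a plain-Python sketch: brute_force_search is a hypothetical name, Euclidean distance is assumed, and none of it is part of N2.

```python
import math


def brute_force_search(data, v, k, include_distances=False):
    """Exact k-nearest-neighbor search by scanning every stored vector."""
    scored = []
    for item_id, item in enumerate(data):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(item, v)))
        scored.append((d, item_id))
    scored.sort()  # nearest first; ties broken by item id
    top = scored[:k]
    if include_distances:
        return [(item_id, d) for d, item_id in top]
    return [item_id for d, item_id in top]


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(brute_force_search(data, [0.9, 0.0], 2))  # [1, 0]
print(brute_force_search(data, [0.9, 0.0], 2, include_distances=True))
```

Comparing the ids from this baseline with those from an index built on the same vectors gives a rough recall estimate; the scan is linear in the dataset size, so it only suits small samples.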