Need a faster way to visualize the data #104

KShivendu · 2024-03-07T17:05:28Z

We have https://github.com/qdrant/vector-db-benchmark/blob/master/scripts/process-benchmarks.ipynb but it only prepares the data.

Web-based interactive graphs would be nice. One could use Plotly or the Dash framework.

Please use benchmarks.js as a reference. The logic for filterBestPoints is important to avoid clutter in the graph.

It should look like qdrant.tech/benchmarks.
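One way to reduce clutter is to keep only the Pareto-optimal runs of each engine: a run is kept only when no run with higher precision already beats its y value. This may or may not match what filterBestPoints in benchmarks.js does. The sketch below assumes each run is an (x, y) pair such as (precision, rps), with x always higher-is-better. `filter_best_points` is a hypothetical name, not a port of the JavaScript function.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def filter_best_points(points: List[Point], lower_is_better: bool = False) -> List[Point]:
    """Keep only points that are not dominated by another point.

    x (e.g. precision) is treated as higher-is-better; y (e.g. rps) is
    higher-is-better unless lower_is_better is True (e.g. latency).
    Hypothetical helper for illustration only.
    """
    # Best x first; among equal x, the best y comes first so ties are dropped
    ordered = sorted(
        points,
        key=lambda p: (p[0], -p[1] if lower_is_better else p[1]),
        reverse=True,
    )
    front: List[Point] = []
    for x, y in ordered:
        if not front:
            front.append((x, y))
            continue
        best_y = front[-1][1]
        # A point with lower x survives only if it strictly improves y
        if (y < best_y) if lower_is_better else (y > best_y):
            front.append((x, y))
    # Ascending x order is convenient for drawing lines
    return front[::-1]


runs = [(0.90, 1200.0), (0.95, 800.0), (0.93, 700.0), (0.99, 300.0), (0.85, 1100.0)]
print(filter_best_points(runs))  # → [(0.9, 1200.0), (0.95, 800.0), (0.99, 300.0)]
```

The dropped runs, (0.93, 700.0) and (0.85, 1100.0), are each beaten on both axes by a kept run, so plotting them would only add noise.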

aprabhak2 · 2024-04-15T23:20:16Z

@KShivendu, I was trying to run this benchmark for qdrant-rps-m16-ef128-glove-100-angular and have the JSON files below in the results folder. When I run the notebook, cell 17 raises the following error. Any help would be appreciated.

(vector-db-bench) [aprabh2]$ ls results
qdrant-rps-m-16-ef-128-glove-100-angular-search-0-2024-04-15-15-22-45.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-1-2024-04-15-15-23-04.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-2-2024-04-15-15-23-25.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-3-2024-04-15-15-23-51.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-4-2024-04-15-15-24-08.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-5-2024-04-15-15-24-25.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-6-2024-04-15-15-24-41.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-7-2024-04-15-15-24-58.json
qdrant-rps-m-16-ef-128-glove-100-angular-upload-2024-04-15-15-22-26.json

Cell 17:

_search = search_df.reset_index()
_upload = upload_df.reset_index()

joined_df = _search.merge(_upload, on=["engine", "m", "ef", "dataset"], how="left", suffixes=("_search", "_upload"))
print(len(joined_df))
joined_df

ERROR:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_1302491/1721113676.py in ?()
----> 1 _search = search_df.reset_index()
      2 _upload = upload_df.reset_index()
      3 
      4 joined_df = _search.merge(_upload, on=["engine", "m", "ef", "dataset"], how="left", suffixes=("_search", "_upload"))

/fastdata/01/aprabh2/anaconda3/envs/vector-db-bench/lib/python3.11/site-packages/pandas/util/_decorators.py in ?(*args, **kwargs)
    307                     msg.format(arguments=arguments),
    308                     FutureWarning,
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)

/fastdata/01/aprabh2/anaconda3/envs/vector-db-bench/lib/python3.11/site-packages/pandas/core/frame.py in ?(self, level, drop, inplace, col_level, col_fill)
   5844                     level_values = algorithms.take(
   5845                         level_values, lab, allow_fill=True, fill_value=lev._na_value
   5846                     )
   5847 
-> 5848                 new_obj.insert(0, name, level_values)
   5849 
   5850         new_obj.index = new_index
   5851         if not inplace:

/fastdata/01/aprabh2/anaconda3/envs/vector-db-bench/lib/python3.11/site-packages/pandas/core/frame.py in ?(self, loc, column, value, allow_duplicates)
   4439                 "'self.flags.allows_duplicate_labels' is False."
   4440             )
   4441         if not allow_duplicates and column in self.columns:
   4442             # Should this be a different kind of error??
-> 4443             raise ValueError(f"cannot insert {column}, already exists")
   4444         if not isinstance(loc, int):
   4445             raise TypeError("loc must be int")
   4446 

ValueError: cannot insert dataset, already exists

KShivendu · 2024-04-16T09:51:50Z

@aprabhak2 We just merged #125

It should be fixed now. Please try again and let us know if you run into any issues.
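For anyone still on a checkout older than #125: the traceback means `reset_index()` tried to turn the `dataset` index level into a column while a `dataset` column already existed. Below is a minimal sketch of one workaround, assuming the column and the index level hold the same values. `safe_reset_index` is a hypothetical helper, not necessarily what #125 changed, and the frame is toy data.

```python
import pandas as pd


def safe_reset_index(df: pd.DataFrame) -> pd.DataFrame:
    """Like df.reset_index(), but drop index levels whose name is already a column.

    Assumes the existing column and the clashing index level carry the same data.
    """
    clashing = [n for n in df.index.names if n is not None and n in df.columns]
    if len(clashing) == df.index.nlevels:
        # Every level is already a column: just replace the index with 0..n-1
        return df.reset_index(drop=True)
    if clashing:
        # Remove only the duplicated levels; the column copy is kept
        df = df.reset_index(level=clashing, drop=True)
    return df.reset_index()


# Toy frame reproducing the situation: "dataset" is both an index level and a column
search_df = pd.DataFrame(
    {
        "engine": ["qdrant", "qdrant"],
        "m": [16, 16],
        "ef": [128, 256],
        "dataset": ["glove-100-angular", "glove-100-angular"],
        "rps": [900.0, 1100.0],
    }
).set_index(["engine", "m", "ef", "dataset"])
search_df["dataset"] = search_df.index.get_level_values("dataset")

# search_df.reset_index() would raise: ValueError: cannot insert dataset, already exists
out = safe_reset_index(search_df)
print(list(out.columns))  # → ['engine', 'm', 'ef', 'rps', 'dataset']
```

Dropping the duplicated level (rather than the column) keeps the merge keys in `["engine", "m", "ef", "dataset"]` available as ordinary columns for the `merge` call in cell 17.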

aprabhak2 · 2024-04-24T20:43:23Z

First attempt at adding a plot. This can be added as a new cell at the end of the notebook: https://github.com/qdrant/vector-db-benchmark/blob/master/scripts/process-benchmarks.ipynb

import json
import matplotlib.pyplot as plt

# results.json is produced by the earlier cells of the notebook
with open('results.json') as json_data:
    all_data = json.load(json_data)

xaxis = "mean_precisions"
yaxis = "rps"
dataset_name = "glove-100-angular"
parallel = 100.0
lower_is_better = False  # set to True for y metrics such as latency

# Group (x, y) points by engine for the selected dataset and parallelism
engine_name_to_xy = {}
for curr_data in all_data:
    if curr_data['dataset_name'] != dataset_name or curr_data['parallel'] != parallel:
        continue
    engine_name_to_xy.setdefault(curr_data['engine_name'], []).append(
        (curr_data[xaxis], curr_data[yaxis])
    )


def check_better(new, best, lower_is_better):
    """Return True if `new` strictly improves on `best` for the y-axis metric."""
    return new < best if lower_is_better else new > best


for engine_name, curr_xy_pts in engine_name_to_xy.items():
    # Walk from the highest x downwards and keep a point only if its y value
    # beats every point kept so far; this removes dominated runs (clutter)
    curr_xy_pts.sort(key=lambda pt: pt[0], reverse=True)
    curr_x_pts = []
    curr_y_pts = []
    for x, y in curr_xy_pts:
        if not curr_y_pts or check_better(y, curr_y_pts[-1], lower_is_better):
            curr_x_pts.append(x)
            curr_y_pts.append(y)
    plt.plot(curr_x_pts, curr_y_pts, label=engine_name, marker='o')

plt.legend(loc="upper right")
plt.xlabel(xaxis)
plt.ylabel(yaxis)
plt.show()

KShivendu · 2024-06-05T08:52:55Z

@filipecosta90 would you be interested in picking this up :)

If so, it would be nice to do it with Plotly to build interactive graphs.

filipecosta90 · 2024-06-05T08:54:09Z

> @filipecosta90 would you be interested to pick this up :)
>
> If yes, would be nice if we can do it with plotly to build interactive graphs.

Sure, let me pick this one up. I should be able to devote some time to it at the end of the week.

KShivendu changed the title ~~Need a simpler way to visualize the data~~ Need a faster way to visualize the data Mar 7, 2024

KShivendu added the good first issue Good for newcomers label Apr 3, 2024

KShivendu mentioned this issue Apr 16, 2024

refactor: Fix and simplify benchmark processing notebook #125

Merged

KShivendu added the enhancement New feature or request label Apr 19, 2024

KShivendu mentioned this issue Aug 5, 2024

[question] How to draw a picture like the one displayed on the README webpage #173

Closed
