Need a faster way to visualize the data #104

Open
KShivendu opened this issue Mar 7, 2024 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@KShivendu
Member

KShivendu commented Mar 7, 2024

We have https://github.com/qdrant/vector-db-benchmark/blob/master/scripts/process-benchmarks.ipynb, but it only prepares the data.

Web-based interactive graphs would be nice. One could use the Plotly or Dash framework.

Please use benchmarks.js as a reference. The filterBestPoints logic is important to avoid clutter in the graph.

It should look like qdrant.tech/benchmarks.
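To make the filtering idea concrete, here is a minimal sketch that keeps only the non-dominated (Pareto-optimal) points of one engine's precision/RPS series, i.e. it drops any configuration that another configuration beats on both axes. It assumes this is roughly what `filterBestPoints` is for; it is not a port of benchmarks.js, and the name `pareto_front` and the `(precision, rps)` tuple layout are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical layout: (precision, rps)


def pareto_front(points: List[Point], lower_is_better: bool = False) -> List[Point]:
    """Hypothetical helper, not a port of filterBestPoints from benchmarks.js.

    Keeps points that no other point beats on both axes. Higher x (precision)
    is always better; y is better when higher, unless lower_is_better is set
    (e.g. for latency).
    """
    sign = -1.0 if lower_is_better else 1.0
    # Sort by x descending; among equal x, the best y comes first.
    ordered = sorted(points, key=lambda p: (p[0], sign * p[1]), reverse=True)
    front: List[Point] = []
    for x, y in ordered:
        # Every earlier point has x >= this one, so keep this point only if
        # its y strictly beats the best y kept so far.
        if not front or sign * y > sign * front[-1][1]:
            front.append((x, y))
    return front
```

Running each engine's points through `pareto_front` before plotting leaves one monotone curve per engine (precision falls as RPS rises), which is what keeps the chart readable.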

@KShivendu KShivendu changed the title Need a simpler way to visualize the data Need a faster way to visualize the data Mar 7, 2024
@KShivendu KShivendu added the good first issue Good for newcomers label Apr 3, 2024
@aprabhak2

aprabhak2 commented Apr 15, 2024

@KShivendu, I was trying to run this benchmark for qdrant-rps-m16-ef128-glove-100-angular and have the JSON files below in the results folder. When I try to use this .ipynb notebook, cell 17 gives the following error. Any help would be appreciated.

(vector-db-bench) [aprabh2]$ ls results
qdrant-rps-m-16-ef-128-glove-100-angular-search-0-2024-04-15-15-22-45.json  
qdrant-rps-m-16-ef-128-glove-100-angular-search-3-2024-04-15-15-23-51.json  
qdrant-rps-m-16-ef-128-glove-100-angular-search-6-2024-04-15-15-24-41.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-1-2024-04-15-15-23-04.json  
qdrant-rps-m-16-ef-128-glove-100-angular-search-4-2024-04-15-15-24-08.json  
qdrant-rps-m-16-ef-128-glove-100-angular-search-7-2024-04-15-15-24-58.json
qdrant-rps-m-16-ef-128-glove-100-angular-search-2-2024-04-15-15-23-25.json  
qdrant-rps-m-16-ef-128-glove-100-angular-search-5-2024-04-15-15-24-25.json  
qdrant-rps-m-16-ef-128-glove-100-angular-upload-2024-04-15-15-22-26.json

Cell 17:

_search = search_df.reset_index()
_upload = upload_df.reset_index()

joined_df = _search.merge(_upload, on=["engine", "m", "ef", "dataset"], how="left", suffixes=("_search", "_upload"))
print(len(joined_df))
joined_df

ERROR:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_1302491/1721113676.py in ?()
----> 1 _search = search_df.reset_index()
      2 _upload = upload_df.reset_index()
      3 
      4 joined_df = _search.merge(_upload, on=["engine", "m", "ef", "dataset"], how="left", suffixes=("_search", "_upload"))

/fastdata/01/aprabh2/anaconda3/envs/vector-db-bench/lib/python3.11/site-packages/pandas/util/_decorators.py in ?(*args, **kwargs)
    307                     msg.format(arguments=arguments),
    308                     FutureWarning,
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)

/fastdata/01/aprabh2/anaconda3/envs/vector-db-bench/lib/python3.11/site-packages/pandas/core/frame.py in ?(self, level, drop, inplace, col_level, col_fill)
   5844                     level_values = algorithms.take(
   5845                         level_values, lab, allow_fill=True, fill_value=lev._na_value
   5846                     )
   5847 
-> 5848                 new_obj.insert(0, name, level_values)
   5849 
   5850         new_obj.index = new_index
   5851         if not inplace:

/fastdata/01/aprabh2/anaconda3/envs/vector-db-bench/lib/python3.11/site-packages/pandas/core/frame.py in ?(self, loc, column, value, allow_duplicates)
   4439                 "'self.flags.allows_duplicate_labels' is False."
   4440             )
   4441         if not allow_duplicates and column in self.columns:
   4442             # Should this be a different kind of error??
-> 4443             raise ValueError(f"cannot insert {column}, already exists")
   4444         if not isinstance(loc, int):
   4445             raise TypeError("loc must be int")
   4446 

ValueError: cannot insert dataset, already exists
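For context, pandas raises this error whenever `reset_index()` tries to turn an index level back into a column that already exists, here `dataset`. Below is a minimal reproduction plus one possible workaround; the toy DataFrame and the helper `reset_index_dropping_duplicates` are hypothetical, and this is not necessarily how the notebook itself was fixed.

```python
import pandas as pd

# Hypothetical toy data: "dataset" ends up as both the index and a column,
# e.g. after set_index(..., drop=False).
df = pd.DataFrame(
    {"engine": ["qdrant", "qdrant"], "dataset": ["glove-100-angular"] * 2, "rps": [1000.0, 1200.0]}
).set_index("dataset", drop=False)

try:
    df.reset_index()
except ValueError as err:
    print(err)  # cannot insert dataset, already exists


def reset_index_dropping_duplicates(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop columns that clash with index level names, then reset."""
    clashing = [name for name in frame.index.names if name is not None and name in frame.columns]
    return frame.drop(columns=clashing).reset_index()


fixed = reset_index_dropping_duplicates(df)
print(list(fixed.columns))  # ['dataset', 'engine', 'rps']
```

When every index level is already present as a column with the same values, `df.reset_index(drop=True)` is the simpler option, since it just discards the index.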

@KShivendu
Member Author

@aprabhak2 We just merged #125.

It should be fixed now. Please try again and let us know if you face any issues.

@KShivendu KShivendu added the enhancement New feature or request label Apr 19, 2024
@aprabhak2

aprabhak2 commented Apr 24, 2024

First attempt at adding a plot. This can be added as a new cell at the end of the notebook: https://github.com/qdrant/vector-db-benchmark/blob/master/scripts/process-benchmarks.ipynb

import json
import matplotlib.pyplot as plt

# Load the exported benchmark records (a list of dicts, one per search run).
with open("results.json") as json_data:
    all_data = json.load(json_data)  # the with-block closes the file

xaxis = "mean_precisions"
yaxis = "rps"
dataset_name = "glove-100-angular"
parallel = 100.0
lower_is_better = False  # set to True for metrics such as latency

# Group (x, y) points per engine for the selected dataset and parallelism.
engine_name_to_xy = {}
for curr_data in all_data:
    if curr_data["dataset_name"] != dataset_name or curr_data["parallel"] != parallel:
        continue
    engine_name_to_xy.setdefault(curr_data["engine_name"], []).append(
        (curr_data[xaxis], curr_data[yaxis])
    )


def check_better(y, best_y, lower_is_better):
    """Return True if y strictly improves on best_y."""
    return y < best_y if lower_is_better else y > best_y


for engine_name, curr_xy_pts in engine_name_to_xy.items():
    # Walk from the highest x down and keep a point only if its y beats every
    # point kept so far; this drops dominated configurations and reduces clutter.
    curr_xy_pts.sort(key=lambda tup: tup[0], reverse=True)
    curr_x_pts = []
    curr_y_pts = []
    for x, y in curr_xy_pts:
        if not curr_y_pts or check_better(y, curr_y_pts[-1], lower_is_better):
            curr_x_pts.append(x)
            curr_y_pts.append(y)
    plt.plot(curr_x_pts, curr_y_pts, label=engine_name, marker="o")

plt.legend(loc="upper right")
plt.xlabel(xaxis)
plt.ylabel(yaxis)
plt.show()

@KShivendu
Member Author

@filipecosta90, would you be interested in picking this up? :)

If so, it would be nice to do it with Plotly to build interactive graphs.
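A minimal sketch of one way to get an interactive, web-based chart without running a Dash server: emit a standalone HTML page that loads Plotly.js from its CDN and draws one line per engine. It assumes records shaped like the ones read from `results.json` in the matplotlib snippet in this thread (`engine_name`, `dataset_name`, `parallel`, `mean_precisions`, `rps`); the function name `build_html`, the pinned Plotly.js version and the output file name are illustrative assumptions. It does no filtering, so in practice the points would first go through a `filterBestPoints`-style step.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Assumed CDN build; pin whichever Plotly.js version you prefer.
PLOTLY_JS = "https://cdn.plot.ly/plotly-2.27.0.min.js"


def build_html(records: List[dict], dataset_name: str, parallel: float,
               xaxis: str = "mean_precisions", yaxis: str = "rps") -> str:
    """Hypothetical helper: return a standalone HTML page with one trace per engine.

    Assumed record keys, taken from the matplotlib snippet in this thread:
    engine_name, dataset_name, parallel, plus the chosen x/y metric keys.
    """
    series: Dict[str, List[tuple]] = defaultdict(list)
    for rec in records:
        if rec["dataset_name"] == dataset_name and rec["parallel"] == parallel:
            series[rec["engine_name"]].append((rec[xaxis], rec[yaxis]))

    traces = []
    for engine, points in sorted(series.items()):
        points.sort()  # draw each line left to right along the x axis
        traces.append({
            "type": "scatter",
            "mode": "lines+markers",
            "name": engine,
            "x": [x for x, _ in points],
            "y": [y for _, y in points],
        })
    layout = {
        "title": {"text": f"{dataset_name}, parallel={parallel:g}"},
        "xaxis": {"title": {"text": xaxis}},
        "yaxis": {"title": {"text": yaxis}},
    }
    # Plotly.newPlot(div, data, layout) renders the chart in the browser.
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><script src="{PLOTLY_JS}"></script></head>
<body>
<div id="chart" style="width:100%;height:600px;"></div>
<script>
Plotly.newPlot("chart", {json.dumps(traces)}, {json.dumps(layout)});
</script>
</body>
</html>"""
```

To use it, load the records with `json.load`, call `build_html(records, "glove-100-angular", 100.0)` and write the returned string to a file such as `benchmarks.html`, then open it in a browser. Hover, zoom and legend toggling come from Plotly.js itself; Dash would only be needed for server-side controls such as dataset or parallelism dropdowns.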

@filipecosta90
Contributor

> @filipecosta90 would you be interested to pick this up :)
>
> If yes, would be nice if we can do it with plotly to build interactive graphs.

Sure, let me pick this one up. I should be able to devote some time to it at the end of the week.

Projects
None yet
Development

No branches or pull requests

3 participants