
Vespa Product Ranking

This sample application demonstrates how to improve product search with Learning to Rank (LTR). It accompanies a blog post series on the topic.

This work uses the largest product relevance dataset released by Amazon:

We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the product to the query. Each query-product pair is accompanied by additional information. The dataset is multilingual, as it contains queries in English, Japanese, and Spanish.

The dataset is available at amazon-science/esci-data and is released under the Apache 2.0 license.
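To get a feel for the data, the examples file can be loaded with pandas. The following is a minimal, illustrative sketch only; the URL is the same one passed to scripts/evaluate.py in the full evaluation further down, and the column names (query, product_id, esci_label) are taken from the esci-data repository.

```python
import io

import pandas as pd
import requests

# Illustrative only: load the ESCI examples and inspect the judgments.
EXAMPLES = (
    "https://github.com/amazon-science/esci-data/blob/main/"
    "shopping_queries_dataset/shopping_queries_dataset_examples.parquet?raw=true"
)

raw = requests.get(EXAMPLES).content
df = pd.read_parquet(io.BytesIO(raw))        # requires pyarrow
print(df[["query", "product_id", "esci_label"]].head())
print(df["esci_label"].value_counts())       # distribution of E, S, C, I labels
```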

Quick start

The following is a quick start recipe for this application. Prerequisites:

  • Docker Desktop installed and running. 6 GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
  • Alternatively, deploy using Vespa Cloud
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
  • Architecture: x86_64 or arm64
  • Homebrew to install the Vespa CLI, or download a Vespa CLI release from GitHub releases.
  • zstd: brew install zstd
  • Python3 with requests, pyarrow and pandas installed

Validate the Docker resource settings, which should be a minimum of 6 GB:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Download this sample application:

$ vespa clone commerce-product-ranking my-app && cd my-app

Download the cross-encoder model:

$ curl -L -o application/models/title_ranker.onnx \
    https://data.vespa-cloud.com/sample-apps-data/title_ranker.onnx

See scripts/export-bi-encoder.py and scripts/export-cross-encoder.py for how to export models from PyTorch to ONNX format.
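Those scripts are the authoritative reference. As a rough, hypothetical sketch of what a cross-encoder export can look like, the snippet below traces a Hugging Face sequence-classification model with torch.onnx.export; the model name, input names and opset are illustrative assumptions, not necessarily what the scripts use.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical export sketch; see scripts/export-cross-encoder.py for the real export.
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"   # assumed model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# A dummy (query, title) pair defines the input shapes for tracing.
encoded = tokenizer("dummy query", "dummy product title", return_tensors="pt")

torch.onnx.export(
    model,
    (encoded["input_ids"], encoded["attention_mask"]),
    "title_ranker.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "tokens"},
        "attention_mask": {0: "batch", 1: "tokens"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```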

Deploy the application:

$ vespa deploy --wait 600 application

If the above fails, check the logs:

$ docker logs vespa

Deployment note

It is possible to deploy this app to Vespa Cloud.

Run basic system test

This step is optional, but it indexes two documents and runs a query test.

$ (cd application; vespa test tests/system-test/feed-and-search-test.json)

Indexing sample product data

Feed the pre-processed sample data for 16 products:

$ zstdcat sample-data/sample-products.jsonl.zstd | vespa feed -

Evaluation

Evaluate the semantic-title rank profile using the evaluation script (scripts/evaluate.py).

Install the requirements:

$ pip3 install numpy pandas pyarrow requests
$ python3 scripts/evaluate.py \
  --endpoint http://localhost:8080/search/ \
  --example_file sample-data/test-sample.parquet \
  --ranking semantic-title

evaluate.py runs all the queries in the test split using the given --ranking <rank-profile> and produces a <ranking>.run file with the top-ranked results, in the format that trec_eval expects.
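The essence of producing such a run file is a loop over queries that writes one line per hit (query_id Q0 doc_id rank score run_tag). The following is a minimal sketch under assumptions: the query parameters and the document id field name are illustrative, and scripts/evaluate.py remains the authoritative implementation.

```python
import requests

ENDPOINT = "http://localhost:8080/search/"
ranking = "semantic-title"

def write_run(out, query_id, query_text):
    # Query Vespa with the chosen rank profile and write hits in TREC run format.
    response = requests.get(ENDPOINT, params={
        "query": query_text,    # assumed: simple query syntax
        "ranking": ranking,     # rank profile to evaluate
        "hits": 40,
    })
    hits = response.json().get("root", {}).get("children", [])
    for rank, hit in enumerate(hits, start=1):
        doc_id = hit["fields"]["id"]    # assumed document id field
        out.write(f"{query_id} Q0 {doc_id} {rank} {hit['relevance']} {ranking}\n")

with open(f"{ranking}.run", "w") as out:
    write_run(out, "535", "example query text")   # hypothetical query
```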

$ cat semantic-title.run

Example ranking produced by Vespa using the semantic-title rank-profile for query 535:

535 Q0 B08PB9TTKT 1 0.46388297538130346 semantic-title
535 Q0 B00B4PJC9K 2 0.4314163871097326 semantic-title
535 Q0 B0051GN8JI 3 0.4199624989861286 semantic-title
535 Q0 B084TV3C1B 4 0.4177780086570998 semantic-title
535 Q0 B08NVQ8MZX 5 0.4175260475587483 semantic-title
535 Q0 B00DHUA9VA 6 0.41558328517364673 semantic-title
535 Q0 B08SHMLP5S 7 0.41512211873088406 semantic-title
535 Q0 B08VSJGP1N 8 0.41479904241634674 semantic-title
535 Q0 B08QGZMCYQ 9 0.41107229418202607 semantic-title
535 Q0 B0007KPRIS 10 0.4073851390694049 semantic-title
535 Q0 B08VJ66CNL 11 0.4040355668337184 semantic-title
535 Q0 B000J1HDWI 12 0.40354871020728317 semantic-title
535 Q0 B0007KPS3C 13 0.39775755175088207 semantic-title
535 Q0 B0072LFB68 14 0.39334250744409155 semantic-title
535 Q0 B01M0SFMIH 15 0.3920197770681833 semantic-title
535 Q0 B0742BZXC2 16 0.3778094352830984 semantic-title

This run file can then be evaluated using the trec_eval utility.

Download the pre-processed query-product relevance judgments in TREC format:

$ curl -L -o test.qrels \
    https://data.vespa-cloud.com/sample-apps-data/test.qrels

Install trec_eval (your mileage may vary):

$ git clone --depth 1 --branch v9.0.8 https://github.com/usnistgov/trec_eval && cd trec_eval && make install && cd ..

Run the evaluation:

$ trec_eval test.qrels semantic-title.run -m 'ndcg.1=0,2=0.01,3=0.1,4=1'

This particular product ranking for the query produces an NDCG score of 0.7046. Note that the sample-data/test-sample.parquet file contains only one query; to get the overall score, compute the NDCG score of every query in the test split and report the average.

Note that the evaluation uses custom NDCG label gains:

  • Label 1 is Irrelevant with 0 gain
  • Label 2 is Complement with 0.01 gain
  • Label 3 is Substitute with 0.1 gain
  • Label 4 is Exact with 1 gain
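For reference, NDCG with these gains can be sketched directly in a few lines. This is only an illustration of the metric with the gains above; trec_eval remains the authoritative implementation, and this sketch ignores cutoffs and tie handling.

```python
import math

# Gain per ESCI label, matching the -m 'ndcg.1=0,2=0.01,3=0.1,4=1' option above.
GAINS = {1: 0.0, 2: 0.01, 3: 0.1, 4: 1.0}

def dcg(labels):
    return sum(GAINS[label] / math.log2(rank + 2) for rank, label in enumerate(labels))

def ndcg(ranked_labels):
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

# Hypothetical labels of the ranked results for one query, in ranked order.
print(round(ndcg([4, 2, 4, 1, 3]), 4))
```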

We can also try another ranking model:

$ python3 scripts/evaluate.py \
  --endpoint http://localhost:8080/search/ \
  --example_file sample-data/test-sample.parquet \
  --ranking cross-title
$ trec_eval test.qrels cross-title.run -m 'ndcg.1=0,2=0.01,3=0.1,4=1'

For this query, the cross-title profile produces an NDCG score of 0.8208, better than the semantic-title model.

Shut down and remove the Docker container

$ docker rm -f vespa

Full evaluation

Download a pre-processed feed file with all (1,215,854) products:

$ curl -L -o product-search-products.jsonl.zstd \
    https://data.vespa-cloud.com/sample-apps-data/product-search-products.jsonl.zstd

This step is resource-intensive, as the semantic embedding model encodes each product title and description into the dense embedding vector space:

$ zstdcat product-search-products.jsonl.zstd | vespa feed -

Evaluate the semantic-title rank profile using the evaluation script (scripts/evaluate.py), this time over the full set of test queries:

$ python3 scripts/evaluate.py \
  --endpoint http://localhost:8080/search/ \
  --example_file "https://github.com/amazon-science/esci-data/blob/main/shopping_queries_dataset/shopping_queries_dataset_examples.parquet?raw=true" \
  --ranking semantic-title

For Vespa Cloud deployments, pass the data plane certificate and private key:

$ python3 scripts/evaluate.py \
  --endpoint https://productsearch.samples.aws-us-east-1c.perf.z.vespa-app.cloud/search/ \
  --example_file "https://github.com/amazon-science/esci-data/blob/main/shopping_queries_dataset/shopping_queries_dataset_examples.parquet?raw=true" \
  --ranking semantic-title \
  --cert <path-to-data-plane-cert.pem> \
  --key <path-to-data-plane-private-key.pem>

Run the evaluation using trec_eval:

$ trec_eval test.qrels semantic-title.run -m 'ndcg.1=0,2=0.01,3=0.1,4=1'