[Tests] Create testset #63

Open
1 of 3 tasks
JasonLo opened this issue Aug 21, 2023 · 3 comments
Collaborator

JasonLo commented Aug 21, 2023

Problem:

  • Some demo output is misbehaving?

To-dos:

  • Create testset from hackathon files
  • Populate ideal answers
  • Decide on a metric for the testset: top-5 accuracy of hitting the relevant document object?
JasonLo self-assigned this on Aug 21, 2023
Collaborator Author

JasonLo commented Aug 23, 2023

Asking for the source of the ideal answers, if one is available.

JasonLo changed the title from "[Tests] Validate demo responses" to "[Tests] Create testset" on Aug 24, 2023
Collaborator Author

JasonLo commented Aug 25, 2023

Metrics

Consideration: Does the QA interface reduce the manual browsing time for a search? The assumption is that browsing time decreases when the target results appear higher in the ranking.

Target metric: Hit rate on the manually labeled xDD object(s) in the top-1/top-5/top-10 results.
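
For illustration, a minimal sketch of how this hit rate could be computed. It assumes each test question has a ranked list of returned object IDs and a set of manually labeled relevant IDs; the names `hit_at_k`, `hit_rate`, and the sample IDs are hypothetical and not taken from the project code.

```python
from typing import Dict, List, Sequence, Set


def hit_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> bool:
    """Return True if any labeled relevant object appears in the top-k results."""
    return any(doc_id in relevant_ids for doc_id in ranked_ids[:k])


def hit_rate(
    results: Dict[str, List[str]],
    labels: Dict[str, Set[str]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Fraction of labeled questions with at least one hit in the top-k, per k."""
    if not labels:
        raise ValueError("labels must not be empty")
    rates = {}
    for k in ks:
        # A question with no returned results counts as a miss.
        hits = sum(
            hit_at_k(results.get(question, []), relevant, k)
            for question, relevant in labels.items()
        )
        rates[k] = hits / len(labels)
    return rates


# Hypothetical data: question ID -> ranked object IDs returned by the QA interface.
results = {
    "q1": ["a", "b", "c"],
    "q2": ["x", "y", "z", "w", "target"],
}
# Hypothetical manual labels: question ID -> relevant object IDs.
labels = {"q1": {"a"}, "q2": {"target"}}

print(hit_rate(results, labels))  # {1: 0.5, 5: 1.0, 10: 1.0}
```

A question counts as a hit if any one of its labeled objects is in the top-k, which matches the "object(s)" wording above; a stricter variant could require all labeled objects to appear.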

Collaborator

iross commented Aug 28, 2023

Before we go too far down the road of evaluation, I definitely need to revisit the database population process and make sure that all the relevant documents have been embedded and stored in Weaviate. I'm concerned that the snapshot I pushed doesn't include everything; the documents most likely to be missing are the single documents we manually brought into the system (as opposed to the bulk acquisition that forms the vast majority of the corpus).
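
One possible way to run that check, sketched under assumptions: given the list of document IDs the testset expects and the list of IDs actually present in the store, a set difference shows what is missing. How the stored IDs are exported from Weaviate is not shown here, and `missing_documents` and the sample IDs are hypothetical.

```python
from typing import Iterable, Set


def missing_documents(expected_ids: Iterable[str], stored_ids: Iterable[str]) -> Set[str]:
    """Return the expected document IDs that are absent from the store."""
    return set(expected_ids) - set(stored_ids)


# Hypothetical IDs: documents the testset relies on vs. those found in the snapshot.
expected = ["doc-001", "doc-002", "doc-manual-01"]
stored = ["doc-001", "doc-002"]

print(sorted(missing_documents(expected, stored)))  # ['doc-manual-01']
```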
