feat: add func to generate multiple queries #1009

Merged: 3 commits, Nov 29, 2024

Contributor

@e7217 e7217 commented Nov 27, 2024

description
Currently, AutoRAG generates one query per corpus during the QA generation stage. To validate performance more thoroughly, I wanted to explore generating multiple query candidates. To this end, I have experimentally introduced a feature called multiple_queries_gen, which takes a parameter n that specifies how many queries to generate. I have confirmed that it produces satisfactory output on my local machine.

If there was a reason this feature was not added before, please let me know. I also acknowledge that the code I added may not be clean, so if you have any optimization suggestions, I would be happy to implement them.

Please review it at your convenience. Thank you.

qa = initial_corpus.sample(
    random_single_hop,
    n=len(initial_corpus.data),
    random_state=random.randint(1, 100),
).map(
    lambda df: df.reset_index(drop=True),
).make_retrieval_gt_contents().batch_apply(
    multiple_queries_gen,  # query generation
    llm=llm,
    lang="ko",
    n=3,
).batch_apply(
    make_basic_gen_gt,  # answer generation (basic)
    llm=llm,
    lang="ko",
)....
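
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of what generating n queries per passage could look like. It is not the code in this PR: `build_prompt`, `generate_queries`, `fake_llm`, and the `passage` / `queries` keys are hypothetical names, and the one-query-per-line response format is an assumption made purely for illustration.

```python
import re
from typing import Callable, Dict, List

# Matches a leading list marker such as "1.", "2)", "-" or "*".
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def build_prompt(passage: str, n: int) -> str:
    """Ask for n questions, one per line (assumed response format)."""
    return (
        f"Write {n} different questions that the passage below answers, "
        f"one per line.\n\nPassage:\n{passage}"
    )


def generate_queries(row: Dict, llm: Callable[[str], str], n: int = 3) -> Dict:
    """Return a copy of row with a 'queries' list of at most n entries."""
    if n < 1:
        raise ValueError("n must be at least 1")
    raw = llm(build_prompt(row["passage"], n))
    queries: List[str] = []
    for line in raw.splitlines():
        cleaned = LIST_MARKER.sub("", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    # A model may return more lines than requested, so truncate to n.
    return {**row, "queries": queries[:n]}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned numbered list."""
    return (
        "1. What is AutoRAG?\n"
        "2. Who maintains it?\n"
        "\n"
        "3. What does it evaluate?\n"
        "4. Extra question"
    )


result = generate_queries({"passage": "AutoRAG is ..."}, fake_llm, n=3)
```

A single call that asks for n queries keeps the cost at one LLM request per passage, but it relies on the model following the line-per-query format, which a real implementation would need to check.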

sample
before: [image]

after: [image]

etc

  • add .vscode/ to .gitignore

@vkehfdl1
Contributor

@e7217 Hi!

Thanks for the contribution.

I like the idea, but if we implement this feature, I want it to be usable in most of the query gen functions.
I think it would be fine to add this feature to llama_index_generate_base directly!
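
One way to read this suggestion: if a shared helper accepts n with a default of 1, every query generator built on top of it gains the option without changing existing callers. The sketch below is only an assumption about the shape of such a change, not AutoRAG's actual code; `generate_base`, `factoid_query_gen`, `concept_query_gen`, and `CountingLLM` are stand-ins invented for illustration, and the real llama_index_generate_base may have a different signature.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]


def generate_base(row: Dict, llm: LLM, system_prompt: str, n: int = 1) -> Dict:
    """Hypothetical shared helper: call the LLM n times and collect the results."""
    if n < 1:
        raise ValueError("n must be at least 1")
    prompt = f"{system_prompt}\n\n{row['passage']}"
    queries = [llm(prompt) for _ in range(n)]
    # Keep the single 'query' field so callers that expect one query still work.
    return {**row, "query": queries[0], "queries": queries}


def factoid_query_gen(row: Dict, llm: LLM, n: int = 1) -> Dict:
    """Thin wrapper: only the prompt differs, n is passed through."""
    return generate_base(row, llm, "Write a factoid question about:", n=n)


def concept_query_gen(row: Dict, llm: LLM, n: int = 1) -> Dict:
    """Another wrapper that inherits the n option from the shared helper."""
    return generate_base(row, llm, "Write a concept question about:", n=n)


class CountingLLM:
    """Stand-in model that numbers its responses so calls can be traced."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"query #{self.calls}"


llm = CountingLLM()
row = {"passage": "AutoRAG is ..."}
single = factoid_query_gen(row, llm)      # default n=1, one LLM call
multi = concept_query_gen(row, llm, n=3)  # three more LLM calls
```

Repeating the same prompt n times is the simplest approach, but with a real model at low temperature it may return near-duplicate queries, so a production version would likely need deduplication or a prompt that asks for distinct questions.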

@e7217
Contributor Author

e7217 commented Nov 27, 2024

@vkehfdl1

Thank you for checking.

I agree that improving it to be usable in most query functions is a good idea. However, at this stage, I wanted to take a more conservative approach to minimize the impact on the existing code you've already implemented. Would it be okay to do the additional work in the next phase?

@vkehfdl1
Contributor

@e7217
Okay, I agree. I will merge this after some adjustments (for the linter and formatting).

Just do not close the issue in this PR.
Thank you!

@vkehfdl1 vkehfdl1 self-requested a review November 27, 2024 14:14
Contributor

@vkehfdl1 vkehfdl1 left a comment

Great :)
Thanks for waiting for my review.

@vkehfdl1 vkehfdl1 enabled auto-merge (squash) November 29, 2024 12:43
@vkehfdl1 vkehfdl1 merged commit 7740a82 into Marker-Inc-Korea:main Nov 29, 2024
1 check passed
@e7217
Contributor Author

e7217 commented Nov 29, 2024

close #1006
