Merging ReQue and RePair #42
Hello @hosseinfani, I successfully integrated all the refiners and merged the ReQue project with the RePair project. With this, I introduced the new refiner structure (a sketch is included below).

Currently, it is distinct from T5, but my future plan is to include T5 as a refiner alongside the others. In the refiner package we now have the AbstractQRefiner class, which generates the original query. I observed that in the main code we treat the original query separately from the generated refined queries; I propose treating AbstractQRefiner as a refiner and calling it along with the others. Moreover, I made the semsim (semantic similarity) score mandatory for all refiners; it has been relocated from Backtranslation to the AbstractQRefiner class, so after generating q', semsim is calculated and stored.

After incorporating the refiners, I added the Query class. In the Dataset class, I implemented a function that reads all the queries from the dataset's path, creates a Query object for each, and stores them in a list; the msmarco and aol child classes override this function according to their datasets. With this addition, RePair can now work with datasets such as robust04, gov2, and others that were part of ReQue.

The main pipeline structure has been modified according to the Query class. Although it can still be optimized, I anticipate it will eventually transition to using the Query class exclusively. I initially planned to include the search, eval, and other pipeline commands in the Query class as we discussed; however, I realized that keeping these functions in the Dataset class might be more practical for accessing all queries, running in batches, and other functionality. I am still deliberating on the most suitable architecture.

Tasks for the future:
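A minimal, hypothetical sketch of the structure described above: the class names (AbstractQRefiner, Query, Dataset) follow the comment, while the file format, the similarity stand-in, and all defaults are illustrative assumptions rather than the actual RePair code.

```python
# Hypothetical sketch only: class names follow the comment above, but the
# file format, similarity stand-in, and defaults are illustrative assumptions.
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Query:
    qid: str
    q: str                                   # original query text
    q_: dict = field(default_factory=dict)   # refiner name -> (refined query, semsim)


class AbstractQRefiner:
    """Base refiner: yields the original query and scores every q' with semsim."""

    def get_refined_query(self, query: Query) -> str:
        return query.q  # the 'original query' refiner is just the identity

    def semsim(self, q: str, q_: str) -> float:
        # Stand-in similarity; the real semsim is presumably embedding-based.
        return SequenceMatcher(None, q, q_).ratio()

    def __call__(self, query: Query) -> None:
        q_ = self.get_refined_query(query)
        query.q_[self.__class__.__name__.lower()] = (q_, self.semsim(query.q, q_))


class Dataset:
    def __init__(self, path: str):
        self.path = path
        self.queries: list[Query] = []

    def read_queries(self) -> list[Query]:
        # Default reader assumes one 'qid<TAB>query' per line; child classes
        # such as msmarco and aol would override this for their own formats.
        with open(self.path, encoding='utf-8') as f:
            for line in f:
                qid, q = line.rstrip('\n').split('\t', 1)
                self.queries.append(Query(qid=qid, q=q))
        return self.queries
```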
@DelaramRajaei Awesome! Thanks.
Hey @hosseinfani, I wanted to provide you with a project update. Currently, the pipeline is operational, although I'm addressing some minor bugs related to reading different datasets. I've initiated backtranslation on two datasets, robust04 and dbpedia, across 10 languages. Below are the logs.
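As a rough sketch of what one backtranslation round-trip can look like with off-the-shelf MarianMT models from Hugging Face; the actual Backtranslation refiner's models, pivot languages, and decoding settings may differ.

```python
# Hypothetical en -> de -> en backtranslation round-trip with Hugging Face
# MarianMT models; the actual Backtranslation refiner may use different
# models, languages, or decoding settings.
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    return tok.batch_decode(model.generate(**batch), skip_special_tokens=True)

query = ["international organized crime"]                 # example query
pivot = translate(query, "Helsinki-NLP/opus-mt-en-de")    # en -> de
back = translate(pivot, "Helsinki-NLP/opus-mt-de-en")     # de -> en
print(back[0])                                            # backtranslated query q'
```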
Hey @hosseinfani, I wanted to give you an update on the project. The merger of ReQue and RePair is now complete. I have executed backtranslation for all five datasets, employing two IR rankers (BM25, QLD) and two evaluation metrics (MAP, MRR).

I encountered challenges in loading different datasets, particularly clueweb09b and gov2, whose queries are split across multiple trec (topic) files. Currently, the code reads all the files at once, but I plan to modify it to run each trec file separately and aggregate the results, following the approach used in the ReQue project.

Presently, the project is running all expanders for gov2 across the various IR rankers and evaluation metrics. The log of the ongoing run is provided; the log file contains records for the Backtranslation, Conceptnet, Thesaurus, Wordnet, and Tagme refiners. I have also updated the RePair_StoryBoard in the Query Refinement channel on Teams.

In parallel, I am working on the Query class and RAG fusion, though there hasn't been significant progress in those areas yet; I am making sure the expanders run flawlessly and addressing other bugs.

Additionally, a minor change has been made to the output structure: after creating a folder for each dataset, the pipeline stores the refined data there and then stores the results of the ranker and metric in a new folder within the dataset folder. Below is an overview of the file storage:
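As a purely hypothetical illustration of the layout described above (the actual folder and file names in the repo may differ), the paths could be composed along these lines:

```python
# Purely hypothetical path construction for the layout described above;
# the actual folder and file names in RePair may differ.
import os

dataset, ranker, metric = 'robust04', 'bm25', 'map'             # example values
dataset_dir = os.path.join('output', dataset)                   # one folder per dataset
refined_path = os.path.join(dataset_dir, 'refined_queries')     # refined data
results_dir = os.path.join(dataset_dir, f'{ranker}.{metric}')   # ranker/metric results
print(dataset_dir, refined_path, results_dir)
```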
Hi @DelaramRajaei
Hello @hosseinfani, I am currently facing issues with the RelevanceFeedback refiner. All other refiners are functioning well and are producing results for the gov2 dataset. While fixing things, I also encountered a minor issue elsewhere in the code. Currently, my focus is on resolving the RelevanceFeedback issue.
Hello @hosseinfani, I looked into a few more solutions to address the problem with the RelevanceFeedback refiner, but unfortunately, I couldn't find a successful fix. As a temporary measure, I'll stick to using only Anserini for this refiner until I come across a better solution. Here's the code snippet that utilizes Anserini:
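A minimal sketch of this kind of Anserini-based pseudo-relevance feedback via Pyserini (Anserini's Python interface); the index path, cut-offs, and top-TF term selection below are assumptions for illustration, not the actual RePair snippet.

```python
# Hypothetical pseudo-relevance feedback via Pyserini (Anserini's Python API);
# the index path, cut-offs, and top-TF term selection are illustrative only.
from collections import Counter
from pyserini.search.lucene import LuceneSearcher
from pyserini.index.lucene import IndexReader

index_dir = '/path/to/lucene-index'      # assumed prebuilt sparse index
searcher = LuceneSearcher(index_dir)
reader = IndexReader(index_dir)

def relevance_feedback(query: str, topk: int = 10, n_terms: int = 5) -> str:
    hits = searcher.search(query, k=topk)
    tf = Counter()
    for hit in hits:
        # analyzed term -> frequency for each top-ranked document
        tf.update(reader.get_document_vector(hit.docid) or {})
    expansion = [t for t, _ in tf.most_common(n_terms)]
    return f"{query} {' '.join(expansion)}"

print(relevance_feedback('international organized crime'))
```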
Meantime, I discovered some resources that might be useful in resolving the issue: Anserini, Pyserini, and Keywords Extraction Using TF-IDF Method. I also came across a tool called Spacerini (link), which combines features from Pyserini and the Hugging Face ecosystem. It provides a simple and user-friendly way for researchers to explore and analyze large text datasets through interactive search applications. I'm not certain we'll use it, but it could be helpful down the line.
@DelaramRajaei for code references, you can paste the permanent link to the code line on GitHub, like this:
Hello @hosseinfani, I've fixed the issues with RM3 and BertQ. Here's a brief overview of the changes:
RM3:
BertQ:
Both refiners are now working, and I've stored their results. While two other refiners (adoptonfields and onfields) are still pending, my focus is currently on implementing RAG fusion and creating dense indexes to compare results with the existing refiners.
Other helpful links:
Hello @hosseinfani, I've stored the outcomes of the refinement process applied to RePair across all five datasets (robust04, gov2, antique, dbpedia, clueweb09b) for which sparse indices were available. Additionally, I've updated RePair's storyboard on Teams. There have been changes to the pipeline, with the addition of more commands:

Several other minor updates have also been made to the RePair project:

Currently, my focus is on working on RAG fusion.
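A common core of RAG fusion is reciprocal rank fusion (RRF) over the ranked lists retrieved for the original and refined queries. Below is a minimal sketch under that assumption (plain docid lists, the usual k=60 constant); it is not the eventual RePair implementation.

```python
# Hypothetical reciprocal rank fusion (RRF): fuse the ranked lists retrieved
# for the original query and its refinements. The k=60 constant and the input
# format are assumptions, not RePair's implementation.
from collections import defaultdict

def rrf(rankings, k=60):
    """rankings: one ranked list of docids per (original or refined) query."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, docid in enumerate(ranking, start=1):
            scores[docid] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g., lists retrieved for the original query and two refined variants
print(rrf([['d1', 'd2', 'd3'], ['d2', 'd4'], ['d3', 'd2', 'd5']]))
```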
This is the issue where I log my progress while adding ReQue's expanders to RePair.