MRR@10=0.35 not achieved on fine-tuning monoBERT task #200

Open

d1shs0ap opened this issue Dec 30, 2021 · 13 comments
@d1shs0ap

d1shs0ap commented Dec 30, 2021

This is the task I replicated: https://github.com/capreolus-ir/capreolus/blob/feature/msmarco_psg/docs/reproduction/MS_MARCO.md, by following docs/reproduction/sample_slurm_script.sh.

Findings

"Mini" version

  • The task did not finish within the recommended time limit using the recommended compute settings, i.e., the following configs:

[Screenshot: config settings (Screen Shot 2022-01-02 at 9 16 01 PM)]

  • After trying these configs (entire node), MRR@10=0.283 was achieved, slightly below the 0.295 given in the docs (finished in 21h).

[Screenshot: config settings (Screen Shot 2022-01-02 at 9 16 37 PM)]

"Full" version

  • MRR@10=0.346 was achieved, as opposed to the expected MRR@10=0.35+, with the following configs (entire node, finished in 42h):

[Screenshot: config settings (Screen Shot 2022-01-02 at 9 17 19 PM)]

@crystina-z
Collaborator

Hi @d1shs0ap, thanks for helping to replicate. The links to the two config files seem to be broken; would you mind pasting them into the issue? Also, which commit are you using? Thanks!

@d1shs0ap
Author

d1shs0ap commented Jan 3, 2022

Hey @crystina-z, I've updated the config screenshots. The commit that I ran the experiments on is e10928f. Thank you!

@d1shs0ap
Author

d1shs0ap commented Jan 8, 2022

Hi @crystina-z, any updates on this issue?

@crystina-z
Collaborator

Hi @d1shs0ap, sorry for the wait. It took me a while to realize that the config file is missing one line that specifies the decay rate: appending reranker.trainer.decay=0.1 to the end of the config should give MRR@10 of 0.35+. I'll update it in the next PR. Let me know if the issue is still there after adding this.
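A minimal sketch of the same change as a command-line override, in case that's easier than editing the config file (this mirrors the rerank.train invocation from docs/reproduction/sample_slurm_script.sh, trimmed to the relevant part; passing the option after file= should override the value from the config file):

python -m capreolus.run rerank.train with \
	file=docs/reproduction/config_msmarco.txt \
	reranker.trainer.decay=0.1 \
	fold=s1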

Thanks again for pointing this issue out!

crystina-z added a commit to nimasadri11/capreolus that referenced this issue Jan 9, 2022
@d1shs0ap
Author

Ok great, thanks! I'll test it out now.

@d1shs0ap
Author

d1shs0ap commented Jan 16, 2022

Hey @crystina-z, the experiment just finished; here are the results:

  • Mini version: MRR@10=0.293
  • Full version: MRR@10=0.347
  • Commit: e9cf9a6

Should I run the experiment again, with the latest commits?

@crystina-z
Collaborator

crystina-z commented Jan 16, 2022

Hi @d1shs0ap, that would be nice. Before that, though, could you share the config file and the command you used to run the scripts, just in case I missed anything there?

@d1shs0ap
Author

d1shs0ap commented Jan 17, 2022

@crystina-z Here's the config file:

optimize=MRR@10
threshold=100
testthreshold=1

benchmark.name=msmarcopsg
rank.searcher.name=msmarcopsgbm25

reranker.name=TFBERTMaxP
reranker.pretrained=bert-base-uncased

reranker.extractor.usecache=True
reranker.extractor.numpassages=1
reranker.extractor.maxseqlen=512
reranker.extractor.maxqlen=50
reranker.extractor.tokenizer.pretrained=bert-base-uncased

reranker.trainer.usecache=True
reranker.trainer.niters=1
reranker.trainer.batch=4
reranker.trainer.evalbatch=256
reranker.trainer.itersize=48000
reranker.trainer.warmupiters=1
reranker.trainer.decay=0.1
reranker.trainer.decayiters=1
reranker.trainer.decaytype=linear

reranker.trainer.loss=pairwise_hinge_loss

I first ran

ENVDIR=$HOME/venv/capreolus-env
source $ENVDIR/bin/activate
module load java/11
module load python/3.7
module load scipy-stack

in the terminal, then ran sbatch docs/reproduction/sample_slurm_script.sh, which contains the following:

#!/bin/bash
#SBATCH --job-name=msmarcopsg
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100l:4
#SBATCH --ntasks-per-node=1
#SBATCH --mem=0
#SBATCH --time=48:00:00
#SBATCH --account=$SLURM_ACCOUNT
#SBATCH --cpus-per-task=32

#SBATCH -o ./msmarco-psg-output.log

niters=10
batch_size=16
validatefreq=$niters # to ensure the validation is run only at the end of training
decayiters=$niters   # either same with $itersize or 0
threshold=1000       # the top-k documents to rerank

python -m capreolus.run rerank.train with \
	file=docs/reproduction/config_msmarco.txt  \
	threshold=$threshold \
	reranker.trainer.niters=$niters \
	reranker.trainer.batch=$batch_size \
	reranker.trainer.decayiters=$decayiters \
	reranker.trainer.validatefreq=$validatefreq \
	fold=s1

I should also mention that this was run on the forked repository nimasadri11/capreolus. Thanks!

@d1shs0ap
Author

d1shs0ap commented Jan 25, 2022

@crystina-z Retrained with the latest changes and got MRR@10=0.351. However, I ran this experiment on the nimasadri11 fork. Should I open a pull request on that fork? (Currently waiting for the experiment results on the original repo.)

@crystina-z
Collaborator

@d1shs0ap thanks for the update! Yeah, for this issue let's wait for the result on the original repo for now. Feel free to add another PR to Nima's fork as well. Thanks!

@d1shs0ap
Author

@crystina-z The latest MRR I got after running on the original repo is 0.3496; is that good enough? Here's the output:

2022-01-26 13:41:35,370 - INFO - capreolus.trainer.tensorflow.train - dev metrics: MRR@10=0.350 P_1=0.230 P_10=0.064 P_20=0.036 P_5=0.105 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.354 ndcg_cut_10=0.410 ndcg_cut_20=0.431 ndcg_cut_5=0.375 recall_100=0.814 recall_1000=0.853 recip_rank=0.359
2022-01-26 13:41:35,399 - INFO - capreolus.trainer.tensorflow.train - new best dev metric: 0.3496

@crystina-z
Collaborator

@d1shs0ap the score still looks a bit low to me, though. Maybe let's PR the record to Nima's branch and I'll check the score here.

Could you please share your versions of transformers and all TensorFlow-related packages? Thanks so much!

@d1shs0ap
Author

d1shs0ap commented Jan 27, 2022

@crystina-z Hey, I made the PR to Nima's branch; below are my package versions:

tensorboard==2.7.0
tensorboard-data-server==0.6.1+computecanada
tensorboard-plugin-wit==1.8.0+computecanada
tensorflow==2.4.1+computecanada
tensorflow-addons==0.13.0+computecanada
tensorflow-datasets==4.4.0
tensorflow-estimator==2.4.0+computecanada
tensorflow-hub==0.12.0+computecanada
tensorflow-io-gcs-filesystem==0.22.0+computecanada
tensorflow-metadata==1.5.0
tensorflow-model-optimization==0.7.0
tensorflow-ranking==0.4.2
tensorflow-serving-api==2.7.0
tf-models-official==2.5.0
tf-slim==1.1.0

and

transformers==4.6.0
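In case it helps with comparing environments, something like the following (run inside the same virtualenv; the grep pattern is only an illustration) should list the same packages:

pip freeze | grep -iE "tensorflow|tensorboard|tf-|transformers"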

Thanks!
