Hi!

This is very interesting work, and I'm trying to reproduce the results in your paper. However, when I run the code from the README with the default config, namely images_all_exemplars.py, I get some unexpected results.
When I run

$ python -m emergent_in_context_learning.experiment.experiment --config $PATH_TO_CONFIG --logtostderr --config.one_off_evaluate --config.restore_path $CKPT_DIR --jaxline_mode eval_fewshot_holdout

I get the following output:
I1117 09:06:41.539124 140185420793280 data_generators.py:241] Zipf exponent: 0
I1117 09:06:41.539163 140185420793280 data_generators.py:242] Use Zipf for common/rare: False
I1117 09:06:41.539418 140185420793280 data_generators.py:243] Noise scale: 0
I1117 09:07:17.796816 140185420793280 data_generators.py:241] Zipf exponent: 0
I1117 09:07:17.796856 140185420793280 data_generators.py:242] Use Zipf for common/rare: False
I1117 09:07:17.796890 140185420793280 data_generators.py:243] Noise scale: 0
I1117 09:07:17.933558 140185420793280 utils.py:590] Returned checkpoint latest with id 0.
I1117 09:08:14.875151 140185420793280 experiment.py:552] [Step 500000] eval_loss=6.79, eval_accuracy=0.27
Thanks for your question! The eval metrics reported in the paper are accuracy_query for in-weights learning (evaluated only on the query prediction) and accuracy_closed for in-context learning (i.e. computed only over the labels actually observed in context; see Section 2.3 in the paper). What were the numbers for eval_fewshot_holdout?
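Roughly speaking, accuracy_closed restricts the argmax to classes that appeared in the context. A minimal sketch of the idea (the function name and shapes here are illustrative, not the exact code in experiment.py):

```python
import jax
import jax.numpy as jnp

def closed_class_accuracy(query_logits, query_labels, context_labels):
  """Accuracy with predictions restricted to labels seen in the context.

  query_logits: [batch, n_classes] logits at the query position.
  query_labels: [batch] integer labels for the query.
  context_labels: [batch, context_len] integer labels shown in the context.
  """
  n_classes = query_logits.shape[-1]
  # [batch, n_classes] mask: True where the class appears in the context.
  in_context = jnp.any(jax.nn.one_hot(context_labels, n_classes) > 0, axis=1)
  # Exclude classes never observed in the context from the argmax.
  masked_logits = jnp.where(in_context, query_logits, -jnp.inf)
  predictions = jnp.argmax(masked_logits, axis=-1)
  return jnp.mean(predictions == query_labels)
```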
Unfortunately, the transformer model that is available here is not exactly the one we run internally, because we have some internal dependencies that I wasn't able to open-source. This external implementation has not been fully tested and may unfortunately introduce some discrepancies. My recommendation is to incorporate the data generators into your favorite transformer train/eval framework, e.g. along the lines of the sketch below. Hope that helps!
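A rough sketch of that kind of integration, wrapping a sequence generator in a tf.data pipeline. The make_sequences stand-in is hypothetical (as are the shapes, which assume 105x105 greyscale Omniglot images and 8 context pairs plus a query); substitute the actual generator from datasets/data_generators.py, whose interface may differ:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the repo's sequence generator; replace with the
# real generator from emergent_in_context_learning/datasets/data_generators.py.
def make_sequences():
  while True:
    yield {
        # 8 context (image, label) pairs plus 1 query image and label.
        'examples': np.zeros([9, 105, 105, 1], dtype=np.float32),
        'labels': np.zeros([9], dtype=np.int32),
    }

dataset = tf.data.Dataset.from_generator(
    make_sequences,
    output_signature={
        'examples': tf.TensorSpec([9, 105, 105, 1], tf.float32),
        'labels': tf.TensorSpec([9], tf.int32),
    },
).batch(32).prefetch(tf.data.AUTOTUNE)

# Batches from `dataset` can then feed any transformer train/eval loop.
```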
@yfzhang114 not sure if you've figured this out. I was trying to reproduce this work on my end as well. In short, I think what happens is that the batch norm statistics used in the ResNet embedder are not synchronized across multiple devices. If we do synchronize them, the results are very different.
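For reference, a minimal Haiku sketch of what I mean by synchronizing; the toy conv block, axis name, and network are illustrative, not the repo's actual ResNet:

```python
import functools

import haiku as hk
import jax
import jax.numpy as jnp

def embed_fn(images, is_training):
  """Toy conv + batch norm block; stands in for the ResNet embedder."""
  h = hk.Conv2D(output_channels=64, kernel_shape=3)(images)
  # cross_replica_axis tells BatchNorm to average its batch statistics
  # across every device on the named pmap axis ("sync" batch norm).
  h = hk.BatchNorm(
      create_scale=True, create_offset=True, decay_rate=0.9,
      cross_replica_axis='devices')(h, is_training=is_training)
  return jax.nn.relu(h)

network = hk.transform_with_state(embed_fn)

# Both init and apply must run under pmap so the 'devices' axis is bound:
# cross-replica BatchNorm computes a pmean over that named axis.
@functools.partial(jax.pmap, axis_name='devices')
def init(rng, images):
  return network.init(rng, images, is_training=True)

@functools.partial(jax.pmap, axis_name='devices')
def forward(params, state, images):
  return network.apply(params, state, None, images, is_training=True)
```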
Below is an experiment where I set P(bursty) = 1, with "async" meaning the former (unsynchronized) and "sync" the latter.
For in-weights learning, I run

$ python -m emergent_in_context_learning.experiment.experiment --config $PATH_TO_CONFIG --logtostderr --config.one_off_evaluate --config.restore_path $CKPT_DIR --jaxline_mode eval_no_support_zipfian

and the in-weights accuracy is 0.63, which is higher than the in-context learning result above.