* downgrading ubuntu version for github tests (ServiceNow#62)
* LLM API update (ServiceNow#59)
* getting rid of .invoke()
* adding an AbstractChatModel
* changing chat_api structure
* Reproducibility again (ServiceNow#61)
* core functions
* switch to dask
* removing joblib dependency and adding dask
* fixing imports
* handles multiple backends
* ensure asyncio loop creation
* more tests
* setting dashboard address to None
* minor
* Finally found a way to make it work
* initial reproducibility files
* Seems to be superfluous
* adding a reproducibility journal
* minor update
* more robust
* adding reproducibility tools
* fix white listing
* minor
* minor
* minor
* minor
* minor fix
* more tests
* more results yay
* disabling this test
* update
* update
* black
* maybe fixing github workflow?
* make get_git_username great again
* trigger change
* new browsergym
* GPT-4o result (and new comment column)
* Seems like there was a change to 4o flags, trying these
* minor comment
* better xray
* minor fix
* adding a comment field
* new agent
* another test with GPT-4o
* adding llama3 from openrouter
* fix naming
* unused import
* new summary tools and remove "_args" from columns in results
* add Llama
* initial code for reproducibility agent
* adjust inspect results
* infer from benchmark
* fix reproducibility agent
* prevent repro_dir from being an index variable
* updating repro agent stats
* Reproducibility agent
* instructions to setup workarena
* fixing tests
* better handling of a few edge cases
* default progress function to None
* minor formatting
* minor
* initial commit
* refactoring with Study class
* refactor to adapt for study class
* minor
* fix pricy test
* fixing tests
* tmp
* print report
* minor fix
* refine little details about reproducibility
* minor
* no need for set_temp anymore
* sanity check before running main
* minor update
* minor
* new results with 4o on workarena.l1
* sharing is caring
* add llama to main.py
* new journal entry
* Llama 3 70B
* minor
* typo
* black fix (wasn't configured)
---------
Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>
* version bump
---------
Co-authored-by: Alexandre Lacoste <[email protected]>