Update agent arena frontend and evals #666

Merged
merged 7 commits into from
Oct 2, 2024
99 changes: 60 additions & 39 deletions agent-arena/README.md
@@ -1,74 +1,95 @@
# Agent Arena

# Agent Arena Frontend
**Agent Arena** is a platform for comparing and evaluating language model agents across different models, frameworks, and tools. It provides an interface for head-to-head comparisons and a leaderboard that ranks agents using user votes and an Elo rating system.

This is the frontend of the [Agent Arena](https://www.agent-arena.com/), a platform where users can compare and evaluate various language model agents. The frontend is built using React and provides an interface for interacting with agents, creating comparisons, and viewing results.
## Frontend

## Contributing to Agent Arena
The frontend of the Agent Arena is built using **React**. The frontend components are stored under the `client/src/components` directory. You can modify or enhance the UI by editing these files.

If you'd like to contribute changes to the Agent Arena frontend, you can do so by creating a Pull Request (PR) in the Gorilla repository. Follow these steps:
To get started with development on the frontend:

1. **Fork the Gorilla Repository**: Start by forking the [Gorilla repository](https://github.com/ShishirPatil/gorilla) to your GitHub account.
1. Navigate to the `client` folder.

2. **Clone Your Fork**: Clone the forked repository to your local machine.
```bash
cd client
```

2. Install the dependencies:

```bash
git clone https://github.com/<your-username>/gorilla.git
npm install
```

3. **Create a New Branch**: Create a new branch for your changes.
3. Start the development server:

```bash
git checkout -b your-branch-name
npm start
```

4. **Make Your Changes**: Navigate to the `agent-arena/client` folder and make your changes to the frontend.
The app will run in development mode, and you can view it at [http://localhost:3000](http://localhost:3000).

5. **Test Your Changes**: Make sure to thoroughly test your changes locally before pushing them.

6. **Commit and Push**: Commit your changes and push them to your forked repository.
## Evaluation Directory

```bash
git add .
git commit -m "Description of your changes"
git push origin your-branch-name
```
Agent Arena includes an `evaluation` directory where we have released the v0 dataset of real agent battles. This directory includes:

7. **Create a Pull Request**: Go to the original Gorilla repository and create a Pull Request (PR) from your fork. Provide a detailed description of the changes you've made.
- **Notebook**: A Jupyter notebook (`Agent_Arena_Elo_Rating.ipynb`) that outlines the evaluation process for agents using Elo ratings.
- **Data**: Several JSON files that store the agent, tool, framework, and model ratings.

## Getting Started with Create React App
To view the dataset and run the evaluation notebook, navigate to the `evaluation` directory:

This project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app).
1. Open the notebook using Jupyter or any other notebook editor.

### Available Scripts
2. You can also find the ratings for agents, models, and tools in the respective JSON files in the `evaluation` directory:
- `agent_ratings_V0.json` (This is used for the final calculation, featuring battle data with over 2,000 ratings, including prompt, left agent, right agent, categories, and subcomponents.)
- `toolratings_V0.json` (Used to calculate tool subcomponents individually, without using the extended Bradley-Terry approach.)
- `modelratings_V0.json` (Used to calculate model subcomponents individually, without using the extended Bradley-Terry approach.)
- `frameworkratings_V0.json` (Used to calculate framework subcomponents individually, without using the extended Bradley-Terry approach.)

In the project directory, you can run:

#### `npm start`
## Elo Ratings and Evaluation

Runs the app in the development mode.\
Open [http://localhost:3000/](http://localhost:3000/) to view it in your browser.
The evaluation combines the **Bradley-Terry model** with **subcomponent ratings**. The Bradley-Terry model compares agents in head-to-head battles, while the subcomponent ratings evaluate the individual models, tools, and frameworks.
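
As a rough illustration of how a Bradley-Terry fit turns head-to-head results into ratings, here is a minimal sketch. It is not the code from the evaluation notebook: the function names `fit_bradley_terry` and `win_probability` and the `(winner, loser)` input format are assumptions made for this example, it fits the plain Bradley-Terry model rather than the extended variant mentioned above, and it ignores ties.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Fit plain Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration only. Uses the standard
    minorization-maximization update and normalizes strengths to sum to 1.
    An agent with no wins ends up with strength 0.
    """
    wins = defaultdict(float)         # total wins per agent
    pair_counts = defaultdict(float)  # battles played per unordered pair
    agents = set()
    for winner, loser in battles:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        agents.update((winner, loser))

    strength = {agent: 1.0 for agent in agents}
    for _ in range(iterations):
        updated = {}
        for a in agents:
            denom = 0.0
            for b in agents:
                if a == b:
                    continue
                n = pair_counts.get(frozenset((a, b)), 0.0)
                if n:
                    denom += n / (strength[a] + strength[b])
            updated[a] = wins[a] / denom if denom else strength[a]
        total = sum(updated.values())
        strength = {agent: s / total for agent, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Bradley-Terry probability that agent a beats agent b."""
    return strength[a] / (strength[a] + strength[b])


# Made-up battle results, purely for demonstration.
example_battles = (
    [("agent_a", "agent_b")] * 3 + [("agent_b", "agent_a")]
    + [("agent_b", "agent_c")] * 3 + [("agent_c", "agent_b")]
    + [("agent_a", "agent_c")] * 3 + [("agent_c", "agent_a")]
)
example_strength = fit_bradley_terry(example_battles)
```

Sorting `example_strength` by value gives a ranking, and `win_probability(example_strength, "agent_a", "agent_c")` estimates how often the first agent would win a battle against the second.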

The page will reload when you make changes.\
You may also see any lint errors in the console.
We have also released a **leaderboard** where you can view the current standings of agents. To access the leaderboard, visit:

#### `npm test`
[Agent Arena Leaderboard](https://www.agent-arena.com/leaderboard)

Launches the test runner in the interactive watch mode.\
See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.
### Instructions to Run

#### `npm run build`
1. Ensure you have Jupyter installed in your environment.
2. Navigate to the `evaluation` directory.
3. Open `Agent_Arena_Elo_Rating.ipynb` in Jupyter and run its cells.

Builds the app for production to the `build` folder.\
It correctly bundles React in production mode and optimizes the build for the best performance.
Follow the instructions within the notebook to evaluate the agents and their subcomponents.

The build is minified and the filenames include the hashes.\
Your app is ready to be deployed!
## Contributing

See the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information.
If you'd like to contribute changes to the Agent Arena, you can do so by creating a Pull Request (PR) in the Gorilla repository. Follow these steps:

#### `npm run eject`
1. Fork the [Gorilla repository](https://github.com/ShishirPatil/gorilla) to your GitHub account.
2. Clone the forked repository to your local machine.
```bash
git clone https://github.com/<your-username>/gorilla.git
```
3. Create a new branch for your changes.
```bash
git checkout -b your-branch-name
```
4. Make your changes in the `client/src/components` or other relevant directories.
5. Test your changes thoroughly.
6. Commit your changes and push them to your forked repository.
```bash
git add .
git commit -m "Description of your changes"
git push origin your-branch-name
```
7. Go to the original Gorilla repository and create a Pull Request from your fork.

**Note: this is a one-way operation. Once you `eject`, you can't go back!**
We welcome contributions and look forward to seeing your innovative ideas in action!

If you aren't satisfied with the build tool and configuration choices, you can `eject` at any time. This command will remove the single build dependency from your project.
## Links

- **Arena**: [Agent-Arena](https://www.agent-arena.com/)
- **Leaderboard**: [Agent Leaderboard](https://www.agent-arena.com/leaderboard)
- **Prompt Hub**: [Prompt Hub](https://www.agent-arena.com/users)
1 change: 1 addition & 0 deletions agent-arena/client/.gitignore
@@ -23,3 +23,4 @@ yarn-debug.log*
yarn-error.log*
.DS_Store

.env