# Streamlit + W&B for LLM Annotation

Try W&B Tables and the Streamlit data editor to annotate data for LLMs.

This repo builds a simple app for labeling and annotating tables. These tables can hold sample text outputs from LLMs under development. For example, you can run a set of sample prompts, collect the LLM responses, and then save those results for annotation in this workflow.

Here's a quick overview:
1. Set up a virtual environment and install dependencies
2. Run the Streamlit app locally and customize the UI
3. Annotate and version tables of data using W&B + Streamlit

## 1. Set Up Environment
Create and activate a virtual environment, then install all dependencies from the requirements file. This simple app uses only `pandas` and `streamlit`.
```shell
pip install -r requirements.txt
```

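You may want to confirm that the environment matches the pins before you launch the app. A small standard-library check can compare installed versions against `requirements.txt`. This is a sketch and is not part of the repo. `parse_pins` and `mismatched_pins` are hypothetical helpers, and they only understand plain `name==version` lines like the two pins this app uses.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every plain 'name==version' line (hypothetical helper)."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def mismatched_pins(pins):
    """Return {package: installed version, or None if missing} for unsatisfied pins."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = installed
    return problems
```

Run it from the project root. `mismatched_pins(parse_pins(Path("requirements.txt").read_text()))` (with `from pathlib import Path`) returns an empty dict when every pin is satisfied.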
## 2. Run the Streamlit App
Open `wandb_streamlit_app.py`, review the app definition, and make edits.
The column configuration uses the options described [in the Streamlit docs](https://docs.streamlit.io/library/api-reference/data/st.column_config).

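The app repeats the same 1 to 5 `SelectboxColumn` block for each of its four score columns (relevance, coherence, consistency, fluency). When you edit that configuration, you could generate those columns from a single helper instead. The sketch below assumes this approach. `SCORE_OPTIONS` and `score_columns` are hypothetical names, and Streamlit is imported inside the function only so the helper can be defined where Streamlit is not installed.

```python
SCORE_OPTIONS = [str(score) for score in range(1, 6)]  # "1" through "5", as in the app


def score_columns(metrics):
    """Return a column_config dict with one 1-5 SelectboxColumn per metric name (hypothetical helper)."""
    import streamlit as st  # lazy import; the app itself imports streamlit at the top

    return {
        metric: st.column_config.SelectboxColumn(
            f"{metric} (1-5)",
            help=f"Select a {metric} score from 1 (lowest) to 5 (highest)",
            width="medium",
            options=SCORE_OPTIONS,
        )
        for metric in metrics
    }
```

In the Approach 2 `st.data_editor` call, `column_config={**score_columns(["relevance", "coherence", "consistency", "fluency"]), ...}` would replace the four hand-written score entries and leave the other columns unchanged.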
### Optional: Apply a custom theme

Create a hidden `.streamlit` folder in the root of the project with the following command:
```shell
mkdir .streamlit
```

Copy the `config.toml` file into the hidden folder and edit it as desired. Streamlit applies that theme when you run the app. There are several ways to work with custom themes, but this is one of the simplest.
- [This video tutorial](https://www.youtube.com/watch?v=Mz12mlwzbVU) walks through creating and applying custom themes
- Details of the W&B color palette are available [here](https://congenial-broccoli-daa12ae2.pages.github.io/?path=%2Fstory%2Fcommon-colors--overview)

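You can also create the folder and file from Python instead of copying `config.toml` by hand. The sketch below writes a `[theme]` section. The key names (`base`, `primaryColor`, `font`) are standard Streamlit theme options. The values are assumptions: the teal comes from the link color in the app's custom CSS. `write_theme` is a hypothetical helper, and it overwrites any existing `config.toml`.

```python
from pathlib import Path

# Assumed theme values; replace them with your own palette (plain strings only, no quotes).
THEME = {
    "base": "light",
    "primaryColor": "#13A9BA",
    "font": "sans serif",
}


def write_theme(project_root, theme=THEME):
    """Create <project_root>/.streamlit/config.toml with a [theme] section (hypothetical helper)."""
    config_dir = Path(project_root) / ".streamlit"
    config_dir.mkdir(exist_ok=True)  # same effect as `mkdir .streamlit`, but safe to rerun
    lines = ["[theme]"] + [f'{key} = "{value}"' for key, value in theme.items()]
    config_path = config_dir / "config.toml"
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")  # overwrites existing file
    return config_path
```

Run `write_theme(".")` from the project root, then start the app as usual.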
Finally, run the Streamlit app on localhost:
```shell
streamlit run wandb_streamlit_app.py
```
From there, you can edit the columns that have been configured for labeling:
![Screenshot 2024-02-06 at 5 16 48 PM](https://github.com/wandb/annotation_streamlit/assets/14133187/0439eb5f-1d1a-4495-bad7-c57a94ce7563)

## 3. Annotate and Version Tables

Here is what the CSV of sample LLM inputs and outputs looks like:
<img width="1000" alt="sample table" src="https://github.com/wandb/annotation_streamlit/assets/6355078/bea1105e-cc65-4bbb-b899-e99bcb2220cb">

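The app's column configuration refers to specific CSV columns by name. Approach 1 uses `articles`, `bart_summaries`, and four word-count and lexical-diversity columns; Approach 2 uses `bart_samsum_summaries` in place of `bart_summaries`. Before you upload your own CSV, a quick header check can catch naming mismatches. This is a standard-library sketch. `missing_columns` is hypothetical, and the two column lists are copied from the app's `column_order` arguments.

```python
import csv
import io

# Input columns the app's editors reference (taken from column_order in the app).
APPROACH_1_COLUMNS = ["articles", "bart_summaries", "source_word_count",
                      "summary_word_count", "source_lexical_diversity", "summary_lexical_diversity"]
APPROACH_2_COLUMNS = ["articles", "bart_samsum_summaries", "source_word_count",
                      "summary_word_count", "source_lexical_diversity", "summary_lexical_diversity"]


def missing_columns(csv_text, expected):
    """Return the expected column names absent from the CSV header row (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])  # empty file -> empty header
    return [column for column in expected if column not in header]
```

For example, `missing_columns(Path("my_outputs.csv").read_text(encoding="utf-8"), APPROACH_1_COLUMNS)` returns `[]` when the file matches. The file name is a placeholder.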
Once the data is loaded as a W&B Table, you have a clean, annotated version of it, complete with metadata, in your system of record:
![Screenshot 2024-02-06 at 5 06 25 PM](https://github.com/wandb/annotation_streamlit/assets/14133187/aa404c53-b934-47d0-a29b-c7e2cbef27a1)

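This README does not show the code that loads the edited table into W&B. The sketch below shows one way to do it, assuming `wandb` is installed separately (it is not in `requirements.txt`) and you are logged in. It logs an edited dataframe such as `bart_edited_df` as a W&B Table and versions it as a dataset artifact. `log_annotations` and its defaults are hypothetical. `wandb.init`, `wandb.Table`, `wandb.Artifact`, and `log_artifact` are standard W&B SDK calls.

```python
def log_annotations(edited_df, project, name="annotations", mode="online"):
    """Log an edited dataframe as a W&B Table and a versioned dataset artifact (hypothetical helper)."""
    import wandb  # lazy import: wandb is an extra dependency for this sketch

    run = wandb.init(project=project, job_type="annotation", mode=mode)
    table = wandb.Table(dataframe=edited_df)
    run.log({name: table})  # makes the table browsable in the run's workspace
    artifact = wandb.Artifact(name, type="dataset")
    artifact.add(table, name)  # logging changed contents under the same name adds a new version
    run.log_artifact(artifact)
    run.finish()
```

Passing `mode="offline"` lets you try this without network access. Versioning through an artifact means each annotation pass gets its own version (v0, v1, ...) rather than replacing the previous one.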
`requirements.txt`:
```text
pandas==2.1.2
streamlit==1.28.1
```
`wandb_streamlit_app.py`:
```python
import pandas as pd
import streamlit as st

st.set_page_config(
    page_title="W&B Annotation",
    # page_icon="🧊",
    layout="wide",
)

# Custom link and list styling. The .st-emotion-cache-* class name is generated
# by Streamlit, so that rule may stop matching after a Streamlit upgrade.
st.markdown('''<style>
a {
    color: #13A9BA !important;
    text-decoration: none;
    transition: .3s;
}
a:hover {
    color: #0097AB !important;
    text-decoration: none;
}
.st-emotion-cache-187vp62 p {
    margin-bottom: .5rem;
}
ol {
    padding-left: 1.5rem;
}
</style>''', unsafe_allow_html=True)

st.image("images/wandb-streamlit-logo.png")
st.title('Custom annotation of W&B Tables with the Streamlit data editor')
st.write('Summarization can be a critical but challenging language modeling task, with varying manual and automated approaches that prove hard to evaluate and compare. [Weights & Biases](https://wandb.ai/site) helps machine learning practitioners log summary inputs and results from multiple experimental approaches, then interrogate and evaluate those results effectively at scale. [Streamlit’s data editor](https://docs.streamlit.io/library/api-reference/data/st.data_editor), showcased in this application, helps teams that annotate modeling results for interim and final assessments in language modeling pipelines interact with these results and revise them for downstream tasks (e.g., creating gold standard examples or fine-tuning).')

st.write('This application takes an input CSV file of news articles and automated summaries from [BART](https://huggingface.co/facebook/bart-large-cnn) and [BART-SAMSUM](https://huggingface.co/philschmid/bart-large-cnn-samsum) and lets a user evaluate summaries from multiple approaches. Specifically, a user can:')

st.markdown('''
1. select whether an automated summary requires adjustment (***needs_revision** column*)
2. select a suggested revision approach from a drop-down menu (***approach** column*)
3. enter a manually edited version of the current summary (***manual_edition** column*)
4. add optional comments about the revision (***comments** column*)
''')

st.write('Users can upload two separate files to compare experimental results from different approaches and annotate them dynamically in the app.')

with st.expander("See additional resources"):
    st.link_button("Annotation Colab notebook", "https://colab.research.google.com/drive/133sV8VgY5wftiDpjIT3UnNcvBwTQzfSa?usp=sharing")
    st.link_button("W&B project for Tables integration", "https://wandb.ai/claire-boetticher/news_summarization?workspace=user-claire-boetticher")

# First dataframe (Approach 1) with manual revision columns
uploaded_file = st.file_uploader("Choose a file for summary review (Approach 1)", key="bart")
if uploaded_file is not None:

    # Read the CSV as a dataframe; pandas labels an unnamed index column "Unnamed: 0"
    dataframe = pd.read_csv(uploaded_file)
    dataframe.rename(columns={'Unnamed: 0': 'Row'}, inplace=True)

    # Optional: renumber rows instead of keeping the random-sample row numbers
    # dataframe = dataframe.reset_index(drop=True)

    # Add empty annotation columns; needs_revision is boolean to match CheckboxColumn
    dataframe['needs_revision'] = False
    dataframe['approach'] = ''
    dataframe['manual_edition'] = ''
    dataframe['comments'] = ''

    # Display the dataframe as an editable table
    bart_edited_df = st.data_editor(
        dataframe,
        column_config={
            "needs_revision": st.column_config.CheckboxColumn(
                "needs_revision",
                help="Does the Summary column need to be changed?",
                width="medium",
                default=False,
                required=True,
            ),
            "approach": st.column_config.SelectboxColumn(
                "approach",
                help="Select approach for revising summary",
                width="medium",
                options=[
                    "Manual edit",
                    "Adjust model",
                    "Other (add suggestion in Comments column)",
                ],
                required=False,
            ),
            "manual_edition": st.column_config.TextColumn(
                "manual_edition",
                help="Enter new summary",
                width="medium",
                required=False,
            ),
            "comments": st.column_config.TextColumn(
                "comments",
                help="Describe reasoning for editing the current assigned Summary value",
                width="large",
                required=False,
            ),
        },
        # disabled freezes columns so users cannot change the values
        disabled=("articles", "bart_summaries", "source_word_count", "summary_word_count", "source_lexical_diversity", "summary_lexical_diversity"),
        hide_index=True,
        column_order=("articles", "bart_summaries", "needs_revision", "approach", "manual_edition", "comments", "source_word_count", "summary_word_count", "source_lexical_diversity", "summary_lexical_diversity"),
        num_rows="dynamic",
    )

# Second dataframe (Approach 2) with additional score columns for evaluation
uploaded_file = st.file_uploader("Choose a file for summary review (Approach 2)", key="samsum")
if uploaded_file is not None:

    # Read the CSV as a dataframe; pandas labels an unnamed index column "Unnamed: 0"
    dataframe = pd.read_csv(uploaded_file)
    dataframe.rename(columns={'Unnamed: 0': 'Row'}, inplace=True)

    # Optional: renumber rows instead of keeping the random-sample row numbers
    # dataframe = dataframe.reset_index(drop=True)

    # Add empty score and annotation columns; needs_revision is boolean to match CheckboxColumn
    dataframe['relevance'] = ''
    dataframe['coherence'] = ''
    dataframe['consistency'] = ''
    dataframe['fluency'] = ''
    dataframe['needs_revision'] = False
    dataframe['approach'] = ''
    dataframe['manual_edition'] = ''
    dataframe['comments'] = ''

    # Display the dataframe as an editable table
    samsum_edited_df = st.data_editor(
        dataframe,
        column_config={
            "relevance": st.column_config.SelectboxColumn(
                "relevance (1-5)",
                help="Select a relevance score from 1 (lowest) to 5 (highest)",
                width="medium",
                options=["1", "2", "3", "4", "5"],
            ),
            "coherence": st.column_config.SelectboxColumn(
                "coherence (1-5)",
                help="Select a coherence score from 1 (lowest) to 5 (highest)",
                width="medium",
                options=["1", "2", "3", "4", "5"],
            ),
            "consistency": st.column_config.SelectboxColumn(
                "consistency (1-5)",
                help="Select a consistency score from 1 (lowest) to 5 (highest)",
                width="medium",
                options=["1", "2", "3", "4", "5"],
            ),
            "fluency": st.column_config.SelectboxColumn(
                "fluency (1-5)",
                help="Select a fluency score from 1 (lowest) to 5 (highest)",
                width="medium",
                options=["1", "2", "3", "4", "5"],
            ),
            "needs_revision": st.column_config.CheckboxColumn(
                "needs_revision",
                help="Does the summary column need to be changed?",
                width="medium",
                default=False,
                required=True,
            ),
            "manual_edition": st.column_config.TextColumn(
                "manual_edition",
                help="Enter new summary",
                width="medium",
                required=False,
            ),
            "comments": st.column_config.TextColumn(
                "comments",
                help="Describe reasoning for editing the current assigned Summary value",
                width="large",
                required=False,
            ),
        },
        # disabled freezes columns so users cannot change the values
        disabled=("articles", "bart_samsum_summaries", "source_word_count", "summary_word_count", "source_lexical_diversity", "summary_lexical_diversity"),
        hide_index=True,
        column_order=("articles", "bart_samsum_summaries", "relevance", "coherence", "consistency", "fluency", "needs_revision", "manual_edition", "comments", "source_word_count", "summary_word_count", "source_lexical_diversity", "summary_lexical_diversity"),
        num_rows="dynamic",
    )


# TODO: add session state logic so that, if needs_revision is True, the other
# annotation columns are required. Example CheckboxColumn usage:
# import pandas as pd
# import streamlit as st
#
# data_df = pd.DataFrame(
#     {
#         "widgets": ["st.selectbox", "st.number_input", "st.text_area", "st.button"],
#         "favorite": [True, False, False, True],
#     }
# )
#
# st.data_editor(
#     data_df,
#     column_config={
#         "favorite": st.column_config.CheckboxColumn(
#             "Your favorite?",
#             help="Select your **favorite** widgets",
#             default=False,
#         )
#     },
#     disabled=["widgets"],
#     hide_index=True,
# )
```
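The commented-out note at the end of the app asks for logic that makes the other annotation columns required when `needs_revision` is True. The column-level `required` flag cannot express a condition on another column. One alternative is to validate the rows that `st.data_editor` returns after editing; this is a post-edit check, not the session-state approach the note mentions. The sketch below works under these assumptions: `incomplete_revisions` is hypothetical, and the fields treated as required (`approach` and `manual_edition` by default) are a choice you should adjust.

```python
import math

REVISION_FIELDS = ("approach", "manual_edition")  # assumed required fields; adjust as needed


def _is_blank(value):
    """True for None, NaN, or empty/whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def incomplete_revisions(records, fields=REVISION_FIELDS):
    """Return positions of rows flagged needs_revision that leave any required field blank."""
    incomplete = []
    for index, row in enumerate(records):
        flag = row.get("needs_revision")
        if _is_blank(flag) or not bool(flag):  # skip unflagged rows (False, None, NaN)
            continue
        if any(_is_blank(row.get(field)) for field in fields):
            incomplete.append(index)
    return incomplete

# Usage inside the app, after st.data_editor returns:
#     missing = incomplete_revisions(bart_edited_df.to_dict("records"))
#     if missing:
#         st.warning(f"Rows {missing} are flagged for revision but incomplete.")
```

For Approach 2, pass `fields=("manual_edition",)`, because that editor does not display the `approach` column.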