Please follow the Dataset Access section of the README.md to prepare the data, and run the preprocessing.py script as instructed. Ensure that the structure of the ./data directory is as shown below:
GUI-Odyssey
├── data
│ ├── annotations
│ │ └── *.json
│ ├── screenshots
│ │ └── *.png
│ ├── splits
│ │ ├── app_split.json
│ │ ├── device_split.json
│ │ ├── random_split.json
│ │ └── task_split.json
│ ├── format_converter.py
│ └── preprocessing.py
└── ...
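Before moving on, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the repository: the function name `check_data_layout` and the constants are hypothetical, and the expected paths are taken only from the directory tree shown above. It assumes it is run from the GUI-Odyssey root.

```python
from pathlib import Path

# Expected entries under ./data, copied from the directory tree above.
REQUIRED_FILES = [
    "splits/app_split.json",
    "splits/device_split.json",
    "splits/random_split.json",
    "splits/task_split.json",
    "format_converter.py",
    "preprocessing.py",
]
# Directories that should contain at least one file matching the pattern.
REQUIRED_GLOBS = {
    "annotations": "*.json",
    "screenshots": "*.png",
}


def check_data_layout(data_dir="data"):
    """Return a list of problems; an empty list means the layout looks right.

    Hypothetical helper for illustration only.
    """
    root = Path(data_dir)
    problems = []
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for sub, pattern in REQUIRED_GLOBS.items():
        folder = root / sub
        if not folder.is_dir():
            problems.append(f"missing directory: {sub}")
        elif next(folder.glob(pattern), None) is None:
            problems.append(f"no {pattern} files in: {sub}")
    return problems


if __name__ == "__main__":
    issues = check_data_layout("data")
    print("\n".join(issues) if issues else "data layout OK")
```

If the script reports missing entries, revisit the Dataset Access steps and rerun preprocessing.py before generating the chat-format data.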
Next, run the following command to generate chat-format data for training and testing. The his_len parameter sets the length of the historical information:
cd data
python format_converter.py --his_len 4
The OdysseyAgent is built upon Qwen-VL.
Before running, set up the environment and install the required packages:
cd src
pip install -r requirements.txt
Next, initialize OdysseyAgent using the weights from Qwen-VL-Chat:
python merge_weight.py
We also provide four variants of OdysseyAgent, fine-tuned on Train-Random, Train-Task, Train-Device, and Train-App, respectively.
Specify the path to the OdysseyAgent and the chat-format training data generated in the Data preprocessing stage (one of the four splits) in the script/train.sh file. Then, run the following command:
cd src
bash script/train.sh
Specify the path to the checkpoint and the dataset split (one of app_split, device_split, random_split, task_split) in the script/eval.sh file. Then, run the following command:
cd src
bash script/eval.sh