Follow the instructions below to reproduce the construction of SceneInstruct.
- Llama-3.1-70B-Instruct: You can download the weights of Llama-3.1-70B-Instruct at HF Repo. To serve Llama-3.1-70B-Instruct with vLLM:
vllm serve <Llama-3.1-70B path> --tensor_parallel_size 2
- OpenAI API key: Create a file named openai_key and add your API key to it.
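As a quick sanity check on the key file described above, the following sketch reads openai_key and strips surrounding whitespace. The helper name load_openai_key is hypothetical and is not part of the repository; the scripts may load the key differently. It only assumes that the file contains the key as plain text.

```python
from pathlib import Path


def load_openai_key(path="openai_key"):
    """Return the API key stored in `path` with surrounding whitespace removed.

    Hypothetical helper: assumes the file holds only the key as plain text.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; add your OpenAI API key to it")
    return key
```

A trailing newline left by an editor is a common cause of authentication failures, which is why the sketch strips the content before returning it.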
- Deploy Llama-3.1-70B-Instruct following Model Preparation.
- Set <model-checkpoint-path> in create_descriptions.py to your model path.
- Run the following command:
python create_descriptions.py \
    --num-prompts-needed 3000 # the number of new descriptions to be created
- The generated descriptions are saved in data_prompt.jsonl by default.
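To inspect the generated descriptions, a minimal sketch like the one below may help. It assumes only that data_prompt.jsonl follows the usual JSON Lines convention of one JSON object per line; the field names inside each record are not documented here, so the code does not rely on any. The function name read_jsonl is hypothetical.

```python
import json


def read_jsonl(path="data_prompt.jsonl"):
    """Load a JSON Lines file into a list, skipping blank lines.

    Assumption: each non-empty line is one standalone JSON value.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

After a run with --num-prompts-needed 3000, comparing len(read_jsonl()) against the requested number is a simple way to confirm that generation finished.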
- Deploy the models following Model Preparation.
- Set <model-checkpoint-path> in create_descriptions.py to your model path.
- Run the following command:
python collect_before_assign_placement.py
python collect_assign_placement.py
- The generated SceneInstruct dataset is saved in three files: data_prompt_assign_placement.jsonl, data_prompt_check_positional_error.jsonl, and data_prompt_fix_positional_error.jsonl.
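To confirm that both collection scripts produced their output, the sketch below counts the non-empty lines in each of the three files. The function name count_records is hypothetical, and the code assumes the files are written to the directory you pass in (the current directory by default) with one record per line.

```python
from pathlib import Path

# File names taken from the list above.
OUTPUT_FILES = [
    "data_prompt_assign_placement.jsonl",
    "data_prompt_check_positional_error.jsonl",
    "data_prompt_fix_positional_error.jsonl",
]


def count_records(directory="."):
    """Map each expected output file to its record count, or None if missing."""
    counts = {}
    for name in OUTPUT_FILES:
        path = Path(directory) / name
        if not path.exists():
            counts[name] = None  # file was not produced
            continue
        with path.open(encoding="utf-8") as f:
            counts[name] = sum(1 for line in f if line.strip())
    return counts
```

A value of None for any file suggests that the corresponding step did not run or wrote its output elsewhere.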