Build SceneInstruct Dataset

Follow the steps below to reproduce the construction of the SceneInstruct dataset.

Model Preparation

  • Llama-3.1-70B-Instruct: Download the weights from the HF Repo, then serve the model with vLLM:
    vllm serve <Llama-3.1-70B path> --tensor-parallel-size 2
  • OpenAI API key: Create a file named openai_key containing your API key.
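
Before running the scripts, it can help to confirm that both prerequisites are in place. The sketch below is not part of the repository. It assumes that openai_key holds the key as plain text in the working directory, and that the vLLM server is listening at its default address http://localhost:8000 with the OpenAI-compatible /v1/models route. The helper names read_api_key and list_served_models are hypothetical.

```python
import json
import urllib.request
from pathlib import Path


def read_api_key(path="openai_key"):
    """Return the key stored in `path`, stripped of surrounding whitespace."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty")
    return key


def list_served_models(base_url="http://localhost:8000", timeout=5.0):
    """Return the model ids reported by an OpenAI-compatible server.

    Assumption: the server was started with `vllm serve` on the default port.
    """
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling read_api_key() and then list_served_models() from the repository root should raise an error if the key file is missing or empty, or if the server is unreachable. Adjust base_url if vLLM was started with a different host or port.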

Create Scene Descriptions with Evol-Instruct

  1. Deploy Llama-3.1-70B-Instruct following Model Preparation.
  2. Set <model-checkpoint-path> in create_descriptions.py to your model path.
  3. Run the following command:
    python create_descriptions.py \
        --num-prompts-needed 3000 # the number of new descriptions to be created
  4. The generated descriptions are saved in data_prompt.jsonl by default.
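
To inspect the generated descriptions, a minimal sketch such as the following can load the output file. It assumes only that data_prompt.jsonl follows the JSON Lines convention (one JSON value per line). The field names inside each record are not documented here, so the code does not rely on any. The function name load_jsonl is hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path="data_prompt.jsonl"):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so a truncated run is easy to spot.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err})") from err
    return records
```

For example, len(load_jsonl()) gives the number of records written, which can be compared against the value passed to --num-prompts-needed.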

Collect SceneGenAgent Trajectories

  1. Deploy the models following Model Preparation.
  2. Set <model-checkpoint-path> in create_descriptions.py to your model path.
  3. Run the following command:
    python collect_before_assign_placement.py
    python collect_assign_placement.py
  4. The generated SceneInstruct dataset is saved in three files: data_prompt_assign_placement.jsonl, data_prompt_check_positional_error.jsonl, and data_prompt_fix_positional_error.jsonl.
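
After both scripts finish, a quick check like the one below can confirm that all three files were produced. This is a sketch under two assumptions: the files are written to the current working directory, and each holds one record per line. The function name summarize_outputs is hypothetical.

```python
from pathlib import Path

# File names as listed in step 4 above.
EXPECTED_FILES = (
    "data_prompt_assign_placement.jsonl",
    "data_prompt_check_positional_error.jsonl",
    "data_prompt_fix_positional_error.jsonl",
)


def summarize_outputs(directory="."):
    """Map each expected file name to its non-blank line count, or None if missing."""
    summary = {}
    for name in EXPECTED_FILES:
        path = Path(directory) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(encoding="utf-8") as handle:
            summary[name] = sum(1 for line in handle if line.strip())
    return summary
```

A None value in the returned dictionary means the corresponding file was not found in the given directory.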