download the CUB dataset, but there is no train.json in it. #67

Open
michaelwithu opened this issue Jun 21, 2024 · 2 comments

Comments

@michaelwithu

I downloaded the CUB dataset from https://data.caltech.edu/records/65de6-vp158, but it does not contain a train.json.

The other links (https://cornell.box.com/v/vptfgvcsplits; https://drive.google.com/drive/folders/1mnvxTkYxmOr2W9QjcgS64UBpoJ4UmKaM?usp=sharing) also do not work.

@aba122

aba122 commented Jul 28, 2024

I have the same issue.

@arpita-chowdhury-osu

It should look like this:

{
   image_name : class_index
}
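To check that a generated file has this shape, a minimal sketch like the one below may help. The function name `load_label_map` is hypothetical and not part of the repository. It only assumes that the file is a flat JSON object mapping image names to integer class indices, as described above.

```python
import json

def load_label_map(json_path):
    """Load an image -> class index JSON file and check its basic shape."""
    with open(json_path) as f:
        label_map = json.load(f)
    if not isinstance(label_map, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, idx in label_map.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(idx, int) or isinstance(idx, bool):
            raise ValueError(f"class index for {name!r} is not an integer: {idx!r}")
    return label_map
```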

You can use the code below to generate the JSON files. It takes the whole train set as training data and uses val as both the test and val data. Feel free to randomly hold out 20% of the train data as val.json if needed. I didn't, because for CUB I don't need to search for good hyperparameters; they already did that.
Assuming your CUB dataset looks like this:

  • cub
    • train
      • 1_className
        • image_1
        • image_2
      • 2_className
        • image_1
        • ...
    • val
      • 1_className
        • image_1
import json
import os
import re

def create_json_files(data_dir):
    """Write train.json, val.json and test.json mapping image path -> class id."""
    json_data = {'train': {}, 'val': {}}

    for split in ['train', 'val']:
        split_dir = os.path.join(data_dir, split)
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue
            # The leading digits of the folder name are the class id. This accepts
            # both "1_className" (layout above) and CUB's "001.Class_name" style.
            class_id = int(re.match(r'\d+', class_name).group())
            for img_name in sorted(os.listdir(class_dir)):
                # Keys are paths relative to data_dir, e.g. train/1_className/image_1
                img_path = os.path.join(split, class_name, img_name)
                json_data[split][img_path] = class_id

    # The test set reuses the val split
    json_data['test'] = json_data['val']

    # Create the JSON files
    for split in ['train', 'val', 'test']:
        json_file_path = os.path.join(data_dir, f'{split}.json')
        with open(json_file_path, 'w') as f:
            json.dump(json_data[split], f, indent=4)

    return json_data

dataset_path = "<path_to_your_dataset>"
create_json_files(dataset_path)
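
If you do want a separate validation split, here is a rough sketch of holding out a random fraction of the train mapping. The helper `split_train_val`, its `val_fraction` and `seed` parameters, and the 20% default are assumptions for illustration, not something the repository provides. It splits at random rather than per class, so small classes may end up unevenly represented.

```python
import random

def split_train_val(train_map, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of an image -> class mapping as validation."""
    keys = sorted(train_map)      # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(keys)
    n_val = int(len(keys) * val_fraction)
    val_keys = set(keys[:n_val])
    train_split = {k: v for k, v in train_map.items() if k not in val_keys}
    val_split = {k: train_map[k] for k in val_keys}
    return train_split, val_split
```

You could pass it `create_json_files(dataset_path)['train']` and dump the two returned dictionaries with `json.dump`, the same way the script above writes its files.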
