How to appropriately register data builder? #28

Open
Tony363 opened this issue Jun 16, 2024 · 0 comments

Tony363 commented Jun 16, 2024

Hi,

I am trying to fine-tune MiniGPT4-Video on my custom dataset. I could not get my own data builder to register, so I modified the `Registry.get_builder_class` method as shown below.

    @classmethod
    def get_builder_class(cls, name):
        # Workaround: fall back to EngageNetBuilder for any name that is not registered
        from minigpt4.datasets.builders.image_text_pair_builder import EngageNetBuilder
        return cls.mapping["builder_name_mapping"].get(name, EngageNetBuilder)
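
For context, decorator-based registries of this kind usually work through an import-time side effect: `@registry.register_builder("engagenet")` stores the class only when the module that defines it is actually imported. The sketch below is a simplified, hypothetical stand-in for such a registry (the class body and its internals are assumptions, not the project's actual `Registry`). Unlike the workaround above, it returns `None` for unknown names instead of silently substituting a default builder, so a missing registration becomes visible.

```python
# Hypothetical, simplified registry illustrating decorator-based registration.
# This is NOT the project's actual Registry class.


class Registry:
    mapping = {"builder_name_mapping": {}}

    @classmethod
    def register_builder(cls, name):
        """Return a decorator that stores the decorated class under `name`."""
        def wrap(builder_cls):
            builders = cls.mapping["builder_name_mapping"]
            if name in builders:
                raise KeyError(f"Name '{name}' already registered for {builders[name]}.")
            builders[name] = builder_cls
            return builder_cls
        return wrap

    @classmethod
    def get_builder_class(cls, name):
        # Unknown names return None rather than a silent fallback builder.
        return cls.mapping["builder_name_mapping"].get(name, None)


registry = Registry()

# Before the decorated class is defined (i.e. before its module is imported),
# the lookup fails.
print(registry.get_builder_class("engagenet"))  # None


# Executing the class definition runs the decorator, which performs the registration.
@registry.register_builder("engagenet")
class EngageNetBuilder:
    pass


print(registry.get_builder_class("engagenet") is EngageNetBuilder)  # True
```

If the real lookup comes back empty, two plausible (unconfirmed) causes are that the module defining the builder is never imported before the lookup runs, or that the name being looked up does not exactly match the string passed to `register_builder`.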

I have the following in `minigpt4/datasets/builders/image_text_pair_builder.py`:

    @registry.register_builder("engagenet")
    class EngageNetBuilder(BaseDatasetBuilder):
        train_dataset_cls = EngageNetDataset

        DATASET_CONFIG_DICT = {
            "default": "configs/datasets/engagenet/default.yaml",
        }
        print(DATASET_CONFIG_DICT)  # debug output; runs once, when the class body executes at import time

        def build_datasets(self):
            # download, split, etc...
            # only called on 1 GPU/TPU in distributed
            self.build_processors()

            build_info = self.config.build_info # information from the config file
            datasets = dict()

            # create datasets
            dataset_cls = self.train_dataset_cls
            datasets['train'] = dataset_cls(
                vis_processor=self.vis_processors["train"], # Add the vis_processor here
                text_processor=self.text_processors["train"], # Add the text_processor here
                vis_root=build_info.vis_root, # Add videos path here
                ann_paths=build_info.ann_paths, # Add annotations path here
                subtitles_path=build_info.subtitles_path, # Add subtitles path here
                model_name='mistral' # Add model name here (llama2 or mistral)
            )

            return datasets

How do I properly register a custom data builder?
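
Registration is only half of the mechanism; the other half is the lookup. In LAVIS-derived code bases, the training task typically loops over the entries under the config's `datasets:` section and resolves each one by name, so that key must match the registered string exactly (`engagenet` here, not `EngageNet`). The function below is a hypothetical, simplified stand-in for that loop; its name, the plain-dict config shape, and `StubBuilder` are assumptions made for illustration.

```python
# Hypothetical sketch of config-driven builder lookup; not the project's code.


class StubBuilder:
    """Stand-in for a real builder such as EngageNetBuilder."""

    def __init__(self, cfg):
        self.cfg = cfg

    def build_datasets(self):
        # A real builder would construct dataset objects from cfg["build_info"].
        return {"train": f"dataset from {self.cfg['build_info']['vis_root']}"}


def build_all_datasets(datasets_cfg, builder_mapping):
    """Look up a builder for every dataset name in the config and build it."""
    datasets = {}
    for name, cfg in datasets_cfg.items():
        builder_cls = builder_mapping.get(name)
        if builder_cls is None:
            # Fail loudly and list the names that *are* registered.
            raise KeyError(
                f"No builder registered under '{name}'; "
                f"registered names: {sorted(builder_mapping)}"
            )
        datasets[name] = builder_cls(cfg).build_datasets()
    return datasets


builder_mapping = {"engagenet": StubBuilder}
config = {"engagenet": {"build_info": {"vis_root": "/path/to/videos"}}}

print(build_all_datasets(config, builder_mapping))
# {'engagenet': {'train': 'dataset from /path/to/videos'}}

# A case mismatch in the config key is reported instead of silently falling back.
try:
    build_all_datasets({"EngageNet": config["engagenet"]}, builder_mapping)
except KeyError as err:
    print(err)
```

Raising with the list of registered names is a design choice that makes typos and missing imports obvious, whereas a default fallback (as in the `get_builder_class` workaround) hides them.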

Tony363 closed this as completed Jun 16, 2024
Tony363 changed the title from "What does self.data_type do?" to "How to appropriately register data builder?" Jun 16, 2024
Tony363 reopened this Jun 16, 2024