Add more information about dataset #110

Open
IcyFeather233 opened this issue Jun 27, 2024 · 1 comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@IcyFeather233
Contributor

The testenv.yaml configuration sets train_url and test_url.

But there is no documentation on how these two data files should be prepared.

For example, what data format is required? An example would make it easier for people new to this project to get started.

The documentation should cover TXT, CSV, and JSON, which are the formats handled by the data loading classes.
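As an illustration of the kind of example such documentation could include, here is a minimal sketch that writes and reads back one tiny sample file per format. The file names, column names, and record fields are all assumptions made up for this sketch, not the project's documented formats, and the helper functions (write_samples, read_txt, read_csv, read_json) are hypothetical.

```python
import csv
import json
from pathlib import Path


def write_samples(out_dir):
    """Write one tiny sample file per format and return the three paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # TXT (assumed layout): one sample per line, "<input> <label>" split by a space.
    txt_path = out / "train_data.txt"
    txt_path.write_text(
        "images/0001.jpg labels/0001.png\nimages/0002.jpg labels/0002.png\n",
        encoding="utf-8",
    )

    # CSV (assumed layout): a header row, with the label in a named column.
    csv_path = out / "train_data.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(
            [["feature_a", "feature_b", "label"], [0.1, 0.2, 1], [0.3, 0.4, 0]]
        )

    # JSON (assumed layout): a list of records, one object per sample.
    json_path = out / "train_data.json"
    json_path.write_text(
        json.dumps([{"question": "1+1=?", "answer": "2"}]), encoding="utf-8"
    )
    return txt_path, csv_path, json_path


def read_txt(path):
    """Return (input, label) pairs from the space-separated TXT file."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            x, y = line.split(maxsplit=1)
            pairs.append((x, y))
    return pairs


def read_csv(path):
    """Return one dict per CSV row, keyed by the header names (values are strings)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_json(path):
    """Return the list of records stored in the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Whatever layouts the project actually expects, showing one small file per format next to the train_url and test_url settings would answer the question above.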

@IcyFeather233
Contributor Author

By the way, I think the JSONDataParse class below is not generic: it seems to be used only for COCO, which is an image dataset. I think it should be a universal JSON data parsing class:


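# Imports this excerpt relies on (not shown in the original snippet; assumed).
# BaseDataSource is assumed to be defined in the same module as this class.
from abc import ABC
from pathlib import Path

import numpy as np
from pycocotools.coco import COCO


class JSONDataParse(BaseDataSource, ABC):
    """
    Parser for JSON files that contain structured data
    """

    def __init__(self, data_type, func=None):
        super(JSONDataParse, self).__init__(data_type=data_type, func=func)

    def parse(self, *args, **kwargs):
        # Hard-coded, dataset-specific layout: a "train" directory and
        # per-sequence ground-truth files.
        DIRECTORY = "train"
        LABEL_PATH = "*/gt/gt_val_half.txt"
        filepath = Path(*args)
        self.data_dir = Path(Path(filepath).parents[1], DIRECTORY)
        # The annotation file is read through the COCO API, not as plain JSON.
        self.coco = COCO(filepath)
        self.ids = self.coco.getImgIds()
        self.class_ids = sorted(self.coco.getCatIds())
        self.annotations = [self.load_anno_from_ids(_ids) for _ids in self.ids]
        self.x = {
            "data_dir": self.data_dir,
            "coco": self.coco,
            "ids": self.ids,
            "class_ids": self.class_ids,
            "annotations": self.annotations,
        }
        self.y = list(self.data_dir.glob(LABEL_PATH))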

    def load_anno_from_ids(self, id_):
        # frame_id, video_id and track_id are tracking-specific fields,
        # not part of the standard COCO annotation format.
        im_ann = self.coco.loadImgs(id_)[0]
        width = im_ann["width"]
        height = im_ann["height"]
        frame_id = im_ann["frame_id"]
        video_id = im_ann["video_id"]
        anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=False)
        annotations = self.coco.loadAnns(anno_ids)
        objs = []
        for obj in annotations:
            # Convert COCO [x, y, width, height] boxes to [x1, y1, x2, y2].
            x1 = obj["bbox"][0]
            y1 = obj["bbox"][1]
            x2 = x1 + obj["bbox"][2]
            y2 = y1 + obj["bbox"][3]
            if obj["area"] > 0 and x2 >= x1 and y2 >= y1:
                obj["clean_bbox"] = [x1, y1, x2, y2]
                objs.append(obj)

        # One row per object: [x1, y1, x2, y2, class index, track id].
        num_objs = len(objs)
        res = np.zeros((num_objs, 6))

        for ix, obj in enumerate(objs):
            cls = self.class_ids.index(obj["category_id"])
            res[ix, 0:4] = obj["clean_bbox"]
            res[ix, 4] = cls
            res[ix, 5] = obj["track_id"]

        # Fall back to a zero-padded 12-digit name when file_name is missing.
        file_name = im_ann["file_name"] if "file_name" in im_ann else "{:012}".format(id_) + ".jpg"
        img_info = (height, width, frame_id, video_id, file_name)

        del im_ann, annotations

        return (res, img_info, file_name)
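
For comparison, a dataset-agnostic JSON parser along the lines suggested above might look like the following. This is only a minimal sketch: the class name GenericJSONDataParse and the label_key parameter are hypothetical, and it deliberately does not inherit from BaseDataSource so that it runs on its own. It assumes the file holds either a JSON array of records, a single JSON object, or JSON Lines (one object per line), and that each record carries its label under one key.

```python
import json
from pathlib import Path


class GenericJSONDataParse:
    """Hypothetical dataset-agnostic JSON parser (not part of the project)."""

    def __init__(self, data_type, func=None):
        self.data_type = data_type  # e.g. "train" or "test"
        self.process_func = func    # optional per-record preprocessing hook
        self.x = None
        self.y = None

    def parse(self, file_path, label_key="label"):
        """Load records and split each one into features (x) and label (y)."""
        text = Path(file_path).read_text(encoding="utf-8").strip()
        try:
            # Whole file is one JSON document: a list of records or one record.
            loaded = json.loads(text)
            records = loaded if isinstance(loaded, list) else [loaded]
        except json.JSONDecodeError:
            # Otherwise treat the file as JSON Lines: one object per line.
            records = [json.loads(line) for line in text.splitlines() if line.strip()]

        x, y = [], []
        for record in records:
            features = dict(record)               # copy so the input stays untouched
            label = features.pop(label_key, None)  # None when the record is unlabeled
            if self.process_func is not None:
                features = self.process_func(features)
            x.append(features)
            y.append(label)

        self.x, self.y = x, y
        return self
```

With this shape, the COCO-specific logic could live in a separate subclass or in the func hook, leaving the base JSON parser usable for non-image datasets.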

@MooreZheng MooreZheng added kind/feature Categorizes issue or PR as related to a new feature. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. labels Aug 13, 2024