SignPuddle

The Swiss-French SignPuddle includes around 5000 SignWriting entries that have an illustration.

Data

We collected the data using collect_data.py.

Since this data is unstructured, we assume that all PNG files are illustrations and all JPG files are pictographs.
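The extension rule above can be sketched roughly as follows. This is a minimal illustration, not the code in collect_data.py: the names `EXTENSION_TO_TYPE`, `guess_image_type`, and `group_by_type` are hypothetical, and the directory layout is an assumption.

```python
from pathlib import Path

# Assumed mapping from file extension to image type, following the rule above.
EXTENSION_TO_TYPE = {".png": "illustration", ".jpg": "pictograph"}


def guess_image_type(path):
    """Return the assumed image type for a file, or None for unknown extensions."""
    return EXTENSION_TO_TYPE.get(Path(path).suffix.lower())


def group_by_type(data_dir):
    """Group all files under data_dir (hypothetical layout) by assumed image type."""
    groups = {"illustration": [], "pictograph": []}
    for path in sorted(Path(data_dir).rglob("*")):
        image_type = guess_image_type(path)
        if path.is_file() and image_type is not None:
            groups[image_type].append(path)
    return groups
```

Files with any other extension are ignored rather than guessed at, since the assumption only covers PNG and JPG.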

This means that we have multiple types of data:

Ideally, we should be able to classify these images and add a corresponding "tag" to their prompts. We should not train on corrupted or noisy images.
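One possible shape for this filtering and tagging step is sketched below. Everything here is an assumption: the `<tag> text` prompt format, the function names, and the record layout are hypothetical. The signature check only catches files that are clearly not PNG or JPEG (for example, truncated or mislabeled downloads); detecting noisy but valid images would still need a classifier or manual labels.

```python
# Standard leading bytes of PNG and JPEG files.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_image(header):
    """Cheap check on the first bytes of a file; does not validate the full image."""
    return header.startswith(PNG_SIGNATURE) or header.startswith(JPEG_SIGNATURE)


def tagged_prompt(tag, text):
    """Prefix a prompt with an image-type tag (hypothetical format)."""
    return f"<{tag}> {text}"


def build_examples(records):
    """Drop unusable records and tag the prompts of the rest.

    records: iterable of (image_bytes, tag, text); tag is None for
    images that could not be classified.
    """
    examples = []
    for image_bytes, tag, text in records:
        if tag is None or not looks_like_image(image_bytes[:8]):
            continue  # skip unclassified or apparently corrupted files
        examples.append((image_bytes, tagged_prompt(tag, text)))
    return examples
```

Keeping the tag in the prompt rather than in a separate field means the same training pipeline can be used for every image type.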