39993-Images-OCR-Data-of-Internet-Image

Description

This dataset contains 39,993 internet images annotated for OCR. The collection scenes include subtitles, advertisements, cellphone screenshots, comics, emoticons, posters, magazine covers, and more. The text is mostly Chinese, with a small amount of English. Text is annotated with line-level rectangular bounding boxes and transcriptions. A small portion of the data uses column-level quadrilateral bounding boxes and transcriptions instead. The dataset can be used for OCR tasks on internet images.

For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/171?source=Github

Data size

39,993 images, 227,910 bounding boxes

Collection environment

Subtitles, advertisements, cellphone screenshots, comics, emoticons, posters, magazine covers, etc.

Data diversity

Multiple types of internet images

Language distribution

Mostly Chinese, with a small amount of English

Data format

Images are in .jpg format. Annotation files are in .json format.
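The README gives the file formats but not the directory layout or the JSON schema. The following is a minimal loading sketch under a hypothetical assumption: each image `name.jpg` has its annotation stored as `name.json` in the same folder. The function name `pair_images_with_annotations` is illustrative only. It returns the parsed JSON as-is because the field names are not documented here.

```python
import json
from pathlib import Path


def pair_images_with_annotations(root):
    """Pair each .jpg image under `root` with its parsed .json annotation.

    Hypothetical layout: "dir/0001.jpg" is annotated by "dir/0001.json".
    Images without a matching annotation file are skipped.
    """
    root = Path(root)
    pairs = []
    for image_path in sorted(root.rglob("*.jpg")):
        # Look for the annotation next to the image, same stem, .json suffix.
        ann_path = image_path.with_suffix(".json")
        if not ann_path.is_file():
            continue
        with ann_path.open(encoding="utf-8") as f:
            # Keep the JSON untouched; its schema is not specified in the README.
            pairs.append((image_path, json.load(f)))
    return pairs
```

Matching by `with_suffix` keeps each annotation tied to its own folder, so identically named files in different subfolders do not collide.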

Annotation content

Line-level rectangular bounding boxes with text transcriptions. A small portion of the data uses column-level quadrilateral bounding boxes with text transcriptions instead.
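Because the dataset mixes rectangular boxes with a few quadrilaterals, it can help to convert both into one representation before training. The following is a sketch, assuming (hypothetically) that a rectangle is available as its `x_min, y_min, x_max, y_max` corners and a quadrilateral as four `(x, y)` vertices. The real JSON fields may differ.

```python
def rect_to_quad(x_min, y_min, x_max, y_max):
    """Convert an axis-aligned rectangle to four vertices.

    Order: top-left, top-right, bottom-right, bottom-left, which is clockwise
    in image coordinates (y axis pointing down).
    """
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


def quad_to_rect(quad):
    """Return the axis-aligned rectangle that encloses a quadrilateral."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)
```

Going from quadrilateral to rectangle loses the slant of column-level text. Converting rectangles to quadrilaterals is lossless, so that direction is usually the safer common format.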

Accuracy

A rectangular bounding box counts as a qualified annotation when the error of each of its vertices is within 5 pixels. Bounding box accuracy is at least 97%, and text transcription accuracy is at least 97%.
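The README does not say how the per-vertex error is measured. This sketch of the qualification rule assumes Euclidean distance between each annotated vertex and the matching vertex of a hypothetical reference box. Box accuracy is then the share of qualified boxes, to be compared against the stated 97% threshold. The function names are illustrative.

```python
import math


def box_is_qualified(annotated, reference, tolerance=5.0):
    """True if every annotated vertex is within `tolerance` pixels of its
    reference vertex (Euclidean distance is an assumption, not stated)."""
    if len(annotated) != len(reference):
        raise ValueError("annotated and reference boxes need the same vertex count")
    return all(math.dist(a, r) <= tolerance for a, r in zip(annotated, reference))


def box_accuracy(box_pairs, tolerance=5.0):
    """Fraction of (annotated, reference) pairs that are qualified."""
    if not box_pairs:
        return 0.0
    qualified = sum(box_is_qualified(a, r, tolerance) for a, r in box_pairs)
    return qualified / len(box_pairs)
```

A vertex offset by (3, 4) pixels is exactly 5 pixels away and still qualifies. An offset of 6 pixels along one axis does not.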

Licensing Information

Commercial License
