Description
1. Overview

1) Natural Scenes
- Data size: 200,000 images
- Languages: English, French, German, Italian, Portuguese, Russian, Spanish, Japanese, Korean, Indonesian, Malay, Vietnamese, Thai, Turkish, Arabic, Traditional Chinese, etc.
- Collection environment: shop plaques, stop boards, posters, tickets, road signs, comics, cover pictures, prompts/reminders, warnings, packing instructions, menus, building signs, etc.
- Diversity: 20 languages, multiple natural scenes, and multiple photographic angles (looking up, looking down, eye level)
- Devices: cellphone, camera
- Data format: images are .jpg; annotation files are .json
- Annotation content: line-level quadrilateral bounding boxes and transcriptions of the text
- Accuracy: a bounding box is a qualified annotation when the error of each vertex is within 5 pixels; bounding-box accuracy is not less than 97%, and text transcription accuracy is not less than 97%

2) Handwriting
- Data size: 300,000 images
- Languages: English, French, German, Spanish, Arabic, Italian, Japanese, Korean, Traditional Chinese
- Collection environment: pure color background
- Device: scanner
- Photographic angle: eye level
- Data format: images are .png
- Data content: addresses, company names, and personal names; each image has 20 writing boxes
- Accuracy: collection content accuracy is not less than 97%

2. About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data, and 800TB of Annotated Imagery Data. The ready-to-go AI & ML Training Data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/ocr?source=Datarade
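The Natural Scenes accuracy criterion (each vertex of a quadrilateral bounding box within 5 pixels of its true position) can be illustrated with a minimal Python sketch. The listing does not specify the annotation JSON schema or how vertex error is measured, so everything here is an assumption: quadrilaterals are represented as four `(x, y)` tuples in matching vertex order, error is taken as Euclidean distance, and the function names `is_qualified` and `box_accuracy` are hypothetical.

```python
import math

Point = tuple[float, float]
Quad = list[Point]  # assumed: four (x, y) vertices in a consistent order


def is_qualified(annotated: Quad, reference: Quad, tolerance_px: float = 5.0) -> bool:
    """Return True if every annotated vertex lies within tolerance_px
    (Euclidean distance, an assumption) of the matching reference vertex."""
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("each quadrilateral must have exactly 4 vertices")
    return all(math.dist(a, r) <= tolerance_px for a, r in zip(annotated, reference))


def box_accuracy(pairs: list[tuple[Quad, Quad]], tolerance_px: float = 5.0) -> float:
    """Fraction of (annotated, reference) pairs that count as qualified."""
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(is_qualified(a, r, tolerance_px) for a, r in pairs)
    return qualified / len(pairs)


# Hypothetical example boxes, not taken from the dataset.
reference = [(10.0, 10.0), (110.0, 10.0), (110.0, 40.0), (10.0, 40.0)]
good = [(12.0, 11.0), (108.0, 9.0), (111.0, 43.0), (10.0, 40.0)]
bad = [(10.0, 10.0), (110.0, 10.0), (110.0, 40.0), (10.0, 47.0)]  # last vertex is 7 px off

good_ok = is_qualified(good, reference)
bad_ok = is_qualified(bad, reference)
accuracy = box_accuracy([(good, reference), (bad, reference)])
```

Under these assumptions, a batch would meet the stated bar when `box_accuracy` returns at least 0.97 against a trusted reference set.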
Country Coverage
(61 countries)
Data Categories
- Annotated Imagery Data
- Machine Learning (ML) Data
- Deep Learning (DL) Data
- Object Detection Data
- Computer Vision Data
Pricing
One-off purchase | $10K |
Monthly License | Not available |
Yearly License | Not available |
Usage-based | Not available |
Volumes
- 500K images