Video Annotation Services | AI-assisted Labeling | Computer Vision Data | Video Labeling for AI & ML | Annotated Imagery Data
Nexdata provides high-quality video annotation (Annotated Imagery Data) for video classification, timestamping, tracking, and detection.
Unsupervised Speech Data | 1 Million Hours | Spontaneous Speech | LLM | Pre-training | Large Language Model (LLM) Data
Off-the-shelf unsupervised speech dataset of 1 million hours, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc.; 100,000 hours each). The content covers dialogues or monologues in 28 common domains, such as daily vlogs, travel, podcasts, technology, and beauty.
Unscripted Call Center Telephony Speech Data | 20,000 Hours | Speech Recognition Data | Speech AI Datasets
Off-the-shelf 20,000 hours of Unscripted Call Center Telephony Speech Data, covering 30+ languages including English, German, French, Spanish, Italian, Portuguese, Korean, Japanese, Hindi, and Arabic. It covers multiple domains such as finance, real estate, sales, health, insurance, and telecom.
Test Questions Data | 50 Million | Foundation Model | Unsupervised Text Data | Large Language Model (LLM) Data
Off-the-shelf 50 Million Test Questions Text Parsing And Processing Data. Each question contains a title, answer, parse, subject, grade, and question type. The educational stages cover primary, middle, and high school, and university. Subjects cover mathematics, biology, accounting, etc.
Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data | AI Datasets
Speech Synthesis data is recorded by native speakers with authentic accents and pleasant voices. The phoneme coverage is balanced, and professional phoneticians participate in the annotation. The data closely matches the research and development needs of speech synthesis.
Speech Synthesis Data Collection Service | 50+ Languages Resources | Numerous Voice Samples | TTS Data | Audio Data | Deep Learning (DL) Data
Nexdata provides multi-language, multi-timbre, multi-domain, and multi-style speech synthesis data collection services for Deep Learning Data.
Speech Recognition Data Collection Services | 100+ Languages Resources | Audio Data | Speech Recognition Data | Machine Learning (ML) Data
Nexdata is equipped with professional recording equipment, has a resource pool spanning 70+ countries and regions, and provides various types of speech recognition data collection services for Machine Learning (ML) Data.
Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data | Speech Recognition Data | Machine Learning (ML) Data
Off-the-shelf Scripted Monologues Speech Datasets cover 100+ languages. All the Machine Learning (ML) Data are collected from native speakers, with signed authorization agreements. The recording contents include economics, entertainment, news, oral, figure, letter, etc.
Real-world Casual Conversation and Monologue Speech Data | 20,000 Hours | Spontaneous Speech | Audio Data
Off-the-shelf 20,000 hours of Real-world Casual Conversation Speech data, covering 30+ languages. Spanning diverse domains such as self-media, conversations, live streams, and variety shows, the data reflects authentic, real-world interactions.
Re-ID Data | 600,000 IDs | CCTV Data | Computer Vision Data | Identity Data | AI Datasets
Off-the-shelf Re-ID data is collected from real surveillance scenes. The Identity Data is diverse, covering different age groups, time periods, shooting angles, human body orientations and postures, and clothing for different seasons.