Nexdata | Multilingual Unsupervised Speech Data | 1 Million Hours | Spontaneous Speech | LLM | Pre-training | Large Language Model (LLM) Data

An off-the-shelf unsupervised speech dataset of 1 million hours, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc., 100,000 hours each). The content consists of dialogues or monologues in 28 common domains, such as daily vlogs, travel, podcasts, technology, and beauty.

Description

1. Specifications

  • Format: 16 kHz, 16 bit, WAV, mono channel
  • Content category: Dialogue or monologue in several common domains, such as daily vlogs, travel, podcasts, technology, beauty, etc.
  • Language: English (USA, UK, Canada, Australia, India, Philippines, etc.), French, German, Japanese, Arabic (MSA, Gulf, Levantine, Egyptian accents, etc.), etc.
  • Recording condition: Mixed (indoor, public place, entertainment, etc.)

2. About Nexdata

Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 1 million hours of speech data, and 800 TB of annotated imagery data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit https://www.nexdata.ai/datasets/speechrecog?source=Datarade
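The audio format given in the specifications (16 kHz, 16 bit, WAV, mono) can be checked on a delivered sample before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav_format` and the constants are hypothetical and are not part of any Nexdata tooling, and the sketch assumes the files are plain PCM WAV.

```python
import wave

# Expected values taken from the listed format: 16 kHz, 16 bit, mono.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the listed format.

    Hypothetical helper for illustration; an empty list means the file matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("sample.wav")` on a conforming file returns an empty list. Returning a list of problems rather than raising makes it easy to scan a whole directory and report every mismatch at once.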

Country Coverage

(47 countries)
Africa (6)
Asia (8)
Australia (2)
Europe (16)
North America (7)
South America (8)

Data Categories

  • Machine Learning (ML) Data
  • Deep Learning (DL) Data
  • Audio Data
  • Large Language Model (LLM) Data
  • Speech Data

Pricing

Starts at: $20K
One-off purchase: $20K
Monthly License: Not available
Yearly License: Not available
Usage-based: Not available

Volumes

Hours: 1M
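For storage planning, the listed volume can be turned into a rough size estimate. The sketch below assumes uncompressed PCM at the listed format (16 kHz, 16 bit, mono) and ignores WAV headers and any metadata files. The listing itself does not state a delivered size, so the result is an estimate only, and the function name `pcm_bytes` is hypothetical.

```python
# Format and volume as stated in the listing.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2      # 16 bit
CHANNELS = 1              # mono
TOTAL_HOURS = 1_000_000   # 1M hours


def pcm_bytes(hours, rate_hz=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Raw PCM payload size in bytes for the given duration (headers ignored)."""
    seconds = hours * 3600
    return seconds * rate_hz * bytes_per_sample * channels


per_hour = pcm_bytes(1)           # 115,200,000 bytes, about 115.2 MB per hour
total = pcm_bytes(TOTAL_HOURS)    # about 1.152e14 bytes

print(f"{per_hour / 1e6:.1f} MB per hour")
print(f"{total / 1e12:.1f} TB for {TOTAL_HOURS:,} hours")
```

Under these assumptions the full dataset comes to roughly 115 TB (decimal units) of raw audio.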

Does this product fit your data needs?

Get in touch with our team to start unlocking your data solutions.
