Aishell (SLR33): includes about 178 hours of Mandarin speech data recorded in a quiet indoor environment; Free ST Chinese Mandarin Corpus (SLR38): includes 102,600 utterances recorded on cellphones in quiet indoor environments; Primewords Chinese Corpus Set 1 (SLR47): includes about 100 hours of Mandarin speech data recorded by …
Nov 21, 2024 · MAGICDATA Mandarin Chinese Read Speech Corpus. This corpus, from Magic Data Technology Co., Ltd., contains 755 hours of speech data, mostly recorded on mobile devices. 1,080 speakers from different key regions of China were invited to take part in the recording. Sentence transcription accuracy is above 98%. The recordings were made in a quiet indoor …
Free Chinese speech datasets - Bilibili (哔哩哔哩)
Aug 22, 2024 · They include 新闻语料 (news corpus) 8GB, 社区互动-语料 (social interaction corpus) 3GB, 维基百科-语料 (Wikipedia corpus) 1.1GB, and 评论数据-语料 (comment data corpus) 2.3GB. The other large corpus I'm aware of is the Leiden Weibo Corpus (available for download), which "consists of 5,103,566 messages posted on Sina Weibo in ...
Dec 19, 2024 · 1. Free ST Chinese Mandarin Corpus. The corpus was recorded with a mobile phone in a quiet indoor environment. It has 855 speakers, each with 120 utterances. All utterances were carefully transcribed and checked to ensure transcription accuracy. The corpus contains audio files, transcriptions and metadata. 2. Primewords …
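As a quick consistency check, the per-speaker figures quoted for the Free ST Chinese Mandarin Corpus agree with the SLR38 total of 102,600 utterances, assuming every one of the 855 speakers contributed exactly 120 utterances:

$$855 \ \text{speakers} \times 120 \ \tfrac{\text{utterances}}{\text{speaker}} = 102{,}600 \ \text{utterances}$$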
Free ST Chinese Mandarin Corpus - LingLab
http://www.openslr.org/47/
The Chinese Web Corpus (zhTenTen) is a Chinese corpus made up of texts collected from the Internet. It belongs to the TenTen corpus family, a set of web corpora built using the same method with a target size of 10+ billion words. Sketch Engine currently provides access to TenTen corpora in more than 30 languages.
1. Free ST Chinese Mandarin Corpus. 1) Basic information: participants: 855. This corpus was recorded on mobile phones in a quiet indoor environment. It has 855 speakers, each with 120 utterances. All the …