English Speech Datasets
This speech data collection was planned, collected, annotated, and curated with natural language processing best practice in mind. Machine learning and speech recognition rely on unbiased, representative datasets, which is why Way With Words focuses on collecting speech data from the widest possible range of demographics, with as equal a gender split as possible.
Applications of NLP in AI demand that speech recognition training data be qualified, structured, and represented to suit machine learning in speech processing. We believe that we’ve collected useful baseline datasets for benchmarking improvements in the accuracy of existing speech-to-text models. Speech recognition training data can also, of course, be commissioned on a bespoke basis to suit any conventions, needs, domains, and languages that may be required.
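One common way to benchmark a speech-to-text model against reference transcripts is word error rate (WER). The sketch below is an illustration only: the source does not say which metric is used, the function name `word_error_rate` is hypothetical, and the sample transcripts are invented for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word and one dropped word out of five.
reference = "please confirm your id number"
hypothesis = "please confirm you id"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Running the same reference transcripts through a model before and after retraining, and comparing the two scores, gives a simple measure of whether accuracy improved. A lower score is better.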
Data Set Details
18 – 69
Number of speakers
South African English
Age Range Distribution
Recorders per age group
[18 – 29]: 27 Recorders
[30 – 40]: 28 Recorders
[50 – 69]: 8 Recorders
Gender Split Across Recorded Hours
Women: 28 Recorders
Men: 35 Recorders
Hours Collected Across Domains
Runtime per domain
Debt Collection: 12:22:25
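The figures listed above can be turned into percentage shares and decimal hours with a few lines of Python. This is a minimal sketch using only the numbers stated on this page. The helper names `shares` and `runtime_to_hours` are hypothetical, and the runtime is assumed to be in hours:minutes:seconds format.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each group's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


def runtime_to_hours(hms: str) -> float:
    """Convert an 'HH:MM:SS' runtime (assumed format) to decimal hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) / 3600


# Recorder counts as listed in the dataset details.
age_groups = {"18-29": 27, "30-40": 28, "50-69": 8}
gender = {"Women": 28, "Men": 35}

print(shares(age_groups))
print(shares(gender))
print(round(runtime_to_hours("12:22:25"), 2))  # Debt Collection domain
```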
[Charts: Gender Split of English Call Recorders Across Domains; Gender Split of English Call Recorders; Education Level Distribution of English Call Recorders; Geographical Distribution of English Call Recorders]
Frequently Asked Questions about our Speech Collection Services
How are your dataset recordings structured?
Our off-the-shelf dataset collections comprise unscripted, natural conversations conducted by call recorders who are recruited, trained, and approved to simulate real-world conversations in common domains. This means recordings and transcripts include routine security verifications such as ID, email, and phone number validation.
How do you recruit for Speech Collection datasets?
Our priority is to create datasets that are unbiased and cover as wide a range of demographics as possible. This is the first consideration when we begin the planning and recruitment process of any Speech Collection dataset project.
What kind of agreement is in place for the purchase of this Speech Collection dataset?
A Licence Agreement governs the sale and usage of this Speech Collection dataset. Our off-the-shelf options let clients test and benchmark before considering larger custom commitments that are better suited to their requirements and conventions.
Why consider Way With Words for Speech Collection datasets?
Way With Words has produced thousands of hours of bespoke Speech Collection datasets, which are unfortunately not available under Licence Agreement. This off-the-shelf dataset was created to demonstrate our capabilities, as we believe we can offer tremendous value on custom collections delivered exactly to client specification.