High-quality, real-world speech datasets
Speech Datasets For Sale
We create speech datasets, including transcripts, for machine learning purposes. Technology companies use our service to build new automatic speech recognition (ASR) models, or improve existing ones, using natural language processing (NLP) for select languages and various domains.
Each dataset can be created according to dialect, demographics, domain or any other required conditions.
Speech datasets for select languages and industries are available for purchase, and bespoke speech collection projects are available on request.
Why Use Way With Words
99%+ Accurate
We produce highly accurate transcripts.
On Time
We complete your transcripts on time.
Data Compliant
We are fully GDPR and DPA 2018 Compliant.
Priority Support
We answer all your questions as a priority.
Speech Collection Process
How it works
Our speech collection process can be customised to suit your needs.
STEP 1
Request a speech dataset
Submit your speech dataset requirements using our form below. We review your request and send a proposed job and price plan for approval.
STEP 2
We create a speech dataset
On acceptance, we proceed with your job, which involves recording the speech you require and transcribing it.
STEP 3
Receive speech dataset
On completion, or at agreed intervals, we transfer your speech datasets (recordings with completed transcripts) to you.
Datasets Available for Purchase
Coming in October 2022

English (South African)
Hours available:
50 Hours
Specifications:
16kHz .wav recordings
Domains:
Online Retail, Insurance, Travel and Debt Collection
Product Description
Recordings plus annotated transcripts and metadata

Afrikaans
Hours available:
50 Hours
Specifications:
16kHz .wav recordings
Domains:
Online Retail, Insurance, Travel and Debt Collection
Product Description
Recordings plus annotated transcripts and metadata

seSotho
Hours available:
50 Hours
Specifications:
16kHz .wav recordings
Domains:
Online Retail, Insurance, Travel and Debt Collection
Product Description
Recordings plus annotated transcripts and metadata

isiZulu
Hours available:
50 Hours
Specifications:
16kHz .wav recordings
Domains:
Online Retail, Insurance, Travel and Debt Collection
Product Description
Recordings plus annotated transcripts and metadata

isiXhosa
Hours available:
50 Hours
Specifications:
16kHz .wav recordings
Domains:
Online Retail, Insurance, Travel and Debt Collection
Product Description
Recordings plus annotated transcripts and metadata

Swahili (Coming Soon)
Hours available:
50 Hours
Specifications:
16kHz .wav recordings
Domains:
Online Retail, Insurance, Travel and Debt Collection
Product Description
Recordings plus annotated transcripts and metadata
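For buyers who want to check a delivery against the listed specification, the following is a minimal sketch, not part of the Way With Words service. It assumes the recordings arrive as uncompressed PCM .wav files somewhere under a single folder (the folder layout and the function names are hypothetical), and it reports the total hours of audio plus any files that are not sampled at 16 kHz.

```python
import wave
from pathlib import Path

# Taken from the listed specification: 16kHz .wav recordings.
EXPECTED_RATE_HZ = 16_000


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def summarise(folder, expected_rate=EXPECTED_RATE_HZ):
    """Return (total_hours, off_spec_files) for all .wav files under folder.

    The folder layout is an assumption; adjust the glob pattern to match
    however the recordings are actually delivered.
    """
    total_seconds = 0.0
    off_spec = []
    for path in sorted(Path(folder).rglob("*.wav")):
        rate, seconds = wav_info(path)
        total_seconds += seconds
        if rate != expected_rate:
            off_spec.append(path.name)
    return total_seconds / 3600, off_spec
```

Calling `summarise("delivery/")` on a hypothetical delivery folder returns the hours of audio found and a list of file names whose sample rate differs from 16 kHz. The standard-library `wave` module reads only uncompressed PCM audio, so other encodings would need a different reader.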
Frequently Asked Questions about our Speech Collection Services
Who uses your Machine Transcription Polishing service?
Our Machine Transcription Polishing (MTP) service is ideal for large-volume B2B orders where automatic speech recognition output needs proofreading under specific parameters so that a client (usually a technology company) can test or improve their existing automatic speech recognition systems. The service is not aimed at customers who want to improve auto-generated transcripts on a small scale. For that, we recommend our Standard Transcription service, which guarantees 99%+ accuracy.
Who edits my machine-generated transcripts?
Once we understand the client’s exact needs and finalise terms of service, we source, assess and contract our MTP transcribers specifically for the client’s job. We ensure the team meets the agreed polishing requirements. Our reputation is based on a very selective process to contract a highly professional MTP transcription team. We match the client’s job requirements with MTP transcribers best suited to ensure client needs are met. For all work, we also provide a dedicated manager to oversee work on a daily basis.
How do you ensure accuracy?
Once the first transcribers are in place, we start polishing the machine-generated transcripts. While doing so, we introduce a series of quality control steps into the workflow to ensure that all processes for receiving, processing and returning client work fully comply with the agreed parameters. We also monitor all processes to ensure strict adherence to prevailing and client-specific data protection requirements.
Do you sign Service Level Agreements?
For ongoing work, we prefer to work with an SLA. The SLA sets out a clear timetable that includes an initialisation period to set up the required team and logistics for client work. The SLA also covers terms and conditions related to the work and data privacy. If a client requires ongoing work, over an agreed period, Way With Words also usually provides a dedicated MTP team with management oversight, recruitment, selection, assessment, training processes and any other logistical assistance to aid the bespoke requirement.