High-quality, real-world speech datasets

Speech Collection Datasets

We create speech datasets, including transcripts, for machine learning purposes. Our service is used by technology teams looking to build new automatic speech recognition (ASR) models, or improve existing ones, using natural language processing (NLP) for select languages and various domains.

Each dataset can be created according to dialect, demographics, domain or any other required conditions.

Speech datasets for select languages and industries are available, and bespoke speech collection projects can be arranged on request.

 

Speech Collection for AI training

Why Use Way With Words

99%+ accurate transcripts

99%+ Accurate

We produce highly accurate transcripts.

We deliver on time

On Time

We complete your transcripts on time.

We are Data Compliant

Data Compliant

We are fully GDPR and DPA 2018 Compliant.

Client Support

Priority Support

We answer all your questions as a priority.

Speech Collection Process

How it works

Our speech collection process can be customised to suit your needs.

STEP 1

Request a speech dataset

Submit your speech dataset requirements using our form below. We review your request and send a proposed job and price plan for approval.

STEP 2

We create a speech dataset

On acceptance, we proceed with your job, which involves recording the speech you require and transcribing it.

STEP 3

Receive speech dataset

On completion, or at agreed intervals, we transfer your speech datasets (recordings with completed transcripts) to you.

Frequently Asked Questions about our Speech Collection Services

Who uses your Machine Transcription Polishing service?

Our Machine Transcription Polishing (MTP) service is ideal for large-volume, B2B orders where automatic speech recognition output needs proofreading under specific parameters so that a client (usually a technology company) can test or improve their existing automatic speech recognition systems. The service is not aimed at customers who want to improve auto-generated transcripts on a small scale. For this, we recommend our Standard Transcription service, which comes with 99%+ accuracy guaranteed.

Who edits my machine-generated transcripts?

Once we understand the client’s exact needs and finalise terms of service, we source, assess and contract our MTP transcribers specifically for the client’s job. We ensure the team meets the agreed polishing requirements. Our reputation is based on a very selective process to contract a highly professional MTP transcription team. We match the client’s job requirements with MTP transcribers best suited to ensure client needs are met. For all work, we also provide a dedicated manager to oversee work on a daily basis.

How do you ensure accuracy?

Once the first transcribers are in place, we start polishing the machine-generated transcripts. While doing so, we introduce a series of quality control steps into the workflow to ensure that all data processes for receiving, processing and returning client work fully follow the agreed parameters. We also monitor every process to ensure strict adherence to prevailing and client-specific data protection requirements.

Do you sign Service Level Agreements?

For ongoing work, we prefer to work with a Service Level Agreement (SLA). The SLA sets out a clear timetable that includes an initialisation period to set up the required team and logistics for client work. The SLA also covers terms and conditions related to the work and data privacy. If a client requires ongoing work over an agreed period, Way With Words usually also provides a dedicated MTP team with management oversight, recruitment, selection, assessment and training processes, and any other logistical assistance the bespoke requirement calls for.

Bespoke Speech Collection Projects Completed

Our Speech Collection service is used by clients to improve speech recognition and voice recognition technologies, services or platforms. Speech datasets are required to support and enable acoustic modelling and automated speech recognition.

Afrikaans Call Recording

Afrikaans

Scottish Accented English Speech Collection

English (Scottish Dialect)

United Kingdom Accented English Speech Collection

English (UK Dialect)

United Kingdom Accented English Speech Collection

English (UK Expats)

Irish Accented English Speech Collection

English (Irish Dialect)

Australian Accented English Speech Collection

English (Australian Dialect)

Datasets Available for Purchase

Coming in October 2022

Speech Collection

English (South African)

Hours available:
50 Hours

Specifications:
16kHz .wav recordings

Domains:
Online Retail, Insurance, Travel and Debt Collection

Product Description
Recordings plus annotated transcripts and metadata

Speech Collection

Afrikaans

Hours available:
50 Hours

Specifications:
16kHz .wav recordings

Domains:
Online Retail, Insurance, Travel and Debt Collection

Product Description
Recordings plus annotated transcripts and metadata

Speech Collection

seSotho

Hours available:
50 Hours

Specifications:
16kHz .wav recordings

Domains:
Online Retail, Insurance, Travel and Debt Collection

Product Description
Recordings plus annotated transcripts and metadata

Speech Collection

isiZulu

Hours available:
50 Hours

Specifications:
16kHz .wav recordings

Domains:
Online Retail, Insurance, Travel and Debt Collection

Product Description
Recordings plus annotated transcripts and metadata

Speech Collection

isiXhosa

Hours available:
50 Hours

Specifications:
16kHz .wav recordings

Domains:
Online Retail, Insurance, Travel and Debt Collection

Product Description
Recordings plus annotated transcripts and metadata

Speech Collection

Swahili (Coming Soon)

Hours available:
50 Hours

Specifications:
16kHz .wav recordings

Domains:
Online Retail, Insurance, Travel and Debt Collection

Product Description
Recordings plus annotated transcripts and metadata
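
Checking delivered recordings against the listed specification

The datasets above are listed as 16kHz .wav recordings. As a minimal sketch, assuming the files are standard PCM .wav files that Python's built-in wave module can read, a buyer could confirm the sample rate and duration of each recording as shown below. The function names and the file name in the usage note are hypothetical and are not part of any Way With Words delivery format.

```python
import wave


def check_wav_sample_rate(path, expected_hz=16000):
    """Return True if the WAV file at `path` has the expected sample rate.

    The 16000 Hz default mirrors the "16kHz .wav recordings" specification;
    pass a different value for other datasets.
    """
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == expected_hz


def wav_duration_seconds(path):
    """Return the length of the WAV file at `path` in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

For example, calling check_wav_sample_rate("recording_001.wav") on a hypothetical file would return True for a 16kHz recording, and summing wav_duration_seconds over all files gives the total audio length to compare against the hours listed.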