WAY WITH WORDS

Speech Collection

Speech Datasets

Bespoke Speech Recordings With Matching Transcripts

We provide a speech capture and transcription solution for machine learning purposes. Our service is used for the production of bespoke voice and text datasets for various domains. Each dataset can be created according to language, dialect, demography, speaker type, industry, subject matter or any other required condition.  

99%+ Accurate

Speech recordings and transcripts to specification.

Bespoke Data

Datasets collected remain your property.

Data Compliant

We are fully GDPR and DPA 2018 compliant.

Client Support

Available to answer questions as a priority.

Speech Dataset

Speech Recording With Transcript: Dataset Sample

RECORDING

The sample is in Afrikaans. The scenario is general conversation, recorded as a single channel at 8 kHz.

Recordings are made using a live recording process and system with select participants who are contracted according to the requirements of the speech collection job.

TRANSCRIPT

The transcript is in Microsoft Word .docx file format, with precise time-codes shown for the set voice annotations.

Other file formats can be discussed.

DATASET

The dataset comprises the audio recordings and their matching transcripts to the specifications requested.

SPEECH COLLECTION FAQs

WHICH LANGUAGES DO YOU OFFER FOR SPEECH COLLECTION?

We provide speech collection for almost all English-language dialects. Other languages are considered on a case-by-case basis.

WHAT VOLUME CAN I ORDER?

Requests for speech collection usually range from 100 to 5,000 audio hours of recording and transcription per dataset.

Large-volume orders require an initialisation stage to scale up to the job requirements. This is discussed once we have your estimates.

HOW DO YOU RECORD SPEECH?

Our speech capture unit records first-person speech according to the requirements of the client. This includes the channels required, the sampling rate (kHz) requested, the human subject matter, and the target language and dialects.

Our transcribers then transcribe the collected recordings.

HOW DO I RECEIVE MY SPEECH COLLECTION DATASET?

You will usually receive the recordings in WAV format (or another format if requested), together with full transcripts of each recording in MS Word .docx (or another format if requested).
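On delivery, a client may want to confirm that each recording matches the requested specification. The sketch below is a minimal example using Python's standard `wave` module; the function name `check_wav` is hypothetical, and the default values (8 kHz, single channel) are assumptions taken from the Afrikaans sample described on this page, not a fixed delivery standard.

```python
import wave


def check_wav(path, expected_rate_hz=8000, expected_channels=1):
    """Compare a WAV file's sample rate and channel count with the requested spec.

    The defaults mirror the 8 kHz single-channel sample on this page;
    pass the values agreed for your own dataset instead.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate_ok": rate == expected_rate_hz,
        "channels_ok": channels == expected_channels,
        "duration_seconds": frames / rate,
    }
```

Summing `duration_seconds` across all files gives the delivered audio time, which can be compared with the number of audio hours ordered.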

WHO RECORDS AND TRANSCRIBES MY SPEECH DATASET?

We have an in-house team to set up, source and collect speech recordings to order.

We then process the speech recordings through a second transcription team specifically identified for the job.

Our reputation is based on a very selective process to ensure that we source, vet, assess and contract a highly professional team for both recording and transcribing the client's speech dataset. We match the client's job requirements with the best-suited transcribers to ensure the bespoke speech recordings are transcribed with 99%+ accuracy.

We set very demanding quality control standards for each client project. We also vet our transcribers frequently to confirm their identity.

DO YOU SIGN ANY SERVICE LEVEL AGREEMENT (SLA)?

For a short-term, once-off project, no SLA is required.

For ongoing work, we prefer to work with an SLA. The SLA sets out a clear timetable that includes an initialisation period to set up the required team and logistics for your work. The SLA also covers terms and conditions related to the work and data privacy.

If a client requires ongoing work over an agreed period, Way With Words usually provides a dedicated speech capture and transcription team with management oversight, recruitment, selection, assessment, training processes and any other logistical assistance to aid the client's bespoke requirement.

DO YOU SELL MY SPEECH COLLECTION DATASETS TO ANYONE ELSE?

No. All your files remain your property and are not shared or sold on to anyone.

We process highly confidential matters on a daily basis for our global clients. We take numerous measures to ensure client confidentiality, including:

  • Client-tailored Non-Disclosure Agreements.
  • Employment contracts for Way With Words’ staff, management and contractors which include confidentiality clauses.
  • Password protection and encryption.
  • Secure server for uploading.

Where required, we also ensure our contractors and staff sign the relevant data protection or Official Secrets Act documents of the client's country.

IS WAY WITH WORDS DATA COMPLIANT?

We have, on many occasions, carried out extremely sensitive work for various government agencies, financial institutions and market research companies. We fulfil a number of data compliance requirements, including those prescribed by the GDPR and the DPA 2018. For more information please see the footer below.

WHO ARE YOUR TRANSCRIBERS?

We receive thousands of transcriber applications monthly through our dedicated website job portal. Our transcribers are chosen after a rigorous process of testing and training on their abilities to research, understand and comply with our very strict operational, quality control and knowledge requirements.

HOW ARE YOUR TRANSCRIBERS CONTRACTED?

Our transcribers are required to sign strict Confidentiality Agreements with us. Other conditions of work are considered on a case-by-case basis with the client if required.

HOW MUCH DOES IT COST?

The total price will reflect a price per:

  • Audio minute of recording.
  • Audio minute of transcribing.

A final costing is prepared and presented once we have all your job requirements.

Price per audio minute will be influenced by the total volume, the format requirements and the duration of the project.
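As a rough illustration of the per-minute pricing described above, the sketch below multiplies the total audio minutes by the sum of a recording rate and a transcription rate. The function name and the rates in the example are hypothetical placeholders, not Way With Words prices; actual rates depend on volume, format requirements and project duration, and are quoted per job.

```python
def estimate_cost(audio_hours, recording_rate_per_min, transcription_rate_per_min):
    """Estimate a dataset's total price from two per-audio-minute rates.

    All rates passed in are assumptions for illustration only.
    """
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    audio_minutes = audio_hours * 60
    return audio_minutes * (recording_rate_per_min + transcription_rate_per_min)


# Hypothetical example: 100 audio hours at placeholder rates of
# 1.50 (recording) and 1.00 (transcription) per audio minute.
total = estimate_cost(100, 1.50, 1.00)
print(total)  # 15000.0
```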

Complete our Speech Collection Request Form below to request pricing.

How It Works

Speech Collection Process

REQUEST A SPEECH DATASET

Submit your speech dataset requirements using our form below. We review your request and send a proposed job and price plan for approval.

WE CREATE A SPEECH DATASET

On acceptance, we proceed with your job, which involves recording the required speech and transcribing it.

RECEIVE SPEECH DATASET

On completion, or at agreed intervals, we transfer your speech datasets (recordings with completed transcripts) to you.

Rates & Pricing

Complete Our Form Below To Request A Quote

SPEECH COLLECTION FORM

Use Cases

Speech Collection Use Cases

Speech Datasets

Our Speech Collection service is used by clients to improve speech recognition and voice recognition technologies, services or platforms. Speech datasets are required to support and enable various machine learning processes that improve language and dialect capabilities.

Sample Language Dataset - Afrikaans

Speech Collection dataset for the Afrikaans language.

The client required recordings of first-language speakers of Afrikaans on an 8 kHz channel with complete transcripts of each speaker and channel. Topics were identified and speakers were asked to discuss each of these within specific time limits. Themes for discussion included politics, sport and various social topics.

Sample Dialect Dataset - Welsh

Speech Collection dataset for Welsh.

The client required a series of corrections to provide highly accurate transcripts of first-language Welsh speakers pre-recorded on an unclear 8 kHz channel. Speakers were indistinct and no specific subject matter was provided. The client received the required datasets with much-improved accuracy.