Unlimited, automated, highly accurate captions.

Single Speaker Captioning

Built for volume, powered by automation, tuned for accuracy, at a fixed cost.

Our Single Speaker Captioning solution is a significant step forward for institutions, professionals, and other video users who record regularly and want the highest accuracy at a fixed monthly price. Unlike with other providers, users can now caption their own voice without limit by signing up for one of our annual or tiered licences. This can lead to significant cost savings for users who regularly record videos featuring themselves and need highly accurate captions or subtitles.

Automatic Captioning Process

How does it work?

We co-developed this service with our technology partner, Saigen. It provisions a unique automatic speech recognition (ASR) model for each speaker’s voice.

Once created, this model delivers high accuracy that improves over time and, in our testing, outperforms off-the-shelf models.

This service offers the best of both worlds: accurate captions at scale, at a fraction of the cost of a human or hybrid service. The automated transcript and caption files generated by the Single-Speaker Model can be used in a variety of ways, including as the basis for ancillary language services such as translation, which enables multilingual learning. This gives institutions an opportunity to deliver more value at scale.


Upfront Model Training

Using highly accurate human transcription, we train a speaker-specific AI model for every presenter or lecturer on your team.


Upload Unlimited Video

Once your models are trained, you can process as many videos as you like through our workflow system or connect your video recording systems directly via API.


Collect Captions

Once processed, transcripts and captions are available for collection via our secure platform or API.


15.4% accuracy gained on average over existing speech-to-text (STT) models.

When training speaker-specific models with accurate human transcription, we’ve realised accuracy gains of up to 33%, and 15.4% on average, over existing STT models.
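For readers who want to run this kind of comparison on their own recordings, a common way to score captions is word accuracy, defined as 1 minus the word error rate (WER) measured against a human reference transcript. The sketch below is illustrative only: the function names and sample sentences are assumptions, not part of our platform, and it reports the gain as a difference in percentage points, which may not be the exact definition behind the figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


def accuracy_gain_points(reference: str, baseline: str, speaker_model: str) -> float:
    """Accuracy difference in percentage points (speaker model minus baseline)."""
    return 100 * (word_accuracy(reference, speaker_model)
                  - word_accuracy(reference, baseline))


# Hypothetical sample data for illustration.
reference = "today we will look at the structure of the cell"
baseline = "today we will book at the structure of the sell"
speaker_model = "today we will look at the structure of the sell"

gain = accuracy_gain_points(reference, baseline, speaker_model)
print(f"Accuracy gain: {gain:.1f} percentage points")
```

In practice the score would be averaged over many minutes of audio rather than a single sentence, and text normalisation (punctuation, numerals, casing) should be applied identically to every transcript being compared.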

Key Benefits

Why choose our Single-Speaker Captioning Model?


Unlimited captioning on a per-speaker basis. Most competitors charge a rate per unit of time (per minute or per hour).


Competitor costs keep climbing as volumes increase, leading to unpredictable budgets or resource constraints.


Captions processed within minutes.
Most competitors that serve clients who require accurate captions must apply a hybrid or human-in-the-loop solution, which often leads to long delays before the final video is ready.


Most competitors offer off-the-shelf solutions that are restrictive, or custom solutions at huge expense.
We work with you from inception to execution to produce a solution to your specifications.
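The budgeting difference between time-based pricing and a fixed licence can be sketched with a simple break-even calculation. Every figure below is a hypothetical placeholder chosen for illustration; none of them are Way With Words or competitor prices.

```python
def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Monthly cost under time-based pricing: grows with every minute captioned."""
    if minutes < 0 or rate_per_minute < 0:
        raise ValueError("minutes and rate must be non-negative")
    return minutes * rate_per_minute


def break_even_minutes(flat_monthly_fee: float, rate_per_minute: float) -> float:
    """Monthly volume above which a flat licence is cheaper than per-minute billing."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return flat_monthly_fee / rate_per_minute


# Hypothetical figures, in an arbitrary currency.
FLAT_MONTHLY_FEE = 300.0
RATE_PER_MINUTE = 1.5

threshold = break_even_minutes(FLAT_MONTHLY_FEE, RATE_PER_MINUTE)
print(f"Break-even volume: {threshold:.0f} minutes per month")

for minutes in (100, 200, 600):
    metered = per_minute_cost(minutes, RATE_PER_MINUTE)
    cheaper = "flat licence" if metered > FLAT_MONTHLY_FEE else "per-minute or equal"
    print(f"{minutes} min: metered {metered:.2f} vs flat {FLAT_MONTHLY_FEE:.2f} ({cheaper})")
```

Under these assumed numbers, the flat fee stays the same however much a speaker records, while the metered cost depends entirely on that month's volume.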

“With accuracy built-in through our human and technology partnership, there is no other comparable service for price to value in the Captioning space at present.”

Graham Morrissey

COO, Way With Words

Faculty-wide Models

While the Single-Speaker Model offers the best value for university-wide captioning, alternative approaches, such as a general academic model or faculty-wide models, are also possible. Although these would be less accurate than a model trained for a single speaker, they would still deliver accuracy at scale to a higher degree than other available automated options.

For a more scalable solution that is still accurate, we highly recommend adopting the Single-Speaker Captioning model. Considerable overheads go into transcribing the training data and into training, deploying, hosting, and running unique speaker models that are available 24/7 via API. A volume-based licence arrangement is therefore necessary to ensure that demand can be met at a large scale.


Find out more about

Unlimited Captioning

Contact our team using the form or find our frequently asked questions below.

How accurate are your automatic captions?

We’ve seen a significant improvement in the accuracy of automatic captions created using models trained on single-speaker data comprising recordings made in a range of conditions. In most cases we’ve seen accuracy of 95% or higher, with our best models reaching 98% and beyond.

Are there any upfront costs?

No. All per-licence training costs for models based on each individual speaker are included in your monthly fee. A transfer cost for retraining individual speaker models will apply where licences need to be switched out.

Can we use this for our entire school/university?

Yes, depending on the tier of licence selected. Where individual licences are not applicable, our faculty-wide models offer a suitable alternative. The service is designed to enable institution-wide captioning. The more users you onboard, the bigger the saving.

How do I get started?

Contact us today by completing the form so that we may learn more about your organisation and discuss your unique requirements.