Welcome to YobiYoba, a speech recognition service for transcribing audio and video recordings.

Transcription made easy

Automatic conversion of speech into text saves time and effort.
Upload audio or video files and specify the language; we'll convert them into time-coded transcripts that you can edit.

A 4-STEP PROCESS

Get your transcripts in 4 easy steps

Upload your file(s)

You can upload audio or video files in any format from any source

Specify the language and process the file(s)

You can provide textual information about your data to help the automatic processing

Verify & edit the transcripts

You can verify the transcripts (synchronized with the audio) and modify them

Download transcripts

You can choose from a wide selection of transcript formats (SRT, XML, RTF, PDF, ...)

Main features

The YobiYoba™ service identifies the audio segments containing speech, recognizes the language being spoken if it has not been specified in advance, and converts the speech segments to time-coded text. The result is a fully annotated document containing speech and non-speech segments, speaker labels, words with time codes, confidence scores, and punctuation marks.

Pricing

A simple pay as you go pricing scheme

From 0.01 € per minute (excl. tax)

No minimum fee per file
Transcription credits never expire
You only pay for the amount of transcribed speech, not for the file duration
No extra fee to manage, edit, or convert your transcripts
Export your transcriptions in various text and subtitling formats (PDF, DOC, RTF, CSV, SRT, VTT, ...)
Credit your account with PayPal, Stripe, a credit card, or a coupon
Pricing is degressive: the per-minute rate falls from 0.08 € to 0.01 € (excl. tax) as the amount of purchased time increases.

All prices include EU VAT

Speech-to-text conversion transforms spoken words into written text. The YobiYoba voice-to-text conversion process has 3 steps. First, our software identifies the audio segments containing speech; then it recognizes the language being spoken if it is not known in advance; finally, it converts the speech segments to time-coded text. The transcription result is an XML document, which we convert on demand to various text and subtitling formats, including PDF, RTF, CSV, SRT, and VTT.
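
To illustrate what converting time-coded text to a subtitling format involves, here is a minimal Python sketch that renders a few time-coded segments as SRT. It is not YobiYoba's converter: the `segments` list, its (start, end, text) layout, and the function names `srt_timestamp` and `to_srt` are assumptions made for this example, and the service's XML schema is not reproduced here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as an SRT document.

    Each cue is a 1-based index, a time range, and the cue text,
    with a blank line separating consecutive cues.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical time-coded segments (times in seconds), for illustration only.
segments = [
    (0.0, 2.5, "Welcome to the meeting."),
    (3.2, 5.75, "Let's get started."),
]

print(to_srt(segments))
```

Because the time codes are kept with the text, the same segments can be written out in other subtitling formats by changing only the rendering step.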

It is important to understand that, like any other pattern recognition technology, speech transcription cannot be error-free. We therefore provide an editing tool so that you can manually correct the automatic transcripts. You can also help the transcription process by providing a list of uncommon words that are specific to your data (such as proper names). To get even better results, you can provide plain text that is closely related to the audio data.