Friday, October 21, 2016

Synchronizing text and audio - time alignment

We have implemented automatic, language-independent text-to-audio alignment. What is it all about? In short, you upload your audio and text, and you get subtitles. See the pictures below.

What is time alignment? Alignment is the process of attaching time stamps to a transcript or text according to the audio. Text usually comes without timing: a sentence, a paragraph, or a page of text carries no timing information. But when you have audio that goes with the text, meaning the audio contains the speech, you may want to add timing information to the text so that you know when a particular word was spoken. You can think of the process as making subtitles (time-aligned sentences) from your text and audio.
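
To illustrate what word-level timing makes possible, here is a minimal sketch that groups aligned words into sentence-level cues. The `AlignedWord` structure, the `words_to_cues` function, and the sample timings are hypothetical and chosen only for illustration; this post does not describe the aligner's actual output format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    """One word with hypothetical start/end times in seconds."""
    text: str
    start: float
    end: float


def words_to_cues(words: List[AlignedWord]) -> List[Tuple[float, float, str]]:
    """Group aligned words into (start, end, sentence) cues.

    A cue is closed whenever a word ends with sentence-final punctuation.
    """
    cues: List[Tuple[float, float, str]] = []
    current: List[AlignedWord] = []
    for word in words:
        current.append(word)
        if word.text.endswith((".", "!", "?")):
            cues.append((current[0].start, current[-1].end,
                         " ".join(w.text for w in current)))
            current = []
    if current:  # trailing words without final punctuation form a last cue
        cues.append((current[0].start, current[-1].end,
                     " ".join(w.text for w in current)))
    return cues


# Illustrative timings only; not real aligner output.
sample = [
    AlignedWord("Hello", 0.0, 0.4),
    AlignedWord("world.", 0.5, 0.9),
    AlignedWord("How", 1.5, 1.7),
    AlignedWord("are", 1.7, 1.9),
    AlignedWord("you?", 1.9, 2.3),
]
cues = words_to_cues(sample)
```

Each cue takes its start time from its first word and its end time from its last word, which is the basic idea behind a time-aligned sentence.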

The good news is that alignment can be done automatically and, in our case, independently of the language.

How does this differ from automatic speech transcription? If you have only audio and want to get text (a verbatim transcript), you have to use automatic speech transcription to convert the audio into text. But there are cases when you already have the transcript. Some examples:
  • You wrote a script for a lecture, talk, pitch, or news item. You read the text aloud and recorded it. Now you want to make subtitles. It is a waste of time and money to transcribe your speech again. Use the aligner.
  • You asked someone to transcribe your audio and received only plain text, but later you found that subtitles would be useful. Just align the existing transcript to your audio using our aligner.
  • It can also be useful for aligning an e-book to an audiobook.
Here you just need to submit your audio and text, and you will get aligned text (like subtitles). You can also use our editor to edit the text, timings, and so on, or work through our API.

The biggest advantage of our approach is its independence from the language. It should work reasonably well for any language.

This technology also has some caveats. It expects the speech in the audio and the text to match fully. If part of your speech is untranscribed, or the text contains notes that are not spoken in the audio, the technology still does its best to align them. In such cases, time shifts can appear near these regions.

Thursday, August 25, 2016

Capitalization and punctuation

When you process your recordings with our Speech-To-Text technology for English, you get a transcript containing capital letters and basic punctuation. This makes the subtitles easier to read. Sure, there is still a lot of room for improvement. However, we believe that you will like this new feature. Give it a try at

Wednesday, October 14, 2015

SpokenData runs on HTTPS

Since last week, the entire SpokenData website has been using HTTPS, not just the authentication and payment pages as before. All HTTP requests are automatically redirected to HTTPS. We try to keep security tight.
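
As a rough illustration of what such a redirect does, the sketch below computes the HTTPS target for an HTTP URL. The function name `https_redirect_target` and the `example.com` addresses are hypothetical; this is not our server configuration, and it assumes default ports.

```python
from urllib.parse import urlsplit


def https_redirect_target(url: str) -> str:
    """Return the HTTPS equivalent of an HTTP URL; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    # Swap only the scheme; host, path, query, and fragment are preserved.
    return parts._replace(scheme="https").geturl()


target = https_redirect_target("http://example.com/recordings?page=2")
```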

Tuesday, September 29, 2015

Times Digital: QuickQuote

QuickQuote is a web application that helps users select quotes from a video and embed them in an article. The SpokenData API is used to generate the video transcription. This project is maintained by Times Digital and Pietro Passarelli.
You can also get to know more about the tool at

Thursday, June 18, 2015

Automatic Speech Transcription in English, Russian, Chinese, Spanish, Czech, Slovak, ...

SpokenData automatically transcribes recordings in quite a number of languages, and more are on the way. Just upload your recordings and select the language. Do you have audio from a specific domain, such as TV news or lectures? Then you can also select from specifically trained recognizers that should generate a better transcription. If you have lots of data to transcribe, contact us and we will train a recognizer to get the best transcription of your data.

Wednesday, February 18, 2015

Online speech-to-text conversion - now with Russian language support!

We have added Russian language support to our site, so you can now transcribe recordings of Russian speech! Just log in to SpokenData, upload your recording, and get an automatic written transcript of it completely free! You can also point us to your recording with a link to YouTube, Vimeo, or any other online hosting service. Our software downloads the data and converts the audio into text in a matter of minutes. When the transcription is finished, you will receive an email notification. After that, you can edit the text using our online editor.

For software developers, we provide an easy-to-use API.

Still not registered? Complete the quick registration here and discover how your audio can be transcribed in less than a minute!

Saturday, February 7, 2015

Russian voice to text online service.


We have added support for a new language: Russian. So, you can now process any of your recordings in Russian. Just log in to SpokenData, submit your data, and get an automatic text transcript for free in a few minutes. Another option is to give us the URL of a recording on YouTube, Vimeo, or another online service. We download the data and convert it into text quickly. You are notified by email when the conversion of audio into text is done. You can also edit the transcript yourself in our web editor later.
If you are a developer, feel free to integrate our API. It's easy.

Do not have an account yet? Just register here and you can process your data in 1 minute!

Saturday, January 31, 2015

American Spanish speech to text for free!


We are happy to announce that we now support American Spanish. So, if you have any voice recordings, you can process them in SpokenData and get an automatic text transcript for free. As our service runs in the cloud, it is very easy to get the text. Just take your audio or video files in Spanish and upload them.
The second option is to give us the URL of a recording on YouTube, Vimeo, or another online service. We download the data and convert it into text quickly. You are notified by email when the conversion is done. You can also edit the transcript yourself in our web editor later. If you are a developer, feel free to integrate our API. It's easy.

Do not have an account yet? Just register here and you can process your data in 1 minute!

Friday, January 23, 2015

Download recording video, audio and subtitles

SpokenData users can now download the video of a processed recording in MP4, its audio in MP3, and its subtitles in a variety of formats. The files are accessible through the download menu or the SpokenData API. Subtitle formats include:
  • SRT - SubRip text file format
  • TRS - used in Transcriber
  • WebVTT - The Web Video Text Tracks Format
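
SRT and WebVTT are closely related, which a short conversion sketch can show: WebVTT starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds in cue timings. The function name `srt_to_webvtt` and the sample cues are hypothetical, and the sketch handles only plain cues (no styling or positioning); it is not how SpokenData generates its exports.

```python
import re

# Matches an SRT time stamp such as 00:00:01,500
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert plain SRT subtitles to WebVTT text."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are changed: comma becomes period.
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    # SRT cue numbers are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


# Illustrative subtitles only.
srt_sample = (
    "1\n"
    "00:00:00,000 --> 00:00:00,900\n"
    "Hello world.\n"
    "\n"
    "2\n"
    "00:00:01,500 --> 00:00:02,300\n"
    "How are you?\n"
)
vtt = srt_to_webvtt(srt_sample)
```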

Monday, January 19, 2015

Set deadline for your transcription

Do you create or edit transcriptions yourself, or do you have a team of annotators? SpokenData has a handy new feature that can help you finish the transcription process on time. From now on, you can set a deadline for each processed recording. Just click on the menu button and select the Set deadline item.

Then, you can sort your recordings by deadline and see which ones should be finished soon. The deadline information can appear in 3 different colors:

  • red: deadline has already passed
  • orange: deadline will pass within 24 hours
  • black: deadline will pass in more than 24 hours

The deadline can always be changed or removed. This feature can also help your annotators, who will see how much time they have left to complete their jobs.
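
The color rule above can be sketched in a few lines. The function name `deadline_color` is hypothetical, and the handling of the exact boundaries (a deadline equal to the current time, or exactly 24 hours away) is an assumption, since the post does not specify it.

```python
from datetime import datetime, timedelta


def deadline_color(deadline: datetime, now: datetime) -> str:
    """Map a deadline to the display color described in the post."""
    remaining = deadline - now
    if remaining < timedelta(0):
        return "red"      # deadline has already passed
    if remaining <= timedelta(hours=24):
        return "orange"   # deadline will pass within 24 hours (boundary assumed inclusive)
    return "black"        # deadline is more than 24 hours away


now = datetime(2015, 1, 19, 12, 0)
color = deadline_color(now + timedelta(hours=3), now)
```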

Tuesday, December 9, 2014

Vimeo is supported

As some of our users host their recordings on Vimeo, we now support processing of Vimeo files. Simply enter a Vimeo URL into the Media File URL input box.
In general, users can enter:
  • a direct URL to a media file (mp3, mp4, mpg, avi, 3gp, mkv, wav and many others)
  • a YouTube URL
  • a Vimeo URL
Besides that, you can also upload a multimedia file using the upload form or the SpokenData API.
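
For developers who want to check a URL before submitting it, the three kinds of input can be told apart roughly as follows. This is a minimal sketch, not the way SpokenData itself recognizes URLs: the function name `classify_media_url`, the host checks, and the extension list (taken from the examples above, which are not exhaustive) are all assumptions.

```python
from urllib.parse import urlparse

# Extensions named in the post; the service accepts many others.
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".mpg", ".avi", ".3gp", ".mkv", ".wav")


def classify_media_url(url: str) -> str:
    """Return 'youtube', 'vimeo', 'direct', or 'unknown' for a media URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return "youtube"
    if host == "vimeo.com" or host.endswith(".vimeo.com"):
        return "vimeo"
    if parsed.path.lower().endswith(MEDIA_EXTENSIONS):
        return "direct"
    return "unknown"


kind = classify_media_url("https://vimeo.com/12345")
```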

Tuesday, November 11, 2014

SpokenData API - Search in Speech

The SpokenData API has a new function that lets users search in recording transcriptions. This means that you can quickly get a list of captions matching the search query, along with their start and end times, caption content, and speaker identity. The search can be performed either across all of a user's recordings or within a list of selected recordings.

An example of a basic SpokenData search API call can be:

It simply searches for occurrences of "student" in all recording transcriptions of the DEMO account.

The returned XML shows the time spent parsing the search query and performing the search. As the number of results can be very high, the search API call supports paging. By default, the maximum number of results per page is 10. The output XML contains 2 types of results, recordings and captions, and each type is paged separately.
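
The paging arithmetic can be sketched on the client side as follows. This does not call the SpokenData API: the helper names `page_count` and `get_page` are hypothetical, and 1-based page numbering is an assumption. Only the default of 10 results per page comes from the description above.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")

DEFAULT_PAGE_SIZE = 10  # default maximum number of results per page


def page_count(total_results: int, page_size: int = DEFAULT_PAGE_SIZE) -> int:
    """Number of pages needed to show all results."""
    return math.ceil(total_results / page_size)


def get_page(results: Sequence[T], page: int,
             page_size: int = DEFAULT_PAGE_SIZE) -> List[T]:
    """Return the results on the given page (pages assumed to start at 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return list(results[start:start + page_size])


# Recordings and captions are paged separately, so each gets its own count.
caption_pages = page_count(25)    # 25 matching captions
recording_pages = page_count(4)   # found in 4 recordings
last_captions = get_page(list(range(25)), page=3)
```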