The good news is that alignment can be done automatically and, in our case, independently of the language.
How does this differ from automatic speech transcription? If you have only audio and want text (a verbatim transcript), you need automatic speech transcription to convert the audio into text. But there are cases where you already have the transcript. Some examples:
- You wrote a script for a lecture, talk, pitch, or news report, read it aloud, and recorded it. Now you want to make subtitles. Transcribing your speech again is a waste of time and money. Use the aligner.
- You had someone transcribe your audio and received only plain text, but later realized that subtitles would be useful. Just align the existing transcript to your audio using our aligner.
- You want to align an e-book with its audiobook.
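
To illustrate the subtitle use case, here is a minimal sketch of turning word-level timings into SRT subtitles. It assumes the aligner returns, for each word, its text plus start and end times in seconds; the `Word` type, the `words_to_srt` helper, and the `max_chars` and `max_gap` thresholds are hypothetical and are not part of our aligner's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one aligned word; real aligner output may differ.
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_chars=42, max_gap=0.8):
    """Group aligned words into subtitle cues and render them as SRT text.

    A new cue starts when the line would exceed max_chars characters or
    when the pause before the next word is longer than max_gap seconds.
    Both thresholds are illustrative assumptions.
    """
    cues, current = [], []
    for w in words:
        if current:
            line_len = len(" ".join(x.text for x in current)) + 1 + len(w.text)
            gap = w.start - current[-1].end
            if line_len > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, 1):
        text = " ".join(x.text for x in cue)
        blocks.append(
            f"{i}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Calling `words_to_srt` on a list of `Word` objects returns a string that can be written directly to a `.srt` file. Splitting on pauses and line length is only one simple policy; splitting on punctuation in the transcript would be another reasonable choice.
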
The biggest advantage of our approach is its independence from the language. It should work reasonably well for any language.
This technology also has some caveats. It expects the speech in the audio and the text to match fully. If part of your speech is untranscribed, or the text contains notes that are not spoken in the audio, the technology still does its best to align them, but time shifts can appear near these regions.
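
One possible way to spot such regions after alignment is to look for words with implausible durations, since a mismatch between text and audio often forces words to be squeezed or stretched. The sketch below is a heuristic under stated assumptions, not a feature of our aligner: it assumes word timings are available as `(text, start, end)` tuples in seconds, and the function name `flag_suspicious` and both thresholds are hypothetical.

```python
def flag_suspicious(words, min_dur=0.03, max_dur_per_char=0.5):
    """Return indices of words whose aligned duration looks implausible.

    words: list of (text, start, end) tuples, times in seconds.
    min_dur: words shorter than this are flagged as squeezed.
    max_dur_per_char: words longer than this many seconds per character
        are flagged as stretched.
    Both thresholds are illustrative guesses and should be tuned on real data.
    """
    flagged = []
    for i, (text, start, end) in enumerate(words):
        dur = end - start
        too_short = dur < min_dur
        too_long = dur > max_dur_per_char * max(len(text), 1)
        if too_short or too_long:
            flagged.append(i)
    return flagged
```

Flagged indices only mark places worth checking by ear; a long word duration can also be a genuine pause or a slowly spoken word.
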