Let's use any conference as an example. To record the conference and to have plenty of videos on YouTube is fine, but it starts to be messy. You can find useful following reasons for transcribing talks.
- Some people do not understand English very well. Reading subtitles can help them understand.
- You need to market your conference to attract people. Videos show the quality of your conference to prospects. Transcribing the video to text increases your SEO. More people will find you.
- Large collections of videos can be searchable with a difficulty for particular information. Time synchronous speech transcript can help you search in speech quickly even in a large collection of videos.
So the use of automatic voice to text technology is a logical step to reduce the need of human resources. Especially for cases 2) and 3). Here you do not care about a few errors, because the transcript is primarily for machines - search engines.
The huge advantage of our service here is the ability of automatic speech recognizer adaptation on the target domain - your conference. Usually, every technical conference has proceedings which are full of content words, abbreviations, technical terms etc. These words are important (within you conference) but rare in general speech. So standard recognizers trained on general speech can miss them easily and the transcript is useless for you.
To give you a real use case, SuperLectures - a conference video service - uses SpokenData.com automatic transcriptions in the above mentioned way. They provide us with proceedings so that we could adapt our recognizer. Then we return them textual transcription of their audio/video data.