tag:blogger.com,1999:blog-19701197194887337682024-03-05T14:36:12.407-08:00SpokenData BlogSpokenDatahttp://www.blogger.com/profile/06146005599895441185noreply@blogger.comBlogger34125tag:blogger.com,1999:blog-1970119719488733768.post-40547345898212363672018-06-11T07:06:00.000-07:002018-06-11T07:38:19.596-07:00Switched to MediaElement.jsThe <a href="https://www.spokendata.com/">SpokenData</a> transcription editor has several new features that should come in handy. We have added an indicator of the time elapsed since the last save, slightly refreshed the graphics, and changed the video player. We switched from <a href="https://www.jwplayer.com/">JW Player</a> to <a href="https://www.mediaelementjs.com/">MediaElement.js</a>, which gives us more control over customization. Synchronization between the video player and the waveform component during playback is now more precise. So give our updated editor a try; the editing experience should be smoother than before.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggvinNbnw0-niFTtLr0bp9Z-6kW3ldPa1SIdbKFP1SWvilk42ZMr6ofveedUDDNhkfiSo5Jg3913t6nNbIsBnK3-40oxA8haiYnUDOsOtWqMoeG5LnLVIoz4L45t74ga1AU6Ed1AGS_qk/s1600/mediaelement-spokendata.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="680" data-original-width="890" height="305" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggvinNbnw0-niFTtLr0bp9Z-6kW3ldPa1SIdbKFP1SWvilk42ZMr6ofveedUDDNhkfiSo5Jg3913t6nNbIsBnK3-40oxA8haiYnUDOsOtWqMoeG5LnLVIoz4L45t74ga1AU6Ed1AGS_qk/s400/mediaelement-spokendata.png" width="400" /></a></div>
<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-82729051645372947182016-10-21T02:51:00.000-07:002016-10-21T02:53:57.936-07:00Synchronizing text and audio - time alignmentWe have implemented <b>automatic, language-independent text-to-audio alignment</b>. What is it all about? In short, you <b>upload your audio and text and you get subtitles</b>. See the pictures below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwL2VhrlVbdFzvvZUqLvzgD6zDo8MQK8PLIKA7yLoDD1rMjtOFCoyv7he0-BzYlN3jkcEq1AYT-JAksK_tX96kzKzh9QL9jmHdti5mk4ISGQd_q0f6268IdswkPdCPV0seQu9vK62pt4U/s1600/alignment01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwL2VhrlVbdFzvvZUqLvzgD6zDo8MQK8PLIKA7yLoDD1rMjtOFCoyv7he0-BzYlN3jkcEq1AYT-JAksK_tX96kzKzh9QL9jmHdti5mk4ISGQd_q0f6268IdswkPdCPV0seQu9vK62pt4U/s1600/alignment01.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8XGJB4Vn8blJreewVW8ZtMG5w5D7P86z0g-V95xSlHarBpbTFXTADKxdOvE1-6IBAPJfZ7pSfhg7q1Y35NpOLgYwy5fL9GVpKTq2I2EFGfHusIoSXhamJGFShJX8StEYrktUubCxHgG4/s1600/alignment02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8XGJB4Vn8blJreewVW8ZtMG5w5D7P86z0g-V95xSlHarBpbTFXTADKxdOvE1-6IBAPJfZ7pSfhg7q1Y35NpOLgYwy5fL9GVpKTq2I2EFGfHusIoSXhamJGFShJX8StEYrktUubCxHgG4/s1600/alignment02.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
What is time alignment? Alignment is the process of attaching time stamps to a transcript or text according to the audio. Usually the text carries no timing: a sentence, a paragraph, or a page of text has no timing information. But when you have audio attached to this text -- that is, audio containing the speech -- you may want to add timing information to the text, so that you know when a particular word was spoken in the audio. You can think of the process as making subtitles (time-aligned sentences) from your text and audio.<br />
<br />
The good news is that alignment can be done automatically and, in our case, even language-independently.<br />
<br />
What is the difference from automatic speech transcription? If you have only audio and want text (a transcript, a verbatim), you have to use automatic speech transcription to convert the audio into text. But there are cases where you already have the transcript. Some examples:<br />
<ul>
<li>You wrote a script for a lecture, talk, pitch, or news report, read it aloud, and got a recording. Now you want to make subtitles. It is a waste of time and money to transcribe your speech again -- use the aligner.</li>
<li>You had someone transcribe your audio and received only plain text, but later found subtitles would be useful. Just align the earlier transcript to your audio using our aligner.</li>
<li>It can also be useful for aligning an e-book with its audio-book.</li>
</ul>
Just upload your audio and text to SpokenData.com and you will get the aligned text (like subtitles). Plus, you can use our editor to <a href="http://blog.spokendata.com/2014/09/how-to-adjust-subtitle-timings.html">edit the text, timings, etc.</a>, or use our <a href="http://blog.spokendata.com/2014/06/spokendata-api-file-upload.html">API</a>.<br />
<br />
The biggest advantage of our approach is its language independence: it should work reasonably well for any language.<br />
<br />
The technology does have a caveat: it expects the speech in the audio and the text to match fully. If part of your speech is untranscribed, or the text contains notes that are not spoken in the audio, the aligner still does its best, but time shifts can appear near those regions.<br />
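The aligned output is essentially a list of (start, end, text) triples, which maps directly onto subtitle formats such as SRT. A minimal sketch of that mapping (our own illustration, not SpokenData's implementation):

```python
def fmt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: iterable of (start_sec, end_sec, text) triples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(63.25, 65.40, "Hello, this is the first caption")]))
```
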
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-53191792846090092982016-08-25T02:41:00.004-07:002016-08-25T02:41:55.710-07:00Capitalization and punctuationWhen processing your recordings with our English Speech-To-Text technology, you get a transcript containing capital letters and basic punctuation, which makes the subtitles easier to read. There is still plenty of room for improvement, but we believe you will like this new feature. Give it a try at <a href="https://spokendata.com/">https://spokendata.com</a>.<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip2MPMaDSG_zdnq70NFWAR4WuKAGlKA7Zjy7-mVtTfWd9-xCtV0kSzUcnDx5TQ7B23TCZDMYXCplhPSU8GUWPXXQFC3521AaBrvCKMQjJfDG508PfzRNyRfQvAT0lhEQAhDxA60bfj6BA/s1600/capitalization-punctuation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip2MPMaDSG_zdnq70NFWAR4WuKAGlKA7Zjy7-mVtTfWd9-xCtV0kSzUcnDx5TQ7B23TCZDMYXCplhPSU8GUWPXXQFC3521AaBrvCKMQjJfDG508PfzRNyRfQvAT0lhEQAhDxA60bfj6BA/s1600/capitalization-punctuation.png" /></a>SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-32430725476215743912015-10-14T05:47:00.000-07:002015-10-15T01:48:35.071-07:00SpokenData runs on HTTPSSince last week, the entire <a href="https://spokendata.com/">SpokenData</a> website has been served over HTTPS, not just the authentication and payment pages as before. All HTTP requests are automatically redirected to HTTPS. We try to keep security tight.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://spokendata.com/"><img border="0" height="47" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnZj3ZF5gsXDS062fppLABXSolgilsOncEiS51YhB1ylxP5Bw6eAokQ3k_A0Zs3I1hIxkyLsy4H3c8HwPMDS-XWSXATuFkXuCJjSp0obUEoAB4J7QqK7cMZYxIagA_BH44MR3aWxLQ128/s320/https-spokendata.jpg" width="320" /></a></div>
<br />SpokenDatahttp://www.blogger.com/profile/06146005599895441185noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-90882908241796876912015-09-29T07:32:00.002-07:002015-09-30T01:12:29.915-07:00Times Digital: QuickQuote<a href="http://times.github.io/quickQuote/">QuickQuote</a> is a web application that helps users select quotes from a video and embed them in an article. The <a href="http://spokendata.com/api">SpokenData API</a> is used to generate the video transcription. The project is maintained by <a href="https://github.com/times">Times Digital</a> and <a href="https://github.com/pietrop">Pietro Passarelli</a>.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYrD-AmT7nwL5lJNq5b-On9kMx1cMPKaS6D6hlWXDs1tCGltgYWsRJ4WZqtQzd6OhSjh-uOnMG6TtRvKemnw-LZ0N1e5zDThb_r8GK9h-RMqdF1SrqYDxMby-7LT5XQH9xQTjqlV8MTu0/s1600/quickquote.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYrD-AmT7nwL5lJNq5b-On9kMx1cMPKaS6D6hlWXDs1tCGltgYWsRJ4WZqtQzd6OhSjh-uOnMG6TtRvKemnw-LZ0N1e5zDThb_r8GK9h-RMqdF1SrqYDxMby-7LT5XQH9xQTjqlV8MTu0/s400/quickquote.png" width="400" /></a></div>
You can read more about the tool at <a href="http://www.niemanlab.org/2015/09/a-new-tool-from-the-times-of-london-lets-you-easily-detect-and-capture-quotes-from-a-video/">http://www.niemanlab.org/2015/09/a-new-tool-from-the-times-of-london-lets-you-easily-detect-and-capture-quotes-from-a-video/</a>.SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-2537098863897470472015-06-18T07:18:00.003-07:002015-06-18T07:20:19.213-07:00Automatic Speech Transcription in English, Russian, Chinese, Spanish, Czech, Slovak, ...<a href="http://spokendata.com/">SpokenData</a> automatically transcribes recordings in quite a number of languages, and more are on the way. Just upload your recordings and select the language. Do you have a specific audio domain, such as TV news or lectures? Then you can also select from specifically trained recognizers that should generate a better transcription. If you have lots of data to transcribe, contact us and we will train a recognizer to get the best transcription of your data.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://spokendata.com/"><img border="0" height="250" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZR29jqHI4fGgfp9F0tAadkfyvvvXg-BPZY9dcBQZs9wb5sstW64IKbjsJBp1RB7UKcZk9B2lrDxUmFEIGCqzyKYPTHe6mwfWsvBl8xA0gPilNvbV4-aFOTqfckFScTQF-ZaluIf_fvcQ/s400/spokendata-languages.png" width="400" /></a></div>
SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-79862272664083914922015-02-18T01:35:00.001-08:002015-02-18T01:35:21.862-08:00Online speech-to-text transcription - now with Russian support!<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" height="101" width="200" /></a></div>
We have added Russian language support to our site, and you can now transcribe recordings of Russian speech! Just log in to <a href="http://spokendata.com/">SpokenData</a>, upload your recording, and get an automatic written transcript completely free! You can also point us to your recording with a link to YouTube, Vimeo, or any other online hosting service. Our software downloads the data and converts the audio into text in a matter of minutes. When the transcription is finished, you will be notified by email. After that, you can edit the text in our online editor.<br /><br />For software developers, we provide an easy-to-use <a href="http://spokendata.com/api-for-developers">API</a>.<br /><br />Not registered on SpokenData.com yet? Complete our quick registration <a href="http://spokendata.com/register">here</a> and discover transcription of your audio in less than a minute!SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-57806267234473748072015-02-07T12:08:00.001-08:002015-02-10T01:20:20.577-08:00Russian voice to text online service.<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" height="101" width="200" /></a>Hi,<br />
<br />
We have added support for a new language: Russian. You can now process any of your recordings in Russian. Just log in to <a href="http://spokendata.com/">SpokenData</a>, submit your data, and get an automatic text transcript for free in a few minutes. Another option is to provide us with the URL of your data on YouTube, Vimeo, or another online service. We download the data and convert it into text quickly. You are notified by email when the conversion of audio into text is done. You can also edit the transcript yourself in our web editor later.<br />
If you are a developer, feel free to integrate <a href="http://spokendata.com/api-for-developers">our API</a>. It's easy.<br />
<br />
Don't have a SpokenData.com account yet? Just <a href="http://spokendata.com/register">register here</a> and you can be processing your data in a minute! SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-58895755864675073152015-01-31T10:20:00.002-08:002015-02-10T01:21:31.135-08:00American Spanish speech to text for free!<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" height="101" width="200" /></a>Hello,<br />
<br />
We are happy to announce that we now support American Spanish. So, if you have any voice recordings, you can process them in <a href="http://spokendata.com/">SpokenData</a> and get an automatic text transcript for free. As our service runs in the cloud, it is very easy to get the text: just take your audio or video files in Spanish and upload them.<br />
The second option is to provide us with the URL of your data on YouTube, Vimeo, or another online service. We download the data and convert it into text quickly. You are notified by email when the conversion is done. You can also edit the transcript yourself in our web editor later. If you are a developer, feel free to integrate <a href="http://spokendata.com/api-for-developers">our API</a>. It's easy.<br />
<br />
Don't have a SpokenData.com account yet? Just <a href="http://spokendata.com/register">register here</a> and you can be processing your data in a minute! SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-58101021523568909032015-01-23T00:26:00.000-08:002015-01-31T10:21:44.458-08:00Download recording video, audio and subtitles<span style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQuu8YHZVy3KrsXOCZpqAttc-gjLw48hCMPzDxlaWanyy78iv729_0a5V1wfiwjTnYUkE0KL996Y9CDXijOB06b_qhNiGzpbU3J2FSUSGktuRWBy802ETyP6mKiJZX6NbXAeIl4ot9Rxk/s1600/spokendata-download-menu.png" height="320" width="204" /></span><a href="http://spokendata.com/">SpokenData</a> users can now download a processed recording's video in MP4, its audio in MP3, and its subtitles in a variety of formats. The files are accessible through the download menu or the SpokenData API.<br />
<blockquote class="tr_bq">
<ul>
<li><a href="http://en.wikipedia.org/wiki/SubRip#SubRip_text_file_format">SRT</a> - SubRip text file format</li>
<li>TRS - used in <a href="http://trans.sourceforge.net/en/presentation.php">Transcriber</a></li>
<li><a href="http://dev.w3.org/html5/webvtt/">WebVTT</a> - The Web Video Text Tracks Format</li>
</ul>
</blockquote>
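For a sense of how the text-based formats differ: WebVTT adds a "WEBVTT" header and uses a dot instead of a comma as the millisecond separator in timestamps. A rough sketch of a naive SRT-to-WebVTT conversion (our illustration; real converters also handle styling and cue settings):

```python
def srt_to_vtt(srt_text):
    """Naive SRT -> WebVTT conversion: prepend the WEBVTT header and
    swap the millisecond separator in timestamp lines from ',' to '.'."""
    lines = []
    for line in srt_text.splitlines():
        if "-->" in line:  # timestamp lines contain the cue arrow
            line = line.replace(",", ".")
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines)

print(srt_to_vtt("1\n00:00:01,000 --> 00:00:02,500\nHello"))
```
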
<br />
<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-75462396908694696732015-01-19T04:25:00.001-08:002015-01-31T10:22:13.738-08:00Set deadline for your transcriptionDo you create or edit transcriptions yourself, or have a team of annotators? <a href="http://spokendata.com/">SpokenData</a> has a handy new feature that can help you finish the transcription process in time. From now on, you can set a deadline for each processed recording. Just click on the menu button and select the <b>Set deadline</b> item.<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidjSbnYGrTFVIB6RAjD0Eo4dJbx4I4qEEXdrerFU7MJakAZ1owdKvGgEiDQLwzZIzNweIH_cYcOnM-H022FpPf4tsvYVKIv6Mz9ainwNGDQRPeRThuyma1zgG3Y4zdbG8dqTUU2_KfDYI/s1600/deadline.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidjSbnYGrTFVIB6RAjD0Eo4dJbx4I4qEEXdrerFU7MJakAZ1owdKvGgEiDQLwzZIzNweIH_cYcOnM-H022FpPf4tsvYVKIv6Mz9ainwNGDQRPeRThuyma1zgG3Y4zdbG8dqTUU2_KfDYI/s1600/deadline.png" height="204" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
Then you can sort your recordings by deadline and see which recordings should be finished soon. The deadline information can appear in three different colors:<br />
<br />
<ul>
<li><b>red</b>: deadline has already passed</li>
<li><b>orange</b>: deadline will pass within 24 hours</li>
<li><b>black</b>: deadline will pass in more than 24 hours</li>
</ul>
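The three rules amount to a simple function of the time remaining; a small sketch of the logic (the function name is our own, not part of SpokenData):

```python
from datetime import datetime, timedelta

def deadline_color(deadline, now=None):
    """red: already passed; orange: within 24 hours; black: later."""
    now = now or datetime.now()
    if deadline <= now:
        return "red"
    if deadline - now <= timedelta(hours=24):
        return "orange"
    return "black"

# Fixed 'now' so the example is deterministic:
now = datetime(2015, 1, 19, 12, 0)
print(deadline_color(datetime(2015, 1, 19, 9, 0), now))   # a deadline that has passed
print(deadline_color(datetime(2015, 1, 20, 9, 0), now))   # a deadline 21 hours away
```
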
<br />
<br />
The deadline can always be changed or removed. This feature also helps your annotators, who will see how much time they have left to complete their jobs.SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-20803808514750991692014-12-09T06:20:00.001-08:002015-01-31T10:23:14.734-08:00Vimeo is supported<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo79TpFe3LCaoocp4b6Gs-XWpfKL1mfQyEuFF4JhPG-CYjeqY8hD-jVTYYXadyD48LuNj6ZaKsoBZlKOpzqQoaOePdg94ZePcObrOB_gN9QI6sFUeFqCX7f06Qt2SqqKXIWrcVuxfg2TE/s1600/vimeo-logo.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo79TpFe3LCaoocp4b6Gs-XWpfKL1mfQyEuFF4JhPG-CYjeqY8hD-jVTYYXadyD48LuNj6ZaKsoBZlKOpzqQoaOePdg94ZePcObrOB_gN9QI6sFUeFqCX7f06Qt2SqqKXIWrcVuxfg2TE/s1600/vimeo-logo.png" height="60" width="200" /></a>As some of our users host their recordings on Vimeo, <b>we now support processing of Vimeo files</b>. Simply enter a Vimeo URL into the Media File URL input box. <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmxK9NuX6hmG5gT-Phd3THjbABjlhCbXRUe6iB88fXd1X_U6IUl9LsP_otieKnZR1hNy8bhSfqWkGT3aJeGS-4EaATquxKz0VwyYAB0wloZz4hLeX8lMpZVnI5z7xL3Kn7tu4IoTCombc/s1600/vimeo-media-file-url.png" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmxK9NuX6hmG5gT-Phd3THjbABjlhCbXRUe6iB88fXd1X_U6IUl9LsP_otieKnZR1hNy8bhSfqWkGT3aJeGS-4EaATquxKz0VwyYAB0wloZz4hLeX8lMpZVnI5z7xL3Kn7tu4IoTCombc/s1600/vimeo-media-file-url.png" height="38" width="320" /></a><br />
In general, users can enter:<br />
<ul>
<li>a direct URL to a media file (mp3, mp4, mpg, avi, 3gp, mkv, wav, and many others)</li>
<li>a YouTube URL</li>
<li>a Vimeo URL</li>
</ul>
<div>
Besides that, you can also upload a multimedia file using the upload form or <a href="http://spokendata.com/api-for-developers">SpokenData API</a>.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJfmR2t5nqVJdoH_Vf9fBNP1KlLheUGJxQnzXK1CHzQ8CHoFz0rWQTPRpvmrJ_E2AyF16R0Fwd7fQtE3rM920rlMxBr_1WNe4ad2Tkju_xVM1ZmOChe79YmgvZT4HuWbtsyua5RK5DeLc/s1600/vimeo-url.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><br /></a></div>
SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-5467551096577946602014-11-11T05:09:00.000-08:002015-01-31T10:24:47.933-08:00SpokenData API - Search in Speech<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBYk9-V_jJlAD_Wnp7JPHLQjL_xSZzmbi03xD23RxfrbiAfEQR3zvgFen04VY-6SqT2mSnHVTlrQ-GujKvJB81ezEAtVKuvmDtK8CC8WZ1HxRmSF6H-d95j47xjGGuSvVAYxGcD9oWGME/s1600/spokendata-search.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="116" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBYk9-V_jJlAD_Wnp7JPHLQjL_xSZzmbi03xD23RxfrbiAfEQR3zvgFen04VY-6SqT2mSnHVTlrQ-GujKvJB81ezEAtVKuvmDtK8CC8WZ1HxRmSF6H-d95j47xjGGuSvVAYxGcD9oWGME/s200/spokendata-search.png" width="200" /></a><a href="http://spokendata.com/api-for-developers">SpokenData API</a> has a new function that <b>enables users to search in recording transcriptions</b>. This means you can quickly get a list of captions matching a search query, together with their start and end times, caption content, and speaker identity. The search can be performed either in all of a user's recordings or in a selected list of recordings.<br />
<br />
An example of a basic SpokenData search <a href="http://spokendata.com/api-for-developers">API</a> call can be:<br />
<a href="http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/search?q=student">http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/search?q=student</a><br />
<br />
It searches for occurrences of <i>student</i> in all recording transcriptions of the DEMO account.<br />
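Since the API is plain HTTP GET, a call like the one above can be built programmatically. A small Python sketch (the URL pattern and the public DEMO key are taken from the example above; the helper function name is our own):

```python
from urllib.parse import urlencode

API_BASE = "http://spokendata.com/api"

def search_url(user_id, api_key, query, **params):
    """Build a SpokenData search API URL, e.g. .../search?q=student."""
    params["q"] = query
    return f"{API_BASE}/{user_id}/{api_key}/search?{urlencode(params)}"

# The public DEMO credentials from the example above:
url = search_url(18, "br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp", "student")
print(url)
```

Fetching the URL (with urllib or requests) returns the XML described below.
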
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfF-l7fRNXpP3Q9axy4DJ0qxgys9l91Qmjaz6CHlAJ3t1YL-OaFaexWG5gjEmV16PafDlbGQyIDJc2g6aoc_smvDrP4PvsDIBGiYHRdg7ywlc-fh5N6eMEdP6thNXnHaRjAGRRxlsaEgc/s1600/search-output.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfF-l7fRNXpP3Q9axy4DJ0qxgys9l91Qmjaz6CHlAJ3t1YL-OaFaexWG5gjEmV16PafDlbGQyIDJc2g6aoc_smvDrP4PvsDIBGiYHRdg7ywlc-fh5N6eMEdP6thNXnHaRjAGRRxlsaEgc/s1600/search-output.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
The returned XML shows the elapsed time for parsing the search query and for performing the search. Since the number of results can be very high, the search API call <b>supports paging</b>. By default, the maximum number of results per page is set to 10. The output XML contains two types of results - recordings and captions - each with its own paging.</div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<a name='more'></a><br />
<div class="separator" style="clear: both; text-align: left;">
<b>Recordings</b></div>
<div class="separator" style="clear: both; text-align: left;">
The value of <i>number_of_occurrences</i> is the number of the recording's captions that match the search query. Recordings are sorted by number of occurrences (highest first).</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Recordings paging:</div>
<ul>
<li>recordingPageSize = 10 by default</li>
<li>recordingPageNumber = 0 by default</li>
</ul>
<div>
<b>Captions</b></div>
<div>
Every caption has several values: the start and end time of the caption, the speaker identity, and the caption content. Captions are ordered by their start time (lowest first).<br />
<br />
Captions paging:</div>
<div>
<ul>
<li>captionPageSize = 10 by default</li>
<li>captionPageNumber = 0 by default</li>
</ul>
</div>
Caption results can be omitted by adding a parameter <i>recordingListOnly=1 </i>to the search API call.<br />
<br />
<b>Search in selected recordings</b><br />
The optional parameter <i>recordingId</i> specifies which recordings are searched. Its value is a comma-delimited list of recording IDs (e.g.: recordingId=845,446).<br />
<br />
An example of an advanced SpokenData API search call:<br />
<a href="http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/search?q=what&recordingId=846&captionPageSize=2&captionPageNumber=1">http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/search?q=what&recordingId=846&captionPageSize=2&captionPageNumber=1</a><br />
<br />
This API call returns the third and fourth caption results (the caption page size is 2 and the page number is 1), and the search is performed only in the transcription of the recording with id=846.<br />
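The paging parameters behave like a standard zero-based page index: page number N of size S covers results N*S through N*S + S - 1. The arithmetic, sketched in Python:

```python
def page_slice(results, page_size=10, page_number=0):
    """Return the results on a zero-based page, mirroring the
    captionPageSize / captionPageNumber parameters (defaults 10 and 0)."""
    start = page_number * page_size
    return results[start:start + page_size]

captions = ["cap1", "cap2", "cap3", "cap4", "cap5"]
# Page size 2, page number 1 -> the third and fourth results:
print(page_slice(captions, page_size=2, page_number=1))
```
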
<br />
More information on search and other API calls is shown after signing in and enabling the SpokenData API.SpokenDatahttp://www.blogger.com/profile/06146005599895441185noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-75564574047231395112014-10-20T08:27:00.001-07:002015-01-31T10:25:40.408-08:00How to show the speaker segmentation<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv2RYcelGUxZdZwVYmmZ_sL7s8soSrU09FF-XW-ZOu4MtkS0K9b60IdtDy85ui9qHle1Qvl0hvKRTdlXdvo7hD9r4oxuKvXPgPrjPlHrWtsK-xB2pFI-yqgbzqLkPHj3j9l2DdCpCW35k/s1600/show-speaker-segmentation.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv2RYcelGUxZdZwVYmmZ_sL7s8soSrU09FF-XW-ZOu4MtkS0K9b60IdtDy85ui9qHle1Qvl0hvKRTdlXdvo7hD9r4oxuKvXPgPrjPlHrWtsK-xB2pFI-yqgbzqLkPHj3j9l2DdCpCW35k/s1600/show-speaker-segmentation.jpg" height="164" width="320" /></a>In the SpokenData subtitles editor, we changed the rule for displaying the speaker segmentation. From now on, it is hidden by default, except for recordings processed directly by the Speaker segmentation method. However, the speaker segmentation can quickly be shown by ticking the <b>Show speaker segmentation</b> checkbox. Currently, the SpokenData subtitles editor does not support editing the generated speaker identity; when that is implemented, we will consider changing this rule.SpokenDatahttp://www.blogger.com/profile/06146005599895441185noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-9911108086639990552014-09-29T07:51:00.000-07:002014-09-29T07:54:23.690-07:00How to adjust subtitle timingsEach subtitle has a start and an end time, which specify when and for how long the caption appears over the video. When editing subtitles in our editor, you can adjust these timings either with your mouse or directly from the keyboard. First, you need to enter <b>editing mode</b>.
You can do this in several ways:<br />
<div>
<ul>
<li><b>double click on the subtitle caption</b></li>
<li><b>double click on the audio waveform segment</b></li>
<li><b>CTRL + click on the subtitle caption</b></li>
<li><b>CTRL + I</b> - to edit the currently played caption</li>
<li><b>TAB</b> or <b>SHIFT+TAB </b>- to edit the next or previous caption</li>
</ul>
<div>
Now that you are in <b>editing mode</b>, you can change the subtitle caption and its start and end time. The currently edited subtitle is marked with a light-blue background color.</div>
<div>
<ul>
<li><b>ALT + Left</b> - shift caption start time by -0.1s</li>
<li><b>ALT + Right</b> - shift caption start time by +0.1s</li>
<li><b>ALT + Up</b> - shift caption end time by +0.1s</li>
<li><b>ALT + Down</b> - shift caption end time by -0.1s</li>
</ul>
</div>
<div>
<br /></div>
</div>
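Each shortcut nudges one caption boundary by 0.1 s. The effect can be sketched as follows (a toy model for illustration, not the editor's actual code):

```python
STEP = 0.1  # seconds, the increment used by the ALT shortcuts

def shift(caption, key):
    """caption: dict with 'start' and 'end' in seconds; returns the
    adjusted timings for the given ALT shortcut."""
    start, end = caption["start"], caption["end"]
    if key == "ALT+Left":
        start -= STEP
    elif key == "ALT+Right":
        start += STEP
    elif key == "ALT+Up":
        end += STEP
    elif key == "ALT+Down":
        end -= STEP
    return {"start": round(start, 1), "end": round(end, 1)}

print(shift({"start": 63.2, "end": 65.4}, "ALT+Right"))
```
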
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj16Qdqw3DuehbZ4E58QX9LCkXvIaH2cbV1Lm-ktM63MusqZFI4Dzy1Ey9S3UWfUIg3du3XMigNkc-lK_WXT8nA39drcVMcPz4kONxB1-7utIp9XxXLiq4QnTEC0UCd4CSu6Ct0HYf1vxE/s1600/subtitles-editor-edit-mode.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj16Qdqw3DuehbZ4E58QX9LCkXvIaH2cbV1Lm-ktM63MusqZFI4Dzy1Ey9S3UWfUIg3du3XMigNkc-lK_WXT8nA39drcVMcPz4kONxB1-7utIp9XxXLiq4QnTEC0UCd4CSu6Ct0HYf1vxE/s1600/subtitles-editor-edit-mode.png" height="275" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
On the left in the editor, there is an audio waveform with segments that represent the duration of each subtitle. You can easily adjust a subtitle's duration by holding down the left mouse button and dragging the segment borders. The audio waveform can significantly help you define a segment's beginning and end, because moments with no speech or sound in the audio look like a straight line.</div>
SpokenDatahttp://www.blogger.com/profile/06146005599895441185noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-18415005321533732882014-09-12T06:38:00.001-07:002015-01-31T10:26:51.255-08:00Interactive waveform with Outwave.js<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJMpw86edZSoyvIy9wR6v5rgxydUqJ1qZLdBgPU5WcYyL78-d7xi8NNiu5jVAGMFRR2kbfof40hiKkZlLyRcr2kjco3xC55Lc2lMbEiMHz8MmgilZ6mU7FYVjyH_P4qwIGjXY2XgCd6BY/s1600/outwave-js.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJMpw86edZSoyvIy9wR6v5rgxydUqJ1qZLdBgPU5WcYyL78-d7xi8NNiu5jVAGMFRR2kbfof40hiKkZlLyRcr2kjco3xC55Lc2lMbEiMHz8MmgilZ6mU7FYVjyH_P4qwIGjXY2XgCd6BY/s1600/outwave-js.png" height="191" width="320" /></a><a href="https://github.com/vdot/outwave.js">Outwave.js</a> is a handy library that renders an audio waveform in a web browser; its development was also supported by SpokenData. On top of that, the library has a great extension for displaying annotation segments directly on the waveform: segments can easily be added, deleted, merged, or split. In short, it is exactly the library we needed.<br />
<br />
Therefore, we are happy to announce that Outwave.js has been integrated into the SpokenData subtitles editor. From now on, our users can more easily define speech and non-speech segments just by dragging the segment boundaries on the waveform with the mouse button held down.<br />
<br />
We are certain you will benefit from the Outwave library as much as we do. The fastest way to try the new feature is to start the <a href="http://spokendata.com/demo/start">SpokenData demo</a> and edit the subtitles of any recording.SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-25162946973636057272014-08-04T06:51:00.001-07:002015-01-31T10:27:45.553-08:00New annotation.xml file structure<div class="collapsible" id="collapsible148" style="font-size: 13px;">
<div class="expanded">
<div class="line">
<span style="font-family: inherit;">We have modified the annotation XML file structure. It is now much easier to parse. You can get this file through the <a href="http://spokendata.com/api-for-developers">SpokenData API</a>. See this short example:</span><br />
<div style="background: rgb(238, 238, 238); overflow: auto; padding: 10px;">
<pre style="font-family: 'Courier New', Courier, monospace; margin: 0;"><segment>
  <start>63.25</start>
  <end>65.40</end>
  <speaker>A</speaker>
  <text>Hello, this is the first caption</text>
</segment>
<segment>
  <start>72.92</start>
  <end>74.49</end>
  <speaker>B</speaker>
  <text>and here comes the second one</text>
</segment></pre>
</div>
<br />
<span style="font-family: inherit;">The start and end tags give the times, in seconds, at which the subtitle appears and disappears; we store the values with a precision of 2 decimal places. The speaker tag identifies the person who is speaking and can hold any alphanumeric value. The text tag stores the subtitle content.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">You can see a live example from the SpokenData demo here:</span><br />
<span style="font-family: inherit; font-size: x-small;"><a href="http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/846/annotation.xml">http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/846/annotation.xml</a></span>
</div>
</div>
</div>
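Parsing this structure takes only a few lines in most languages. Here is a minimal Python sketch; the root element name ("data") is an assumption for illustration, so adjust it to the real file's root if it differs.

```python
import xml.etree.ElementTree as ET

# Sample in the structure shown above. The root element name ("data")
# is an assumption for illustration; adjust to the real file's root.
XML_TEXT = """<data>
  <segment>
    <start>63.25</start>
    <end>65.40</end>
    <speaker>A</speaker>
    <text>Hello, this is the first caption</text>
  </segment>
  <segment>
    <start>72.92</start>
    <end>74.49</end>
    <speaker>B</speaker>
    <text>and here comes the second one</text>
  </segment>
</data>"""

def parse_segments(xml_text):
    """Return (start, end, speaker, text) for every <segment> element."""
    root = ET.fromstring(xml_text)
    return [(float(seg.findtext("start")),
             float(seg.findtext("end")),
             seg.findtext("speaker"),
             seg.findtext("text"))
            for seg in root.iter("segment")]

segments = parse_segments(XML_TEXT)  # -> two segments, the first starting at 63.25
```

The same `findtext` pattern works unchanged when you feed it the raw bytes fetched from the API URL above.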
SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-5031981486773016012014-07-14T06:37:00.001-07:002014-07-14T06:42:42.444-07:00Use case: How to transcribe conference video recordings and make subtitles for them?One handy use of automatic speech recognition technology - speech-to-text - is the transcription of conference talks. There are plenty of conferences, and many of them are recorded and published on the conference homepage or on YouTube, for example.<br />
Let's take any conference as an example. Recording the talks and having plenty of videos on YouTube is fine, but it quickly becomes messy. Here are a few reasons why transcribing the talks is useful:<br />
<ol>
<li>Some people do not understand English very well. Reading subtitles can help them understand.</li>
<li>You need to market your conference to attract people. Videos show prospective attendees the quality of your conference, and transcribing the videos to text improves your SEO, so more people will find you.</li>
<li>Searching a large collection of videos for particular information can be difficult. A time-synchronous speech transcript lets you search the speech quickly, even in a large collection of videos.</li>
</ol>
Using human labor for subtitling videos makes sense, because people do not like watching subtitles with errors - and automatic voice-to-text can make errors. On the other hand, transcribing all recordings from a conference several days long can be enormously expensive in human resources.<br />
So using automatic voice-to-text technology is a logical step to reduce the need for human labor, especially for cases 2 and 3. There you do not care about a few errors, because the transcript is primarily for machines - search engines.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZPTp6zjvR6o1kHSspisNWenzMTF81vQDUfktjl21n9B7RRDvulq9C_cMCGPSy5939teV9_0LCweEdpN_Z6u-wzFx0wqHkJtNeoMavaI298hAy2hWFyQyHjg3ta8pyUxsocnbC_9Qsc9o/s1600/general.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZPTp6zjvR6o1kHSspisNWenzMTF81vQDUfktjl21n9B7RRDvulq9C_cMCGPSy5939teV9_0LCweEdpN_Z6u-wzFx0wqHkJtNeoMavaI298hAy2hWFyQyHjg3ta8pyUxsocnbC_9Qsc9o/s1600/general.jpg" height="136" width="320" /></a></div>
<br />
The huge advantage of our service here is the ability to adapt the automatic speech recognizer to the target domain - your conference. Usually, every technical conference has proceedings full of content words, abbreviations, technical terms, etc. These words are important within your conference but rare in general speech, so standard recognizers trained on general speech can easily miss them, and the transcript then becomes useless for you.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjckF__eGscNvbQRyylVuATSMtSsD_7YdslYuFVMbtcoNMykUnDOp-jcORPZ52vd2PXg1nJFX6HSGevCTJqxp_pI-xKrQDJD-IqWI4IfGr_I1b7HUYSJ57Da33Emy519mwjn_f6pf44wIk/s1600/adapted.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjckF__eGscNvbQRyylVuATSMtSsD_7YdslYuFVMbtcoNMykUnDOp-jcORPZ52vd2PXg1nJFX6HSGevCTJqxp_pI-xKrQDJD-IqWI4IfGr_I1b7HUYSJ57Da33Emy519mwjn_f6pf44wIk/s1600/adapted.jpg" height="136" width="320" /></a></div>
<br />
To give you a real use case, SuperLectures - a <a href="http://www.superlectures.com/">conference video service</a> - uses <a href="http://spokendata.com/">SpokenData.com</a> automatic transcriptions in the way described above. They provide us with their proceedings so that we can adapt our recognizer, and we then return the textual transcription of their audio/video data.<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-30153538020836025392014-06-26T05:23:00.000-07:002014-06-26T06:47:05.176-07:00SpokenData API - File upload<b>SpokenData API</b> allows developers to easily add new recordings to their media library. You can either pass a media file URL, or you can upload the whole file using the HTTP PUT method, which was recently implemented in SpokenData. This post demonstrates the second option - uploading a file through a SpokenData API call.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS7RD9ePABvdn8VGeIdpen7k8K2zYrj202wIRmdX_Wzp_y35Lg-DcZ2Coj3vtATwmrKwNRy7vcZga6BPGU8NaI-w94qCtk-KaVwNtbM5xOcVXX82bh9DHIZa0qiu3f2hAdMVx0oaoKDts/s1600/spokendata-api.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS7RD9ePABvdn8VGeIdpen7k8K2zYrj202wIRmdX_Wzp_y35Lg-DcZ2Coj3vtATwmrKwNRy7vcZga6BPGU8NaI-w94qCtk-KaVwNtbM5xOcVXX82bh9DHIZa0qiu3f2hAdMVx0oaoKDts/s1600/spokendata-api.png" height="127" width="320" /></a></div>
<br />
<br />
So, how does it work? Each SpokenData API function is composed of SpokenData base <b>API url</b>, <b>USER-ID</b>, <b>API-TOKEN</b> and the <b>name of the function</b>.<br />
<br />
<table>
<tbody>
<tr><th>SpokenData API url:</th><td>http://spokendata.com/api</td></tr>
<tr><th>USER-ID: </th><td>different for each user, available to signed-in users at <a href="http://spokendata.com/api">http://spokendata.com/api</a></td></tr>
<tr><th>API-TOKEN: </th><td>different for each user, available to signed-in users at <a href="http://spokendata.com/api">http://spokendata.com/api</a></td></tr>
<tr><th>API function: </th><td>recording/put</td></tr>
</tbody></table>
<br />
If we concatenate the above values, we get the API call url. It may look like this:<br />
<div style="background: rgb(238, 238, 238); overflow: scroll; padding: 10px; white-space: nowrap;">
<span style="font-family: Courier New, Courier, monospace;">http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/put</span></div>
<br />
Each API call can have parameters. When uploading a new file through the API, you need to provide the recording filename and the language for automatic speech processing. So you will end up with a URL looking like this:<br />
<div style="background: rgb(238, 238, 238); overflow: scroll; padding: 10px; white-space: nowrap;">
<span style="font-family: Courier New, Courier, monospace;">http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/put?filename=audio.mp3&language=english</span></div>
<br />
Other parameters are also available; you can read about them in the <a href="http://spokendata.com/api">SpokenData API documentation</a>. When you call the above URL, <b>don't forget to send the file content in the request body</b>.<br />
<br />
Here is a PHP code example that uploads an MP3 file to SpokenData using the HTTP PUT method.<br />
<div style="background: rgb(238, 238, 238); overflow: auto; padding: 10px;">
<pre style="font-family: 'Courier New', Courier, monospace; margin: 0;"><?php
$fileToUpload = 'd:/audio.mp3';
$url = 'http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/put?filename=audio.mp3&language=english';

// Open the file for reading in binary mode.
$file = fopen($fileToUpload, "rb");

$curl = curl_init();
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Expect: '));
curl_setopt($curl, CURLOPT_PUT, 1);
curl_setopt($curl, CURLOPT_INFILE, $file);
curl_setopt($curl, CURLOPT_INFILESIZE, filesize($fileToUpload));

// Execute the PUT request and clean up.
$response = curl_exec($curl);
curl_close($curl);
fclose($file);

// Pass the XML response through.
header("Content-Type: text/xml");
echo $response;</pre>
</div>
<br />
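If PHP is not your language, the same call is easy to make elsewhere. Below is a minimal Python sketch using only the standard library; the URL is the example value from above, and `build_put_request` is just an illustrative helper name, not part of the SpokenData API.

```python
import urllib.request

# Example USER-ID/API-TOKEN URL from above; substitute your own values,
# available to signed-in users at http://spokendata.com/api.
URL = ("http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp"
       "/recording/put?filename=audio.mp3&language=english")

def build_put_request(url, data):
    """Build an HTTP PUT request whose body is the raw file bytes."""
    return urllib.request.Request(url, data=data, method="PUT")

# Real call (requires a valid account and an audio.mp3 on disk):
# with open("audio.mp3", "rb") as f:
#     response = urllib.request.urlopen(build_put_request(URL, f.read()))
#     print(response.read().decode("utf-8"))

req = build_put_request(URL, b"")  # build only; no network traffic here
```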
The server responds in XML. Here is an example response to the above request.<br />
<div style="background: rgb(238, 238, 238); overflow: auto; padding: 10px;">
<pre style="font-family: 'Courier New', Courier, monospace; margin: 0;"><?xml version="1.0" encoding="utf8"?>
<data>
<message>New media file &quot;audio.mp3&quot; was successfully added.</message>
<recording id="1373"></recording>
</data></pre>
</div>
<br />
As you can see, the file <strong>audio.mp3 </strong>was successfully added and assigned the recording id = <b>1373</b>.SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-35403781778189922882014-06-05T07:09:00.003-07:002014-06-05T07:10:53.082-07:00Tagging<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS2tARSS7-1_czV9h8yNGk8rbDtQWDfkr4sKyQAUhyT3RmoVKm8GYryCZ1i0vHmjbOAf2RSARTjoyouQIFx5SGdqWi9E7jPyG2DRUQQdsWf0DJNN5qFymvCl8eDK-g2mrS1VJUxx6wvIU/s1600/tagging.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS2tARSS7-1_czV9h8yNGk8rbDtQWDfkr4sKyQAUhyT3RmoVKm8GYryCZ1i0vHmjbOAf2RSARTjoyouQIFx5SGdqWi9E7jPyG2DRUQQdsWf0DJNN5qFymvCl8eDK-g2mrS1VJUxx6wvIU/s1600/tagging.png" height="286" width="320" /></a>Every registered user has a set of 6 tags. Each tag is marked with a different color and can have a title. Tags come in really handy when you want to filter your recordings. For example, say you have plenty of recordings and are working on their transcriptions. When you are happy with a transcription, you mark the recording with a tag titled <b>done</b>. As tags are displayed on top of the recording thumbnails, you will immediately spot the recordings that no longer need to be transcribed.<br />
<br />
Tag titles can be changed at any time. Tagging is allowed on your dashboard or in the transcription editing mode.<br />
<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-2838561141716990652014-05-22T00:59:00.000-07:002014-05-22T01:05:43.147-07:00Integration of BrainTree payments<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhELEkU_yFZupwdXNkeJZSgBkvC3ETOQspQPzqThJBRBzop1bggCvJBUWSk7rPLoqOMeJYBu2hn-3tH1LbUAw9cVckfKsJr0SSdr857iG-nRqHWq71Uy6IMqoK55GcTFkeHSHDPa_JzkfI/s1600/buy.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhELEkU_yFZupwdXNkeJZSgBkvC3ETOQspQPzqThJBRBzop1bggCvJBUWSk7rPLoqOMeJYBu2hn-3tH1LbUAw9cVckfKsJr0SSdr857iG-nRqHWq71Uy6IMqoK55GcTFkeHSHDPa_JzkfI/s1600/buy.jpg" height="123" width="200" /></a>From now on, every registered user can get the whole transcription of their multimedia files. To lower the load of our computation cloud, we decided to charge for processing of files longer than 15 minutes. Processing of shorter files remains free and is available to anyone.<br />
<br />
If you are not sure about the quality of the automatic transcription, start by letting the system transcribe the first 15 minutes. Based on the result, you can then decide whether the whole transcription is worth buying.<br />
<br />
As the first payment method, we decided to integrate <a href="https://www.braintreepayments.com/">BrainTree</a> payments into <a href="http://spokendata.com/">SpokenData</a>. This allows users to pay by card; Visa, MasterCard, Discover and American Express are accepted. All payments are currently in EUR.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.braintreepayments.com/" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://ignitiondeck.com/id/wp-content/uploads/2013/06/braintree-logo.png" height="50" width="200" /></a></div>
<br />
The great thing about BrainTree is that the user is not redirected to a third-party payment gateway but remains on the SpokenData website. All sensitive data is encrypted and not stored. Read more about Client-side encryption <a href="https://www.braintreepayments.com/braintrust/client-side-encryption">here</a>.SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-51504542105177002782014-05-20T00:22:00.000-07:002014-05-20T00:58:18.259-07:00WebVTT subtitles support<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzwfqEZwRzYEwVVlN5Syz-G6RWqzqlCseZraI1rqLglEY9LQ6ur38B3na5sgWdjcNF-Ys3r7UBoRi26FBchxlr7THsmexVHT3qSmUJX4sfam3okzNVVkXIVvgAlRO21LC6UpjkBnkpZbo/s1600/subtitles-webvtt.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzwfqEZwRzYEwVVlN5Syz-G6RWqzqlCseZraI1rqLglEY9LQ6ur38B3na5sgWdjcNF-Ys3r7UBoRi26FBchxlr7THsmexVHT3qSmUJX4sfam3okzNVVkXIVvgAlRO21LC6UpjkBnkpZbo/s1600/subtitles-webvtt.png" height="259" width="320" /></a></div>
We have extended the number of subtitle formats SpokenData supports. From now on, subtitles can also be downloaded in <a href="http://dev.w3.org/html5/webvtt/"><b>WebVTT</b></a>. Here is the complete list of currently supported subtitle formats:<br />
<br />
<ul>
<li>HTML</li>
<li>SRT</li>
<li>TRS</li>
<li>TXT</li>
<li>WebVTT</li>
</ul>
<div>
The subtitles can be downloaded from the web user interface or through API calls. </div>
<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-5208882467718170292014-05-14T07:04:00.001-07:002014-05-14T07:23:31.429-07:00There is no audio/video limitation on the SpokenData free plan now.<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8sihhsP7EK_odXn80CFbzJQoh-6PngTr2OxI55tNIOJV950B94gwknlMyxUChG490n1ygg5QeT0mRwZW7vuhadK4v1VkMtxF_AE6jqlCbWgU2J044wV-uvuyj-XR19vdPIn39D4YC0lo/s1600/limits.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8sihhsP7EK_odXn80CFbzJQoh-6PngTr2OxI55tNIOJV950B94gwknlMyxUChG490n1ygg5QeT0mRwZW7vuhadK4v1VkMtxF_AE6jqlCbWgU2J044wV-uvuyj-XR19vdPIn39D4YC0lo/s1600/limits.jpg" height="148" width="200" /></a>We changed the way we limit the processed data for free accounts. To get a free SpokenData account, <a href="http://spokendata.com/register">register here</a>.<br />
<br />
We previously had a hard limit on processed data of 15 minutes. Any data you uploaded over this limit was trimmed and discarded. So if you uploaded 25 minutes of audio or video and wanted the automatic transcript for free, you later got only a 15-minute-long video/audio with its transcript in <a href="http://spokendata.com/features">your dashboard</a> (after the processing finished).<br />
<br />
Currently, we do not limit your data upload. We only limit the length of the transcript we provide for free. So if you upload a 25-minute-long video/audio, you will find the whole (25-minute-long) video/audio in your dashboard. The generated transcript is limited to 15 minutes, so you will see the transcript only for the first 15 minutes of your data. But if you want, you can easily create the rest of the transcript yourself for free - this was not possible in the previous version, because the video was trimmed.<br />
<br />
We hope you welcome this change.<br />
<br />
And stay tuned - more interesting things are coming soon.<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-88681212482549323332014-04-30T05:29:00.000-07:002014-05-14T07:01:58.111-07:00Interactive audio waveform<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgraInHcVJqI6h1FwDG3Pcs-XKhk9nFP4rdkNnMZy6xJ1MX3dqtwdplviZJH6SsmzVVBAXhn3XBHdnh0je0yUqZXLnblbGv0AQvYWXF_E1QyCia3Ary8sh3dX6kiufVzPFcLMixYCsWxHw/s1600/news.jpg" height="101" width="200" /></a>From now on, when you play any recording, you will see an interactive waveform on the left of the transcription. The waveform is only displayed on larger screens with a minimum horizontal resolution of 1220px. The waveform can significantly help you detect moments of speech. When you click into it, the player seeks to that exact moment.<br />
<br />
This audio waveform viewer for the web can be downloaded from <a href="https://github.com/vdot/outwave.js">https://github.com/vdot/outwave.js</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNsMFkW8ZX5LLYxGc9sETjx-PTVs3JRL0s49G7p_cmAf4XkPUCMh7S7jvPgUnX0VLhWKP_C-Um6VNTn5O3GjYB-50LP5-l4LcsTTPoZrGXcYVYd1CMIHf63q-Kn8dkr_yXaWSNOBs3AhQ/s1600/waveform.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNsMFkW8ZX5LLYxGc9sETjx-PTVs3JRL0s49G7p_cmAf4XkPUCMh7S7jvPgUnX0VLhWKP_C-Um6VNTn5O3GjYB-50LP5-l4LcsTTPoZrGXcYVYd1CMIHf63q-Kn8dkr_yXaWSNOBs3AhQ/s1600/waveform.jpg" height="247" width="400" /></a></div>
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0tag:blogger.com,1999:blog-1970119719488733768.post-49087515256095668132014-03-31T02:41:00.000-07:002014-05-14T07:24:18.609-07:00What does the speaker segmentation technology do?<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl1cnXE2xMTOSKhcVKGiOZKX__OMHVgPmYQxGhLMWLFE8FrWDyKCThIhFHfLCTWpHTlWvYJyLgffwXaKcIDPU-K0NGo-3j9gDFOtdwMppTAutpYYYvafduTC0yBuSpzYAaVIH82xbc42Y/s1600/spkid.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl1cnXE2xMTOSKhcVKGiOZKX__OMHVgPmYQxGhLMWLFE8FrWDyKCThIhFHfLCTWpHTlWvYJyLgffwXaKcIDPU-K0NGo-3j9gDFOtdwMppTAutpYYYvafduTC0yBuSpzYAaVIH82xbc42Y/s1600/spkid.jpg" height="192" width="200" /></a>Speaker segmentation (diarization) is a <a href="http://blog.spokendata.com/2013/10/what-information-is-in-your-spokedata.html">speech technology</a> that allows you to segment audio (or video) by speaker. What is it good for? For one, you can more easily identify speaker turns in a dialog while making a speech transcript.<br />
<br />
Even if you do not directly need the speaker information, speaker segmentation is very helpful for speech-to-text (STT) technology. The STT technology contains an unsupervised speaker adaptation module. This module takes the parts of speech belonging to a particular speaker and adapts the acoustic model towards them. Adapting the model leads to a more accurate speech transcript.<br />
<br />
The adaptation - even if it is called speaker adaptation - adapts the system to the whole acoustic channel, which comprises the speaker's voice characteristics, room acoustics (echo), <a href="http://blog.spokendata.com/2014/01/what-is-difference-between-narrowband.html">microphone characteristics</a>, environment noise, etc.<br />
<br />
Speaker segmentation is theoretically independent of speaker, language and acoustic conditions. In practice, however, it is not. The reason is that it uses something called a universal background model (UBM). Theoretically, the UBM should model all the speech, languages and acoustics of the world. But you need to train it on speaker-labeled data so that it learns how to distinguish among speakers. And, as with other speech technologies, the further the data you process is from the training data, the worse the accuracy you get.<br />
<a name='more'></a><br />
Lower accuracy typically shows up as one speaker, recorded in different acoustic conditions, being split into several different speaker labels.<br />
<br />
The second important thing is that speaker segmentation needs prior knowledge of the number of speakers in the audio document. If you do not provide this information, a preset or estimated value is used. However, if you know the number of speakers, it is good to provide it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh05c7A02bSNVfdrsdedo1uAUVp7NP4Hc3P2JfMz8fZUL3cmIb73ABByjw6xn3vKuDlx-Ha12FzFBFWcOlmordIeeyx29uAFHvvpzBCTLe5dXMhfr_wvPp85LG6wmWUabWPqUyeD7gXFcI/s1600/sdcom2v1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh05c7A02bSNVfdrsdedo1uAUVp7NP4Hc3P2JfMz8fZUL3cmIb73ABByjw6xn3vKuDlx-Ha12FzFBFWcOlmordIeeyx29uAFHvvpzBCTLe5dXMhfr_wvPp85LG6wmWUabWPqUyeD7gXFcI/s1600/sdcom2v1.png" height="227" width="320" /></a></div>
<br />
In our service, we <a href="http://spokendata.com/demo/start">automatically preset</a>:<br />
<ul>
<li>7 speakers in the speech-to-text engine for broadcast news, where we expect more interviews and thus more speakers.</li>
<li>3 speakers in the other speech-to-text engines, where we rather expect a monologue or a dialog in the recording.</li>
<li>10 speakers in the speaker segmentation mode (just the <a href="http://blog.spokendata.com/2013/10/voice-activity-detection-where-is-speech.html">voice activity detector</a> and speaker segmentation, without speech-to-text).</li>
</ul>
You can preset the number of speakers a priori in the <a href="http://spokendata.com/pricing">paid service</a>.<br />
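To illustrate why this prior matters, here is a toy sketch in Python (not our actual diarization pipeline, which uses a UBM and real acoustic features): each segment is reduced to a single made-up feature value, and a simple k-means with k set to the assumed number of speakers produces the labels.

```python
def kmeans_1d(values, k, iters=20):
    """Toy 1-D k-means (k >= 2): cluster one feature per segment into k 'speakers'."""
    ordered = sorted(values)
    # Deterministic init: spread the k initial centers across the value range.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every segment to the nearest center...
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # ...then move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Made-up features: speaker A in quiet (~1.0) and noisy (~1.6) conditions,
# and speaker B (~5.0). Two real speakers, three acoustic conditions.
features = [1.0, 1.1, 1.6, 1.7, 5.0, 5.1]

two = kmeans_1d(features, k=2)    # -> [0, 0, 0, 0, 1, 1]: A stays one speaker
three = kmeans_1d(features, k=3)  # -> [0, 0, 1, 1, 2, 2]: A splits into two labels
```

With k=2 the four low-feature segments (one speaker in two acoustic conditions) share a label; with too large a k they are split into two "sub-speakers", which is exactly the artifact of varying acoustic conditions described in this post.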
<br />
That said, lower accuracy is not necessarily "bad". Yes, it can be annoying when you need accurate speaker turns in your speech transcript. On the other hand, splitting a particular speaker into several "sub-speakers" according to varying acoustic conditions (noise, environment, ...) can actually help the automatic speech recognizer.<br />
<br />SuperLectureshttp://www.blogger.com/profile/11962779404564217923noreply@blogger.com0