We have modified the annotation XML file structure. It is now much easier to parse. You can download this file through the SpokenData API. See this short example:
<segment>
  <start>63.25</start>
  <end>65.40</end>
  <speaker>A</speaker>
  <text>Hello, this is the first caption</text>
</segment>
<segment>
  <start>72.92</start>
  <end>74.49</end>
  <speaker>B</speaker>
  <text>and here comes the second one</text>
</segment>
The start and end tags give the times, in seconds, at which the subtitle appears and disappears; values are stored with a precision of two decimal places. The speaker tag identifies the person who is speaking and can hold any alphanumeric value. The text tag holds the subtitle content.
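For illustration, here is a minimal parsing sketch in Python using the standard xml.etree.ElementTree module. It assumes the downloaded file is saved as annotation.xml and that the segments are wrapped in a single root element:

import xml.etree.ElementTree as ET

# Minimal sketch: iterate over all <segment> elements of a downloaded
# annotation file (the file name "annotation.xml" is an assumption).
tree = ET.parse("annotation.xml")
for segment in tree.getroot().iter("segment"):
    start = float(segment.findtext("start"))    # appearance time in seconds
    end = float(segment.findtext("end"))        # disappearance time in seconds
    speaker = segment.findtext("speaker")       # alphanumeric speaker label
    text = segment.findtext("text")             # subtitle content
    print(f"[{start:.2f} - {end:.2f}] {speaker}: {text}")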
You can see a live example from the SpokenData demo here:
http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/846/annotation.xml
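If you prefer to fetch the file programmatically rather than in the browser, a plain HTTP GET on the demo URL above should be enough (a sketch using Python's standard urllib; the local file name is again an assumption):

import urllib.request

# Download the annotation XML from the demo URL shown above and save it
# locally, so it can be parsed as in the earlier sketch.
url = ("http://spokendata.com/api/18/"
       "br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/846/annotation.xml")
with urllib.request.urlopen(url) as response:
    with open("annotation.xml", "wb") as f:
        f.write(response.read())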