Tuesday, November 11, 2014

SpokenData API - Search in Speech

The SpokenData API has a new function for searching recording transcriptions. A search returns the captions that match the query, each with its start and end time, caption content and speaker identity. The search can cover either all of the user's recordings or a list of selected recordings.

A basic SpokenData search API call looks like this:
http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/search?q=student

This call searches for occurrences of the word student in all recording transcriptions of the DEMO account.
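
As a rough illustration, a call like the one above could be assembled in Python as follows. The helper name build_search_url is hypothetical and not part of the SpokenData API; the base URL is copied as-is from the DEMO example, and only the q parameter described in this post is used.

```python
from urllib.parse import urlencode

# Base URL copied verbatim from the DEMO account example in this post.
API_BASE = "http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp"


def build_search_url(query, api_base=API_BASE):
    """Return a search URL for the given query (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the query text.
    return f"{api_base}/search?{urlencode({'q': query})}"


print(build_search_url("student"))
```

The resulting URL can then be fetched with any HTTP client.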

The returned XML shows the elapsed time for parsing the search query and for performing the search. Because the number of results can be very high, the search API call supports paging; by default, a page holds at most 10 results. The output XML contains two types of results, recordings and captions, and each type is paged separately.

Recordings
The value of number_of_occurrences is the number of captions in the recording that match the search query. Recordings are sorted by number of occurrences, from highest to lowest.

Recordings paging:
  • recordingPageSize = 10 by default
  • recordingPageNumber = 0 by default

Captions
Every caption carries its start and end time, the speaker identity and the caption content. Captions are ordered by start time, from earliest to latest.

Captions paging:
  • captionPageSize = 10 by default
  • captionPageNumber = 0 by default

Caption results can be omitted by adding the parameter recordingListOnly=1 to the search API call.
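
The paging parameters above can be combined into one query string. The sketch below is only an illustration: the function paging_query and its argument names are made up for this example, while the parameter names it emits (recordingPageSize, recordingPageNumber, captionPageSize, captionPageNumber, recordingListOnly) are the ones listed in this post. Parameters that are left out are assumed to fall back to the defaults given above.

```python
from urllib.parse import urlencode


def paging_query(query, recording_page=None, caption_page=None,
                 recording_list_only=False):
    """Build a query string for a paged search (hypothetical helper).

    recording_page and caption_page are optional (page_size, page_number)
    tuples; anything omitted is left to the API defaults (size 10, page 0).
    """
    params = {"q": query}
    if recording_page is not None:
        params["recordingPageSize"], params["recordingPageNumber"] = recording_page
    if caption_page is not None:
        params["captionPageSize"], params["captionPageNumber"] = caption_page
    if recording_list_only:
        # Ask for the recording list only, without caption results.
        params["recordingListOnly"] = 1
    return urlencode(params)


print(paging_query("student", caption_page=(5, 2)))
print(paging_query("student", recording_list_only=True))
```

The returned string is appended after /search? in the API call.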

Search in selected recordings
The optional parameter recordingId restricts the search to selected recordings. Its value is a comma-separated list of recording IDs (e.g. recordingId=845,446).
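
A minimal sketch of building the recordingId value from a list of IDs is shown below. The helper recording_filter_query is hypothetical. The comma is deliberately kept unencoded to match the recordingId=845,446 form shown above; whether the API also accepts a percent-encoded comma is not stated in this post.

```python
from urllib.parse import urlencode


def recording_filter_query(query, recording_ids):
    """Build a query string limited to the given recording IDs (hypothetical helper)."""
    params = {
        "q": query,
        "recordingId": ",".join(str(rec_id) for rec_id in recording_ids),
    }
    # safe="," keeps the comma literal instead of encoding it as %2C.
    return urlencode(params, safe=",")


print(recording_filter_query("student", [845, 446]))
```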

An example of an advanced SpokenData API search call:
http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/search?q=what&recordingId=846&captionPageSize=2&captionPageNumber=1

This API call returns the third and fourth caption results: page numbers start at 0, so with a caption page size of 2, page 1 holds results 3 and 4. The search is performed only in the transcription of the recording with id=846.
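
The page arithmetic can be checked with a few lines of Python. This is a sketch that assumes zero-based page numbers, which is what the default page number of 0 and the example above imply; the function name page_positions is made up for illustration.

```python
def page_positions(page_size, page_number):
    """Return the 1-based positions of the first and last result on a page.

    Assumes page numbers start at 0, as in the example above.
    """
    first = page_number * page_size + 1
    last = first + page_size - 1
    return first, last


# captionPageSize=2, captionPageNumber=1
print(page_positions(2, 1))
```

For a page size of 2 and page number 1 this gives positions 3 and 4, matching the description of the call above.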

More information on search and other API calls is available after signing in and enabling the SpokenData API.
