Maybe you are thinking... "What information is in my spoken data?"
Well, lots of information! To get some idea, look at the following image.
You and your speech are in the middle. Now, let us go clockwise and I will briefly introduce each piece of information hidden in your speech.
So at 12 o'clock, there is speaker identity. A recording only 10 seconds long is enough to identify you by voice.
Gender identification is next. It is the simplest kind of voice classification, with just 2 classes.
At 3 o'clock, there is the speech transcript, produced by a technology that converts speech into text. Keyword spotting and speech search can be considered part of this technology.
Next is age estimation. Estimating a speaker's age might be helpful in some security applications.
The communication channel is usually not that important, but the information about which codecs or networks the voice recording was transmitted through is there, together with the type of device!
Do not forget that a recording does not contain only speech. There are also lots of noises, tones, or music, and all of them can make your speech less intelligible. The technology that tells speech apart from these other sounds is called Voice Activity Detection.
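To make Voice Activity Detection a bit more concrete, here is a minimal sketch of about the simplest approach one could imagine: cut the recording into short frames and mark a frame as "active" when its energy is above a threshold. This is only an illustration, not how any particular product works. The function names, the frame length of 160 samples, and the threshold of 0.01 are all my own assumptions, and the "recording" is a synthetic tone standing in for speech.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def detect_activity(samples, frame_len=160, threshold=0.01):
    """One boolean per frame: True when the frame energy exceeds the threshold.

    frame_len and threshold are illustrative values, not tuned ones.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]

# Synthetic signal at 8 kHz: near-silence, then a loud 200 Hz tone, then near-silence.
rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 50 * n / rate) for n in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
signal = quiet + tone + quiet

flags = detect_activity(signal)
print(flags)
```

With these values the signal splits into 15 frames, and only the middle five (the loud tone) are flagged. A plain energy threshold like this cannot tell speech from loud music or noise, which is exactly why real detectors need to be smarter.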
And finally, there is the identity of the language you are speaking. As with speaker identity, 10 seconds of your speech is enough to estimate your spoken language.
So that is at least some of the information hidden in your spoken data recordings.