Speech
Difficulties Encountered by Automatic Speech Recognition Systems

The performance of an ASR system degrades for the following reasons:

  • Significant variations in the way different words of a language are pronounced by users from different regions and dialectal backgrounds.
  • The variability of the voice characteristics from one speaker to another.
  • Distortion introduced into the recorded speech by telephone/microphone and transmission channel characteristics in wire-line telephony; by reverberation (multiple echoes) in speakerphones and other hands-free devices; by compression, packet loss and multi-path propagation in cellular telephony; and by packet loss in Voice over IP networks.
  • Degradation of the recorded speech due to ambient noise (e.g., traffic noise or noise in an office environment).
  • When voice barge-in is enabled, a certain amount of echo is present in the telephone line. This echo must be removed (a process called echo cancellation) before the recorded speech data can be useful for ASR.

    In Sub-Section 5 of this article, we explain how each of the above problems has been addressed in the ASR technology developed by SST.
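
To make the echo-cancellation item in the list above more concrete, here is a minimal sketch of one common textbook approach, a normalized least-mean-squares (NLMS) adaptive filter. It is only an illustration and is not claimed to be the method used in SST's technology. The function name `nlms_echo_cancel`, the parameter values, the white-noise prompt and the 3-tap echo path are all hypothetical choices made for this example.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the residual left after subtracting the estimated echo.

    far_end: samples played out to the caller (for example, the prompt).
    mic:     samples recorded on the line (caller speech plus echo of far_end).
    All names and default values here are illustrative assumptions.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate            # signal with estimated echo removed
        energy = sum(h * h for h in history) + eps
        step = mu * e / energy           # step size normalized by input energy
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def power(signal):
    """Mean squared value of a list of samples."""
    return sum(s * s for s in signal) / len(signal)


# Hypothetical test signal: a white-noise prompt and a made-up 3-tap echo path.
# No caller speech is present, so the recorded signal is pure echo.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.0, 0.6, -0.3]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

residual = nlms_echo_cancel(far_end, mic)

# Compare echo power over the last 1000 samples, after the filter has adapted.
echo_power_before = power(mic[-1000:])
echo_power_after = power(residual[-1000:])
```

In this idealized setting the residual echo power falls far below the original echo power once the filter has adapted. A real telephone line adds caller speech, noise and a much longer echo path, so a practical canceller would also need many more taps and some form of double-talk handling.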

 

 