The performance of an ASR system degrades for the following reasons:
- Significant variations in the way different words of a language are pronounced by users from different regions and dialectal backgrounds.
- Variability in voice characteristics from one speaker to another.
- Distortion of the recorded speech introduced by the channel: the telephone/microphone and transmission-channel characteristics in wire-line telephony; reverberation (multiple echoes) with speakerphones and other hands-free devices; compression, packet loss and multipath propagation in cellular telephony; and packet loss in Voice over IP (VoIP) networks.
- Degradation of the recorded speech by ambient noise (e.g., traffic noise or noise in an office environment).
- When voice barge-in is enabled (the caller may speak while a prompt is still playing), some of the prompt returns as echo on the telephone line. This echo must be removed (echo cancellation) before the recorded speech is usable for ASR.
In Sub-Section 5 of this article, we explain how each of the above problems has been addressed in the ASR technology developed by SST.
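To make the echo-cancellation step more concrete, here is a minimal sketch of one widely used technique, a normalized least-mean-squares (NLMS) adaptive filter. It is an illustration only and not necessarily the method SST uses. It assumes the system has access to the prompt samples being played (the far-end reference); the function names `nlms_echo_cancel` and `simulate_echo`, the 32-tap filter length, the step size of 0.5 and the synthetic echo path are all hypothetical choices made for this example.

```python
import math
import random


def nlms_echo_cancel(far_end, line, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the prompt echo from the line signal.

    far_end: samples of the prompt being played (the echo reference).
    line:    samples recorded on the line (caller speech + prompt echo).
    Returns the residual e[n] = line[n] - echo_estimate[n], which is what
    would be passed on to the recogniser. (Hypothetical helper.)
    """
    w = [0.0] * taps        # adaptive FIR model of the echo path
    x_buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for n in range(len(line)):
        x_buf = [far_end[n]] + x_buf[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, x_buf))  # echo estimate
        e = line[n] - echo_hat                                 # echo-cancelled sample
        # NLMS update: step size normalized by reference energy in the window
        step = mu * e / (eps + sum(xi * xi for xi in x_buf))
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual


def simulate_echo(far_end, echo_path):
    """Convolve the reference with a (hypothetical) echo impulse response."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n >= k)
        for n in range(len(far_end))
    ]


if __name__ == "__main__":
    random.seed(0)
    prompt = [random.gauss(0.0, 1.0) for _ in range(4000)]  # stand-in for prompt audio
    echo_path = [0.0, 0.5, 0.3, -0.1]                        # hypothetical line echo path
    line = simulate_echo(prompt, echo_path)                  # echo only, no caller speech
    out = nlms_echo_cancel(prompt, line)
    # Echo return loss enhancement (ERLE) over the last 1000 samples
    p_in = sum(v * v for v in line[-1000:])
    p_out = sum(v * v for v in out[-1000:]) + 1e-30
    print(f"ERLE: {10 * math.log10(p_in / p_out):.1f} dB")
```

The filter learns the echo path from the reference signal and subtracts its estimate, so the recogniser sees only the residual (ideally, the caller's speech). This sketch adapts on every sample; when the caller actually talks over the prompt (double talk), practical cancellers usually freeze or slow adaptation, which is not handled here.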