|
SST has developed its ASR modeling algorithms in-house and
owns the IPR, so the ASR models can be trained for any
language. The specific problems mentioned in Sub-Section 2
of this article are all addressed by SST's ASR algorithms
and by the methods used to train the ASR systems. Some of the
techniques used to counter these problems are:
-
Speech data is collected from speakers in different geographical
locations, and the speech models are trained on it using clustering
techniques, to achieve robustness against pronunciation variations.
-
Speaker normalization is applied to achieve accurate
speaker-independent recognition.
-
Channel normalization techniques and noise-robust algorithms
are used to combat the degradations caused by the telephone channel.
-
Echo cancellation algorithms are developed in-house
and ported onto the computer telephony interface cards, which
are also developed in-house, for a better match between the
ASR, the TTS, the echo cancellation algorithms and the telephony
interface cards.
-
In a voice barge-in situation,
when a user interrupts the system prompt, the SLI system detects
the user's speech and stops the playback of its own prompt
immediately. At the same time, the SLI system must be
insensitive to background (ambient) noise and to the speech of
people talking in the background. Otherwise, false alarms
caused by background noise or speech will lead to frequent
drop-outs in the system prompts, which irritates the user.
-
SST has designed Spoken Language
Interfaces for several IVR systems, both ASR-based and
non-ASR-based, and live systems are used by thousands of
people every day (see the projects section on the web site).
-
A Call Completion Rate
(CCR) parameter is used in all of SST's SLI systems to measure
the percentage of calls successfully completed by the SLI
system. If the CCR is not satisfactory, the user interface
is modified iteratively until the CCR improves. The user interface
design has to serve both novice users and experienced users,
whose needs conflict.
For example, a novice user needs as much help as possible
to use the system, which requires relatively long prompts.
An experienced user, on the other hand, wants to finish
the task as soon as possible and end the call, and would
naturally be irritated by long prompts. This is
where the voice barge-in feature is extremely useful:
the experienced user can interrupt the system prompts
and get the task done.
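
The CCR described above can be illustrated with a minimal sketch. It assumes a call log in which each call is simply marked as completed or not. The `CallRecord` type, the function names, and the 90 percent target are hypothetical and are not taken from SST's systems.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One call handled by the SLI system (hypothetical record layout)."""
    call_id: str
    completed: bool  # True if the caller finished the task within the SLI


def call_completion_rate(calls):
    """Return the percentage of calls completed; 0.0 for an empty log."""
    if not calls:
        return 0.0
    completed = sum(1 for call in calls if call.completed)
    return 100.0 * completed / len(calls)


def needs_redesign(calls, target_percent=90.0):
    """Flag the user interface for revision when the CCR is below target.

    The 90 percent default is an assumed value chosen for illustration.
    """
    return call_completion_rate(calls) < target_percent
```

In practice the definition of a "successfully completed" call would depend on the application, for example whether a transfer to a human operator counts as a completion.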
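
The barge-in requirement described above, reacting quickly to the caller while ignoring short background noises, can be sketched with a simple energy detector. This is not SST's algorithm, which the article does not describe. It is an assumed, simplified approach in which barge-in is declared only after several consecutive audio frames exceed a multiple of the estimated noise floor. The function name and the `ratio` and `min_frames` parameters are hypothetical.

```python
def detect_barge_in(frame_energies, noise_floor, ratio=4.0, min_frames=5):
    """Return the index of the frame at which barge-in is declared, or None.

    frame_energies: per-frame energy of the incoming (echo-cancelled) audio.
    noise_floor:    estimated energy of the background noise.
    ratio:          how far above the noise floor a frame must be (assumed).
    min_frames:     consecutive loud frames required, so that short noise
                    bursts do not interrupt the prompt (assumed).
    """
    threshold = noise_floor * ratio
    run = 0  # length of the current run of loud frames
    for index, energy in enumerate(frame_energies):
        if energy > threshold:
            run += 1
            if run >= min_frames:
                return index
        else:
            run = 0  # a quiet frame resets the run
    return None
```

A larger `min_frames` reduces false alarms but delays the moment the prompt stops, so the value trades responsiveness against robustness. An energy detector alone cannot tell the caller's speech from other people talking nearby; rejecting background speech needs additional cues beyond this sketch.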
|