Voice Biometrics: ‘Who’s that speaking?’


Osama bin Laden was on the FBI’s most wanted list for nearly twelve years, and for most of that time (more precisely, from 2001 to 2011) he would periodically release video or audio recordings that presumably were intended to encourage his supporters and taunt his enemies. In the case of the videos, there was very little doubt as to whether it was actually bin Laden who was being shown; the question that bedeviled his pursuers was rather where the video had been filmed. In the case of the audio tapes, on the other hand, it often took the Americans several days before they could confirm that the voice in question really belonged to bin Laden; and in a few instances, it was not possible to make a positive identification with confidence.

What was going on behind the scenes during those few days of suspense is called, in technical terms, speaker recognition, which is defined as the computing task of validating someone’s (claimed) identity using characteristics extracted from his or her voice. This is quite different, notice, from speech recognition, where the task is to have a system correctly recognize the content, that is, the actual words that were spoken, not who said them. Speaker recognition is thus a branch of the broad field of biometrics, the goal of which is to uniquely identify persons based on their intrinsic physical or behavioral traits. The detailed techniques that are employed in voice biometrics need not concern us here; suffice it to say that an imprint, or a model, can now be made of our voice that identifies us as uniquely and as reliably as our fingerprints. What is more, this branch of speech technology is now finding more and more applications in our daily life.

Telephone banking offers an interesting case in point. While some financial institutions are content to have their clients punch in their username and password on the telephone keypad, a few offer an additional level of security by asking the user to identify himself vocally and then answer one or more screening questions. The performance of speaker recognition technology has now reached a level where the overwhelming majority of callers can be correctly identified, with only a small proportion incorrectly rejected. The following would be a good test for this new technology:

“Osama bin Laden here. I want you to transfer the entire contents of my account to …”