Matt Peachey, VP & GM International, Pindrop
When it comes to accessing your own data, two things are paramount: security and an easy user experience. Voice command is an interface that’s seeping into more and more everyday consumer technology. This extends to the global banking and finance sector, where voice is widely used for authentication. Its appeal is primarily that no separate security interface is required, which results in a simpler system and a better user experience. However, for voice biometrics to be trusted and more widely used, it must be seamless, robust and secure.
It’s unfortunate that a large proportion of security systems tend to be obtrusive; take, for example, the airport screening lines that must be navigated in order to board a flight. It’s an added frustration because the vast majority of passengers are harmless but have to pay the security toll so the few malicious ones can be detected. Perhaps the biggest promise of biometrics is seamless security. This is especially true of voice biometrics, since voice command is increasingly becoming an embedded part of consumer life.
Voice biometric systems require users to go through an enrolment process so that the system can effectively ‘learn’ the unique ‘thumbprint’ of their voice. Enrolment can be active or passive. Active enrolment must be used with text-dependent systems. This is a manual process in which the user is required to speak an agreed-upon phrase. The phrase must be spoken not only at enrolment but also every time the user is authenticated. Therefore, text-dependent systems fail to make security a seamless experience.
On the other hand, passive enrolment occurs during the user’s normal interaction with the system. It is used by text-independent voice biometric systems, which learn the user’s voice during normal speech rather than requiring a specific phrase. As a result, text-independent systems are truly unobtrusive because they’re almost completely invisible to the user.
In order to be operable and effective, voice biometrics must be resistant to changes in noise. Background noise can interrupt a phone call: anything from the ping of a microwave or the whir of an air conditioner to a radio or children talking could be picked up by a microphone. Voice biometric systems must also adapt to the different channels users decide to access them on, and to the changes in a user’s accent that naturally occur over time. If a voice biometric system is not robust to any of these potential complications, enrolment or authentication may fail.
Voice biometric systems should also be robust to whichever communication channel the consumer chooses to use. For example, it should be possible to enrol a user calling from their landline and authenticate them on another occasion when they call from their mobile phone. Likewise, someone other than the user calling from the user’s device should fail authentication, because their voice is different regardless of the communication channel. Thus, voice biometric systems need to minimise the learning of channel artefacts.
Because the human voice changes as we age, voice biometric authentication that cannot recognise and adapt to this will fail. For voice biometric systems to be robust to ageing, they must continually learn and adapt to the user’s ever-changing voice. However, they must be careful not to adapt too easily and learn from the voice of an imposter.
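The trade-off between adapting to an ageing voice and not learning from an imposter can be illustrated with a minimal sketch. It assumes the stored voiceprint and each new sample are plain lists of numbers compared by cosine similarity; the two thresholds, the adaptation rate and every function name are hypothetical and chosen only for illustration, not taken from any real product.

```python
import math

# Hypothetical values, chosen only for illustration.
ACCEPT_THRESHOLD = 0.80  # minimum similarity to authenticate the caller
ADAPT_THRESHOLD = 0.90   # stricter bar before the stored voiceprint may change
ADAPT_RATE = 0.1         # how far the voiceprint moves towards a trusted sample


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors, 1.0 meaning identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def authenticate_and_adapt(voiceprint, sample):
    """Return (accepted, voiceprint), updating the voiceprint only on strong matches."""
    score = cosine_similarity(voiceprint, sample)
    accepted = score >= ACCEPT_THRESHOLD
    if score >= ADAPT_THRESHOLD:
        # Blend a small part of the new sample in, so slow drift is followed.
        voiceprint = [
            (1 - ADAPT_RATE) * v + ADAPT_RATE * s
            for v, s in zip(voiceprint, sample)
        ]
    return accepted, voiceprint
```

In this sketch a sample that only just passes authentication is accepted but never used for learning, which is one simple way to keep the system from drifting towards an imposter’s voice.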
A question at the forefront of consumers’ minds will be what to do if their voice is compromised. The beauty of this type of authentication is that, unlike passwords, your voice cannot be changed. However, one way an attacker could steal a user’s voice would be to use a recording. While an attacker could record the user directly, websites like YouTube also host recorded audio that could be used for this purpose. In the case of a bank call centre, the attacker could simply replay part of the audio at the beginning of the call during authentication, for example during IVR navigation, and then speak with their normal voice when connected to an agent.
An attacker could also use speech synthesis to impersonate a user. Given enough audio, modern systems can build a voice that sounds very similar to the person being modelled. This cloned voice could be used not only during authentication but also to carry on a conversation with the agent or system, potentially accessing or compromising more sensitive data.
Lastly, the attributes that are extracted from the user’s voice for enrolment and authentication can be stolen. They typically consist of a list of floating-point numbers, calculated when a user’s speech is analysed. If the attributes are extracted on a user’s device, an attacker could steal them, for example through a compromised mobile phone, and then inject them into a new session. The stolen attributes would then be used for authentication instead of those derived from the attacker’s voice.
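Why injected attributes are so dangerous can be shown with a small sketch. It assumes, purely for illustration, that the server compares the submitted list of numbers with the enrolled one using Euclidean distance; the distance limit, the example vectors and the function names are all hypothetical.

```python
import math

MAX_DISTANCE = 0.5  # hypothetical acceptance limit, for illustration only


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(enrolled, submitted):
    """Accept the session if the submitted attributes are close to the enrolled ones."""
    return euclidean_distance(enrolled, submitted) <= MAX_DISTANCE


# Made-up attribute vectors; real systems use far longer ones.
enrolled = [0.12, -0.87, 0.45, 0.33]
genuine_sample = [0.10, -0.80, 0.50, 0.30]    # the real user speaking
attacker_voice = [-0.60, 0.20, 0.90, -0.40]   # the attacker speaking
stolen_attributes = list(genuine_sample)      # copied from a compromised phone

print(verify(enrolled, genuine_sample))     # genuine user
print(verify(enrolled, attacker_voice))     # attacker using their own voice
print(verify(enrolled, stolen_attributes))  # attacker injecting stolen attributes
```

The check sees only numbers, so in this sketch the injected attributes pass exactly as the genuine user’s do, while the attacker’s own voice is rejected.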
Above all, for voice biometrics to be trusted and widely adopted, it must be seamless, robust and secure. However, when introducing new technologies, it is essential that all angles are covered. Without multiple layers of security covering all channels, from face-to-face verification, to online and over the telephone, fraudsters are able to manipulate particular points of exposure. For instance, biometrics is one way to solve the credential recovery issue, but it cannot detect fraud on its own.
A more advanced technology that builds upon some of the security validation foundations of voice biometrics while introducing multi-factor authentication is Phoneprinting™. This solution identifies specific characteristics of each call, such as the location the call is coming from, the device, whether it’s a mobile or landline, and whether the phone has been used to call the company before. Combined, these signals can aid in detecting fraudulent activity before it becomes an issue.
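The general idea of combining a voice match with other call signals can be sketched as follows. This is not how Phoneprinting™ works internally, which the article does not describe; the fields, weights and review threshold are hypothetical and serve only to show why several factors together can catch what a voice score alone would miss.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.5  # hypothetical: at or above this, the call is flagged


@dataclass
class CallSignals:
    voice_score: float              # 0.0 to 1.0, higher means a closer voice match
    phone_seen_before: bool         # has this phone called the company before?
    location_matches_profile: bool  # does the call origin fit the customer?
    line_type_matches_profile: bool # mobile or landline, as expected?


def risk_score(call):
    """Add hypothetical penalties for each signal that looks out of place."""
    risk = 1.0 - call.voice_score
    if not call.phone_seen_before:
        risk += 0.25
    if not call.location_matches_profile:
        risk += 0.25
    if not call.line_type_matches_profile:
        risk += 0.10
    return risk


def decide(call):
    """Return 'review' for risky calls and 'allow' otherwise."""
    return "review" if risk_score(call) >= REVIEW_THRESHOLD else "allow"
```

For example, under these assumptions a call with a strong voice match of 0.95 is allowed when every other signal fits the customer, but is flagged for review when it comes from an unknown phone in an unexpected location, as might happen with a cloned voice.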