IS VOICE BIOMETRICS SUFFICIENT AS THE UK’S COMMONPLACE AUTHENTICATOR?

Published by Gbaf News

Posted on January 5, 2017

Last updated: January 22, 2026

Matt Peachey, VP & GM International, Pindrop

When it comes to accessing your own data, two things are paramount: security and an easy user experience. Voice command is an interface that’s seeping into more and more everyday consumer technology. This extends to the global banking and finance sector, where voice is widely used for authentication. This is primarily because voice requires no separate security interface, which results in a simpler system and an improved user experience. However, for voice biometrics to be trusted and more widely used, it must be seamless, robust and secure.

Seamless

It’s unfortunate that a large proportion of security systems tend to be obtrusive. Take airport screening lines, for example, which must be navigated in order to board a flight. It’s an added frustration because the vast majority of passengers are harmless but have to pay the security toll so that the few malicious ones can be detected. Perhaps the biggest promise of biometrics is seamless security. This is especially true of voice biometrics, since voice command is increasingly becoming an embedded part of consumer life.

Voice biometric systems require users to go through an enrolment process so that the system can effectively ‘learn’ the unique ‘thumbprint’ of their voice. Enrolment can be active or passive. Active enrolment must be used with text-dependent systems. This is a manual process in which the user is required to speak an agreed-upon phrase. The phrase must be spoken not only at enrolment but also every time the user is authenticated. Therefore, text-dependent systems fail to make security a seamless experience.

Passive enrolment, on the other hand, occurs during the user’s normal interaction with the system. It is used by text-independent voice biometric systems. Rather than requiring a specific phrase, these systems learn the user’s voice during normal speech. As a result, text-independent systems are truly unobtrusive because they’re almost completely invisible to the user.

Robust

In order to be operable and effective, voice biometrics must be resistant to changes in noise, such as background noise interrupting a phone call. This could be anything a microphone picks up, from the ping of a microwave or the whir of an air conditioner to a radio or children talking. Voice biometric systems must also adapt to the different channels users choose to access them on, and to the changes in a user’s accent that naturally occur over time. If a voice biometric system is not robust to any of these potential complications, enrolment or authentication may fail.

Voice biometric systems should also be robust to whichever communication channel the consumer chooses to use. For example, it should be possible to enrol a user calling from their landline and authenticate them on another occasion when they call from their mobile phone. Likewise, someone other than the user calling from the user’s device should fail authentication, because their voice is different regardless of the communication channel. Thus, voice biometric systems need to minimise how much they learn about channel artefacts rather than about the voice itself.

Because the human voice changes as we age, voice biometric authentication that cannot recognise and adapt to this will fail. For voice biometric systems to be robust to ageing, they must continually learn and adapt to the user’s ever-changing voice. However, they must be careful not to adapt too easily and learn from the voice of an impostor.
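
The trade-off between adapting to an ageing voice and refusing to learn from an impostor can be illustrated with a small sketch. It assumes a voice is represented as a short list of numbers and that a stored template is only updated when a new sample already matches it closely. The function names, the threshold and the blending rate are hypothetical and chosen purely for illustration; they do not describe any particular product.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def adapt_template(template, sample, update_threshold=0.9, rate=0.1):
    """Blend a new sample into the stored template only if it matches closely.

    update_threshold and rate are illustrative values, not recommendations.
    Returns the (possibly updated) template and whether an update happened.
    """
    if cosine_similarity(template, sample) < update_threshold:
        # Too dissimilar: this could be an impostor, so do not learn from it.
        return template, False
    blended = [(1 - rate) * t + rate * s for t, s in zip(template, sample)]
    return blended, True

template = [1.0, 0.0, 0.0]
slightly_aged = [0.95, 0.10, 0.0]  # same speaker, voice has drifted a little
impostor = [0.0, 1.0, 0.0]         # a very different voice

template, updated = adapt_template(template, slightly_aged)
print(updated)  # True: the template moves slightly towards the new sample

template, updated = adapt_template(template, impostor)
print(updated)  # False: the template is left unchanged
```

A higher threshold makes the system more cautious about learning from impostors but slower to follow genuine changes in the user’s voice.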

Secure

At the forefront of consumers’ minds will be what to do if their voice is compromised. The beauty of this type of authentication is that, unlike passwords, your voice cannot be changed. However, one way an attacker could steal a user’s voice would be to use a recording. An attacker could record the user directly, but websites like YouTube also host recorded audio that could be used for this purpose. In the case of a bank call centre, the attacker could simply replay part of the audio at the beginning of the call during authentication, for example during IVR navigation, and then speak with their normal voice when connected to an agent.

An attacker could also use speech synthesis to impersonate a user. Given enough audio, modern systems can build a voice that sounds very similar to the person being modelled. This cloned voice could be used not only during authentication but also to carry on a conversation with the agent or system, potentially accessing or compromising more sensitive data.

Lastly, the attributes that are extracted from the user’s voice for enrolment and authentication can be stolen. They typically consist of a list of floating-point numbers, calculated when a user’s speech is analysed. If the attributes are extracted on a user’s device, for instance, an attacker could steal them, for example through a compromised mobile phone, and then inject them into a new session. The stolen attributes would then be used for authentication instead of those derived from the attacker’s voice.
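
To make the injection risk concrete, here is a minimal sketch that assumes the voice attributes are a short list of floating-point numbers and that the server authenticates by comparing a submitted list against the enrolled one using cosine similarity. The vectors, the threshold and the `authenticate` function are hypothetical; real systems use far longer vectors and their own scoring methods.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def authenticate(enrolled, submitted, threshold=0.8):
    """Accept the caller if the submitted attributes are close to the enrolled ones."""
    return cosine_similarity(enrolled, submitted) >= threshold

# Hypothetical attribute vectors (illustrative values only).
enrolled_user = [0.12, -0.48, 0.33, 0.71]
attacker_voice = [-0.40, 0.22, 0.65, -0.10]

# Attributes copied from a compromised device are identical to the user's own.
stolen_attributes = list(enrolled_user)

print(authenticate(enrolled_user, attacker_voice))     # False
print(authenticate(enrolled_user, stolen_attributes))  # True
```

The comparison only ever sees numbers, so it cannot tell whether they were computed from live speech or replayed from a stolen copy. That is why the attributes themselves need protecting in transit and at rest.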

Above all, for voice biometrics to be trusted and widely adopted, it must be seamless, robust and secure. However, when introducing new technologies, it is essential that all angles are covered. Without multiple layers of security covering all channels, from face-to-face verification to online and telephone interactions, fraudsters are able to exploit particular points of exposure. For instance, biometrics is one way to solve the credential recovery issue, but it cannot detect fraud on its own.

Phoneprinting™ is a more advanced technology that builds upon some of the security validation foundations of voice biometrics but introduces multi-factor authentication. This solution identifies specific characteristics of each call, such as the location the call is coming from, the device, whether it’s a mobile or landline, and whether the phone has been used to call the company before. Combined, these signals can aid in detecting fraudulent activity before it becomes an issue.
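
The general idea of combining several independent call signals can be sketched as follows. This is a toy illustration only and is not how Phoneprinting™ works internally: the `CallSignals` fields, the scoring rule and the example values are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class CallSignals:
    """Hypothetical yes/no checks gathered for a single incoming call."""
    location_matches_profile: bool
    device_type_matches_profile: bool  # e.g. mobile vs landline, as expected
    number_seen_before: bool
    voice_match: bool

def risk_score(signals: CallSignals) -> int:
    """Count how many independent checks failed (0 means lowest risk)."""
    checks = [
        signals.location_matches_profile,
        signals.device_type_matches_profile,
        signals.number_seen_before,
        signals.voice_match,
    ]
    return sum(1 for passed in checks if not passed)

# A call where the voice matches but every other signal looks wrong,
# as might happen with a replayed recording or a cloned voice.
suspicious_call = CallSignals(
    location_matches_profile=False,
    device_type_matches_profile=False,
    number_seen_before=False,
    voice_match=True,
)
print(risk_score(suspicious_call))  # 3
```

In this sketch a matching voice lowers the score by only one point, so a call can still be flagged for review when the other factors disagree.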
