26th November 2008

Paul Weald shows us how the customer experience, application design and technology implementation all need to be aligned for speech self-service to be accepted.
We all understand that speech recognition can have a dramatic impact on call centre resourcing needs. Consider an operation with an average call duration of 5 minutes, where the first part of each call is spent verifying the caller's identity. This is done by collecting structured information such as name, address, postcode and date of birth.
This verification data is well suited to automation by a speech application, and the same process applies across many different types of call. Once the purpose of the call has been identified, it can be routed as appropriate. If such an application saved just 30 seconds of the agent's call-handling time, resource needs would fall by 10%, as the rough calculation below illustrates.
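A back-of-the-envelope model makes the arithmetic explicit. Only the 5-minute handle time and the 30-second saving come from the example above; the daily call volume is an invented figure for illustration.

```python
# Rough sizing model: saving 30 seconds of a 5-minute call cuts agent
# workload by 10%. The daily call volume is an invented illustration.

calls_per_day = 10_000                 # assumed volume, for illustration
avg_handle_time_s = 5 * 60             # 5-minute average call duration
saving_per_call_s = 30                 # time saved by automating verification

workload_before = calls_per_day * avg_handle_time_s
workload_after = calls_per_day * (avg_handle_time_s - saving_per_call_s)

print(f"Resource reduction: {1 - workload_after / workload_before:.0%}")
# -> Resource reduction: 10%
```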
So why haven’t speech applications taken off?
Customer impact
The key question to ask is whether the customer will choose to use the self-service application as a genuine alternative to talking to an agent. Failings in the self-service customer experience can arise from a number of sources.
A good reference point for understanding the customer experience of telephone self-service is to benchmark the telephone activity against an equivalent online process. Identify equivalent KPIs – such as conversion rate and session time – and then compare performance both online and through telephone self-service.
Application design
Studies have shown that customers can only easily navigate a menu with three options; above that number, confusion begins to set in. IVR designers therefore have to limit the number of services offered to customers, otherwise the customer experience degrades. Offering more than two layers of menu will end up confusing the customer – they may find themselves several layers down and realise they have taken the wrong route; it’s a bit like a maze. Speech recognition applications instead model the different ways customers request the services you offer, so rather than multiple layers you need only one, leaving customers free to ask directly for what they want: “I’d like to check my balance”, “Could you send me a new PIN”, and so on.
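To make the one-layer idea concrete, here is a minimal routing sketch. It assumes the recogniser has already returned a text transcript; the keyword rules and service names are invented for illustration, and a production system would use a tuned statistical grammar rather than simple keyword matching.

```python
# Minimal sketch: route a recognised utterance straight to a service,
# replacing several layers of menus with one open question.
# Keyword rules and service names are invented for illustration.

ROUTES = {
    "balance": "account_balance",
    "pin": "pin_reissue",
    "statement": "send_statement",
}

def route(utterance: str) -> str:
    words = utterance.lower().split()
    for keyword, service in ROUTES.items():
        if keyword in words:
            return service
    return "agent_transfer"   # no confident match: hand over to a person

print(route("I'd like to check my balance"))   # -> account_balance
print(route("Could you send me a new PIN"))    # -> pin_reissue
```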
With telephone self-service, the usual test of recognition accuracy is that better than 95% of users complete their transactions, or convey their information accurately. For example, consider an application that needs to identify a user’s postal address. The usual approach in a call centre is for an agent to collect the postcode and house number from the caller, and then use postcode look-up software to find the full address. Speech recognition applications can increase the accuracy of their postcode recognition by asking the caller a supplementary question, such as their road name. The automated application can then combine its interpretations of the alphanumeric postcode (saying “RG40”, for instance, might variously be interpreted by the technology as “RG40”, “RT40” or “RG14”) with a road name (say, “Castle Road”) to produce a unique match against one of the three candidate postcodes. The system would then respond with something like, “I think your address is 17 Castle Road, Wokingham. Is that correct?”
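The disambiguation step amounts to intersecting the recogniser's candidate postcodes with the addresses that actually contain the stated road. A minimal sketch, assuming a lookup table stands in for real postcode look-up software (the entries are invented):

```python
# Sketch: disambiguate a list of postcode hypotheses using the caller's
# road name. The table stands in for real postcode look-up software;
# its entries are invented for illustration.

POSTCODE_ROADS = {
    "RG40": ["Castle Road", "London Road"],
    "RT40": ["Mill Lane"],
    "RG14": ["Station Road"],
}

def match_postcode(hypotheses: list[str], road: str) -> str | None:
    candidates = [pc for pc in hypotheses
                  if road in POSTCODE_ROADS.get(pc, [])]
    # Accept only a unique match; otherwise re-prompt or pass to an agent.
    return candidates[0] if len(candidates) == 1 else None

print(match_postcode(["RG40", "RT40", "RG14"], "Castle Road"))  # -> RG40
```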
One other subtle factor is for the application to train the user to provide their information consistently. Consider a financial application that has to capture fields with a nil amount. Think for a moment about all the different ways a caller could describe this – “nil”, “none”, “zero”, “naught”, “nothing”, “not applicable”, and so on. The best way to drive up the accuracy of the application is not to invest time and money in developing it to accept such a range of inputs, but rather to train the user in how to state the ‘nil’ input.
One client that we worked with solved this problem by playing the caller the following message as part of the introduction to the application: “When you tell us an amount of money we need you to say numbers naturally, for example, two hundred and fifty pounds. If the amount to give is 0, say zero. After you have given us an amount of money we will repeat it back to you. If we have got it wrong just say NO.” A very high level of accuracy resulted, to the mutual benefit of both the caller and the application designers!
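A minimal sketch of the resulting narrow grammar and readback loop, assuming the recogniser returns amounts as digit strings; the function names are hypothetical:

```python
# Sketch of the narrow-grammar approach described above: the prompt
# trains callers to say "zero" for a nil amount, so the application need
# only accept that one word plus natural numbers. Assumes the recogniser
# returns amounts as digit strings, e.g. "250".

def parse_amount(utterance: str) -> int | None:
    text = utterance.strip().lower()
    if text == "zero":      # the single trained form of a nil amount
        return 0
    try:
        return int(text)    # e.g. "250" for two hundred and fifty pounds
    except ValueError:
        return None         # out of grammar: re-prompt the caller

def readback_accepted(amount: int, reply: str) -> bool:
    # Repeat the amount back; the caller only has to say NO to correct it.
    print(f"That was {amount} pounds. If we have got it wrong, just say NO.")
    return reply.strip().lower() != "no"
```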
The technology doesn’t work!
Let’s break this issue down into its component parts.
With accents playing such an important role in understanding the speaker, it is important to select the right recognition engine for the task in hand and, if necessary, to ensure that the speech application developer tunes the system to take account of the regional spread of your customers. This is achieved by sampling literally thousands of conversations. It is a task for experts, so be sure to select a provider with this depth of experience.

Also, if the user is struggling to complete the task at hand, it is important that they can transfer quickly to an agent. The technology solution should inform the agent (through either a CTI screen pop or an ‘agent whisper’ feature) of the steps the user has taken within the application. This is particularly important if the user’s identity has already been verified by the application. Nothing is more frustrating for the customer than having to explain to the agent both who they are and what they have ‘failed’ to do within the self-service application.
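To illustrate, here is a sketch of the kind of context a CTI screen pop might attach to the transferred call; the field names are hypothetical rather than taken from any particular CTI product.

```python
# Sketch of the context a CTI screen pop could hand to the agent desktop,
# so the caller never has to repeat who they are or what they tried.
# Field names are illustrative, not from any particular CTI product.

import json

screen_pop = {
    "caller_verified": True,
    "caller_name": "J. Smith",
    "account_ref": "12345678",
    "steps_completed": ["identity_verification", "balance_enquiry"],
    "failed_step": "pin_reissue",   # where self-service broke down
}

# Serialised and attached to the call as it is transferred to the agent.
print(json.dumps(screen_pop, indent=2))
```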
The appropriate role for self-service
The good news is that there are some well-proven areas where self-service really does overcome these issues.
Such alignment of customer experience, application design and technology is, like a lunar eclipse, not yet an everyday occurrence – which perhaps explains why so few organisations are enjoying the cost-saving benefits that automated speech technology can offer.
Paul Weald is director of the consultancy RXPerience Limited