Concentrix and the Case For Machine Learning and Tacit Domain Knowledge (#MachineLearning #Spark #Hadoop)

The initial story ran on 20th February in the Independent newspaper, “People in need at risk of losing tax credits after being wrongly accused of cheating“(1). Now a story like this is going to an emotive issue at the best of times as it involves low incomes and the potential removal of money.

What it also illustrates is two common problems in process. The lack of tacit domain knowledge and a lack of refined process.

hmrc

The Case For Tacit Domain Knowledge

First of all let me be clear, I’m not aware of how Concentrix currently do this decision making. So I’ll guess my way around it from what the headlines are saying.

One of the arguments from staff at Concentrix (allegedly I may add) is:

“Staff at Concentrix’s office in Belfast, where the contract is based, have told The Independent that they haven’t been given enough training to differentiate between genuine claims for tax credits and fraudulent ones.”

Tacit domain knowledge is, according to (Blandford & Rugg, 2002) is, ‘knowledge which is not accessible to introspection via any elicitation technique.’ And this mirrors what the staff are saying in not so many words, it boils down to one single issue, “experience”.

“The reason for not being able to gain easy access to this deep level of knowledge is because it is what we humans call ‘experience’, something which we gain through time and exposure to different environments and situations. It is precisely the experience factor that creates experts in certain fields around specific subjects or subject matter.”(2)

Training doesn’t lead to experience it only acts as a baseline to what should and should not happen. With something as complex as tax credit claims then tacit domain knowledge is going to be key to the decision making process. So at the start of the project, before staff get to use any system, the domain experts should be able to define the process and the expected decisions any system should be making. With something like the tax credit system there are multiple sources of data on which a decision is made.

Even with tacit domain knowledge not every case handled is going to pass through a system without error. Some will be needed to be routed to a domain expert for analysis for a final yes/no on the claimant. The key to issues like this is how the knowledge is put back into the system to enable the team to make better decisions in the future.

A Case For Machine Learning?

Well I believe there is here, what I don’t believe though that it’s a black and white solution. Here machine learning should aid the process and make a recommendation on the final decision, ultimately though it’s up to a qualified employee voice to make the final call based on the data presented.

Since the introduction of tax credits in April 2003 there will be a trail of data and decisions, therefore there is historical data to train a system to make decisions. Like I say, this sort of system shouldn’t be making the final decision but merely aiding those who do based on previous cases.

With this sort of scenario it’s going to based on claimant documents so scanning and text mining is going to be playing a key part. Couple this with a decision trees and you have the basis for a decision making process.

The most important part of the mix though is not to have the system guess when it can’t decide but rather making the domain expert decide and thus enabling the system to learn from the new experience. This is important especially in borderline cases where the final decision is not clear.

The advantage of using machine learning in the training stage, along with a domain expert, is that at a guess 66% of claims would fall within the average (i.e. 1 standard deviation from the mean) it’s the outlier cases that take time on analysis.

Human Factors

Let’s not beat around the bush here. This is call centre staff being put under pressure to perform.

“They also say they are being encouraged to hit a target of making 20 decisions a day, or about three an hour, on whether to stop, amend or leave a tax claim unchanged.”

Automation can help in such cases and a trained system can certainly lead to better decision making once the machine learning training has been performed and evaluated.

Balancing the Cost Of Machine Learning

Machine Learning systems don’t come cheap and they also take time to develop, train and refine before being let out in the real world. The implementation of process also takes time and money to implement. The costs should only strengthen the benefits of the final process. I say all this but there is a “but”…

“The company is not paid on the number of letters issued, but on the basis of savings to public finances arising from correcting tax credits claims that are incorrect.”

It seems to me though that designing a system that would decide based on prior evidence and training may not work to the advantage of Concentrix (or any other company doing this work under these contractual measures) as the payment is based on money saved to the public purse. It would make sense for such a system NOT to exist in this case, this is a real shame.

Summary

There is a strong case here for a process driven algorithmic system to deliver aid in the final decision making process. Removing part of the analytical side away from staff means they can spend time with the customer, final decisions that require more analysis then are sent to a domain expert, the knowledge gained from that case is then fed back into the system for learning. The more cases the better the learning patterns.

Issues do arise though when the bottom line is based on money saved and not the amount of customers processed. This leads to rules being relaxed in the favour of the service company (though there is no evidence for that here may I add).

References

1 – People in need at risk of losing tax credits after being wrongly accused of cheating – The Independent 20th February 2015 – http://www.independent.co.uk/news/uk/home-news/people-in-need-at-risk-of-losing-tax-credits-after-being-wrongly-accused-of-cheating-10060745.html

2 – “Towards a Methodology to Elicit Tacit Domain Knowledge From Users” – http://www.ijikm.org/Volume2/IJIKMv2p179-193Friedrich328.pdf – Wernher R. Friedrich and John A. van der Poll

 


Jason Bell is a Data/Hadoop consultant based in Northern Ireland but helps companies globally with various BigData, Hadoop and Spark projects. He also offers training on Hadoop, the Hadoop Ecosystem and Spark to developers and anyone interested in what these technologies can do. He’s also the author of “Machine Learning – Hands On For Developers and Technical Professionals“.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: