Understand What Your Data Is Saying
Data is everywhere in today’s world. More of it becomes available every day, and ignoring it means falling behind the curve and ultimately getting overtaken by competitors who use it more strategically. Every dataset and every model embeds certain assumptions, and exposing those assumptions leads to better decisions. That work starts with a clear recognition of what exactly our data is saying.
Why Understanding Matters
I worked for six years in auto insurance pricing, an industry that is about as data-driven as it gets. From my standpoint, there are three glaring reasons to fully understand what is under the proverbial hood before making decisions based on analytics.
- It is the right thing to do. Companies have a moral and social responsibility to treat people fairly, and they cannot meet that responsibility without knowing whether people are in fact being treated fairly. Whenever data is collected and people’s livelihoods are affected, the way that data is used must be fully understood.
- It is the right business move. Information breeds learning, which in turn breeds wiser, more prudent actions than those taken without it. Successful organizations strive above all else to understand the climate in which they operate in order to gain a competitive edge. Information, though, is only useful if it is accurate. Data biases and black-box models obscure the truth underneath the data and, ethical considerations aside, lead business conclusions astray. Note the tie between this and the first point: ethical practices are good business practices.
- There are legal considerations, too. Auto insurance is highly regulated, so any action we took that affected the price someone was charged had to be fully explainable. Similarly, as data collection and usage expand, corresponding regulations are beginning to emerge. In 2016 the EU adopted its General Data Protection Regulation (GDPR), establishing that companies have a responsibility to be thoughtful and careful about how they collect data, how they use it, and how they respond when there is a data breach. According to the GDPR itself, the European Commission (the executive branch of the EU) may “provide mutual assistance in the enforcement of legislation” with other countries, meaning the regulation’s reach could extend beyond Europe.
The points above outline the harmful consequences of misleading data, but what are the warning signs that there may be a problem? In the insurance world, we tested our models by running datasets through them that deliberately included particular representations of race, gender, income and education levels, zip codes, and so on, to see how the model handled those characteristics. Even if there was no malicious intent when a model was built, underlying bias in the data it was trained on can lead to problematic outputs. As noted earlier, the goal is to understand the story the data is telling so that we can make the right ethical and business decisions.
The process can also start before any new data is collected. Seeking out diverse perspectives helps ensure you are getting a representative sample. Iterative testing and continual validation are key to exposing flawed assumptions early, so you can correct and fine-tune as you go.
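The kind of test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of any production test: `toy_premium` is a hypothetical stand-in for a real pricing model, its rates are made up, and the profile fields (`age`, `prior_claims`, `zip_code`, `gender`) are assumptions chosen for the example. The sketch builds a synthetic grid of driver profiles, then swaps one characteristic at a time while holding everything else fixed and records how much the price moves (a counterfactual test).

```python
from itertools import product
from statistics import mean


# Hypothetical stand-in for a pricing model. It ignores "gender"
# but does use "zip_code". All numbers are made up for illustration.
def toy_premium(driver: dict) -> float:
    price = 500.0 + 40.0 * driver["prior_claims"]
    if driver["age"] < 25:
        price *= 1.5
    territory = {"10001": 1.20, "60614": 1.00, "94110": 1.10}  # hypothetical factors
    return round(price * territory[driver["zip_code"]], 2)


def counterfactual_gaps(model, profiles, attribute, values):
    """For each profile, swap `attribute` through every value in `values`
    (all other fields unchanged) and record the largest price difference."""
    gaps = []
    for profile in profiles:
        prices = [model({**profile, attribute: v}) for v in values]
        gaps.append(max(prices) - min(prices))
    return gaps


def group_means(model, profiles, attribute):
    """Average model output for each observed value of `attribute`."""
    groups = {}
    for profile in profiles:
        groups.setdefault(profile[attribute], []).append(model(profile))
    return {key: round(mean(prices), 2) for key, prices in groups.items()}


# Synthetic test grid: every combination of the characteristics being probed.
profiles = [
    {"age": age, "prior_claims": claims, "zip_code": zip_code, "gender": gender}
    for age, claims, zip_code, gender in product(
        [22, 40, 65], [0, 2], ["10001", "60614", "94110"], ["F", "M"]
    )
]

gender_gaps = counterfactual_gaps(toy_premium, profiles, "gender", ["F", "M"])
zip_gaps = counterfactual_gaps(
    toy_premium, profiles, "zip_code", ["10001", "60614", "94110"]
)

print("max price change from flipping gender:", max(gender_gaps))
print("max price change from changing zip code:", max(zip_gaps))
print("mean premium by zip code:", group_means(toy_premium, profiles, "zip_code"))
```

A gap of zero for gender only shows that the model does not use that field directly. The nonzero zip code gap is the more interesting signal: zip code can act as a proxy for characteristics such as race or income, so a real audit would also compare outputs against the actual demographic makeup of each area, which this toy grid does not capture.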
For more on where data usage can go awry, take a look at Ethical OS’s 8 Risk Zones.
Investigating and identifying data biases and ethical concerns is really only half the battle; the question of what to do about them remains. The answer depends on context. Sometimes, when data is used only for informational purposes, it is enough just to understand it. In other cases, there may be a legal or moral obligation to adjust a model. This is not only about avoiding risk or illegality, though; a model that corrects for bias gives a truer picture of what the available data is actually saying.
Have additional thoughts on data and understanding its implications? Let us know!