Data is everywhere in today’s world. More of it becomes available all the time, and to ignore it is to fall behind and ultimately be overtaken by competitors who use it more strategically. Every dataset and every model embeds certain assumptions, so exposing those assumptions is valuable because it allows for better decisions. In any event, the journey starts with a clear recognition of what exactly our data is saying.
I worked for six years in auto insurance pricing, an industry that is about as data-driven as it gets. From my standpoint, there are three glaring reasons to fully understand what is under the proverbial hood when making decisions based on analytics.
The points above outline the harmful consequences of misleading data, but what exactly are the warning signs that there may be a problem? In the insurance world, we would test our models by running datasets through them that deliberately contained particular mixes of race, gender, income and education levels, zip codes, and so on, to find out how the model handled those characteristics. Even when a model is built with no malicious intent, underlying bias in the data on which it is trained can lead to problematic outputs. As noted earlier, the goal is to understand what story the data is telling in order to make the right ethical and business decisions.
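To make that kind of test concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `toy_premium_model` is a made-up stand-in for a real pricing model, the zip codes `00001` and `00002` are placeholders, and the helper names are hypothetical. The idea is simply to feed the model records that are identical except for one characteristic and compare the average output per group.

```python
from statistics import mean


def toy_premium_model(record):
    """Hypothetical stand-in for a pricing model; returns an annual premium."""
    # Prior claims raise the premium (a conventional rating factor).
    premium = 500.0 + 150.0 * record["prior_claims"]
    # Placeholder zip code surcharge; zip code can act as a proxy
    # for protected characteristics, which is what the probe looks for.
    if record["zip_code"] == "00001":
        premium *= 1.20
    return premium


def mean_output_by_group(model, records, attribute):
    """Average model output for each value of `attribute`."""
    groups = {}
    for record in records:
        groups.setdefault(record[attribute], []).append(model(record))
    return {value: mean(outputs) for value, outputs in groups.items()}


def max_disparity_ratio(group_means):
    """Ratio of the highest group average to the lowest."""
    return max(group_means.values()) / min(group_means.values())


# Probe dataset: identical claim histories, only the zip code varies.
probe = [
    {"prior_claims": claims, "zip_code": zip_code}
    for zip_code in ("00001", "00002")
    for claims in (0, 1, 2)
]

by_zip = mean_output_by_group(toy_premium_model, probe, "zip_code")
ratio = max_disparity_ratio(by_zip)
print(by_zip, round(ratio, 2))
```

A ratio well above 1 on a probe like this does not prove the model is unfair, but it does show that the characteristic is driving outputs and deserves a closer look.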
The process can also start before any new data is collected. Seeking out diverse perspectives helps ensure you are getting a representative sample. Iterative testing and continual validation are key to exposing flawed assumptions early, so you can correct and fine-tune as you go.
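One simple way to check whether a sample is representative is to compare each group's share of the sample with its share of a reference population. The sketch below assumes you have such reference shares available; the `region` attribute, the sample rows, the 10-point tolerance, and the function names are all hypothetical choices for illustration.

```python
from collections import Counter


def representation_gaps(sample, attribute, reference_shares):
    """Sample share minus reference share for each category of `attribute`."""
    counts = Counter(row[attribute] for row in sample)
    total = sum(counts.values())
    return {
        category: counts.get(category, 0) / total - share
        for category, share in reference_shares.items()
    }


def flag_gaps(gaps, tolerance=0.10):
    """Categories whose sample share is off by more than `tolerance`."""
    return sorted(c for c, gap in gaps.items() if abs(gap) > tolerance)


# Made-up sample and reference shares, for illustration only.
sample = [{"region": "urban"}] * 8 + [{"region": "rural"}] * 2
reference = {"urban": 0.5, "rural": 0.5}

gaps = representation_gaps(sample, "region", reference)
flagged = flag_gaps(gaps)
print(flagged)
```

Rerunning a check like this each time new data arrives is one lightweight form of the continual validation described above.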
For more on where data usage can go awry, take a look at Ethical OS’s 8 Risk Zones.
Investigating and identifying data biases and ethical concerns is really only half the battle; the question of what to do about them remains. The answer depends on context. Sometimes, when data is only being used for informational purposes, it is enough just to understand it. In other cases, there may be a legal or moral obligation to adjust a model. This is not just about risk or illegality, though; a more accurate model will give a more accurate picture of the available data.
Have additional thoughts on data and understanding its implications? Let us know!