On April 19th, 2010, IATA, the chief trade group representing airlines, issued a press release which states in part: “IATA criticized Europe’s unique methodology of closing airspace based on theoretical modeling of the ash cloud. ‘This means that governments have not taken their responsibility to make clear decisions based on facts …’ said [Giovanni] Bisignani [IATA’s Director General and CEO].”
This statement leaves the unfortunate impression that decisions based on theoretical models are somehow suspect and of little value. The better, but less soundbite-friendly, question should have been whether the model adequately predicts the risk. To make useful predictions, a model needs to do two things:
account for all key factors that influence the outcome
quantify how each factor influences the outcome.
To the frustration of passengers, airlines and businesses around the world, the eruption of the Eyjafjallajökull volcano in Iceland presented several challenges for modelers. Some of the key factors playing a role here include:
Ash composition: how damaging is it?
Extent of the cloud: how high, how far and where?
Aircraft capability: how much ash exposure can airplanes handle?
The costs of staying grounded vs. the costs of catastrophic failure
These four bullets of course represent just the tip of the iceberg: each of them summarizes a long list of related factors. For example, while current data is unavailable, scientists have to rely on past experience when estimating the size and weight of ash particles, their chemical make-up and the density of the ash. To confound matters, predictions regarding wind speed, direction and turbulence need to be considered as well.
So the real question needs to be: how well does the model represent the current situation? How well do we understand what the actual key factors are? How well can we measure them? Are we relying purely on past insight or can we refine our knowledge with data from the current situation? To what extent can current satellite images, air samples and meteorological measurements improve our ability to predict the risk to life and well-being of people and property?
This interactive map on the BBC web site shows one example of how output from such modeling looks by mapping the extent of the plume over several days along with normal flight routes across the Atlantic. This post in Business Week sheds some light on our spotty knowledge regarding the real risks of volcanic ash.
The other critical question revolves around acceptable risk. Risk not only originates from ash clouds. It also comes from “playing it safe.” Billions of dollars in lost revenues and productivity put the livelihood of hundreds of thousands of people at risk. Every day we accept the risks of driving our car – so at what point do ash clouds represent a higher risk than driving a car? Even if we could come up with an exact numeric value for these risks, how does the value of human life fit into these equations?
Therein lies the apparent disconnect between statistical models and real life: intangible values sometimes outweigh what can be measured. The decisions we make depend on how we actually perceive the risk. Yet, in order to put our perceptions into perspective, we need to have good numbers to guide us – and getting good numbers requires a good model of reality.
So, before we start talking about lawsuits, we need to accept that risk is inherent to anything we do. Blaming people for doing the best they can to balance public safety with economic considerations wastes resources that would be better spent on improving our ability to assess and manage risks. We send unmanned drones to gather combat intelligence; why not modify them to collect air samples? Why not fund research to create better models for volcanic plumes, especially if history repeats itself and Eyjafjallajökull continues to sputter for the next year or two?
From Eruptions, a blog dedicated to volcanism:
Airlines lobby to reopen European airspace closed by Eyjafjallajökull
Posted on: April 18, 2010 2:30 PM, by Erik Klemetti
Today’s shrinking resources may tempt us into rushing things along, yet we need to be careful when relying on graphics to make decisions. Good graphics make their point more quickly than a wall of text. On the other hand, poor graphics easily create the wrong impression. Distinguishing between the two is not as easy as one might think.
Data analysis tools have advanced to the point where seemingly anyone with basic computer skills can develop meaningful insights. As with any tool, however, operator skill determines the ultimate outcome. Anyone can create graphs with trend lines, but knowing how to graph data properly requires skill.
For example, the graphic below appeared in a recent blog post about the impact of Health Care Reform on President Obama’s popularity. While the trend lines look impressive, this graph misleads the reader.
Four scatter graphs contrasting public opinion about Obama with public opinion about Health Care Reform
It is true that statisticians use scatter plots to show the relationship between two variables, but in this case a third variable plays an overriding role. This third variable is Time. Public opinions shift over time, depending on the headlines and the proposed changes in legislation. A scatter plot cannot take this into consideration. A timeline chart such as the one below proves to be more informative: it shows trends over time and provides possible explanations for shifts in public opinion.
Annotated timeline chart showing public opinion over time
Think about the many graphics we consume on a daily basis, whether in business meetings or in the media. What do we really know about the skill and motivation of the author behind the graphic? If a graphic supports an opinion we already hold, we may never question it – even if our opinion deserves questioning!
Herein lies the Catch-22: during times of change we have even less time than usual to deal with the intricacies of data analysis. Yet precisely at those times we need to question our assumptions, adapt to new realities and update our opinions. One way of solving this dilemma: delegate data analysis to people with the appropriate experience and skills.
How often have we seen a graph in an opinion piece without knowing how it was created, but somewhere in the back of our mind we suspected that it was tweaked somehow to make a point? How can we ferret out “creative analytics” from the true story? Remember Mark Twain’s famous quote about “… lies, damned lies and statistics.” It is much more difficult to identify “lies” when we cannot inspect the data behind them.
By necessity, we always make choices about how to present data. After all, we *are* trying to make a point when we share information. But even if we do not intend to spin the message, we may be unable to see the whole story until someone else adds their insight. By making our data available for download, we can level the debating field somewhat and hopefully reach better informed conclusions.
Whether by accident or by design, one way to spin the message involves the use of data ranges. In the example below, we have divided US obesity rates into three different sets of ranges. The first set uses intervals of 11, the second uses intervals of 10 and the last uses intervals of 5.
Look at the graphs about soda taxes in vending machines and see how each graph may lead to a different conclusion about obesity and vending machine soda taxes. Then take a look at the graphs for the other taxes and notice how those graphs support similar conclusions regardless of the range size.
When deciding how to present information we have to balance “information overload” with the need to present important details. Which graph we choose ultimately depends on the point we are trying to make. Some might call that spin, others call it effective communication. If we are the audience, we need to be skeptical and ask questions.
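The effect of the interval width is easy to reproduce. Here is a minimal sketch with made-up state names and rates (none of these values come from the actual data set), showing how the same three rates land in different groups depending on the bin width:

```python
# Illustrative sketch: the same hypothetical obesity rates grouped
# with different interval widths produce different-looking buckets.
def bin_label(rate, width, start=10):
    """Assign a rate to a range label such as '20-29' for width 10."""
    lo = start + ((rate - start) // width) * width
    return f"{lo:g}-{lo + width - 1:g}"

rates = {"State A": 24.8, "State B": 30.1, "State C": 35.2}  # hypothetical
for width in (11, 10, 5):
    groups = {state: bin_label(r, width) for state, r in rates.items()}
    print(f"width {width}: {groups}")
```

With a width of 11, States A and B share a bucket; with a width of 10, States B and C do; with a width of 5, all three separate. Which grouping "tells the truth" depends entirely on the point being made.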
The viz below uses Tableau Public and allows visitors to explore differences in food consumption and food/soda taxes based on obesity rates in the US. Notice the dramatic differences in soft drink consumption among populations with different obesity rates. At first glance, it appears that cutting back on sodas is our best bet for reducing waistlines!
By selecting different obesity ranges, one can observe that soda taxes tend to be low in states with high obesity rates. Just check out Mississippi, Alabama and South Carolina. Low soda taxes also occur in states with low obesity rates, so it would be premature to conclude that soda taxes are a good way to reduce obesity. Another interesting observation about populations with the highest obesity rates: general food taxes are high while soda taxes are low. Is this encouraging soda purchases over food?
Of course there may be issues with the underlying data as I explain here. I do not know this data well enough to draw definitive conclusions from it. Besides, this post is more about illustrating how we can move beyond static charts & graphs when discussing issues on the web. By the way, using the icons at the bottom of this viz, you can download this data and even change the cursor behavior so it will zoom in on a particular area on the map.
beware of borders and shading – they may look very different in a web browser
preview your viz & make sure everything works as planned
keep it simple and have fun!
If you are using a blog with a theme, be sure you know how much display space your theme allows. If you use an HTML editor like the one in WordPress, make sure to paste Tableau’s HTML code into the HTML section of the editor, not the visual editor. I found it best to add the Tableau HTML at very end before publishing/updating the post.
Happy authoring and exploring. Please shoot me a note with your comments.
Often we have to work with data without knowing all the details of how it was collected and processed. In those situations we first need to determine what information the data contains and what it can and cannot tell us. We need to ask questions of the data and determine whether it makes sense, given what we already know. To home in on the time-saving questions it helps to be a subject matter expert. But even if we are unfamiliar with the subject area, we can start by inspecting the different pieces of data to see how everything fits together. Visual analysis tools like Tableau software make that job much easier than it used to be.
Here is an example of how such an exploration may look: we are exploring data about obesity, soda consumption and sales taxes on soda. We are told this data came from the US Department of Agriculture and a quick look reveals that we are looking at county level data. As one might expect, a scatter plot reveals a strong relationship between rising soda consumption and increased obesity.
Adult obesity rates increase as soda consumption increases
Now we get to the real questions: do sales taxes on soda help with lowering obesity rates? What relationship do we see between sales tax rates on soda and obesity? As luck would have it, the data we received also provides two measures about sales taxes for soda: one rate for vending machines and another rate for retail stores.
First we look at the relationship between soda taxes for retail stores versus obesity rates. One might expect that taxes discourage soda consumption and, yes, there appears to be a small downward trend as tax rates increase. Maybe soda taxes actually help with bringing down obesity?
Adult Obesity Rates, Retail Sales Tax Rates and Soda Consumption by US County
Now let’s take a look at sales taxes on soda coming from vending machines. Interesting observation: obesity rates seem to increase slightly as these tax rates increase. Counterintuitive? How do vending machine purchases differ from purchases in a retail store? Are we observing a real relationship here, or is the data fooling us?
Adult Obesity Rates, Vending Machine Sales Tax Rates and Soda Consumption by US County
Before answering these questions, let’s take a closer look at all those data points on the y-axis. Do they really indicate that these counties levy a 0% soda tax? A quick inspection of the underlying data shows that, yes indeed, all records indicate a 0% tax rate. Not a single “null” value among them. However, without knowing how the data was processed, we cannot be sure that “zero” really means “no taxes” – it could also mean “no data.”
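This distinction between “zero” and “no data” is easy to probe once the data is in hand. A minimal sketch with made-up records (the county names, column name and values are all hypothetical):

```python
# A 0 in the tax column could mean "no tax" or "no data" -- only an
# explicit null value lets us tell the two apart.
records = [
    {"county": "A", "retail_tax": 4.0},
    {"county": "B", "retail_tax": 0.0},   # truly untaxed? or unreported?
    {"county": "C", "retail_tax": None},  # unambiguously "no data"
]

zeros = [r["county"] for r in records if r["retail_tax"] == 0.0]
nulls = [r["county"] for r in records if r["retail_tax"] is None]
print("0% rates:", zeros)  # ambiguous without knowing the pipeline
print("nulls:   ", nulls)
```

If an upstream process silently converted nulls to zeros, the ambiguity is baked in before we ever see the file, which is exactly why the question has to go back to whoever processed the data.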
To explore further we start by placing the three graphs side by side. This way we can see more easily what happens when we exclude “zeroes.”
Soft Drink Consumption, Obesity and Soda Sales Taxes
First we exclude “zeroes” for retail sales taxes. Then we’ll do the same with taxes levied on soda in vending machines. The following graphs illustrate this.
Excluding Records with 0% Soda Sales Tax Rate (Retail). The center graph shows the relationship between the remaining retail records and obesity. The trend line still points downward.
Excluding Records with 0% Soda Sales Tax Rate (Vending). The right hand graph shows the relationship between the remaining vending machine records and obesity. We now see a downward trend as tax rates increase.
Wait a minute, though. When we exclude “zeroes” from one set of taxes, all data points for “greater than 0% taxes” disappear from the other graph. In other words, this data indicates that the two types of taxes are mutually exclusive! Hmm, does this even make sense in real life? Why would every US county tax soda either in retail stores or in vending machines but never in both?
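A quick programmatic cross-check makes this kind of pattern explicit instead of leaving it to the eye. A minimal sketch with made-up rows (the rates are hypothetical):

```python
# Cross-check: do nonzero retail and vending taxes ever co-occur?
# Each tuple is (retail_tax, vending_tax) for one county.
rows = [(4.0, 0.0), (0.0, 3.5), (5.5, 0.0), (0.0, 0.0)]  # hypothetical

both_taxed = [r for r in rows if r[0] > 0 and r[1] > 0]
print("counties taxing both channels:", len(both_taxed))
# A count of 0 across the whole file means the two taxes never
# co-occur -- a red flag worth raising with the data's owners.
```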
Without further knowledge about this data we have to reframe our questions and conclusions:
When soda taxes are levied, higher tax rates appear to go hand in hand with decreasing obesity rates
We cannot draw any conclusions about the impact of “no sales taxes” versus “sales taxes”
Before we continue with a detailed analysis, we probably need to ask questions about this data. At first glance it makes little sense that counties levy soda taxes either on vending machines or on retail stores but never on both. Then again, I’m not a tax expert.
Chances are that we will uncover other areas about which we need to ask questions. Instead of taking the scattershot approach to learning about this data, data exploration helps us to develop very specific questions to ask. With specific questions, we stand a better chance of finding the right subject matter experts to consult.
This was a quick example for exploring data about which we knew nothing when we started. To gain new insights, we sometimes need to apply this “beginner’s mind” approach even to data about which we already know a lot. After all, errors can happen, collection and processing systems can change without our knowledge and sometimes we find nuggets that were hidden until we started looking for them. One final thought: the next time your boss or client asks you to hurry up with the analysis, ask these two questions:
What are the consequences of making poor decisions because we rushed through the data exploration?
Do we need to go for more accuracy or is a ballpark analysis good enough at this time?
In order to make profitable decisions, we need good information. Whether we base our decisions on sales, customer perceptions or the number of widgets we shipped last month, our information comes from some system that collects and measures relevant data for us.
In my Six Sigma Black Belt class we recently discussed the challenges of developing a meaningful measurement system. As usual, the theory sounds easy – until it hits the road of reality. A very simple classroom exercise illustrated that point neatly: our instructor had gone through the effort of individually placing twenty M&M candies into twenty numbered plastic bags and then asked us to “accept” or “reject” each M&M based on three criteria. The criteria were written down and no additional verbal cues were given, nor did we have a “master” M&M on which to base our judgment.
We realized very quickly that these criteria were not nearly as clear cut as they appeared to be. For example, one criterion specified that the letter “m” on the candy should be “100% visible.” Sounds clear cut, right? After all, it has a numeric qualifier to help us make our decision! Reality check: have you ever looked at an M&M up close? The next time you do, look for tiny spots where the white ink is thin enough for the underlying color of the candy to bleed through the letter “m.” Question: if the entire outline of the letter “m” appears on the candy but these little flecks of color are bleeding through, does this mean that the “m” is no longer 100% visible?
The graph below shows the result of the M&M exercise. It illustrates just how far apart the judgment of perfectly reasonable people can be when they are asked to interpret someone else’s instructions. The left hand graph shows how much each team agreed with itself after reviewing all 20 candies twice in a row. The right hand graph shows how much each team agreed with an external standard for evaluating the candies. The fact that the two red lines barely line up with each other illustrates just how far apart the two teams were with their assessment of the same group of M&Ms.
M&M Attribute Agreement Analysis - click the picture to enlarge it.
The real issue, of course, has nothing to do with the candy and how it looks. The bigger point lies in something the Six Sigma folks call “operational definitions” and how we use them. The M&M example illustrates just how unpredictable individual judgments can be and how much training and feedback may be required before team members reach similar conclusions – which, in turn, will allow the team to work toward a common goal.
As the M&M example shows, developing operational definitions can be tricky. Definitions may be less clear cut than we think. We have a limited amount of time in which to develop them. In group settings, we also have to figure in personalities and hidden agendas. Good leadership and negotiation skills are needed to keep everyone focused without suppressing critical input. In the world of sales and marketing we have the additional challenge of dealing with missing and incomplete data. While statistical models go a long way toward filling in the picture, they are difficult to explain and are not always accepted by those whose paycheck depends on them or by those whose experience seems to indicate something else.
Some ideas for dealing with all this will be the subject of future posts. For today I simply want to ask these questions: with so many changes in the health care marketplace, how well are we prepared to make decisions? Which operational definitions do we need to add, update or toss out in order to ensure good decisions for the future?
P.S.: Additional Information About The M&M Graph
This data mimics the results from a Measurement System Analysis (MSA) project with M&M candies. The assignment was to inspect 20 pieces of candy and to determine whether each met these three criteria:
1: the letter ‘m’ is 100% visible
2: the ink for the letter ‘m’ is not smudged
3: there are no chips
Only these written criteria were given. Neither team received additional instructions nor a “Master” against which to evaluate the candy. Each team was asked to review the candies in two rounds. During the first round, Team 2 decided to fail all 20 pieces of candy, hence that team’s low rate of agreement.
Conclusion: gaining agreement about operational definitions is critical. Make sure that everyone has the same training and verify that everyone in a decision making role can reach decisions that support the established goal. Repeat training and offer opportunities for feedback & refinement of criteria.
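The repeatability half of such an analysis boils down to a simple ratio: how often did a team reach the same verdict on the same candy in both rounds? A minimal sketch, using hypothetical accept/reject verdicts rather than the actual class results:

```python
# Within-appraiser (repeatability) agreement for an attribute study:
# the fraction of items judged the same way in two review rounds.
# Verdicts are hypothetical: "A" = accept, "R" = reject.
def within_team_agreement(round1, round2):
    matches = sum(a == b for a, b in zip(round1, round2))
    return matches / len(round1)

team1_round1 = list("AARAARRAAA" + "RAARAARRAA")  # 20 candies
team1_round2 = list("AARAARAAAA" + "RAARAARRAA")  # one verdict changed
print(f"{within_team_agreement(team1_round1, team1_round2):.0%}")  # 95%
```

The same function applied against a master standard (instead of a second round) gives the accuracy side of the analysis; a team that fails every candy, as Team 2 did in its first round, scores near zero against the standard no matter how consistent it is with itself.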
One can reasonably argue that processes don’t produce results, people do. In and of itself a process does nothing. It takes people to engage in a process – for better or for worse – to produce something. On the other hand stand quality pioneers like W. Edwards Deming, who said: “Eighty-five percent of the reasons for failure to meet customer expectations are related to deficiencies in systems and process . . . rather than the employee” and “The role of management is to change the process rather than badger individuals to do better.” These quotes do not take people completely out of the equation, but they place the focus squarely on the process rather than people.
Whether processes or people fail is not merely an academic question – it determines how we run our business. Every day we make dozens of business decisions. Both the decision maker and the information on which the decision is based are part of the decision making process. To make business decisions we rely on information. Sometimes this information is based on “hard” data that has been collected, analyzed and interpreted – at other times we rely on “gut level instinct” honed by years of experience. Regardless of where the information originates and how it was derived, the decision maker controls whether and how it is used.
Decision makers are influenced by more than their perception of the information itself. Other factors, such as a vested interest in the outcome and one’s ability to understand the full significance of a piece of information, also play an important role. Bextra, Seroquel and Vioxx are just a few of the better known Pharma industry examples to illustrate how difficult the interpretation of data can be – and how much of its interpretation and perceived significance can be motivated by a vested interest. The drug dilution scandal involving Robert Courtney provides an excellent case study of what it takes before individual data points come together to tell a compelling story.
Neither people nor processes are perfect – simply because no one can really define what “perfection” means. No matter how well designed, processes are prone to failure when they do not keep pace with changes and when people lack adequate training, experience and time to do the work. Can a shrinking economy and vanishing jobs sustain processes that manage thousands of details? When people worry about their jobs, how do we decide which details to stop paying attention to? When people are overworked and pressed to do more than one job, can they still absorb all the information necessary to do everything well? When an emergency takes place, how many resources will it drain from other vital matters?
Let us leave the discussion of whether Six Sigma is a process, a methodology or a philosophy for another day and simply call it a “process” for making business decisions to improve the quality of our goods and services. This said, do the massive recalls from Toyota indicate that quality processes like Six Sigma are slow to adapt to a world in recession? Are they simply too resource intensive and complicated? Rather than blaming the process, is the company at fault for not having the right people and incentives in place to adapt processes to a changing world? What are the implications for those of us who collect, analyze and consume data to make business decisions?
The Significance of Sigma: Toyota’s Lessons in Corporate Decision Making
With the massive recall due to sudden acceleration problems, Toyota’s reputation for superior quality has suffered a black eye – if not more. The future will tell how serious this injury is and whether it represents the tip of an ominous iceberg. Sprinkled amongst the news coverage are hints that Toyota has known about accelerator problems for some time. From an outsider’s perspective this raises several questions about corporate decision making.
VOC, or “Voice of the Customer,” is a key concept in Six Sigma, the quality methodology used by Toyota and many other companies. Needless to say, with millions of customers there are millions of opportunities for feedback – hence the potential for noise.
Wordplay aside, any communication from a customer contains some useful information, but not all feedback carries the same weight. For example, a broken radio most likely has less impact on car safety than a stuck gas pedal – but we can’t be sure until we have more information: the broken radio may be a symptom of an electrical problem that also affects the accelerator.
Therein lies the problem: how do we assign the “appropriate” value to the information we receive? How much effort and money do we put into researching the (hypothetical) “radio problem” versus other problems? How can we quickly assess whether the “radio problem” can turn into a “safety problem” that requires thorough attention? With the myriad of active and passive ways in which we can listen to customers, we need a good triaging system to help us separate critical information from information clutter.
While everyone can agree that data needs to be used “appropriately,” it is much more difficult to agree on what “appropriate use” actually means. Assuming for the moment that we can collect accurate data, what do we need to know in order to elevate an incident from “routine” to “requires immediate attention?” Here are several key factors that influence appropriate use:
The ability to recognize the potential for significant harm
The ability to draw a correlation between the incident and significant harm
The ability to develop a solution to the problem
The ability to implement a solution to the problem
The ability to make that solution pay off in the long run
Each of these bullet points shares two characteristics: to accomplish them, we need good information as well as sound judgment – neither of which comes easily. This applies to all types of corporate decisions – whether we are dealing with product safety issues or the most profitable allocation of sales and marketing resources. The major differences between types of decisions typically revolve around their scale and the level of detail required to make a decision.
It is impractical to go through all the possible ways in which we can identify “appropriate” information. Instead, here are a few guidelines:
Assess the potential harm
Identify actionable information
Prioritize timeliness, accuracy and budget
Identify who needs to know what and when
Incorporate the means to review requirements from time to time
Keeping these bullets in mind goes a long way toward selecting the tools and resources needed to supply appropriate information.
Toyota knew of accelerator pedal problem in UK a year ago
From The Times
February 2, 2010
Recently, while working on input for a decision tree, I ran into a scenario that reminded me of the fact that we cannot improve a decision simply by applying a tool or technique. We also need good data.
Here is a hypothetical example: Let us assume we are a contractor who is evaluating a fixed bid contract. This contract will pay $115,000 if we accept a clause for liquidated damages of $50,000 in the event we do not meet some project conditions. We can remove this clause from the contract, but in that case it only pays $100,000.
From past experience we know that our project costs will fall somewhere between $80,000 and $90,000 and that the likelihood of coming in at the lower cost estimate is around 20%. This leaves an 80% chance that our costs will come in around $90,000. Looking at our current capabilities we estimate that we have a 90% chance of being able to meet all conditions and thus avoid having to pay damages.
Putting all of this into the decision tree pictured below, we conclude that accepting the liquidated damages clause is the better business decision.
Decision Tree showing the EMV of two contract options
But how good is our estimate for avoiding damages? Can we really trust it? What data do we have to back it up? Have we really considered all the factors that can influence our estimate? After all, as the image below shows, if we are off by only 20 percentage points, the decision becomes a toss up.
A decision tree showing what happens when we lower the assumption for avoiding damages from 90% to 70%
In a decision tree each chance node acts as a weighting factor, so it is worthwhile to pay special attention to events that are estimated to have a very high or very low chance of occurring. We want to be sure that we have good data to back up these optimistic (or pessimistic) numbers.
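The arithmetic behind the tree is simple enough to sketch in a few lines, using the numbers from the example above. The function names are mine; the figures come straight from the scenario:

```python
# Expected Monetary Value (EMV) of the two contract options.
# Expected project cost: 20% chance of $80,000, 80% chance of $90,000.
EXPECTED_COST = 0.2 * 80_000 + 0.8 * 90_000  # = 88,000

def emv_accept(p_avoid_damages, payout=115_000, damages=50_000):
    """EMV with the liquidated damages clause accepted."""
    return payout - EXPECTED_COST - (1 - p_avoid_damages) * damages

def emv_decline(payout=100_000):
    """EMV with the clause removed (lower payout, no damages risk)."""
    return payout - EXPECTED_COST

print(round(emv_accept(0.90)))  # 22000: accepting the clause wins
print(round(emv_decline()))     # 12000
print(round(emv_accept(0.70)))  # 12000: the decision is now a toss-up
```

Sweeping `p_avoid_damages` over a range of values is a cheap way to find the break-even point before trusting a single optimistic estimate.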
Of course it is not always feasible to gather all the data we need. Sometimes the data is too expensive given what is at stake, sometimes it is unavailable and sometimes the quality of the data is too unreliable for a given purpose. In that case, experience and judgment need to fill in the data holes. We also call this “making assumptions.”
When making assumptions, we should clearly identify them and decide what to do when one or more of them has to change. We need to
identify which factors influence our assumptions
determine how these factors influence the result
be able to recognize when a significant change in our assumptions is needed
have a process in place to handle these changes when they do occur.
No one can predict the future with certainty. But the more we understand the probabilities, the better prepared we are.
Ted Cuzzillo, the author behind the datadoodle blog, got me thinking about data details today. When do they matter and when do they distract from what matters?
Being a data analyst means that I love details: the more the better, so I can understand how they form the Big Picture. Intrinsically, I am drawn to graphs like this one:
A scatter plot showing individual data points and 90th percentile reference lines with their respective values
The spray of dots and their colors actually tell me something. They give me a feel for the data and point me toward what is driving the overall result. I can dig into individual data points and learn from them. On the other hand, many people need a more abstract view of the world – a view that boils down to the overall shape of things. After all, meaningful abstractions – like the graph below – are needed to make strategic, big picture decisions.
A line graph averaging out the data points from the previous graph
The graph above only plots 18 data points and connects them through a line to show the overall shape of the data. Of course, the more we abstract information, the more we lose the ability to derive meaningful insights.
In order to generate this line graph, I had to create bins into which I could group the many data points from the first graph. This means I now only have 18 data points from which to differentiate between the bottom 90% and the top 10% of the data. In the graph below, the numbers along each line indicate the number of records that have been binned to create each data point. As we can see from the 90th percentile reference lines below, the bottom 90% of the handful of data points in each section fall below 9 and 8 respectively.
The same line graph as above, including 90th percentile reference lines
However, the very first graph in this story shows us just how misleading the percentiles from the abstracted data are. According to the more detailed data, the 90th percentile values come out to 6.083 and 5.334 respectively. The abstracted values point in the right direction, but they are quite a bit removed from the true values. The more detail we use, the closer we get to the truth.
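The gap is easy to demonstrate with synthetic data (this is not the data behind the graphs above): a percentile computed from a handful of binned averages can land far from the percentile of the raw points, because averaging washes out the tail of the distribution.

```python
import random

# Synthetic illustration: 90th percentile of raw points vs. of
# 18 binned averages built from those same points.
random.seed(1)
raw = [random.expovariate(1 / 3) for _ in range(1000)]  # skewed data

def percentile_90(values):
    """Nearest-rank 90th percentile."""
    s = sorted(values)
    return s[int(0.9 * len(s)) - 1]

# Group the raw points into 18 bins and average each bin.
bins = [raw[i::18] for i in range(18)]
binned_means = [sum(b) / len(b) for b in bins]

print(percentile_90(raw))           # reflects the full spread
print(percentile_90(binned_means))  # the tail has been averaged away
```

With skewed data like this, the binned percentile understates the raw one; with other binning schemes it can overstate it instead. Either way, only the detailed data yields the true value.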