<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pharma BI &#187; Know Your Data</title>
	<atom:link href="http://pharma-bi.com/category/analytics/know-your-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://pharma-bi.com</link>
	<description>Business Intelligence Blog</description>
	<lastBuildDate>Thu, 15 Dec 2011 05:52:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>The Power of In-House Analytics</title>
		<link>http://pharma-bi.com/2010/07/the-power-of-in-house-analytics/</link>
		<comments>http://pharma-bi.com/2010/07/the-power-of-in-house-analytics/#comments</comments>
		<pubDate>Mon, 26 Jul 2010 22:40:06 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[BI Solutions]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Modelling]]></category>
		<category><![CDATA[Sales]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=695</guid>
		<description><![CDATA[<p>A business associate recently forwarded a white paper by one of the global BI software companies with the comment “… it all sounds so simple, yet we both know the complexities are just under the table.”  Like all good marketing materials, this white paper talked about the current pain of the target audience and provided [...]]]></description>
			<content:encoded><![CDATA[<p>A business associate recently forwarded a white paper by one of the global BI software companies with the comment “… it all sounds so simple, yet we both know the complexities are just under the table.”  Like all good marketing materials, this white paper talked about the current pain of the target audience and provided glowing examples of a possible solution.  Part of the proposed solution included this: free yourself from expensive consultants by bringing the power of predictive analytics in-house.</p>
<p>Coincidentally, this white paper arrived while I was working through the intricacies of sales transactions for a client who is looking for quick – and accurate – ways to answer questions like “What happened to my sales?” and “What happened to my margin?”  Both are high level questions that require a thorough understanding of “low level data” in order to provide meaningful answers.  This got me thinking about the complexities of performing predictive analytics.</p>
<p>Complexities lurking under the predictive analytics table include issues such as data quality.  For instance,</p>
<ul>
<li>Customer IDs and customer names not always matching</li>
<li>Customer ratings changing over time</li>
<li>Master invoices being used to track transactions over an extended period of time</li>
<li>Inconsistent data entry – for instance, credits sometimes showing up as negative numbers and sometimes as positive numbers – depending on how the data entry person coded the transaction.</li>
</ul>
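<p>The last item in the list above can be sketched in a few lines of Python.  This is only an illustration: the <code>Transaction</code> record, its <code>kind</code> labels and the sample amounts are assumptions, not the client data, and the sketch presumes the transaction type itself was coded reliably.</p>

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    kind: str      # hypothetical labels: "sale" or "credit"
    amount: float  # as keyed in; the sign is not yet trusted


def signed_amount(tx: Transaction) -> float:
    """Return the amount with a consistent sign: sales positive, credits negative."""
    magnitude = abs(tx.amount)
    if tx.kind == "credit":
        return -magnitude
    return magnitude


# Made-up sample: the same kind of credit keyed in with two different signs.
transactions = [
    Transaction("sale", 1200.0),
    Transaction("credit", -150.0),  # credit entered as a negative number
    Transaction("credit", 75.0),    # credit entered as a positive number
]

net_sales = sum(signed_amount(tx) for tx in transactions)
print(net_sales)
```

<p>Without the normalization step, the second credit would be added to sales instead of being subtracted.</p>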
<p>More important than data quality is the question of “how do we interpret what we see?”  Statistical outliers serve as an example here, since they need little explanation, yet their meaning is open to interpretation.  They could be the first sign of a new trend, a fluke, a data error, or the result of factors beyond our control.  How we deal with outliers when building our predictive model depends on what caused them.</p>
<p>Non-repeatable exceptions, a.k.a. flukes, are meaningless when we are trying to build a model of the future.  Usually they are noise and become part of our margin for error rather than a factor we would include in our model. In order to separate meaningful facts from flukes, we need to dig further into the details and determine their influence on the big picture.</p>
<p>For example, the chart below shows an “Outlier Territory” that performed particularly well in terms of achieving sales goals.</p>
<div id="attachment_697" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/07/Outlier_Terr.jpg"><img class="size-medium wp-image-697" title="Outlier Territory" src="http://pharma-bi.com/wp-content/uploads/2010/07/Outlier_Terr-300x260.jpg" alt="Graph showing territory performance, including a statistical outlier." width="300" height="260" /></a><p class="wp-caption-text">Graph showing territory performance, including a statistical outlier.</p></div>
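<p>A territory like the one in the chart can be flagged automatically before anyone decides what caused it.  The sketch below uses the modified z-score (based on the median and the median absolute deviation) with the commonly used cut-off of 3.5.  The territory names and goal attainment percentages are made up for illustration, and this is one of several possible outlier rules, not necessarily the one behind the chart.</p>

```python
from statistics import median


def flag_outliers(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return the keys whose modified z-score exceeds the threshold."""
    center = median(values.values())
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return []  # no spread at all, so nothing can be called an outlier
    return [
        key
        for key, v in values.items()
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Hypothetical goal attainment in percent, by territory.
attainment = {
    "T01": 98, "T02": 102, "T03": 95, "T04": 105,
    "T05": 100, "T06": 97, "T07": 103, "T08": 160,
}

print(flag_outliers(attainment))
```

<p>Flagging is only the first step.  Whether the flagged territory is a trend, a fluke or a data error still has to be answered by digging into the details.</p>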
<p>As we refine our bonus plan for the next pay period, how should we proceed?  Should we assume this territory will continue to have high sales and therefore raise its quota?  The answer depends in part on whether we are dealing with</p>
<ul>
<li>A real issue, such as our bonus model not working for that territory, or</li>
<li>A fluke, like a one-time-only buy-in by a major customer, or</li>
<li>Data errors, as in “somehow we summed up the sales data incorrectly,” or</li>
<li>Factors beyond our control, like an uptick in demand because of an unexpected and short-lived emergency.</li>
</ul>
<p>Sometimes the sales rep can provide the insight we need to understand what caused the outlier.  Usually, though, we need to look for likely causes using the data we already have and relating it to information from other sources.</p>
<p>As we can see, our crystal ball is only as good as the answers we derive from data collected in the past.  Building it also requires us to make assumptions about how pieces fit together, how they influence each other and how important they are in shaping the future.  We can improve our assumptions using statistical tools like t-Tests, ANOVAs and various regression models.  We can look to proxies and draw on our understanding of the market place.  No matter how we develop our assumptions, we need to understand their limitations or they might turn us into <a href="http://en.wikipedia.org/wiki/Donkey">Jacks and Jennys</a> down the road.</p>
<p>Long story short: to build a crystal ball we need more than powerful tools.  We need skilled and experienced people, good data and the commitment to adapt over time.</p>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2010/07/the-power-of-in-house-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beware of Creative Analytics: Lies, Damned Lies and Statistics</title>
		<link>http://pharma-bi.com/2010/03/beware-of-creative-analytics-lies-damned-lies-and-statistics/</link>
		<comments>http://pharma-bi.com/2010/03/beware-of-creative-analytics-lies-damned-lies-and-statistics/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 15:41:05 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=623</guid>
		<description><![CDATA[<p>How often have we seen a graph in an opinion piece without knowing how it was created, but somewhere in the back of our mind we suspected that it was tweaked somehow to make a point?  How can we ferret out &#8220;creative analytics&#8221; from the true story?  Remember Mark Twain&#8217;s famous quote about [...]]]></description>
			<content:encoded><![CDATA[<p>How often have we seen a graph in an opinion piece without knowing how it was created, while suspecting somewhere in the back of our mind that it was tweaked to make a point?  How can we separate &#8220;creative analytics&#8221; from the true story?  Remember Mark Twain&#8217;s famous quote about &#8220;&#8230; lies, damned lies and statistics.&#8221;  It is much more difficult to identify &#8220;lies&#8221; when we cannot inspect the data behind them.  </p>
<p>By necessity, we always make choices about how to present data.  After all, we *are* trying to make a point when we share information.  But even if we do not intend to spin the message, we may be unable to see the whole story until someone else adds their insight. By making our data available for download, we can level the debating field somewhat and hopefully reach better informed conclusions.    </p>
<p>Whether by accident or by design, one way to spin the message involves the use of data ranges.  In the example below, we have grouped US obesity rates into ranges in three different ways.  The first grouping uses intervals of 11, the second uses intervals of 10 and the last uses intervals of 5.  </p>
<p>Look at the graphs about soda taxes in vending machines and see how each graph may lead to a different conclusion about obesity and soda taxes in vending machines.  Then take a look at the graphs for the other taxes and notice how those graphs support similar conclusions regardless of the range size.  </p>
<p><script type="text/javascript" src="http://public.tableausoftware.com/javascripts/api/viz_v1.js"></script><object class="tableauViz" width="554" height="689" style="display:none;"><param name="name" value="OB_CalorieSource_Taxes_bin/DB2" /><param name="toolbar" value="yes" /></object><noscript>DB2 <br /><a href="#"><img alt="DB2 " src="http://public.tableausoftware.com/static/images/OB_CalorieSource_Taxes_bin-DB2_rss.png" height="100%" /></a></noscript>
<div style="width:554px;height:22px;padding:0px 10px 0px 0px; margin-top: -6px; color:black;font:normal 8pt verdana,helvetica,arial,sans-serif;">
<div style="padding-left: 438px;"><a href="http://www.tableausoftware.com/public?ref=http://public.tableausoftware.com/views/OB_CalorieSource_Taxes_bin/DB2" target="_blank">Powered by Tableau</a></div>
</div>
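<p>The effect of the interval size can be reproduced with a few lines of Python.  The sketch below groups the same values into intervals of 11, 10 and 5, as in the example above.  The obesity rates are made-up numbers, and labelling each interval by its lower edge is an assumption about how the ranges were built.</p>

```python
from collections import Counter


def bin_counts(rates: list[float], width: int) -> dict[int, int]:
    """Count values per interval; each interval is labelled by its lower edge."""
    counts = Counter(int(rate // width) * width for rate in rates)
    return dict(sorted(counts.items()))


obesity_rates = [18.5, 21.0, 24.9, 27.3, 29.8, 31.2, 33.0, 36.4]  # made-up values

for width in (11, 10, 5):
    print(width, bin_counts(obesity_rates, width))
```

<p>The same eight values fall into three groups with intervals of 11 or 10, though not the same three groups, and into five groups with intervals of 5.  A chart built on each grouping can tell a different story.</p>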
<p>When deciding how to present information we have to balance &#8220;information overload&#8221; with the need to present important details.  Which graph we choose ultimately depends on the point we are trying to make.  Some might call that spin, others call it effective communication.  If we are the audience, we need to be skeptical and ask questions.</p>
<p>Related Posts:</p>
<p><a href="http://pharma-bi.com/2010/03/how-to-avoid-misleading-conclusions-explore-your-data">http://pharma-bi.com/2010/03/how-to-avoid-misleading-conclusions-explore-your-data/</a><br />
<a href="http://pharma-bi.com/2010/03/tableau-public-interactive-obesity-data-on-the-web/">http://pharma-bi.com/2010/03/tableau-public-interactive-obesity-data-on-the-web/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2010/03/beware-of-creative-analytics-lies-damned-lies-and-statistics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Avoid Misleading Conclusions: Explore Your Data</title>
		<link>http://pharma-bi.com/2010/03/how-to-avoid-misleading-conclusions-explore-your-data/</link>
		<comments>http://pharma-bi.com/2010/03/how-to-avoid-misleading-conclusions-explore-your-data/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 07:29:39 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Modelling]]></category>
		<category><![CDATA[Six Sigma]]></category>
		<category><![CDATA[Tableau]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=537</guid>
		<description><![CDATA[<p>Often we have to work with data without knowing all the details of how it was collected and processed.  In those situations we first need to determine what information the data contains and what it can and cannot tell us.   We need to ask questions of the data and determine whether it makes sense, [...]]]></description>
			<content:encoded><![CDATA[<p>Often we have to work with data without knowing all the details of how it was collected and processed.  In those situations we first need to determine what information the data contains and what it can and cannot tell us.   We need to ask questions of the data and determine whether it makes sense, given what we already know.   To home in on the time-saving questions it helps to be a subject matter expert.  But even if we are unfamiliar with the subject area, we can start by inspecting the different pieces of data to see how everything fits together.  Visual analysis tools like <a href="http://www.tableausoftware.com/" target="_blank">Tableau software</a> make that job much easier than it used to be.</p>
<p>Here is an example of how such an exploration may look: we are exploring data about obesity, soda consumption and sales taxes on soda.  We are told this data came from the US Department of Agriculture and a quick look reveals that we are looking at county level data.  As one might expect, a scatter plot reveals a strong relationship between rising soda consumption and increased obesity.</p>
<div id="attachment_538" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/1_ob_drink.jpg"><img class="size-medium wp-image-538" title="1_ob_drink" src="http://pharma-bi.com/wp-content/uploads/2010/03/1_ob_drink-300x248.jpg" alt="Adult Obesity Rates and Soda Consumption by US County" width="300" height="248" /></a><p class="wp-caption-text">Adult obesity rates increase as soda consumption increases</p></div>
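<p>The strength of a relationship like the one in the scatter plot can be put into a single number with the Pearson correlation coefficient.  The following is a minimal sketch with made-up county values, not the USDA data.  A value close to 1 means the two measures rise together, which by itself says nothing about cause and effect.</p>

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five counties.
soda_consumption = [40.0, 55.0, 60.0, 72.0, 85.0]
obesity_rate = [22.0, 25.5, 26.0, 29.0, 31.5]

r = pearson_r(soda_consumption, obesity_rate)
print(round(r, 3))
```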
<p>Now we get to the real questions: do sales taxes on soda help with lowering obesity rates?  What relationship do we see between sales tax rates on soda and obesity? As luck would have it, the data we received also provides two measures about sales taxes for soda: one rate for vending machines and another rate for retail stores.</p>
<p>First we look at the relationship between soda taxes for retail stores versus obesity rates.  One might expect that taxes discourage soda consumption and, yes, there appears to be a small downward trend as tax rates increase.  Maybe soda taxes actually help with bringing down obesity?</p>
<div id="attachment_539" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/2_ob_drink_store_tax.jpg"><img class="size-medium wp-image-539" title="2_ob_drink_store_tax" src="http://pharma-bi.com/wp-content/uploads/2010/03/2_ob_drink_store_tax-300x256.jpg" alt="Adult Obesity Rates, Retail Sales Tax Rates and Soda Consumption by US County" width="300" height="256" /></a><p class="wp-caption-text">Adult Obesity Rates, Retail Sales Tax Rates and Soda Consumption by US County</p></div>
<p>Now let’s take a look at sales taxes on soda coming from vending machines.  Interesting observation: obesity rates seem to increase slightly as these tax rates increase.  Counterintuitive?  How do vending machine purchases differ from purchases in a retail store?  Are we observing a real relationship here, or is the data fooling us?</p>
<div id="attachment_541" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/3_ob_drink_vending_tax1.jpg"><img class="size-medium wp-image-541" title="3_ob_drink_vending_tax" src="http://pharma-bi.com/wp-content/uploads/2010/03/3_ob_drink_vending_tax1-300x256.jpg" alt="Adult Obesity Rates, Vending Machine Sales Tax Rates and Soda Consumption by US County" width="300" height="256" /></a><p class="wp-caption-text">Adult Obesity Rates, Vending Machine Sales Tax Rates and Soda Consumption by US County</p></div>
<p>Before answering these questions, let’s take a closer look at all those data points on the y-axis.  Do they really indicate that these counties levy a 0% soda tax?  A quick inspection of the underlying data shows that, yes indeed, all records indicate a 0% tax rate.  Not a single “null” value among them.   However, without knowing how the data was processed, we cannot be sure that “zero” really means “no taxes” &#8211; it could also mean &#8220;no data.&#8221;</p>
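<p>A quick inspection of this kind can be scripted.  The sketch below counts missing, zero and positive tax rates in a column.  The sample values are assumptions for illustration.  Note that the count only surfaces the question: if upstream processing replaced missing values with 0, no script can tell the two apart afterwards.</p>

```python
from typing import Optional


def summarize_rates(rates: list[Optional[float]]) -> dict[str, int]:
    """Count missing (None), zero and positive values in a tax-rate column."""
    missing = sum(1 for r in rates if r is None)
    zero = sum(1 for r in rates if r is not None and r == 0)
    positive = sum(1 for r in rates if r is not None and r > 0)
    return {"missing": missing, "zero": zero, "positive": positive}


# Hypothetical vending machine tax rates in percent, one value per county.
vending_tax = [0.0, 0.0, 4.0, 0.0, 6.25, 0.0]

print(summarize_rates(vending_tax))
```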
<p>To explore further we start by placing the three graphs side by side.  This way we can see more easily what happens when we exclude “zeroes.”</p>
<div id="attachment_542" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/4_ob_tax_drink.jpg"><img class="size-medium wp-image-542" title="4_ob_tax_drink" src="http://pharma-bi.com/wp-content/uploads/2010/03/4_ob_tax_drink-300x100.jpg" alt="Soft Drink Consumption, Obesity and Soda Sales Taxes" width="300" height="100" /></a><p class="wp-caption-text">Soft Drink Consumption, Obesity and Soda Sales Taxes</p></div>
<p>First we exclude “zeroes” for retail sales taxes. Then we’ll do the same with taxes levied on soda in vending machines.  The following graphs illustrate this.</p>
<div id="attachment_543" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/5_ob_tax_drink_xr.jpg"><img class="size-medium wp-image-543" title="5_ob_tax_drink_xr" src="http://pharma-bi.com/wp-content/uploads/2010/03/5_ob_tax_drink_xr-300x106.jpg" alt="Soft Drink Consumption, Obesity and Sales Taxes: Excluding Records with 0% Soda Sales Tax Rate (Retail)" width="300" height="106" /></a><p class="wp-caption-text">Excluding Records with 0% Soda Sales Tax Rate (Retail).  The center graph shows the relationship between the remaining retail records and obesity. The trend line still points downward.</p></div>
<div id="attachment_544" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/6_ob_tax_drink_xv.jpg"><img class="size-medium wp-image-544" title="6_ob_tax_drink_xv" src="http://pharma-bi.com/wp-content/uploads/2010/03/6_ob_tax_drink_xv-300x106.jpg" alt="Soft Drink Consumption, Obesity and Sales Taxes: Excluding Records with 0% Soda Sales Tax Rate (Vending)" width="300" height="106" /></a><p class="wp-caption-text">Excluding Records with 0% Soda Sales Tax Rate (Vending).  The right hand graph shows the relationship between the remaining vending machine records and obesity.  We now see a downward trend as tax rates increase.</p></div>
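<p>The effect of excluding the 0% records on a trend line can be checked numerically with a least-squares slope.  This is a sketch with made-up numbers.  It is not a reconstruction of the graphs above, and it assumes the trend lines are ordinary least-squares fits.</p>

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical tax rates and obesity rates (both in percent) for six counties.
tax = [0.0, 0.0, 0.0, 2.0, 4.0, 6.0]
obesity = [30.0, 24.0, 27.0, 29.0, 28.0, 26.0]

# Keep only the records with a tax rate above zero.
nonzero = [(t, o) for t, o in zip(tax, obesity) if t > 0]

slope_all = ols_slope(tax, obesity)
slope_nonzero = ols_slope([t for t, _ in nonzero], [o for _, o in nonzero])
print(slope_all, slope_nonzero)
```

<p>In this toy data the slope is almost flat with the zeroes included and clearly negative without them, which shows how much a cluster of zero values can influence a trend line.</p>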
<p>Wait a minute, though.  When we exclude “zeroes” from one set of taxes, all data points for “greater than 0% taxes” disappear from the other graph.  In other words, this data indicates that the two types of taxes are mutually exclusive!  Hmm, does this even make sense in real life?  Why would every US county tax soda either in retail stores or in vending machines but never in both?</p>
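<p>Instead of noticing this by accident on a chart, we can test for it directly.  The sketch below classifies each county by which of the two taxes is above zero and counts the combinations.  The (retail, vending) pairs are hypothetical.</p>

```python
from collections import Counter


def tax_pattern(retail_rate: float, vending_rate: float) -> str:
    """Classify a county by which soda taxes are above zero."""
    if retail_rate > 0 and vending_rate > 0:
        return "both"
    if retail_rate > 0:
        return "retail_only"
    if vending_rate > 0:
        return "vending_only"
    return "neither"


# Hypothetical (retail, vending) tax rates in percent, one pair per county.
counties = [(4.0, 0.0), (0.0, 5.5), (0.0, 0.0), (6.0, 0.0), (0.0, 3.0)]

patterns = Counter(tax_pattern(r, v) for r, v in counties)
mutually_exclusive = patterns["both"] == 0
print(dict(patterns), mutually_exclusive)
```

<p>A count of zero for &#8220;both&#8221; across thousands of counties is a strong hint that the pattern comes from how the data was processed rather than from real tax policy.</p>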
<p>Without further knowledge about this data we have to reframe our questions and conclusions:</p>
<ul>
<li>When soda taxes are levied, higher tax rates appear to go hand in hand with decreasing obesity rates</li>
<li>We cannot draw any conclusions about the impact of “no sales taxes” versus “sales taxes”</li>
<li>Before we continue with a detailed analysis, we probably need to ask questions about this data.  At first glance it makes little sense that counties levy soda taxes either on vending machines or on retail stores but never on both.  Then again, I’m not a tax expert.</li>
</ul>
<p>Chances are that we will uncover other areas about which we need to ask questions.  Instead of taking the scattershot approach to learning about this data, data exploration helps us to develop very specific questions to ask.  With specific questions, we stand a better chance of finding the right subject matter experts to consult.</p>
<p>This was a quick example of exploring data about which we knew nothing when we started.  To gain new insights, we sometimes need to apply this &#8220;beginner&#8217;s mind&#8221; approach even to data about which we already know a lot.  After all, errors can happen, collection and processing systems can change without our knowledge and sometimes we find nuggets that were hidden until we started looking for them.  One final thought: the next time your boss or client asks you to hurry up with the analysis, ask these two questions:</p>
<ul>
<li>What are the consequences of making poor decisions because we rushed through the data exploration?</li>
<li>Do we need to go for more accuracy or is a ballpark analysis good enough at this time?</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2010/03/how-to-avoid-misleading-conclusions-explore-your-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Decision Making During Turmoil: Are We Prepared?</title>
		<link>http://pharma-bi.com/2010/03/decision-making-during-turmoil-how-well-are-we-prepared/</link>
		<comments>http://pharma-bi.com/2010/03/decision-making-during-turmoil-how-well-are-we-prepared/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 00:34:56 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Marketing]]></category>
		<category><![CDATA[Modelling]]></category>
		<category><![CDATA[Sales]]></category>
		<category><![CDATA[Six Sigma]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=528</guid>
		<description><![CDATA[<p>In order to make profitable decisions, we need good information.  Whether we base our decisions on sales, customer perceptions or the number of widgets we shipped last month, our information comes from some system that collects and measures relevant data for us.</p>
<p>In my Six Sigma Black Belt class we recently discussed the challenges of developing [...]]]></description>
			<content:encoded><![CDATA[<p>In order to make profitable decisions, we need good information.  Whether we base our decisions on sales, customer perceptions or the number of widgets we shipped last month, our information comes from some system that collects and measures relevant data for us.</p>
<p>In my Six Sigma Black Belt class we recently discussed the challenges of developing a meaningful measurement system.  As usual, the theory sounds easy &#8211; until it hits the road of reality.  A very simple class room exercise illustrated that point neatly: our instructor had gone through the effort of individually placing twenty M&amp;M candies into twenty numbered plastic bags and then asked us to “accept” or “reject” each M&amp;M based on three criteria.  The criteria were written down and no additional verbal cues were given nor did we have a “master” M&amp;M on which to base our judgment.</p>
<p>We realized very quickly that these criteria were not nearly as clear cut as they appeared to be.  For example, one criterion specified that the letter “<strong>m</strong>” on the candy should be “100% visible.”   Sounds clear cut, right?  After all, it has a numeric qualifier to help us make our decision!  Reality check: have you ever looked at an M&amp;M up close? The next time you do, look for tiny spots where the white ink is thin enough for the underlying color of the candy to bleed through the letter “<strong>m</strong>.”  Question: if the entire outline of the letter “<strong>m</strong>” appears on the candy but these little flecks of color are bleeding through, does this mean that the “<strong>m</strong>” is no longer 100% visible?</p>
<p>The graph below shows the result of the M&amp;M exercise. It illustrates just how far apart the judgment of perfectly reasonable people can be when they are asked to interpret someone else’s instructions.  The left hand graph shows how much each team agreed with itself after reviewing all 20 candies twice in a row.  The right hand graph shows how much each team agreed with an external standard for evaluating the candies.  The fact that the two red lines barely line up with each other illustrates just how far apart the two teams were with their assessment of the same group of M&amp;Ms.</p>
<div id="attachment_529" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2010/03/MSA_Exercise_SSBB2010.jpg"><img class="size-medium wp-image-529 " title="MSA_Exercise_SSBB2010" src="http://pharma-bi.com/wp-content/uploads/2010/03/MSA_Exercise_SSBB2010-300x200.jpg" alt="M&amp;M Attribute Agreement Analysis" width="300" height="200" /></a><p class="wp-caption-text">M&amp;M Attribute Agreement Analysis - click the picture to enlarge it.</p></div>
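<p>The two measures in the graph can be computed along the following lines.  This is a sketch under stated assumptions: the accept/reject values are invented, only five candies are shown instead of twenty, and agreement with the standard is counted only when both rounds match the standard, which is one common convention.</p>

```python
def within_agreement(round1: list[bool], round2: list[bool]) -> float:
    """Share of items a team judged the same way in both rounds."""
    matches = sum(1 for a, b in zip(round1, round2) if a == b)
    return matches / len(round1)


def agreement_with_standard(
    round1: list[bool], round2: list[bool], standard: list[bool]
) -> float:
    """Share of items where both rounds match the external standard."""
    matches = sum(1 for a, b, s in zip(round1, round2, standard) if a == b == s)
    return matches / len(standard)


# True means "accept", False means "reject"; all values are hypothetical.
standard = [True, True, False, True, False]
team_round1 = [True, False, False, True, True]
team_round2 = [True, True, False, True, True]

print(within_agreement(team_round1, team_round2))
print(agreement_with_standard(team_round1, team_round2, standard))
```

<p>A team can agree with itself almost perfectly and still disagree with the standard, which is why both numbers are worth reporting.</p>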
<p>The real issue, of course, has nothing to do with the candy and how it looks.  The bigger point lies in something the Six Sigma folks call “operational definitions” and how we use them.  The M&amp;M example illustrates just how unpredictable individual judgments can be and how much training and feedback may be required before team members reach similar conclusions – which, in turn, will allow the team to work toward a common goal.</p>
<p>As the M&amp;M example shows, developing operational definitions can be tricky.  Definitions may be less clear cut than we think.  We have a limited amount of time in which to develop them.  In group settings, we also have to figure in personalities and hidden agendas. Good leadership and negotiation skills are needed to keep everyone focused without suppressing critical input.  In the world of sales and marketing we have the additional challenge of dealing with missing and incomplete data.  While statistical models go a long way toward filling in the picture, they are difficult to explain and are not always accepted by those whose paycheck depends on them or by those whose experience seems to indicate something else.</p>
<p>Some ideas for dealing with all this will be the subject of future posts.  For today I simply want to ask these questions: with so many changes in the health care marketplace, how well are we prepared to make decisions?  Which operational definitions do we need to add, update or toss out in order to ensure good decisions for the future?</p>
<p><strong>P.S.:  Additional Information About The M&amp;M Graph</strong></p>
<p>This data mimics the results from a Measurement System Analysis (MSA) project with M&amp;M candies.  The assignment was to inspect 20 pieces of candy and to determine whether each met these three criteria:</p>
<p>1: the letter &#8216;m&#8217; is 100% visible<br />
2: the ink for the letter &#8216;m&#8217; is not smudged<br />
3: there are no chips</p>
<p>Only these written criteria were given. Neither team received additional instructions nor a &#8220;Master&#8221; against which to evaluate the candy.  Each team was asked to review the candies in two rounds.  During the first round, Team 2 decided to fail all 20 pieces of candy, hence that team&#8217;s low rate of agreement.</p>
<p>Conclusion: gaining agreement about operational definitions is critical.  Make sure that everyone has the same training and verify that everyone in a decision making role can reach decisions that support the established goal.  Repeat training and offer opportunities for feedback &amp; refinement of criteria.</p>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2010/03/decision-making-during-turmoil-how-well-are-we-prepared/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Significance of Sigma: Toyota’s Lessons in Corporate Decision Making</title>
		<link>http://pharma-bi.com/2010/02/the-significance-of-sigma-toyota%e2%80%99s-lessons-in-corporate-decision-making/</link>
		<comments>http://pharma-bi.com/2010/02/the-significance-of-sigma-toyota%e2%80%99s-lessons-in-corporate-decision-making/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 23:39:05 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Current Topics]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[Six Sigma]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=515</guid>
		<description><![CDATA[<p>With the massive recall due to sudden acceleration problems, Toyota’s reputation for superior quality has suffered a black eye – if not more.  The future will tell how serious this injury is and whether it represents the tip of an ominous iceberg.  Sprinkled amongst the news coverage are hints that Toyota has known about accelerator [...]]]></description>
			<content:encoded><![CDATA[<p>With the massive recall due to sudden acceleration problems, Toyota’s reputation for superior quality has suffered a black eye – if not more.  The future will tell how serious this injury is and whether it represents the tip of an ominous iceberg.  Sprinkled amongst the news coverage are hints that <a href="http://business.timesonline.co.uk/tol/business/industry_sectors/transport/article7011671.ece">Toyota has known about accelerator problems for some time</a>.  From an outsider’s perspective this raises several questions about corporate decision making, including this one:</p>
<ul>
<li>How does one differentiate between the “<a href="http://en.wikipedia.org/wiki/Voice_of_the_customer" target="_blank">voice of the customer</a>” and the “noise of the customer?”</li>
</ul>
<p>VOC or &#8220;Voice of the Customer&#8221; is a key concept in Six Sigma, the quality methodology used by Toyota and many other companies.  Needless to say, with millions of customers there are millions of opportunities for feedback &#8211; and hence the potential for noise.</p>
<p>Wordplay aside, any communication from a customer contains some useful information, but not all feedback carries the same weight.  For example, a broken radio most likely has less impact on car safety than a stuck gas pedal – but we can’t be sure until we have more information: the broken radio may be a symptom of an electrical problem that also affects the accelerator.</p>
<p>Therein lies the problem: how do we assign the “appropriate” value to the information we receive?  How much effort and money do we put into researching the (hypothetical) “radio problem” versus other problems?  How can we quickly assess whether the “radio problem” can turn into a “safety problem” that requires thorough attention?  With the myriad of active and passive ways in which we can listen to customers, we need a good triaging system to help us separate critical information from information clutter.</p>
<p>While everyone can agree that data needs to be used “appropriately,” it is much more difficult to agree on what “appropriate use” actually means.  Assuming for the moment that we can collect accurate data, what do we need to know in order to elevate an incident from “routine” to “requires immediate attention?” Here are several key factors that influence appropriate use:</p>
<ul>
<li>The ability to recognize the potential for significant harm</li>
<li>The ability to draw a correlation between the incident and significant harm</li>
<li>The ability to develop a solution to the problem</li>
<li>The ability to implement a solution to the problem</li>
<li>The ability to make that solution pay off in the long run</li>
</ul>
<p>Each of these bullet points shares two characteristics: to accomplish them, we need good information as well as sound judgment – neither of which comes easily.  This applies to all types of corporate decisions – whether we are dealing with product safety issues or the most profitable allocation of sales and marketing resources.  The major differences between types of decisions typically revolve around their scale and the level of detail required to make a decision.</p>
<p>It is impractical to go through all the possible ways in which we can identify “appropriate” information.  Instead, here are a few guidelines:</p>
<ul>
<li>Assess the potential harm</li>
<li>Identify actionable information</li>
<li>Prioritize timeliness, accuracy and budget</li>
<li>Identify who needs to know what and when</li>
<li>Incorporate the means to review requirements from time to time</li>
</ul>
<p>Keeping these guidelines in mind goes a long way toward selecting the tools and resources needed to supply appropriate information.</p>
<p><strong>Additional Reading</strong></p>
<p><strong>Toyota knew of accelerator pedal problem in UK a year ago</strong><br />
From The Times<br />
February 2, 2010</p>
<p><a href="http://business.timesonline.co.uk/tol/business/industry_sectors/transport/article7011671.ece">http://business.timesonline.co.uk/tol/business/industry_sectors/transport/article7011671.ece</a></p>
<p><strong>Unintended Acceleration: Toyota Addresses the Issues</strong><br />
November 06, 2009 by Irv Miller</p>
<p><a href="http://pressroom.toyota.com/pr/tms/our-point-of-view-post.aspx?id=2234">http://pressroom.toyota.com/pr/tms/our-point-of-view-post.aspx?id=2234</a></p>
<p>Wikipedia entry for <a href="http://en.wikipedia.org/wiki/Six_Sigma">Six Sigma</a>, the quality control methodology used by Toyota and many other companies.  <a href="http://en.wikipedia.org/wiki/Voice_of_the_customer" target="_blank">Voice of the Customer</a> (VOC) is a key concept of the Six Sigma methodology.</p>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2010/02/the-significance-of-sigma-toyota%e2%80%99s-lessons-in-corporate-decision-making/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why We Need Good Data</title>
		<link>http://pharma-bi.com/2009/12/why-we-need-good-data/</link>
		<comments>http://pharma-bi.com/2009/12/why-we-need-good-data/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 17:54:59 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Case Studies]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Modelling]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=497</guid>
		<description><![CDATA[<p>Recently, while working on input for a decision tree, I ran into a scenario that reminded me of the fact that we cannot improve a decision simply by applying a tool or technique. We also need good data.</p>
<p>Here is a hypothetical example: Let us assume we are a contractor who is evaluating a fixed bid [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, while working on input for a <a href="http://en.wikipedia.org/wiki/Decision_tree">decision tree</a>, I ran into a scenario that reminded me that we cannot improve a decision simply by applying a tool or technique. We also need good data.</p>
<p>Here is a hypothetical example: Let us assume we are a contractor who is evaluating a fixed-bid contract.  This contract will pay $115,000 if we accept a clause for liquidated damages of $50,000 in the event we do not meet some project conditions.  We can remove this clause from the contract, but in that case it only pays $100,000.</p>
<p>From past experience we know that our project costs will fall somewhere between $80,000 and $90,000 and that the likelihood of coming in at the lower cost estimate is around 20%.  This leaves an 80% chance that our costs will come in around $90,000.  Looking at our current capabilities, we estimate that we have a 90% chance of being able to meet all conditions and thus avoid having to pay damages.</p>
<p>Putting all of this into the decision tree pictured below, we conclude that accepting the liquidated damages clause is the better business decision.</p>
<div id="attachment_498" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2009/12/EMV_Original.jpg"><img class="size-medium wp-image-498" title="Decision Tree: 90% Probability of Avoiding Damages" src="http://pharma-bi.com/wp-content/uploads/2009/12/EMV_Original-300x159.jpg" alt="Decision Tree: 90% Probability of Avoiding Damages" width="300" height="159" /></a><p class="wp-caption-text">Decision Tree showing the EMV of two contract options</p></div>
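<p>The arithmetic behind the tree can be sketched in a few lines of Python.  This is only a minimal sketch: the dollar figures and probabilities come from the example above, while the function names are made up for illustration and the calculation assumes that project cost and damages are independent of each other.</p>

```python
# Expected monetary value (EMV) for the two contract options.
# Figures are taken from the example; all names are illustrative.

def expected_cost(outcomes):
    """Probability-weighted average of (probability, amount) pairs."""
    return sum(p * amount for p, amount in outcomes)

# Project cost: 20% chance of $80,000, 80% chance of $90,000.
COST_OUTCOMES = [(0.20, 80_000), (0.80, 90_000)]

def emv_with_clause(p_avoid_damages, payment=115_000, damages=50_000):
    """EMV when we accept the liquidated damages clause."""
    expected_damages = (1 - p_avoid_damages) * damages
    return payment - expected_cost(COST_OUTCOMES) - expected_damages

def emv_without_clause(payment=100_000):
    """EMV when the clause is removed and the contract pays less."""
    return payment - expected_cost(COST_OUTCOMES)

print(round(emv_with_clause(0.90)))   # 22000
print(round(emv_without_clause()))    # 12000
```

<p>With a 90% chance of avoiding damages, accepting the clause comes out about $10,000 ahead.</p>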
<p>But how good is our estimate for avoiding damages?  Can we really trust it?  What data do we have to back it up?  Have we really considered all the factors that can influence our estimate?  After all, as the image below shows, if we are off by only 20 percentage points, the decision becomes a toss-up.</p>
<div id="attachment_499" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2009/12/EMV_TossUp.jpg"><img class="size-medium wp-image-499 " title="Decision Tree: 70% Probability of Avoiding Damages" src="http://pharma-bi.com/wp-content/uploads/2009/12/EMV_TossUp-300x159.jpg" alt="Decision Tree: 70% Probability of Avoiding Damages" width="300" height="159" /></a><p class="wp-caption-text">A decision tree showing what happens when we lower the assumption for avoiding damages from 90% to 70%</p></div>
<p>In a decision tree each chance node acts as a weighting factor, so it is worthwhile to pay special attention to events that are estimated to have a very high or very low chance of occurring.  We want to be sure that we have good data to back up these optimistic (or pessimistic) numbers.</p>
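<p>To see how much weight a single chance node carries, it can help to sweep its probability and watch the result.  The sketch below does this for the contract example; the function name is hypothetical, and because the project cost is the same under both options it cancels out of the comparison.</p>

```python
# Sensitivity sketch for the contract example: how the EMV advantage of
# accepting the clause changes with the chance of avoiding damages.
# Dollar figures come from the example; the names are illustrative.

EXTRA_PAYMENT = 115_000 - 100_000   # what accepting the clause adds to the payment
DAMAGES = 50_000

def emv_gap(p_avoid_percent):
    """EMV(with clause) minus EMV(without clause); probability in whole percent."""
    expected_damages = (100 - p_avoid_percent) * DAMAGES / 100
    return EXTRA_PAYMENT - expected_damages

for pct in range(50, 101, 10):
    print(f"{pct}% chance of avoiding damages: gap = {emv_gap(pct):+,.0f}")

# Break-even: the extra payment equals the expected damages.
break_even_percent = 100 - EXTRA_PAYMENT * 100 / DAMAGES
print(break_even_percent)   # 70.0
```

<p>Every percentage point of error in the estimate moves the comparison by $500, which is why a seemingly safe 90% deserves solid data behind it.</p>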
<p>Of course it is not always feasible to gather all the data we need.  Sometimes the data is too expensive given what is at stake, sometimes it is unavailable and sometimes it is too unreliable for a given purpose.  In that case, experience and judgment need to fill in the data holes.  We also call this “making assumptions.”</p>
<p>When making assumptions, we should clearly identify them and decide what to do when one or more of them has to change.  We need to:</p>
<ul>
<li>identify which factors influence our assumptions</li>
<li>determine how these factors influence the result</li>
<li>be able to recognize when a significant change in our assumptions is needed</li>
<li>have a process in place to handle these changes when they do occur.</li>
</ul>
<p>No one can predict the future with certainty.  But the more we understand the probabilities, the better prepared we are.</p>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2009/12/why-we-need-good-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When Data Details Matter</title>
		<link>http://pharma-bi.com/2009/11/when-data-details-matter/</link>
		<comments>http://pharma-bi.com/2009/11/when-data-details-matter/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 00:37:10 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Modelling]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=482</guid>
		<description><![CDATA[<p>Ted Cuzzillo, the author behind the datadoodle blog, got me thinking about data details today.  When do they matter and when do they distract from what matters?</p>
<p>Being a data analyst means that I love details: the more the better, so I can understand how they form the Big Picture.  Intrinsically, I am drawn to graphs [...]]]></description>
			<content:encoded><![CDATA[<p>Ted Cuzzillo, the author behind the <a href="http://datadoodle.com/">datadoodle</a> blog, got me thinking about data details today.  When do they matter and when do they distract from what matters?</p>
<p>Being a data analyst means that I love details: the more the better, so I can understand how they form the Big Picture.  Intrinsically, I am drawn to graphs like this one:</p>
<div id="attachment_483" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2009/11/Individual_Points_90th.jpg"><img class="size-medium wp-image-483" title="Individual_Points_90th" src="http://pharma-bi.com/wp-content/uploads/2009/11/Individual_Points_90th-300x188.jpg" alt="A scatter graph showing individual data points and 90th percentile reference lines with their respective values" width="300" height="188" /></a><p class="wp-caption-text">A scatter plot showing individual data points and 90th percentile reference lines with their respective values</p></div>
<p>The spray of dots and their colors actually tell me something.  They give me a feel for the data and point me toward what is driving the overall result.  I can dig into individual data points and learn from them.  On the other hand, many people need a more abstract view of the world &#8211; a view that boils down to the overall shape of things.  After all, meaningful abstractions – like the graph below – are needed to make strategic, big picture decisions.</p>
<div id="attachment_484" class="wp-caption aligncenter" style="width: 266px"><a href="http://pharma-bi.com/wp-content/uploads/2009/11/Line_18_Data_Points.jpg"><img class="size-medium wp-image-484" title="Line_18_Data_Points" src="http://pharma-bi.com/wp-content/uploads/2009/11/Line_18_Data_Points-256x300.jpg" alt="A line graph averaging out the data points from the previous graph" width="256" height="300" /></a><p class="wp-caption-text">A line graph averaging out the data points from the previous graph</p></div>
<p>The graph above plots only 18 data points and connects them with a line to show the overall shape of the data.  Of course, the more we abstract information, the more we lose the ability to derive meaningful insights.</p>
<p>In order to generate this line graph, I had to create bins into which I could group the many data points from the first graph.  This means I now only have 18 data points from which to differentiate between the bottom 90% and the top 10% of the data.  In the graph below, the numbers along each line indicate the number of records that have been binned to create each data point.  As we can see from the 90<sup>th</sup> percentile reference lines below, the bottom 90% of the handful of data points in each section fall below 9 and 8 respectively.</p>
<div id="attachment_485" class="wp-caption aligncenter" style="width: 269px"><a href="http://pharma-bi.com/wp-content/uploads/2009/11/Line_18_Data_Points_90th.jpg"><img class="size-medium wp-image-485" title="Line_18_Data_Points_90th" src="http://pharma-bi.com/wp-content/uploads/2009/11/Line_18_Data_Points_90th-259x300.jpg" alt="The same line graph as above, including 90th percentile reference lines" width="259" height="300" /></a><p class="wp-caption-text">The same line graph as above, including 90th percentile reference lines</p></div>
<p>However, the very first graph in this story shows us just how misleading the percentiles from the abstracted data are.  According to the more detailed data, the 90<sup>th</sup> percentile values come out to 6.083 and 5.334 respectively.  The abstracted values point in the right direction, but they are quite a bit removed from the true values.  The more detail we use, the closer we get to the truth.</p>
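<p>The effect is easy to reproduce outside of any charting tool.  The sketch below uses a small made-up data set (not the data behind the graphs above) and a simple nearest-rank percentile; the helper name is hypothetical.  Each bin counts as one data point no matter how many records it holds, so a couple of sparsely populated bins can pull the percentile far away from its value in the detailed data.</p>

```python
# Hypothetical records as (bin label, value) pairs; not the article's data.
from collections import defaultdict

records = ([("A", 2)] * 10) + ([("B", 3)] * 8) + [("C", 9), ("D", 10)]

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)   # ceiling division on integers
    return ordered[max(rank, 1) - 1]

# Detailed view: every record counts.
raw_values = [value for _, value in records]

# Abstracted view: one averaged data point per bin.
by_bin = defaultdict(list)
for label, value in records:
    by_bin[label].append(value)
bin_means = [sum(vs) / len(vs) for vs in by_bin.values()]

print(percentile_nearest_rank(raw_values, 90))   # 3
print(percentile_nearest_rank(bin_means, 90))    # 10.0
```

<p>Here 18 of the 20 records sit at 3 or below, yet the binned view reports a 90th percentile of 10 because the two single-record bins carry as much weight as the two crowded ones.</p>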
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2009/11/when-data-details-matter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to Create a Misleading Quadrant Analysis – by Accident</title>
		<link>http://pharma-bi.com/2009/11/how-to-create-a-misleading-quadrant-analysis-%e2%80%93-by-accident/</link>
		<comments>http://pharma-bi.com/2009/11/how-to-create-a-misleading-quadrant-analysis-%e2%80%93-by-accident/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 22:23:29 +0000</pubDate>
		<dc:creator>Christine Muser</dc:creator>
				<category><![CDATA[Know Your Data]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://pharma-bi.com/?p=475</guid>
		<description><![CDATA[<p>When we use analysis tools like Tableau software, it becomes very important to keep our bearings about the data we are investigating.  For example, we need to keep in mind that Tableau retrieves and calculates information based only on the data needed to generate the graph.  That statement sounds really, duh, obvious.   But we can [...]]]></description>
			<content:encoded><![CDATA[<p>When we use analysis tools like <a href="http://www.tableausoftware.com/" target="_blank">Tableau software</a>, it becomes very important to keep our bearings about the data we are investigating.  For example, we need to keep in mind that Tableau retrieves and calculates information based only on the data needed to generate the graph.  That statement sounds really, duh, obvious.   But we can get into trouble when we don’t think about it <img src='http://pharma-bi.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Let’s look at an example.  Below are two graphs based on exactly the same underlying data – but why do the colors look different?   Each graph appears to show a quadrant analysis that compares two web sites based on their search engine rank and trust.</p>
<div id="attachment_476" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2009/11/QuadsDB1.jpg"><img class="size-medium wp-image-476" title="QuadsDB1" src="http://pharma-bi.com/wp-content/uploads/2009/11/QuadsDB1-300x224.jpg" alt="Graphs comparing two web sites based on their SEO rank and trust" width="300" height="224" /></a><p class="wp-caption-text">Graphs comparing two web sites based on their SEO rank and trust</p></div>
<p>The difference lies in the way each graph is generated: the first graph really represents a set of eight data points, while the second graph represents two sets of four data points each &#8211; a subtle, but important distinction.  The second graph shows the quadrants for each individual site using a separate scale for each site.  This allows us to compare each site quadrant by quadrant without having to worry about one site having vastly more links than the other.  In other words, we can answer questions like: which site did a better job of getting high quality links vs. low quality links?</p>
<p>The first graph combines the data for both sites and plots each quadrant on a scale for the combined data.   If one site has many more links than the other site, it will skew the scale toward the higher linked site.  In essence, we are comparing all eight quadrants against each other as opposed to comparing how each site performed on a particular quadrant.</p>
<p>The second graph therefore is the “correct” quadrant analysis if we want to compare each site quadrant by quadrant.  But why even talk about the first graph?</p>
<p>That’s because in Tableau it may be tempting to generate the first graph to save time &#8211; especially when one is new to Tableau.  We only have to drag the “Site Name” dimension onto the column shelf and, voila, we can show both sites next to each other.  The problem is this: the shading is now determined based on all 8 data points together – rather than using a set of 4 data points for each individual site.  This becomes obvious once we add color scales to the graphs:</p>
<div id="attachment_477" class="wp-caption aligncenter" style="width: 310px"><a href="http://pharma-bi.com/wp-content/uploads/2009/11/QuadsDB2.jpg"><img class="size-medium wp-image-477" title="QuadsDB2" src="http://pharma-bi.com/wp-content/uploads/2009/11/QuadsDB2-300x224.jpg" alt="Graphs comparing two web sites based on their SEO rank &amp; trust - includes color scales" width="300" height="224" /></a><p class="wp-caption-text">Graphs comparing two web sites based on their SEO rank &amp; trust - includes color scales</p></div>
<p>The first graph really does not compare the two sites to each other. Instead it takes a look at all the links for both sites combined and creates 8 data points from all those links.  The second graph uses data from one site at a time.   A small – but critical – difference.</p>
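<p>The same distinction can be sketched in Python.  The link counts below are invented for illustration, and the min-max scaling is an assumption standing in for whatever Tableau does internally to assign colors; the point is only that the scale depends on which data points it is computed over.</p>

```python
# Hypothetical link counts per quadrant; the real numbers are not given above.
links = {
    "Site A": {"high rank, high trust": 400, "high rank, low trust": 100,
               "low rank, high trust": 200, "low rank, low trust": 300},
    "Site B": {"high rank, high trust": 40, "high rank, low trust": 10,
               "low rank, high trust": 20, "low rank, low trust": 30},
}

def shade(value, lowest, highest):
    """Map a value onto a color intensity between 0 and 1 (min-max scaling)."""
    return (value - lowest) / (highest - lowest)

# First graph: one color scale shared by all eight data points.
all_counts = [n for site in links.values() for n in site.values()]
shared = {name: {q: shade(n, min(all_counts), max(all_counts)) for q, n in site.items()}
          for name, site in links.items()}

# Second graph: a separate color scale for each site's four data points.
per_site = {name: {q: shade(n, min(site.values()), max(site.values())) for q, n in site.items()}
            for name, site in links.items()}

for name in links:
    print(name, "shared:  ", [round(s, 2) for s in shared[name].values()])
    print(name, "per site:", [round(s, 2) for s in per_site[name].values()])
```

<p>In this made-up case both sites have the same mix of links, which the per-site scale makes visible.  On the shared scale, every quadrant of the smaller site is squeezed into the palest end of the range.</p>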
<p>While this example may seem trivial, it actually has deep implications when we deal with more complex visualizations.  For example, when we use bins or when we filter records based on certain values, we may add misleading reference lines or create inaccurate charts &#8211; but that’s a topic for another day.</p>
]]></content:encoded>
			<wfw:commentRss>http://pharma-bi.com/2009/11/how-to-create-a-misleading-quadrant-analysis-%e2%80%93-by-accident/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

