According to recent estimates by the CDC, the use of electronic health and medical records in physician offices continues to climb. Data from the November 2011 NCHS Data Brief show that around 57 percent of office-based physicians report using some kind of electronic health/medical record system and around 34 percent report having a system that meets basic performance criteria. However, EHR/EMR usage varies widely by state, as can be seen in the graph below.
The graph shows states in descending order based on the usage of any type of EHR/EMR system. The series of dots illustrates usage of any system, while the bars illustrate usage of systems that meet basic criteria. Median percentages are illustrated by dotted lines. Colored shading indicates the percentage range into which most (68%) states fall. The shaded area on the left applies to usage of basic systems; the shaded area on the right applies to usage of any EHR/EMR system.
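For readers who want to reproduce the median lines and the shaded 68% bands, here is a minimal Python sketch. The post does not say how the band was computed; this sketch assumes a central percentile band (16th to 84th percentile), though a mean plus or minus one standard deviation would be another reasonable reading. The percentages in `any_system` are made up for illustration and are not the NCHS figures.

```python
from statistics import median, quantiles

def central_band(values, coverage=0.68):
    """Return (low, high) percentile bounds enclosing roughly `coverage` of the values."""
    tail = (1 - coverage) / 2                      # about 0.16 in each tail for a 68% band
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points: percentiles 1..99
    low = cuts[round(tail * 100) - 1]              # 16th percentile
    high = cuts[round((1 - tail) * 100) - 1]       # 84th percentile
    return low, high

# Hypothetical usage percentages for a handful of states (illustration only).
any_system = [40.0, 45.0, 50.0, 55.0, 57.0, 60.0, 65.0, 70.0, 84.0]

mid = median(any_system)
low, high = central_band(any_system)
print(f"median={mid}, 68% band=({low:.1f}, {high:.1f})")
```

With real data, the same two calls would be run once for the "any system" series and once for the "basic system" series.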
As we can see from the jagged outline of the blue bars, adoption of systems that meet basic performance criteria is not necessarily consistent with overall usage of electronic record systems. For example, Utah ranks number 2 in terms of overall EHR/EMR usage, but four states rank higher than Utah in terms of systems with basic functionality. These four states are Minnesota, Wisconsin, Washington and Oregon.
North Dakota, Utah, Minnesota, Wisconsin, Washington, Oregon and Hawaii not only lead the nation in terms of overall EHR/EMR usage, but also in terms of having basic systems in place. Looking toward the lower end of the list, we can see that states like Mississippi, South Carolina, Nevada, New Jersey and Louisiana have the lowest percentages of systems that meet basic criteria.
Given the current state of the debate around health care reform, one might be tempted to ask whether the political climate at the state level plays a role. Using information about “Red States and Blue States” from Wikipedia, we can assign party affiliations to states based on results from the past five presidential elections (1992 to 2008). At first glance, it appears that Democratic states tend to be further along in terms of EHR/EMR system usage. Their median percentages for overall usage (61%) and basic system usage (36%) are higher than those of Republican states (57% and 31% respectively).
However, reducing this question to simple partisan politics ignores the many factors influencing the adoption of complex systems. Issues like perceptions and attitudes, computer experience, workflow impact and concerns about patient-doctor relationships play a much bigger role in the day-to-day decisions doctors make. These issues are beyond the scope of this writing, though a very small sampling of relevant academic research appears at the end of this post.
Even without delving into those issues, we can get a sense that there is more at play here by looking at socio-economic data from the US Census Bureau. The graph below shows the usage of any type of EHR/EMR system, along with census data about poverty, annual pay, higher education, age, and number of physicians per capita.
Using this graph, we notice that our breakout of Democratic and Republican states shows some significant differences. The thick grey lines indicate the median value for each variable within each party category, while the grey shading shows the range into which the majority (68%) of states in each category fall. Based on this data, residents in Democratic states tend to be
wealthier (lower percentage of people living below the poverty line)
better paid (higher average annual pay)
more educated (higher percentage of residents with a bachelor’s degree or higher)
able to choose among more doctors (higher rate of doctors per capita)
Since we are dealing with state-level data, we cannot say how much each of these factors influences EHR/EMR system adoption (see scatter plots at the end), except to say that higher levels of poverty appear to be linked to lower levels of EHR/EMR system usage – apparently more so if the state happens to lean Republican.
Unfortunately, this exploration suffers from the fact that we are dealing with high-level data, which hinders our ability to match up relevant details. Nonetheless, it provides some hints that socio-economic issues play at least an indirect role in the adoption and use of EHR/EMR systems.
Resources:
Explore the data details behind the graphs in this post via this interactive Tableau Workbook.
Electronic Health Record Systems and Intent to Apply for Meaningful Use Incentives Among Office-based Physician Practices: United States, 2001–2011
Definitions used in this report (direct quote from the NCHS Data Brief):
“Physician office: A place where nonfederally employed physicians provide direct patient care in the 50 states and the District of Columbia; excludes radiologists, anesthesiologists, and pathologists.
Any EMR/EHR system: Obtained from “yes” responses to the question, “Does this practice use electronic medical records or electronic health records (not including billing records)?”
Basic EMR/EHR system: A system that has all of the following functionalities: patient history and demographics, patient problem list, physician clinical notes, comprehensive list of patient’s medications and allergies, computerized orders for prescriptions, and ability to view laboratory and imaging results electronically (4). Having a comprehensive list of patient’s medications and allergies was asked as two separate questions in 2010 (one about medications and the other about allergies); the questions were collapsed into one question in 2011 (5).”
“The following classification of red and blue states (as well as purple/battleground states) was determined by compiling the average margins of victory in the five presidential elections between 1992 and 2008. Three of these past elections were won by Democrats, Bill Clinton in 1992 and 1996, and Barack Obama in 2008, while two were won by Republican George W. Bush in 2000 and 2004.”
A Framework for Predicting EHR Adoption Attitudes: A Physician Survey
by Mary E. Morton, PhD, RHIA, and Susan Wiedenbeck, PhD
The term “dashboard” provides a convenient metaphor because everyone has at least some idea of what a dashboard looks like – and therein lies the problem: our own idea of a dashboard may differ wildly from someone else’s idea of a dashboard. When people talk about dashboards, there may be a huge communications gap and it pays to build a bridge across that gap before taking any action toward developing a dashboard.
There’s a big difference between the dashboard of a car and that of a passenger jet! For one thing, understanding a car dashboard requires significantly less training and experience than a cockpit dashboard. Of course each is designed to meet different needs: a pilot has to worry about many more things than a car driver when ferrying passengers safely from Point A to Point B.
When developing information dashboards for a business we also need to keep user needs in mind: a VP of Marketing will need a much more high-level overview than a Product Director, who in turn has quite different requirements from a sales rep preparing an action plan. In practical terms this means that we first have to figure out who will use the dashboard and how.
A dashboard is only useful when it can become an integral part of decision making. Some decisions need to be made on a daily basis, such as which prospects to call or how to follow up with a client. Other decisions take more time. Those decisions tend to be more organizational and strategic in nature. They require input from many people and data sources, as well as observation and a longer-term perspective.
The implication here is that a dashboard needs to be capable of providing information at the speed with which decisions are made. A dashboard also needs to provide the appropriate amount of summary and detail. Before we can build anything, we need to define what information is actionable and how soon it needs to be available.
From a technical standpoint, this initial exploration provides the framework for determining how the dashboard should be built. The following questions are just a starting point:
What information is required?
How will the information be used?
What kinds of summaries and calculations make the most sense?
How much detail should be included?
How can we best present the information?
How often does data need to be refreshed?
Which databases and information sources are necessary?
Which software/hardware best meets our needs?
The answers to these questions always involve tradeoffs. Which tradeoffs make sense depends on the impact the dashboard will have and whether we can demonstrate a positive ROI. At some point, the dashboard has to help improve the bottom line.
Drawing a direct line between dashboard and bottom line dollars can be complicated. Dashboard benefits tend to be intangible. To measure ROI, we have to think about how the dashboard enables better decisions, how it helps users focus on profitable actions and whether it helps to save time or other resources. Often we need to offset these intangible benefits against very real budgets and real money that needs to be spent on development, training, software, hardware, and so on.
As we can see, if we are serious about building a dashboard, we also have to be serious about spending some time upfront to map out a plan for getting from the Basic Idea to an Actual Dashboard. To get there, everyone involved needs to develop a common language, that is, a set of common definitions and goals. For instance, if our dashboard is supposed to track progress following a product launch, we need a common understanding of what market penetration means. Does it mean “number of actual customers vs. potential customers,” “number of actual customers vs. number of targets,” or “number of customers who bought at least X widgets”? How we define our key metrics has a direct impact on the usefulness of our dashboard and on the effort required to build it.
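To see how much the choice of definition matters, here is a small Python sketch that applies the three candidate definitions of market penetration to the same purchase data. All names and numbers (`potential_customers`, `targets`, `purchases_by_customer`, `min_widgets`) are hypothetical and chosen only for illustration.

```python
def share(numerator, denominator):
    """Return numerator / denominator, guarding against an empty base."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator

# Hypothetical launch data (illustration only).
potential_customers = 2000          # everyone who could conceivably buy
targets = 500                       # accounts the sales team was asked to pursue
purchases_by_customer = {"A": 12, "B": 3, "C": 0, "D": 7, "E": 1}
min_widgets = 5                     # the "X" in "bought at least X widgets"

actual_customers = sum(1 for units in purchases_by_customer.values() if units > 0)
qualified_customers = sum(1 for units in purchases_by_customer.values() if units >= min_widgets)

penetration_vs_potential = share(actual_customers, potential_customers)
penetration_vs_targets = share(actual_customers, targets)

print(f"actual vs. potential customers: {penetration_vs_potential:.1%}")
print(f"actual vs. targets:             {penetration_vs_targets:.1%}")
print(f"customers with >= {min_widgets} widgets:     {qualified_customers}")
```

The same five customers produce three quite different numbers, which is exactly why the definition has to be agreed on before anything gets built.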
Developing common definitions and goals is a key step toward building a bridge across the communication gap. Just because we are using terminology that seems to be well defined doesn’t mean we really are talking about the same thing. For example, we might be talking about lighthouses and we might even have the same general idea of what a lighthouse is and how a typical lighthouse looks. But when it comes to actually building the lighthouse we need something more concrete.
What’s a lighthouse?
The Idea: When discussing the concept of a lighthouse, usually this type of image comes to mind.
An Actual Lighthouse: Most of us would expect an actual lighthouse to look something like this. It meets the typical expectations for a lighthouse.
Another Actual Lighthouse: Sometimes a lighthouse needs to address special situations: when a client needs a special lighthouse and the developer thinks of building a typical lighthouse, the project is in trouble.
For the most part it’s fairly easy to calculate fiscal periods in Tableau. Here is one approach that requires two parameters and two date functions. This approach allows us to create custom Fiscal Periods on the fly – which can be useful for building what-if scenarios or for working with data that trickles in over a long period of time, such as rebates or claims, where we may need to exclude a chunk of the most current data because it’s incomplete.
To get started we need to indicate when the Fiscal Period starts and when it ends, hence the two parameters: [CFY Start] and [CFYTD End]. Then we need to determine which data to include in our calculation. The DATEDIFF and DATEADD functions will come in handy for that.
Let’s assume we want to create a Fiscal Year summary for a measure called [Sales] based on the [Order Date]:
FYTD Sales Current =
IF [Order Date] < [CFY Start] THEN Null
ELSEIF [Order Date] > [CFYTD End] THEN Null
ELSE [Sales]
END
To compare the current fiscal year sales with the same period during the previous year, we need to shift everything back by 12 months using the DATEADD function:
FYTD Sales Previous =
IF [Order Date] < DATEADD('month',-12,[CFY Start]) THEN Null
ELSEIF [Order Date] > DATEADD('month',-12,[CFYTD End]) THEN Null
ELSE [Sales]
END
Note: when using the DATEADD function to add or subtract months, we don’t have to worry about leap days – pretty nifty.
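For anyone who wants to check the logic outside Tableau, the two calculations above can be mirrored in a short Python sketch. This is not Tableau code: `add_months` and `fytd_sales` are hypothetical helper names, and the sketch assumes that shifting February 29 back by 12 months lands on February 28, which is the kind of leap-day handling the note above refers to.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def fytd_sales(order_date, sales, start, end):
    """Return sales if the order falls inside [start, end], else None (Tableau's Null)."""
    if order_date < start or order_date > end:
        return None
    return sales

# Hypothetical parameter values standing in for [CFY Start] and [CFYTD End].
cfy_start = date(2011, 7, 1)
cfytd_end = date(2012, 2, 29)

# Current fiscal year to date.
current = fytd_sales(date(2011, 8, 15), 100.0, cfy_start, cfytd_end)

# Same period one year earlier: shift both boundaries back by 12 months.
previous = fytd_sales(date(2010, 8, 15), 80.0,
                      add_months(cfy_start, -12), add_months(cfytd_end, -12))
```

Summing the non-None results over all orders gives the fiscal-year-to-date totals for each period.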
A similar approach works for Month to Date calculations. With the DATEDIFF function we get rid of all the data that doesn’t fall into the current month, then we just add up the days we want to include:
FMTD Sales Current =
IF DATEDIFF('month',[Order Date],[CFYTD End]) <> 0 THEN NULL
ELSEIF [Order Date] > [CFYTD End] THEN NULL
ELSE [Sales]
END
When comparing to the same month a year ago, we can use the same trick of shifting time as we used for our Year to Date Previous calculation:
FMTD Sales Previous =
IF DATEDIFF('month',[CFYTD End],[Order Date]) <> -12 THEN NULL
ELSEIF [Order Date] > DATEADD('month',-12,[CFYTD End]) THEN NULL
ELSE [Sales]
END
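The month-to-date pair can be sketched in Python as well. The helper names (`month_diff`, `mtd_current`, `mtd_previous`) are hypothetical, and the sketch assumes that a month-level date difference counts calendar-month boundaries rather than elapsed days, which is how the calculations above rely on DATEDIFF.

```python
from datetime import date

def month_diff(start, end):
    """Number of calendar-month boundaries between start and end (negative if end is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def one_year_earlier(d):
    """Same calendar day one year back; February 29 falls back to February 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def mtd_current(order_date, sales, end):
    """Keep sales only for orders in the same month as `end` and not after it."""
    if month_diff(order_date, end) != 0 or order_date > end:
        return None
    return sales

def mtd_previous(order_date, sales, end):
    """Keep sales only for orders in the same month one year earlier, up to the shifted cutoff."""
    if month_diff(end, order_date) != -12 or order_date > one_year_earlier(end):
        return None
    return sales

# Hypothetical value standing in for the [CFYTD End] parameter.
cfytd_end = date(2012, 3, 15)
```

For example, an order dated March 10, 2011 counts toward the previous month-to-date total, while one dated March 20, 2011 does not because it falls after the shifted cutoff.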
Bonus Calculation:
If you need to calculate the number of days in a year, here’s a way that considers Leap Years:
IF (DATEPART('year',[Order Date]) % 400) = 0 THEN 366
ELSEIF (DATEPART('year',[Order Date]) % 100) = 0 THEN 365
ELSEIF (DATEPART('year',[Order Date]) % 4) = 0 THEN 366
ELSE 365
END
The % sign is Tableau’s syntax for modulo calculations.
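As a sanity check, the same chain of modulo tests can be written in Python and compared against the standard library's leap-year rule. The function name `days_in_year` is made up for this sketch.

```python
import calendar

def days_in_year(year):
    """Mirror the modulo chain above: 400 first, then 100, then 4."""
    if year % 400 == 0:
        return 366
    elif year % 100 == 0:
        return 365
    elif year % 4 == 0:
        return 366
    return 365

# Compare against calendar.isleap for a range of years, including century years.
mismatches = [
    year for year in range(1600, 2401)
    if days_in_year(year) != (366 if calendar.isleap(year) else 365)
]
```

The order of the tests matters: checking divisibility by 4 first would wrongly treat years like 1900 as leap years.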
This quick follow-up on dual axis graphs shows another take on their potential use. The first suggestion comes from Naomi Robbins in her book “Creating More Effective Graphs.” 1 She suggests that dual axis graphs may be useful to represent data in different – but equivalent – measurement units such as Centigrade (Celsius) and Fahrenheit. Some of us can understand temperature better when it is expressed in Fahrenheit, while others relate better to temperatures on a Celsius scale. This example works because we are graphing the same thing; we are just expressing it in different units and providing the two scales as an easy reference.
When doing this, however, we have to ensure that both scales are synched up as illustrated below.
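One way to keep the two scales in sync is to derive the Fahrenheit tick values directly from the Celsius ticks, so that each pair sits at the same height on the graph. The sketch below uses the standard conversion F = C × 9/5 + 32; the function names and the tick range are assumptions made for illustration.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def synced_ticks(c_min, c_max, step):
    """Return (celsius, fahrenheit) tick pairs that line up on both axes."""
    return [(c, c_to_f(c)) for c in range(c_min, c_max + 1, step)]

# Hypothetical axis range: -10 to 30 degrees Celsius in steps of 10.
ticks = synced_ticks(-10, 30, 10)
for celsius, fahrenheit in ticks:
    print(f"{celsius:>4} C  =  {fahrenheit:>5.1f} F")
```

Setting the Fahrenheit axis limits to the converted Celsius limits, instead of letting the charting tool pick them independently, is what prevents the two scales from drifting apart.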
The other example comes from Dona Wong’s recent Wall Street Journal. Guide to Information Graphics. 2 In it she suggests using dual axis graphs to “ … help show how two directly related series move together.” The picture below – courtesy of Kaiser Fung’s Junk Charts blog – shows the example from her book. 3
I am somewhat ambivalent about this chart since we have to be careful in selecting each scale in order to avoid distorting the data (note that the Market Share scale starts at 30% and that the increments for both scales line up).
When done properly, both examples help us when we need to make a point with a single picture: we can save space and keep two related pieces of information within easy view. This works best when illustrating a specific issue and when we can use a static picture in a newspaper, printed report or a presentation. We’re asking for trouble when using this approach in a dynamic environment, like a dashboard, where scales need to adapt to the latest data.
Acknowledgements:
1. Naomi B. Robbins, Creating More Effective Graphs (New Jersey: John Wiley & Sons, Inc., 2005), page 262
2. Dona M. Wong, The Wall Street Journal. Guide to Information Graphics (New York: W.W. Norton & Company, Inc.), page 59
I have to admit that even after reading Stephen Few’s article on dual axis graphs, I am not quite ready to rule them out entirely. As is so often the case with data visualization, what we use depends on what we’re trying to do. I agree with Joe Mako and Stephen Few that, when used as a communication tool, dual axis graphs often confuse rather than communicate. Therefore it makes little sense to use them in dashboards and other situations where we need to communicate information quickly and at a glance.
But I do find dual axis graphs useful as a shortcut when exploring data – precisely because I can more clearly see the shape of the two curves in relation to each other. Below is an example that looks at Dollars and Units. When graphing them separately, the lines look somewhat similar and it takes a second look to notice that Units are declining while Dollars are increasing.
Two graphs, one showing Dollars the other showing Units
In the dual axis graph below, I got that message without needing a second look:
Dual axis graph showing Dollars and Units
When exploring data with visual analytics tools like Tableau, a dual axis graph can help me compare metrics quickly to see whether something jumps out. Are the curves moving in synch with each other? Are they trending in a similar direction? Do the curves indicate anything useful from a business point of view? For instance, in this example it looks like our margin might be improving. If the curves moved in the opposite direction, we may need to ask whether we’re discounting too much.
As Joe Mako and Stephen Few point out, in a dual axis graph we cannot compare the magnitude of change, but we can get a general feel for the direction and whether there are further questions we need to ask. In this example, a simple ratio calculation of Dollars per Unit helps to confirm whether we’re onto something. When we report results from our analysis, a graph of this ratio will do a better job showing the relationship between Dollars and Units than the dual axis graph shown above.
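The ratio check mentioned above takes only a few lines. Here is a sketch in Python with made-up monthly figures (the `dollars` and `units` lists are hypothetical) in which Dollars rise while Units fall.

```python
# Hypothetical monthly totals (illustration only).
dollars = [1000.0, 1050.0, 1100.0, 1160.0]
units = [100, 98, 95, 92]

# Dollars per Unit for each period.
dollars_per_unit = [d / u for d, u in zip(dollars, units)]

# True when the ratio rises in every period.
ratio_rising = all(
    later > earlier
    for earlier, later in zip(dollars_per_unit, dollars_per_unit[1:])
)

for period, ratio in enumerate(dollars_per_unit, start=1):
    print(f"period {period}: {ratio:.2f} dollars per unit")
```

A steadily rising ratio supports the hunch from the dual axis graph, and plotting this single series makes the point without asking the reader to juggle two scales.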
Many charting tools allow us to combine bar charts and line graphs in the same graphic – but should we? This question came up when I did a double take the other day while reading a market research report. In order to focus on function rather than content, I have re-created a similar graph below:
If the author’s intent was to induce a double take, it worked for me – in a confusing sort of way:
It caught my eye that the line graph is trending down while the bars are going up
But it took a few moments to realize what the line graph actually represents
And it took a few more moments to realize the message behind the graph: Widget volume is going up, but the rate of growth has turned from “healthy” to “anemic”
That was way too much work to get the gist of this graph! Why was this so difficult to get?
Instead of providing a scale, the individual data points are labeled – this clutters up the picture and distracts the eye. Not only that, it is difficult to get a ballpark idea for the maximum values unless one scans each data point – a scale for each data series would have provided that contextual information much more quickly.
When I re-created the graph I realized something else: in order to display the line graph within the colored bars, I had to set the maximum value for the percentage scale higher than what would be expected: instead of setting the top percentage value to 10% I had to use 15% to push the line graph down far enough to completely overlap with the bars.
Also, fewer words in the legend labels would have been better (the original had even more text!) – and it would have helped to spell out the word “percent” instead of hiding the percent symbol amongst all that text. Better yet: be consistent and stick with the same nomenclature: the chart title says “Volume and Growth” so the series labels should say the same.
Maybe this is a nitpick, but I prefer legends at the bottom or the side of a graph. The top should be reserved for the title – IMHO.
Finally, the question of whether we should mix bar charts with line graphs: I don’t think so. The bars overpower the picture and clutter things up. Call me a minimalist, but it’s much easier to see how volume and rate of growth relate to each other when we draw two clean lines like so:
With less clutter, the eye can very quickly take in the overall picture and focus on what matters:
Number of Widgets is going up,
Percent Growth is going down.
If the data values really matter to us, we can look them up in the table below the graph.
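For completeness, this is how the Percent Growth series relates to the Number of Widgets series. The sketch below uses hypothetical volumes, not the figures from the original report, and computes period-over-period growth.

```python
def percent_growth(volumes):
    """Period-over-period growth in percent; one value fewer than the input series."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(volumes, volumes[1:])]

# Hypothetical widget volumes per year (illustration only).
volumes = [1000, 1090, 1170, 1230, 1260]
growth = percent_growth(volumes)

volume_rising = all(later > earlier for earlier, later in zip(volumes, volumes[1:]))
growth_slowing = all(later < earlier for earlier, later in zip(growth, growth[1:]))

print([f"{g:.1f}%" for g in growth])
```

Both flags come out true for this data: volume keeps climbing while each year's growth rate is smaller than the last, which is the message the original chart was trying to convey.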
Years ago my Communications 101 professor told us in class that we should care about the proper definition of words and use words properly “lest we confuse the people who actually do know the difference.” Sage advice, of course, and also tricky to implement in real life since many words represent something specific to experts and aficionados, while they are mere buzzwords to everyone else.
So, why should we care whether we call something a “dashboard” or a “scorecard?” After all, they both use visual gadgets and graphs, don’t they? Both provide information about what’s going on in the business, right? In my experience, most end users really don’t care about the difference in nomenclature: what matters to them is whether the information they get enables profitable business decisions.
Yet, there is a critical difference we need to ferret out during requirements gathering: are we measuring performance or are we monitoring progress? Measuring performance simply means keeping track of things – like fuel consumption or engine temperature in a car. Monitoring progress implies that some thresholds and goals have been established: when the needle on the gas gauge nears “Empty” we need to start looking for a gas station.
Ergo, the critical difference is this: measuring things just provides information – what we do with that information is open to interpretation. Unlike progress monitoring, measuring things doesn’t tell us when to do something nor does it provide information on what needs to get done. In order to measure progress, though, we have to set goals and determine acceptable performance standards – in essence, we have to figure out what needs to get done and when to do it. To stick with the car dash metaphor: someone has to determine how low the gasoline level can sink before the “fuel empty” warning light turns on.
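The distinction can be boiled down to a few lines of Python. In this sketch, `measure` simply reports a reading, while `monitor` compares it against a threshold that someone had to agree on first. The function names, the liters unit and the threshold value are all hypothetical.

```python
def measure(fuel_liters):
    """Measuring: report the reading and leave the interpretation to the reader."""
    return fuel_liters

def monitor(fuel_liters, warning_threshold):
    """Monitoring: compare the reading against an agreed threshold and suggest an action."""
    if fuel_liters <= warning_threshold:
        return "find a gas station"
    return "keep driving"

# Hypothetical threshold: someone decided the warning light comes on at 5 liters.
WARNING_THRESHOLD = 5.0

reading = measure(4.0)
action = monitor(reading, WARNING_THRESHOLD)
```

The code for `monitor` is trivial; the hard part is the business discussion that produces the threshold.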
Much has been written about scorecards and dashboards and their respective differences. As a practical matter, I like this shorthand definition from Wayne Eckerson, formerly the director of research and services at the Data Warehousing Institute (TDWI):
| | Dashboard | Scorecard |
|---|---|---|
| Purpose | Measures performance | Charts progress |
| Users | Managers, staff | Executives, managers, staff |
| Updates | Real-time to right-time | Periodic snapshots |
| Data | Events | Summaries |
| Top-Level Display | Charts and tables | Symbols and icons |
This chart provides a good starting point from which to explore requirements. Often a dashboard becomes the starting point for a scorecard: while building a dashboard, the organization discusses what needs to be measured and how. With measurements in place it’s now easier to set thresholds and develop SMART goals for a scorecard.
So, does it matter whether we call something a “dashboard” or a “scorecard”? Yes – but not as much as one might think. What matters more is that everyone is on the same page about the main purpose of the tool: are we measuring performance or are we monitoring progress?
A business associate recently forwarded a white paper by one of the global BI software companies with the comment “… it all sounds so simple, yet we both know the complexities are just under the table.” Like all good marketing materials, this white paper talked about the current pain of the target audience and provided glowing examples of a possible solution. Part of the proposed solution included this: free yourself from expensive consultants by bringing the power of predictive analytics in-house.
Coincidentally, this white paper arrived while I was working through the intricacies of sales transactions for a client who is looking for quick – and accurate – ways to answer questions like “What happened to my sales?” and “What happened to my margin?” Both are high level questions that require a thorough understanding of “low level data” in order to provide meaningful answers. This got me thinking about the complexities of performing predictive analytics.
Complexities lurking under the predictive analytics table include issues such as data quality. For instance,
Customer IDs and customer names not always matching
Customer ratings changing over time
Master invoices being used to track transactions over an extended period of time
Inconsistent data entry – for instance, credits sometimes showing up as negative numbers and sometimes as positive numbers – depending on how the data entry person coded the transaction.
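The last issue, inconsistent signs on credits, is a good example of something that has to be fixed before any modeling starts. Here is a minimal Python sketch, assuming each record carries a hypothetical `transaction_type` field and that the agreed convention is "credits are negative, everything else is positive."

```python
def normalize_amount(transaction_type, amount):
    """Force a consistent sign: credits negative, all other transactions positive."""
    magnitude = abs(amount)
    return -magnitude if transaction_type == "credit" else magnitude

# Hypothetical records showing both ways a credit might have been keyed in.
records = [
    ("sale", 250.0),
    ("credit", -40.0),   # entered as a negative number
    ("credit", 40.0),    # same kind of credit, entered as a positive number
]

cleaned = [normalize_amount(kind, amount) for kind, amount in records]
net_sales = sum(cleaned)
```

Without the cleanup, the two credits would cancel each other out and net sales would be overstated.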
More important than data quality is the question of “how do we interpret what we see?” Statistical outliers serve as an example here, since they do not require a lot of explanation and their meaning is open to interpretation. They could be the first sign of a new trend, a fluke, a data error, or the result of factors beyond our control. How we deal with outliers when building our predictive model depends on what caused them.
Non-repeatable exceptions, a.k.a. flukes, are meaningless when we are trying to build a model of the future. Usually they are noise and become part of our margin for error rather than a factor we would include in our model. In order to separate meaningful facts from flukes, we need to dig further into the details and determine their influence on the big picture.
For example, the chart below shows an “Outlier Territory” that performed particularly well in terms of achieving sales goals.
Graph showing territory performance, including a statistical outlier.
As we refine our bonus plan for the next pay period, how should we proceed? Should we assume this territory will continue to have high sales and therefore raise its quota? The answer depends in part on whether we are dealing with
A real issue, such as our bonus model not working for that territory, or
A fluke, like a one-time-only buy-in by a major customer, or
Data errors, as in “somehow we summed up the sales data incorrectly,” or
Factors beyond our control, like an uptick in demand because of an unexpected and short-lived emergency.
Sometimes the sales rep can provide the insight we need to understand what caused the outlier. Usually, though, we need to look for likely causes using the data we already have and relating it to information from other sources.
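Before asking why a territory is an outlier, we first have to flag it. The post does not say which rule was used for the chart above; the sketch below swaps in one common rule of thumb, the 1.5 × IQR fence, and applies it to made-up percent-of-goal figures. The territory names and values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values_by_name, k=1.5):
    """Return names whose value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(list(values_by_name.values()), n=4, method="inclusive")
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    return [
        name for name, value in values_by_name.items()
        if value < low_fence or value > high_fence
    ]

# Hypothetical percent-of-goal attainment by territory (illustration only).
attainment = {
    "T1": 95.0, "T2": 102.0, "T3": 98.0, "T4": 105.0,
    "T5": 99.0, "T6": 101.0, "T7": 97.0, "Outlier Territory": 160.0,
}

flagged = iqr_outliers(attainment)
```

The rule only tells us that a territory is unusual. Deciding whether it is a real issue, a fluke, a data error or something beyond our control still requires digging into the details.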
As we can see, our crystal ball is only as good as the answers we derive from data collected in the past. Building it also requires us to make assumptions about how pieces fit together, how they influence each other and how important they are in shaping the future. We can improve our assumptions using statistical tools like t-tests, ANOVAs and various regression models. We can look to proxies and draw on our understanding of the marketplace. No matter how we develop our assumptions, we need to understand their limitations or they might turn us into Jacks and Jennys down the road.
Long story short: to build a crystal ball we need more than powerful tools. We need skilled and experienced people, good data and the commitment to adapt over time.
I am happy to report that I can now call myself a certified Lean Six Sigma Black Belt. While I consider this a worthwhile achievement, some friends and colleagues have questioned why I was spending time, effort and money on “just getting a piece of paper” that doesn’t mean much in the world of sales and marketing. True enough, we usually associate the words “Lean” and “Six Sigma” with manufacturing and service optimization, but in reality the tools and principles associated with Lean Six Sigma can be applied to a host of business issues. Let me explain.
Define, Measure, Analyze, Improve, Control
Whether we are aware of it or not, we employ these five steps all the time. In order to solve a problem we first have to understand it (“Define” and “Measure”), then we have to choose a solution (“Analyze” and “Improve”) and to make sure the solution sticks, we have to put some “Controls” in place. Lean Six Sigma shortens these five steps to form the acronym DMAIC and organizes projects into five phases called Define, Measure, Analyze, Improve and Control. Each phase deserves careful attention. Faults in any of them will create problems down the road, either by solving the “wrong” problem, by implementing the “wrong” solution or by creating an atmosphere where the habits that created the problem can re-emerge. Whether we have to manage a project or try to solve a less complex problem, DMAIC is a good place to start.
Lean Six Sigma Provides Tools and Techniques to Ensure Success
Good business decisions require relevant information and the ability to get it. Elsewhere in this blog I have discussed some of the finer points of relevant information. Let me focus here on “… the ability to get it” because that often depends on the skills, knowledge and motivation of the humans who provide the information.
Throughout our certification course we spent considerable time sharing real life stories and discussing what it takes to build team consensus, to make team decisions and to prioritize solutions. Lean Six Sigma provides a toolbox of methodologies from which the adept practitioner can choose the ones that fit the team dynamics and the problem at hand. The mechanics of these tools are easily learned – the human element can be more difficult to manage.
It takes human judgment and input from people to determine which factors are relevant, to discover where the problems are and to identify which solutions are feasible and should be pursued. Motivations such as job protection, maintaining a good reputation, demonstrating leadership and controlling one’s destiny are powerful factors that affect not only team dynamics but also what information people are willing to share. Lean Six Sigma calls itself a “data driven” methodology, but that doesn’t mean it ignores human input. When used appropriately and with skill, Lean Six Sigma tools help to transcend these human factors by approaching the problem from many different angles and by placing the emphasis on processes and problem solving rather than blaming people.
Lean Six Sigma Is Data Driven
Data and statistical analysis play a central role in Lean Six Sigma and go far beyond the measurement of technical specifications. Our “gut” will often point us in a good direction, but to get funding and to understand whether and where we are making progress, we need some numbers. That reliance on “numbers” is explicitly built into the Lean Six Sigma process by requiring us to “Define” our problem, to “Measure” the current state and then to “Analyze” it to determine the best solution. Hypothesis testing, chi-square tests, ANOVA, regression analysis, t-tests and a host of other statistical tools used in Lean Six Sigma also work away from the factory floor: they enable us to understand patient motivation, provider opinions, sales rep performance and driving forces in the marketplace – to name just a few.
Subject Matter Expertise Still Matters
Being able to use Lean Six Sigma jargon like “Cause & Effect Matrix”, “Design of Experiment” or “Value Stream Mapping” doesn’t mean much unless we provide the necessary context. Usually this means dropping the jargon and applying relevant subject matter expertise. A “Cause and Effect Matrix” may provide the foundation for translating business priorities into a bonus plan – complete with performance goals and payout curve. Concepts from “Design of Experiment” apply to “Survey Design” in Marketing Research as well as to “Conjoint Analysis” when we are trying to understand the impact of various market forces. Creating a “Value Stream Map” may help with restructuring departments and job descriptions to support growth for a provider of healthcare or other services. It’s not the tool that matters, it’s how we use it.
The Take-Away
Whenever conversations with friends and colleagues turned from abstract to more details about Lean Six Sigma, I started to hear comments like “oh, you’re doing a mini-MBA” or “that’s what I learned as part of my PMP certification” or “hey, this is an idea I can use.” The discussion above illustrates how these comments came about. When choosing among consultants, shouldn’t we give priority to someone who has demonstrated their ability to solve problems effectively and efficiently? I am banking on it, and together with some new insights from class, I now have more than a “piece of paper” to demonstrate that ability to anyone who needs to know 🙂
A while back, SEO guru Glenn Crocker and I were talking about how visual analytics can help with search engine optimization. Getting useful SEO information usually requires crunching data for thousands of links, so it’s quite useful to have something that takes us from a quick overview to the interesting details.
To illustrate how visual analytics can help with this, we decided to look at the web sites for two of our favorite charities: Feeding America, formerly known as Second Harvest, and an affiliated organization called the Harvesters Community Food Network in Kansas City.
Using data from SEOmoz, we combined the links for both sites into one database and compared their link performance. A quick glance at the graph below tells us that Feeding America has the better SEO profile: more links in general and also more high quality links. This shouldn’t be too surprising since Feeding America is a national organization, while Harvesters serves the greater Kansas City area. But comparing the two sites provided some interesting data for us to review.
Two things are worth noting here:
Some domains linking to Feeding America are very highly ranked – up to a rank of 10, while the domains linking to the Harvesters site top out at 8. This is also reflected in the overall average for the domain ranks – 5.5 and 4.5 respectively.
When looking at the color near the peak of each curve, we notice that the rank for domains with the most links to Feeding America is higher than for highly linked domains at Harvesters.
The quality of the links is indicated by the color (green is better) and the Domain mozRank (10 is best). The number of links is indicated by the height of each curve.
Next we tried to figure out which sites contributed the most to the SEO performance. Below I have highlighted a few of them. These scatter plots show domains that have 5 or more links across both sites. In the interactive version we can hover over data points to see more details about each domain. I am not sure that the folks at SEOmoz would be happy about me uploading their data, so I am showing just this picture to get the general idea across.
Depending on time and interest, we can perform even more fine-grained analysis. For example, just because a highly ranked domain sends us links doesn’t mean the links rank equally well. The graph below shows that only two highly ranked domains send links of a similarly high rank.
Let’s pretend we are Harvesters and we want to reach an audience beyond the Kansas City area. By looking at these three graphs, we now know that
we need more and better links
we need web content that attracts more highly trusted domains
we have plenty of links from the local community (many of the Top 15 Domains are from Kansas City-based organizations). Maybe we should broaden our horizons and reach out to the owners of more nationally focused, highly trusted domains.
P.S.: Just in case anyone is curious: at least in the Kansas City area, Harvesters does rank at the top for the search term “Harvesters”