Joyful or informative charts? Best practices in visual analytics

Stephen Few, noted visual analytics expert and the original inspiration for our work in the field, recently wrote about criticisms of data visualization best practices. In particular, Amanda Cox of the New York Times said, “There’s a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy.” And Nathan Yau of Flowing Data wrote, “in visualization you eventually learn that there’s more to the process than efficient graphical perception and avoidance of all things round. Design matters, no doubt, but your understanding of the data matters much more.” Both have a body of work that I admire, which is why these comments surprise me.

This discussion reminds me of a similar problem in marketing and web analytics. Generating traffic that leads to sales is good. Eventually, someone finds a way to generate traffic that leads to few new sales, but management is misled into thinking this must be good because traffic leads to sales. The same thing happens with “look, this chart is beautiful” when the chart is hard to interpret or understand. We deliver fun graphs, but share minimal information. That may be good for traffic, but not so much for sales.

I suspect that part of this criticism can be traced back to Stephen’s recent critique of Tableau, “Tableau Veers from the Path”. In it, he mentions a new graph type in Tableau, packed bubble charts, and contrasts them with bar charts. This is an example of the “avoidance of all things round”. Is Stephen truly anti-joy? Will an example show him to be wrong? Let’s give it a try and you can judge for yourself.

Here’s a packed bubble chart example Continue reading

Estimating future success rates from initial experience
surveys and observation (tutorial)

Many common business questions are decided incorrectly because decision-makers overlook, forget or neglect a simple concept from statistics. In this tutorial we will walk you through several examples to help you avoid this potentially costly mistake. Examples where this technique can help include:

Is my ad worth the price?
Conversion rate: how many customers became paying customers after clicking on a Google ad and visiting a special offer web page? Based on the revenue generated, is the ad price too high?

How many of my customers have children?
Estimating customer demographics: based on a one-day survey in every store, what percent of our entire customer base has children?

Who will win the election?
Survey results: what percent of likely voters will vote for Obama based on the responses from 1,000 people in a poll?

Bringing down the house?
Winning a bet: if my friend flips a coin 10 times and it lands on heads 9 times, is this a “fair” coin?

All of these questions and many others can be answered with the technique explained and demonstrated in this article.


Which states have the most Miss America winners?

Here is a fun example about the Miss America pageant; it appeared on the home page.

Notice that 27% of users picked the correct state for the most Miss America winners. Is that good? Well, we should ask how you would perform if you had no information and simply guessed at the answer. With four choices and only one correct answer, you have a 1 in 4 chance (that’s 1/4 = 25%) of guessing the answer even if you have no clue.

So, is 27% actually better than all of these people just guessing? The answer is “it depends” on a missing piece of information: how many people answered this question. If 100 people answered it and 27 answered correctly, there is a good chance that they are all simply guessing. However, if 10,000 answered this question and 2,700 answered it correctly, there is a good chance that some of them answered better than just guessing.
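
To make the comparison concrete, here is a minimal sketch in Python. It assumes every respondent answers independently and guesses correctly with probability 1/4, and it computes the chance of seeing at least the observed number of correct answers under pure guessing (a binomial upper tail). The function name `chance_of_at_least` is hypothetical, and this is one common way to frame the question, not necessarily the technique developed in the rest of the tutorial.

```python
from math import exp, lgamma, log


def chance_of_at_least(successes, trials, p_guess):
    """Probability of `successes` or more correct answers out of `trials`
    when every answer is an independent guess with chance `p_guess`."""
    def log_pmf(k):
        # Log of the binomial probability of exactly k correct answers;
        # working in logs avoids floating-point underflow for large trials.
        return (lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
                + k * log(p_guess) + (trials - k) * log(1 - p_guess))
    return sum(exp(log_pmf(k)) for k in range(successes, trials + 1))


# Same 27% success rate, very different sample sizes.
small_sample = chance_of_at_least(27, 100, 0.25)
large_sample = chance_of_at_least(2700, 10000, 0.25)

print(f"27 of 100 by guessing alone:       {small_sample:.3f}")
print(f"2,700 of 10,000 by guessing alone: {large_sample:.2e}")
```

With 100 respondents, pure guessing produces 27 or more correct answers quite often, so 27% is unremarkable. With 10,000 respondents, the same rate would be very unlikely if everyone were guessing.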


The classic illustration of success: flip a coin

You may be puzzled at this point. Don’t fear. Let me move to a simpler example, flipping a coin. Believe it or not, it is very similar to the multiple choice question above. The main difference is the chance of “success”, guessing heads or tails correctly, which is 1 in 2 or 50%. So, if I flip it once and you are right, then 100% of flips were guessed correctly. However, one correctly guessed flip wouldn’t lead me to believe that you had the ability to see the future (or that the coin is an unfair coin that is always heads). How many flips guessed correctly would it take? As in many business situations I have seen, the first answer tends to come from intuition, so what does your gut say?

Five out of five correct?
Twelve out of fourteen?
80 out of 100?
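
Before trusting your gut, you can check each scenario directly. The sketch below assumes a fair coin and independent flips, and counts how often pure luck would produce at least that many correct guesses. The function name `luck_probability` is hypothetical, and this is an illustration under those assumptions rather than the method described in the full article.

```python
from math import comb


def luck_probability(correct, flips):
    """Chance of guessing at least `correct` of `flips` fair coin flips
    by luck alone (each guess is right with probability 1/2)."""
    # Count the outcomes with `correct` or more right guesses,
    # then divide by the 2**flips equally likely outcomes.
    favorable = sum(comb(flips, k) for k in range(correct, flips + 1))
    return favorable / 2 ** flips


for correct, flips in [(5, 5), (12, 14), (80, 100)]:
    p = luck_probability(correct, flips)
    print(f"{correct} of {flips}: {p:.3g}")
```

Five out of five happens by luck about 3% of the time (1 in 32), and twelve or more out of fourteen well under 1% of the time (106 in 16,384). Eighty or more out of 100 is far rarer still, even though 80% is a lower success rate than 100%.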

Here’s the good news, there is a simple Continue reading

Data Driven Conference 2012

We are having a great time at the Data Driven Conference in Columbus! Our first session was standing room only and we are presenting the same talk a second time at 1:30 in E161.

Interesting questions include “how do you become better at asking the right questions that lead to better analysis” and “how do you communicate with IT to get better data”?

To buy a copy of The Accidental Analyst, please visit

Here is our infographic that we created Continue reading

2012 NCAA football rankings–ranks per poll & overall ranking

Teams included in this week:
USC, LSU, Alabama, Oklahoma, Oregon, Georgia, Florida State, Michigan, South Carolina, Arkansas, West Virginia, Michigan State, Wisconsin, Clemson, Texas, Ohio State, Stanford, Nebraska, TCU, Virginia Tech, Oklahoma State, Kansas State, Florida, Boise State, Notre Dame, Louisville, Washington, Auburn, Georgia Tech, North Carolina, Utah, NC State, Baylor, South Florida, Texas A&M, Cincinnati, Brigham Young, Tennessee, Mississippi State, Virginia, Louisiana Tech, UCF, Houston, Rutgers, Southern Miss, Missouri, Florida Intl, Northern Illinois, Texas Tech

Conferences included in this week:
SEC, Big 12, ACC, Big Ten, Pac-12, Big East, USA, Ind, Mid-Amer, Mntn Wst, Sun, WAC

Free Webcast: “Big Data” in US History, Exploring the 1790 US Census

NOTE: This fun review of “big data” was inspired by a recent presentation I gave on behalf of Tableau Software at the Big Data Conference in Chicago. You can find the 1st part of this 3-part webcast here, “Performance to Cost Index & my personal history with ‘Big Data’”. Part 3 is here, “‘Big Data’ on your laptop, fast, informative and at your command”.

In this presentation, I share a review of the original big data in US history, the 1790 US Census. Some surprises are found along the way, including data quality issues in the Census reports and a surprising Continue reading

Free Webcast: Performance to Cost Index & my personal history with “Big Data”

NOTE: This fun review of “big data” was inspired by a recent presentation I gave on behalf of Tableau Software at the Big Data Conference in Chicago. You can find the 2nd part of this 3-part webcast here, “‘Big Data’ in US History, Exploring the 1790 US Census”. Part 3 is here, “‘Big Data’ on your laptop, fast, informative and at your command”.

Many people ask me, what is “big data”? For most of them, the right answer is that big data is any data that is difficult to use or understand (yes, I know the official, “correct” answers, which often vary and typically include topics like Hadoop and Cloudera).

In this presentation, I share my experience with the Commodore 64, the PS/2, DEC Stations, VAX servers, Solaris Servers, PCs and a MacBook Pro. Products and languages covered include BASIC, FORTRAN, SAS, Oracle, Teradata and Tableau.

It is truly astonishing Continue reading

History of US House representation from 1910 through 2010

A few observations from this example
Examining this dashboard with the initial decades of 1960 and 2010, you can see that control of the House has shifted toward the West and the South. Exceptions include Louisiana, Mississippi and Oklahoma in the South and Montana in the West.
If you adjust the first decade slider to 1910, an even more dramatic pattern appears! People love the sunshine and the West Coast: California, Florida and Nevada grew by 300% or more, and Washington, Oregon, Utah and Colorado grew by 67% or more.
Dashboard topics in this example
Download the workbook to peek at a few cool features of this dashboard, including:
1) Using table calculations Continue reading

Avoid flatline charts—visual analytics best practices

Balancing analysis of multiple years by filtering through the same month/day as today
Topics in this example
1) With a line chart, by placing Year(Order Date) on the Color shelf and Month(Date) on the columns, you can easily compare multiple years on the same pane of the graph. Just use Running Totals from the Quick Table Calculation dialog.
2) If this were real-world data, you would likely want to keep data through today; otherwise, prior years would likely be much higher since they are based on a full month while this year’s latest month is partially complete, unless it is the last day of the month!
3) By creating a calculated field that can check if the month/day is before today’s month/day and placing it on the filter shelf and selecting True, you can keep year-to-date data Continue reading
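
The logic of that month/day check can be sketched outside Tableau. The Python below is an illustration only, not the Tableau calculated field itself: the function name `is_within_year_to_date`, the sample orders, and the fixed "today" are all hypothetical, and it assumes dates through today's month/day should be kept.

```python
from datetime import date


def is_within_year_to_date(order_date, today):
    """True when the order's month/day falls on or before today's month/day,
    regardless of the order's year."""
    return (order_date.month, order_date.day) <= (today.month, today.day)


# Hypothetical sample data: (order date, sales amount).
orders = [
    (date(2011, 3, 15), 100.0),
    (date(2011, 6, 20), 250.0),   # after June 15, filtered out
    (date(2011, 11, 2), 400.0),   # after June 15, filtered out
    (date(2012, 2, 1), 120.0),
    (date(2012, 6, 10), 80.0),
]
today = date(2012, 6, 15)  # fixed so the example is reproducible

# Keep only year-to-date rows, then total them per year.
totals_by_year = {}
for order_date, amount in orders:
    if is_within_year_to_date(order_date, today):
        totals_by_year[order_date.year] = (
            totals_by_year.get(order_date.year, 0.0) + amount
        )

print(totals_by_year)
```

Filtering every year to the same month/day window is what makes the comparison balanced: the prior year no longer gets credit for months that the current year has not reached yet.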

Bringing clarity out of an infographic, “Income Inequality in the US” from Mother Jones

A colleague shared this Mother Jones infographic, which attempts to explain the disparity in income between richer and poorer families in the US. The data is indeed fascinating, but quite difficult to read in their flashy infographic.

There are two major issues that hinder understanding when viewing this infographic:

1) Using areas of circles to encode the incomes is very difficult for most people to interpret. Additionally, with the difference in income being so large, it is nearly impossible to fit this on a normally sized web page. The largest group, the yellow circle, is mostly cut off in their infographic.

2) The infographic is overloaded Continue reading

Helping You Show Your Data Who's Boss!