Category Archives: Data Science

Download data to predict gender using first name (US data)

Do you have data with just first names or even just first initials but no information on the person’s gender/sex? If you would like better insights on your customers, based on whether they are likely male or female, then this data download is a great way to maximize your ROI! Download it today and begin using it to tailor your messaging and improve future communications.

There are three licenses available for this data: individual, corporate, and corporate for multi-company consumers. The individual version is available free (with discount code) for a limited time. Simply select the Individual license for purchase and use discount code discfreepers at the checkout page; this will deduct $3.99 from your purchase price.

The primary table in this data download is First names by Freakalytics with 5,164 rows (distinct names and common misspellings). You can use this data to guess whether someone is male or female based on their first name, or to find the probability that they are male or female based on their first name.

Here is the column information and simple summaries for this table:

Data Column | Max | Min | Average | Median | Mode
--- | --- | --- | --- | --- | ---
Name mixed case | Zulma | Aaron | N/A | N/A | James
Most likely gender | Male | Female | N/A | N/A | Female
Rank Overall | 4,019 | 1 | 2,354 | 2,397 | 4,019
Male Probability | 100% | 0% | 22% | 0% | 0%
Female Probability | 100% | 0% | 78% | 100% | 100%
Count Either Gender | 99,989 | 32 | 1,079 | 127 | 32
Male Count | 99,671 | 0 | 524 | 0 | 0
Female Count | 83,718 | 0 | 555 | 64 | 32
Male Probability Within | 3.68% | 0.00% | 0.08% | 0.01% | 0.00%
Female Probability Within | 2.92% | 0.00% | 0.02% | 0.00% | 0.00%
Male Rank | 1,054 | 1 | 584 | 608 | 1,054
Female Rank | 3,052 | 1 | 1,825 | 2,131 | 3,052
Name first initial | Z | A | N/A | N/A | J
Name upper case | ZULMA | AARON | N/A | N/A | JAMES
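
As a rough illustration of how columns like these could be used, here is a minimal Python sketch. It assumes that the male and female probabilities are simply each gender's count divided by the combined count, which is my reading of the column names rather than something stated in the download. The names `SAMPLE_COUNTS`, `gender_probabilities` and `predict_gender` are hypothetical, and the counts are made up for illustration; they are not rows from the actual table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rows: upper-case name -> (male count, female count).
# These counts are invented for illustration, NOT taken from the data download.
SAMPLE_COUNTS: Dict[str, Tuple[int, int]] = {
    "ALEX": (900, 100),
    "ROBIN": (250, 750),
}


def gender_probabilities(male_count: int, female_count: int) -> Tuple[float, float]:
    """Return (male probability, female probability) from raw counts.

    Assumption: probability = that gender's count / count of either gender.
    """
    total = male_count + female_count
    if total == 0:
        raise ValueError("need at least one observation")
    return male_count / total, female_count / total


def predict_gender(
    first_name: str, counts: Dict[str, Tuple[int, int]] = SAMPLE_COUNTS
) -> Optional[Tuple[str, float]]:
    """Look up a first name (case-insensitive).

    Returns (most likely gender, its probability), or None if the name is unknown.
    """
    key = first_name.strip().upper()  # match on the upper-case form of the name
    if key not in counts:
        return None
    p_male, p_female = gender_probabilities(*counts[key])
    if p_male >= p_female:
        return "Male", p_male
    return "Female", p_female


print(predict_gender("Alex"))
print(predict_gender("robin"))
print(predict_gender("Zed"))
```

Matching on the upper-case form of the name keeps the lookup insensitive to how the name was typed, and returning `None` for unknown names makes it clear when no guess is available.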

The top few rows from this table (as a snapshot of the data in Excel 2003 format and in text):

[Image: snapshot of the top few rows of the first names table]

Access this valuable data download here.

Free Webinar—Analyzing Your Data With Excel:
Simple Steps for Actionable Results

Recorded August 22nd, 2013
Do what you can with what you have where you are.
–Theodore Roosevelt
Cowboy, soldier, historian and
26th President of the U.S.

Answer everyday business questions like an analyst with Microsoft Excel. This webinar is based on a selected set of techniques from The 7C’s of Data Analysis, as covered in our book, The Accidental Analyst: Show Your Data Who’s Boss. An abbreviated case study will be used to demonstrate common techniques that can start you on the course to analyzing data with Microsoft Excel 2007, 2010 or 2013.

This presentation is approximately 1 hour and 15 minutes in length.

Continue reading

Thoughts on “Business Analytics Software Still Has Skeptics”

Synopsis of the article and summary chart from Pete Barlas at Investor’s Business Daily
Many companies still view the promise of analytics software as a glass half-empty.

One of the strongest sectors in enterprise software, business analytics has many doubters among companies skeptical it truly is helping improve the top and bottom lines.

So found a survey released last month by accounting and professional services firm Deloitte Touche Tohmatsu.

“What we are seeing in the analytics front is a real skepticism among business leaders about whether this works and how it can make a difference,” said Tim Phillipps, global leader of Deloitte’s analytics practice.

The findings, he admits, were a surprise.

Here is Stephen’s comment on this article:

I have worked in analytics for many years at over 100 companies (as an employee, as a consultant and leading teams). It has always been easier to lean on cost savings as a clear measure of success with analytics and data warehouse investments. Continue reading

Free Webinar—Quick & dirty analysis with Tableau
in 13 lucky steps!

July 31st, 2013, Noon Pacific, 3 PM Eastern, 8 PM London
So much data, so little time!
–Stephen McDaniel
Co-founder of Freakalytics

Let’s face it: in the daily world of work, you often are asked to provide an answer to a new problem in less than a day. Of course, your boss tends to forget about the other three project deadlines you are currently facing, so you really have only 10 or 20 minutes to squeeze in a quick and dirty analysis.

If this sounds familiar to you, this webinar will walk you through the thirteen flexible steps that can take you from being clueless to looking smart with Tableau in just a few minutes. Hopefully you’ll be able to obtain enough information to come up with ideas for an e-mail update or talking points for the unexpected meeting that is looming large over your day, showing your boss and colleagues that you can deliver great results in time to be useful.

So, if you’re already a user of Tableau, this webinar will guide you along the critical path of many analyses in Tableau. If you are totally new to Tableau, you can see the possibilities of what you can accomplish in a short amount of time, once you get started and practice these techniques.
A preview of the first few steps

1 What question will you examine?



Okay, in reality this step might take hours or even days! But let’s assume you have your question, and if it is complex, break it down into several, simpler questions.

2 Grab the closest, readily available dataset Continue reading

Free Webinar—Business analytics and more
with SAS Enterprise Guide

Recorded on July 10th, 2013
True genius resides in the capacity for evaluation of
uncertain, hazardous, and conflicting information.
–Winston Churchill
Prime Minister of the United Kingdom during WWII

In this webinar, Stephen will analyze multiple real-world case studies using SAS Enterprise Guide by following the 7 C’s of Data Analysis. He will collect data from a range of sources, explore the data for common problems, apply quick data fixes, demonstrate best practices of visual analytics and use powerful predictive models that go beyond the limits of standard analysis techniques.
Click here for the full post and to register below Continue reading

Free Webinar—Visual Analytics Best Practices
Why Can’t You See My Point?!?


You can have brilliant ideas,
but if you can’t get them across,
your ideas won’t get you anywhere.

-Lee Iacocca


The webinar is past but you can watch the recording and view the slides below.
This post is currently being updated with the slides and videos.

Why do visual analytics best practices matter?

Why can’t people see your point when you present data-oriented presentations?

Whether you are using big data, small data or summarized data that has been prepared for you, this webinar will explore these vital questions. If you are concerned with getting the most from your data, this complimentary webinar is a great step in learning how to clearly communicate with people as they make better informed decisions in the hectic world of modern business.

Are you clearly communicating the message that you want to deliver from your data? If you’re tired of your tables and charts being “good enough”, learn some tips and tricks to help make them great! We’ll demonstrate how to choose the right table, chart and metrics to answer the question at hand and how to simplify your visuals for maximum impact. Regardless of whether you use Excel, SAS, R, PowerPoint, Qlikview, Tableau, Business Objects, Cognos, Microstrategy or most any other analytics tool for your analysis, you will benefit from this thought-provoking presentation.

For everyone who joined, thanks for your support and participation during the Q&A!

[Image: chat comments at the end of the webinar]

Click here for the video and presentation Continue reading

SAS versus R for business analysts


Over on R4Stats, I replied to Bob Muenchen’s article, Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

Personally, I think SAS is a wonderful application. My SAS experience started with SAS programming back in 1989 (on mainframes, along with Fortran) and went on to include SAS Enterprise Guide (I wrote the first two editions of SAS for Dummies with Chris Hemedinger) and SAS Enterprise Miner. Additionally, I have used JMP, SAS Data Integration Studio, SAS Forecast Studio and several other SAS tools.

On the other hand, I have used R since 2004 on several projects and S (precursor to R) since the 90’s in biopharm. I find R truer to being a modern programming language while SAS is truer to being an analyst programming language. Perhaps I am biased? But given the way I think about attacking problems with data and my typical need to massage the data in a wide range of ways, SAS is simply superior in my opinion. The flow of the language, the ease of readability and the powerful DATA step still make it my favorite programming world. However, if I am seeking most any statistical test under the sun, R is clearly superior.

Unfortunately, R doesn’t have a clear, de-facto GUI (graphical user interface) that is well-designed Continue reading

Joyful or informative charts? Best practices in visual analytics

Stephen Few, noted visual analytics expert and the original inspiration for our work in the field, recently wrote about criticisms of data visualization best practices. In particular, Amanda Cox of the New York Times said, “There’s a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy.” And Nathan Yau of Flowing Data wrote, “in visualization you eventually learn that there’s more to the process than efficient graphical perception and avoidance of all things round. Design matters, no doubt, but your understanding of the data matters much more.” Both are people whose body of work I admire, but I am surprised by these comments.

This discussion reminds me of a similar problem in marketing and web analytics. Generating traffic that leads to sales is good. Eventually, someone finds a way to generate traffic that leads to few new sales, but management is misled into thinking this must be good since traffic leads to sales. This is similar to a chart that earns a “look, this chart is beautiful” but is hard to interpret or understand. So, while we deliver fun graphs, minimal information is shared. This may be good for traffic, but not so much for higher sales.

I suspect that part of this recent criticism can be traced back to Stephen’s recent criticism of Tableau, “Tableau Veers from the Path”. In it, he mentions a new graph type in Tableau, packed bubble charts, and contrasts them with bar charts. This is an example of the “avoidance of all things round”. Is Stephen truly anti-joy? Will an example show him to be wrong? Let’s give it a try and you can judge for yourself.

Here’s a packed bubble chart example Continue reading

Estimating future success rates from initial experience
surveys and observation (tutorial)

A wide range of common business questions are often decided incorrectly because decision-makers overlook, forget or neglect the application of a simple concept from statistics. In this tutorial we will walk you through several examples to avoid this potentially costly mistake. Examples where this technique can help include:

Is my ad worth the price?
Conversion (CTR): how many customers converted to paying customers after clicking on a Google ad and visiting a special offer web page? Based on the revenue generated, is the ad price too high?

How many of my customers have children?
Estimating customer demographics: based on a one day survey in every store, what percent of our entire customer base have children?

Who will win the election?
Survey results: what percent of likely voters will vote for Obama based on the responses from 1,000 people in a poll?

Bringing down the house?
Winning a bet: if my friend flips a coin 10 times and it lands on heads 9 times, is this a “fair” coin?

All of these questions and many others can be answered with the technique explained and demonstrated in this article.


Which states have the most Miss America winners?

Here is a fun example about the Miss America pageant that appeared on the home page.

Notice that 27% of users picked the correct state for the most Miss America winners. Is that good? Well, we should ask how you would perform if you had no information and simply guessed at the answer. With four choices and only one correct answer, you have a 1 in 4 chance (that’s 1/4 = 25%) of guessing the answer even if you have no clue.

So, is 27% actually better than all of these people just guessing? The answer is “it depends” on a missing piece of information: how many people answered this question. If 100 people answered it and 27 answered correctly, there is a good chance that they are all simply guessing. However, if 10,000 answered this question and 2,700 answered it correctly, there is a good chance that some of them answered better than just guessing.
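
To make the effect of the number of respondents concrete, here is a small Python sketch. It uses a normal approximation to compare the observed rate with the 25% guessing rate. This is one common way to make the comparison and is an assumption on my part, not necessarily the technique this tutorial goes on to explain. The function name `guess_test` is hypothetical.

```python
import math


def guess_test(correct: int, total: int, chance: float = 0.25):
    """Compare an observed success rate with pure guessing.

    Returns (z, p_value):
      z       - how many standard errors the observed rate sits above `chance`
      p_value - one-sided probability of a rate at least this high if
                everyone were guessing (normal approximation)
    """
    observed = correct / total
    # Standard error of a proportion when every answer is a pure guess
    std_error = math.sqrt(chance * (1 - chance) / total)
    z = (observed - chance) / std_error
    # Upper-tail area of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


for correct, total in [(27, 100), (2700, 10000)]:
    z, p = guess_test(correct, total)
    print(f"{correct} of {total}: z = {z:.2f}, p = {p:.2g}")
```

With 100 answers, 27% sits less than half a standard error above 25%, so guessing alone would produce a result like this about a third of the time. With 10,000 answers, the same 27% sits more than four and a half standard errors above 25%, which pure guessing would almost never produce.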


The classic illustration of success: flip a coin

You may be puzzled at this point. Don’t fear. Let me move to a simpler example: flipping a coin. Believe it or not, it is very similar to the multiple choice question above, with the main difference being the chance of “success” (guessing heads or tails correctly), which is 1 in 2 or 50%. So, if I flip it once and you are right, then 100% of flips were guessed correctly. However, this one flip being guessed correctly wouldn’t lead me to believe that you had the ability to see the future (or that the coin is an unfair coin that always lands on heads). How many flips guessed correctly would it take? As I have seen done in many business situations, ask yourself: what does your intuition or gut say?

Five out of five correct?
Twelve out of fourteen?
80 out of 100?
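
If you would like to check your gut against the numbers, the sketch below computes the exact chance of getting at least that many flips right by luck alone, assuming a fair coin and independent guesses. It is a straightforward binomial calculation offered as an illustration, not necessarily the method the rest of this tutorial uses. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(successes: int, flips: int) -> float:
    """Chance of at least `successes` correct calls in `flips` fair coin flips."""
    # Count the head/tail sequences with k correct calls, for k = successes..flips
    favorable = sum(comb(flips, k) for k in range(successes, flips + 1))
    # Every one of the 2**flips sequences is equally likely for a fair coin
    return favorable / 2 ** flips


for successes, flips in [(5, 5), (12, 14), (80, 100)]:
    print(f"{successes} of {flips}: {prob_at_least(successes, flips):.2g}")
```

Five out of five happens by luck about 3% of the time (1 in 32), twelve or more out of fourteen well under 1% of the time (106 in 16,384), and 80 or more out of 100 far less than one time in a million.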

Here’s the good news, there is a simple Continue reading


Data Driven Conference 2012

We are having a great time at the Data Driven Conference in Columbus! Our first session was standing room only and we are presenting the same talk a second time at 1:30 in E161.

Interesting questions include “how do you become better at asking the right questions that lead to better analysis” and “how do you communicate with IT to get better data”?

To buy a copy of The Accidental Analyst, please visit

Here is our infographic that we created Continue reading