the angry customer index

If you were creating a statistical model and were setting a percentage goal for how good you wanted it to be, what would you pick? 90%? 95%? 99%?

It turns out that you should decide based on how much data you're working with. In economics, we wanted to be at least 90% confident (statistically speaking) that a variable in our model was meaningful; other disciplines would use 99%. However, as I'm learning in school, those seemingly strict levels are not strict enough once you get into larger data sets. As your data grows, the math makes it easier to hit any fixed confidence goal: standard errors shrink with sample size, so even trivially small effects start to look statistically significant. According to research, you actually want to select a confidence level of at least 99.7% for 1,000 observations, or 99.98% for 100,000 observations.
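Here's a minimal sketch of that shrinking-standard-error effect. It assumes a simple two-sided z-test and a made-up, practically meaningless effect of 0.01 standard deviations; the point is just that the same tiny effect sails past a fixed threshold once the sample gets big enough.

```python
import math

# A tiny, practically meaningless effect: the true mean differs from the
# null hypothesis by only 0.01 standard deviations (a made-up number).
effect_size = 0.01

def p_value(n):
    """Two-sided p-value for a z-test of the effect at sample size n."""
    z = effect_size * math.sqrt(n)      # standard error shrinks as 1/sqrt(n)
    return math.erfc(z / math.sqrt(2))  # two-sided tail probability

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  p = {p_value(n):.4f}")
```

At n = 100 the effect is nowhere near significant (p ≈ 0.92), but by n = 1,000,000 the p-value is vanishingly small, even though the effect itself never changed.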

The problem is that this is when people’s eyes start to glaze over. It’s like when I read a soap bottle that says it kills 99.7% of germs. Is that much better than the one that kills 99.5% of germs? (Actually maybe I don’t want either of those.) So how do you explain to a manager why your model that you’re 99.00% confident in might not be good enough?

I suggest the “Angry Customer Index”.

Imagine you built a model based on 100 customers out of your entire global customer base, and you are 99% confident that the model is correct. In the context of this illustration, that means on average 99 customers will be happier than they were before, but you might have up to 1 angry customer.

Now let’s say that you build the same model based on 100,000 customers, and you are similarly 99% confident the model is correct. On average you will have 99,000 happy customers, but potentially 1,000 angry customers.

This is an oversimplification, of course. But it’s a heck of a lot easier to explain than pointing to research discussing p-values and sample sizes. When you’re dealing with large numbers, you don’t have the luxury of a 1% margin of error. When 1 customer is dissatisfied with a product, it’s not really a big deal. But 1,000 customers upset in a short span of time could make headlines, and no executive will want that.
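The arithmetic behind the index fits in a few lines. This is just the oversimplified version described above (customers times the leftover uncertainty), with a hypothetical helper name:

```python
def angry_customer_index(n_customers, confidence):
    """Rough count of potentially 'angry customers': the share of customers
    the model may get wrong at the stated confidence level. An intentional
    oversimplification, as discussed above."""
    return n_customers * (1 - confidence)

print(round(angry_customer_index(100, 0.99)))      # → 1
print(round(angry_customer_index(100_000, 0.99)))  # → 1000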

visualizing leptokurtic vs. platykurtic

I keep hearing kurtosis described graphically as the "thickness of the tails", but I know I'm not alone in finding this unhelpful when looking at a histogram and trying to determine whether something is lepto- or platykurtic, even with a Normal distribution drawn on top for comparison.

I personally prefer Wikipedia's description of kurtosis as a measure of "peakedness", rather than the "thickness" of the "tails". When looking at a uniform or multimodal distribution, the concept of tails isn't really useful for me, but I can see when a distribution looks more or less peaked in the middle, or flatter across the top. Try looking at the following graph with that in mind and see what I mean:


Also this:
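If you'd rather anchor the vocabulary with numbers instead of pictures, scipy can report the theoretical excess kurtosis (kurtosis minus the Normal's baseline of 3) for common distributions. This is a sketch assuming scipy is installed:

```python
from scipy import stats

# Excess kurtosis: positive = leptokurtic (more peaked than the Normal),
# zero = mesokurtic (the Normal itself), negative = platykurtic (flatter).
for name, dist in [("normal", stats.norm),
                   ("laplace", stats.laplace),   # sharply peaked
                   ("uniform", stats.uniform)]:  # flat-topped
    kurt = dist.stats(moments="k")  # theoretical excess kurtosis
    print(f"{name:8s} excess kurtosis = {float(kurt):+.1f}")
```

The sharply peaked Laplace comes out at +3.0 (leptokurtic), while the flat-topped uniform comes out at -1.2 (platykurtic), which matches the "peakedness" intuition.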

Hope someone else finds that useful! If you have any tips of your own, please share.