[link] The danger of predictive analytics - Carpe Datum

This article is a great, short read (by the excellent Buck Woody, whom I’ve had the fortune to meet at a few conferences) that leads me to two points that I hold close to my heart regarding analytics. As I’m four weeks into my degree, I hope you’ll forgive me if I’m not solid on some of the definitions yet.

1. Predictive analytics isn’t a crystal ball.

You’re never going to be right 100% of the time. I think when people see things like Nate Silver predicting the game incorrectly, they want to make sweeping generalizations about how this proves predictive analytics is all made up and we should just go back to the nice, comfortable world of 100% human judgment.

The problem with this is that there’s a huge difference between predicting whether or not a single event will occur (such as the winner of a specific sports game or election, which I believe is technically forecasting) and predicting the average outcome of a repeated event (such as how much the average customer will buy if shown a certain ad, which I think is predictive modeling). We might be wrong just as often in both events, but one has a bigger impact and is easier to point fingers at. Personally, I’d rather do the latter.

2. Analytics is far from being an out-of-the-box solution. 

Yes, there are and will continue to be solutions popping up to make analytics easier for untrained (and trained) professionals. But the most important problems cannot be reduced to computer algorithms with minimal expert input by the user. If they could, my program wouldn’t have such amazing employment statistics.

You can plug a data set and some variables into a program, but if you don’t know what you’re doing you will never have a clue that your metaphorical Siberian meteor is much bigger than you think. At least the JPL researcher knew the weaknesses of his model - do untrained dabblers know their model outcomes aren’t gospel?

The skill in analytics isn’t for using the software and knowing the secret keystrokes to make models “go”, it’s knowing how your data is structured, how to detect, diagnose and treat issues, and estimating how wrong you might be despite your best efforts.

Just a few of many reasons why I love analytics.

random observations from today

1. Somewhere between heteroskedasticity and ANOVA, I deleted righty-tighty, lefty-loosey

2. A classmate suggested that coding while standing optimizes your ability to cheer when you do something right, making the whole experience invigorating.

3. Nerds don’t need grades to motivate them to put in the work and do their best. But if you add a grade into the equation (particularly if it is one of 3 grades that determines whether or not you get kicked out) the stress level will increase dramatically. 

4. I feel like there’s some kind of analogy to be made between #3 and the way you treat employees, but it’s 4pm on day 5 of intensive statistics training.