An Arbitrary Number of Things Every Data Scientist should Know

By Os Keyes

Every time I see any presentation or writeup about data science it includes a diagram that looks like this:

lol production values

Obviously the production values are normally higher because they’re put together by Big, Impressive, Smart people with resources - rather than me, at 9pm, in Google Draw.

But the diagram almost always appears. And when it doesn’t it’s replaced by prose making basically the same argument: that data scientists are magical wizard unicorn people who shit rainbows and chunder whiskey.

And you get articles claiming you need postgrad work to be a Data Scientist, that some of your Data Scientist coworkers are mundanes who are going to be outsourced, that Data Scientists have to be entrepreneurial. Or that there are 50 specific things all Data Scientists can do. No wait, there are nine! It’s like Cracked but with career opportunities instead of the Marvel Cinematic Universe.

All of this is total shite. Data Scientists are not unicorns. Data Scientists should not be expected to be unicorns. And any “thought leader” or “visionary” who thinks different should be promptly beaten with a sock full of quarters until the stupid comes off, because the defining attribute of a unicorn is that it’s a mythological creature that doesn’t exist.

But this isn’t new bullshit. This is very old bullshit. This is precisely the bullshit applied to frontend engineers five years ago, and engineers generally five years before that (and it’s still applied to both!). This is the idea that there is something magical and special about a particular profession or skillset.

The process normally goes something like this:

  1. First, $field is the Big Upcoming Thing. Everyone is talking about $field. Nobody quite knows what they’re going to do with it, but a stuffed shirt from Oracle says it’s very important and it’s the work of the future, dontchaknow.
  2. Second, building on that narrative, a load of authors and companies pop up to capitalise on $field. They create $Field Boot Camp for learning how to get into $Field. They write Becoming a $Field Member or Becoming a Better $Field Member or $Field: The Future Is Now!. Some of these works are useful and many of them are mindless pap.
  3. Slightly behind the people with enough animalistic cunning to make money off horking their vague suppositions all over O’Reilly or a rented conference room come the people marginally too dim for that, who start websites regurgitating drivel they vaaaaguely remember from some other article as if it’s Vital! Career-Making! Information!

    Obviously I’m not thinking of any website in particular here, no, no way.
  4. The field is now chock full of people. Many of whom are very nice good people doing nice good useful work. All of whom are surrounded by this narrative that there are Real members of $Field who have these 100 very specific skills and then there are FAKE members of $Field who only have 80 of these things and they are FRAUDS and they are BAD PEOPLE and helloooo impostor syndrome.
  5. We find a new Big Upcoming Thing.

Which would be fine

…I mean, for levels of fine up to and including “a load of people get really stressed and exist in unpleasant environments for a while”…

…if it didn’t leave traces.

But it does. This unhealthy mix of impostor syndrome, toxic power dynamics, cognitive dissonance and aggressive patriarchal me-being-awesome-means-you-feeling-awful nonsense does not go away when the Oracle suits do. It spreads. Because now you’ve got an industry where the mythology and the narrative are toxic and exclusionary and burn people out.

And we know this from engineering. This is not a new problem: software engineering has had it for a while. And what it looks like past the blog posts is “opportunities” that suck the life out of the participant, the devaluation of human beings, and an exclusionary narrative that rejects the Other. It looks like human misery. And we are replicating it here.

Data Scientists are not unicorns. Data Scientists are not “wizards”. The ability to use Stan did not turn you into Gandalf the White. Data Scientists are people, with their own needs and skills and, yes, limitations. And if we perpetuate the idea that Data Science is magical, that Data Scientists are magical, we blind ourselves to both the problems the field can’t solve - and there are many - and the toxic wash left in our wake.

If you want to prove that Data Science is magic, stop saying it’s special and start doing something special instead. Not some new ML or NLP project - how about working on the fact that only twelve percent of data scientists are women. Which, for reference, is worse than computer science. Work on the fact that the inclusion of people of colour is even worse. Work on the fact that the inclusion of LGBT*, particularly Trans*, people, is at best no better than in Engineering. Consider what it means that some of your colleagues write algorithms that literally kill people when they go wrong and yet, somehow, are in the clear by the ethical standards of our field.

Because if Data Science is special, if there’s something powerful and important about this field, then making this field better should be urgent and vital. And if it’s not, to you, well: here’s an article I read on One Word Every Real Data Scientist Should Know.