Creative Lessons from Data Science Demigod Hadley Wickham
The auditorium at George Washington University was packed with data scientists of all ages and abilities. We came to see one of data science’s demigods and RStudio’s Chief Scientist, Hadley Wickham. Stuffed with sponsored pizza, we sat at the edges of our seats to glean a few drops of wisdom.
We came to learn from a master. Instead we learned that Hadley is just like us. He makes mistakes. A lot of them. And it made all of us laugh. Not because Hadley was the fool, but rather because we all saw him fight through the same typos, errors, conflicts, and sometimes inexplicable bugs that we all face, every day.
The Psychology of Frustration vs. Creative Determination
During his 90-minute presentation, the creator of R’s most downloaded packages live-coded a new package before our eyes. His goal was to show us how easy package creation can be. He took us through each step, yet none of it was flawless or truly easy. But each time he made a mistake or triggered an error message, we were reminded of a basic truth of programming.
Whether you are a coding demigod or neophyte, you will make a lot of mistakes. The difference between success and failure is how you deal with them. As Hadley put it, “I make a lot of mistakes. I’m just good at fixing them fast.” And it doesn’t hurt to have a great sense of humor about it all.
Being a good programmer requires more than technical chops and rapid problem solving. It also takes creativity, tolerance for failure, and optimism.
To any experienced programmer, this might seem an obvious insight. But Hadley’s fame in data science stems largely from his empathy: he makes data science accessible to the uninitiated. For this part of the audience, seeing him in action was inspirational. And for our team at Deducive, it was a reminder of what it takes to overcome day-to-day obstacles like integrating with Facebook’s fantastically undocumented API.
There’s no doubt that determination is critical to Hadley’s success. But brute-force hacking backed by a dose of humor isn’t everything. Process matters, a lot.
Again, any experienced programmer is familiar with basic principles of software development like unit testing. Watching Hadley write tests made it clear that even those relatively new to R can do it too. And he showed how easy it is to build documentation and sensible error messages.
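To make those three habits concrete, here is a minimal sketch of what they can look like together in R. It is not the code Hadley wrote on stage: the function `rescale01()` and its error wording are hypothetical, made up for illustration. The sketch assumes the testthat package is installed, and the `#'` comments follow the roxygen2 convention for in-line documentation.

```r
library(testthat)

#' Rescale a numeric vector to the range [0, 1]
#'
#' Hypothetical example function, used only to illustrate the workflow.
#'
#' @param x A numeric vector with at least two distinct values.
#' @return A numeric vector the same length as `x`.
#' @export
rescale01 <- function(x) {
  # Sensible error message: say what was expected and what was received.
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not ", class(x)[1], ".", call. = FALSE)
  }
  rng <- range(x, na.rm = TRUE)
  # Guard against dividing by zero when every value is identical.
  if (rng[1] == rng[2]) {
    stop("`x` must contain at least two distinct values.", call. = FALSE)
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Unit tests: one for the expected behavior, one for the failure path.
test_that("rescale01 maps the minimum to 0 and the maximum to 1", {
  expect_equal(rescale01(c(2, 4, 6)), c(0, 0.5, 1))
})

test_that("rescale01 rejects non-numeric input with a clear message", {
  expect_error(rescale01("a"), "must be numeric")
})
```

Inside a real package, the function would typically live under `R/` and the tests under `tests/testthat/`, with roxygen2 turning the `#'` comments into help pages.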
Just as important, however, is the role of research in Hadley’s process. When asked if he’d regularly let the public watch him code, he said “It would be horrific.” And he said we’d all see him spend an awful lot of time researching, especially on Stack Overflow, where he learns like the rest of us (and makes some popular contributions).
The Limits of Packages
The 12,000 packages available for download on CRAN today offer an amazing array of functionality, extensibility, and time-saving shortcuts. But Hadley pointed out some important limitations worth remembering:
Packages aren’t great for analysis.
Nor are they really good for reporting.
Still, in Hadley’s words, “Packages can be just for you.” When you find yourself needing the same functionality over and over again, creating a package may be a huge time saver in the long run.
More About Hadley
Hadley has written the definitive text on R, R for Data Science.
And if you didn’t know the scope of Hadley’s work, here is a quick rundown of his packages:
ggplot2 for visualising data.
dplyr for manipulating data.
tidyr for tidying data.
stringr for working with strings.
lubridate for working with date/times.
readr for reading .csv and fwf files.
readxl for reading .xls and .xlsx files.
haven for SAS, SPSS, and Stata files.
httr for talking to web APIs.
rvest for scraping websites.
xml2 for importing XML files.
devtools for general package development.
roxygen2 for in-line documentation.
testthat for unit testing.
And thanks to Data Community DC for organizing the event!