Creative Lessons from Data Science Demigod Hadley Wickham
RStudio Chief Scientist Hadley Wickham

The auditorium at George Washington University was packed with data scientists of all ages and abilities. We came to see one of data science’s demigods and RStudio’s Chief Scientist, Hadley Wickham. Stuffed with sponsored pizza, we sat on the edges of our seats to glean a few drops of wisdom.

We came to learn from a master. Instead we learned that Hadley is just like us. He makes mistakes. A lot of them. And it made all of us laugh. Not because Hadley was the fool, but rather because we all saw him fight through the same typos, errors, conflicts, and sometimes inexplicable bugs that we all face, every day.

The Psychology of Frustration vs. Creative Determination

During his 90-minute presentation, the creator of R’s most downloaded packages live-coded a new package before our eyes. His goal was to show us how easy package creation can be. While he took us through each step, none of it was flawless or truly easy. But each time he made a mistake or triggered an error message, we were reminded of a basic truth of programming.
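We can’t replay his live session here, but a minimal sketch of that package-creation workflow, using his devtools and usethis packages, might look like this (the package name "greetr" is hypothetical):

    # Scaffold and iterate on a new package from the R console
    library(usethis)
    library(devtools)

    create_package("~/greetr")  # generates DESCRIPTION, NAMESPACE, and R/
    use_r("greet")              # creates R/greet.R for the first function
    load_all()                  # loads the package as if it were installed
    check()                     # runs R CMD check, where the mistakes surface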

Whether you are a coding demigod or neophyte, you will make a lot of mistakes. The difference between success and failure is how you deal with them. As Hadley put it, “I make a lot of mistakes. I’m just good at fixing them fast.” And it doesn’t hurt to have a great sense of humor about it all.

Being a good programmer requires more than technical chops and rapid problem solving: it also takes creativity, tolerance for failure, and optimism.

To any experienced programmer, this might seem an obvious insight. But Hadley’s fame in data science stems largely from his empathy — he makes data science accessible to the uninitiated. For this part of the audience, seeing him in action was inspirational. And for our team at Deducive, it was a reminder of what it takes to overcome day-to-day obstacles like integrating with Facebook’s fantastically undocumented API.

Process Matters

There’s no doubt that determination is critical to Hadley’s success. But brute force hacking backed by a dose of humor isn’t everything. Process matters, a lot.

Again, any experienced programmer is familiar with the basic principles of software development, like unit testing. Seeing Hadley do it made it clear that even those relatively new to R can do it too. And he showed how easy it is to build documentation and sensible error messages.
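As a hedged illustration of those practices (the function and test below are ours, not from the talk), here is a roxygen2-documented function with a sensible error message, plus a matching testthat unit test:

    # In R/greet.R: documentation lives beside the code as roxygen comments
    #' Greet someone by name
    #'
    #' @param name A single character string.
    #' @return A greeting string.
    #' @export
    greet <- function(name) {
      if (!is.character(name) || length(name) != 1) {
        stop("`name` must be a single string, not ", class(name)[1], ".")
      }
      paste0("Hello, ", name, "!")
    }

    # In tests/testthat/test-greet.R: a unit test for the failure mode
    library(testthat)
    test_that("greet() rejects non-string input", {
      expect_error(greet(42), "must be a single string")
    })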

Just as important, however, is the role of research in Hadley’s process. When asked if he’d regularly let the public watch him code, he said “It would be horrific.” And he said we’d all see him spend an awful lot of time researching, especially on StackOverflow, where he learns like the rest of us (and makes some popular contributions).

The Limits of Packages

The 12,000 packages available for download on CRAN today offer an amazing array of functionality, extensibility, and time-saving shortcuts. But Hadley pointed out some important limitations worth remembering:

  • Packages aren’t great for analysis.

  • Nor are they really good for reporting.

As Hadley pointed out, “Packages can be just for you.” When you find yourself needing the same functionality over and over again, creating a package may be a huge time saver in the long run.

More About Hadley

Hadley has written the definitive text on the subject, R for Data Science.

And if you didn’t know the scope of Hadley’s work, here’s a quick rundown of his packages, with a short usage sketch after the import list:

DATA SCIENCE

  • ggplot2 for visualizing data.

  • dplyr for manipulating data.

  • tidyr for tidying data.

  • stringr for working with strings.

DATA IMPORT

  • readr for reading .csv and fwf (fixed-width format) files.

  • readxl for reading .xls and .xlsx files.

  • haven for SAS, SPSS, and Stata files.

  • httr for talking to web APIs.

  • rvest for scraping websites.

  • xml2 for importing XML files.
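To give a flavor of how the import packages fit together, a minimal sketch (the file names are hypothetical):

    library(readr)
    library(readxl)
    library(haven)

    sales  <- read_csv("sales.csv")      # readr: comma-separated files
    budget <- read_excel("budget.xlsx")  # readxl: Excel workbooks
    survey <- read_sav("survey.sav")     # haven: SPSS files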

SOFTWARE ENGINEERING

  • devtools for general package development.

  • roxygen2 for in-package documentation.

  • testthat for unit testing.

And thanks to Data Community DC for organizing the event!

How to AI-Wash Your Company in 3 Easy Steps
Source: GLG

The wait is over: Artificial Intelligence is here. If your company isn’t doing it, start looking for a new job. Obsolescence, decline, and/or bankruptcy are around the corner.

The evidence is everywhere. Take for example drinks giant Coca-Cola, which used the mighty power of Artificial Intelligence to come up with a new flavor, Cherry Sprite, based on the mixes we puny humans selected from their make-your-own “Freestyle” machines. Greg Chambers, Coca-Cola’s head of digital innovation, summed up their AI-powered strategy at a conference recently:

“AI is the foundation for everything we do. We create intelligent experiences. AI is the kernel that powers that experience.”

It's hard to argue with that. Everyone knows that AI is what makes soda taste good. Just like everyone knows that naming your AI after a Sherlock Holmes character makes you an AI company.

But 130-year-old corporate behemoths aren’t the only ones pivoting to the AI future. Remember all those “cloud” startups that were popping up everywhere 5 years ago? Or the “green” companies from 10 years ago? Well, most of them are still around, and now they too are using AI!

According to Gartner analyst Jim Hare, “Nearly every technology provider is now claiming to be an AI company.” He has counted more than 1,000 vendors that sell AI or bake it into their products. Gartner predicts that by 2020, AI will be pervasive in every software product pitch.

And a recent study published in the MIT Sloan Management Review found that of 3,000 executives surveyed, 85% believed that AI would be transformative for their companies. Never mind that only 20% actually incorporate AI into any of their offerings — AI is the future!

The good news is you don’t need to possess a deeply intimate understanding of Artificial Intelligence to start bathing in its glory. Just follow these three easy steps, and your company is bound for success:

  1. Replace “Automation” with “Artificial Intelligence” in Your Marketing

  2. Replace “Analytics” with “Artificial Intelligence” in Your Marketing

  3. Replace “Application” with “Artificial Intelligence” in Your Marketing

BONUS!!

More good news: you can even use AI to AI-wash your company’s marketing materials. Using your personal AI machine, follow the steps above, but begin by typing in “Control-F”...
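In fact, here is a minimal sketch of that personal AI machine in R (the marketing copy is, of course, hypothetical):

    copy <- "Our automation application delivers analytics at scale."

    # One regex to wash them all
    ai_wash <- function(text) {
      gsub("automation|analytics|application", "Artificial Intelligence",
           text, ignore.case = TRUE)
    }

    ai_wash(copy)
    #> [1] "Our Artificial Intelligence Artificial Intelligence delivers Artificial Intelligence at scale."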

DOUBLE BONUS!!!!

Some excellent tips in these articles on how to look like a real AI company:

Reshaping Business with Artificial Intelligence (MIT Sloan Management Review)

Has IBM Watson's AI Technology Fallen Victim to Hype? (Fortune)

The AI Resurgence: Why Now? (Wired — from 2 years ago)

Hyping Artificial Intelligence, Yet Again (New Yorker — from 4 years ago!)

Why the AI Hype Train is Already off the Rails and Why I’m Over AI Already (Medium)

TRIPLE BONUS!!!!!!

Did you know artificial intelligence used to be even MORE popular? Here’s a requisite chart from everyone’s favorite not-evil AI company, Google:

[Chart: Google Trends, "artificial intelligence" vs. "big data"]

The Top 25 Coolest Data Science Terms

Machine Learning.

Neural Networks.

Hierarchical Clustering.

Does anyone know what these terms really mean? Sure, a handful of nerds know. Like the roughly 50,000 ranked Kaggle members.

But these esoteric terms only scratch the surface of what the world’s data scientists have come up with to describe, promote, and ultimately obfuscate their day-to-day jobs.

We thought it would be worthwhile to catalog some of the best, but lesser known, terms of art in data science, ranked by how cool they sound.


THE TOP 25 COOLEST SOUNDING DATA SCIENCE TERMS

  1. Hyperplane

  2. Hyperparameter

  3. Gradient Descent

  4. Confusion Matrix

  5. Softmax

  6. Monte Carlo Simulation

  7. Multi-armed Bandit

  8. The Curse of Dimensionality

  9. Naive Bayes

  10. Max Pooling

  11. Cross-entropy Loss

  12. Centroid

  13. Axon

  14. Hierarchical Clustering

  15. P Hat

  16. Dendrogram

  17. Epoch

  18. K-fold Cross Validation

  19. The Kernel Trick

  20. Hidden Layers

  21. Rectifier Function

  22. Artificial Neuron

  23. Agglomerative

  24. Bootstrapping

  25. Lasso


Looking for more? We can’t say that all of these terms sound cool, but KDnuggets has assembled a fairly definitive glossary of 277 data science terms with references and occasional pictures.

And we try to publish a running list of cool terms in the Deducive Twitter feed.


A Note on Methodology

Our rankings are based on a rigorously unscientific process involving opinion, argument, and flights of fancy. However, it is safe to say that many of these terms are growing rapidly in popularity.

We’ve charted Google search trends for the top 5 terms using the gtrendsR package below.
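The exact call isn’t reproduced here, but a minimal gtrendsR sketch along those lines (gtrends() accepts up to five search terms per request):

    library(gtrendsR)

    trends <- gtrends(
      keyword = c("hyperplane", "hyperparameter", "gradient descent",
                  "confusion matrix", "softmax"),
      time = "all"  # full Google Trends history
    )
    plot(trends)    # interest-over-time chart for the five terms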

Source: Google Trends, September 2017

Deducive, Defined

Sherlock Holmes, Donald Trump, and the Data Science Paradox


“This is indeed a mystery,” I remarked. “What do you imagine that it means?”
“I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts. But the note itself. What do you deduce from it?”

― Arthur Conan Doyle, Sherlock Holmes: A Scandal in Bohemia


Yes, “deducive” is a real word. And it’s the name we chose for our company. We chose it for its rational connection to logic and the scientific method, as well as its emotional connection to a great fictional sleuth.

But, in data science terminology, it may have been a bad choice.

The full meaning of deducive — as it relates to deductive reasoning — represents only one of three primary modes of problem solving in data science. The other methods — inductive and abductive reasoning — are actually more important. Understanding them all reveals a paradox in data science relating to the nature of facts, probability, and the burden of proof necessary for business decision making.


Deductive Reasoning Uses Facts to Find Facts

Deductive reasoning is top-down: you begin with facts to form a hypothesis that is then tested with more facts to draw an inescapable conclusion. In other words, you reduce facts from a general theory to specific, factual conclusions.

For example, modeled on Aristotle’s famous syllogism about Socrates’ mortality:

  1. Donald Trump has a personal Twitter account

  2. Donald Trump won the US Presidency

  3. Therefore, the President tweets from his personal account

Though the soundness of this argument (and the tweets) is questionable, it is illustrative. We reduce facts to find facts. Thus the deductive process is ideally suited to fields of inquiry where the certainty of a conclusion is critical.

But deduction is also implicitly limited in its applications by the availability of facts and certainty of premises. In the practical application of data science in a business setting, this can be a problem.


Inductive Reasoning Uses Facts to Extrapolate Conclusions

What happens when you have a hypothesis that is itself uncertain? Inductive reasoning takes a bottom-up approach. With inductive reasoning, you can extrapolate general theories from specific facts. In data science terms, you examine a large set of data to determine the probability that your hypothesis is correct.

  1. Donald Trump’s tweets originate from both iPhones and Android devices

  2. Tweets from Trump’s Android device are 40-80% more likely to be negative

  3. Therefore, Donald Trump probably tweets from the Android device, while his staff uses iPhones

During the 2016 US Presidential campaign, Stack Overflow’s David Robinson used inductive reasoning (via sentiment analysis) to explore a hunch he and others had: that Trump’s most hyperbolic tweets originated directly from his own personal phone whereas his more even-handed tweets originated from his campaign staff, largely on iPhones.
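For flavor, here is a condensed sketch in the spirit of Robinson’s approach, assuming a hypothetical data frame trump_tweets with text and source ("Android" or "iPhone") columns:

    library(dplyr)
    library(tidytext)

    trump_tweets %>%
      unnest_tokens(word, text) %>%                        # one row per word
      inner_join(get_sentiments("bing"), by = "word") %>%  # label each word positive/negative
      count(source, sentiment) %>%
      group_by(source) %>%
      mutate(share = n / sum(n))  # compare negativity by device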

While the findings were fascinating and generally confirmed David’s hunch, the conclusions could not be called certain (even though they were confirmed again in 2017). And, as Holmes pointed out, the facts can be twisted to fit theories.

But how much certainty is really needed to understand a problem — or make a business decision? Inductive reasoning offers probable conclusions, not certain truth.


Abductive Reasoning Uses Facts to Infer the Most Likely Explanation

In data science (and science generally), sometimes you don’t know the precise nature of the problem you’re trying to solve — or have a complete set of observations from which to create a theory. Abductive reasoning, sometimes considered by philosophers to be a variety of inductive reasoning, infers the hypothesis that best fits the observable facts.

In other words, when we find a model that explains the data better than any other option, this model is probably the correct one. This part of data science is the most creative, requiring flexibility and imagination as well as a keen understanding of where the data might be misleading.
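One loose computational analogue is model selection: fit several candidate models and prefer the one that best explains the data, penalized for complexity. A toy sketch on R’s built-in mtcars data:

    # Three competing "explanations" of fuel economy
    linear    <- lm(mpg ~ wt, data = mtcars)
    quadratic <- lm(mpg ~ poly(wt, 2), data = mtcars)
    with_hp   <- lm(mpg ~ wt + hp, data = mtcars)

    AIC(linear, quadratic, with_hp)  # lowest AIC = best available explanation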

In fact, many of Holmes’ famous deductions were actually examples of abductive reasoning. When he proposes the solution to a murder mystery, he uses evidence to create a theory that best fits the available facts. His brilliance is in his ability to uncover facts and create theories — not his use of deductive reasoning.

Here at Deducive, we take inspiration from Holmes’ creator, and don’t get too caught up in the linguistic and philosophical differences between deductive, inductive, and abductive reasoning. Though data science is based on statistics and mathematical theory, creative thinking and strategic insight are far more important to making the right decision.