Sherlock Holmes, Donald Trump, and the Data Science Paradox
“This is indeed a mystery,” I remarked. “What do you imagine that it means?”
“I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts. But the note itself. What do you deduce from it?”
Arthur Conan Doyle, Sherlock Holmes: A Scandal in Bohemia
Yes, “deducive” is a real word. And it’s the name we chose for our company. We chose it for its rational association with logic and the scientific method, as well as its emotional connection to a great fictional sleuth.
But, in data science terminology, it may have been a bad choice.
The full meaning of deducive — as it relates to deductive reasoning — represents only one of three primary modes of problem solving in data science. The other methods — inductive and abductive reasoning — are actually more important. Understanding them all reveals a paradox in data science relating to the nature of facts, probability, and the burden of proof necessary for business decision making.
Deductive Reasoning Uses Facts to Find Facts
Deductive reasoning is top-down: you begin with premises accepted as fact, form a hypothesis, and test it against more facts to reach a conclusion that must follow if the premises hold. In other words, you reduce a general theory to specific, factual conclusions.
For example, here is an argument modeled on Aristotle’s famous syllogism about Socrates’ mortality:
Donald Trump has a personal Twitter account
Donald Trump won the US Presidency
The President tweets from his personal account
Though the soundness of this argument (and the tweets) is questionable, it is illustrative. We reduce facts to find facts. Thus the deductive process is ideally suited to fields of inquiry where the certainty of a conclusion is critical.
But deduction is also inherently limited in its applications by the availability of facts and the certainty of its premises. In the practical application of data science in a business setting, this can be a problem.
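As a minimal sketch of what deduction looks like in code, the classic Socrates syllogism can be written as a general rule applied to known facts. The names `facts` and `deduce_mortals` are hypothetical, chosen only for illustration. The conclusion is exactly as certain as the two premises, and no more.

```python
# Premise 1 (general rule): every man is mortal.
# Premise 2 (specific fact): Socrates is a man.
facts = {("man", "Socrates")}


def deduce_mortals(known_facts):
    """Apply the rule 'if X is a man, then X is mortal' to every known fact."""
    return {("mortal", name) for kind, name in known_facts if kind == "man"}


conclusions = deduce_mortals(facts)
print(conclusions)
```

If a premise is missing, nothing can be concluded: calling `deduce_mortals` on an empty set of facts returns an empty set, which is the limitation described above.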
Inductive Reasoning Uses Facts to Extrapolate Conclusions
What happens when you have a hypothesis that is itself uncertain? Inductive reasoning takes a bottom-up approach. With inductive reasoning, you can extrapolate general theories from specific facts. In data science terms, you examine a large set of data to determine the probability that your hypothesis is correct.
Donald Trump’s tweets originate from both iPhones and an Android device
Tweets from Trump’s Android device are 40-80% more likely to be negative
Donald Trump tweets from an Android device; his staff uses iPhones
During the 2016 US Presidential campaign, Stack Overflow’s David Robinson used inductive reasoning (via sentiment analysis) to explore a hunch he and others had: that Trump’s most hyperbolic tweets originated directly from his own personal phone whereas his more even-handed tweets originated from his campaign staff, largely on iPhones.
While the findings were fascinating and generally confirmed David’s hunch, the conclusions could not be called certain (even though they were confirmed again in 2017). And, as Holmes pointed out, the facts can be twisted to fit theories.
But how much certainty is really needed to understand a problem — or make a business decision? Inductive reasoning offers probable conclusions, not certain truth.
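To make "probable, not certain" concrete, here is a rough sketch of the kind of comparison involved. It is not a reproduction of Robinson's analysis: the counts are invented for illustration, and the normal-approximation interval is just one simple way to express uncertainty about a difference in rates.

```python
import math

# Hypothetical counts, invented for illustration only (not real tweet data).
counts = {
    "android": {"negative": 180, "total": 600},
    "iphone": {"negative": 100, "total": 500},
}


def negative_rate(source):
    """Observed share of tweets from `source` that were labelled negative."""
    return counts[source]["negative"] / counts[source]["total"]


def difference_with_interval(a, b, z=1.96):
    """Difference in negative rates (a minus b) with an approximate 95% interval."""
    p_a, p_b = negative_rate(a), negative_rate(b)
    n_a, n_b = counts[a]["total"], counts[b]["total"]
    std_err = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff, (diff - z * std_err, diff + z * std_err)


diff, (low, high) = difference_with_interval("android", "iphone")
relative_increase = negative_rate("android") / negative_rate("iphone") - 1

print(f"Android negative rate: {negative_rate('android'):.2f}")
print(f"iPhone negative rate:  {negative_rate('iphone'):.2f}")
print(f"Relative increase:     {relative_increase:.0%}")
print(f"Difference:            {diff:.3f} (interval {low:.3f} to {high:.3f})")
```

With these made-up counts the interval stays above zero, so the data support the hunch. They do not prove it: a different sample, or a different way of labelling tweets as negative, could shift the numbers.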
Abductive Reasoning Uses Facts to Infer the Most Likely Explanation
In data science (and science generally), sometimes you don’t know the precise nature of the problem you’re trying to solve, or you lack a complete set of observations from which to build a theory. Abductive reasoning, considered by some philosophers to be a variety of inductive reasoning, infers the hypothesis that best fits observable facts.
In other words, when we find a model that explains the data better than any other option, this model is probably the correct one. This part of data science is the most creative, requiring flexibility and imagination as well as a keen understanding of where the data might be misleading.
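One simple way to picture "the model that explains the data better than any other option" is to score each candidate explanation by how likely it makes the observations, then keep the best. The sketch below assumes a toy setup: the observations, the hypothesis names, and their probabilities are all invented for illustration.

```python
import math

# Hypothetical observations: True means a tweet was labelled negative.
observations = [True, False, True, True, False, True, False, True]

# Candidate explanations, each reduced to an assumed probability that a
# tweet is negative. The names and numbers are invented for illustration.
hypotheses = {
    "written by staff": 0.2,
    "written by the candidate": 0.6,
    "coin flip": 0.5,
}


def log_likelihood(p_negative, data):
    """Log-probability of the observed labels under a simple Bernoulli model."""
    return sum(
        math.log(p_negative if is_negative else 1 - p_negative)
        for is_negative in data
    )


scores = {name: log_likelihood(p, observations) for name, p in hypotheses.items()}
best_explanation = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
print("Best available explanation:", best_explanation)
```

The winner is only the best of the candidates that were considered. A better explanation that nobody thought to list would never be found this way, which is why this step depends so much on imagination.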
In fact, many of Holmes’ famous deductions were actually examples of abductive reasoning. When he proposes the solution to a murder mystery, he uses evidence to create a theory that best fits the available facts. His brilliance is in his ability to uncover facts and create theories — not his use of deductive reasoning.
Here at Deducive, we take inspiration from Holmes’ creator, and don’t get too caught up in the linguistic and philosophical differences between deductive, inductive, and abductive reasoning. Though data science is based on statistics and mathematical theory, creative thinking and strategic insight are far more important to making the right decision.