Thursday, August 16, 2018

“The Book of Why: The New Science of Cause and Effect” by Judea Pearl and Dana Mackenzie

Judea Pearl is a professor of computer science at UCLA who has been working at the frontiers of machine learning and artificial intelligence for decades. Through this lens, he has become a leader in the new science of causation. Pearl believes that there has been a revolution in the techniques of teasing out cause from effect, one that has gone unnoticed by many, even in the disciplines that rely heavily on statistics. In 2000, he proclaimed, “Causality has undergone a major transformation from a concept shrouded in mystery into a mathematical object with well-defined semantics and well-founded logic. Paradoxes and controversies have been resolved, slippery concepts have been explicated, and practical problems relying on causal information that long were regarded as either metaphysical or unmanageable can now be solved using elementary mathematics. Put simply, causality has been mathematized.” This book is Pearl’s effort to explain the science of causality in layman’s terms. He states that “the calculus of causation consists of two languages: causal diagrams, to express what we know, and a symbolic language, resembling algebra, to express what we want to know.” In this model, the data comes last because data, on its own, says nothing about causal relationships.

The science of causation is so important because for humans “causal explanations, not dry facts, make up the bulk of our knowledge, and should be the cornerstone of machine intelligence…. Our transition from processors of data to makers of explanations was not gradual; it was a leap that required an external push…. Our ancestors’ capacity to imagine nonexistent things was the key to everything…. The connection between imagining and causal relations is almost self-evident.” The three levels of causation are “seeing or observing, [which] entails detection of regularities…. doing [which] entails predicting the effect(s) of deliberate alterations of the environment and choosing among these alterations to produce a desired outcome…. [and] understanding that permits imagining.” To capture these levels, Pearl has invented the metaphor of a Ladder of Causation, which consists of three rungs: association, intervention, and counterfactuals. Only the top two rungs involve causation. However, almost all animals and all current learning machines are stuck on the bottom rung. Intervention asks the question, “What if we do…? What will happen if we change the environment?” Counterfactuals go back in time and require imagination. They ask, “What if things had been different?” This is a skill that is, more than likely, unique to humans.
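The gap between the first two rungs can be made concrete with a toy calculation. The sketch below is only an illustration: the variables Z (a hidden common cause), X (a treatment), and Y (an outcome), along with every probability in it, are invented for this example and do not come from the book. It assumes Z is the only confounder, in which case the effect of an intervention can be computed by averaging over Z with its natural distribution rather than the distribution seen among those who happened to have X.

```python
# Toy model (all numbers are made-up assumptions, not taken from the book):
#   Z -> X, Z -> Y, X -> Y, with Z the only common cause of X and Y.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)


def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z) in the hypothetical model."""
    return 0.1 + 0.2 * x + 0.5 * z


def p_x(x, z):
    """P(X = x | Z = z)."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def seeing(x):
    """Rung one: P(Y = 1 | X = x), what passive observation reports."""
    joint = sum(P_Z[z] * p_x(x, z) * p_y1(x, z) for z in P_Z)
    marginal = sum(P_Z[z] * p_x(x, z) for z in P_Z)
    return joint / marginal


def doing(x):
    """Rung two: P(Y = 1 | do(X = x)), where X is set by fiat, so Z keeps
    its natural distribution instead of the one observed alongside X = x."""
    return sum(P_Z[z] * p_y1(x, z) for z in P_Z)


observed_gap = seeing(1) - seeing(0)   # association between X and Y
causal_effect = doing(1) - doing(0)    # effect of intervening on X

print(f"seeing: {observed_gap:.2f}  doing: {causal_effect:.2f}")
```

In this made-up model the observed association overstates what changing X would actually accomplish, because part of the association is carried by Z. A rung-one learner sees only the first number.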

The bulk of Pearl’s book mixes the history of statistics with the history of a small band of outlaws who dared to ask “why” questions instead of being content with correlations. Sewall Wright “was the first person to develop a mathematical method for answering causal questions from data, known as path diagrams…. [The path diagram] was the first bridge ever built between causality and probability, the first crossing of the barrier between rung two and rung one on the Ladder of Causation. Having built this bridge, Wright could travel backward over it, from the correlations measured in the data (rung one) to the hidden causal quantities.” As Wright himself wrote, defending his method from hostile attacks by the statistical establishment, “the combination of knowledge of correlations with knowledge of causal relations to obtain certain results, is a different thing from the deduction of causal relations from correlations.”
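As a rough illustration of what traveling “backward” over that bridge can look like, here is a minimal sketch for one assumed diagram. It is not Wright's own worked example: the three-variable diagram (Z → X, Z → Y, X → Y), the assumption of linear relationships among standardized variables, and the coefficient values are all hypothetical. Under those assumptions, each correlation is the sum over connecting paths of the products of the path coefficients, and the equations can be solved in reverse.

```python
# Hypothetical linear diagram over standardized variables:
#   Z -> X (coefficient a),  X -> Y (coefficient b),  Z -> Y (coefficient c)


def implied_correlations(a, b, c):
    """Forward direction: the correlations the assumed diagram implies."""
    r_zx = a              # only path: Z -> X
    r_xy = b + a * c      # direct X -> Y, plus X <- Z -> Y
    r_zy = c + a * b      # direct Z -> Y, plus Z -> X -> Y
    return r_zx, r_xy, r_zy


def recover_paths(r_zx, r_xy, r_zy):
    """Backward direction: solve the same equations for a, b, c.
    This is only valid if the assumed diagram is the right one."""
    a = r_zx
    denom = 1.0 - r_zx ** 2
    b = (r_xy - r_zx * r_zy) / denom
    c = (r_zy - r_zx * r_xy) / denom
    return a, b, c


# Made-up path coefficients, used only to show the round trip.
correlations = implied_correlations(a=0.6, b=0.3, c=0.4)
recovered = recover_paths(*correlations)
print("correlations:", [round(r, 3) for r in correlations])
print("recovered paths:", [round(p, 3) for p in recovered])
```

The same three correlations would yield different "causal" numbers under a different diagram, which is Wright's point in the quotation above: the correlations must be combined with causal knowledge, not mined for it.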

Pearl fully admits that causal diagrams require scientists to explicitly step outside their comfort zone of objectivity. He states that “drawing a path diagram is not a statistical exercise; it is an exercise in genetics, economics, psychology, or whatever the scientist’s own field of expertise…. Causal analysis requires the user to make a subjective commitment. She must draw a causal diagram that reflects her qualitative belief—or, better yet, the consensus belief of researchers in her field of expertise—about the topology of the causal processes at work.” That is why Pearl first came to causal science through his work on Bayesian networks. “The prototype of Bayesian analysis goes like this: Prior Belief + New Evidence —> Revised Belief…. Bayesian statistics gives us an objective way of combining the observed evidence with our prior knowledge (or subjective belief) to obtain a revised belief and hence a revised prediction of the outcome…. To articulate subjective assumptions, Bayesian statisticians still use the language of probability…. The assumptions entering causal inference, on the other hand, require a richer language.” In causal diagrams, “each arrow can be thought of as a statement about the outcome of a hypothetical experiment…. Whereas a Bayesian network can only tell us how likely one event is, given that we observed another (rung-one information), causal diagrams can answer interventional and counterfactual questions.” The use of such diagrams allows the scientist to claim “‘provisional causality,’ that is, causality contingent upon the set of assumptions that our causal diagram advertises…. They have the advantage of being conducted in the natural habitat of the target population, not in the artificial setting of a laboratory, and they can be ‘pure’ in the sense of not being contaminated by issues of ethics and feasibility…. One of the major accomplishments of causal diagrams is to make the assumptions transparent so that they can be discussed and debated by experts and policy makers…. The diagram encodes the causal story behind the data.”
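The Bayesian prototype quoted above, Prior Belief + New Evidence —> Revised Belief, is Bayes' rule, and a small sketch may help make it concrete. The function name, parameter names, and the numbers (a 1% prior and a piece of evidence that shows up 90% of the time when the hypothesis is true and 5% of the time when it is false) are assumptions chosen purely for illustration.

```python
def revise_belief(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a yes/no hypothesis:
    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Illustrative numbers only.
prior = 0.01
after_one = revise_belief(prior, 0.90, 0.05)

# The revised belief becomes the prior for the next piece of evidence
# (assuming the two pieces of evidence are independent given the hypothesis).
after_two = revise_belief(after_one, 0.90, 0.05)

print(f"prior: {prior:.3f}, after one: {after_one:.3f}, after two: {after_two:.3f}")
```

Note that everything here stays on rung one: the calculation says how likely the hypothesis is given what was observed, not what would happen if something were changed.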

Pearl ends his book by circling back to his expertise in machine learning and the state of the field today. He states, “With Bayesian networks, we had taught machines to think in shades of gray, and this was an important step toward humanlike thinking. But we still couldn’t teach machines to understand causes and effects…. Without the ability to envision alternate realities and contrast them with the currently existing reality, a machine cannot pass the mini-Turing test; it cannot answer the most basic question that makes us human: ‘Why?’” Machine learning will have to go beyond deep learning and gathering big data sets to get there. “In technical terms, machine-learning methods today provide us with an efficient way of going from finite sample estimates to probability distributions, and we still need to get from distributions to cause-effect relations…. A strong AI should be a machine that can reflect on its actions and learn from past mistakes. It should be able to understand the statement ‘I should have acted differently’…. Intent is a very important part of personal decision making…. The ability to conceive of one’s own intent and then use it as a piece of evidence in causal reasoning is a level of self-awareness (if not consciousness) that no machine I know of has achieved…. Thinking in terms of intents, therefore, offers us a shorthand to convert complicated causal instructions into simple ones…. The algorithmization of counterfactuals is a major step toward understanding these questions and making consciousness and agency a computational reality…. I believe that the software package that can give a thinking machine the benefits of agency would consist of at least three parts: a causal model of the world; a causal model of its own software, however superficial; and a memory that records how intents in its mind correspond to events in the outside world.”
