Saturday, July 8, 2017

Book Review: Everybody Lies--Big Data, New Data...

I did finish the book: Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are, by Seth Stephens-Davidowitz. I count myself among the elite, since the author claims that fewer than 10% of readers finish reading economics books. And it speaks to my opinion--I recommend that all of you follow my example and read it to the end.

Seth (the author refers to himself that way, and in the interests of brevity I will follow his lead) is a trained economist who studies Big Data, by which he means millions and even billions of data points. Big Data is distinct from small data, aka survey data, where a researcher polls some relatively small sample of subjects--probably fewer than a thousand. His primary source is Google. Through Google Trends (and a special relationship with the company for which he once worked) he has access to every search anybody has ever made (suitably anonymized). And not just Google, but also PornHub, one of the largest pornography sites in the US, and Facebook. All of this gets processed and analyzed, turned into statistics and conclusions.

Seth is suitably modest about his endeavor. Most of the book illustrates the virtues of Big Data, but the last chapter discusses the limitations. The major shortcoming is conflating correlation with causation, which one shouldn't do. Big Data is good at the former, but survey data is often essential to uncover causal relationships. The two together offer the most complete picture.

He refers to Google searches as truth serum. People in their darkest hours or horniest moments confide in a Google search bar for help, though as Seth often asks, what answer do they actually expect to receive?

Seth claims that Big Data will turn the social sciences into true sciences, i.e., disciplines with definitive truth statements about human behavior. I think he's wrong here, and some of the flaws in his book illustrate that.

I cite three examples where I think his enthusiasm leads him astray.

First, he completely misunderstands Sigmund Freud. He refers to The Psychopathology of Everyday Life (a book I read as an undergraduate), which deals with what are known as Freudian slips. Those refer to slips of the tongue, e.g., if I inadvertently say sex instead of flex. So, when Google searchers type penistrian instead of pedestrian, Seth (he with the dirty mind) assumed they were making a Freudian slip. He then uses Big Data to prove that they weren't--it's simply a fat fingers effect.

I could have told him that. Freudian slips are always verbal, never written. Freud, of course, never saw a computer, much less a Google search bar, nor do I think he ever used a keyboard. The closest analogy to fat fingers I can think of is when strangers meet each other on the street. They sometimes do a little dance to determine which way to get past each other. Freud concluded that might occasionally reflect something sexual, but most of the time it is simply a miscommunication between two pedestrians. No psychopathology at all.

So Seth has misread Freud, and also draws a conclusion obvious to anybody who has read Freud (even if that was 40 years ago). No Big Data required.

Second, Seth asks an interesting question: "Why do some parts of the country appear to be so much better at churning out America's movers and shakers?" He goes through a list: Madison, WI, Berkeley, CA, etc. They're all college towns. To which he says: "Some of it may well be due to the gene pool: sons and daughters of professors and graduate students tend to be smart...But there is most likely something more going on: early exposure to innovation."

Really? When I think of "innovation," college is the last thing that comes to mind. A more sclerotic, hidebound, conservative, politically-correct institution is hard to imagine. Seth cites new art and music, and while that's outside of my bailiwick, my experience leads me to think colleges fail there as well. So I'll posit another reason for the geographic effect our author identifies.


Jews, who are very smart, for historical reasons have congregated in college towns. This attracts more Jews, and also more people who prefer living around other smart people. The result is you end up with "movers and shakers." It has nothing to do with the now moribund institution called college, except as the historical cause for the initial effect.

My model predicts that "moving and shaking" will correlate strongly with relatively high Jewish populations. Of course Seth never bothered to check that, so we don't know. What he did check reflects his bias, not necessarily reality, and represents a reason to think the social sciences will never become true sciences.

Seth's biases show most egregiously when it comes to politics. He admits to being one of Bernie's Bros, though hopefully not part of the brown shirt gang that forced the cancellation of a Trump campaign rally in Chicago. And he, being most uncharitable toward his fellow citizens, looks for any reason he can think of to dub Trump voters "racist." Of course he discovers this on Google (I think if you look hard enough you can discover anything you want on Google). Searches for racist jokes are most common in regions that voted for Trump! QED.

Elsewhere in the book he contradicts his own argument. He says: "Four days after the shooting [in San Bernardino--ed] then president Obama gave a prime-time address to the country...But searches calling Muslims 'terrorists,' 'bad,' 'violent,' and 'evil' doubled during and shortly after the speech...Yet searches for 'kill Muslims' tripled during his speech. In fact, just about every negative search we could think to test regarding Muslims shot up during and after Obama's speech...".

In other words, Obama--however unintentionally--turned his audience into stark raving Islamophobes.

I understand that completely. Obama's insufferable, patronizing self-righteousness engenders rebellion from any sentient human being. One types in transgressive Google searches just out of spite. (Black Lives Matter and "Check your privilege" partisans have the same effect.) So it's not that the country is unusually racist. It's rather that Obama was a spectacularly talented asshole.

I'll suggest that Trump's more laid-back attitude will reduce anger among the white "racists." Perhaps Seth can check if racist searches have declined since Trump took office?

Causality in society is dense. There is no one reason for anything. The causes Big Data discovers will be the causes the researcher chooses to look for in the first place. It is impossible for it to be otherwise, but it won't result in science.

Seth's book is fascinating and well worth reading. The only thing wrong with it is that he doesn't realize his bias is showing.

