Best of the Month – Everybody Lies
Editor | On 21, Nov 2018
Darrell Mann
Here’s one of those polarizing choices for our ‘Best Of’ feature. It’s pop-science, and, these days, that in itself is enough to alienate many potential readers. When book covers declare ‘New York Times Best Seller’, that’s usually a warning sign. And, of course, there is a fair amount of lowest-common-denominator, salacious content to plough through (author Seth Stephens-Davidowitz wanted to call the book How Big Is My Penis?, as an early example of a case-study-cum-story). But there are also a host of nuggets that take this book beyond Cambridge Analytica Big-Brother-Is-Not-Only-Watching-You-But-Making-You-Do-Stuff-The-Real-You-Would-Never-Have-Done danger territory. Stephens-Davidowitz settled for Everybody Lies. The book is subtitled What the Internet Can Tell Us About Who We Really Are and it’s a polished display of some of the early fruits of “big data” science.
Stephens-Davidowitz’s first source, when he set up as a data scientist, was Google Trends, which records the relative frequency of particular searches in different places at different times. He soon added Google AdWords, which registers the actual number of searches. Then he moved on to other vastnesses: Wikipedia, Facebook and then PornHub, one of the largest pornographic sites in the world. PornHub gave him its complete data set, duly anonymised: every single search and video view. And then we get into the seamier territory of a host of, ahem, niche sites like neo-Nazi-central, Stormfront.
Ignoble metadata flowed in. And thus we get things like the discovery that searches for racist jokes rise about 30% on Martin Luther King Day in the US, and that in the recent Republican primaries, regions that supported Donald Trump in the largest numbers made the most Google searches for the n-word. Data from Prosper, a peer-to-peer loan website, showed that there are five expressions in particular that one should beware of when reviewing applications for loans: “God”, “promise”, “will pay”, “hospital” and “thank you”. Making promises “is a sure sign that someone will … not do something”. “God” is particularly bad news.
There are many such facts waiting to be harvested. For a social scientist such as Stephens-Davidowitz, big data has four central virtues. First, it’s a “digital truth serum”: it supplies honest data on matters people lie about in surveys, for instance racist attitudes, but above all sex. Second, it offers the means to run large-scale randomised controlled experiments – which are usually extremely laborious and expensive – at almost no cost, and in this way uncover causal linkages in addition to mere correlations. Third, the sheer quantity of data allows us to zoom in precisely on small subsets of people in a way that was previously impossible. Finally, it provides new types of data.
Stephens-Davidowitz thinks searches of internet pornography habits are probably “the most important development … ever … in our ability to understand human sexuality”. They deliver data that “Schopenhauer, Nietzsche, Freud and Foucault would have drooled over”.
Some of his sexual facts are depressing, others are funny and touching. Some are engaging because we find them extraordinary, others because we find them all-too-human. The search data suggests that hundreds of thousands of young men are predominantly attracted to elderly women. Many heterosexual men feel about their partner what William Wordsworth felt about his wife Mary (they wish she’d put on weight). Anal sex is on course to overtake vaginal sex in pornography before the end of the decade. Pornography “in which violence is perpetrated against a woman … almost always appeals disproportionately to women”. More than 75% of searches of the form “I want to have sex with my …” are incestuous. Men search for ways to perform oral sex on themselves as often as they search for how to give a woman an orgasm.
There are many unwavering specialisations. For some women, only short fat men with small penises will do; for some men, only massive nipples. Thirty per cent only ever watch pornography of the ugliest kind. But many of us are not as weird as our online behaviour may suggest. Distortion is introduced by the fact that certain types of Google searches “skew towards the forbidden”, and there are numerous subtleties and traps when it comes to the interpretation of data, many of which Stephens-Davidowitz expounds clearly. For all that, the numbers are big, and they add up.
“The next Foucault will be a data scientist. The next Freud will be a data scientist. The next Marx will be a data scientist.” This is unlikely, I think, unless the data scientist educators start teaching students how the world works from a first-principle level. Or, better yet, unless the other disciplines learn how to do meaningful data science. In any event, by the end of Everybody Lies Stephens-Davidowitz has almost earned his flourishes (“What constitutes data has been wildly reimagined … Everything is data!”). What he hasn’t done is say enough about the dangers. I expected a reference to Cathy O’Neil, who shows in her book Weapons of Math Destruction (2016) how programs based on big data introduce a frightening new efficiency into predatory advertising, “distort higher education, drive up debt, spur mass incarceration, pummel the poor at nearly every juncture, and undermine democracy”. Programs designed with the very best intentions fall into deadly self-confirming feedback loops that confirm their efficacy even as they spiral away from the truth and increase injustice.
One of the greatest dangers of the internet, noted by Daniel Kahneman in his crucial book, Thinking, Fast and Slow (2011), arises from the fact that “people can maintain an unshakable faith in any proposition, however absurd, when they are sustained by a community of like-minded believers”. This isn’t any sort of exaggeration; the trouble is that any belief – any prejudice or hatred – can now fairly easily find a critical-mass sized supporting community on the internet.
Stephens-Davidowitz has a reply to some of these worries. He’s a social scientist, and malignant programs aren’t data science in his sense of the term. Their creators aren’t simply trying to describe and explain human behaviour; they’re directing it and manipulating it. Big data isn’t intrinsically dangerous or evil, and it can be extraordinarily valuable and engaging. New facts spring up everywhere. For him “the big point is this: social science is becoming a real science. And this new, real science is poised to improve our lives”.
I like Stephens-Davidowitz’s suggestion in a recent interview: “Sometimes I think it would be a good thing if everyone’s porn habits were released at once. It would be embarrassing for 30 seconds … then we’d all get over it and be more open about sex.” But I don’t share his general optimism. I suspect the easy availability of pornography is turning out to be one of the great tragedies of human history, destructive of the best kind of sexual relations. If we had an infallible happyometer that could measure the overall gains and losses to human existence caused by the internet, I think we’d find that the balance was – will increasingly be – negative. Which sounds kind of glass-half-empty. So, just in case we ever want to take the future seriously, we should also say that, as the contradictions against ‘Big Data’ grow, so will the desire to solve them. Which will then loop back to the new breed of first-principle-led data scientists. People who will have understood that rooting out contradictions is one of the main jobs of Big Data, and that society’s most important job is to then set about solving them. The future’s bright. Just after the darkest hour before the dawn.
Meanwhile, all we need to remember is that Everybody Lies is nothing more nor less than a PanSensic source book. Aha, that’s why we like it.