Reading by the Numbers: When Big Data Meets Literature

It’s a question that draws heated answers. Digital humanities has been accused of fetishizing science, of acting as a Trojan horse for the corporate forces threatening the university, and worse. A recent broadside in The Chronicle of Higher Education called “The Digital-Humanities Bust” took a bludgeon to the field’s revolutionary rhetoric, with Mr. Moretti among those accused of issuing a stream of vague “promissory notes” for results that never arrive.

Mr. Moretti — who prefers to call the lab’s work “computational criticism” — tends to greet such challenges with a mixture of modesty and bravado.

“Our results are not as good as what I had hoped for 10 or 15 years ago,” he said in an interview earlier this month, during a brief trip to New York. “We have not yet created a revolution in knowledge. But how much of literary scholarship is even trying to do that?”

Mr. Moretti, who was born to teacher parents in a small town in northern Italy (his brother is the filmmaker Nanni Moretti), represents something of a paradox. He’s an intellectual trained in the grand European tradition who questions its most cherished methods. And he’s a professor who has achieved some measure of celebrity by promoting a ruthlessly impersonal idea of both scholarship and literary history itself.

Literary criticism tends to emphasize the singularity of exceptional works that have stood the test of time. But the canon, Mr. Moretti argues, is a distorted sample. Instead, he says, scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

“We know how to read texts,” he wrote in a much-quoted essay included in his book “Distant Reading,” which won the 2014 National Book Critics Circle Award for Criticism. “Now let’s learn how to not read them.”

These days, Mr. Moretti has softened his rhetoric, though the underlying point is the same.

“Reading is one of life’s greatest pleasures,” which we “would be insane” to give up, he said. “But the question is whether reading and knowledge are continuous with each other.”

Mr. Moretti’s own output has a similar dividedness. His early work was grounded in close reading, and his last book, “The Bourgeois: Between Literature and History,” included fine-grained analysis of classic works.

That his digital provocations command wide interest even among highly skeptical colleagues may owe something to what Leah Price, professor of English at Harvard, called a “Nixon in China” effect.

“Only because his own close readings are so dazzling, does Moretti have the credibility to say: Read as closely as you want, but if you want to understand literary history you’ll need other tools,” she said.

The literary quality of the lab’s pamphlets, which are usually credited to teams of researchers, also doesn’t hurt. Yes, they bristle with bar charts, scatterplot diagrams and sometimes eyeball-blistering terminology. (“‘Operationalizing’ must be the ugliest word I’ve ever used,” Mr. Moretti writes in the first sentence of a solo-authored pamphlet called… “Operationalizing.”)

But they are also full of witty asides and suspenseful first-person narration that acknowledges surprises, dead-ends and the collaborative, experimental nature of the lab’s work.

“They’re very good at dramatizing the method,” said Ted Underwood, a professor at the University of Illinois who also uses computational analysis. “That’s part of the fun of reading them.”

The publication of some of the Stanford Literary Lab’s pamphlets in “Canon/Archive” prompts a larger question: What has the Big Data approach to literature added up to? Credit: Alex Welsh for The New York Times

Some of the lab’s results may seem less than earthshaking. For example, it turns out that what distinguishes the Gothic novel isn’t just castles and ghosts, but more frequent use of certain verb tenses and prepositions. (The critic Kathryn Schulz, writing about some of the early pamphlets in The New York Times Book Review in 2011, said she “mostly vacillated between two reactions: ‘Huh?’ and ‘Duh!’”)

But even modest-seeming results — like the finding that from 1785 to 1900 the language of the British novel steadily shifted away from words relating to moral judgment to words associated with concrete description — unsettle established ideas of literary history.

“We tend to see literary history as a story of movements, periods, sudden revolutions,” Mr. Underwood said. “There are also these really broad, slow, massive changes that we haven’t described before.”

Some of the lab’s findings have themselves had sudden, and totally unexpected, real-world results. Mr. Moretti noted with amusement a flap last spring at the World Bank, where Paul Romer, the chief economist, was relieved of some management duties at its research arm after demanding, among other changes, that its publications reduce their use of the word “and” — one of the stylistic tics mocked in “Bankspeak,” a Lit Lab pamphlet analyzing the bank’s drift over 60 years toward more abstract and “self-referential” language.

“How many literary critics can say they got the chief economist of the World Bank, O.K., not fired, but deprived of some of his power?” Mr. Moretti said with a laugh.

Since retiring from Stanford, Mr. Moretti has moved to Lausanne, Switzerland, where he is helping to create a new digital humanities program at the country’s leading polytechnic.

He also has a full slate of writing projects, including a study of forgotten 19th-century British best-sellers and an algorithm-free book based on his undergraduate lectures on American culture, to be published by Farrar, Straus and Giroux.

Even if the results of computational criticism never catch up with his early polemical fervor, Mr. Moretti remains unapologetic about trying.

“I’d rather be a failed revolutionary,” he said, “than someone who never tried to do a revolution in the first place.”
