Measuring Rappers by Their Vocabularies Is Dumb

May 5, 2014 | Andy Cush

“The Largest Vocabulary in Hip Hop,” a data visualization project by Matthew Daniels that’s been making waves since it made the front page of Reddit over the weekend, is interesting for a moment. Quantitative analysis is trendy, and so is rap music, so why not combine the two?

One, because using statistics to glean anything interesting or valuable from art — unless, of course, the art itself is based in statistics — is largely a fool’s errand. FiveThirtyEight already tried this with Romeo & Juliet. Their study’s big reveal? That the star-crossed lovers don’t actually talk to each other that much — a fact that feels meaningful at first, but, once you consider how important the couple’s apart-ness is to the narrative, becomes banal. The numbers tell us nothing about the important things — the beauty of the language, the arc of the story, the feeling of being in love.

Likewise, the results of the vocabulary study couldn’t be more obvious. Is anyone surprised that Aesop Rock — a guy who once wrote a verse composed only of words beginning with L, S, and D — came out on top? Or that GZA, whose nickname is “The Genius,” for Christ’s sake, is in second? We get to pat ourselves on the back for appreciating Wu-Tang’s verbiage and laugh at DMX for being at the bottom, but what else do we learn?

This kind of analysis also reinforces the shitty, condescending idea that rap is only worthwhile when it’s openly skillful or brainy — that a wordy, boring-as-hell MC like Canibus is somehow objectively better than a bleeding-edge songsmith like Future or Young Thug because he talks about “researching footnotes” and “thousand-volt thunderbolts.” Rap is about a convergence of many, many things, most of which are a lot harder to quantify than a word count. The thrill of hearing Nicki Minaj switch between patois and her girly-girl Barbie voice on “Monster” won’t show up on a chart. We’d balk at a study claiming Mahavishnu Orchestra are better than the Rolling Stones because they use more chords; why should rap be any different?

Lastly, Daniels’ methodology is all kinds of janky:

35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don’t have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick).

Why 35,000 words? And why aren’t mixtapes included? Totemic artists like Lil Wayne and Gucci Mane get shafted — both released tons of fantastic (even wordy!) music on the format over the past decade or so — and on principle, ignoring mixtapes means ignoring a huge, vital part of contemporary hip hop culture.

And what about pimpin’?

I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses.
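For the curious, the counting Daniels describes can be roughly sketched in a few lines of Python. This is a guess at the approach, not his actual code: the function name `unique_word_count`, the lowercasing step, and the rule for splitting text into words are all assumptions, since the quoted methodology only specifies the 35,000-word cutoff, the count-each-word-once rule, and the removal of apostrophes.

```python
import re


def unique_word_count(lyrics: str, limit: int = 35_000) -> int:
    """Count distinct words among the first `limit` words of `lyrics`.

    Hypothetical helper: a sketch of the described method, not Daniels' code.
    """
    # Assumption: lowercase everything so "Pimp" and "pimp" match.
    text = lyrics.lower()
    # Drop straight and curly apostrophes so "pimpin'" and "pimpin" match.
    text = re.sub(r"['‘’]", "", text)
    # Assumption: a word is any unbroken run of letters or digits.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Each distinct spelling counts once; no stemming, no slang merging.
    return len(set(tokens[:limit]))


sample = "Pimps pimp pimping pimpin' pimpin"
print(unique_word_count(sample))
```

Note what the sketch makes plain: "shorty" and "shawty" count as two words, and a chorus repeated twenty times eats up the 35,000-word budget without adding anything to the total.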

No disrespect to Daniels, who clearly loves rap and put a lot of work into this research, but let’s not treat it as anything more enlightening than the goofy lark it is. Now excuse me while I listen to some beautiful, edifying music.

(No disrespect to Aesop Rock, either. He’s great.)

(Image: Wikipedia)