Thursday, July 11, 2013

The Data Speaks (Ten at Fifty)



The spreadsheet that I created, using a quick-n-dirty Python script and a variety of lyrics sites, has provided a peculiar view into the twenty songs I had decided to think about for the last couple of months. I was interested in lyrics as data points: where are their repetitions, where are the commonalities, and are there things that correlate well within an era, as well as across the eras?

The "1963" and "2013" tabs show each song's lexicon sorted by frequency. The top of each frequency list provides no surprises, as most entries are articles and other common words, or words from the title. I decided to make a column show the overall lexicon in alpha order, which went a little awry given the oddities of spacing I encountered, since I didn't do a lot of cleaning of my cut-and-pasted lyrics. But still, it was sort of interesting that the older "rollup lexicon" ran 675 entries or so, and the newer just a hundred or so more, with similar redundancies of "you" and "what" and arbitrary other words that happened to fall at the end of a line or whatever and so show up as distinct. My preconception had been that the later songwriters, having longer songs overall, would fill them with significantly more words, and with a greater variety.
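For the curious, here is roughly how the per-song frequency lists and the alphabetized rollup could be produced with the cleaning I skipped. This is a minimal sketch and not the script I actually ran: the function names (tokenize, song_lexicon, rollup_lexicon) and the sample text are made up for illustration, and the rule of "lowercase everything, keep apostrophes inside words" is just one plausible way to stop "you" and "You," from showing up as separate entries.

```python
import re
from collections import Counter

def tokenize(lyrics):
    """Lowercase the text and extract words, keeping internal apostrophes.

    Hypothetical cleaning rule: punctuation, stray spacing, and line breaks
    are ignored, so "Right," and "right" count as the same word.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", lyrics.lower())

def song_lexicon(lyrics):
    """Return (word, count) pairs for one song, most frequent first."""
    return Counter(tokenize(lyrics)).most_common()

def rollup_lexicon(songs):
    """Return the distinct words across all songs in alphabetical order."""
    words = set()
    for lyrics in songs:
        words.update(tokenize(lyrics))
    return sorted(words)

# Toy input, not real data from the spreadsheet.
sample = "It's all right,  it's ALL right\nHave a good time"
print(song_lexicon(sample)[:3])
print(len(rollup_lexicon([sample, "What a time"])))
```

With something like this, the length of the rollup list is the "entries" count for an era, without the end-of-line duplicates.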

Interestingly, a song from each era established itself as a repetition leader by repeating three words about the same number of times - again, there are minor provisos with respect to capitalization or other factors sometimes - and those three floated to the top of the frequency list. "It's All Right" repeated its title 27 times, and apparently didn't use those words elsewhere, as common as they are. But "Scream and Shout" used "and", "we", and "oh" about 50 times each, far outstripping its nearest competitor, "Suit and Tie" ("a", "you", "and.")

Some interesting 1963 words only used once: "rapture", "aglow", "Haggerty's."

From this year: "compliment", "saints", "doozy."

The "Across" tab showed disjoint words, including "a'married" (you couldn't guess that year!) and "sinner", "bitch", and the usual panoply of rap-related invective.
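A disjoint list like that boils down to a set difference between the two rollup lexicons. Again, this is a sketch under assumptions rather than the original script: disjoint_words is a hypothetical helper, and the word lists below are toy input, not the real contents of the tabs.

```python
def disjoint_words(lexicon_a, lexicon_b):
    """Return (words only in a, words only in b), each sorted alphabetically."""
    set_a, set_b = set(lexicon_a), set(lexicon_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)

# Toy lexicons for illustration only.
older = ["you", "love", "aglow"]
newer = ["you", "love", "doozy"]
only_older, only_newer = disjoint_words(older, newer)
print(only_older, only_newer)
```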

The average song length obeyed the unwritten rule in 1963, at two and a half minutes, and thanks in part to Justin T., this year's average time was about four. That's courtesy of the "Master Stats" tab, which also lists the statistical winners of each category that I mentioned along the way, including probably the biggest stat, the number of words in "Suit and Tie", which correlates pretty well with that outsize running time.

And if there's a clear message in all these stats, it's probably not a code I'll crack. And that'll close the examination of the Top of the Pops for this year.
