Over the past several years, Google and its university partners have been scanning every book they can get their hands on into the searchable Google Books resource. Despite the lawsuits, they've collected over 15 million books. Meanwhile, a team at Harvard led by researchers Jean-Baptiste Michel and Erez Lieberman Aiden has been digging through this immense trove of data and pulling out all kinds of gems.
For their first study, published last week in Science, the authors pared the data set down to only the most reliable books, excluding, for example, those with blurry scans or uncertain publication dates. The resulting data set contained 5 million books. By searching it for words and phrases (n-grams), the researchers were able to track patterns and changes in the English language. You can read the whole study, and see all of its graphs, at the link above (free registration required).
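To make the idea of tracking n-grams a little more concrete, here's a minimal Python sketch. It assumes a tiny made-up corpus of (year, text) pairs, not the real Google Books data, and the function names `ngrams` and `relative_frequency` are my own, not part of any Google tool. It splits each text into sequences of n words and, for each year, divides a phrase's count by the total number of n-grams of the same length, which is roughly the kind of normalized frequency the word graphs show.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus, phrase):
    """Map each year to count(phrase) / total n-grams of the same length.

    corpus: iterable of (year, text) pairs (hypothetical toy data).
    Normalizing by the yearly n-gram total is an assumption meant to
    mimic how usage graphs compare years with different amounts of text.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # occurrences of the phrase per year
    totals = Counter()  # all n-grams of length n per year
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Made-up toy corpus, not real Google Books data.
toy_corpus = [
    (1900, "the boys and the girls went to school"),
    (1900, "the men read the paper"),
    (2000, "the women and the men voted"),
    (2000, "the girls read the paper"),
]

print(relative_frequency(toy_corpus, "the girls"))
```

In this toy example, "the girls" is 1 of 11 two-word phrases in 1900 and 1 of 9 in 2000, so its relative frequency rises even though its raw count stays the same. That's why normalizing matters: far more books were published in recent decades, and raw counts alone would make nearly every word look like it was booming.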
Among other findings, they showed how the number of English words has been steadily increasing...
When irregular verb forms gave way to regular ones...
And how effectively the Nazis were able to erase Jewish artist Marc Chagall from public awareness.
Want to try it yourself? You can make your own word graphs with Google's n-gram tool. Here are a few things I've found:
While "men" vastly exceeded "women" until the 1980s, "boys" and "girls" have been more evenly matched. Both words rose in popularity in the mid-20th century, perhaps when a lot of child-rearing books were being written. But around the time "women" surpassed "men," "girls" also edged out "boys."
Genetics has been an increasingly popular way to explain our traits and tendencies over the past century. Before that, what did we have? Head bumps, for one thing.
Mentions of newly discovered scientific principles climb steeply, then plateau once people have caught on. It remains to be seen where global warming will level off.
Luckily, we're not a generation that sits back and assumes that what happens on this planet is outside of our control.