Google launched a new toy in their labs yesterday. Google Books Ngram viewer is a tool for examining the occurrence of different words and phrases over time in language corpora gleaned from their Google Books project. In short, you can see whether a word is in, out, on the up or heading for oblivion.
It's a compelling toy, one on which I could quite happily waste an entire morning. To find that Ronald Reagan made no impact on written English as a film star and neither did Margaret Thatcher as a backbench MP, for example. Or the history of British fast food tastes. OK, maybe I'm too much of a geek for my own good.
Of more interest is the use of language with respect to our community. "Transsexual" peaked over a decade ago, replaced by "transgender", which only took off around 1990. No real surprise there, except perhaps the lateness of the latter's rise.
And it seems I'm a little dated in my use of "transgendered", too.
For me that last graph captures the joy of mining large real-world data sets. No matter what your opinion may be, the data never lies, and sometimes it tells you something unexpected, or maybe something you didn't want to see. And strangely enough I find that rather beautiful.
Wow, I had no idea about this new tool. This is totally fascinating. I'm going to reblog this at my Tumblr because I think it's something people will really find useful.
Oh dear, try out 'tranny' and transexual...
That is an interesting way to look at labels. I do not like either and prefer just trans myself.
You have done it again and my life is melting away...
Why were people writing about cars in 1800 twice as often as bicycles at their peak?
Everybody used gay in 1800 and nobody dared write homosexual! Suddenly the graphs swoop together and tangle for a while, love this...
Caroline xxx
Better yet, try transgender and transvestite :-)
I always thought that 'transgender' was an umbrella term covering all 'trans' folk whereas 'transsexual' and 'transvestite' are specific to their type.
Shirley Anne xxx
I figured you'd all like it. There are other public corpus and n-gram tools out there - Microsoft Research have one, for instance - but they usually aren't as easy to use as this one.
I should caution you all though that this is just books. It doesn't include magazines, newspapers, the internet (or its precursors) or the spoken word. To illustrate with an extreme example, if you were to use one of my mother's bookcases as a corpus, you might believe eighteenth-century English had undergone a resurgence since the 1960s, due to her occasional penchant for reading historical romances.
Interestingly, I noticed when plotting "gay" as far as 2008 that the word is in decline; it peaked in the '90s. Has being gay become run-of-the-mill, so nobody writes about it any more?
I should have provided this link near the top of the last comment:
http://corpus.byu.edu/
Enjoy!
Fascinating but use with care ... I looked at the gay vs homosexual chart that you mentioned and found, as you said, a slow decline in gay's old usage followed by a rapid rise in its new usage from the mid '80s. If you look at the underlying data you can see that it is possibly skewed by the changes in popularity of the dramatist John Gay! You really need to be able to filter that out to see the true picture. :)
Rachel XXX
It's true, you have to consider the data. Quick way to check: compare the results from different corpora.
It returns to my point above about mixing data from more than one type of source.
Well, I'm reading this late but I had already discovered this cool tool and I had already done the TS/TG thing.
So, who the hell is Father Christmas?
And what is a Wimpy's?
Calie xxx
Father Christmas? Well it's simple, you only get Santa Claus, but we get both Father Christmas and Santa Claus. Yeah, it's a PITA having to put out two sets of mince pies, I know, but that's the free market for you!
Wimpy? Be very glad you don't know.
This comment has been removed by the author.
Oh joy. Something's up with Blogger's comment system this morning.