Well, I’ll start by answering my own question: Google is a brilliant search engine, a fantastic invention when it came on the scene 15 years ago, and never surpassed. It’s a huge and very valuable company, and search is still its core business. But…
In certain respects it has got worse.
1. The cache is harder to find and sometimes absent.
2. The buying-up of Usenet newsgroups has helped to preserve them, but at the expense of a bastardisation of the system and some problems of data retrieval.
3. Attempts to diversify have only had a 50-50 success rate. Buzz buzzed off, Wave came and ebbed, G+ is still a niche product, Chrome is okay.
4. Its purity has been compromised by the ownership of YouTube, which it promotes ahead of other video sites, and by the continually increasing amount of advertising on the main search results page, which makes it especially difficult for children to use Google effectively.
But all that’s by the by. The topic I come back to time and again is Results. When you click Search the first thing you see is something like “About 10,000,000 results” and these numbers are quoted far and wide, often as an excuse for doing no real research whatsoever. The problem is that the figure is plucked out of thin air. Well, I exaggerate, but the algorithm that spews it out is little better than woo. It’s a homeopathic product, a placebo designed to comfort the user. This is a shame, and I’m at a loss to understand why Larry Page treats his customers in this way. I understand that coming up with an accurate figure within a fraction of a second is difficult, but is it unreasonable to expect a ballpark figure?
The wanton inaccuracy is partly a by-product of Google’s increasing desire to second-guess the user by correcting our typos and eggcorns for us. The hit count in a way reflects all the possible hits for any part of our search string and possible variations thereof. As you can imagine, this inflates the results massively.
Example 1: dadge – “228,000 results”. Actual results: 525.
Example 2: Az a baj – “20,200,000 results”. Start going through the pages of results and you quickly find that most of the pages don’t contain this whole phrase at all. (It means “That’s the problem”, by the way.) To correct this, click “More search tools” on the left and then click “Verbatim”. The revised figure is 61,000,000 results(!) but at least the required phrase exists in the listed webpages. Actual results: 454.
So, people, don’t forget your grain of salt when you’re using Google.
Update: I’ll add more examples as I find them “in the wild”.
“the ongoing war in Iraq”, quoted figure: over 700,000,
current figure: “269,000 results”,
actual results: 440.
Downton Abbey, quoted figure: 109 million
downton abbey 59.2 m
“downton abbey” 19.7 m
downton abbey (Verbatim) 101 m
“downton abbey” (Verbatim) 68.7 m
actual number of webpages listed: 757, (Verbatim) 415
Every single one of these numbers is wrong!
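To put a number on how far out these counts are, here is a minimal Python sketch that divides the figure Google reports by the number of results I could actually page through. The figures are copied from the examples above; the function name is my own, and pairing the 454 with the Verbatim figure for “Az a baj” is an assumption, since I only noted one actual count for that search.

```python
# (reported hit count, results actually listed), copied from the examples above.
examples = {
    "dadge": (228_000, 525),
    "the ongoing war in Iraq": (269_000, 440),
    # Assumption: the 454 listed results belong to the Verbatim search.
    "Az a baj (Verbatim)": (61_000_000, 454),
}


def inflation_factor(reported, listed):
    """Return how many times larger the reported count is than the listed count."""
    return reported / listed


factors = {
    query: inflation_factor(reported, listed)
    for query, (reported, listed) in examples.items()
}

# Print the searches from least to most inflated.
for query, factor in sorted(factors.items(), key=lambda item: item[1]):
    print(f"{query}: reported count is roughly {factor:,.0f} times the listed results")
```

Even the best-behaved search here is out by a factor of several hundred, which is some way outside any ballpark.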
Often there doesn’t seem to be much logic to the idiomatic use of prepositions in English, and sometimes English uses a different preposition from other languages. So it’s not surprising that there’s quite a bit of variation in usage. Language Log is currently revisiting bored with/of/by and I recently spat out my cornflakes over “appreciate of”, but I thought I’d take a look at excited about/for. On Language Log, Mike Kelly comments:
What does sound odd to me… is my kids’ use of “excited for” where I would use “excited about,” e.g., “I’m excited for Thursday,” “I’m excited for the game,” “I’m excited for having a day off.”
According to Google, a lot of people were/are excited about The Sims 3:
“excited about (The) Sims 3” 122,000
“excited for (The) Sims 3” 100,000
“excited with (The) Sims 3” 160
“excited by/of (The) Sims 3” 1 apiece
“excited at/to (The) Sims 3” nil
To an oldie like me, “excited for” means something different from “excited about” (Compare happy about and happy for), but never mind: another useful distinction has kicked the bucket. A few more comparisons:
“excited about Twitter” 327
“excited for Twitter” 8
“excited about Myspace” 47
“excited for Myspace” 7
“excited about Facebook” 230
“excited for Facebook” 16
“excited about the wedding” 818 “…marriage” 209
“excited for the wedding” 838 “…marriage” 19
“excited about the couple” 54
“excited for the couple” 69
The last comparison is somewhat different from the others because it has been more usual to be excited for people than about them. I can test that with some Google hit ratios:
excited about/for it 58:8
excited about/for that 28:7
excited about/for her 14:6
excited about/for him 8:4.5
excited about/for us 8:6
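The comparisons above are easier to read as a single number, so here is a rough Python sketch that works out what share of the combined about/for hits goes to “excited for”. The figures are copied straight from the lists above (the last five are the quoted ratios rather than raw counts); the function and variable names are my own, and the usual caveat about Google's counts applies in full.

```python
# (hits for "excited about X", hits for "excited for X"), copied from the lists above.
pairs = {
    "Twitter": (327, 8),
    "Myspace": (47, 7),
    "Facebook": (230, 16),
    "the wedding": (818, 838),
    "the couple": (54, 69),
    # The next five are the about:for ratios quoted above, not raw counts.
    "it": (58, 8),
    "that": (28, 7),
    "her": (14, 6),
    "him": (8, 4.5),
    "us": (8, 6),
}


def for_share(about, for_):
    """Return the fraction of combined about/for hits that use 'for'."""
    return for_ / (about + for_)


shares = {obj: for_share(about, for_) for obj, (about, for_) in pairs.items()}

# Print the objects from least to most "for"-friendly.
for obj, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"excited for {obj}: {share:.0%} of the about/for hits")
```

On these numbers “for” only takes the majority with the wedding and the couple, and the personal pronouns sit well above the websites.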
p.s. Hey, Google, your hit counts are STILL broken. When are you ever going to fix them??