It sounds so seductive… by using advanced statistical methods, you can determine the best mix of on page factors for SEO. Wow, imagine the incredible competitive edge that you’d have. You could use just the right number of bold tags, figure out whether to use bold or strong, and you’d be an unstoppable ranking machine.
The only problem with this approach is that it’s complete bunk. Let’s try a couple examples…
A Statistical Lie I Kind Of Liked: MSN "prefers" sites that run on Microsoft’s own IIS
A while back, someone published a statistical study that appeared to show that MSN’s search results were far more likely to contain pages from sites that run IIS, vs. Google and Yahoo. Did people take this as a sign that they should move their web sites onto IIS? No, of course not… because Google has a lot bigger market share, people actually thought maybe they should switch away from IIS in order to do better on Google!
So… now you wonder: is MSN rewarding you for using IIS while Google doesn’t care, or is Google rewarding you for using Apache while MSN doesn’t care? If Google doesn’t care and MSN does, then you rush to IIS. If Google cares and MSN doesn’t… enough! Spare yourself the circular reasoning before you go mad, and let’s consider some possible root causes.
At the time this study was published, I pointed out that there are many differences between IIS and Apache, aside from the names.
- Whereas Apache is the majority choice of the entire web, as you move to larger sites and in particular the corporate world, IIS has a much stronger position. So if Google crawls more of the web’s smaller sites than MSN, they’re going to have a higher percentage of Apache-delivered pages in their index. Which means, statistically speaking, that you’re likely to see a higher percentage of pages on Google SERPs being served up by Apache.
- ASP.Net, whatever else it does, can come with a lot of extra baggage. Such as the "viewstate" form fields that tend to get inserted, with 10-50k of utter gibberish text. So if MSN taught their bot to ignore this junk, and Google didn’t… well, this alone might account for this statistical variation. Since it’s relatively easy to build a site on IIS without adding all that dead weight, it’s hard to blame the search engines either way.
The bottom line: search engines don’t care what kind of server you run. They might care how it behaves, but not about the name.
The Original Statistical Sin: Keyword Density
If you’ve never used "search engine optimization" software to tell you how to optimize your web pages, good for you. If you run keyword density analyzers to do anything other than extract search terms from web pages… stop. You don’t need to. Keyword density isn’t a factor – search engines just don’t work that way.
Keyword density is loosely defined as "the percentage of the words on the page that are your keywords." I can remember endless debates back in the late ’90s about the "right" way to measure it – did you count all the words, did you only count exact phrases? There was only one problem with those debates – we were all wrong. Search engines do not measure the "keyword density" of a web page. Continue reading →