Lies, Damn Lies, & SEO: Statistical Analysis of SERPs

It sounds so seductive… by using advanced statistical methods, you can determine the best mix of on-page factors for SEO. Wow, imagine the incredible competitive edge that you’d have. You could use just the right number of bold tags, figure out whether to use bold or strong, and you’d be an unstoppable ranking machine.

The only problem with this approach is that it’s complete bunk. Let’s try a couple of examples…

A Statistical Lie I Kind Of Liked: MSN "prefers" sites that run on Microsoft’s own IIS

A while back, someone published a statistical study that appeared to show that MSN’s search results were far more likely than Google’s or Yahoo’s to contain pages from sites that run IIS. Did people take this as a sign that they should move their web sites onto IIS? No, of course not… because Google has a much bigger market share, people actually thought maybe they should switch away from IIS in order to do better on Google!

So… now you wonder: is MSN rewarding you for using IIS while Google doesn’t care, or is Google rewarding you for using Apache while MSN doesn’t care? If Google doesn’t care and MSN does, then you rush to IIS. If Google cares and MSN doesn’t… enough! Spare yourself the circular reasoning before you go mad, and let’s consider some possible root causes.

At the time this study was published, I pointed out that there are many differences between IIS and Apache, aside from the names.

  • Whereas Apache is the majority choice of the entire web, as you move to larger sites and in particular the corporate world, IIS has a much stronger position. So if Google crawls more of the web’s smaller sites than MSN, they’re going to have a higher percentage of Apache-delivered pages in their index. Which means, statistically speaking, that you’re likely to see a higher percentage of pages on Google SERPs being served up by Apache.
  • ASP.Net, whatever else it does, can come with a lot of extra baggage. Such as the "viewstate" form fields that tend to get inserted, with 10-50k of utter gibberish text. So if MSN taught their bot to ignore this junk, and Google didn’t… well, this alone might account for this statistical variation. Since it’s relatively easy to build a site on IIS without adding all that dead weight, it’s hard to blame the search engines either way.

The bottom line: search engines don’t care what kind of server you run. They might care how it behaves, but not about the name.

The Original Statistical Sin: Keyword Density

If you’ve never used "search engine optimization" software to tell you how to optimize your web pages, good for you. If you run keyword density analyzers to do anything other than extract search terms from web pages… stop. You don’t need to. Keyword density isn’t a factor – search engines just don’t work that way.

Keyword density is loosely defined as "the percentage of the words on the page that are your keywords." I can remember endless debates back in the late ’90s about the "right" way to measure it – did you count all the words, did you only count exact phrases? There was only one problem with those debates – we were all wrong. Search engines do not measure the "keyword density" of a web page.
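
To see why those debates never settled anything, here is a minimal Python sketch of two of the competing definitions. The function names and the sample sentence are made up for illustration, and neither calculation is something a search engine is known to perform.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def density_by_words(text, phrase):
    """Share of all words on the page that are any word from the phrase."""
    words = tokenize(text)
    targets = set(tokenize(phrase))
    if not words:
        return 0.0
    return sum(1 for w in words if w in targets) / len(words)

def density_by_exact_phrase(text, phrase):
    """Share of the page's words taken up by exact occurrences of the phrase."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if not words or n == 0:
        return 0.0
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

# Hypothetical page copy, 14 words long.
page = "Blue widgets for sale. Our widgets are blue, and our blue widgets ship free."

print(f"count all words:    {density_by_words(page, 'blue widgets'):.1%}")
print(f"exact phrases only: {density_by_exact_phrase(page, 'blue widgets'):.1%}")
```

On this made-up page the two definitions disagree (6 of 14 words versus 4 of 14), so the "density" depends entirely on who is doing the counting.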

What they do is, in fact, immensely more complicated. Don’t follow that link unless you want to get hit with a firehose full of math, BTW… it talks about the vector space model, information retrieval theory, linearization, TF/IDF (term frequency / inverse document frequency), and other stuff that can give you tired head real fast. I’ll summarize what it all means in a minute.
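
For a rough feel of what TF/IDF means, here is a minimal sketch of the classic textbook weighting, assuming a tiny made-up corpus of three "pages". It illustrates the idea only; it is not a description of how any search engine actually scores pages.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """Textbook weight: term frequency in the page times log inverse document frequency."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    docs_with_term = sum(1 for d in corpus if term in d)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

# Hypothetical three-page "index", already split into words.
corpus = [
    "the best blue widgets on the web".split(),
    "the history of the web".split(),
    "the weather on the coast".split(),
]
page = corpus[0]

for term in ("the", "web", "widgets"):
    print(term, round(tf_idf(term, page, corpus), 4))
```

"The" appears on every page, so its weight is zero no matter how often this page repeats it, while "widgets" appears on only one page and earns a positive weight. The score depends on the rest of the index, which is exactly what a per-page density figure cannot see.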

So there’s no such thing as keyword density… at least to the search engines. However, the "fact" that keyword density isn’t even measured by search engines hasn’t stopped people from peddling their latest "statistical analysis" of the optimal keyword density.

There are two main approaches that are used to push keyword density:

  • Take the top 10 pages for a particular search query, measure their keyword density (yes, I know nobody can agree on how), and then take the average score as the "ideal" keyword density. This is the approach that most optimization software uses. Never mind that the #1 result may be 2%, the #2 result 41%, etc. – if the average is precisely 4.67291% then that’s what you shoot for… and while you’re at it, make sure your page matches the average number of words that were used by the top 10.
  • Dive deeper, categorize the pages into buckets based on their keyword density. Then you analyze a whole bunch of search results and determine that, statistically speaking, the pages that fall in a certain range are more likely to be ranked higher. Depending on the search terms you use, the numbers will vary a bit, but generally the "magic number" is discovered to lie somewhere in the 1-4% range. Which, as it turns out, is pretty much what happens when you just write naturally.
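
The trouble with the first approach shows up as soon as you run the numbers. The sketch below uses ten invented density figures (they are not measurements of any real search result) to show how little the average tells you.

```python
from statistics import mean, median, pstdev

def summarize(densities, tolerance=1.0):
    """Summarize a list of keyword densities (in percent) for one results page."""
    avg = mean(densities)
    # Pages whose density lands within `tolerance` points of the average.
    near = [d for d in densities if abs(d - avg) <= tolerance]
    return {
        "mean": avg,
        "median": median(densities),
        "spread": pstdev(densities),
        "pages_near_mean": len(near),
    }

# Hypothetical densities for results #1 through #10, in percent.
top_ten = [2.0, 41.0, 1.5, 3.0, 0.0, 2.5, 12.0, 1.0, 4.0, 0.5]

for name, value in summarize(top_ten).items():
    print(f"{name}: {value:.2f}")
```

With these made-up figures the "ideal" comes out at 6.75%, yet not one of the ten pages is within a point of it, and the median page sits at 2.25%. Shooting for the average means imitating a page that does not exist.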

As you already know, there are other factors in play when it comes to rankings. In fact, "on page text" is probably nowhere close to the most important factor in SEO. What this statistical data should be telling you is that you are wasting your time by worrying about keyword density. If you translate all the technical stuff in Dr. Garcia’s paper on keyword density into English and then summarize, it says "use relevant keywords, but write naturally."

So yes, my friends, there is a magic number for exactly how many times and in what places you want to place your keywords. Unfortunately, without access to the entire search engine index and their ranking algorithm, you don’t stand a snowball’s chance in Hades of discovering what that actually is… and it’s different for every search query.

Even if you could measure the same things the search engines were measuring, you’d still be unable to get there with statistical analysis, because there are a few things that will tend to skew the statistical averages higher or lower than what’s optimal from a pure "vector space search" perspective:

  • People who are doing SEO work to improve their rankings will probably tend to repeat their keywords just a little more than the average writer… which might actually be WORSE for their rankings than writing naturally, but they also tend to do other things, like building links to their sites and using anchor text to boost their rankings. This will drive the numbers up.
  • People in general are more likely to enjoy reading pages that are well written, with natural word use… and they tend to not enjoy reading keyword stuffed garbage. This leads to a general trend where "over optimized" pages will receive fewer links, and sites that are full of keyword stuffed jibba-jabba will receive fewer links. This tends to reward sites that don’t have an extremely high "keyword density" and drives the numbers down.
  • Blogs make a big difference in the math, because blog posts tend to gather more links over time (increasing rankings) and collect comments as they collect traffic (decreasing keyword density towards the average of natural language). This drives the numbers down, or toward the averages at least.

The bottom line on keyword density is that there is no such thing. Write naturally, write persuasively, write to communicate… because no matter what your keyword density is, you can always fire more links at the page to improve its ranking, but the only way to make your copy do its job is to write well, and forget about the damned search engines.

We’ll talk more soon.

65 thoughts on “Lies, Damn Lies, & SEO: Statistical Analysis of SERPs”

  1. Thank you!! I get sick and tired of seeing threads on forums and posts on blogs about “the perfect keyword density.” No matter how many people are told, it seems like the lies continue to spread.

    It’s really not something I should get upset about, but it’s just annoying I guess :)

    Thanks for a great article!

  2. Many people are not trained in the science of reverse engineering of software and therefore don’t believe it exists.

    It exists.

    And many people have no data to back up what they say regarding SEO. I see none here, for example.

  3. OK, Chuck, when your reverse engineering maps out the actual algorithms, let us know.

    There’s a strong statistical correlation between rainy days and my dogs whizzing on the carpet. If you’re not getting enough rain, maybe I could send ‘em over. :D

  4. The keyword density concept has spawned an industry who will of course close ranks to decry such heresy.

    Great job Dan thank you for the input.

    P.S. Your response in the last post smacks of the best of Groucho, one of my personal heroes… great shot.

  5. Dan,

    Have any of the statistical guys done controlled experiments to see if similar pages with different keyword densities are picked up with different results? I haven’t seen the original studies, but it seems like if they haven’t done this sort of experiment, then all they are doing is finding correlations. And, as any statistician will tell you, correlations don’t indicate causality.

  6. Lisa, the problem with keyword density is that the “black box” these guys are trying to reverse engineer doesn’t measure it.

    Imagine you have a black box, and when you put rocks into it, it shines a light out in some color of the spectrum. Let’s say you want to learn what makes it glow green.

    You’re doing statistical analysis based on the number of rocks you’re putting in, and you become convinced that the best number of rocks is 17 because that gives you the greenest green output.

    The guy who built the box laughs at you, because it’s actually operating based on the weight of what you put in the box, and the number 17 is based on the average weight of the rocks you happen to be using.

    That’s what’s going on here. Search terms are measured by weight, not volume, and the statisticians can not determine the weight.

  7. Dan, last year I listened to a Stomper CD with Brad & Nathan Anderson. Nathan discussed keyword density with MSN & Yahoo and to try the 7% bump for MSN along with modeling the top sites.

    Anyway, I decided to analyze those top sites and model them. I used a density of about 6%. I eventually made it to page 1 on Yahoo & MSN for my main keyword. Obviously, there were many other factors that played a role but it seemed to be useful at the time.

    I was always kind of curious…if I wrote naturally and ended up at 2% or 3%, would I have made it to page 1 sooner, later, or never?

    Thanks for your continued efforts and giving the best free information on the web!


  8. Hear hear. I find that the people most hung up over keyword density can’t see the wood for the trees.

    My “perfect” keyword density is to “actually have the search phrase on the page”. So many forget that simple secret! Then to also have the plurals and the reverse of the search phrase. If you don’t have the phrases actually on the page, then it takes a lot more links to get to the positions you need.

    I find that clients can get so hung up on statistics that they miss the basics.

    Have the search phrase in the title, meta description, H1, and in an opening paragraph, then “scatter the search phrase around the page”. Now that’s really scientific!

    Totally right – has to read well.

    Just a pity that the level of the playing field is that much higher with Google in the past months regards links.

    You start trying to get a client’s website ranked high and you realise how hard it is. You do the SEO basics onpage, onsite, and get links… and it takes time for the Gold of your client’s site to rise above the heap. You do nothing apart from steady links over months, and all of a sudden you rank well. Google seems to let you in over time even if you deserve to rank higher.

  9. Dan,
    I understand that keyword density is irrelevant, but what about keyword distribution?

    As a library studies student (many moons ago) we learned that terms in titles and headers bear more weight than those in the first paragraph, which bear more weight than those in the middle of a document (since an author is likely to highlight key terms in these places).

    We can’t know for sure what’s inside Google’s blackbox but it seems like fundamental concepts like these would still bear fruit.

    What do you think?

  10. Bob, actually Chuck and I went lateral with our discussion over here… there are a lot of things you can test in a controlled setting. Headings, titles, stuff like that.

    There are also things that testing (and first hand knowledge) will demonstrate mean nothing, but which may still show up as factors to the statisticians.

  11. I found this info very useful. My coach is pushing me to write 250 words of “jibberish” just for SEO (my site is new), and I understand what he is teaching, but I find it bad to write words on my front page that will make no sense to visitors wanting to buy. Thanks for the help…c.h.

  12. Dan – You make some really good points here.

    I think the problem with most SEOs is that, when we think of keyword density, we think of the frequency of the words on the particular page, while IR research books clearly show that the search engines need to take in account the frequency of the words in the whole document set as well.

    The important question is: Why?

    Search engines need to tell web pages apart (in order to properly organize them). One way to tell them apart is by finding words that are unique to them.

    TF/IDF helps in that phrases that are popular on the page, but that are not popular (inverse relation) on the whole index are better at describing the web page, than words that are also popular on the whole index.

    The vector space model and the similarity equation are really interesting concepts. However, it is very unlikely that large scale search engines are using that at the moment. Multidimensional vectors with millions of dimensions are not only hard to imagine, but hard to scale too. Please see this comment for more information.

    I have to agree with you that the current search engine software on the market is doing more harm than good. Fortunately, there is an interesting SEO software coming to the market very soon. ;-)

  13. If you go all the way back to Salton’s work, I’d agree that the simple vector space model isn’t going to precisely describe what the search engines’ “black box” is doing. But it’s a much more helpful model for understanding than keyword density.

    For those who want more information (I fear for their sanity), I should also mention Dr. Garcia’s blog which has been a nice source on currents in IR research. One of these days I’ll get around to summarizing what’s wrong with SEOs ranting about Latent Semantic Indexing (LSI), themes / silos, etc.

    One of my Stompernet colleagues recently returned from SIGIR in Amsterdam, but we haven’t had a chance to catch up so that I can download his brain.

    I’m looking forward to investigating your software, Hamlet.

  14. If you plan to read Dr Garcia’s excellent stuff, it is a good idea to brush up your linear algebra skills. He has some excellent tutorials on matrices and I’d start there if I was new to the site.

    “I’m looking forward to investigating your software, Hamlet.”

    Dan – I sent you a private beta invite. Thanks in advance for your valuable feedback.

  15. “One of these days I’ll get around to summarizing what’s wrong with SEOs ranting about Latent Semantic Indexing”

    Just the other day I saw a video, I forget where, showing how you can find LSI keywords in order to place them strategically into your content.

    I thought, what in the hell is wrong with this guy!!

    I guess some people just don’t get it.

  16. “I guess some people just don’t get it.”

    What surprises me is when they don’t want to get it. When I figured out that keyword density was meaningless, it was a big relief… one less thing to worry about.

    The LSI peddlers are another thing, though – gotta wonder why they’re talking about it.

  17. Dan,

    Your article is very insightful. In November of last year we got our single biggest SEO client ever – the one client hired our company to provide them 15 hours a week in an ongoing SEO / SEM campaign. After extensive research to bolster the paltry knowledge I had at the time regarding SEO, I proposed a massive site overhaul. One of the first things they threw at me was one of their staff member’s findings – she’d used one of the off-the-shelf SEO analysis solutions – you know, the likes of which tell you your keyword density in granular categorized statistics…

    My own research back in November had led me to a blog listing 35 key points in order of importance, as determined by the person who’d posted the list. While one of those factors listed was keyword density (ranked around 10th most important), I chose to take the much bigger approach, because my client’s site had over 150 pages. I was not about to expend 80% of my client’s 15 hours a week on keyword density.

    I am quite happy to report that within three months, four of our top phrases got us in the #1 position at Google. All these months later, we’re now up to a dozen of our top phrases resulting on the 1st page results, four more are on page 2, and all the rest (25 phrases in total) are continually moving up in ranking, whereas when I started, the best results they had were on the 17th page of results.

    So how much time have I put into deep density analysis? ZIP. Nada. None. I have taken pages that, when I’d started, were purely garbage and unintelligible to a real site visitor, and re-wrote the content on almost all the pages. I’ve added RSS news feeds to the site, and several new pages. I’ve focused on back-links. Optimizing specific phrases on specific pages of the site. In-page in-linking. Strong, Bold and Header text on pages. Tight integration of title, Keywords, Description, and on-page content.

    On and on the list of things I’ve focused on goes. And by simply writing as much as I can in a natural language method. Oh sure – I’ve added a TON of text to a few of the more important pages – some of them are massive pages now. And honestly, I have a LOT of clean-up to do – because there’s still so much text that is very nearly worthless to a site visitor when it comes to readability or worthiness.

    When I applied the same concepts to a 2nd client this past spring and summer, we got very similar 1st page ranking results in yet one more fierce SEO saturated industry.

    So until Matt Cutts hands out copies of the Google algorithm, and provides us with regular updates every time they change or tweak it, there is no way I’m going to waste my clients’ money on keyword density being as vital as the statistics software pushing companies would have us all believe.

    Keep up the great work here.


  18. I’m still an SEO newbie trying to learn everything about SEO. Before, I believed that keyword density works, but now I guess I need to agree with you.

    I’m currently joining an SEO competition here in the Philippines, “Paradise Philippines”, and I tried to experiment with keyword density by doing some keyword stuffing, but it doesn’t really work; my entry is still on page three.

    I observed that most of the websites on the first page of Google have a lot of backlinks, some of them have thousands of backlinks which my site doesn’t have.

    It only shows that links are still the main reason why these websites rank well, and not keyword density.

    I have learned also that having a small file size is a great help.

    What can you say about this?

  19. Considering you can rank as high as #1 for keywords that aren’t even on your page why is this topic even still around?

    Sheesh, you can use link text to decide what the SE think your site is about without having the keywords on your page.

    That little fact by itself kinda makes keyword density irrelevant because you can have a keyword density of zero and still get ranked for your keywords.

    I make sure I get my keywords in the title, description, h1, first paragraph, last paragraph and the rest of it I don’t worry about. I may bold or italicize related keywords because I think LSI is as important as keywords.

    In the end I still think words used in link text are more important than keyword density. Just ask the miserable failure. ;)

  20. Pingback: (EMP) E-Marketing Performance » : » Team Reading List 8.28.07

  21. Pingback: Search Engine Land: News About Search Engines & Search Marketing

  22. Great article Dan, would like to know if you could write something about how to add links like you have on your blog.

    Spread the word Digg Furl Netscape StumbleUpon Technorati Windows Live Yahoo!

  23. Hi Dan,

    Thanks for clearing up this confusion as I have been operating under a false impression for quite some time and achieving decent results!! Hopefully this will save a lot of effort and still achieve decent results.

    Thanks again!


  24. Pingback: Demystifying the SEO myths…

  25. Pingback: A Call to Expose SEO Liers « IR Thoughts

  26. Pingback: Ask the SEO Guru

  27. Goodness! How is one supposed to decide what’s true? Ask the experts they say. But it seems they can’t agree. I think I will make my website for the people, but try to get the search engines to like me at least a little. Sigh.

  28. Pingback: Just Joshing You › SEO link roundup

  29. I remember my early days in SEO where I was trying to understand this and many other factors. I understand them better now,.. main result of all this understanding: I don’t care much about most factors anymore. Just build a site the right way and you get pretty much all factors right without measuring anything. In the end what makes the difference is marketing your website. There are some things I do look at though:

    Focus of a page – You can do every technical factor correct, but if you don’t focus, it’s a waste of time. (and it often amazes me how good people are at building pages that lack focus.)

    Semantic related words – Who cares if a keyword comes up many times in a page if the related words are missing? Search engines don’t!

    anchor texts – with a lot more focus on internal links than on external links.

  30. Hey, Peter!!! Nice to see you! Hope this isn’t a one time visit.

    Internal links are actually the focus of tomorrow’s article/post. :D It’s funny how much of a “secret” that seems to be, when we’ve been talking about it for so long.

  31. I just wanted to pick up a quote from your article.

    “because no matter what your keyword density is, you can always fire more links at the page to improve its ranking”

    When you say “fire more links” that’s actually more difficult nowadays, if we are to stay well clear of reciprocal link exchanges.

    I’m really struggling getting people to link to a travel site which provides a service. I’ve added a blog and write unique content daily, added travel videos just to add content that people will link or bookmark.

    I am using the blog to link to level 1 to 3 pages on the site.

    It’s frustrating.

  32. Dan, I saw you in wpw, followed your signature link and found this interesting post. Always nice to see you too.

    Internal links indeed seem to be a “secret”. I guess people prefer to put the “blame” outside their website rather than having to look critically at themselves… :)

  33. Darren,

    Don’t overlook the power of internal links – it doesn’t take much to make fine-grained details of on page text irrelevant.

    In terms of building external links, reciprocal links have long been a big time suck vs. the benefits, but there are plenty of other ways to get links. Have you watched any of the videos in my free link building course?

  34. Dan,

    I’ve always been led to believe that internal links carry very little weight where ranking a keyword is concerned. I guess from reading posts here that it isn’t the case.

    Regarding the videos I have watched them all, but I need to go back and revisit some of them.

    Thanks for all you’re doing


  35. Hi Darren, I have found that internal linking, authority directories, and topically related reciprocal linking have given my site high rankings on all tiers.

    If you’re not in all of the authority directories, you should consider them.

    Good Luck!

    Jeff :)

  36. Thanks for the advice Jeff.

    I have upped the internal links on my blog, and have submitted to most of the major directories, with the exception of Yahoo and DMOZ.

    DMOZ, will take an age to get listed

    Yahoo, is it really worth the $299 anymore?

  37. Just took a visit to a forum and found a lot of posts relating to directories being dumped by Google. Is this just another forum myth or are Google cracking down on directories?

  38. Uncoverthenet seems to have had some problems with Google, but I suspect they’ve fixed the problem (they were presenting SERPs as content, not intentionally spamming AFAIK) so they’ll likely return to the index soon.

    There are some directories that really stink from the search engines’ perspective, because they’re really just paid link farms. A real directory would actually decline some submissions, but a lot of them don’t.

  39. Yahoo is worth every penny, IMO. Especially for newer sites. I wouldn’t launch a site without submitting to Yahoo, unless I had a very good plan to get a lot of authoritative links.

  40. I submitted to DMOZ, over a year ago, I’ve given up, and you can’t leave requests for an update on the resource zone forum anymore.

    I intend to submit to Yahoo Directory this week.

    Well, internal links seem to have worked, jumped from #57 to #7 for two of my optimised keywords.

  41. I hear you Darren. No luck for me either with submitting my sites and a few client sites to DMOZ. Most of my competitors managed to make it in though.

    Good job with the internal linking :)


  42. ^ ha ha! :)

    glad to see you’re using the nofollow on these commenters’ links, as i’m pretty sure that the hope of getting a backlink from you is the only reason some of these clueless ones continue to post their nonsense here…

    and you can keep your dogs, thanks (although we could use the rain)…

  43. Hey Dan ,

    I had a question about moving my site from .asp to .net and if this will have any effect on my current seo efforts.

    I rank in the top 1-3 on MSN, YAHOO, and Google for just about every phrase (very competitive phrases) that makes me money. I have my guys switching over my site from .asp to dot net. However, I am worried that the extension change at the end of my URLs will affect my results in the SERPs.

    Thank you for your help. I am a long time student of yours.


  44. Dave, I’m guessing that the change in URLs is from .asp to .aspx? The best-case would be not to change the URLs. If they must change, then you want a 301 redirect from the old version to the new version for every URL.

    I’m not the world’s greatest ASP/IIS/.NET expert, so I can’t help much with HOW to get that done. I do know that ASP.NET supports URL rewriting (you don’t need any server plugins) so it shouldn’t be a huge undertaking to keep the same URLs.

  45. Hey Dan ,

    Thanks very much. Yes, the change is from .asp to .aspx.

    The 301 redirect was my first thought too. However I wasn’t completely certain that it was the best answer, shy of not changing the urls.

    Thank you for your help. It means a lot.

    Take care and have a great weekend.



  46. Dan,
    Great blog – as a self-taught SEO blundering amateur, this all helps. Can’t afford a pro, as the last ‘expert’ I dealt with emptied our coffers and managed to destroy our page rank and positions [all done to that point without any “expert” help].

  47. Thanks for the article and truth on keywords. There is a company out there helping people to make their sites more effective. I’m sure there are design tips that can help increase sales, like those I found in thermal mapping. Is there anything more that can be done with keywords that would make that much difference? Of course this is after we follow all the procedures in your SEO book. They were former clients you had and have been offering the 30 day challenge under different site names.

  48. Dan,
    another thought came into mind on validating pages. Is it necessary to be sure our present pages will work in much older browsers? I also noticed that some links from other companies have html that is not compatible with older browsers, mostly netscape and Explorer 5 especially with size tags, for example height. Can we change those without causing a problem in the links?
    I’m also trying to work with .php to download database info. I have little experience with it. Is there a site I can go to that would show me how to properly use this? My web server supports many programs including .php. The companies supply a script and site info but after that I need to stop for directions.

  49. Great post. I’ve never been a fan of keyword density, even if it makes a small difference on MSN. Not worrying about repeating keywords endlessly gives freedom to SEO copywriters, who will then produce much better link-worthy content.

  50. Hey Dan, What’s the Haps?

    I think you might be referring to MY statistical analysis of MSN that turned up the supposed preference for IIS servers.

    I know this post is pretty old, but I thought I’d better add my 2¢ here anyhoo.

    That stats analysis turned up a whole lot more than a preference for IIS. Upon further investigation, it really didn’t point to a preference for IIS servers, it pointed to a preference for Microsoft-owned properties, which all happen to be hosted on IIS servers (duh).

    So stats analysis can reveal some pretty shocking stuff. In this case, it’s a search engine seeding their SERPs with their own sites. Not exactly ethical (or legal, if you want to get technical), but certainly understandable.

    Stop on by SEO Club again someday if you’d like to give the numbers a scan…

  51. Hey, Nate! :D

    I was actually talking about someone else’s study, but thanks for reminding me about that conversation. I remember you had that explanation in mind when we talked last year. It was sort of the obvious explanation, but obviously slipped my mind completely when I wrote this.

    It’s nice to hear what the outcome was when you took a deeper look, too. That’s the difference between looking at the statistics and drawing silly conclusions, and examining the data to get at the truth.

    I don’t know about the legality/ethics of the hand edits (or whatever) that some search engines apparently do to boost their own properties in SERPs.

    MSN might simply be looking at user data, concluding that their users prefer certain sites, and boosting those popular sites in SERPs. Since they get a lot of their search audience from their own properties there would be a bias toward those sites within their user data.

    So it could all be perfectly innocent… and I got a bridge I’d like to sell ya too. :D

  52. I totally agree with you: “SEOs work to improve their rankings will probably tend to repeat their keywords just a little more than the average writer”. Even content writers make this mistake. Natural writing is what visitors and search engines like.
