Stop Words Are Dead! Did I Miss Another Memo?

That’s right folks… after years of telling the world to pretend that "stop words" don’t exist, at least when you’re writing copy, now you can really do it for real, because…

Stop Words Are Dead!

Well, they’re dead at Google and MSN anyway. Yahoo might want to get it in gear. I didn’t bother checking Ask, A9, or any of the other spares – I’ll leave that as an exercise for enterprising readers who actually care what the 4th-tier search engines are up to right now.

What are stop words? Well, stop words are (were) words that are so common that search engines have chosen to (sort of) ignore them, by not indexing them when they crawl a web page. Think of words like a, and, is, or, the, and was.

This doesn’t (didn’t) mean that they have no effect on search results, because the index records the position of words, so even a "blank" in the word order created by a stop word could still affect the order and proximity of other words that you searched for. If this makes no sense, don’t worry – it didn’t matter much in the first place, and now stop words are dead.
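If you'd rather see that "blank" than read about it, here is a toy sketch in Python. It is not how any real search engine is built. The stop word list is just the one from the paragraph above, and `build_index` and `gap` are names I made up for illustration. It only shows how a skipped word can still push the other words apart in a positional index.

```python
# Toy stop word list taken from the examples in this post (an assumption,
# not any search engine's real list).
STOP_WORDS = {"a", "and", "is", "or", "the", "was"}

def build_index(text, skip_stop_words=True):
    """Map each indexed word to the list of positions where it occurs."""
    index = {}
    for position, word in enumerate(text.lower().split()):
        if skip_stop_words and word in STOP_WORDS:
            # The word is not indexed, but its position is still used up,
            # which leaves a "blank" in the word order.
            continue
        index.setdefault(word, []).append(position)
    return index

def gap(index, first, second):
    """Distance between the first occurrences of two indexed words."""
    return index[second][0] - index[first][0]

with_stop = build_index("kick the bucket")
without = build_index("kick bucket")

print(with_stop)                         # {'kick': [0], 'bucket': [2]}
print(gap(with_stop, "kick", "bucket"))  # 2
print(gap(without, "kick", "bucket"))    # 1
```

Notice that "kick the bucket" and "kick a bucket" produce the same index in this sketch, which is why those two queries used to return identical results.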

Did I Miss A Memo?

If this isn’t news to anyone, someone let me know, because I haven’t seen it written up anywhere. It was news to me anyway – not especially exciting news, but that’s beside the point. A couple years ago when I was working on the Search Engine Marketing Kit, stop words were definitely in effect. We demonstrated this with a bunch of comparative searches like:

  • cats or dogs vs. cats and dogs
  • kick the bucket vs. kick a bucket

These search queries will return the exact same search results, if the words (a, and, or, the) aren’t being indexed. If those words are being indexed, then we’d expect to see different search results – as we now see on MSN and Google. You have to do a bunch of searches to make sure, because common words, even when they are indexed, have a fairly small effect on ranking.
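If you want to keep score while you run those comparisons, something like the following rough sketch would do. It assumes you copy the top results for each query by hand. The URLs are placeholders I invented, `compare_results` is a hypothetical helper, and nothing here talks to a search engine.

```python
def compare_results(results_a, results_b):
    """Summarize how two ranked result lists differ."""
    shared = set(results_a) & set(results_b)
    # Count the slots where both queries rank the same page.
    same_position = sum(1 for a, b in zip(results_a, results_b) if a == b)
    return {
        "identical": results_a == results_b,
        "shared": len(shared),
        "same_position": same_position,
    }

# Hypothetical top-5 results, pasted in by hand for each query.
kick_the = [
    "example.com/idioms",
    "example.org/bucket",
    "example.net/kick",
    "example.com/list",
    "example.org/faq",
]
kick_a = [
    "example.com/idioms",
    "example.net/kick",
    "example.org/bucket",
    "example.com/pails",
    "example.org/faq",
]

print(compare_results(kick_the, kick_a))
# {'identical': False, 'shared': 4, 'same_position': 2}
```

An "identical" result on one pair of queries proves very little, so repeat the comparison across several query pairs before deciding whether a word is being treated as a stop word.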

So, last month, a friend of mine asked me to look over a list of stop words he’d received and let him know if they were correct. When I checked on Yahoo, most of them were. Yahoo does show different "universal search" type results (video, product, etc.) with different queries, but the organic search results themselves did not appear to change on any of my test queries.

When I checked on MSN and Google though… none of the "stop words" on the list worked as stop words. Zippo. So it looks like somewhere along the line, 2 of the 3 major search engines stopped stopping, and started indexing every word.

Why should you even care about stop words?

Well, if you’ve been ignoring my advice all these years and fretting about stop words, you have one more reason to stop worrying and start writing naturally. If you use search engines, it’s at least a small improvement in the quality of search results when you’re using common words, as often happens when searching for books, lyrics, movies, music, Vogon poetry, and the like.

Fact check for me?

  • If you can find stop words that appear to still be "working" on MSN and Google, do let me know.
  • If you can find some indication that Yahoo is indexing commonly known stop words, do let me know.
  • If someone else has written this up and I really did just miss the memo, please post the original citation.

In other news…

My Stompernet colleague Don Crowther has put together a simply amazing video on how to leverage social media, social marketing, and Web 2.0 for traffic, conversion, and SEO. You need to watch the whole thing to fully understand the point, but it’s absolutely worth your time to do just that. Don’s also releasing a free PDF report with more information either today or tomorrow. While it’s in support of a coaching program that he’s launching soon, my friends at Stompernet really know how to "move the free line" and give away great information. It’s too bad they had to cut it down to only 50 minutes, but as I understand it the other 20-30 minutes was almost as good.

Designers and conversion specialists: Another Stompernet colleague, Andy Edmonds (our "chief scientist"), spent a good deal of time and treasure in 2007 developing a "vision simulation tool" called Stomper Scrutinizer that works just like a web browser, to show you how people see the colors, type, and navigational elements on your web page designs. Then our fearless CEO decided to, what the heck, give the software away as a holiday gift to the community. Nothing to buy, just go get this free software. The video linked from that blog post is a real eye-opener, by the way. Working with smart people is really cool.

 

22 thoughts on “Stop Words Are Dead! Did I Miss Another Memo?”

  1. Hi Dan,

    When I search Google for the following phrases in Portuguese, my site keeps the same position, but the number of pages changes by as much as five times:

    ganhar dinheiro em marketing – 322.000 pages
    ganhar dinheiro com marketing – 1.510.000 pages
    “ganhar dinheiro com marketing” – 6.130 pages
    “ganhar dinheiro em marketing” – 7 pages, and my site doesn’t show.

    The words “com” and “em” have the same meaning here, but they would be translated into English as “with” and “in”.

    Keep up the good work,

    Carlos

  2. Pingback: Search Engine Land: News About Search Engines & Search Marketing

  3. THX Dan, very informative – Yahoo! seems to be indexing stop words – try “find right lawyer” vs. “find the right lawyer” – you’ll see different results (as you do for Google)

  4. Chas, the difference in search results between those could be due to the change in word proximity, not necessarily because they’re indexing stop words.

    Try these:
    find the right lawyer
    find a right lawyer

    If those come up different, then we can probably add Yahoo to the list.

  5. Pingback: BKSEO’s SEO Blog » Blog Archive » Stop Words No More! But I Don’t Care

  6. Hi Dan,

    Not being one to just take someone’s word as gospel I ran the same search you mention with Google on ‘kick the bucket’ and ‘kick a bucket’ and for ‘cat and dog’ versus ‘cat or dog’, without the quotes of course which obviously would give different results.

    Unlike you Dan I got very different results for the kick bucket comparison and different, although not much, for the cat dog one. How many times did you try this out? If just once per term it could have been merely coincidence.

    I know the Internet is changing fast but I’m writing this comment the same day your blog about this was published.

    Or perhaps Google read your article already and not wanting to be predictable, changed things back. ;-)

    Warmest regards,

    Ellery

    P.S. Dan, please don’t take offense to this. I have the greatest respect for you. I just thought you would want to know. I met you in Atlanta a couple of years ago at Brad Fallon’s SEO Pros seminar.

  7. Pingback: (EMP) E-Marketing Performance » : » Team Reading List 1.10.08

  8. No offense at all, Ellery – if we can’t deal with questions then we probably haven’t been very careful before posting.

    The order of results isn’t extremely different on these queries, because these are extremely common words, and the other words (dog, cat) are far more important in ranking the search results.

    You should expect to see a different number of total matches (displayed as “1-10 of ____”) and at least a slight difference in the ranked pages.

    These former stop words are so common, that we always had to do more than a couple queries to have any confidence that a given word was actually a stop word.

    If you look up a couple comments higher, Chas found some better queries to show that Yahoo is also not using those stop words. Switching out stop words has a greater effect on some search engines than others, obviously.

    This may tell us that Yahoo, for example, relies more heavily on the text of the page than MSN or Google do, which would fit the most common line of speculation.

  9. Pingback: New Google Approach to Indexing and Stopwords -SEO by the SEA

  10. Anyone interested in search technology and "how things work" should follow the trackback above and read Bill Slawski’s post.

    These words aren’t exactly news to those who follow the technology, but it’s very helpful, when trying to figure out what search engines are and aren’t doing, to keep performance in mind:

    Typically, given a query, the performance bottleneck is the time it takes to decode the occurrences (which are typically delta encoded to save space, and thus have to be followed from the beginning) of the most frequently occurring term, especially if this term is a so-called stop-word such as “the”.

  11. Pingback: Search Engine Land: News About Search Engines & Search Marketing

  12. Pingback: Dude, I'm Phaaaaaat!

  13. Pingback: How Search Really Works: "The" Index (2)

  14. I see the example above. So, if all the words in a URL are together with no hyphen, then Google sees them as one word? If so, why do I constantly see long key phrases like lasvegasweddings (Las Vegas Weddings) in top-positioned URLs?

  15. I also see this in URLs with words like Royalty Free Music. How can this be seen as one word? Also, it seems Google would be able to tell the difference between “Does Not” and “Doe Snot” in a URL based on other words in a search query. Their algorithm must at least be that advanced?

  16. Pingback: Marketing Words Copywriting Blog :: Entries :: Stop Words Are Ignored by Search Engines. Not!
