Dan Thies

Hello, I'm Dan Thies. I've been writing about, teaching, and practicing search engine optimization for over 12 years.

In spite of what you may have heard, SEO is very simple. If you don't believe that, then you really need to read my book! It's only 100 pages long, and it's free.

July 25, 2007

Crawling Out Of The SI (For Large Sites)


In my last post, I explained a simple method for working on your site's indexing in Google… and I promised to give some more information for folks with large web sites (1000+ pages).

Unfortunately for those with large sites, the process can be "just a bit" more involved… and Google seems to have taken away a few of the tools we would have used.

So if we want to do this, we'll have to improvise a bit… Let's begin by reviewing some key ideas from SEO Fast Start.

The pages on your site can be divided into "tiers" based on how far they are from the home page. If the home page is the first tier, then any page that has a crawlable link from the home page is in the second tier. The first step is to make sure that your second-tier pages are not in the Supplemental Index (SI).
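On a large site, working out tiers by hand gets tedious. Here is a rough sketch. It assumes you already have a crawl of your own site as a simple map from each URL to the URLs it links to, and it treats every link in that map as crawlable. The `page_tiers` function and the sample URLs are hypothetical. A breadth-first walk from the home page assigns every reachable page its tier:

```python
from collections import deque


def page_tiers(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Tier = 1 for the home page, 2 for pages it links to, and so on."""
    tiers = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in tiers:  # first time seen = shortest click path in BFS
                tiers[target] = tiers[page] + 1
                queue.append(target)
    return tiers


# Hypothetical link map: each page -> the pages it links to.
site = {
    "/": ["/widgets/", "/gadgets/", "/about.html"],
    "/widgets/": ["/widgets/blue.html", "/widgets/red.html"],
    "/gadgets/": ["/gadgets/tiny.html", "/widgets/blue.html"],
}

tiers = page_tiers(site, "/")
second_tier = sorted(p for p, t in tiers.items() if t == 2)
print(second_tier)  # ['/about.html', '/gadgets/', '/widgets/']
```

Breadth-first order matters here. A page first reached this way is reached along the shortest click path from the home page, which matches how tiers are defined above. Any page that never gets a tier is not reachable through the links in your map at all.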

There are plenty of tools that will give you an outbound link report for a web page. I like this one, because it gives me a list of links that I can copy and paste into Microsoft Excel:
http://www.webconfs.com/search-engine-spider-simulator.php
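If you'd rather work from a saved copy of the page than a web tool, here is a minimal Python sketch. It is not how the webconfs tool works. It uses only the standard library, and `LinkCollector`, `outbound_links_csv` and the example.com URLs are hypothetical. The script pulls the `href` of every `<a>` tag, resolves relative links against the page URL, drops duplicates, and returns CSV text you can save and open in Excel. It does not judge whether a link is crawlable; it only skips `#`, `javascript:` and `mailto:` links.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.startswith(("#", "javascript:", "mailto:")):
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep the first occurrence only
                self.links.append(url)


def outbound_links_csv(html: str, base_url: str) -> str:
    """Return a one-column CSV ("url") of the links found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url"])
    for url in parser.links:
        writer.writerow([url])
    return out.getvalue()


page = '<a href="/widgets/">Widgets</a> <a href="about.html">About</a> <a href="#top">Top</a>'
print(outbound_links_csv(page, "http://www.example.com/"))
```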

Now, Google has thrown a bit of a monkey wrench into the works recently, because the info: search (which we would have used) currently isn't showing whether pages are in the Supplemental Index or not. Craig posted a search hack to see "just the supplemental results" last week… and that hack isn't working right now either.

Like I said, we gotta improvise. We gotta think outside of the box. We gotta revive an old hack that we used to use way back in the day… page tagging. Back then, when we wanted to see whether groups of pages were getting indexed, we'd just tag those pages with some kind of unique text.

So if I wanted to check on a set of pages (maybe my category pages), I'd add some text to the bottom of these pages, like "Page Code: Zebra". That text will get indexed, and then I can do a site: search for it. This limits the number of pages that show up and lets me measure my indexing. If I have 55 pages in my "Zebra" group, I can see how many of them are indexed, and so on.
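Here is a small Python sketch of the bookkeeping side of page tagging. It assumes you can edit your page templates and that you copy the URLs from the site: results by hand. Only the "Page Code: Zebra" idea comes from the post. The helper names, `www.example.com`, and the sample URLs are hypothetical.

```python
def tag_page(html: str, code: str) -> str:
    """Add a unique 'Page Code' line just before </body>, or at the end if there is none."""
    marker = f"<p>Page Code: {code}</p>"
    idx = html.lower().rfind("</body>")
    if idx == -1:
        return html + marker
    return html[:idx] + marker + html[idx:]


def tag_query(domain: str, code: str) -> str:
    """The site: search that should list only pages carrying this tag."""
    return f'site:{domain} "Page Code: {code}"'


def indexing_report(group_urls: list[str], found_urls: list[str]) -> tuple[int, list[str]]:
    """Count how many tagged pages showed up, and list the ones that didn't."""
    group = set(group_urls)
    indexed = group & set(found_urls)
    return len(indexed), sorted(group - indexed)


# Hypothetical "Zebra" group of three category pages.
zebra = ["/widgets/", "/gadgets/", "/gizmos/"]
print(tag_query("www.example.com", "Zebra"))
# site:www.example.com "Page Code: Zebra"
print(indexing_report(zebra, ["/widgets/", "/gizmos/"]))
# (2, ['/gadgets/'])
```

With a 55-page group, the first number divided by 55 is your indexing rate for that group. The list shows which tagged pages didn't turn up.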

This isn't the post I wanted to make this week… but it's all we've got to work with right now.

If anyone out there can come up with a search hack that will let us check the status of an individual web page, please post it here in the comments. At the very least, I'll send you a signed copy of the SEO Fast Start print edition. :D

Filed under Blog by Dan Thies

Comments on Crawling Out Of The SI (For Large Sites)

July 25, 2007

Suzanne @ 4:51 pm

Hi Dan, I'm a newbie at this so I may have gotten this wrong. When I use site:www.myshrink.com I get all my indexed pages and the ones that are in supplemental. Are you looking for more than that? Otherwise, it's been working for me for a long time. Suzanne

James @ 4:57 pm

Sounds like after using this we can't tell whether the pages are in the SI, just whether they are indexed. I have a set of pages that are indexed and were in the SI last month. Now they are still indexed but don't show as SI, although other pages do.

I did hear about Google blending the SI back in, but it still could be the case that they think those pages have thin content (which they do, it's on my to-do list to improve).

So I should assume that I don't know if they are actually in the SI or not?

Dan Buglio @ 5:01 pm

Dan:

I'm not sure I see the same thing as you. When I initially searched Google for:

site: http://www.mydomain.com (Probably the wrong syntax, but I included a space)

I got only the results that were in the primary index… NO Supplemental results. Kinda freaked me out. But when I eliminated the space between the : and the http, I got a full listing of all pages INCLUDING those in the supplemental index.

site:http://www.mydomain.com (No space after site: seems to work still.)

So at least for right now, I can still print out a list of all my pages in AND out of the SI. I just printed a full listing just in case things change again with Google. Hope this helps!

Jason T Chandler @ 5:59 pm

Does this relate to the cache dynamically updating to show my brand-new content, which I just FTP'd, even though the cache date is a week old?

Not sure if I can post a URL here, so I will leave only part: /google-services/google-trustrank.htm. I JUST made the page body PINK and pushed the page live with DW. The cache shows the PINK page, but with the date: google-services/google-trustrank.htm as retrieved on 9 Jul 2007 05:46:14 GMT.

So the cached HIGHLIGHTED text is gone, meaning that G may not be ranking keywords as much as passion and knowledge; in other words, what is "socially accepted".

Good luck Dan, I have been following you since 2003. Good stuff!!

Jason T Chandler @ 6:00 pm

PS – both browsers (FF /IE) demonstrate this cache change.

Dan Thies @ 6:11 pm

Suzanne & Dan, doing a site: search will still show a mix of supplemental and main index results. The "hack" Craig posted last week was site:www.yoursite.com *** -sjpked, which showed only supplemental results. That would be very helpful for a large site, since it would let us find more of the SI pages.

Jason, link away, you're good for it. If we have to edit a posted comment we will. For those who would like to see the page Jason's talking about: http://www.jasontchandler.com/google-services/google-trustrank.htm

That's one ugly looking page, Jason… White on pink, it burns my rods and cones!!! The reason why Google is showing the pink background on the cached page, is because you defined that color in your stylesheet (CSS), which isn't cached, so it has to be loaded when we view the (cached) page.

Dan Thies @ 6:27 pm

James, right now Google displays "Supplemental Result" next to pages that are found in the SI.

The rumors about it basically run along the lines of:
- Google engineers are tired of answering questions about the SI…
- They are not just a little tired, they are very, very tired of it…
- So they will solve their problem by removing the label from SERPs…
- Which would let them keep the SI, without having to explain it…
- Because nobody can see it any more.

Dan

Jason T Chandler @ 7:37 pm

Hey Dan, since I have your ear… Is it true that Google can identify content based on the GUI that produced it? We are getting 90% errors on some sites due to the server looking for massive numbers of files with the Dreamweaver lock extension on the end, like this:

/images/blank.gif.LCK

Essentially G just added another factor to combat astro-turfers?

Dan Thies @ 8:02 pm

You got me on that one, Jason. Are you saying you've got spiders requesting those files?

Manny @ 8:26 pm

Dan, Jerry West already posted a hack on the stompernet forums, have you seen it?

Dan Thies @ 9:06 pm

Yes, Manny. It's the same one Craig posted. It was working last week. It's not working now.

David Leonhardt @ 9:19 pm

It seems to me that people spend a lot of time worrying about this whole SI thing when they needn't. As PageRank rises, fewer pages will be in the SI.

The best test for a given page really is to see where it ranks for its main search term, which is all that counts for that page anyway.

And the best way to get the page out of the SI is to build links to it or to its parent.

In my view, any page can be a first tier page if you build enough inbound links to it from other websites. So a page two levels deep from the home page can in fact be like a first tier page, and all the pages linking from it are much less likely to be lost in SI. But again, the key test is how each deep page ranks for its main search term.

The bigger the site, the more work this is (building deep links), but I am willing to bet that websites with a good array of linked-to pages just do not have the same SI concerns as sites with a high concentration of inbound links focused on the home page (of course, "I am willing to bet" is pretty poor evidence, but I'm not the stats cruncher to prove or disprove my own wild assertions).

Jason T Chandler @ 9:34 pm

domain.com site:domain.com

But I do not know how you are applying it. I have not been doing hacks as I do not do rank checks… (or maybe it is I am just too G'd out by the end of the day to think about rock pigeons flying backwards). Good Luck Dan!

Dan Thies @ 9:57 pm

Hi, David. Nice to see you here! Have you read the book yet?

I doubt you've watched my link building course, but it does cover the need for deeper linking beyond the home page.

Low PageRank for a deep page, especially as the site grows and obtains more inbound links, is still more likely to be a structural issue. As you move deeper, and the site gets larger, the exponential explosion of pages makes it exceptionally difficult to attract external links to every page.

(Assuming you don't have a giant link farm to work with… because I sure don't!)

Blogs are, or can be, an exception… if your blog has a sufficient presence and following you can expect every post to have some inlinks.
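To put a rough number on that explosion, assume (purely for illustration) that every page links to b new pages one tier down. Real sites are never this uniform. Tier t then holds N_t pages, and a site T tiers deep has:

```latex
N_t = b^{\,t-1}, \qquad \sum_{t=1}^{T} N_t = \frac{b^{T}-1}{b-1}
```

With b = 20, tier 4 alone has 20^3 = 8000 pages, and tiers 1 through 4 total (20^4 - 1)/19 = 8421. Nearly the whole site sits in the deepest tier, which is exactly where earning external links page by page is least realistic.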

Dan Thies @ 10:17 pm

Jason, what we're after is some way to craft a search query on Google that will list ONLY the Supplemental pages from a site. I suspect we're up against Google's intentions on this one, though.

Jeff Knize @ 11:53 pm

Here is a search query that checks all C-Class datacenters and displays the number of URLs in the supplemental index at Google: http://oyoy.eu/google/supplemental/

For those interested, here is more information on the topic: http://oyoy.eu/huh/supplemental-tool/

Jeff

July 26, 2007

Dan Thies @ 11:05 am

Jeff, it doesn't work any more. It's based on the same broken hack. Stuff like this "tool" (which massively abuses Google's services) is what forces Google to change stuff up on us.

Jeff Knize @ 11:32 am

Okay Dan. I hope it comes back.

Thanks, Jeff

Jason T Chandler @ 11:45 am

Dan – that is exactly what we have in our awstats when viewed with a log analyzer.
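If you want to check this against the raw access logs rather than awstats, here is a hedged Python sketch. It assumes an Apache "combined" format log, so adjust the regular expression for other formats. `lck_requests_by_agent` and any sample lines you feed it are hypothetical. It counts requests for paths ending in `.LCK` (case-insensitive), grouped by user agent, so you can see whether spiders or browsers are asking for those files.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def lck_requests_by_agent(lines) -> Counter:
    """Count requests for *.LCK paths per user-agent string; unparseable lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").upper().endswith(".LCK"):
            counts[m.group("agent")] += 1
    return counts
```

Pass it an open log file (each line is one request) and print `most_common()` to see the top requesters. User-agent strings can be faked, so treat the result as a first look rather than proof of which spider is asking.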

July 30, 2007

Mike Belasco @ 11:43 am

Try this: site:www.mydomain.com/&
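One practical wrinkle with this query is the ampersand. Typed into Google's search box, it is just part of the query. If you build a search URL by hand, though, an unencoded `&` is read as the start of the next URL parameter, and the query gets cut short. Here is a minimal Python sketch, assuming the familiar `http://www.google.com/search?q=` form of a Google results URL; the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def google_search_url(query: str) -> str:
    """Build a results URL; urlencode percent-encodes ':', '/' and '&' in the query."""
    return "http://www.google.com/search?" + urlencode({"q": query})


hack = "site:www.mydomain.com/&"
good = google_search_url(hack)
naive = "http://www.google.com/search?q=" + hack  # '&' left unencoded

print(good)  # http://www.google.com/search?q=site%3Awww.mydomain.com%2F%26
print(parse_qs(urlsplit(good).query)["q"])   # ['site:www.mydomain.com/&']
print(parse_qs(urlsplit(naive).query)["q"])  # ['site:www.mydomain.com/'] (the '&' is lost)
```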

Jeff Knize @ 11:56 am

Thanks Mike. It works!

Jeff

Dan Thies @ 2:34 pm

Mike, you are my new hero.

Mike Belasco @ 11:13 pm

No problem, Dan. With all the help you've given me over the last couple of years, it really is nothing.

July 31, 2007

Gail Mills @ 11:05 am

Thanks, Dan. I followed your recommendations re: tools for site testing re: Google SI, etc. After testing and checking for content duplication, the only thing I could come up with is the duplication of my left-hand index linking column on the site. The column is duplicated on every page. There isn't anything to inhibit the spider from the content. What is usually done in this situation? I studied Leslie's 330 video (nofollow, etc.), but I am still confused as to how to handle the situation. What do you recommend? Gail

Gail Mills @ 11:13 am

Mike, I did a copy and paste and received a 404 error. What are the symbols after .com?
i.e. site:www.mydomain.com/& is what I got???

Thanks,

Gail

Gail Mills @ 11:16 am

Mike,

When it went up on the portal board it was converted differently. I get it now: the ampersand (and) sign. I have an Apple. Thanks again.

Gail

Dan Thies @ 11:17 am

Gail, type that into a Google search box.

For your site, two things I'd work on: 1) Redirects so that you don't have so many URLs for the home page and stuff. You have http://www.q—.com, q—.com, http://www.q—.com/index.html, etc. and that should be reduced to one variation. You also want to make sure that you only link to one version. Jerry West's 301 redirect & htaccess tutorials in the Stompernet forums are great at explaining how to do all kinds of redirects. 2) More incoming links to get the site itself more PageRank – your internal structure can only get you so far. You need more linking & promotion.
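Dan's first point, collapsing every variant of the home page URL into one, is normally handled with 301 redirects in `.htaccess`, as those tutorials explain. The decision logic underneath is simple enough to sketch in Python. Here `www.example.com` stands in for whichever host you pick, and `canonical_url` and `redirect_target` are hypothetical helpers, not part of any server. A URL that maps to itself needs no redirect. Anything else should get a 301 to the mapped URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # hypothetical: the one host you want indexed
DEFAULT_DOCS = ("index.html", "index.htm", "index.php")


def canonical_url(url: str) -> str:
    """Map host and default-document variants of a URL onto one canonical form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased; any port is dropped
    if host in (CANONICAL_HOST, CANONICAL_HOST.removeprefix("www.")):
        host = CANONICAL_HOST
    path = parts.path or "/"
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]  # /index.html -> /
            break
    return urlunsplit((parts.scheme, host, path, parts.query, ""))  # fragment dropped


def redirect_target(url: str) -> Optional[str]:
    """Where a 301 should point, or None if the URL is already canonical."""
    target = canonical_url(url)
    return None if target == url else target


for variant in ("http://example.com/", "http://www.example.com/index.html", "http://www.example.com/"):
    print(variant, "->", redirect_target(variant))
```

The same mapping tells you which version to use in your own internal links, so you only ever link to the canonical one.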

August 1, 2007

bart van der Velden @ 9:43 am

Dan, I saw a hack on http://andrescholten.nl/index.php/bekijk-de-supplemental-index-van-google/ He says with "*-view" after the url in Google you get the supplemental results for that url. Is this correct?

Dan Thies @ 9:56 am

Bart, that's the hack that used to work. Mike Belasco posted a new hack here yesterday, that does work: site:www.mydomain.com/& (add slash/ampersand after the domain)

bart van der Velden @ 9:59 am

Never mind, it's the Stomper Jerry West hack but with one asterisk instead of three, so it's the same as Craig's.

Gail Mills @ 6:21 pm

Thanks, Dan. I followed your advice, using the query 'site://mydomain.com/&' for SI listings, and they all disappeared. I have another question: the left-hand column on my site is like a vertical nav bar, with links to different pages. However, this vertical nav bar is duplicated on every page. Does this count as duplicate content? Is there a way to code it so it will be human-friendly but not read by spiders? What does the expert recommend?

I just spoke with one of the people I recommended you to, and he agrees you are the best. Please explain exactly what you would like us to do as we 'spread the word'.

August 2, 2007

Michael VanDeMar @ 12:33 am

And with one fell swoop, they make it almost impossible to diagnose the problem, without actually doing anything to help it:

Death of the supplementals label.

Nice, Google, nice.

thanos @ 12:47 am

Hi guys, this new hack doesn't work for me, or my websites are no longer in the SI :). Can you give me a test domain name?

Thanks

Michael VanDeMar @ 12:50 am

thanos, are you saying it doesn't work because when you search you don't see the supplemental label next to each of the results…?

Jason T Chandler @ 1:21 am

http://googlewebmastercentral.blogspot.com/2007/07/supplemental-goes-mainstream.html

The supplemental label has been removed.

thanos @ 9:00 am

Hi Michael… Yes. A week ago I checked with the old method to see my SR and saw some pages; now I check with the new method and I don't see anything. Or is it because I used the Google Sitemap protocol on my websites?

February 29, 2008

Patrick Ryall @ 8:44 am

God, you're behind the times, aren't you? The supplementary index has been gone for a while, and you're advising people how to index spam better. Well, all I can say is: why not try some good content for a change, and simple SEO, and maybe all would be sound. Ranking and traffic are so simple: it's called valuable content, original content, and a conscience. Give it a try; you could be surprised.

Dan Thies @ 10:26 am

Patrick, apparently you can't read the dates on old posts…

As for the rest of your little rant… LOL.

November 15, 2009

Dating in London @ 6:34 pm

Dan, can I please just double-check: supplemental content no longer applies, or at least is no longer displayed by Google? Thanks in advance.

Copyright 2012 by Dan Thies