How To Get More Pages Indexed With Nofollow

I knew Chapter 4 of SEO Fast Start (on site structure) was going to be just a little bit controversial… but it really shouldn’t be. In this post I will briefly give some facts about where we are, controversy-wise, just to get you up to speed. I hope that a brief statement of the facts and a little explanation will help you filter out the noise that’s going around about this subject.

The timing is interesting, because I had already planned a tutorial for this week, on the pros, cons, ins, outs, and reasons for using "dynamic linking" (nofollow is just a tool) within your site… then a great new tool was released that makes the whole thing a lot easier… and along comes the sound and fury of controversy to make it "topical."

If you’re not interested in the controversy and just want to learn how to use nofollow, don’t worry, because I’ll get to the meat pretty quickly. (If you don’t know what nofollow means, you may want to read the book first.)

The Nofollow Controversy Rages – But Why?

Google’s reps have been telling us for over a year that it’s OK to use nofollow on your own internal links, although they usually emphasize that it’s not good for guaranteeing that a page will not be indexed, since they may find other links that aren’t nofollowed. This is actually an important feature that we make full use of in dynamic linking, BTW. Anyone who tells you that using nofollow means removing pages from the index simply doesn’t understand it yet.

Last week, Rand Fishkin published an interview with Google’s Matt Cutts. Matt repeated, in plain English, that it’s perfectly safe to use nofollow on your internal links, to control the flow of PageRank within your own site. I thought this would end the controversy, but Rand’s interpretation of Matt’s comments left an opening for the semantic parsers of the world to pick a fight.

Rand’s words: 

Nofollow is now, officially, a "tool" that power users and webmasters should be employing on their sites as a way to control the flow of link juice and point it in the very best directions.

If you replaced the word "should" with "could," nobody would have a nit to pick… but he did say "should," so let me deal with that.

The Big Question – Should You Use Nofollow?

My answer to this question is an unqualified "maybe!" I can’t really stand behind that answer with pride, because it’s no kind of answer at all, so maybe I should explain a bit more…

In SEO Fast Start, I answered "yes," but the implementation is very limited, because while the "fast start" method is intended to be a framework for all SEOs, the book itself was primarily written as a beginner’s guide.

So for beginners, I described a very minimal implementation that involves nofollowing some links to "overhead pages" like privacy policies, contact info, terms & conditions, etc. This is a "play it safe" approach, which should at least deliver some benefits.
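
To make the minimal implementation concrete, here is a rough sketch of how a site template might add rel="nofollow" to links that point at overhead pages. The paths in OVERHEAD_PATHS and the render_link helper are made up for illustration; they are not from SEO Fast Start, so swap in whatever your own site uses.

```python
from html import escape

# Hypothetical list of "overhead" pages; adjust for your own site.
OVERHEAD_PATHS = {"/privacy", "/terms", "/contact"}


def render_link(href, text, overhead_paths=OVERHEAD_PATHS):
    """Return an <a> tag, adding rel="nofollow" when href is an overhead page."""
    rel = ' rel="nofollow"' if href in overhead_paths else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


# The overhead link is nofollowed; the content link is left alone.
print(render_link("/privacy", "Privacy Policy"))
print(render_link("/products/widgets", "Widgets"))
```

The link still shows up for visitors either way; only the rel attribute changes.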

Once you get past a very minimal implementation, it’s very easy to screw things up. So, if you don’t truly grok how PageRank works, you probably don’t want to mess around with it. Although I did outline several "advanced" nofollow & dynamic linking techniques in the book, I claim no responsibility for your ability to understand PageRank.

The #1 Goal Is To Get More Pages Indexed

Before I go any further, let me explain why you might want to control the flow of PageRank within your site. It boils down to one major goal – index penetration. If you can get a little bit more PageRank to your most important content, by taking some away from less important content, you just might be able to get more of your pages into Google’s index. That’s it – that’s the key point. Getting more of your important pages indexed.

If you expect to funnel so much extra PageRank to your "money pages" that they will leap to the top of the rankings, then you’re probably dreaming, because you can only accomplish so much with changes to your site structure. The primary impact on your "money pages" will come from getting more of your other pages indexed, because the additional pages can be used to link (with appropriate anchor text) into your money pages.

Now, if your site is so small that you could literally link to every page from the home page (<150 pages), the minimal implementation (as described in Chapter 4 of SEOFS) is about all you’d ever want to do. Likewise, if your site has very little PageRank coming in from external links, then you probably have bigger fish to fry, so do the minimal implementation, if that, and get to work on more important stuff.

If you have a large site, with a lot of sitewide links to "overhead" pages, and you’re having a hard time getting your deeper pages indexed, then changes to your site structure can make a big difference in how many pages get indexed. One of my students worked through a major site restructuring last year, and went from a few hundred to over 1000 pages indexed – with significant gains in traffic and sales.

The Real Issue Is Site Structure – Nofollow Is Just A Tool

No matter what your situation, the key question isn’t really about nofollow at all. The key question is whether you can improve your position with search engines by changing the internal linking structure of your web site. Most of us can do at least a little bit better, because it’s very unlikely that you’ve developed the optimal structure by chance.

Once you’ve decided to make changes to your linking structure, it’s really down to making choices about the methods you’re going to use. Using nofollow allows you to "cut" links out of the PageRank calculation, without taking them away from users. This makes the nofollow attribute a handy tool, because you can make some kinds of structural modifications transparent to your site’s users.

For the sake of usability, you probably want links to your privacy policy, shopping cart, terms & conditions, contact information, etc. on every page. In fact, you may want some of those pages indexed (contact information) because people do use site: searches to find that kind of information… the question is whether you want those "overhead pages" to be more important (have more PageRank) than your real content (product pages, etc.).

How PageRank Flows Inside Your Site

PageRank, to misquote a friend of ours, is a very subtle beast. PageRank attempts to decide which of the pages you’re linking to are more important, by simulating a "random surfer" who blunders around the site, clicking links. The more times the random surfer stumbles across a given page, the more PageRank it has.
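
If you want to watch the random surfer blunder around, here is a small simulation sketch. The three-page site is invented for the example, and the 0.85 "damping" value (the chance that the surfer clicks a link instead of jumping to a random page) is the figure usually quoted for the standard PageRank model, not something I can confirm about Google’s live systems.

```python
import random


def random_surfer(links, steps=100_000, damping=0.85, seed=0):
    """Estimate each page's share of visits from a simulated random surfer."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        targets = links[page]
        if targets and rng.random() < damping:
            page = rng.choice(targets)  # click a random link on this page
        else:
            page = rng.choice(pages)    # get bored and jump to a random page
    return {p: visits[p] / steps for p in pages}


# Hypothetical site: the home page links to two pages, and both link back.
SITE = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
}

shares = random_surfer(SITE)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.3f}")
```

The page that the surfer lands on most often (the home page in this toy site) is the one with the most PageRank.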

When this random surfer does his work at the scale of the web, the result is wonderful. Important web sites and even particularly important pages (those cited frequently on the web at large) end up with more PageRank. It’s a good thing. Link spam influences it to a degree, but you’d have to be one hell of a spammer to get more PageRank than, say, Amazon.com. Link spam probably has a lot more influence through anchor text than it does through PageRank. You can hate Google if you like, but PageRank is a beautiful innovation.

Anyway… back to the point.

One of the "subtle things" about PageRank is that the amount flowing out of a page is divided up between all the links on the page. If there are 10 links, each one gets one tenth. If there are 100 links, each one gets 1% of the PageRank that flows out from the page. So removing a link means that the other links carry more weight. If one out of every five links points to an "overhead" page, then 20% of your PageRank is flowing into pages that you don’t really care about very much from an SEO perspective. But you need those links, don’t you?
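
Here is the same arithmetic as a quick sketch. It uses the simplified model described in this post, where nofollowed links are simply left out of the count, and it ignores everything else that goes into the real calculation. The function names are just for illustration.

```python
from fractions import Fraction


def share_per_link(followed_links):
    """Fraction of a page's outgoing PageRank carried by each followed link."""
    return Fraction(1, followed_links)


def overhead_fraction(total_links, overhead_links):
    """Fraction of outgoing PageRank that flows to overhead pages."""
    return Fraction(overhead_links, total_links)


print(share_per_link(10))       # 1/10
print(share_per_link(100))      # 1/100
print(overhead_fraction(5, 1))  # 1/5, i.e. 20% goes to the overhead page
print(share_per_link(5 - 1))    # 1/4 each once the overhead link is cut
```

With five links, each one carries a fifth; cut the overhead link out of the calculation and the remaining four carry a quarter each.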

PageRank works great at the scale of the web, but not so well once it gets inside of your web site. That’s because your web site will have a lot of links that you need for accessibility, usability or legal compliance, which lead to pages that aren’t especially interesting or important. Nobody "out there" on the web is linking to your "earnings disclaimer" page, but if you have to have one, you probably have to link to it from every page on your site.

It actually helps to understand this if you put yourself in the position of the spider, and pretend you’re standing on the home page, faced with dozens of links that all look the same. To borrow from Crowther & Woods, it appears to be "a maze of twisty little passages, all alike."

Unless you do something about it, the overhead pages on your site get more PageRank than they really deserve. You can keep these overhead pages out of the index by blocking them in robots.txt or by using a noindex robots meta tag, but completely removing them actually reduces the total amount of PageRank inside your site.

Completely blocking spiders from these pages also means that they can’t be found by visitors using a site: search, so it’s not the greatest thing you could ever do for usability – what if someone is trying to find your privacy policy, or searching for your fax number?

Nofollow Gives You Some Control Over PageRank Flow

I say "some control," because nofollow isn’t a magic Swiss Army knife. It’s just a tool. If you think of every link on your site as a valve that pushes some PageRank on to the next page, nofollow simply lets you turn some valves off. This increases the amount of PageRank flowing through the remaining links. By "nofollowing" the links to your overhead pages (except, perhaps, from your sitemap) you move more into your important pages. It’s that simple.
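
To see the valves in action, here is a toy PageRank calculation on a made-up six-page site, run once with every sitewide link followed and once with the overhead links nofollowed everywhere except on the sitemap. It is a sketch under the model described in this post (a nofollowed link is dropped from the calculation and the remaining links share the flow), with the textbook damping value of 0.85. The page names and the build_site helper are hypothetical, and how a real search engine treats nofollow may differ.

```python
CONTENT = ["home", "product_a", "product_b", "sitemap"]
OVERHEAD = ["privacy", "terms"]
ALL_PAGES = CONTENT + OVERHEAD


def build_site(nofollow_overhead):
    """Every page links to every other page (sitewide navigation).

    When nofollow_overhead is True, links to overhead pages are dropped
    from the calculation on every page except the sitemap.
    """
    links = {}
    for page in ALL_PAGES:
        targets = [p for p in ALL_PAGES if p != page]
        if nofollow_overhead and page != "sitemap":
            targets = [t for t in targets if t not in OVERHEAD]
        links[page] = targets
    return links


def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution (power iteration)."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Outgoing PageRank is split evenly between followed links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no followed links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


before = pagerank(build_site(nofollow_overhead=False))
after = pagerank(build_site(nofollow_overhead=True))

for page in ALL_PAGES:
    print(f"{page:10s} before={before[page]:.3f} after={after[page]:.3f}")
```

In this toy site the total stays the same in both runs; the overhead pages end up with a smaller share and the content pages with a larger one.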

The total amount of PageRank that you have to play with is a function of how much is coming in from the web (mostly), and how many pages you have indexed. You can get more, but no matter how much you get, it still has to be divided up between the pages on your site. Nofollow can’t create more PageRank than you already have, unless you actually get more pages indexed.

Because of the way PageRank flows, your home page will normally have a lot more than your "second tier" pages, which will have a lot more than your "third tier" pages. So although nofollow can help you increase the share of PageRank that flows into each tier, if you want a specific page to get the most possible PageRank, you have to link to it from pages that have some to share – like the home page, many second tier pages, etc.

If The Whole Thing Gives You "Tired Head," You’re Not Alone

Thinking about this stuff wears me out… actually doing the math is even more of a beating. If you’re like me, you’ll do the simple stuff and then move on. If you really want to get hardcore about it, you’re going to need tools… I’ve built them on my own in the past, but I wouldn’t dare share the kind of spaghetti code that I write with the world.

Fortunately, there is a tool out there that you can use… and it’s free (ain’t the web cool?). Halfdeck (of SEO4Fun) has recently released a free tool called the PageRankBot that will spider your site and map out the distribution of PageRank. He’s labeled it as a supplemental results detector, which undersells it, because it’s actually a lot cooler than that. There will be some work involved in installing it, and I am not on board for tech support. With that caveat, it can be all kinds of fun to play with once you get it running.

He even used it to simulate a ‘3rd level push’ – sort of (I don’t think he cut the links from the second tier to the home page and left the sitewide links in place), and simply by playing around realized that the "sitewide" links were holding him back from getting more PageRank deeper into the site. It would take you a lot of time to do that without a tool – with it, he sorted out a better PageRank distribution in an afternoon.

To Learn More: Read Chapter 4 of SEO Fast Start – It’s Free

With apologies to our guests, most of the folks reading this have already downloaded SEO Fast Start… so rather than repeat it all here, I’ll refer you to Chapter 4 of SEO Fast Start. The book is free, but if you’re not sure that it’s worth the few minutes it would take for you to go download it, you can read my explanation here.

Discuss…

(PS – I will never buy the idea that Google’s just trying to trick us into revealing our sites as "SEO’d" – I think they can spot the kind of SEO they care about by looking at the anchor text of inbound links)