
Dan Thies

Hello, I'm Dan Thies. I've been writing about, teaching, and practicing search engine optimization for over 12 years.

In spite of what you may have heard, SEO is very simple. If you don't believe that, then you really need to read my book! It's only 100 pages long, and it's free.

January 10, 2008

Ding Ding, Google! Time For Cathedral v. Bazaar, Round 2?

When Jimmy Wales (one of the founders of Wikipedia) announced plans to create a search engine to compete with Google, some people took it seriously, while others dismissed it as a pie-in-the-sky fantasy. Here's what Jimbo himself has to say:

"Search is part of the fundamental infrastructure of the Internet. And we are making it open source. Wikia Search will start to change search from being proprietary, top-down, and closed."

Well, an early alpha release of that "open source" search engine is now online, albeit with a very small data set, and you can see the first hints at how the user interface will differ.

Matt Cutts blogged about it with a screenshot of the Nutch relevance scoring display. Michael Arrington called it a big disappointment. I don't know what he expected, really. It's Nutch for the search engine and Grub for the crawler – but we knew that was coming before they released it.

What interests me about Wikia isn't the current state of the index or SERPs. You'd expect those to be next to useless right now. What interests me is whether an open source, community effort can build a search engine to rival the best efforts of large commercial engines like Google. To put it another way, what are the limits of open source?

In The Cathedral and the Bazaar, one of the canonical texts of the Open Source movement, Eric Raymond tells the story of how Linux managed to succeed, and of an open-source project that he himself led.

The parallels are interesting enough – like Linus Torvalds and Eric Raymond, the Wikia team has started with existing applications (Nutch and Grub). As with Linux, early releases don't look like much (I followed Linux from the very beginning, thanks to a co-worker who was on it from the start).

Whether this initial Wikia release rises to the level that Eric Raymond described as a precondition for success, whether it will really get enough people excited, I am not qualified to say… but it doesn't look good. Linux got people excited back in 1991, but the bar was lower, because nobody had done anything like it before.

Sixteen years later, the community may be more jaded, and less likely to contribute to an effort that isn't necessarily truly open source. The community that they need to engage is also a lot broader.

For me, three big questions arise:

  1. How will Wikia engage the minds of information retrieval scientists, and not just coders? Writing software that runs and is dependable is one thing – Open Source has made the case that it can do this extremely well. But to build a great search engine, you need great algorithms, which means you need a lot of people who understand (for example) what BM25F does, the pros and cons of using it, etc. – and you need to somehow get these people to work together. Oh yeah, and they're starting at least 5 years behind. How can Wikia keep the 'best and brightest' engaged in an open source project, when the major search engines are hiring talent as fast as they can?
     
  2. How will Wikia's user input improve search results, rather than helping the usual pissers game the system? It's hard enough to get a large user community to edit the Wikipedia without melting down – allowing user feedback on every SERP will not happen without a lot of challenges related to scale. How will Wikia respond to spam in general? The problem will only get bigger if they actually gain traction with users. Eyeballs attract spammers like flies to a rotting carcass – and they'll be probing for weaknesses every step of the way.
     
  3. How will Wikia survive success, in the unlikely event that they can solve the other problems? The scale of the physical operations for Google is simply staggering. The sound of every hard drive Google owns, moving at once, would probably be loud enough to knock down the walls of Jericho. Google has huge resources because of ad revenue, and hundreds of extremely talented people working full time (minus 20%?) on solving their growth and scale problems. If Wikia takes off, what will they have?
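
For anyone wondering what BM25F actually does (see question 1), here is a rough sketch in Python. It is not how Nutch, Wikia, or any real engine implements it. The function name `bm25f_score`, the `field_weights` dictionary, the toy corpus, and the `k1` and `b` defaults are all assumptions made up for illustration. The idea: count how often a query term appears in each field (title, body), weight and length-normalize those counts, add them into one "pseudo frequency", then apply the usual BM25 saturation and IDF. Real BM25F usually allows a separate `b` per field; this sketch uses one shared value, and one common smoothed form of IDF.

```python
import math


def bm25f_score(query_terms, doc, corpus, field_weights, k1=1.2, b=0.75):
    """Score one document against a query with a simplified BM25F.

    doc and every corpus entry: dict mapping field name -> list of tokens.
    field_weights: dict mapping field name -> boost (illustrative values).
    """
    n_docs = len(corpus)
    # Average length of each field across the corpus.
    avg_len = {f: sum(len(d[f]) for d in corpus) / n_docs for f in field_weights}

    score = 0.0
    for term in query_terms:
        # Document frequency: documents containing the term in any field.
        df = sum(1 for d in corpus if any(term in d[f] for f in field_weights))
        if df == 0:
            continue  # Term never appears, so it contributes nothing.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

        # Field-weighted, length-normalized pseudo term frequency.
        pseudo_tf = 0.0
        for f, weight in field_weights.items():
            tf = doc[f].count(term)
            if avg_len[f]:
                norm = 1 - b + b * len(doc[f]) / avg_len[f]
            else:
                norm = 1.0  # Field is empty everywhere; skip normalization.
            pseudo_tf += weight * tf / norm

        # BM25 saturation: extra occurrences help less and less.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Toy corpus, invented for this example.
corpus = [
    {"title": ["open", "source", "search"],
     "body": ["wikia", "search", "is", "open", "source"]},
    {"title": ["cooking", "tips"],
     "body": ["search", "for", "recipes", "online"]},
    {"title": ["linux", "history"],
     "body": ["linux", "is", "open", "source", "software"]},
]
weights = {"title": 2.0, "body": 1.0}
scores = [bm25f_score(["open", "search"], d, corpus, weights) for d in corpus]
```

With these made-up weights, a match in the title counts twice as much as one in the body, so the first document should come out on top. Picking good weights and parameters is exactly the kind of job that needs information retrieval people, not just coders.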

So is this really Cathedral vs. Bazaar, round 2?

Or is it just Google's Cathedral vs. Jimbo Wales' pet project? That will depend on what happens at Wikia, because building a search engine is not the same thing as building software. It's orders of magnitude more difficult. I wish them luck. I sorta hope that they're up to it. It would be nice to see the underdog at least make a good show of it.

Filed under Blog by Dan Thies


Comments on Ding Ding, Google! Time For Cathedral v. Bazaar, Round 2?

January 11, 2008

Clint Lenard @ 7:25 pm #

"What interests me is whether an open source, community effort can build a search engine to rival the best efforts of large commercial engines like Google. To put it another way, what are the limits of open source?"

I agree with you about the future and how things may come into play with being an open source community, but I tend to look at this project as similar to PHP. Maybe I'm wrong, but it's being backed by a company and I believe this thing will take off. Will it be bigger than Google? I doubt it… and I'm sure the numbers will always be questionable, regardless… similar to how the statistics on Firefox users vs. Internet Explorer users seem to add up. I always have MORE Firefox users on ALL of my sites versus Internet Explorer. I tend to believe that's partly due to the commercial backing and the bandwagon full of commercialized fans.

Google has always been the "don't be evil" company, yet they're slowly stepping on toes here and there. This might just help Wikia grow like wildfire once they get into the Beta phase.

But… then comes the future. An open source community can always be a little shaky. Hence: Mambo & Joomla.

I am looking forward to seeing how things pan out in the future. I'm rooting for Wikia, although I'm still a major fan of Google! ;-)

Great Article as always, Dan!

Dan Thies @ 9:21 pm #

Thanks for the great comments, Clint.

If Wikia can produce something that's even "arguably" better than Google (Firefox is better than IE, no?)… even if it's only "arguably better" for certain types of use, then it's going to be a great thing.

What makes it so difficult is that Wikia is stepping into an ecosystem where search engine spam has been evolving like a virus. Spam is countered by the major search engines' "immune system" which has also been evolving for years.

Stepping into that environment, Wikia has no immune system. If human intervention proves to be effective in that regard, then they have a chance to survive.

If they can succeed against the spam, without simply creating a new form (manipulative human intervention), then all they have to do is solve all the other extremely difficult challenges standing between square one and good search results.

Wikipedia's "success" in maintaining its integrity vs. spammers is touted, but even Wikipedia wasn't able to stop the manipulative intervention. That's why they had to implement nofollow in order to become a less tempting target.

Wikia won't be able to do that, and the job of protecting dynamic search results is massively greater than protecting a finite number of pages where in most cases there is an objective standard against which a change can be evaluated.

So it's not going to be easy.

I sincerely hope that there is enough good creative energy in the world to make Wikia work. But Jimmy Wales sure could have picked something easier to do. :D

January 12, 2008

Clint Lenard @ 1:07 pm #

Hey Dan,

That's actually a great point, regarding spam. Wikia is 100% human, correct? If so, then yes… it's going to be a nightmare for them without some sort of software integration/intelligence program to prevent the most obvious spam so that they can handle the not-so-obvious… ha… yes, it's going to be hell for them either way.

I guess most people (like me) don't even realize how tough it is for these Search Engines, at first glance, to keep away Hardcore Spammers. I guess that is definitely going to be a major issue for them.

But being an Open Source project, I'm hoping that they will be able to figure something out in the fight against spam. I guess I'm just figuring that, much like the PHP community, there will be a lot of high profile names stepping in to help with the success of the Search Engine. Everybody wants to be appreciated… and with a Major (possible) Search Engine, I guess I "want" to believe that there will be a lot of new "up and comers" willing to step up and gain some recognition.

Ehhh… I'm a little tired right now. Hopefully I made a little sense! ;-)

May 13, 2009

sumera @ 10:42 pm #

Can someone please explain BM25 and BM25F to me? I also need the formula for BM25F.

Copyright 2012 by Dan Thies