Ding Ding, Google! Time For Cathedral v. Bazaar, Round 2?

When Jimmy Wales (one of the founders of Wikipedia) announced plans to create a search engine to compete with Google, some people took it seriously, while others dismissed it as a pie-in-the-sky fantasy. Here’s what Jimbo himself has to say:

"Search is part of the fundamental infrastructure of the Internet. And we are making it open source. Wikia Search will start to change search from being proprietary, top-down, and closed."

Well, an early alpha release of that "open source" search engine is now online, albeit with a very small data set, and you can see the first hints at how the user interface will differ.

Matt Cutts blogged about it with a screenshot of the Nutch relevance scoring display. Michael Arrington called it a big disappointment. I don’t know what he expected, really. It’s Nutch for the search engine and Grub for the crawler – but we knew that was coming before they released it.

What interests me about Wikia isn’t the current state of the index or SERPs. You’d expect those to be next to useless right now. What interests me is whether an open source, community effort can build a search engine to rival the best efforts of large commercial engines like Google. To put it another way, what are the limits of open source?

In The Cathedral and the Bazaar, one of the canonical texts of the Open Source movement, Eric Raymond tells the story of how Linux managed to succeed, and of fetchmail, an open-source project that he himself led.

The parallels are interesting enough – like Linus Torvalds and Eric Raymond, the Wikia team has started with existing applications (Nutch and Grub). As with Linux, early releases don’t look like much (I followed Linux from the very beginning, thanks to a co-worker who was on it from the start).

Whether this initial Wikia release rises to the level that Eric Raymond described as a precondition for success, and whether it will really get enough people excited, I am not qualified to say… but it doesn’t look good. Linux got people excited back in 1991, but the bar was lower, because nobody had done anything like it before.

Sixteen years later, the community may be more jaded, and less likely to contribute to an effort that isn’t necessarily truly open source. The community that they need to engage is also a lot broader.

For me, three big questions arise:

  1. How will Wikia engage the minds of information retrieval scientists, and not just coders? Writing software that runs and is dependable is one thing – Open Source has made the case that it can do this extremely well. But to build a great search engine, you need great algorithms, which means you need a lot of people who understand (for example) what BM25F does, the pros and cons of using it, etc. – and you need to somehow get these people to work together. Oh yeah, and they’re starting at least 5 years behind. How can Wikia keep the ‘best and brightest’ engaged in an open source project, when the major search engines are hiring talent as fast as they can?
     
  2. How will Wikia’s user input improve search results, rather than helping the usual pissers game the system? It’s hard enough to get a large user community to edit Wikipedia without melting down – allowing user feedback on every SERP will not happen without a lot of challenges related to scale. And how will Wikia respond to spam in general? The problem will only get bigger if they actually gain traction with users. Eyeballs attract spammers like flies to a rotting carcass – and they’ll be probing for weaknesses every step of the way.
     
  3. How will Wikia survive success, in the unlikely event that they can solve the other problems? The scale of the physical operations for Google is simply staggering. The sound of every hard drive Google owns, moving at once, would probably be loud enough to knock down the walls of Jericho. Google has huge resources because of ad revenue, and hundreds of extremely talented people working full time (minus 20%?) on solving their growth and scale problems. If Wikia takes off, what will they have?
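For readers who haven't met BM25F (mentioned in the first question), here is a minimal sketch of the idea: term counts from each field of a document (title, body, and so on) are length-normalized, weighted, and added into one pseudo-frequency, which is then run through the usual BM25 saturation curve. The function name, the field names, the parameter defaults, and the particular IDF variant are all assumptions made for illustration; this is not what Nutch, Wikia, or Google actually run.

```python
import math


def bm25f_score(query_terms, doc_fields, avg_field_len, field_weight,
                field_b, doc_freq, num_docs, k1=1.2):
    """Score one document against a query using the simple BM25F variant.

    doc_fields:    field name -> list of tokens in that field
    avg_field_len: field name -> average token count of that field in the corpus
    field_weight:  field name -> boost for matches in that field (illustrative)
    field_b:       field name -> length-normalization strength, 0..1
    doc_freq:      term -> number of documents containing the term
    """
    score = 0.0
    for term in query_terms:
        # Combine per-field frequencies into one weighted pseudo-frequency.
        pseudo_tf = 0.0
        for field, tokens in doc_fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            b = field_b[field]
            norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += field_weight[field] * tf / norm
        if pseudo_tf == 0.0:
            continue
        # One common smoothed IDF form; others exist.
        n = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - n + 0.5) / (n + 0.5))
        # Saturation: repeated matches help less and less.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score
```

Even in this toy form there are three kinds of knobs (field weights, per-field normalization, saturation), and every one of them is something a spammer can try to exploit. Tuning them well is exactly the sort of work that needs information retrieval people, not just coders.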

So is this really Cathedral vs. Bazaar, round 2?

Or is it just Google’s Cathedral vs. Jimbo Wales’ pet project? That will depend on what happens at Wikia, because building a search engine is not the same thing as building software. It’s orders of magnitude more difficult. I wish them luck. I sorta hope that they’re up to it. It would be nice to see the underdog at least make a good show of it.

5 thoughts on “Ding Ding, Google! Time For Cathedral v. Bazaar, Round 2?”

  1. “What interests me is whether an open source, community effort can build a search engine to rival the best efforts of large commercial engines like Google. To put it another way, what are the limits of open source?”

    I agree with you about the future and how things may come into play with being an open source community, but I tend to look at this project similar to PHP. Maybe I’m wrong, but it’s being backed by a company and I believe this thing will take off. Will it be bigger than Google? I doubt it… and I’m sure the numbers will always be questionable, regardless… similar to how the number of Firefox users vs. Internet Explorer users statistics seem to add up. I always have MORE Firefox users on ALL of my sites versus Internet Explorer. I tend to believe that’s partly due to the Commercial backing and the band wagon full of commercialized fans.

    Google has always been the “do no evil” company, yet they’re slowly stepping on toes here and there. This might just help Wikia grow like wildfire once they get into the Beta phase.

    But… then comes the future. An open source community can always be a little shaky. Hence: Mambo & Joomla.

    I am looking forward to seeing how things pan out in the future. I’m rooting for Wikia, although I’m still a major fan of Google! ;-)

    Great Article as always, Dan!

  2. Pingback: Wikia.com Has Launched. Will This REALLY Be The Google Crusher?

  3. Thanks for the great comments, Clint.

    If Wikia can produce something that’s even “arguably” better than Google (Firefox is better than IE, no?)… even if it’s only “arguably better” for certain types of use, then it’s going to be a great thing.

    What makes it so difficult is that Wikia is stepping into an ecosystem where search engine spam has been evolving like a virus. Spam is countered by the major search engines’ “immune system” which has also been evolving for years.

    Stepping into that environment, Wikia has no immune system. If human intervention proves to be effective in that regard, then they have a chance to survive.

    If they can succeed against the spam, without simply creating a new form (manipulative human intervention), then all they have to do is solve all the other extremely difficult challenges standing between square one and good search results.

    Wikipedia’s “success” in maintaining its integrity vs. spammers is touted, but even Wikipedia wasn’t able to stop the manipulative intervention. That’s why they had to implement nofollow in order to become a less tempting target.

    Wikia won’t be able to do that, and the job of protecting dynamic search results is massively greater than protecting a finite number of pages where in most cases there is an objective standard against which a change can be evaluated.

    So it’s not going to be easy.

    I sincerely hope that there is enough good creative energy in the world to make Wikia work. But Jimmy Wales sure could have picked something easier to do. :D

  4. Hey Dan,

    That’s actually a great point, regarding spam. Wikia is 100% human, correct? If so, then yes… it’s going to be a nightmare for them without some sort of Software integration/Intelligence program to prevent the most obvious spam so that they can handle the not-so-obvious… ha… yes, it’s going to be hell for them either way.

    I guess most people (like me) don’t even realize how tough it is for these Search Engines, at first glance, to keep away Hardcore Spammers. I guess that is definitely going to be a major issue for them.

    But being an Open Source project, I’m hoping that they will be able to figure something out in the fight against spam. I guess I’m just figuring that, much like the PHP community, there will be a lot of high profile names stepping in to help with the success of the Search Engine. Everybody wants to be appreciated… and with a Major (possible) Search Engine, I guess I “want” to believe that there will be a lot of new “up and comers” willing to step up and gain some recognition.

    Ehhh… I’m a little tired right now. Hopefully I made a little sense! ;-)
