Successfully fighting wiki spam

Industry Blog - April 5, 2005

I run a few wikis, most notably A Geek's Tour of Silicon Valley. After getting hit with wiki spam and watching it for a while, I decided to go YAGNI (*). Or, more precisely, to do the simplest thing that could possibly work to stop the spammers from defacing my wikis. I've been free of wiki spam for a few months now, so here is how it has worked for me.

Wiki spam is text that a person or a machine enters into your wiki with the goal of increasing the number of links that point to the websites the spammer is trying to promote. Thus, most of the text that a spammer enters into a wiki consists of links to his or her websites. Spammers do this because Google and other search engines present pages ranked by popularity, and the number of links pointing to a webpage is an indicator of its popularity and importance.

First, I implemented Google's proposal for stopping comment spam, which marks the links leading out of your wiki as not to be used by search engine ranking algorithms. Basically, you add the attribute ‹rel="nofollow"› to the HTML of every link you present to your users. Your users don't care, because they don't see it. But the search engine bots will read it and decide to ignore these links. If everyone implemented this, wiki spam would have no chance. A smart spammer would check your website first and spam it only if doing so actually helps his cause. So this is as much a psychological measure as it is a technical one.
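To make the nofollow idea concrete, here is a minimal sketch of what the link-rendering step could look like. It is written in Python rather than the Perl my wikis actually run on, and the function name `render_external_link` is made up for illustration; it is not taken from any wiki engine.

```python
from html import escape


def render_external_link(url: str, label: str) -> str:
    """Render an outbound link that asks search engines not to count it.

    Hypothetical helper: a wiki engine would call something like this
    wherever it turns a URL in the page text into an anchor tag.
    """
    # escape() also escapes double quotes by default, so a URL cannot
    # break out of the href="..." attribute.
    return '<a href="{}" rel="nofollow">{}</a>'.format(escape(url), escape(label))


print(render_external_link("http://example.com/?a=1&b=2", "Example"))
```

The only part that matters to the search engines is the `rel="nofollow"` attribute; everything else is the ordinary link the reader would have seen anyway.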

Problem was, my wikis were already under spam attack. Implementing Google's proposal after some spammer had put my wikis on his list of spamming targets wouldn't get me off that list. The spammers' bots were on auto-pilot, spamming me automatically, without a human spammer ever looking at my wikis again to check whether it made sense to spam them. Hopefully, though, the nofollow directive will keep new spammers from putting my wikis on their lists.

So a reverse Turing Test it had to be. A Turing Test is a test that distinguishes man from machine. In its traditional version, a human has to figure out whether he or she is talking to a machine; in the reverse version, a machine (i.e. the server) tries to figure out whether it is talking to another machine or a human. You may have seen it when you registered for your Yahoo! account or asked Whois for some data. Usually, you are presented with some complicated picture and you have to read and type in the letters and numbers you see. Humans can do this; machines (presently) cannot.

The reverse Turing Test was supposed to protect the Save button. Only if the person or machine passed the test would changes to a wiki page be saved. Otherwise, the changes would be rejected, leaving the wiki intact. But how to do a reverse Turing Test? Do some complicated graphics, implement some Java code? My wikis are some stone-age branch of Ward's original Perl code, and I'm probably the most opportunistic (and clueless) Perl hacker out there.

So I simply extended the edit page of my wiki with a text box next to which it says "enter code 1234 here". If a human sees this, he or she is likely to duly enter 1234 into the text field and then press save. A machine is likely to overlook it. Yes, machines are that dumb, aren't they?
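For the curious, the check amounts to something like the following. This is a sketch in Python, not my actual Perl code; the names `PASSCODE_FIELD`, `may_save`, and `handle_save`, as well as the dictionary standing in for the page store, are assumptions made up for the example.

```python
# Hypothetical names; the only facts taken from the wiki are the fixed
# code 1234 and the prompt text next to the text box.
PASSCODE = "1234"
PASSCODE_FIELD = "passcode"


def render_passcode_field() -> str:
    """HTML fragment to add to the edit form, next to the Save button."""
    return 'Enter code {} here: <input type="text" name="{}">'.format(
        PASSCODE, PASSCODE_FIELD
    )


def may_save(form: dict) -> bool:
    """Return True only if the submitted form carries the expected code."""
    submitted = form.get(PASSCODE_FIELD, "").strip()
    return submitted == PASSCODE


def handle_save(form: dict, pages: dict) -> bool:
    """Store the edit if the check passes; otherwise leave the page intact."""
    if not may_save(form):
        return False
    pages[form["title"]] = form["text"]
    return True


pages = {"FrontPage": "original text"}
# A bot that posts the form without filling in the extra field is rejected.
handle_save({"title": "FrontPage", "text": "spam links"}, pages)
# A human who typed the code gets the change saved.
handle_save({"title": "FrontPage", "text": "fixed a typo", "passcode": "1234"}, pages)
```

A bot that replays a previously recorded form submission never sends the extra field, so its edits are dropped without any further logic.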

I admit, it is trivial. It is outrageously trivial. I'm not even changing the passcode. It is always the same, hard-wired into my Perl code. But it works. I've been free of wiki spam for months now. It may be that the wiki spammers never checked back to see whether their attempted changes actually had any effect. If they do, they'll see that I'm using the nofollow directive. That should make them leave me alone.

If you and your wiki are truly large and important, and you don't want to use the nofollow directive, you'll attract enough spamming intelligence to be forced to use more complicated means. Cory Doctorow describes an ingenious way in which spammers throw the ball right back into your court.

But for me and my small to medium-size wikis, it is looking good right now. Case closed, I hope, for the foreseeable future.


(*) YAGNI is XP speak for You Ain't Gonna Need It, and it admonishes you not to overdesign but to use the simplest solution that could possibly work.

Copyright (©) 2007 Dirk Riehle. Some rights reserved. (Creative Commons License BY-NC-SA.) Original Web Location: http://www.riehle.org