The best spam filter ever

The famous Turing Test tests the intelligence of computers in the following way: if a computer can convince us it is a human, it is probably as intelligent as a human (that’s not Turing’s original version, but it’s better known).

What is interesting is that although Turing focused on language and problem solving, one of the easiest ways of telling a human from a machine is through perception, especially vision, which in humans is the dominant sense. So one of the most important forms of the Turing Test today is actually a vision test.

To get an account on just about any website, you must prove you are human by copying a few sloppily written letters. Machines, despite decades of research, are still very bad at visual recognition of objects, including distorted letters and digits. These bits of text that you have to retype are called CAPTCHAs.

Enter reCAPTCHA. The website says it all:

About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.

That is, each CAPTCHA is in fact a snippet of text from a book being digitized, one that their computers were unable to read. I have no idea how well the system has worked so far, and I don't know the details of the implementation, but the idea is brilliant. It really captures part of what makes the Web so powerful: millions of people each donating just a few minutes of their time. This is of course what has given us the Fray, Wikipedia, and Web-based experiments. But unlike those cases, filling out CAPTCHAs is something people have to do anyway.


May 28, 2008

3 Responses to The best spam filter ever

  1. Anonymous June 13, 2008 at 6:21 am

    If people have to digitize a book not digitized yet, how does the site know the correct answer?

  2. coglanglab June 13, 2008 at 11:10 am

    They don’t say, so I’m not sure.

    One way you could do it is to have part of the CAPTCHA be something that has already been decoded. Behavioral experiments often work the same way. For some of the stimuli, you know how people should respond; only part of the experiment is actually “experimental.” This way you can check to make sure your participants are awake, paying attention, etc.

  3. Anonymous May 28, 2008 at 9:45 am

    Now that CAPTCHAs are falling out of favor, finally a use for them. By the time they can be really helpful, they’ll be gone.

    Pass-through attacks (a spam bot grabs the CAPTCHA image and presents it to one of its own users as a hurdle for a pron page view, then takes the user’s answer and submits it to the form it was trying to break) are so prevalent that CAPTCHAs no longer protect against bots. New schemes are being devised, but none is great, and none can prevent pass-through.
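
The guess in the second response (pair a word whose answer is already known with one that is not) can be sketched in a few lines of Python. To be clear, this is only an illustration of that guess, not reCAPTCHA's actual implementation, which the site does not describe. The function names, the word IDs, and the three-vote threshold are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: matching answers required before a reading is trusted.
VOTES_NEEDED = 3

# votes[word_id] counts what people have typed for each not-yet-read word.
votes = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively, ignoring surrounding spaces."""
    return text.strip().lower()


def check_response(control_answer, control_typed, unknown_id, unknown_typed):
    """Return True if the user passes; if so, record their guess for the unknown word."""
    if normalize(control_typed) != normalize(control_answer):
        return False  # failed the word we can verify, so ignore the other guess
    votes[unknown_id][normalize(unknown_typed)] += 1
    return True


def accepted_transcription(unknown_id):
    """Return the agreed reading once enough users concur, otherwise None."""
    if not votes[unknown_id]:
        return None
    answer, count = votes[unknown_id].most_common(1)[0]
    return answer if count >= VOTES_NEEDED else None


# Three users pass the control word "morning" and agree on the unread word.
for typed in ["upon", "Upon", "upon"]:
    check_response("morning", "morning", "scan-17", typed)
print(accepted_transcription("scan-17"))  # upon
```

Only the control word decides whether you get in, so the site never needs to know the answer to the unread word in advance. It just waits until enough people who passed the control agree with each other.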