Mon Jan 3rd 2005, 12:59am
CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
Captcha is quite clever; it's a great way of blocking out automated spam bots from posting to web pages. A human, if asked, can read the above as "smwm", but surely no computer could. If you gate your web forms with this kind of test, then you can be reasonably confident that any posting must be done by a human.

Wikipedia has a good writeup on Captcha. I especially like the discussion on circumvention techniques, for example how spammers use unwitting visitors to porn sites to "crack" Captcha images at various popular email providers for them. Wanna see the pictures? Enter the word below. Ha ha, great stuff.

I haven't had a problem with spam bots here at my site. It's a very small, out-of-the-way site, and it probably helps that I use my own custom blog engine, which none of the ready-made spambots support out of the box (unlike popular blog engines like Movable Type). But hearing more and more lately about spambots defacing people's web sites made me realize that if someone wanted to, it would be completely trivial to write a Perl script to spam my site with billions of comments and bring it to its knees. So I found a really simple Captcha implementation at SourceForge and threw it up as a second challenge page that appears whenever someone posts a comment. Again, it's very simple, but it seems like it'll be very effective. I've also set up a standalone captcha demo page you can play with. Try it out, fun stuff.
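
For the curious, the challenge-page flow boils down to something like the rough Python sketch below. To be clear, this is not the SourceForge code I'm using, just an illustration under a few assumptions: the names issue_challenge and check_answer and the in-memory _pending store are made up for the example, and the step that draws the text as a distorted image is left out entirely.

```python
import secrets
import string
import time

# Hypothetical in-memory store: token -> (expected text, expiry time).
# A real site would keep this in a session or database instead.
_pending = {}

ALPHABET = string.ascii_lowercase
CHALLENGE_LENGTH = 4      # assumed length, e.g. "smwm"
LIFETIME_SECONDS = 300    # assumed: challenges expire after five minutes


def issue_challenge(now=None):
    """Create a challenge; return (token, text to draw as a distorted image)."""
    now = time.time() if now is None else now
    text = "".join(secrets.choice(ALPHABET) for _ in range(CHALLENGE_LENGTH))
    token = secrets.token_urlsafe(16)
    _pending[token] = (text, now + LIFETIME_SECONDS)
    return token, text


def check_answer(token, answer, now=None):
    """Return True only for a correct, unexpired, not-yet-used challenge."""
    now = time.time() if now is None else now
    # Single use: the challenge is removed on the first attempt, right or wrong.
    entry = _pending.pop(token, None)
    if entry is None:
        return False
    text, expires = entry
    if now > expires:
        return False
    # Ignore case and surrounding whitespace in what the visitor typed.
    typed = answer.strip().lower().encode()
    return secrets.compare_digest(typed, text.encode())
```

The comment form would carry the token in a hidden field, and the comment only gets saved if check_answer comes back True. Throwing the challenge away after one attempt means a script can't keep guessing against the same image.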

Visitor comments
On Mon Jan 3rd 2005, 10:51am, Forest posted:
Hmm, I might have to actually get off my keister and do something like this on my site. I'm really sick of all the spam comments I have to moderate every day.

On Wed Jan 26th 2005, 10:14pm, Erik G. Burrows posted:
I wonder how specific these spam bots are. My site is by no means huge, but it has seen over 1000 comments and not a single spambot posting.

On Wed Jan 26th 2005, 10:37pm, Erik G. Burrows posted:
By that I mean I wonder how specific they are to a particular blog software package. Perhaps the bots aren't smart enough to decipher the 'name' and 'comment' names I have given my site's form elements!

On Wed Jan 26th 2005, 11:23pm, Steve Kehlet posted:
Yeah, I'm pretty sure they're highly specific to particular blog packages. A lot of the problems I read about were with Movable Type, and I know Forest and Will use WordPress. Script kiddies at work again. As trivial as it might be to spam custom blog software, it would mean actually understanding a little HTML and writing some code.

On Thu Jan 27th 2005, 12:13pm, Will posted:
They are very specific. They only target the main packages, and they look for specific files. You can actually defeat them somewhat at first by changing the name or location of your comment script. There's lots of good stuff you can do. One thing I want to try is adding an Apache mod_rewrite rule to reject comment posts that don't have a referrer header.

On Thu Jan 27th 2005, 4:09pm, Steve Kehlet posted:
Nice, sounds like a clever hack.

On Thu Jan 27th 2005, 4:20pm, Erik G. Burrows posted:
That keeps out most bots, but I've found that requiring a referrer value for a page cuts out almost half of the total hits the page gets. Some of those will be bots and spiders, but many older and offbeat browsers don't send referrer data.