To CAPTCHA or not to CAPTCHA?


One of the interesting things about a blog is the comments section that enables readers and author to interact. The problem, as I have written before, is spam comments that just clog up the boards and waste people’s time. I have an open and unmoderated comment system which means that anyone can comment without registering or getting my prior approval but the catch is that it can be exploited by spam. The people who run the servers have filters for detecting and eliminating or quarantining spam but it is not foolproof and sometimes genuine comments may be excluded while spam comments may be allowed.

Spam is generated not because people are merely being pests but because there is an underlying economic reason. Some email spammers seek to gain access to mail accounts from which to send their advertisements, while blog comment spammers are seeking to place links to businesses and hence drive up their search engine rankings. As a result, businesses have a financial motivation for creating spam and have developed software for doing so. The more popular your website is, the more you get targeted since the payoff is greater.

I get hit with quite a lot of spam that seems to come in waves. So a couple of times a day, I go into the comments file and clean house. I do this because it would be irritating for readers to encounter spam there or to have their genuine comments rejected. This housekeeping is quite tedious because I have to read all the suspect spam. Often it is easy to spot them because they contain gibberish. On other occasions they are generic comments or comments that are repeated over and over with slight variations and signatures. If it seems clear to me that a comment has no relevance to the post, I delete it.

But I have noticed recently that detecting spam is getting harder. Some spam comments try to deceive by using a sentence from the actual post as the comment. This makes the comment seem relevant though lacking a point. I can usually detect this dodge, even if my post in question is several years old, because I have a good sense of my own writing style. More difficult is when the spam consists of a sentence taken from the posting of a genuine commenter. If the comment does not quite fit, I look at all the comments to that post to see if this is the case.

Even harder to judge are those short comments that are not copies of other people’s words but seem vaguely relevant to the topic. They often use some of the key words in the post, but are not really adding any value to the discussion and are written ungrammatically. These do not look like machine-generated spam, nor like the mistakes of a careless or poorly educated native English speaker; they are more characteristic of someone for whom English is a second language.

Up until quite recently, my only criterion for accepting and rejecting comments was whether the comment was being generated by a human or a machine. In making these judgments I am, in effect, running my own personal Turing test. To help me, I suggested to our webmaster that perhaps we should install one of those little tests that some sites have to detect whether a human or machine is trying to access it. These screening devices have to be simple enough that any human can easily solve them but difficult enough to fool a machine. The most familiar are the so-called CAPTCHAs, those curved letters and numerals on a cluttered background that you have to identify and type in before you are allowed in. Some sites like Machine Like Us require you to do a simple arithmetic sum.

I had thought I was fighting only machines because it would not be worthwhile to have real humans wandering the web and inserting spam. But it turns out that I was wrong. My increasing difficulty in telling whether some comments were being created by humans or machines was because, as a recent study of spam by a team of researchers at the University of San Diego showed, a lot of spam nowadays is being generated by actual humans and not machines, which would make adoption of a CAPTCHA system not as useful as I had thought. (Thanks to Kevin Drum for the link.)

The article discusses the economic basis for this development. The authors argue that in the arms race between software that generates spam and software designed to defend against these spam attacks, the defenders have pretty much won. The attackers are always playing catch-up, and really sophisticated automated CAPTCHA solvers are so expensive and have to be updated so often that most spam operators do not find them economically viable. So CAPTCHA-solving software is used only against sites that have low-level and static defenses, which are usually not the high-value sites that spammers want.

But one consequence of this defeat is that the spam operators have turned to actual human beings to overcome the defenses. The study finds that businesses now hire people to prowl the web and insert spam. Like many things on the internet, while the market for CAPTCHA solving services has expanded, the wages of workers solving CAPTCHAs have been declining. The paper’s authors report that companies now commonly charge their customers as little as $2, or even $1, to have 1,000 successful hits. As a result, those companies now pay their workers even lower amounts, $0.75 or even $0.50 per 1,000, down from highs of $10 in 2007. As you can imagine, these jobs are being outsourced to those countries that have cheap labor (such as in Eastern Europe, Bangladesh, China, India and Vietnam) which probably explains the unusual grammatical errors of the spam I now detect. I am guessing that these workers are told to take some words from the blog post and insert them into a generic comment to make it seem relevant.
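To make the figures above concrete, here is a small back-of-the-envelope sketch in Python. It uses only the numbers quoted in this post; the pairing of the higher price with the higher wage is my assumption, and the "margin" ignores any other costs the services might have.

```python
# Figures quoted in the post, in US dollars per 1,000 solved CAPTCHAs.
CUSTOMER_PRICE_PER_1000 = (2.00, 1.00)   # what the services charge
WORKER_WAGE_PER_1000 = (0.75, 0.50)      # what the workers are paid
WAGE_HIGH_2007_PER_1000 = 10.00          # the 2007 high mentioned in the post


def per_solve(amount_per_1000: float) -> float:
    """Convert a per-1,000 rate into a per-CAPTCHA amount."""
    return amount_per_1000 / 1000


def margin_per_1000(price: float, wage: float) -> float:
    """What the service keeps per 1,000 solves, before any other costs."""
    return price - wage


if __name__ == "__main__":
    # Assumption: the higher price goes with the higher wage, and so on.
    for price, wage in zip(CUSTOMER_PRICE_PER_1000, WORKER_WAGE_PER_1000):
        print(f"price ${price:.2f}/1000, wage ${wage:.2f}/1000: "
              f"worker earns ${per_solve(wage):.5f} per solve, "
              f"service keeps ${margin_per_1000(price, wage):.2f} per 1000")
    drop = WAGE_HIGH_2007_PER_1000 / min(WORKER_WAGE_PER_1000)
    print(f"wages fell by a factor of {drop:.0f} from the 2007 high")
```

At the lowest quoted rate, a worker earns a twentieth of a cent per solved CAPTCHA, which gives some sense of why this work ends up where labor is cheapest.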

All this leaves me with a minor moral dilemma. My obligation to the blog’s readers to maintain a clutter-free comment section means that I should ruthlessly weed out every comment that looks like it could be spam (human or otherwise) even if some genuine comments get thrown out in the process. On the other hand, I feel sorry for those poor bastards who are so desperate for work that they have to take dead-end jobs like this and spend their days posting pointless comments on site after site.

I decided to give my bleeding heart a vacation on this one issue and be ruthless. I figure that there are enough websites that do not care so much about preventing spam so as to provide income for these workers. Also, once the spammers get a ‘hit’ by successfully posting a bogus comment, they presumably get paid even if I come along a few hours later and delete them.

So if you find that I have deleted your genuine comment, please let me know and accept my apologies for thinking you were a spammer. And if you are puzzled by why some comments appear and then later disappear, you now know the reason.

POST SCRIPT: The Daily Show on objectionable words

The Hurt Talker (The Daily Show With Jon Stewart, www.thedailyshow.com)

Comments

  1. says

    I used to face the same spam problem on my blog, which is a lot bigger and receives a lot more spam. Maybe you should use Akismet, which already comes with WordPress. Your current blog doesn’t seem to be WordPress, so I think some programmatic changes may be necessary in order to ensure no spam goes through. You can deploy a word trigger method, so that if an xxx word appears in a comment, it may automatically go to the spam box or get deleted. I can help a little if you say, as I am a programmer myself. Thanks
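The "word trigger" idea in the comment above could look something like the following minimal sketch. The trigger list and the function name are hypothetical, chosen only for illustration; this is not how Akismet or any particular blog platform works.

```python
import re

# Hypothetical trigger list; a real one would be tuned to the spam a site sees.
TRIGGER_WORDS = {"viagra", "casino", "replica"}


def is_flagged(comment: str, triggers=TRIGGER_WORDS) -> bool:
    """Return True if any whole word in the comment is on the trigger list."""
    words = re.findall(r"[a-z0-9']+", comment.lower())
    return any(word in triggers for word in words)


if __name__ == "__main__":
    print(is_flagged("Buy REPLICA watches now"))   # flagged
    print(is_flagged("I enjoyed this post"))       # not flagged
```

A filter this crude will also catch genuine comments that happen to mention a trigger word, so it is probably better suited to quarantining comments for review than to deleting them outright.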

  2. henry says

    The simplest way of handling the situation, Mano, is to remove the link that is created when someone enters something in the URL box. In fact, remove the entire URL box.

    This will stop the hyper-linked name that is created. For example in tahir’s post.

    Once the ability to create a link to a website is removed then spamming decreases.

    The next issue is when someone puts a website name or other phrase (buycheapdrugsonline for example) as their name.

    Simply have a policy where the name of the poster must be an actual name (not necessarily their real name) and delete any post that doesn’t conform.

    Once you take away the ability to promote an outside website the human generated spam will decrease.

    Finally, if you go to a CAPTCHA-type system, I prefer the ones that ask the most basic of questions. ‘What color is the sky on a clear day’ or ‘Do you see the sun or moon at night?’

    If you write about 10 of these simple questions and have them rotate it will be an effective filter against ‘machine’ generated spam.
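The rotating-question idea in the comment above can be sketched in a few lines. The two questions are the ones the commenter suggests; the accepted answers, the function names, and the matching rule (case-insensitive, surrounding whitespace ignored) are my assumptions.

```python
import random

# Each entry pairs a question with the set of answers that will be accepted.
QUESTIONS = [
    ("What color is the sky on a clear day?", {"blue"}),
    ("Do you see the sun or moon at night?", {"moon", "the moon"}),
]


def pick_question(rng=random):
    """Pick one (question, accepted_answers) pair at random."""
    return rng.choice(QUESTIONS)


def check_answer(question: str, answer: str) -> bool:
    """Check a visitor's answer, ignoring case and surrounding whitespace."""
    for known_question, accepted in QUESTIONS:
        if known_question == question:
            return answer.strip().lower() in accepted
    return False  # unknown question: reject


if __name__ == "__main__":
    question, _ = pick_question()
    print(question)
    print(check_answer("What color is the sky on a clear day?", " Blue "))
```

With only about ten questions in rotation, this would stop generic automated tools but not software written specifically for the site, and not human spammers.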

  3. says

    I have a different viewpoint, as you can tell by my use of a keyword phrase as my name. I hope you will consider my comments and take this viewpoint into account when making your decisions.

    First, I totally agree with you that there is way too much worthless, off point comment spam. These generic, valueless comments add nothing to the discussion and do just clutter up the real estate.

    And while Henry is right, removing the ability to link to a website will drastically decrease the amount of self serving comments your posts attract, it also eliminates the incentive for people like me to try to add meaningful ideas to the conversation in exchange for an innocuous link to our site.

    Using myself as an example, I am just a guy trying to make a little money on the side by providing what I hope is useful information on a website that I created and maintain.

    I don’t use automated software or pay someone to post comments on thousands of blogs. I do try to find a few blogs that allow me to create a simple link and then try to add value to the dialog in exchange for that link. I don’t consider myself a spammer because I always try to make relevant, meaningful comments.

    Having said all that, I don’t spend a lot of time commenting on blogs that don’t allow me to post a link and use a phrase as a name. In case you are wondering, the reason to use a certain phrase as your “name” is that the name usually links back to the website.

    Google and the other search engines keep track of the words used to link to websites. That’s how they determine which results to show on a search page. If people like me can get our sites to show up on the first page of search results, we have a better chance of making some money with those sites.

    So…while I understand and agree with the need to eliminate blatant spam comments, I would like to think that popular bloggers like yourself can find a way to allow those of us who genuinely try to add value to the discussion to continue to receive the value we desire in return, in the way of a link. And as for the Captcha question, I personally don’t mind them. That’s just part of the exchange.

    Either way, I hope this fairly long comment has shed a little light on a different viewpoint. I have never been a fan of the “zero tolerance” mindset that substitutes mindless compliance for good judgement. I hope there are still enough people willing to exercise their own judgement that both bloggers and responsible marketers can continue to support each other’s efforts.

    Thanks for hearing me out. And thanks for the link. I hope I have earned it 🙂

  4. says

    Aha!

    Interesting to hear cost/benefit analysis of a blogger in the implementation of anti-spam techniques…. I’m used to seeing things from a different perspective.

    You see Mano, I belong to the industry that is indeed part of the problem. The search engine optimization industry.

    An industry that is both loved and loathed, the bane of many a webmaster’s existence (not to mention Google) and YET, wholly necessary for the prosperity of both Google and Webmasters alike.

    Really, SEO is the reason that information retrieval works. Google’s algorithm is smart, and their resources are vast and nevertheless, Webmasters need to follow certain guidelines in order for their sites to be ‘read’ and recognized for the content they provide.

    Blog commenting is, in many ways, a very natural and organic way of ‘joining the conversation’.

    The dark side is obviously the computer generated junk that dilutes the quality of the victimized site, though perhaps it’s no less offensive to be attacked by 3rd world outsourced labor.

    In any event, I would like to think that just as crude or obtrusive advertising is ignored, so too will be blog spam.

  5. says

    Captcha is really turning into a pain. It’s getting to the point that I can’t even understand what the phrase is showing.

    The comment by tahir to use akismet is a good one.

    I’ve put it into several of my sites and the spam level dropped immediately.

  6. says

    While I can see, as Henry suggests, how banning URLs would eliminate a lot of the spam, I don’t really want to do that. Links are the heart of the internet, creating new networks, and as long as people are adding value to the discussion, they are not a problem. I also see no harm in people advertising their own sites, as long as they contribute something meaningful in the comment.

    But what I am going to do is be more ruthless with comments that are just bland and seem designed solely to insert a link, even if they relate to the post. Of course this brings me back to having to monitor the comments but that’s the way it goes. If at some point it becomes too tedious, I will take more drastic steps.

  7. says

    Mano, so here’s a question. So long as a post is clearly relevant and “adds” to the dialogue at hand should it be treated as spam even though it may have been posted primarily for the linking benefit? Something to think about. Paul

  8. Jared A says

    That’s pretty clever, Paul. Oh, how I love the occasional self-referential statement. 🙂

  9. says

    I understand your dilemma. On my blog I have people trying to post about their replica watches all day long. Thank goodness for moderation. Some even send hate mail because I won’t let them post. Have you experienced this?

  10. says

    In my opinion, using CAPTCHAs could be difficult for older men with slight visual deficiencies. Some CAPTCHAs are really hard to read, and even though the majority of these security measures have audio versions, those can be a bit hard to hear too because some audio versions have poor sound quality.

    With regard to the spamming issue, I must say that it’s relatively easy to identify. What I do is just read through the comments one by one, and if a comment is VERY relevant to the post and at the same time makes sense, I approve it, even if it includes links, as long as those links are of a general nature: no porn or pill sites. Even automated protection doesn’t work as it is supposed to; some “non-spam comments” are flagged as spam because of the nature of the comment. It happened to my brother when he commented on my site once.

    For me, manual checking is still the best way for filtering spam blog comments.

  11. says

    HP,

    No I haven’t had any hate mail. Perhaps it is because I moderate after the fact and do not pre-emptively block comments, although the people who designed the blog did include some filtering mechanism.

    Bradford, your suggestion seems like the way that I will go too.

  12. says

    I for one congratulate you on your judgment and regret the added effort weeding out the spam requires of you. I would think that you get a much wider readership because of the linking ability your site offers. And if I may say this without seeming to flatter (!) your site thoroughly deserves the readership. I enjoy everything I read here.

  13. says

    Although I do not use a Captcha at this point in time, I do not mind having to enter the info in order to leave a post. However, I would prefer everyone used a type of Captcha that is readable. Those that ask a simple math question are pretty nice.

  14. says

    I would highly recommend using a CAPTCHA, since nowadays there are lots of automation tools and software that can do just about anything once they are programmed to do so. It could also be a huge threat for ambiguous spamming.

  15. says

    Mano, I’ve been struggling with this for years. Jeremy does a good job trying to keep the spam filters up to speed, but the spammers are always coming back with new techniques to defy them.

    Since moving my blog to WordPress I have Akismet, and while it works well, a certain amount of garbage still sneaks through to the moderation queue. I’ve been seeing patterns similar to those that you describe, and try to be ruthless as well. My main rule of thumb is that the comment should add value to the discussion and be of some interest to readers.

    The sad thing about this is that it is a waste of time for all involved. The spammers waste time posting comments that will simply be deleted and never make it to the post. And you have to waste your time reading through the moderation queue to make sure the spam gets deleted. No one wins.

    But I do think it is worth the effort to ensure that readers can follow the comments without having to weed through a bunch of gibberish themselves.

    Regarding SEO, I would like to point out to “Restoring Intimacy in Marriage” and others that the Case system uses rel=”nofollow” on comment links. This means that Google is not counting these as inbound links to your sites, so the use of keywords doesn’t matter. Go ahead and use your human name. You will still get traffic from people who find your comment interesting--and click on your name to learn more--but it won’t count as a link the way a link from a post would.

    I hope the new semester is treating you well. Cheers!
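To illustrate the rel="nofollow" point in the comment above, here is a minimal sketch of how a blog system might render a commenter's name as a link. The function name is hypothetical and this is not the actual code of the system mentioned; it only shows where the attribute goes.

```python
from html import escape


def comment_author_link(name: str, url: str) -> str:
    """Render a commenter's name as a link marked rel="nofollow".

    The name and URL are escaped so that a commenter cannot inject markup.
    """
    safe_url = escape(url, quote=True)
    safe_name = escape(name)
    return f'<a href="{safe_url}" rel="nofollow">{safe_name}</a>'


if __name__ == "__main__":
    print(comment_author_link("Jane", "http://example.com"))
```

Readers can still click such a link, but, as the comment says, the attribute asks search engines not to count it as an inbound link, so a keyword phrase used as a name gains nothing.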

  16. says

    CAPTCHA is kinda good for security purposes. But then again, sometimes it’s a waste of time. I even had an experience regarding this. I have had to make so many attempts before I could continue with some transactions, which is so disgusting sometimes.

  17. says

    It’s not just popular blogs, Mano. Some of my websites that NEVER get found by real visitors get found by blog commenting software every single day, many times a day. I dunno how this software finds even sites that get almost no traffic when I have done nothing to give them exposure.

    On those sites where I have put in a CAPTCHA, there is less spam, but I hear that they can now also get through CAPTCHAs using this software.

  18. says

    Well I for one hate CAPTCHAs, or at least many of them. Maybe it’s my old eyes, or fat fingers, but I mess up on some of them way too often. With some systems that means you have to rewrite your post all over, which is a pain and leads me to exit stage right.

    Now I sympathize with the blog comments. I like Akismet and so far have found it to be pretty effective, most of the time. Now I monitor my comments daily, so it’s not a big deal to round file those that Akismet misses.

  19. says

    It’s very popular on our site. We have done tons of hard work to get our rankings high for our keywords, and we did it without spamming.

    Google knows this and has changed a lot of their algorithm, so spammers have very little time left…

    Thank God, since I’m getting tired of deleting all of them, just like you…

  20. says

    As an SEO specialist I see this come up all the time. I get my clients to post comments but ONLY when their input can add genuine value that truly contributes to the blog’s intent. The determined black hatters will not be deterred if asked to enter a CAPTCHA, however my hope is that online ‘etiquette’ may gradually creep its way back into social norms and then we all benefit by shared ideas and interaction. It may be slow and embryonic but let’s all make an effort to initiate some positive change.

  21. says

    Shed plans, Mike the carpet cleaner, debt collection guys? HILARIOUS!

    On a side note, I’ve put CAPTCHA into several of my sites and the spam level dropped immediately.

    Mercen

  22. says

    I don’t think the captcha system works at keeping out spam. Granted, it will keep *automated* spam from sending submissions, but it won’t keep individuals with too much free time on their hands from submitting spam.

    I agree that captcha just wastes the end-user’s time.

  23. says

    I’m really amazed that people are using comment sections just to get more hits for their web sites. This is the irresistible orientation of people to earn money. Maybe a way to prevent this spam comment attitude can be found, but if there is a business in which to gain money then people will find a way to abuse that business system. I don’t know what we can do.

  24. says

    Great points…I would note that as someone who really doesn’t write on blogs much (in fact, this may be my first post), I don’t think the term “lurker” is very flattering to a non-posting reader. It’s not your fault in the least, but perhaps the blogosphere could come up with a better, non-creepy name for the 90% of us that enjoy reading the posts.

  25. says

    Some CAPTCHAs are hilariously difficult to read. Google comes to mind for having those.

    On the other hand, do questions like pick the fruit out of these options etc. actually reduce spam? I’m not sure. If they do, I love them. Especially if they’re funny.

    I don’t think commenters mind that much if sites/blogs have CAPTCHAs, it’s becoming the norm on the internet, but let’s hope that one day there’s a better way to reduce spam.

  26. says

    It’s an interesting topic, CAPTCHA or not CAPTCHA.

    In my opinion, there should be a CAPTCHA.
    That way, spammers are not able to comment or post.

  27. says

    Using a CAPTCHA is quite simple, but a CAPTCHA doesn’t filter all the spam. Clearing spam the other way, as stated above, is a good approach rather than using a CAPTCHA. I hope that with this method all bloggers will be free from spam. Asking a question instead of a CAPTCHA is a good idea. Recently I faced a problem on my blog: I still received spam messages after I installed a CAPTCHA plug-in, and in some cases it didn’t treat spammers as spam. So I recommend every blogger use a question plug-in instead of a CAPTCHA.

  28. says

    It definitely helps reduce spam on your website. Real customers with real inquiries do not care about CAPTCHA; they will ask the question and send the message if that is what they really want.

  29. says

    Certainly CAPTCHA is the simplest tool to stop spam, but it does not help completely. Many comments are made by humans instead of automated software, so they can get past it; however, that does not help in gaining benefits from search engines.

    Still, CAPTCHA reduces the web administrator’s labor by over 70%. Besides it, there are several plugins to stop spam messages, such as Akismet (only on WordPress), or commenting through OpenID, Facebook, or Twitter, which people can find easily on the internet.
