One of the interesting things about a blog is the comments section that enables readers and author to interact. The problem, as I have written before, is spam comments that just clog up the boards and waste people’s time. I have an open and unmoderated comment system, which means that anyone can comment without registering or getting my prior approval, but the catch is that it can be exploited by spammers. The people who run the servers have filters for detecting and eliminating or quarantining spam, but these are not foolproof: sometimes genuine comments are excluded while spam comments are allowed through.
Spam is generated not by people who are merely being pests but by people with an underlying economic motive. Some email spammers seek to gain access to mail accounts from which to send their advertisements, while blog comment spammers seek to place links to businesses and hence drive up their search engine rankings. As a result, businesses have a financial motivation for creating spam and have developed software for doing so. The more popular your website is, the more you get targeted, since the payoff is greater.
I get hit with quite a lot of spam that seems to come in waves. So a couple of times a day, I go into the comments file and clean house. I do this because it would be irritating for readers to encounter spam there or to have their genuine comments rejected. This housekeeping is quite tedious because I have to read all the suspect spam. Often it is easy to spot them because they contain gibberish. On other occasions they are generic comments or comments that are repeated over and over with slight variations and signatures. If it seems clear to me that a comment has no relevance to the post, I delete it.
But I have noticed recently that detecting spam is getting harder. Some spam comments try to deceive by using a sentence from the actual post as the comment. This makes the comment seem relevant though lacking a point. I can usually detect this dodge, even if my post in question is several years old, because I have a good sense of my own writing style. More difficult is when the spam consists of a sentence taken from the posting of a genuine commenter. If the comment does not quite fit, I look at all the comments to that post to see if this is the case.
Even harder to judge are those short comments that are not copies of other people’s words but seem vaguely relevant to the topic. They often use some of the key words in the post, but do not really add any value to the discussion and are written ungrammatically. These do not look like machine-generated spam, and the errors are not those of a careless or poorly educated native English speaker but are more characteristic of someone for whom English is a second language.
Up until quite recently, my only criterion for accepting or rejecting a comment was whether it was being generated by a human or a machine. In making these judgments I am, in effect, running my own personal Turing test. To help me, I suggested to our webmaster that perhaps we should install one of those little tests that some sites have to detect whether a human or machine is trying to access it. These screening devices have to be simple enough that any human can easily solve them but difficult enough to stump a machine. The most familiar are the so-called CAPTCHAs, those curved letters and numerals on a cluttered background that you have to identify and type in before you are allowed in. Some sites like Machine Like Us require you to do a simple arithmetic sum.
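For readers curious about what the arithmetic-sum kind of test involves, here is a minimal Python sketch. It is my own illustration and not how Machine Like Us or any other site actually implements it; the function names `make_challenge` and `check_answer` and the choice of single-digit numbers are assumptions made purely for the example.

```python
import random


def make_challenge(rng=random):
    """Return a (question, expected_answer) pair for a simple sum.

    Hypothetical helper: the 1-9 range is an arbitrary choice that keeps
    the sum trivial for a human reader.
    """
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(expected, submitted):
    """Accept the comment only if the submitted text is the right number."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input (or an empty field) fails the test.
        return False


if __name__ == "__main__":
    question, expected = make_challenge()
    print(question)
    print(check_answer(expected, str(expected)))
```

A test this simple would only screen out the crudest automated submissions, and it does nothing at all against a human being who is paid to type in the answer.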
I had thought I was fighting only machines because it would not be worthwhile to have real humans wandering the web and inserting spam. But it turns out that I was wrong. As a recent study of spam by a team of researchers at the University of California, San Diego showed, a lot of spam nowadays is being generated by actual humans and not machines. That explains my increasing difficulty in telling whether some comments were being created by humans or machines, and it would make adoption of a CAPTCHA system less useful than I had thought. (Thanks to Kevin Drum for the link.)
The article discusses the economic basis for this development. The authors argue that in the arms race between software that generates spam and software designed to defend against these attacks, the defenders have pretty much won because the attackers are always playing catch-up. Really sophisticated automated CAPTCHA solvers are quite expensive and have to be updated so often that most spam operators do not find them economically viable. So CAPTCHA-solving software is used only against sites that have low-level and static defenses, which are usually not the high-value sites that spammers most want to reach.
But one consequence of this defeat is that the spam operators have turned to actual human beings to overcome the defenses. The study finds that businesses now hire people to prowl the web and insert spam. As with many things on the internet, while the market for CAPTCHA-solving services has expanded, the wages of workers solving CAPTCHAs have been declining. The paper’s authors report that companies now commonly charge their customers as little as $2, or even $1, for 1,000 successful hits. As a result, those companies now pay their workers even lower amounts, $0.75 or even $0.50 per 1,000, down from highs of $10 in 2007. As you can imagine, these jobs are being outsourced to countries that have cheap labor (such as in Eastern Europe, Bangladesh, China, India and Vietnam), which probably explains the unusual grammatical errors in the spam I now detect. I am guessing that these workers are told to take some words from the blog post and insert them into a generic comment to make it seem relevant.
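To get a feel for how small those payments are, here is a rough Python sketch of the arithmetic. The per-1,000 rates are the ones quoted above; the function names and the $1 earnings target are my own assumptions, chosen only for illustration.

```python
import math


def pay_per_solve(rate_per_thousand):
    """Dollars earned for one successful hit at a given per-1,000 rate."""
    return rate_per_thousand / 1000


def solves_needed(target_dollars, rate_per_thousand):
    """Smallest number of successful hits needed to earn target_dollars."""
    return math.ceil(target_dollars * 1000 / rate_per_thousand)


if __name__ == "__main__":
    # Worker rates quoted in the post, in dollars per 1,000 hits.
    for rate in (0.75, 0.50):
        print(
            f"${rate:.2f} per 1,000: ${pay_per_solve(rate):.5f} per hit, "
            f"{solves_needed(1, rate):,} hits to earn $1"
        )
```

At the lower rate a worker has to complete 2,000 successful hits to earn a single dollar.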
All this leaves me with a minor moral dilemma. My obligation to the blog’s readers to maintain a clutter-free comment section means that I should ruthlessly weed out every comment that looks like it could be spam (human or otherwise) even if some genuine comments get thrown out in the process. On the other hand, I feel sorry for those poor bastards who are so desperate for work that they have to take dead-end jobs like this and spend their days posting pointless comments on site after site.
I decided to give my bleeding heart a vacation on this one issue and be ruthless. I figure that there are enough websites that do not care much about preventing spam to provide income for these workers. Also, once the spammers get a ‘hit’ by successfully posting a bogus comment, they presumably get paid even if I come along a few hours later and delete it.
So if you find that I have deleted your genuine comment, please let me know and accept my apologies for thinking you were a spammer. And if you are puzzled by why some comments appear and then later disappear, you now know the reason.
POST SCRIPT: The Daily Show on objectionable words