Help!


I’m dying here, people. It’s spam, spam, spam, spam, spam, spam, spam, spam, spam, spam, spam, spam, spam, spammity, spam, spam, spam. I get up every morning and get to spend a half hour cleaning up the crap that accumulates every night, and have to invest more time at intervals during the day purging it. On top of that, as many of you know, the spam filters we have here are garbage. That’s a little unfair — I’m sure they’re keeping out 99% or more of the spam — but it’s the perception we all have. Certain commenters are routinely singled out for exclusion by the filters for mysterious reasons; some combination of adjacent letters in their username seems to trip the spam filters, maybe, except that when I examine the filter results I don’t see any indication of that happening. Other times spammers with comments full of random links get through; legitimate commenters with no links, no profanity, no commercial pitches get held up. It looks entirely arbitrary, and it’s driving me nuts.

I blame Movable Type. Of all the blogware I’ve worked with, its approach to handling spam is the most primitive and inept; it’s almost as if they’ve made a deal with the spammers to keep their prevention leaky and ineffective…or it may just be that the popularity of MT means the spammers work harder to punch holes in it.

Anyway, I appeal to the tech experts out there: is there a good, solid, comprehensive set of tools for MT that actually work at keeping spam at bay? Are there any plug-ins that can improve comment handling in general?

If this keeps up, I may have to switch back to using TypeKey registration to comment, which was an annoying nuisance and caused a lot of new problems to crop up last time I did it (and also makes me wonder if the incompetence of MT’s spam handling is an intentional ploy to drive people to TypeKey). But it may be necessary, since I really don’t have that much free time to play janitor on the comments.

Comments

  1. abeja says

    My comments always get held up for moderation, so I rarely make a comment. It’s frustrating. Any change to the system is a welcome change to me.

  2. jfatz says

    Unfortunately I have no personal experience with any of that, so can offer no more suggestions than Google would. But I DO have the sudden urge to tell people how to increase the size of their wing-wang, refinance their home, and something else to do with their wing-wang.

  3. idlemind says

    Email spam has gotten a lot worse, too. Last spring I was getting around 600/day (with SpamAssassin nailing 98% of it and Apple Mail junking half of what got through). I’m now getting over 1300/day; most of that increase has happened since mid-September.

    I don’t know how you feel about it, but I believe there is a CAPTCHA module for MT. I don’t like CAPTCHA for accessibility reasons, but perhaps combining it with TypeKey as an alternative would be an acceptable compromise.

  4. Carlie says

    That’s got to be frustrating, and this blog has grown to a size I’m sure you never quite imagined dealing with. If you can’t find a suitable tech fix, have you thought about load distribution? If you have a few very close trusted friends who are also tech geeks with some free time, you could add on a few moderators to take turns policing the place.

  5. says

    It’s a lower-tech solution, but you might consider an additional barrier to commenting like on The Carpetbagger Report. You have to answer a very simple security question to post a comment, which is presumably something that automated spam systems can’t handle.

  6. PeteK says

    Aren’t PZ’s posts themselves sometimes full of links and profanity? Maybe a good compromise would be to use TypeKey on some posts but not others, switching it on once a certain threshold is reached, say when the number of comments hits 150, which happens quite often on certain blogs… But TypeKey doesn’t work that way, does it?

  7. says

    I have been very impressed with Akismet, which is closely associated with WordPress. However, I see that there is an MT plugin.

    MT-Akismet requires Movable Type 3.2 or higher. It requires a few external modules from CPAN that have been packaged along with the plugin. These modules are Class::Accessor and MT::Akismet. This plugin also uses the libwww-perl (LWP) library, which ships with MT.

    I’ve not tested it, so I can’t endorse it. But someone with an MT test site could probably tell more.

  8. Benjamin Franz says

    MT4 has built-in CAPTCHA support. In my experience, it is very effective in stopping the spammers. Sb uses MT Enterprise, so I suspect if your techies talk to SixApart about it they will have a path to using CAPTCHAs.

  9. says

    I say we toss a virgin into a volcano, which in Morris, MN, probably means throwing a ten-year-old child into a hot tub.

    But, it’s the thought that counts.

    Actually, as I remember it, it was Movable Type that killed Scalzi’s site (Whatever) back in September, I think. But, I never had any trouble commenting there, and I know he gets absolutely hammered with spam, so you might want to ask him to see if he’s got any tech-based pointers.

    Inevitably, though, he wound up switching to WordPress, I believe.

    Hope that helps.

  10. raven says

    Geesh!!! This is so obvious. What is a Mad Scientist without a hunchbacked assistant named “Igor”? Just look up Hunchbacked Assistants or Igors in the yellow pages and contract one.

    Hmmm, well this is 2008. All the Igors are named Sanjay or Guyatri and work out of Mumbai or Kolkata. Maybe Scienceblogs will offshore some human spam sorters from some (ahem) developing nation.

    And there are always students in college towns looking for simple, flexible, low-paying part-time jobs.

    Other than that, not an expert on spam filters. The previous suggestion of a security question seems feasible. When I first encountered those, they were a not-insignificant barrier, but after a while they became routine.

    I used to post under another name that could be linked to an email address. Some troll took offense and launched a spam attack, signing me up for all sorts of garbage products. I ended up abandoning that email address and only use a throwaway for public participation on the internet. Is PZ sure that he isn’t being targeted by Xian terrorists with a spam denial-of-service attack? It is something they would do.

  11. says

    Our place (Eastern Ky Univ) uses CanIt by Roaring Penguin; VERY little spam gets through, and very, very rarely does it trap something I want.

    But then I don’t get the attention that your URL does . . .

  12. says

    I’m using Drupal with a very simple math captcha, and I’m wide open for anonymous commenting. The captcha seems to do a very good job at filtering out spam bots. It helps a lot to use an obscure system, since bot authors have little incentive to target it.
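    A simple math captcha of this kind can be sketched in a few lines of Python. This is a hypothetical illustration, not Drupal’s actual captcha module: the function names are made up, and a real form would need to keep the expected answer on the server (for example in the session) rather than send it back to the browser, where a bot could read it.

```python
import random


def make_math_challenge(rng: random.Random) -> tuple[str, int]:
    """Build an addition question and its answer, e.g. ("What is 3 + 4?", 7)."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def is_correct(submitted: str, expected: int) -> bool:
    """Accept the comment only if the reply parses as the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric replies (including empty ones) are rejected.
        return False
```

    Parsing a question like this is trivial for a bot written against this specific site; the protection comes mainly from nobody bothering to write one.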

  13. says

    I find that a ‘close comment’ plug-in is essential. MT has one. It closes comments after 10, 20, or however many days you like. I admit that this IS a trade-off, but even a high-traffic blog like yours will get most comments on recent posts. The trade-off is well worth it.

    In this way, you absolutely close over 95% of your posts to spam.

    Then, use some other spam filter of your choice (like other WP users, I like Akismet) to filter out the spam that might affect your recent posts.

    Such a two-part approach is very effective. Periodically, as software changes, the specific tools need to be modified, but the concept of: 1) close them off, and 2) use some current filter, essentially solves the spam problem.
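    The two-part approach can be sketched as follows. This is an illustration under assumptions rather than any particular MT or WordPress plugin: the 14-day cutoff is an arbitrary default, and "looks_like_spam" is a hypothetical stand-in for whatever filter you plug in (Akismet or anything else), since no real filter API is shown here.

```python
from datetime import datetime, timedelta
from typing import Callable


def comments_open(published: datetime, now: datetime, max_age_days: int = 14) -> bool:
    """Part 1: a post accepts comments only until it is max_age_days old."""
    return now - published <= timedelta(days=max_age_days)


def accept_comment(
    published: datetime,
    now: datetime,
    text: str,
    looks_like_spam: Callable[[str], bool],
    max_age_days: int = 14,
) -> bool:
    """Part 2: on posts that are still open, defer to a pluggable spam filter."""
    if not comments_open(published, now, max_age_days):
        return False  # old post: closed, so the filter is never consulted
    return not looks_like_spam(text)
```

    Closed posts are rejected before the filter runs, so the filter only ever sees comments on recent entries.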

    Good luck.

  14. says

    If you can, try something simple that works for ~99% of automated spam bots; see http://www.brianblog.com/archives/000282.html for someone doing it with Movable Type… I don’t know if there’s a plugin specifically for doing this, but I’d guess there is. (Maybe you can ask the guy on that blog how he does it, if you want to sacrifice some fields?)

    Basically, one renames the actual “name”, “email” and “URL” fields to something meaningless, like “jadjhsjdhaskdbkjas”, and adds fake fields that keep the usual names. The fake fields are then hidden via the CSS style sheet so normal users do not see them. When a spambot fills in those fake fields, the comment is rejected, and the IP can be banned if one wants.

    This is rather unobtrusive (Unless a user uses a really old browser without CSS or has CSS turned off, he won’t get to see any change), and stops pretty much all random spambots.
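    On the server side, the check described above could look something like this minimal Python sketch. The obscure field names are invented for the example (they are not taken from the linked blog), the submitted form is modelled as a plain dictionary, and IP banning is left out.

```python
# Decoy inputs keep the conventional names that spambots look for. In the
# page they are hidden with CSS (e.g. style="display: none"), so a human
# visitor never sees them and leaves them empty.
DECOY_FIELDS = ("name", "email", "url")

# The real inputs get obscure names. These particular strings are made up;
# any names a generic bot would not recognise will do.
REAL_FIELDS = {
    "name": "jadjhsjdhaskdbkjas",
    "email": "qwpzmxnvlkdjs",
    "url": "plmoknijbuhvy",
}


def is_spambot(form: dict[str, str]) -> bool:
    """A submission that filled in any decoy field is treated as a bot."""
    return any(form.get(field, "").strip() for field in DECOY_FIELDS)


def read_comment_fields(form: dict[str, str]) -> dict[str, str]:
    """Map the obscurely named real inputs back to name/email/url."""
    return {label: form.get(field, "") for label, field in REAL_FIELDS.items()}
```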

  15. says

    When was the last time you got spam from the Panda’s Thumb?

    Tell your web monkeys that you want TCode and CCode installed and configured. It’s a JavaScript-based captcha that doesn’t require any interaction from the user.

  16. says

    or it may just be that the popularity of MT means the spammers work harder to punch holes in it.

    WordPress is more popular than Movable Type, but in the years I’ve had my blog, only one or two spam comments have made it past Akismet (and I’ve never had any false positives, either).
    I second RinzeWind’s suggestion to get the MT Akismet plugin, if you don’t want to (or can’t) switch to WordPress entirely.

  17. Johannes Eldblom says

    If you end up using a captcha, I’d recommend reCAPTCHA: http://recaptcha.net/

    The nuisance is more easily justified when you’re doing something useful (in this case, digitizing books) :)

  18. Chris says

    What you need to do is get everyone to fill in a simple field for every post they make. An additional field such as “Please type in what day of the week it is, followed by a space, and then the name of an invisible sky wizard.” Not too much to ask people to do, but very tricky for bots that haven’t been specifically pre-programmed for this site.
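    A check along these lines might look like the following Python sketch. The set of accepted sky-wizard names is a made-up placeholder, and accepting yesterday’s and tomorrow’s weekday is an added assumption so that readers in other time zones are not turned away around midnight.

```python
from datetime import datetime

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

# Placeholder list; a real site would accept whatever names it likes.
SKY_WIZARDS = {"zeus", "odin", "ra", "thor"}


def passes_day_question(answer: str, now: datetime) -> bool:
    """Expect '<weekday> <sky wizard>', e.g. 'tuesday zeus' (case-insensitive).

    Yesterday's and tomorrow's weekdays are also accepted, so readers in
    other time zones are not rejected near midnight server time.
    """
    parts = answer.strip().lower().split()
    if len(parts) != 2:
        return False
    day, wizard = parts
    today = now.weekday()  # Monday is 0, Sunday is 6
    allowed_days = {WEEKDAYS[(today + offset) % 7] for offset in (-1, 0, 1)}
    return day in allowed_days and wizard in SKY_WIZARDS
```

    Because the expected answer changes every day, a bot cannot get by with one memorized string; it would need code written specifically for this site.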

  19. bill r says

    Geez, you’re a professor, P.Z.; use the time-honored solution:
    Graduate students. They are disposable and easily replaced. If you mess with their dissertation, they stay around forever. Failing that, undergraduate students. Dumb but cute and enthusiastic.

  20. ice weasel says

    WordPress-Akismet, without hesitation.

    That said, you’ll still spend time on a high volume blog erasing crap. Just nowhere near as much time.

    MT sucks.

  21. MIke says

    The solution to spam is the same as it’s been since 1993, when I discovered I could no longer keep my email discussion list open to the public: outlaw all non-solicited mass broadcast posts, stop them at the servers, and block all IPs that don’t cooperate. It’s mind-boggling that the whole of humankind accepts advertisers’ arguments like sheep.

  22. Fer says

    Another vote for Akismet.

    Commenters aren’t bothered with captchas or anything like that, and the success rate is almost 100%.

  23. The Dude says

    First off, remove your email from the page… sorry, but that email is trashed. You gotta get a new one.

  24. Texas Reader says

    I don’t get it. I have AT&T DSL and maybe two or three times a week something gets by the spam filter. I don’t have my security set at the highest level either.

    You might want to try Security Wonks. Don’t know the exact website but they specialize in presenting free software. They are a mirror site for Spybot Destroyer.

    Anyway, Happy New Year to you from sunny North Texas, where it’s 40 degrees right now.

  25. Graculus says

    outlaw all non-solicited mass broadcast posts, stop them at the servers, and block all IPs that don’t cooperate.

    What’s wrong with heads on pikes?

    The only real technological solution is on the server end, not the client end. Vipul’s Razor and a few other blocklists (SPEWS, Spamhaus, RBL), etc.

    The real solution is to make spamming illegal, period. Any ISP that provides a safe haven gets charged, too.

  26. says

    @Graculus,

    Perhaps we could try catching the spammers and subjecting them to a little light waterboarding?

  27. says

    Another vote for reCAPTCHA. I just read about it and it sounds amazing. If 60 million CAPTCHAs are solved a day, then even a fraction of them could improve the speed and accuracy of book digitization.

  28. Steve Smith says

    I’m not clear whether the problem is with the blog comments or your email address. What blocks an awful lot of spam for us is Mail Armory:
    https://www.mailarmory.com/
    Last month it blocked 1151 spam messages out of 1457 total. Of those, a handful were spam.

    Steve

  29. terrence says

    Coding Horror (http://www.codinghorror.com/blog/archives/001001.html) has an interesting CAPTCHA technique: they always show the same image. Apparently, just keeping out the spam that doesn’t deal with CAPTCHA tech is enough for them.

    Re: “block all IPs that don’t cooperate.”

    This, sadly, includes 70% of all home computers, because the bulk of spam is sent by virus infections. Besides, ISPs already do block IPs that are sending spam; my grandmother knows it’s time to run the virus scanner when her internet starts dropping out sporadically. The problem is really that the flood of new malicious software is increasing faster than it can be swatted down.

  30. says

    I don’t know of any plugins that do this, but depending on how much direct control you have over the comment-submission code, PZ, I find it very easy to eliminate comment spam altogether. My site has three steps:

    1. Use the approach halcy mentioned. My site has a fake comment form, hidden by CSS, above the real form in the code. Legitimate users never see it, but spambots do. Any comment that gets submitted to the fake form instead of the real form gets thrown away.

    2. My real comment form has a very simple captcha: a checkbox that’s checked by default, which says “Uncheck this if you’re not a spammer”. Any comment submission that doesn’t uncheck this box is thrown away.

    3. Since some spambots automatically toggle any checkbox they see, my comment form also has another checkbox, also checked by default, that’s likewise hidden by CSS. Any comment that *does* uncheck this box is also thrown away.

    These three simple measures, I can say without exaggeration, handle 100% of comment spam. Trackback spam is a different matter; I just turn trackbacks off altogether. Akismet is also good for that.
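    The three checks might be expressed on the server roughly as follows. This sketch assumes hypothetical field names ("form_id", "not_spammer", "keep_checked") and relies on standard HTML form behaviour: an unchecked checkbox is left out of the submitted data entirely, while a checked one is sent (with the value "on" unless the page specifies another).

```python
def accept_submission(form: dict[str, str]) -> bool:
    """Apply the three checks; any failure means the comment is thrown away."""
    # 1. Both forms carry a hidden form_id input (hypothetical name). Only
    #    the visible, real form sends "real"; the CSS-hidden decoy does not.
    if form.get("form_id") != "real":
        return False
    # 2. The visible "Uncheck this if you're not a spammer" box starts
    #    checked. A human unchecks it, so it must be absent from the data.
    if "not_spammer" in form:
        return False
    # 3. A second box, hidden by CSS, also starts checked. Humans never see
    #    it, so it must still be present; bots that toggle every box fail.
    if "keep_checked" not in form:
        return False
    return True
```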

  31. kevin says

    I gave up blogging a couple of years ago after too many times getting overwhelmed by this same problem. I ended up spending more time dealing with the back end issues involved in keeping and growing a blog than in actually writing content.

  32. says

    I second the motion for the TCode & CCode plugins. I run several MT sites and no spam gets through at all. The biggest drawback is that they require the user to have Javascript turned on.

  33. says

    A couple of months ago, I noticed that all the comment spam I was getting on my (MT) blog was only posted to half a dozen entries, so I just closed comments on those entries. It’s dropped off dramatically since then, and when I do get spam, it’s always to one entry, so I close comments on it. I don’t expect this approach to work forever, but it’s okay for now.

  34. says

    This is a problem I’ve worked on a lot; unfortunately, the product based on it won’t be ready for a few months.

    I’m no fan of captchas, or complex registrations.

    The kind of system I’d advocate (assuming you don’t want a full-blown karma system, which is really my specialty) would basically allow anyone who has a long history of posting to be automatically “deputized” as a spam-flagger. So instead of you (PZ) having to do all the work, dozens or even hundreds of readers share the burden. The messages they flag as spam will be saved for you to review if needed, especially if you have any concerns that someone may be abusing the privilege and deleting messages that aren’t spam.

    Once a system like this is in place for a while, spammers won’t bother trying to post to your site because their messages don’t last long enough to have any success. (Well, as long as you make it *somewhat* difficult for them to post spam by forcing them to change their software regularly to get the messages through…this isn’t hard to do.)

    There are other things you (well, probably not “you” per se but a software developer) can do to minimize the chance of any spammer getting their message seen. For instance, you can have messages posted by people with no history at the site show up initially as a single line, which readers have to click on to expand the message (in place) so they can read it. Once that person has a history of posting (and not getting flagged as a spammer), their posts appear in full.
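    A rough Python sketch of the deputizing and collapsed-display ideas might look like this. Everything here is hypothetical: the thresholds (50 unflagged posts to become a deputy, 5 before comments display in full), the 80-character preview, and the class and method names are made up for illustration and are not the commenter’s product.

```python
from dataclasses import dataclass, field

# Both thresholds are hypothetical and would need tuning for a real site.
DEPUTY_MIN_POSTS = 50        # "long history" needed to flag spam
FULL_DISPLAY_MIN_POSTS = 5   # below this, comments start out collapsed


@dataclass
class Comment:
    author: str
    text: str
    hidden: bool = False


@dataclass
class Moderation:
    # Count of each user's past comments that were never flagged as spam.
    good_posts: dict[str, int] = field(default_factory=dict)
    # Flagged comments are kept here for the blog owner to review.
    review_queue: list[Comment] = field(default_factory=list)

    def is_deputy(self, user: str) -> bool:
        return self.good_posts.get(user, 0) >= DEPUTY_MIN_POSTS

    def flag(self, flagger: str, comment: Comment) -> bool:
        """Deputies can hide a comment; it is queued rather than deleted."""
        if not self.is_deputy(flagger) or comment.hidden:
            return False
        comment.hidden = True
        self.review_queue.append(comment)
        return True

    def restore(self, comment: Comment) -> None:
        """Owner overrides a flag, e.g. when a deputy abused the privilege."""
        comment.hidden = False
        if comment in self.review_queue:
            self.review_queue.remove(comment)

    def render(self, comment: Comment) -> str:
        """Hidden comments vanish; newcomers' comments show one line at first."""
        if comment.hidden:
            return ""
        if self.good_posts.get(comment.author, 0) < FULL_DISPLAY_MIN_POSTS:
            first_line = (comment.text.splitlines() or [""])[0]
            return first_line[:80] + " [click to expand]"
        return comment.text
```

    Flagging never deletes anything: the comment is only hidden and queued, which keeps the owner’s review step described above intact.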

    I would be very interested in helping out with this issue if you’d like to discuss it offline.

  35. says

    One of my recent false positives was a long, thoughtful post by the author of a piece I was reviewing, defending his article. I contemplated leaving it there, because it’s just not that good.

  36. says

    One site I frequent has taken to using a service called Gravatar http://site.gravatar.com/

    As the registration takes place over there, you shouldn’t have to worry too much about people not getting through. I did it once, and now any site that uses it, I can sign on and even get to show my little icon-thingy.

    I’m not hawking it, and I don’t really know how good it is; it’s just a suggestion for one solution I’ve seen.

  37. says

    Movable Type’s spam filtering is what you make of it, so I don’t think it’s fair to call it primitive and inept. (How it’s deployed on your system is another story.) MT spam protection was designed to let you plug in and set up any combination of schemes and systems you want.

    Akismet is quite good. (I wrote the MT plugin.) CAPTCHAs can help, though I am not a fan of them. The only 100% foolproof (so far) system for stopping comment spam is authentication. I have a preference for OpenID. It works because it’s hard to automate the sign-up process and it’s easily shut down once a spammer is identified.

  38. Kagehi says

    Problem is, some people are working on things like the Cyc AI and genetic algorithms to track 10,000 fracking aircraft at one time, but most bloody spam filters seem to all operate on the level of intelligence known as Eliza. lol Some day I hope they can *really* come up with a trainable system, that doesn’t just remember words and a few vague associations, but has at least a primitive understanding of what they *mean*, and can at least guess whether what it says has any damn thing to do with what you want to read (or more to the point, whether it logically and linguistically smells like an advertisement for porn, or viagra, whether they glue a few sentences from a science text book into it or not). Those are fun BTW, if you post some place like this. “blah blah blah… DNA replication … blah blah blah vvvvvv.watchmehumpdogs.foo”, or some stupid BS. Mind you, that is an example of the kind of stuff I *have* seen in at least one email, not one I actually got.

    The filters used right now are the email equivalent of the thousand monkeys on typewriters, one of which accidentally, à la The Simpsons, wrote, “It was the best of times, it was the blurst of times…” They might work, but it’s sometimes hard to tell if it’s intentional or pure accident.

  39. says

    Kagehi, one problem with what you describe is that the further they go with that, the more the spambots adapt to it, with their own AI. And when that happens, it gets hard for even a human to tell if a comment is spam…you start reading it, and it seems on topic if not particularly insightful, and not till you get most of the way through do you realize you are reading something written by a spambot.

  40. says

    Take the good advice about better automated filtering. And in the meantime, hire a student, even one of your own kids if they want the money.

  41. jv says

    It’s unlikely any spammers would specifically target you, so simply doing something slightly different than the MT standard will likely be enough.

    For example, add one more input field, with the text “Type ‘pharyngula’ in the box below.” and throw out any comments without that response. Or a “honeypot” field hidden with css — throw out anything that puts a value there.
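    The first suggestion amounts to a one-line comparison. In this hypothetical Python sketch the extra input is assumed to be named "site_word"; ignoring case and surrounding spaces is a design choice so that a human who types “Pharyngula ” is not rejected.

```python
EXPECTED_WORD = "pharyngula"
ANSWER_FIELD = "site_word"  # hypothetical name for the extra input


def passes_site_question(form: dict[str, str]) -> bool:
    """Keep the comment only if the extra box contains the expected word.

    Surrounding whitespace and letter case are ignored, so trivial typing
    differences from humans do not cause a rejection.
    """
    return form.get(ANSWER_FIELD, "").strip().lower() == EXPECTED_WORD
```

    As the comment notes, this only works because nobody is likely to write a bot just for this blog; a spammer who targeted it specifically could simply hard-code the word.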

    I imagine there are a variety of plugins to do this kind of thing, but even though I passionately despise Movable Type, I know it well enough to write a quick bit of code for you, if you’d like.

  42. MrKAT says

    Suggestion to PZ:

    You ask everybody to put the personal letters “aapPMB” at the beginning of the subject line/headline of every EMAIL/comment; otherwise it is considered spam. This helps a lot in selecting the proper messages.

    “aapPMB”=about associated prof.. PZ MYERS’ Blog…

    “[aapPMB]” would be even more efficient to human eyes.

  43. MrKAT says

    Correction: “associate” not “associated”

    If you get tech help and can tailor the filter’s parameters, then you could have it put a lot of weight on “aapPMB” or something like that.

  44. Voting Present says

    Captchas are no problem, but please don’t use registrations. At some point in the distant past I drew a line against signing up for any kind of registration. Maybe it was generalized privacy, maybe it was just not wanting to remember passwords. So I just go where I don’t have to register.

    Requiring registration gets rid of Voting Present. Of course that will please plenty of people …

  45. says

    I’ve had great luck using the aforementioned Akismet plugin for MT. Using it dropped me from hundreds of spam comments a week to maybe 10 a month.

  46. tonyk says

    I read an article in SysAdmin magazine a while ago (unfortunately not available online) which indicated that the hidden field method was highly effective at reducing spam (100% effective for the author) whilst not being a barrier to any legitimate users (including those using screen readers or other accessibility tools).

    If you do go with CAPTCHA, apparently Google’s is the best – it is the most readable by humans, and yet there are no automated tools that can read them, unlike most other CAPTCHA image generators.

    I would say the best solution would be to upgrade the filters (Akismet sounds like the choice here) and also implement the hidden fields.

  47. T L Holaday says

    A solution I applaud is SomethingAwful-rules paid registration. Posting privileges cost $10.00 and can be withdrawn by you at any time for any reason without refund.

  48. Maria James says

    H,
    It ws vry nc rtcl! Jst wnt t sy thnk y fr th nfrmtn y hv shrd. Jst cntn wrtng ths knd f pst. Thnks.

    < hrf="http://www.kcdm.c.k"> Assgnmnt Wrtng

  49. David Thompson says

    H,
    Nc pst! Y hv wrkd hrd n jttng dwn th ssntl nfrmtn. Kp shrng th gd wrk n ftr t.

    < hrf="http://www.cstmcrswrk.c.k"> Crswrk

  50. Jack Smith says

    H,
    Ths s rlly nc pst, y shr gd pc f nfrmtn.
    < hrf="http://www.dssrttnprvdr.c.k">By Dssrttn

  51. Rorschach says

    Did you turn off registration, PZ?

    Those spammers all seem to have MT accounts. At least they had to put some effort into it…:-)

    PZ said in the post:

    Anyway, I appeal to the tech experts out there: is there a good, solid, comprehensive set of tools for MT that actually work at keeping spam at bay? Are there any plug-ins that can improve comment handling in general?

    Note the date of the post, January 1, 2008.

  52. Jack Smith says

    H,
    Ths s rlly nc pst, y shr gd pc f nfrmtn.

    < hrf="http://www.dssrttnprvdr.c.k">By Dssrttn

  53. Gyeong Hwa Pak, the Pikachu of Anthropology says

    Those spammers all seem to have MT accounts. At least they had to put some effort into it…:-)

    Well, they are all using accounts, and it’s likely that they come from one IP address, so wouldn’t it be easier to delete and block them? I thought that’s how he stopped Mabus.

  54. Mark peter says

    H,
    Ths s nsprng; I m vry plsd by ths pst. Nc nf t ths pst thnks!!! I rlly lk t

    < hrf="http://rtcls.stndrdssys.cm/Artcls/39/Cstm_ssys.spx"> Cstm Essys