Most of us are familiar with the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) test that one sometimes has to complete to prove to a website that one is a human and not a bot seeking to impersonate one. Naturally this created the usual arms race between CAPTCHA and CAPTCHA-solving technology, resulting in some of the tests becoming quite tricky, with the lettering highly distorted and the background cluttered, presumably to fool computers. I have often failed and had to redo the test.
Then recently I came across tests that said all I had to do was check a box next to the text “I’m not a robot”. Although I was pleased by the simplicity, I wondered how such a simple test could distinguish between humans and bots. This video explains what is going on behind the scenes. Pretty interesting!
John Morales says
Grr. I tried to watch the video, but the <strong expletive> background “music” irritated me so very much I gave up. Can’t mute it without losing the content.
Bah.
grasshopper says
Every time you enter a captcha you are most likely helping to decipher a hard-to-read word from an old book Google is trying to digitize, through a program called recaptcha. If enough people decipher a blurry captcha in a particular way, recaptcha up-votes that reading to the point where the original blurry word can be reliably entered into the digitized text. An article from the New York Times explains it.
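The mechanism grasshopper describes amounts to majority voting with a threshold. A minimal sketch of that idea; the threshold, function names, and data structures here are invented for illustration and are not recaptcha’s actual implementation:

```python
# Toy sketch of the crowd-consensus idea described above.
# The threshold and names are assumptions, not reCAPTCHA internals.
from collections import Counter

CONSENSUS_THRESHOLD = 3  # hypothetical number of agreeing answers

votes: dict[str, Counter] = {}  # word image id -> tally of answers

def record_answer(word_image_id: str, answer: str) -> str | None:
    """Tally one human's transcription; return the accepted word
    once enough people have typed the same thing."""
    tally = votes.setdefault(word_image_id, Counter())
    tally[answer.strip().lower()] += 1
    best, count = tally.most_common(1)[0]
    if count >= CONSENSUS_THRESHOLD:
        return best  # reliable enough to enter into the digitized text
    return None
```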
grasshopper says
I read a bit further elsewhere. Captchas and recaptchas are different things. No matter. Recaptchas do help to decipher blurry words. The New York Times article linked to above gets it right.
RationalismRules says
@John Morales Skip to 3:50. It’s the most interesting part of the vid, and there’s no background music!
Nomad says
A lot of the story this video presents is a wee bit distorted. For starters, by the time Recaptcha appeared on the scene it was already known that captcha was failing its purpose. It was known that computer programs were better at solving them than people, but as there was no better option available they were still used, along with a few other attempts. Forum software in particular tried different things, like asking the user simple questions such as basic math problems, or even asking what the name of the product the forum exists to support is. I’ve seen web developers discussing the problem; they knew that no method worked, so they would stack multiple methods to at least offer some protection. This was not some secret that Google had to discover; they had to have known it by the time they purchased the recaptcha system.
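A question-based challenge like the ones Nomad mentions is only a few lines of code, which is also why it offers so little protection: any bot author who bothers to parse the question can answer it. A hypothetical sketch, with the question format invented for illustration:

```python
# Toy version of a forum's "simple question" anti-bot check.
# Trivial for a human, but just as trivial for a bot that parses
# the question -- which is the commenter's point.
import random

def make_math_challenge() -> tuple[str, int]:
    """Return a question string and its expected answer."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"What is {a} + {b}?", a + b

def check_answer(expected: int, submitted: str) -> bool:
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```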
Now on to Google’s new click-a-box system. The video guy says that if you answer one of their mini-games correctly you’ll “probably” never see one again because the system remembers you. Nope. A friend of mine, when I mentioned this video, groaned and said he hates them and gets them all the time. As for all the super-secret data that their “risk analysis engine” is using to judge your humanity, I have significant doubts about a lot of that. My experience with the system suggests that it mostly works by counting how often you click through one of those boxes. If you do it too often in too short a time, it makes you play a mini-game. This raises the question of why they’re harvesting all that other stuff. Well, let’s face it, that’s a rhetorical question; I think we all know why they’re doing it. This is Google we’re talking about; hoovering up your information is what they do.
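If Nomad’s guess is right, the escalation logic would be little more than a rate limiter. A speculative sketch of that heuristic; the time window and click limit are pure guesses for illustration, not anything Google has published:

```python
# Speculative sketch of the click-rate heuristic Nomad suspects:
# too many checkbox clicks in a short window triggers the mini-game.
# WINDOW_SECONDS and MAX_CLICKS are invented numbers.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # hypothetical look-back window
MAX_CLICKS = 5         # hypothetical clicks allowed per window

recent_clicks: dict[str, deque] = defaultdict(deque)

def needs_minigame(user_id: str, now: float | None = None) -> bool:
    """Record a checkbox click and decide whether to escalate."""
    now = time.time() if now is None else now
    clicks = recent_clicks[user_id]
    clicks.append(now)
    # Drop clicks that fell out of the look-back window.
    while clicks and now - clicks[0] > WINDOW_SECONDS:
        clicks.popleft()
    return len(clicks) > MAX_CLICKS
```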
Next problem: those mini-games. The thinking behind them is whacked out. Google says “oh, look at that, our machine vision algorithms can solve the problems better than people”, and as a solution they come up with a mini-game that those same computer algorithms create, and ask people to solve it. Their computers pick a bunch of animal heads from pictures, decide which are dogs, and present them to people. Now it’s true, that kind of image analysis is a lot harder to do and is probably less available to bot authors. But the fact remains that they substituted one problem that’s easier for computers to solve than it is for people with another. And as these things develop it’s only going to get easier for the computers.
And the games get a lot harder than “click the doggies”. I’ve gotten one that was a bunch of low-resolution pictures of storefronts harvested from Street View. My task was something vague; I don’t remember exactly what it was, but I was supposed to select something like convenience stores. Except these were stores from all over the world, with designs from all sorts of cultures and with signs in many different languages. How the heck was I supposed to know which store was what kind? This was information that was trivial for Google’s computers. For me, they might as well have been asking me to read a captcha written in an alien language. I had to guess. It was as bad as the worst, most distorted captcha out there. I think I got it right, but it felt as marginal as those 33% captchas.
Recaptcha is no panacea. From my friend’s experience I can tell you that their risk analysis engine can get it badly wrong very easily. That’s if it’s even really using all the data it’s harvesting, which I doubt. I’m guessing my friend was just doing a lot of forum searching or something and was clicking on the box too often for recaptcha’s liking. And the second stage can get as bad as the worst, most distorted captchas out there. The deeper problem is that the entire system is designed to be machine-readable from the start, yet is being offered as a way to defeat machine reading.
John Morales says
RationalismRules, thanks.
Dunc says
Will nobody think of the cyborg supersoldiers?
Carl Fink says
Another good one.