How to fool a computer grader


There has been much discussion recently about computer software that seems able to grade students' high school essays about as well as humans can, but of course much faster.

[Human] Graders working as quickly as they can — the Pearson education company expects readers to spend no more than two to three minutes per essay— might be capable of scoring 30 writing samples in an hour.

The automated reader developed by the Educational Testing Service, e-Rater, can grade 16,000 essays in 20 seconds.

Sounds good, right? Except, of course, for those poor people who are trying to make a few extra bucks by doing the soul-killing job of assessing the quality of a student essay in just two minutes.

But there’s a catch.

Les Perelman, a director of writing at MIT, says that because these robo-graders work according to an algorithm, it is not hard to find out what the algorithm values and thus beat the system. He found that if you write long essays with big words, even if they are nonsensical, you will score high. The algorithm does not like short sentences or paragraphs, or sentences that begin with ‘and’ or ‘or’, nor is it enamored of sentence fragments. In other words, breaking any of the little rules that good writers break to create a particular effect will cause your essay to be marked down.
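To make the idea concrete, here is a toy sketch of a grader that looks only at surface features of the kind Perelman describes: length, big words, short sentences, and sentences that open with ‘and’ or ‘or’. It is emphatically not e-Rater's algorithm. The function name `toy_surface_score`, the weights, and the thresholds are all made up for illustration, and the only point is that a scorer built this way has no means of noticing whether a claim is true.

```python
import re


def toy_surface_score(essay: str) -> float:
    """Score an essay on surface features only; meaning and truth are ignored.

    Hypothetical illustration: the weights and thresholds below are assumptions,
    not anything taken from e-Rater or any real grading product.
    """
    # Crude sentence split on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Alphabetic words only, so dates and other numbers never enter the score.
    words = re.findall(r"[A-Za-z']+", essay)
    if not sentences or not words:
        return 0.0

    long_words = sum(1 for w in words if len(w) >= 8)
    short_sentences = sum(1 for s in sentences if len(s.split()) < 6)
    conjunction_starts = sum(
        1 for s in sentences if s.split()[0].lower() in {"and", "or"}
    )

    score = 0.05 * len(words)         # reward sheer length
    score += 0.5 * long_words         # reward "big words"
    score -= 1.0 * short_sentences    # penalize short sentences
    score -= 1.0 * conjunction_starts # penalize opening with 'and' or 'or'
    return score


# A short, accurate answer versus a long-winded, false one.
honest = "The War of 1812 began in 1812. And it ended in 1815."
nonsense = (
    "The extraordinarily consequential War of 1812 unquestionably commenced "
    "in 1945, notwithstanding considerable historiographical disagreement "
    "regarding its multitudinous international ramifications."
)

print(toy_surface_score(honest) < toy_surface_score(nonsense))
```

Under these assumed weights the wordy, false sentence outscores the short, accurate pair, simply because nothing in the scoring ever checks the claim.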

Perelman gives an example of how you can get a high score. The most interesting feature of the algorithm is that it doesn’t care about substance or even truth. It will ignore such trivialities as saying that the War of 1812 began in 1945, provided you say it grammatically.

The substance of an argument doesn’t matter, he said, as long as it looks to the computer as if it’s nicely argued.

For a question asking students to discuss why college costs are so high, Mr. Perelman wrote that the No. 1 reason is excessive pay for greedy teaching assistants.

“The average teaching assistant makes six times as much money as college presidents,” he wrote. “In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.”

E-Rater gave him a [top score of] 6. He tossed in a line from Allen Ginsberg’s “Howl,” just to see if he could get away with it.

He could.

This got me thinking that the e-Raters are perhaps reflecting the zeitgeist because, as I said in an earlier post, for some time we have been living in a fact-free era.

Comments

  1. Emu Sam says

    So, I would send an essay (or more likely, a stack of two hundred) through the program, and then I could read the essays, not proofread. Additionally, I can comment on points where I disagree with the program and adjust the final grade to match. That would save two hundred minutes per assignment, or several months out of my life if I were a professor or otherwise had to grade essays. Chances are that many teachers would use the extra minute to put more effort into commenting on the essays. Five minutes per essay is a very human decision.

    The risk is using the software to the exclusion of reading for content. I see enormous progress being made in the area of computers understanding humans, but each fantastic step we take is one of zillions we need before a computer can do the job.

  2. says

    I’m not sure about ETS’s grader, but the Pearson version is a little more astute than that. Of course, it bases its scores on a large sample of human scorers.

    So, if human scorers mark off for incorrect facts, then so will the system.

    Here is some more info on the Pearson version: http://kt.pearsonassessments.com/learnMore.php#faq

    Full disclosure, I work for Pearson and am on a project that will be using KT’s engine.

  3. jamessweet says

    Woah. So, there is a research endeavor in the company that I work for which relates to automated grading (can’t say more about it than that; it’s research!) so I was all set to come in here and defend it… but the stuff I am talking about is still stuff with objective answers. I don’t see how you could even pretend to be able to accurately assess a freeform piece of writing! AI is not even close to that!

  4. jamessweet says

    OMG thank you so much for linking me to that. It’s incredible!

    (FWIW, I am somewhat of a 5-paragraph format apologist, though I think it is kept far too rigid for far too long into students’ education. And really, it ought to be the N+2-paragraph format: An introduction, N arguments, and a conclusion. This is a sensible way to lay out a very short essay, and while N=3 is a good number, there is nothing wrong with N=2 or N=4 or N=5, and you still accomplish the same benefits of forcing students into an organized structure)

  5. jamessweet says

    In fairness to the companies involved, I do somewhat agree that a student who is sophisticated enough to employ the “beat the system” strategy here is probably sophisticated enough to write an honest essay that will score just as well. Furthermore, although I totally agree that the sorts of things the algorithm looks for encourage a certain type of formulaic writing rather than actually encouraging good writing, it’s not really any different from having the essays graded by overworked TAs: in either case, your best bet is to stick to the formula, write something overwhelmingly expected, and avoid creativity at all costs.

    With the exception that it is incapable of detecting when an essay is composed of semantic mumbo jumbo, it does seem like this thing more or less grades like an overworked TA on a short deadline: Mark off points for sentence fragments and easily-spotted bad grammar, award points for length and vocabulary showmanship, and reward essays which adhere to simple well-known formats.

  6. Mano Singham says

    Thanks for that link. It was great. It takes real skill to write nonsense and Perelman deserved a top grade, don’t you think?

  7. mnb0 says

    It’s the usual problem, isn’t it? Is the computer serving man or is man serving the computer? I can imagine e-Rater being a handy tool, but it doesn’t release teachers from actually reading those papers.

  8. ischemgeek says

    Regarding the “greedy teaching assistant” line: Speaking as a teaching assistant at a university, many of us are grad students who make ridiculous hourly wages only on paper. In fact we work far more hours than we’re paid for, and our tuition is taken out of our wages, so at least 1/3 of our pay is clawed back by our employers in tuition and fees.

  9. iknklast says

    Have you seen Alan Sokal’s hoax essay? Same idea, different medium.

    I see the same problem with grammar check on my computer; I ignore it, because I am a trained editor, but I’m amused at how it will pick out a fraction of a sentence, and mark it a fragment. Or it won’t understand different usages of a word.

    When will a computer be mistaken for a human? When it understands how to break the rules properly.

  10. F says

    Fantastic! That’s even more stupid than farming out essay grading to companies whose employees spend two or three minutes grading an essay.

    When you measure “good enough” against the worst-case effort, this is what you get in IT. And it has been normalized to this standard for a long time.

  11. Erin Garlock says

    I imagine most “classic literature” (definition open to anyone’s personal opinion, of course) would automatically fail an evaluation by the automated tools that exist today. They are especially angry when you try writing with a more Shakespearean flair.

  12. says

    Grading a paper by using an algorithm is blatantly unfair to the writer. I could see possibly using that method as a tool, but creative writing does not follow a template. For how much I’m paying for my daughter’s college, I would expect a real person to read the paper. Thank you.
