It’s time for student evaluations!

Oh, boy: our twice-a-year ritual, in which we hand out forms in our classes and let our students grade the faculty. And then, in another yearly ritual every fall, the faculty will gather and peer intensely at the numbers, presented with at least three significant digits, and we will see graphs and charts and over-interpreted analyses of these gnomic parameters.

Unfortunately, they probably aren’t as useful as administrators would like to imagine.

Michele Pellizzari, an economics professor at the University of Geneva in Switzerland, has a more serious claim: that course evaluations may in fact measure, and thus motivate, the opposite of good teaching.

His experiment took place with students at the Bocconi University Department of Economics in Milan, Italy. There, students are given a cognitive test on entry, which establishes their basic aptitude, and they are randomly assigned to professors.

The paper compared the student evaluations of a particular professor to another measure of teacher quality: how those students performed in a subsequent course. In other words, if I have Dr. Muccio in Microeconomics I, what’s my grade next year in Macroeconomics II?

Here’s what he found. The better the professors were, as measured by their students’ grades in later classes, the lower their ratings from students.

“If you make your students do well in their academic career, you get worse evaluations from your students,” Pellizzari said. Students, by and large, don’t enjoy learning from a taskmaster, even if it does them some good.

I have some reservations about this study, though. What if the Macroeconomics II professor simply shares some biases with the Microeconomics I professor, and is an easy grader? I wouldn’t want my teaching to be evaluated by how well students do in another professor’s course. That’s as scary as the arbitrary roller-coaster of student evaluations. I’ve had a few students openly downgrade me, for instance, because they know I’m an atheist, and they love Jesus so much.

But otherwise, yes, this jibes well with our general assumptions about the process: grade leniently, assign light amounts of work, and students will tend to rate you highly. (They’ll also rate you highly if you’re inspiring, enthusiastic, and entertaining, so it’s not all a drive to slackerdom.)

If you must know, my student evaluations are fine — not the highest at my university, but not grounds for concern (oh, yeah, another thing about faculty assessment of these things: apparently, we’re all supposed to be above average, which simply doesn’t work). I generally ignore the numeric scores, which are mostly pointless noise, but the written comments are often actually informative and let me know what aspects of the course I should change next time I teach it.

Also, I had my students evaluate me on Monday, so I’m saying all this after they’ve had an opportunity to hack at me a bit.


  1. billgascoyne says

    I can’t resist a couple of quotations:

    “I have learned silence from the talkative, toleration from the intolerant, and kindness from the unkind; yet, strange, I am ungrateful to these teachers.”
    Kahlil Gibran (1883-1931), “Sand And Foam,” 1926

    “If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea.”
    Antoine de Saint-Exupéry, French aviator

  2. sugarfrosted says

    And when a university does understand this they get sued by one of their lecturers so he can become essentially tenured.

  3. futurechemist says

    It’s tough. Students should be able to offer feedback, especially when they’re upset at an instructor not doing their job well. Or to praise an instructor who inspires them.

    Problem is, teacher evaluations are basically customer service surveys. Are students giving their ranking because of academic quality? Or because of factors beyond the instructor’s control? Or student frustration that even if 90% of students got As in high school, they can’t all continue to get As in college?

    Even the open-ended comments need to be taken with a grain of salt. I’ve had students complain about what time the course was offered and when the final exam was scheduled. I don’t have a dual appointment in the registrar’s office; I have no control over teaching an 8am class or having a final exam on the Friday of finals week. But if those students also gave me lower numerical rankings because of scheduling, then I’m screwed, because only the numerical scores are looked at by the administration.

    Also there’s sexism. I’ve heard from female colleagues that they will get evaluation comments about their appearance or attire. I’d hate to think that a female instructor was getting low ratings, potentially affecting her career, because her students don’t think she’s “pretty” enough.

    At least my institution does make statistical corrections for trends. For example, instructors tend to get lower evaluations if their courses are lower-level, large, and required for students outside the major, and higher scores if they teach small, upper-level electives. Which kind of sucks for people teaching courses like BIO 101.

  4. Mobius says

    One of my favorite professors in grad school once told me he got a student evaluation that complained of “too many testes”.

  5. Rich Woods says

    In my dim and distant student days, the inspectors came in to assess the computing courses at my college. They sat in on several lectures and tutorials, looking serious while making copious notes.

    After a week they also handed out course evaluation forms to all the students. At the end of a plethora of boxes available for ticking and digits waiting to be circled, there was one free-form question: “What, in your opinion, makes a good lecturer?”

    Naturally we conferred amongst ourselves before answering. I never did find out how they dealt with the most common response: “Has a beard and smokes a pipe.” It was sufficient information to identify the person who was without doubt the best teacher I’ve ever been fortunate to have.

  6. numerobis says

    probably aren’t as useful as administrators would like to imagine.

    I suspect you don’t have the same evaluation criteria. You want to teach. The administration wants to be able to fire troublemakers. Evaluations help with the latter.

  7. dianne says

    Similarly, hospitals that do well on patient satisfaction evaluations have worse outcomes than those that do worse. It turns out that concentrating on making people happy rather than doing what needs to be done for them isn’t good for anyone.

  8. blank says

    Informally, we’ve found that asking students “What teaching techniques do you like best?” vs asking them, “What teaching techniques best supported your learning?” can give startlingly different results.

    If you as an instructor want specific feedback on your class, you’re more likely to get what you want by administering your own evaluations to your students. Many LMSs have a setting where students can fill out a survey or assignment and get token extra credit while you get the feedback anonymously. In my current grad seminar, we have the luxury of devoting a significant chunk of time on the last day to collecting anonymous written feedback, some of which we are explicit about and some of which is under the guise of reflecting on the course, like “Which ideas of this course will stick with you after graduation?”

  9. says

    These are essentially measures of leniency. Students like teachers who don’t make them work hard and hand out good grades. That’s basically it. Student evaluations are a major cause of grade inflation, and that’s about the only effect they have.

  10. naturalcynic says

    Outing myself, I was the “publisher” of the unofficial [i.e. student government sanctioned and proudly NOT admin. sanctioned – only tolerated] course/faculty evaluation at Berkeley back in the days when things were, uh, really interesting. [Meaning that tear gas was far more commonly encountered] We totally understood the pitfalls of giving letter grades for this kind of thing, but we did it anyway, because of the demand. What we really appreciated, as writers, was giving students the chance to do a little creative writing about the material in the course and the profs. We published the best samples of the comments as the main feature of the evaluation, along with the impressions of the writer, if he/she had taken the course or had had the prof previously in a different course. It was understood that this was mostly, but certainly not totally, serious. Example:

    Dr. K’s lectures are like a dumptruck full of fertile soil laced with quality manure. It always seems to be too much, but if you make the necessary effort to really appreciate the material, you will flourish and bloom.

    Incidentally, Dr K was my faculty adviser and I was taking the course that quarter. He knew that I included that quote in the course description – and the quote was apt.

    Now it seems to be too serious and quantitative.

  11. says

    One of the questions on our standard student evaluations is about the quality of the room. Always seemed kind of irrelevant to my performance.

    If I had my way (I don’t), I’d throw out the numerical stuff altogether and just ask a couple of general questions: What did you learn this term? What helped you learn the most? What interfered with learning? I’ve always appreciated the written comments, but the checking a box stuff…nope, mostly useless.

    But that’s what most of my peers use to assess the faculty.

  12. says

    One of the things I hated most about faculty evaluations was reporting averages. The evaluations my university used were strictly qualitative and weren’t constructed well enough to turn them into a single scalar. My evaluations for a capstone-like course were always bimodal, which was interesting and informative for some of the questions. Unfortunately, an average of a bimodal distribution doesn’t mean much, and a standard deviation for a sample size of 15 isn’t exactly useful. During reappointment and tenure we were expected to write about how the course evaluations clearly indicated that we were awesome at our jobs. I’m much happier with my industry evaluations, which basically take the following form:

    1) How’s it going?

    2) Any problems?

    3) Here are the things we think you should address to make you happier and better at what you do. Do you agree?

    4) Clearly you don’t take enough vacation, when are you taking more?

  13. says

    I’m on both ends of this process as a lecturer and someone who is responsible for teaching quality across a department. I think it’s well understood that student evaluations are one form of evidence about teaching quality among many, and aren’t definitive. We tend to complement them with peer evaluations, reviews of materials and plans and a variety of other measures.

    I think that study from Bocconi University is intriguing but subject to some of the same projections and biases that we’re also seeing in some of the comments here.

    Yes, it’s possible to get good evaluations by teaching an easy and unchallenging class… although I’d argue that convincing students that you’re engaged is also necessary, and it’s a tough ask to pull off both simultaneously. It’s also possible to get good evaluations by teaching a very tough, challenging, but engaging and transformative course.

    The bottom line is, student evaluations are *too blunt an instrument to tell the difference*!

    So, I’m trying to manage both up and down: help colleagues work on the actual quality of their teaching, and help administrators use the results for what they’re actually, validly usable for and not more.

  14. wcorvi says

    When I taught at U. of Northern Iowa, there were thirty questions on the faculty evals, usually word association: a single word, rated from strongly agree to strongly disagree. One of them was ‘liverality’. I didn’t know what that meant, so I looked it up in the dictionary; it wasn’t there.
    I asked my department chair, and he didn’t know; he asked the dean, who also didn’t know. It turned out to be a typo for ‘liberality’. Students had been evaluating profs on their liver for many years, and no one had even questioned it! Shows how little the results mean, to students OR administrators.

  15. garnetstar says

    I must agree: your numerical evaluations reflect your gradebook. My colleagues and I, when we’re teaching the large-enrollment (300 students each) sections of the same course, have often amused ourselves by comparing our average grade (converting A–F grades to a 1–5 scale) with the students’ average score on “How good was this teacher?”, where they rate us from 1 to 5. The two always correspond closely. And since we’re doing all this over the entire course enrollment of more than 2000 students every semester, it’s rather statistically compelling.

    It’s asking too much of any human to sit down and think “I didn’t do well in this class, but the teacher was great! She was interesting and enthusiastic and worked hard to teach us. My failure must be all my fault, so I’ll give her a good rating.” That question should be left off. The one that should be looked at most is “How much did you learn in this class?” It’s not that good either, but more telling.

    I do experience the bias in ratings that female professors get: studies show that it’s about 14% lower, at least, for women professors, and worse in large classes. And, I’m afraid that the comments are rarely valuable: one of my favorites was “Professor Garnetstar is a professor at X university. She should dress like one.” And it’s downhill from there, with remarks on every aspect of my appearance and body.

    The rest of the comments, those that aren’t on my body, are one-sentence remarks like “This class was too hard.” Not useful. I’ve gotten to the point where I don’t even read the 600 comments, as there have been times where not one useful one has been written.

    The department, of course, focuses on the answer to only one question, that being “How good was this teacher?”, which, as I say, corresponds mathematically to the gradebook. And they don’t even calculate a weighted average, just a simple one. So almost everyone ignores their ratings completely.

    The system needs to be re-thought.

  16. Okidemia, fishy on the shore term, host reach in the long run says

    PZM #12

    One of the questions on our standard student evaluations is about the quality of the room. Always seemed kind of irrelevant to my performance.

    Having a room that’s neither too small nor too big, that’s wired, and that has video equipment that actually works is nevertheless important to being able to teach the class in the first place… :) These are some of the challenges I encounter from time to time, but students understand that we all have to deal with bad luck or room-management mishaps. It still affects my performance, though (I’m external to the university, so I have to fit everything into the class time, including any delays).

    davidgeelan #14

    It’s also possible to get good evaluations by teaching a very tough, challenging but engaging and transformative course.

    I don’t know if I’m still appreciated after exams, but I like a challenging approach to evals. I always tell the students that they will have a hard time, but I always give them survival advice for exam snafus. I think I tend to grade quite generously afterwards, though. Also, they know that my colleagues and I will deliberate together, and the result will always be the same: bad, average, and good grades.

    The real evaluation comes not from students but from colleagues. I can’t ignore it when they tell me that my exams may be a bit too awkward and tricky. The last time, they really reacted to one of the questions: “Why don’t we ask you to calculate this?” Yeah, if that’s challenging for other teachers, it may be just as challenging for students. :)

  17. Becca Stareyes says

    Here at the university I work at, faculty can request a Midterm Chat, where someone from the teaching center comes in and talks to the class about how the instructor is doing. There are several neat things about this: it happens in the middle of the term; having someone guide the discussion means you get actually helpful feedback; students get a sense of what their peers think; and one of the discussion questions is about what students themselves need to do to do better (which also helps filter out the unessential bits, like the fact that some of the rooms we teach in can be saunas for the first half of fall term*).

    The downside is that it takes a full class period (rather than ‘let me leave for 10 minutes while someone delivers these to the department office’), and it doesn’t produce an easily compared number. Also, it’s voluntary, but I like doing them as a check-in.

    * Which is true, but there’s nothing I can do about it.

  18. zoniedude says

    Pardon my pedantry, but re: “apparently, we’re all supposed to be above average, which simply doesn’t work”

    It is quite possible mathematically for nearly all evaluations to be above average. This shows up in Martin’s Paradox, whereby as more and more students score higher on tests, the average rises and thus more and more students score below average. Only in this case your system wants more and more professors to score above average, which simply requires more and more really bad evaluations for some professors; those drag down the average, so more and more professors score above it. In fact, a few professors with abysmal evaluations could lower the average to where nearly all the other professors score above average.
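    A minimal Python sketch of that last point, using made-up numbers rather than any real evaluation data: nine hypothetical professors rated 4.0 on a 1–5 scale and one rated 1.0. The function name `share_above_mean` is purely illustrative.

```python
def share_above_mean(scores):
    """Return the arithmetic mean of the scores and how many lie strictly above it."""
    mean = sum(scores) / len(scores)
    above = sum(1 for s in scores if s > mean)
    return mean, above


# Hypothetical 1-5 ratings (not real data): nine professors at 4.0, one at 1.0.
scores = [4.0] * 9 + [1.0]
mean, above = share_above_mean(scores)
print(f"mean = {mean:.1f}, {above} of {len(scores)} professors above average")
```

    Running it prints `mean = 3.7, 9 of 10 professors above average`: a single abysmal score is enough to put everyone else above the mean.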

  19. otrame says

    I remember fondly a student, sitting at a desk in a lab we worked in, scowling ferociously at some papers. I said, “What’s up?” and he snarled, “This son of a bitch is trying to make me think.”

    One of the best professors I ever had did a Primate Behavior course that kicked the everliving shit out of every one of us. And made us like it. We loved him. Naturally the university fired him at the end of term.