I must be a lousy teacher

Because my student evaluation of teaching (SET) scores are pretty good. Not the best, but OK. And SETs are terrible ways to assess teaching.

These kinds of evaluations are ubiquitous in the US university system, and they kind of drive me crazy: we’re expected to report the details of these numerical scores in our annual reports, I’ve been in meetings where we drone on about the statistics of these things, and of course everyone is expected to get above average scores on them. Personally, I find them totally useless, have no idea how to get a number 5 to a number 6, and basically ignore (except when making my yearly bureaucratic obeisance) the trivial five-question numerical, so-called “quantitative” part of the student evaluations. What is far more useful are the short comments students get to make on the form: those actually tell me what parts of the class some students disliked, and what parts they found memorable and useful.

I’m not alone. Others find them useless, for good reasons.

There is one important difference between customer evaluations of commercial and educational service providers. Whereas with commercial providers ratings are unilateral, ratings are mutual in the education system. As well as students evaluating their teachers, instructors evaluate their students – such as by their exam performance. In US studies, these ratings have been found to be positively correlated: students who receive better grades also give more positive evaluations of their instructors. Furthermore, courses whose students earn higher grade point averages also receive more positive average ratings.

Proponents of SETs interpret these correlations as an indication of the validity of these evaluations as a measure of teacher effectiveness: students, they argue, learn more in courses that are taught well – therefore, they receive better grades. But critics argue that SETs assess students’ enjoyment of a course, which does not necessarily reflect the quality of teaching or their acquisition of knowledge. Many students would like to get good grades without having to invest too much time (because that would conflict with their social life or their ability to hold down part-time jobs). Therefore, instructors who require their students to attend classes and do a lot of demanding coursework are at risk of receiving poor ratings. And since poor teaching ratings could have damaging effects at their next salary review, instructors might decide to lower their course requirements and grade leniently. Thus, paradoxically, they become less effective teachers in order to achieve better teaching ratings.

The article goes on to show that by several criteria, what student evaluations actually assess is the easiness of a course, and how little the students are challenged by the material.

There’s more to it than that, of course. My campus has a lot of faculty who have won teaching awards, and we have a reputation for being demanding and resisting the trend towards grade inflation, and I know many of them are getting their high SET scores by being engaging and enthusiastic and making students think. Those are important aspects of teaching. But we ought to also be measuring faculty effectiveness at teaching the material, and those little forms don’t do it.

Because student ratings appear to reflect their enjoyment of a course and because teacher strategies that result in knowledge acquisition (such as requiring demanding homework and regular course attendance) decrease students’ course enjoyment, SETs are at best a biased measure of teacher effectiveness. Adopting them as one of the central planks of an exercise purporting to assess teaching excellence and dictating universities’ ability to raise tuition fees seems misguided at best.

Now throw in the fact that SETs are systematically biased against women faculty and that students tend to downgrade minority faculty (they are reflecting cultural biases all too well), and you’ve got a whole grand tower of required makework that doesn’t do the job, and also reinforces trends that we all say we oppose.


  1. shakeb says

    Only time I remember ever having a strong opinion filling out those reviews was on an otherwise good TA whose accent was thick enough he was hard to understand, and the survey asked a directly applicable question on it.

  2. says

    My current defense against bad SETs: before student evaluations are made, I try to make it explicitly clear to each of them exactly what grade they currently have in the course, and how much the final (which they haven’t taken yet) might make a difference. What I’ve often found is that students who are struggling in the course typically have an overly pessimistic view of their performance. I’ve lost track of the number of times I’ve told a student their standing in the course, and they say “What? I’m passing?” There are many students who find uncertainty stressful, and even knowing that they’ve got a C is a relief.

    And then there are the students who are pissed off that they’re only getting an A-…but they’re much fewer in number, at least.

  3. says

    It’s a measure of how easy the course is and how liberally you grade. That’s it. This is what has driven grade inflation. University students just think that they (or daddy) are paying for a degree, and they expect to get one for their money. If you’re making it difficult, that’s unfair because they paid for it.

  4. microraptor says

    I absolutely detested filling these things out in college. Heck, sometimes I didn’t even bother.

  5. naturalcynic says

    What is far more useful are the short comments students get to make on the form:

    Absolutely. As the writer, editor and publisher of the SLATE Supplement to the General Catalog – the wholly student-run course evaluation at Berkeley in the late 60s and early 70s – I found the comments were almost always more informative than any letter grade that students could give to the teacher. And that was the bulk of the write-up for courses and instructors. And it was a lot more fun to read them and compile them into the form of our reviews. It was always my impression that comments that were obviously thoughtful and often humorous were more of a help than simply some quantitative [we used letter grades] scale with no comments. Comments seemed to be from someone who was more engaged.

    …and of course everyone is expected to get above average scores on them.

    That’s what you would expect at the University of Lake Wobegon.

    And SETs are terrible ways to assess teaching.

    And what is?

  6. Bill Buckner says

    They should be greatly simplified. Instead, over the years, they have become increasingly complex. Big bucks are spent to analyze and normalize. Futile attempts are made to adjust them based on the course goals, which leads to gaming the system.

    They should ask:
    1) Was the instructor prepared?
    2) Did the instructor display mastery of the material?
    3) Did the instructor grade promptly?
    4) Was the instructor readily available during posted office hours?
    5) Were the grading policies clearly explained on the syllabus?

    That’s enough. Plus a section for written comments.

  7. jeffj says

    Part of my job is dealing with SETs at a university, and I have personally confirmed the results of the gender bias study. Female instructors’ scores are a tenth of a SD lower, which is about the same penalty required courses get compared to electives. Upper year courses get waaaaay better scores than introductory courses.

    SETs are worse than useless, and I say so to everyone with a bit of influence on the process. So many hours are wasted in committee meetings as faculty debate things like the relative merit of the same question worded in slightly different ways. Meanwhile we get paper forms where the student has filled in the bubbles to spell out an obscene word, or (even worse, as the scanner chokes on responses with more than one answer per question) a zig-zag pattern.

  8. says

    I find my SETs very useful. Not so much to compare me to other people, but primarily to compare me to me. The first year I taught in our elementary biology sequence (1968, if you must know) I was delighted to see that I had scored in the upper 5% of teachers. Then I read more closely and realized that I had misinterpreted the ratings — I was actually in the lower 5%. They hated me. So I worked hard on it, and by the time I had finished teaching in that course, 10 years later, I had gradually risen in the ratings until I was in the upper 25%. I am sure that this helped the students get something out of my teaching.

    We also rate teachers in 18 categories. Some are meaningless to me, but many are useful. If I look at which categories I do best in, and which I do worst in, they remain depressingly the same from course to course and from year to year. Which motivates me to try harder to do better in those particular areas.

  9. Bill Buckner says


    Ha! Somehow your story reminds me of the first time I substituted for another professor. After the class several students came to me and said “I wish you were teaching this course.” It was a little embarrassing, but made me feel good. Until I found out that when someone substitutes for me, they get the same comments.

    Students: the university would be perfect if they’d just go away!

  10. lesherb says

    Plus Trump uses the glowing praise Trump University (sic) received (in post-course surveys) to counter the lawsuit presently in the works against his nonsensical money-grabbing venture. Perhaps bringing that to the attention of those in charge might aid in ending them in your school, PZ?

  11. garnetstar says

    I’m afraid that it is the case that, statistically speaking, your numerical teaching ratings reflect your grade book (except for women or minority teachers, who receive lower ratings than their gradebooks), and I’ve got the data to prove it.

    I teach two large sections, ca. 600 students total, every semester. There are about 6 – 7 sections total of this class offered every semester, about 2000 students every semester. It has amused my colleagues and me to plot our gradebooks vs. the numerical ratings we get, and note the close correspondence. Every semester.

    And we’ve done it for some years now; at 2000 students/semester, that’s a lot of data. It’s a pretty good correlation. (The university, however, is not yet persuaded.)

    So, one can work to improve one’s teaching, we all do, and it’s not at all necessarily reflected in our ratings. But, there is one certain way to get higher ratings, and that’s to make the customers happy. So, many professors do that.

    Last fall, I had a record number of A’s (not A-, A’s): 22% of the grades were 94% or above. That was by far the most-common grade. Then came the three-week winter break, and I taught the same class again this spring. Same lectures, same procedures and policies, same grade scale, same old me: there wasn’t time to change much over three weeks. And, this spring, the most-common grade was an F: 18% were <60%.

    The reason is that many of the spring students are those who very deservedly failed the class in the fall (not necessarily from my sections), and those who failed it because of laziness, entitlement, or lack of responsibility did not shed those traits over the three weeks.

    Guess how much my ratings changed from my being a really excellent professor in the fall, to a really bad one in the spring!

  12. gijoel says

    What I hate is that the comments you make are completely ignored, so when you get the same terrible lecturer they make the same awful, droning mistakes as they did before.

  13. chris61 says

    I wonder if student evaluations are more accurate for courses in fields like medicine and engineering where students are motivated to learn (and not just get good grades) because they’ll have to take licensing exams.

  14. blbt5 says

    I found student evaluations to be quite useful for two reasons: 1. They gave students a chance to have their say, which is essential to any healthy relationship, in this case student-teacher; in industry, this is part of quality assurance. 2. The comments and data allowed comparison of my teaching results not only against my peers but against other institutions. I found over many years that if your testing and grading system is useful, fair and transparent and if you care about your subject and put time and continuing improvements into your presentations, you will get at least an average grade. I know it’s brutal but I was quite proud of my slightly above average grade, as it reflected a careful balance of fairly difficult material in an accelerated timeframe to a fairly naive audience (Biochemistry for Nursing students). Extremely poor evaluations over several years are fair warning for students to look for alternative professors.

  15. davidrichardson says

    I used to work at an industrial training centre in Sweden where we had two sorts of welding courses, taught by the same teachers. The first one was ordinary welding on mild steel and the second one was MIG welding. In each case the participants would arrive on a Monday and spend the day registering and going through things like safety procedures. On Tuesday the participants on the first course would be in the workshop welding things … whilst the participants on the second course would have to sit behind a desk in a classroom learning things like Maths and Chemistry (MIG welding requires a bit more specialist knowledge). Guess which course got the better reviews!