The problem with grades and other summary evaluations

In previous postings (see here and here), I discussed why college rankings vary so much depending on who does the survey. One of the reasons is that different criteria are used to arrive at the rankings, making it difficult to arrive at apples-to-apples comparisons. In this posting, I will discuss why I think that rankings may actually be harmful, even if the measures used to arrive at them are good.

The main problem with rankings is that they require a single summary score obtained by combining scores from a variety of individual measures, and people seem to focus exclusively on that final score and pay little attention to the scores on the individual measures that went into it.

This is a general problem. For example, course evaluations by students of their teachers usually contain many questions that ask students to evaluate their teachers on important and specific issues, such as whether the teacher encourages discussions, is respectful to students, etc.

But there is usually also a question that asks students to give an overall evaluation of the teacher. When such a question exists, the people who usually read the results of the surveys (students, teachers, and department chairs) tend to focus almost exclusively on this summary score and pay little attention to the other questions. Yet it is the other questions that provide useful feedback on what kinds of actions need to be taken to improve. For example, a poor score on “encouraging students to discuss” tells a teacher where to look to make improvements. But an overall evaluation of “good” or “poor” for teaching does not tell the teacher anything useful on which to base specific actions.

Teachers face the same problems with course grades. To arrive at a grade for a student, a teacher will make judgments about writing, participation, content knowledge, etc. using a variety of measures. Each of those measures gives useful feedback to the students on their strengths and weaknesses. But as soon as you combine them into a single course grade using a weighted average, then people tend to look only at the grade, even though that really does not tell you anything useful about what a student’s capabilities are. But teachers are required to give grades so we cannot avoid this.
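To make the point about weighted averages concrete, here is a minimal sketch. The component names, the weights, and the two students' scores are all made up for illustration and are not taken from any real course. It shows two students with quite different strengths and weaknesses ending up with exactly the same course score.

```python
# Hypothetical weights (in percent) for three graded components.
# These numbers are assumptions chosen only for illustration.
WEIGHTS = {"writing": 40, "participation": 20, "content_knowledge": 40}


def course_score(scores, weights=WEIGHTS):
    """Combine component scores (0-100) into one weighted average."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def weakest_area(scores):
    """Return the component with the lowest score: the detail the average hides."""
    return min(scores, key=scores.get)


# Two made-up students with very different profiles.
student_a = {"writing": 95, "participation": 60, "content_knowledge": 75}
student_b = {"writing": 70, "participation": 90, "content_knowledge": 85}

for label, scores in (("A", student_a), ("B", student_b)):
    print(label, course_score(scores), "weakest:", weakest_area(scores))
```

Both students come out with the same summary number, yet one needs to work on participation and the other on writing. Only the individual scores reveal that.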

I often hear faculty complain that they give extensive and detailed feedback on students’ written work, only to see students take a quick look at the grade for the paper and then put it away in their folders. Faculty wonder if students ever read the comments. I too give students a lot of feedback on their writing and have been considering the following idea to try to deal with this issue. Instead of writing the final grade for the paper on the paper itself, I am toying with the idea of omitting that last step and asking the students to estimate the grade that I gave the paper based on their reading of my comments. I am hoping that this will make them examine their own writing more carefully in the light of the feedback they get from others. Then when they have shared with me what grade they think they got and why, I’ll tell them their grade. I am even willing to change it if they make a good case for a change.

I am a little worried that this process seems somewhat artificial, but perhaps that is because it is not common practice yet and anything new always feels a little strange. I am going to try it this semester.

Back to college rankings: they can be harmful for another reason, which is that the goals of a school might not mesh with the way that scores are weighted. For example, the US News & World Report rankings take into account incoming students’ scores on things like the SAT and ACT. But a school that feels that such scores do not measure anything meaningful in terms of student qualities (and a good case can be made for this view) might wish to look at other things it values, like creativity, ingenuity, citizenship, writing, problem solving, etc. Such a school is doomed to sink in the USN&WR rankings, even though it might be able to provide a great college experience for its students.

I am a great believer that getting useful feedback, in whatever area of activity, is an excellent springboard for improving one’s performance and capabilities. To get such feedback, one needs clear criteria and targeted, valid measures of achievement. But all that useful information can be completely undermined when one takes that last step and combines these various measures in order to get a single score for ranking or overall summary purposes.
