What student evaluations of teaching are—and aren’t—good for
Watch the video of Philip B. Stark’s presentation (includes presentation slides)
On April 26, 2018, Philip B. Stark visited SFU as part of the Teaching Assessment Working Group (TAWG) Speaker Series to argue that “Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness.”
Stark, a professor of statistics and associate dean in the Division of Mathematical and Physical Sciences at the University of California at Berkeley (UC Berkeley), began his talk by presenting evidence that student evaluations are biased by gender, ethnicity, appearance and a host of other factors to such an extent that they are essentially meaningless as indicators of teaching effectiveness.
He outlined additional concerns—from differing interpretations of seemingly straightforward terms like “fair” to the non-random distribution of survey respondents to the misleading nature of averaged scores that can hide highly polarized responses—that led him to caution against the use of student evaluations for tenure and promotion decisions, especially in isolation.
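Stark's point about averages is easy to verify with a little arithmetic. The illustration below uses made-up ratings (not data from his talk): two classes with identical mean scores on a 1-to-7 scale, one with uniform responses and one sharply polarized.

```python
# Hypothetical illustration (not Stark's data): two classes with the same
# average rating but very different response distributions.
from statistics import mean

# Class A: every student rates the course a 4.
class_a = [4, 4, 4, 4, 4, 4]
# Class B: responses are split between the extremes.
class_b = [1, 1, 1, 7, 7, 7]

print(mean(class_a))  # 4
print(mean(class_b))  # 4 -- identical average, opposite experiences
```

A bar chart of the full distribution, the reporting practice Stark's department adopted, makes the difference between these two classes visible where a single averaged number hides it.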
Interestingly, however, he did not entirely reject the idea of soliciting student feedback on teaching. Instead, he emphasized the need to think carefully about what feedback should be gathered and how it should be used.
What we should—and shouldn’t—be asking
“Shouldn’t students have a voice in evaluating teaching?” asked Stark. “Absolutely. The question is, what’s the appropriate voice, what are they in a position to judge, what kind of information can they provide to inform better teaching?”
He offered a number of possible answers to those questions.
“It’s okay to ask about students’ subjective experience of class … ‘Did you find the class challenging? Did you enjoy the class? Could you read the instructor’s handwriting?’ ”
But other queries are inappropriate, he said, simply because students are not qualified to respond.
“You should avoid abstract questions, omnibus questions and things that require judgement.
“It’s the omnibus, abstract questions that tend to be most malleable, that tend to be most subject to … biases. So asking students a question about ‘Overall, how effective was the instructor?’ is an invitation for biases to be the driver of the answer.”
Other ways to evaluate teaching
Stark’s presentation took place within the context of TAWG’s inquiry into better ways for the university to assess and value teaching.
In his conclusion, and during the question-and-answer session that followed, Stark offered some thoughts on alternative and complementary approaches for teaching evaluation.
“My feeling is that it’s a lot easier to measure inputs than outputs. We can measure effort and engagement—just how seriously is the person contributing to the teaching mission of the university, keeping curriculum up to date, trying to improve, revising the curriculum, developing new courses, supervising undergrads, mentoring, … things like that.”
At UC Berkeley, he observed, several departments including his own have implemented peer observation of teaching. The UC Berkeley Senate has also recommended the use of teaching portfolios.
Stark’s department is “de-emphasizing” the role of student evaluations and employs a number of practices intended to reduce misinterpretation and misuse of evaluation results.
“We have banned the use of averages [in survey reporting] completely. Instead, we are displaying the data in bar charts. We are reporting on response rates and we are putting in caveats on the interpretation.
“We are in the process now of revising the student evaluation items to avoid items that call for judgement.”
Stark acknowledged that no single solution is perfect. In combination, however, these options could provide a more complete, and more accurate, picture of teaching effectiveness.