Steve Volk, November 28, 2016
Among the relatively few rules that govern what we do in the classroom and how we do it is the requirement that all teaching faculty hand out evaluation forms “near the end of each semester” (College) or “before the end of each semester” (Conservatory). In the unstructured, devil-may-care past, each department (and each individual in the department) was pretty much free to design its own evaluation form, at least in the College, and I’ll just stick to Arts & Sciences here since the Conservatory has its own rules. That somewhat chaotic system, which made cross-departmental comparisons difficult since different attributes were measured and recorded on different scales, ranging from three-point to six-point, was put to rest some years ago. The current forms are designed around a standard one-to-five scale in six broad areas which the research has shown to produce (the most) valid and reliable results: 1) course organization and clarity, 2) instructor enthusiasm, 3) teacher-student interaction, rapport, and approachability, 4) workload and course difficulty, 5) assessments: exams, papers, grading fairness, and feedback, and 6) self-rated learning. We have standard rules about how they are to be distributed, collected, and returned to the faculty.
That said, there remains a lot of controversy about the value of such an exercise, ranging from those who would argue that students shouldn’t be evaluating faculty at all (by my guess, a relatively small number), to those who think that the forms don’t actually tell us much about our teaching, to those who think that they don’t tell us anything about student learning – which is something we actually should be measuring – to those who argue that the research clearly demonstrates that SETs are significantly biased against many different subcategories of faculty: women (female faculty in physics in particular), faculty of color, Asian faculty, international faculty who speak “accented” English, faculty who teach quantitative methods courses, and “less physically attractive” faculty.
There is even research suggesting that the impression your students form of you in the first week of class will essentially turn up on the SETs 14 weeks later. And let’s not forget the less rigorous studies indicating that handing out student evaluation forms along with donuts will improve the results. A word to the wise: stay away from the glazed – what a mess!
More seriously, important arguments are emerging that suggest that student evaluations of teaching are a blow to academic freedom and that it is a “folly” to use “Student Evaluations of College Teaching for Faculty Evaluation, Pay and Retention Decisions.”
Knowing all this, it is not a stretch to suggest that we need to engage in a new, research-driven conversation on student evaluations. At the very least, we should think seriously about what the bulk of the research on evaluations of teaching has disclosed: that student evaluations of teaching should be only one leg of the teaching evaluation process, a process which should include regular peer evaluation of teaching by faculty trained in such methods and following a standard, cross-college protocol, and a “forensic” examination of course syllabi by outside experts in one’s own field undertaken for reappointment, tenure, and promotion. The latter can suggest whether an instructor is keeping up with the field, incorporating new materials, retaining important “classics,” and adequately reflecting where the field is going. Since few of us know the literature in our colleagues’ areas, this is best done by faculty from other colleges and universities who teach in the same field.
But these are for future discussions. Here I will focus on how to use our current SET forms in a way that modestly preserves your sanity while helping you think about your teaching in a more productive fashion.
Do SETs evaluate teaching?
The simple answer is “not really,” or at least not fully. SETs are designed to measure student satisfaction with teaching, not whether students are learning. To be sure, there is an important relationship between student satisfaction and student learning (and, hence, faculty teaching), but it’s not a direct one. If a student finds a faculty member’s approach to be disorganized, their exams to be unfairly graded, or the readings to be insubstantial, then student learning will likely be less than it could have been. But satisfaction does not stand in for “learning,” and SETs are certainly not a measure of student learning. If you want to measure student learning in your classroom – a measurement that is not duplicated on the grade sheet, which will tell you how well students did in your class, not whether they “learned” – you need to be doing other things. But that, too, is a topic for another post.
Since SETs are about student satisfaction, they will necessarily be subjective. Unless you’re some kind of teaching god, and none of us is, the pile of evaluations you received will include some in which students classified you as the best teacher they ever had…and some indicating that the student wouldn’t be disappointed if the earth opened up and swallowed you. You will have read evaluations from students who thought you were the model of clarity and others who found the course to be a perplexing labyrinth. Turning papers back within two weeks will rank you as a “5” in some students’ opinion, and a “2” for others who expected their papers to be returned within five minutes of handing them in.
Because SETs are about satisfaction, they only “work” (i.e. produce reliable data about your teaching) on the “average,” not by focusing on single responses. If considerably more students think that your exam was fair than consider it to be manifestly inequitable, you can conclude that the exams you give are considered by your students to be fair. Nevertheless, I must quickly add here that I’m using “average” as a layperson, not a statistician. If you ask a statistician, let’s say Philip Stark, a Professor of Statistics at Berkeley, about using “averages” to rate or rank teaching, here’s what you’ll get: “Averaging student evaluation scores makes little sense, as a matter of statistics. It presumes that the difference between 3 and 4 means the same thing as the difference between 6 and 7. It presumes that the difference between 3 and 4 means the same thing to different students. It presumes that 5 means the same things to different students in different courses. It presumes that a 4 “balances” a 6 to make two 5s. For teaching evaluations, there’s no reason any of those things should be true.”
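Stark’s point about averages hiding the real story can be seen with a quick toy calculation (the ratings below are invented for illustration; the 1–7 scale echoes his example). Two sections can post identical mean scores while describing very different rooms:

```python
# Toy illustration -- the numbers are invented, not real SET data.
# Two classes can have the same mean rating on a 1-7 scale while
# describing very different student experiences.
from collections import Counter
from statistics import mean

class_a = [5, 5, 5, 5, 5, 5]  # everyone moderately satisfied
class_b = [7, 7, 7, 3, 3, 3]  # polarized: half delighted, half unhappy

print(mean(class_a), mean(class_b))  # both 5 -- identical averages
print(Counter(class_a))              # one tight cluster of ratings
print(Counter(class_b))              # two camps the mean cannot show
```

Which is one more reason to read the distribution of responses, and the comments, rather than stopping at the summary number.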
Stay with me for another moment, and you can readily see where the deeper problems with SETs are located: by focusing on norms – what is fair, for example – SETs (and those who read them) must assume what the norm is, and many researchers have noted the problems with this. Standardized testing, for example, has been shown to discriminate against black students. Even in low-stakes testing, “fairness” is often in the eye of the beholder, i.e., the person who prepared and distributed the exam. For more on this, see last week’s “Article of the Week,” on implicit bias.
All this said, on a broad level SETs can help to identify outliers – a class which seems to have been overwhelmingly successful or particularly troubled.
Handing Them Out, Getting Them Back
College rules generally state that SETs are to be handed out in class near the end of the semester. There are new rules governing online student evaluations, and OCTET and the Dean’s office can give you advice in that regard. Since most departments and programs leave it up to the faculty member to decide exactly when to hand them out in the last two weeks, you’re free to pick a time that works for you. So here’s a simple question: do you really want to hand out student evaluations right after you return a graded paper or an exam? Right after a class that went off the rails? It probably won’t change the results, but think about distributing them at a moment that feels right for you, and usually not the very last class of the semester.
Let students know what SETs are used for: that they should take them seriously, that the forms measure student feedback in six areas which the research has shown to produce valid and reliable results, and that they are used in college personnel decisions regarding salaries and promotions. Then leave the room, having designated a student who will collect them, put them into the big envelope you have been provided, and deposit them with the departmental administrative assistant.
Then put them out of your mind.
Because of college rules, which are likely similar everywhere, you will receive your teaching evaluations back only after your grades are filed. So, at some point in January or June, after our hard-working AA’s have tabulated and organized the data, we find out that our SETs are ready to be picked up! And this is where you can make some decisions.
First decision: do you rush in to get them, play it cool, like a cat walking around a particularly lovely kibble before pouncing, or pretend that they aren’t there until, sure enough, you have actually forgotten all about them? I usually take the middle route on this, but, in any case, I certainly won’t pick them up on a day when the most prestigious journal in my field has just rejected the article I had been working on for an eternity, nor will I get them right after an unnamed President-elect has just nominated Attila the Hun for a cabinet position. Another hit that day, I just don’t need.
When I finally make the move, I’ll take the forms to my office, put them on my desk, and pretend that they aren’t there while I read through my Facebook posts for the last six months. Enough, already. I open the folders and read, rapidly, the overall numbers: not what I hoped for, better than it could have been, whatever… Then I put them away for at least a day or two. I don’t think I’m ready to take them on board just yet, whether the numbers are good, bad, or indifferent. I go back to my email, the article, the gym, until I feel mentally prepared to explore the terrain a bit more carefully.
When I do return to my evaluations, I give myself the time to read them carefully – and usually privately. I don’t pay much attention to the individual numbers – those have been summarized for me, but I read the comments with care… and a mixture of interest, confusion, skepticism, and wonder. How is it that the student who wrote “he is probably the most disorganized professor I’ve ever encountered” attended the same class as the one who commented, “This was a marvel of organization and precision”? What is one to make of such clearly cancelling comments?
Here are a few tricks for trying to give student teaching evaluations the kind of close reading that they merit, neither overestimating their importance nor discounting what they may have to tell us:
- Don’t dwell on the angry outliers. That’s advice more easily given than taken. I have read enough teaching evaluations, my own as well as those of others, to know that there are some students who just didn’t like our classes and have not figured out any helpful or gracious way to say that. The fact that these are (hopefully) a tiny minority and are directly contradicted by the great majority of other comments doesn’t seem to decrease their impact, or the fact that we continue to obsess about them. (I can still quote, verbatim, comments that were written in 1987!) These bitter communiques probably serve some purpose for the student, but they really don’t help us think usefully about our teaching. Be like the Vikings: send them out to sea in a burning boat.
- Evaluate the “cancellers”: the cases when half the students thought the class was paced too fast and the other half too slow. These are harder to deal with and can add to the cynicism of those who think that the whole SET adventure is a waste of time. For the “cancellers,” I try to figure out a bit more about them to see whether they represent some legitimate (i.e., widespread) concern about the class or not. Is one side of the debate generally supported by the numbers? Do I score lower in the discussion-oriented questions than in other areas, lower than in previous iterations of the course, or lower than I would have really wanted? Does the demographic information provided by the student add context that is useful and that I should take on board? I am more likely to trust comments from seniors than from first-years, for example. I pay attention to comments that suggest a striking gender or racial difference in terms of how students respond to specific questions. These data are extremely important and are why we (generally) ask for demographic information on SET forms. A careful reading of this information can help us understand what is going on in our classes on a more precise level. And, if none of the above helps me think about why something I have done works for some and not others, I make a note to myself to ask students explicitly about it the next time I offer the class.
- Focus on those areas that seem to be generating the greatest student concern. Are they having a hard time trying to figure out how the assignments relate to the reading? Do a considerable number worry that they aren’t getting timely or useful feedback? Is there a widespread upset that every class runs too long and students don’t have enough time to get to their next class? For each of the areas where I find a concern that has reached a “critical mass” level and is not just an angry-outlier grievance, I consider what I think about their criticism and whether, given my own goals in the course, I find it legitimate. For example, getting work back on time depends on the size of the class and what I have promised: in a 50-person class if I say I’ll return work within two weeks, and then do so, I won’t worry about students who complain that I only returned their work two weeks after they turned it in.
- Other issues force me to think more about how I teach and what impact that has on student learning. What of students who protest that “there’s too much work for a 100-level class”? I have gotten a lot of those comments, and they make me wonder why students think a 100-level class should involve less work than a 300-level class. Do we, the faculty, think that a 100-level class should assign less work than a senior seminar? Certainly, upper-level classes will be more “difficult” than 100-level classes: they demand that the students have acquired significant prior knowledge and skills needed to engage at a higher level. But should there be any less work involved in the entry-level class? Since I don’t think so, I wouldn’t change that aspect of the course even if the students complained.
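For the demographic breakdowns mentioned above, even a quick tally can surface a pattern worth investigating. Here is a minimal sketch with invented field names and toy responses (real SET data would come from your department’s tabulations, and real conclusions would need more than a handful of forms):

```python
# Hypothetical sketch: grouping one SET question's scores by a
# demographic field from the form. Field names and data are invented.
from collections import defaultdict
from statistics import median

responses = [  # toy data standing in for one class's forms
    {"year": "first-year", "pacing": 2},
    {"year": "first-year", "pacing": 3},
    {"year": "senior", "pacing": 5},
    {"year": "senior", "pacing": 4},
]

by_year = defaultdict(list)
for r in responses:
    by_year[r["year"]].append(r["pacing"])

for year, scores in sorted(by_year.items()):
    print(f"{year}: median {median(scores)} (n={len(scores)})")
```

If a gap between groups persists across semesters, that is the kind of signal worth acting on; a single semester’s small numbers rarely are.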
But, ultimately, when student comments suggest what appears to be real areas of concern, when they point to something I am doing in the class that negatively impacts student learning, then I need to regard that issue with the seriousness it deserves. I will think about how I might correct the problem, and, often, the best way to do that is to talk to my colleagues and find someone in my department or outside who can read my SETs with me. That has served me well every time, and it does point to the ultimate utility of SETs for the individual faculty member on a formative level: they can help us to design our teaching to more effectively promote student learning.
What’s Not on the SET?
SETs in the College are geared around six different response areas, and those need to be addressed. But you needn’t be limited by those if you want to add other questions or address other concerns. There are two areas not covered in the SETs that I’ll briefly introduce. The first has to do with diversity and inclusion. There is nothing to stop you from adding a question or set of questions in this area:
Are there parts of the course that you felt could have been more inclusive? If so, please be explicit about the ways you felt I could have included more diversity in the course. Has any aspect of the course or my teaching disclosed a bias that has impacted your learning, or the learning of others in the course that you have witnessed? Do you have any concrete suggestions for ways that I can improve the classroom environment to encourage more inclusion?
The second area has to do with questions that might better get at student learning (as opposed to student satisfaction). Linda Shadiow and Maryellen Weimer, writing in Faculty Focus last year (Nov. 23, 2015), suggest questions that can help foreground student learning issues, offering a set of fairly simple sentence stems for students to complete. For example,
- It most helped my learning of the content when…because…
- It would have helped my learning of the content if…because…
- The assignment that contributed most to my learning was…because…
- The reading that contributed the most to my learning was…because…
- The kinds of homework problems that contributed most to my learning were…because…
- The approach I took to my own learning that contributed the most for me was…because…
- The biggest obstacle for me in my learning the material was…because…
- A resource I know about that you might consider using is…because…
- I was most willing to take risks with learning new material when…because…
- During the first day, I remember thinking…because…
- What I think I will remember five years from now is…because…
These questions can be added to the current SETs that you will be handing out. Include an additional sheet with these questions which, still anonymously, can be returned directly to you rather than being tabulated by the department AA’s or becoming a part of your official file. Take a look at these responses before preparing classes for next semester.
SETs are highly problematic, but they are here at least for the present, so it’s wise to think about how to use them to best effect.