While I was getting my Ph.D., I taught almost every semester. I was a head TA for all but one semester, which meant that I managed a team of TAs and worked closely with the course instructors but didn't do any classroom teaching. I taught introductory electricity and magnetism the entire time, and as I saw different professors and different students go through the material, I noticed that our course really wasn't designed to maximize how well students learned the material. This surprised me, as I was at an institution that is consistently ranked highly by university rating agencies.
I'll borrow a concept from Wait But Why about how surprising technological advancements can be. For example, my parents used room-sized computers with the computing power of a device that can now be worn on the wrist. However, the way my parents were taught isn't too different from how teaching is done now. For the most part, a lecture is still a lecturer standing at the front of the room writing on a blackboard. The lecturer may use a PowerPoint presentation, and the students may be distracted by their laptops and phones, but the fundamental way of teaching hasn't really changed.
There are, of course, efforts to modernize learning through websites like Coursera and Khan Academy. I think the real benefit of these is real-time feedback to the instructor (in this case, the algorithms behind the course). In the course I taught, the instructor only got individual feedback from the questions students asked (which not everyone does), as they never saw students' homework or test scores except in aggregate. However, taking courses through these online services is not recognized as demonstrating the same mastery as a university course, and I doubt that this will change anytime soon. So, I thought it would be interesting to consider what keeps universities from trying to make their teaching more effective.
Universities are not incentivized to teach well
When I started teaching, I thought that the metric I should optimize for was students' understanding of the course material. The problem is that this is not easy to measure. Most grading at the university level seems to be done on a curve (which, as I've written before, I have concerns about) to sidestep this issue. Even grading fairly on a curve is not trivial, as writing good exams is an art; I think only one instructor I worked with had truly mastered it. Poorly written, or even poorly administered, exams provide a weak signal of student understanding. Because each instructor writes different exams, it's even harder to combine signals from courses taught by different instructors to get an aggregate view of how students learn.
This idea is discussed in Cathy O'Neil's Weapons of Math Destruction. Because educational quality is ill-defined, rating agencies like US News use proxies that seem to correlate with educational quality to rank universities. This sets up a perverse incentive structure: the university is offering the service of teaching (though some may argue the service it really offers is the degree), but it optimizes for the metrics the rating agency chooses instead of actually making the learning experience great. Companies choose to hire from the perceived best schools, which means that if a high school student wants to set up for a great career, it's in their best interest to apply to these top universities even if it means they will not get the best education.
Looking at the US News methodology, it seems that 20-30% of the score used to rank colleges comes directly from how teaching is conducted, in terms of class size and funds allocated to teaching students. An almost equal proportion (22.5%) of the score is reputation, which would be hard to assess for an outsider who has not taken courses at the university. Part of this reputation could reflect the university's willingness to try techniques such as more active learning in classes, but this is captured only in an ad hoc way, and I would like to see it be a more direct part of the metric.
Professors are not rewarded for teaching well
I will preface this section by saying that I worked with some amazing instructors while being a head TA. I don't want to belittle the great work that they do. The fact is, though, that some instructors who are not so good are still teaching students, and they are the topic of the rest of this post.
As I worked to optimize my students' understanding of the course material, I found that not all of the professors shared the same goal. Some professors (but definitely not all) were optimizing for student happiness, subject to the constraint of having to rank students by performance. This makes sense from the instructor's point of view, as the metrics the department sees for an instructor are the grade distribution they assign and the feedback they receive from students (and maybe a peer review by other faculty members). I've always found it shocking that the department did not care what the course TAs thought of the professors, though admittedly this could be a noisy signal, as some courses only have one TA.
Some of the instructors I felt were ineffective at getting their students to understand the material received the highest ratings from students. In fact, it's been shown that student surveys are another poor proxy for teaching effectiveness.
Moreover, tenure makes it hard to get rid of a professor, and I found that the tenured professors were often some of the worst instructors in terms of student understanding. While being a Nobel Prize-eligible physicist is a great qualification for getting tenure, it is neither a necessary nor a sufficient condition for being a great lecturer¹. On the flip side, some of the non-tenure-track lecturers were among the best instructors I worked with, but they are hired on limited-term contracts and are paid little compared to their tenured colleagues.
Professors are averse to change
As a head TA, I worked with an instructor who was technologically illiterate and could not do much more on his computer than respond to email. There was another professor whose methods I had pedagogical concerns about. When I raised these with him, he responded with something along the lines of "no, we're not changing it, because this is the way I've done it for 20 years and the students like it this way." I worked with others who were so overcommitted with research and family obligations that they couldn't be bothered to learn a new tool for the course they were teaching. This certainly doesn't describe all the instructors I worked with, but the instructors described here are still teaching and will continue to teach unless broad changes are made.
Where I do see some willingness to adopt new tools is when work is being automated on the professor's side. In our course, we used Mastering Physics to automate the grading of homework assignments and reduce the grading burden on TAs. One concern I have with Mastering Physics is that it locks universities into multi-year contracts, creating friction to move to a new system. Combined with instructors' reluctance to learn a new system, this keeps Mastering Physics around, and there is not much pressure for Mastering Physics to improve its service. In my experience, this has left the course stuck with a poor system with little hope of improvement.
I've noticed some quick wins for Mastering Physics that were not implemented in the time I used the service. For example, I noticed that 75% of the wrong answers my students submitted had incorrect units. It would be relatively easy for Mastering Physics to implement a check for this and give students more useful feedback so that they can learn from their mistakes more easily.
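As a rough illustration of how simple such a unit check could be, here is a minimal Python sketch. It assumes each answer arrives with a unit string such as `N/C` and that the expected units for each problem are known; the unit table, the `dimensions` and `unit_feedback` names, and the parsing rules (factors joined by `*`, powers written with `^`, at most one `/`) are all hypothetical and say nothing about how Mastering Physics actually stores answers. The idea is to reduce both unit strings to exponents of SI base dimensions, so that equivalent units like `N/C` and `V/m` are accepted while a dimensionally wrong answer gets a targeted hint instead of a bare "incorrect."

```python
from collections import Counter

# Hypothetical unit table: each symbol maps to exponents of SI base dimensions.
# Only a handful of units are included; prefixes such as "mC" are not handled,
# and an unknown symbol raises KeyError.
UNIT_DIMENSIONS = {
    "m": {"m": 1},
    "kg": {"kg": 1},
    "s": {"s": 1},
    "A": {"A": 1},
    "N": {"kg": 1, "m": 1, "s": -2},           # newton
    "J": {"kg": 1, "m": 2, "s": -2},           # joule
    "C": {"A": 1, "s": 1},                     # coulomb
    "V": {"kg": 1, "m": 2, "s": -3, "A": -1},  # volt
}


def dimensions(unit_string):
    """Reduce a unit string such as 'N*m' or 'm/s^2' to base-dimension exponents.

    Assumes factors are joined by '*', powers use '^', and there is at most one '/',
    with every factor after it in the denominator.
    """
    totals = Counter()
    numerator, _, denominator = unit_string.partition("/")
    for part, sign in ((numerator, 1), (denominator, -1)):
        for factor in filter(None, part.split("*")):
            symbol, _, power = factor.strip().partition("^")
            exponent = sign * int(power or 1)
            for base, base_exponent in UNIT_DIMENSIONS[symbol].items():
                totals[base] += base_exponent * exponent
    # Drop dimensions that cancelled out (e.g. amperes in 'C/A').
    return {base: exp for base, exp in totals.items() if exp != 0}


def unit_feedback(student_units, expected_units):
    """Return a hint if the units are not equivalent, or None if they are."""
    if dimensions(student_units) == dimensions(expected_units):
        return None
    return (
        f"Your answer is in {student_units}, but this quantity should have "
        f"units equivalent to {expected_units}. Check your units."
    )


print(unit_feedback("N/C", "V/m"))  # None: both are units of electric field
print(unit_feedback("N", "V/m"))    # prints the hint
```

Comparing dimensions rather than raw strings is what keeps the check from penalizing students who write a correct answer in an equivalent form. A real implementation would also need unit prefixes and a comparison of the numerical value after conversion.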
Apparently, when Mastering Physics was first introduced into introductory physics courses, the professors found, using standardized tests of physics understanding, that students' understanding of physics actually improved after using Mastering Physics. The professors attributed this to the fact that the students got instant feedback on whether their answers were correct. This sounds great for Mastering Physics, but the problem is that the solutions to the problems are now easily accessible, so it is easy to get credit for the homework problems without actually solving them.
I have no idea what fraction of the students actually work through the Mastering Physics problems themselves. Sometimes we put problems very similar (or almost identical) to homework problems on the exams, and I'm quite surprised that many students struggle to do them. Even creating new problems is not a solution, as there are services like Chegg where others work out problems for the student. In theory, this was possible before the internet, but the scale of the internet makes it that much easier for students to exploit.
Even beyond that, Mastering Physics and other products like WebAssign are limited in scope. They're rarely used outside of introductory classes because they can really only check the final answer. I do think more can be automated in grading assignments. A lot of hard science and math questions boil down to choosing the correct starting point and showing a logical progression of steps to the final result, and I think checking that is well within the realm of tasks a computer can do. Even in other disciplines, where answers may be short written responses, advancements in NLP are probably at the point where an answer could be compared with one or more sample responses with quite high accuracy. We would still need a human to check that the model wasn't behaving erratically, but this would greatly speed up the grading process.
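To make the short-answer workflow concrete, here is a deliberately simple Python sketch. It swaps modern NLP for plain bag-of-words cosine similarity, which is far cruder than what the paragraph above has in mind, and the `score_answer` function and its `threshold` parameter are hypothetical. The point is the shape of the process: compare a student's answer against one or more sample responses, keep the best match, and route low-similarity answers to a human grader rather than marking them wrong automatically.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each word (a crude stand-in for NLP models)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, between 0 and 1."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_answer(answer, sample_responses, threshold=0.5):
    """Return (best similarity to any sample response, whether a human should review it).

    Hypothetical policy: anything below `threshold` is flagged for a human grader
    instead of being marked wrong automatically.
    """
    answer_vector = bag_of_words(answer)
    best = max(
        (cosine_similarity(answer_vector, bag_of_words(s)) for s in sample_responses),
        default=0.0,
    )
    return best, best < threshold


samples = ["The electric field points radially outward from a positive charge."]
print(score_answer("The field points radially outward from the positive charge.", samples))
print(score_answer("It depends on the magnet.", samples))
```

Flagging uncertain answers for review, rather than trusting the score outright, is one way to keep the human check described above in the loop while still taking most of the routine grading off the TAs.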
Getting rating agencies, universities, and professors to care about teaching effectiveness is not an easy task. To me, the fundamental issue is that teaching effectiveness cannot be measured easily, so poor proxies are used in its place. If rating agencies could rate universities based on teaching effectiveness, this would put pressure on universities to care more, which would then put pressure on instructors to care more.
I've written before about how analyzing Gradescope data could help instructors get a better idea of what each individual student has actually learned. As mentioned earlier, this is quite valuable, as the lecturers for the course I taught never graded a homework assignment or a test. I could see instructors in the future being given a dashboard that shows what individual students understand and lets them segment by factors such as attendance rate, giving them near real-time feedback on how they are performing.
But I also think Gradescope may be able to tackle the teaching effectiveness problem. Because it offers faster, less-biased grading, instructors are likely to adopt it; I've seen that instructors respond well when tedious parts of their work are automated. As adoption increases, Gradescope gets a unique dataset spanning many courses, disciplines, and universities. Making sense of all that data is certainly a challenge, but if I had to guess, I would say that it holds insights about teaching effectiveness that can hopefully drive some of the necessary changes to education.
1. This may not be true in every discipline. I have taken introductory anthropology classes in which the instructor's research was a relevant topic of discussion.