Learning in Public about Teaching Machine Learning

Author

Phil Chodrow

Published

April 25, 2023

In Spring ’23, I taught two sections of Middlebury’s course CSCI 0451, Machine Learning. This is an elective in our computer science department which has been taught a handful of times before I arrived. The prerequisites are data structures, discrete math (which we call Math Foundations of Computing), and linear algebra. [The data structures prerequisite is primarily about computational maturity – we don’t use so much content from this course.] The formal course description, which I contributed to revising, is:

Machine learning algorithms detect patterns in data and use these patterns to make decisions. This course introduces the theory and practice of machine learning. Tasks considered may include classification, regression, clustering, dimensionality reduction, text embedding, and reinforcement learning. Applications may include predictive analytics, data visualization, pattern recognition, and strategic game-playing. Students will also discuss the social implications of automated decision systems.

Additionally, this course was the first ever Responsible Computing (RC) course at Middlebury. There will soon be other courses tagged as RC courses within and beyond the department of computer science. Our CS majors will be required to take at least one such course in order to complete the major.

The course attracted 42 students split across the two sections. Most of these students were computer science majors. There were also some minors and a handful of students from departments like mathematics and economics; I admitted these students provided they demonstrated strong computing skills.

Aims

Student Learning

I had three big-picture aims for what students would do by the end of the course:

I wanted students to become effective, powerful wielders of techniques in data science and machine learning.
I wanted students to understand elements of the theory of machine learning methods, especially the foundation of most methods as optimization problems.
I wanted students to develop critical perspective on the impacts of machine learning systems on human beings.

I split these aims into learning goals:

For wielding techniques, I asked students to navigate the Python data science ecosystem and experiment on their findings.
For theoretical understanding, I asked students to both reason mathematically about algorithms and implement some algorithms from scratch.
For developing critical perspective, I posed a social responsibility goal with several components.

I additionally added a project-based learning goal which asked students to work on a sustained problem of their choosing, in which they would demonstrate learning against at least two of the goals above.

Personal Growth

This was my first time teaching an advanced course at Middlebury or in a computer science department. Because of this, one of my major aims was just to get to know the students better – to understand what they were ready for and what kinds of experiences would promote their learning. A second aim was to experiment with novel techniques and assessment strategies. In particular, this course was my first experiment with ungrading.

Implementation

A Typical Day

The standard “cycle” for the course had four steps:

I had aspirations for more active work in each class period, but ultimately didn’t manage to get it going. This is a possible area of improvement for next time.

Students completed readings.
Based on the readings, students completed a short warmup assignment.
At the beginning of class, students would present the warmup assignment in groups. Groups were persistent throughout the semester, and the warmup presenter for each group was randomized each day.
After the warmup presentations, we would briefly discuss them as a class and draw any necessary connections to the day’s content. We would then move in to a lecture, which was usually either theory or live-coding.

Ungrading

In my implementation, ungrading looked like this:

Guided goal-setting at the beginning of the semester.
A mid-semester and end-of-semester reflection, in which the students would reflect on their effort, achievement, and engagement in the course and propose a letter grade.
A final conference in which the student and I would discuss their reflection and agree on a letter grade.

In terms of assignments, students completed their projects as well as a selection of blog posts on different course topics. Most blog posts involved implementation, experimentation, or navigation in some form, alongside technical writing describing the student’s approach. Some blog posts were more like essays on topics related to social responsibility. I gave students the opportunity to propose their own blog posts, although very few students actually did.

Assignments and Assessment

Warmup assignments were not assessed in any way. The main external incentive behind warmups was social: there was a chance of being called on to present the warmup activity to the student’s group. There was no penalty for passing or asking for help, although it was relatively rare that students did this.

Blog posts were assessed on a scale that was designed to express whether I felt the student had more to learn from the assignment. From the syllabus:

No revisions suggested: you’ve done great work and should focus on the next thing.

Revisions useful: you have opportunities for improvement on this assignment, but focusing on the next topic or assignment may be a better use of your time—use your judgment.

Revisions encouraged: the best use of your time is to respond to feedback and resubmit, rather than moving on to the next assignment.

Incomplete: the assignment isn’t sufficiently complete for it to be used as evidence of your learning.

In practice, these were represented as point-scores 0-3 in Canvas. I also used these point “scores” to communicate with students about the progress of the class through the course; e.g. saying that “the median student has a ‘score’ of 10 points total.” In practice, I think that this was perhaps a step too close to points-based grading, and may have caused some students to focus on getting “3”s (no revisions suggested) or wondering how many “points” they needed in order to earn a certain grade. Both of these tendencies, while natural, are contrary to the environment I was trying to create, and I may try to do this differently next time.

Projects were assessed in an individualized manner based on what was achieved in the project and the students’s contribution to their group’s work. The assessment was informal and used as evidence of learning and achievement in the context of setting the student’s final grade in the course.

What Worked?

There was positive feedback around several elements of the course. My overall sense is that:

The pace and difficulty of the course are about right (after a small correction early in the semester).
The blog post format is effective, supporting student learning and helping them take pride in their work.
Blog posts were pretty well-aligned with the content of lectures.
I was fairly clear in communicating my expectations and logistics.

From a custom feedback survey administered in the final week of class (36/42 respondents):

92% of students agreed that “The format and structure of the course supported me in demonstrating my learning and accomplishment.”
94% agreed that “Readings, warmups, and lectures prepared me to approach the course assignments.”
94% of students agreed that “I understand how the content of the course related to values, impact, and social responsibility related to the more technical content.”
85% of students agreed that “I am proud of the work that I completed in this course.”
92% of students agreed that “The process of writing blog posts (in addition to implementing algorithms and doing experiments) supported my learning.”
97% of students agreed that “I understand what Phil expects from me and how my final letter grade will be determined.”

67% of students reported spending 8-12 hours per week on the course, which is roughly my target area. 22% spent 4-8 hours.

I made a mistake in designing this question by not clarifying whether this included time spent in class or not.

55% of students felt that the pace of the course was about right, with 33% finding it “a little too fast” and 11% finding it “a little too slow.”

Blog posts came up as something that worked well for students. When asked what worked well for them, students said:

“Having the opportunity to revise my work and submit things when I felt they were ready to submit.” “Blog posts - they were time consuming but I learned a lot.” “the blogposts! I feel like they were the best avenue to make sure I understood what happened in class, and gave me the chance to implement what I had learned.”

Students also expressed that they appreciated the flexible deadlines and ability to choose some assignments over others as ways to demonstrate their learning while avoiding busywork. I also received good feedback in both the survey and in several written reflections about the course content related to values and social responsibility.

What Didn’t?

There was also some clear feedback around things that didn’t work so well, or worked for some students and not others.

The warmup activity implementation was a mixed bag, in terms of relevance to course content and in terms of group composition.
Live-coding during lecture was a bit mixed: 70% of students agreed that they found it helpful, while 22% neither agreed nor disagreed and 8% disagreed.
The theoretical content of the course was not always sufficiently connected by practical applications, leaving some students confused and discouraged.

Student reception of the warmup activities was mixed.

Only 67% of students agreed that “Daily warmups helped me keep on top of the material.”, with 28% neither agreeing nor disagreeing and 6% disagreeing.
72% of students agreed that “Working with the same warmup group every day helped me feel comfortable discussing my ideas in class.” 22% neither agreed nor disagreed, and 6% disagreed.

“Warmups. I think often they were just burdensome and didn’t reinforce learning, and I would have liked to have been in different groups throughout the semester instead of the same one.”

The course started with some theoretical foundations, but not with a review of required math concepts. For several students, this was a bad fit:

“I wish the beginning part of the course was a little less math heavy, and we got more invested in the math towards the end. It felt a little overwhelming at first, and then I felt ready for it by the time we were focusing less on the math” “Math at the beginning of the semester almost deterred me from this course. I’m glad I stuck with it though, because I really enjoyed the the less theoretical portion of the course.” “Something that worked poorly for me in this course was the lack of review of mathematical concepts… A brief review at the beginning of the course would have made it easier to understand many of the algorithms we implemented in the course.” “…after the first two classes of the semester, I almost dropped the course due to the fact that the math felt outside of my capabilities”

Some students also struggled to maintain a good pace on the blog posts, or even clearly understand what pace counted as “good.”

“Not getting on top of my blog posts early enough and falling behind.” “The lack of concrete deadlines made it difficult for me to keep on top of my work, specifically in the second half of the semester when I was also working on the final project.”

Advice to Peers

I asked students about the advice they’d give their friends who might take the class next time. They had some great tips!

stay on top of the blog posts and do the daily warmups. also go to office hours if you are confused, Phil is helpful and there will likely be CS0451 peers there to talk through assignments with. Review after each class using lecture notes so that you have a solid understanding of the concepts taught in class. Treat the suggested deadlines as real deadlines as best as you can! Go to office hours and discuss the course material in study groups. Decide on a goal that would support your learning, but isn’t too out of your reach that you’ll end up stressing a whole lot. Allocate a little bit of time each day to work on your blog posts instead of cramming them in. This was what I did and I’m glad I did it. Be realistic in your goal setting and focus on what you want to get out of the course. You don’t have to complete every blog post, but put adequate time and effort into the ones you choose to complete and choose topics you are interested in.

Revised Sequencing

My sense from feedback and observation is that different course sequencing might have helped tell a better story for which students were more prepared. First, I think there is in general some opportunity for better thematic unity. Second, I think that students may be prepared to grapple with some more challenging mathematics if (a) they have some more support in the course to help with this and (b) they more clearly see the motivation for doing this in the context of machine learning models. This could look roughly like the following:

Part 1: Using Models (6-7 lectures)

Idea of machine learning, outline of the ML workflow and the sklearn API (2 lectures)
- Working with data frames and numpy
Feature maps, overfitting, cross-validation, and vectorization, decision regions, confusion matrices and error rates (2-3 lectures)
Allocative bias: case study, quantification, critical perspectives (2 lectures)

Part 2: How Models Work (8 lectures)

Linear models (5 lectures)
- Linear models and empirical risk, logistic regression
  - multiclass logistic regression?
- Convexity
- Gradient descent, stochastic gradient descent, and other optimization methods
Regularization (1 lecture)
Nearest-neighbors and kernel methods, SVM (1-2 lectures)

Part 3: Deep Learning (7-8 lectures)

Motivation: the problem of features
Automatic differentiation
Components of Torch
Image classification (1-2)
Text classification and word embedding (1-2)
Text generation

Assignments

Up and running with Quarto, maybe very brief demos with Pandas and numpy.
Penguins.*
Overfitting experiments and solutions? Text or image vectorization?*
Auditing allocative bias / limits of the quantitative approach.*
Logistic regression and gradient descent.*
Linear regression / unsupervised learning with linear algebra.
Adam and automatic differentiation. Maybe for 1d logistic regression?
Guest speaker? Research essay? Critical perspectives?
Deep learning practice?

Project proposal. Project presentation. Project blog post.*

Intention: project effort is roughly 2-3 blog posts.

*Core assignments, have to complete these in order to earn above a C, maybe? 1 more than core, plus good project: B 2 more than core, plus good project: A- 3 more than core plus good project: A

Optional Extras / Alternatives

Kernel logistic regression.
Implement something fun: primal SVM with Pegasos, decision tree, etc?

Timelines

Maybe we do a policy that

Many of the assignments can be left as is. I don’t think they took a ton of value out of implementing perceptron; going straight into logistic regression as a first implementation problem would likely to be better.

Assessment

Having more direction for assignments completed might be helpful; students could benefit from some calibration on what counts as “enough.” Might be worth even considering moving over to a version of specs grading: a specific number need to be completed to a specific degree of quality in order to be counted.

Speaker

Our engagement with TG had a few hiccups, but overall went quite well, especially regarding student engagement and morale. It would be nice to have this be a larger and recurring thing that happens, possibly once a semester, as part of the CS department standard offering. Could potentially be part of us reviving a seminar?

Tooling

I worked in Jupyter Lab for live-coding, but the students are all familiar with VSCode from CSCI 0201 and so it would probably make sense to just move over there, especially since that’s my comfort area too.