Developers as Future-Makers

This lab is adapted from an activity created by Evan Peck (Bucknell University).

Reminder: you and your partner are a team! You should not move forward to one activity until you are both comfortable with the previous activity.

In a recent lab, we studied the potential role of algorithms in prioritizing the needs of some people over others. Here, we’ll experiment with this same idea in another, very real context that may impact your future prospects for jobs and other opportunities.

In this lab, you’ll practice:

For-loops
List operations
Processing large quantities of data

Whose Application Gets Read?

I should note that Middlebury Admissions states that they read every single application, and I have no reason to doubt this. I use the example of college admissions only because it’s one that’s perhaps fresh in many of your memories. However, this is a very real scenario in job applications, and it is likely to have an impact on your own hiring someday.

Imagine that you are consulting for Prestigious University (PU), a very famous university that receives tens of thousands of applications every year. Due to recent budget cuts, there aren’t enough staff to read every single application. PU has hired you to design a filtering algorithm for them that is designed to find a subset of applicants whose applications should be read more carefully. PU’s theory is that if an applicant has, say, consistently very poor grades, then their application may not even be worth reaading. So, another way of saying things is that your algorithm should screen out those applicants.

Applicant Grade Data

Data on the applicants’ grades is supplied as a list. The list contains:

Of course, the fourth year data is partial.

Overall GPA in the applicant’s first, second, third, and fourth years of high school.
GPA in each of the following groups of courses in the applicant’s third and fourth years:
- Literature
- Science
- Math
- Social studies
- Electives

So, a complete set of data contains a total of 9 numbers, like this:

EXAMPLE = [3.4, 3.3, 2.7, 3.6, 4.1, 3.5, 2.8, 3.2, 3.0]

Here are the positions of each of the scores:

FIRST_YEAR      = 0 
SECOND_YEAR     = 1
THIRD_YEAR      = 2
FOURTH_YEAR     = 3
LITERATURE      = 4
SCIENCE         = 5
MATH            = 6
SOCIAL_STUDIES  = 7
ELECTIVES       = 8

For example, we can access the applicant’s third-year GPA like this:

EXAMPLE[THIRD_YEAR]

Activity 1

Create two new Python files in Thonny, and save them in the same location on your computer.

applications.py is a file that you’ll need, but won’t modify. After creating the file, copy all the code here into applications.py. You should then make no further modifications.
lab.py is the primary file in which you’ll complete activities for this lab. After creating the file, copy all the code here into lab.py.

In the applications.py file, I’ve written some functions to help you extract different parts of the data. Given a list of data like the example above, you can get the GPAs for the four years like this:

get_year_scores(EXAMPLE)

You can get the GPAs for the five subjects like this:

get_subject_scores(EXAMPLE)

Let’s see this running in code.

Activity 2

Copy and paste the following code into lab.py, under the ACTIVITY 2 area of the file. Then, run the file.

# import the functions and variables from the applications file
from applications import *         

# the EXAMPLE variable is some example data

print("EXAMPLE DATA")
print("------------")
print(EXAMPLE)

print("\nGPAs for 4 Years")
print("---------------")
print(get_year_scores(EXAMPLE))

print("\nGPAs for 5 Subjects")
print("------------------")
print(get_subject_scores(EXAMPLE))

Discuss with your partner and make sure you understand what get_year_scores() and get_subject_scores() are doing. These functions are documented: try doing help(get_year_scores) if you’d like to see the documentation.

Analyze Applicants

We’re now going to complete a series of functions that are going to analyze an applicant. Each of these functions returns a Boolean value (True or False) which determine whether the application is recommended to be read by a human reader for further review.

Your long-term goal is to implement the following functions, and check that they work. We’re not going to implement all of them right now, though.

analyze_applicant_1(): recommends applicants that have an average of their five subject areas above 3.0. I’ve written this one for you, as well as an example, to help you get started.
analyze_applicant_2(): recommends applicants that achieved a GPA above 3.2 in at least three of their four years of secondary school.
analyze_applicant_3(): recommends applicants that have no GPA score below 2.5 in any of their five subject areas.
analyze_applicant_4(): recommends applicants that achieved a higher average GPA in their third and fourth years than in the first and second years.

Activity 3

Copy and paste the following code into lab.py, under the ACTIVITY 3 area of the file. Then, run the file. Talk through the code with your partner to ensure that you understand how it works.


def analyze_applicant_1(data):
    """
    recommends an applicant if the average of their subject scores is larger than 3.0
    """

    # get the subject scores
    subject_scores = get_subject_scores(data)

    # compute the average (mean) of the subject scores
    # average is sum of all scores divided by number of scores
    total = 0
    for i in range(len(subject_scores)):
        total = total + subject_scores[i]
    mean = total / len(subject_scores)

    # return whether the mean is > 3.0
    return mean > 3.0

EXAMPLE_2 = [3.2, 2.7, 2.9, 3.1, 3.0, 2.5, 2.9, 3.3, 2.2]
EXAMPLE_3 = [3.4, 3.3, 2.7, 3.6, 2.8, 3.5, 2.8, 3.2, 3.0]
    
print(analyze_applicant_1(EXAMPLE_2))
print(analyze_applicant_1(EXAMPLE_3))

Activity 4

Now, implement the function analyze_applicant_2(). This function should return True if the applicant had a GPA of 3.2 or higher in at least three out of their four years. Write your function definition in the ACTIVITY 4 area of lab.py.

Once you’ve implemented your function, paste the following code below your function definition.

print("\nExamples for analyze_applicant_2()")
print("----------------------------------")    
print(analyze_applicant_2(EXAMPLE_2))
print(analyze_applicant_2(EXAMPLE_3))

Run the file. Do analyze_applicant_1() and analyze_applicant_2() agree or disagree on the two examples?

Create Your Own Recommender

Later in the assignment, we’ll implement analyze_applicant_3() and analyze_applicant_4(). For now, though, we’re going to focus on your creative ideas for making a recommender filter.

Activity 5

In English, not code, write down a rule for recommending applicants. You are welcome to use ideas from the analyze_applicant_*() functions described above, but please make sure to include your own original spin. What do you think would be the most fair way to recommend applicants for further review?

Suggestion: use a rule that includes information from both the year scores and the subject-area scores.

Activity 6

Find another group that has also completed Activity 5. Ask them to describe THEIR answer, and write it down in the Activity 6 area.

Implement!!

Activity 7

Implement four functions:

analyze_applicant_3() from above.
analyze_applicant_4() from above.
our_analyze_applicant(), your idea from Activity 5.
their_analyze_applicant(), the idea of the other group from Activity 6.

Write your function definitions in the Activity 7 area. You don’t need to demonstrate them on any examples, although you may wish to do so in order to check for errors.

Test At Scale

If we only had to look at one or two applicants, we wouldn’t need a program for this at all – we’d just do it by hand. But what if we have 100,000 applicants?

I’ve included some functions (which you imported from applications.py) to help you generate and view large numbers of random example apps:

create_app() creates a single random application.
create_apps(num_apps) will create a list containing num_apps applications.
print_apps(apps, first_n) will print (not return) the first first_n apps in a list.

So, for example:

apps = create_apps(1000)  # create 1000 apps
print_apps(apps, 5)       # print the first 5

Let’s first write a function that will allow us to experiment with the behavior of one of your filtering functions. This is a complicated function – take your time and build it up slowly!

Activity 8

Implement a function called analyze_many_apps() that will show an analysis of a large set of applications. This function should print several messages, but does not need to return anything.

Your function should take a list of applications, which you can assume was generated by create_apps(). It should:

Inside the function, create a list called recommended of apps that were recommended by analyze_applicant_1(). Create another list called not_recommended of apps that were NOT recommended by analyze_applicant_1().
- HINT: Loop through the elements of apps. Use an if-statement to check whether an app was accepted. If it was accepted, append() it to the recommended list. Otherwise, append() it to the not_recommended list.
Print ---SUCCESSFUL APPS---
Print 5 examples of recommended apps.
- HINT: If you’ve constructed your lists correctly, all you need to do is call print_apps(recommended, 5).
Print ---UNSUCCESSFUL APPS---
Print 5 examples of not-recommended apps. Same hint as Step 3.
Print ---------------------
Compute the recommendation rate. This is the percentage of applications that were recommended.
- HINT: If you constructed your lists correctly, the percentage is len(recommended)/len(apps).
Print the acceptance rate.

Once you’ve implemented this function, run it like this:

apps = create_apps(100000) # 100000 apps
analyze_many_apps(apps)

Place both your function definition and an example of calling the function in the ACTIVITY 8 area of lab.py.

Now we can use the function you just wrote to help us form an initial judgment about whether analyze_applicant_1() is a good method for filtering applicants:

Activity 9

Record the approximate acceptance rate from your analysis. Then, write down whether you think analyze_applicant_1() is a fair way to filter applicants, and explain why. Do this in the ANALYSIS COMPARISON area of lab.py.

You may want to run analyze_many_apps() several times in order to get a feel for things when composing your answer to this question.

Now that we are able to analyze a single algorithm for filtering applicants, we can use it to analyze the other analyze_applicant() functions.

Activity 10

Modify your code from Activity 8 so that instead of analyze_applicant_1(), it uses analyze_applicant_2(), analyze_applicant_3(), analyze_applicant_4(), our_analyze_applicant(), and their_analyze_applicant(). Repeat Activity 9 for each of these different analysis methods. For each of these, you should record the acceptance rate, whether or not you think the filter method is fair, and why.

We’ve done a lot of analysis now. Which of these algorithms do you think is the best one?

Activity 11

Which of these algorithms do you think is the best? Keep in mind that we care both about who gets recommended by the algorithm, but also how many. Remember, recommending more applicants dramatically increases the human labor required for PU to read applications manually.

Final Reflection

Activity 12

With your partner, read this article:

Amazon scraps secret AI recruiting tool that showed bias against women

Can you think of any ways in which your recommended algorithm for filtering applicants could show unjust bias against certain groups of individuals? Write a few sentences about why or why not in the ACTIVITY 12 area of lab.py.

One member of your group should submit the file lab.py in the Lab 3 assignment on Gradescope. Make sure to add your partner to the submission! You don’t need to submit the applications.py file. You also don’t need to worry about anything with the autograder.