Lab: International Standard Book Numbers

This lab is adapted from an assignment by Peter-Michael Osera, Samuel Rebelsky, and Nicole Eikmeier at Grinnell College.

Reminder: you and your partner are a team! You should not move forward to one activity until you are both comfortable with the previous activity.

Introduction and Setup

In this lab activity, we’ll build on our skills with strings and functions, and combine them with new skills related to loops and lists. Along the way, we’ll introduce the map-filter-reduce paradigm for handling sets of data.

Activity 0

Create two new files in Thonny.

lab.py is where you’ll write your code. Follow this link, select all the code, and paste it into your lab.py file.
data.py contains the data that you’ll use in the final part of this assignment. Follow this link, select the entire contents of the page (try cmd + a on Mac or ctrl + a on Windows), and paste the result into the data.py file.

Please save both of these files in the same folder!

The data variable that we imported with this code is a list. In the ACTIVITY 0 area, write code that:

Prints the first 10 elements of the data variable.
Prints the length of the data variable, optionally with a nice message.

International Standard Book Numbers

Our task this week is to determine the validity of an International Standard Book Number (ISBN). An ISBN is a unique number that identifies a book. You may have seen these numbers inside or on the back of many of the books you read. ISBNs often come with barcodes that make it easy for librarians to scan them.

Figure 1: An example of an ISBN and barcode on the back of a book. This ISBN has 13 digits. To keep things simple today, we are only going to work with the ISBN-10 format, which has only 10 digits. ISBN-10 was the standard until recently, when it was determined that we needed more numbers for more books!

Anatomy of an ISBN

A 10-digit ISBN number has two parts:

9 identifier digits, which are all digits from 0 to 9.
1 check digit. The purpose of the check digit is to confirm the accuracy of the identifier digits. If the check digit does not match the identifier digits, then this may identify a typo in the 9 identifier digits. The check digit can be either a digit between 0 and 9 or the letter X.

For example, consider the ISBN 1438946685. In this ISBN, the first 9 digits 143894668 contain information about the book, while the last digit 5 is the check digit. Another possible ISBN is 486910583X, since the last digit is allowed to be an X.

Activity 1

Write a function called well_formed() to check whether an input string s has the structure of a well-formed ISBN. well_formed(s) should return True if and only if all of the following are true:

s is a string with 10 characters.
The first 9 characters of s are all digits between 0 and 9.
The last character of s is EITHER a digit between 0 and 9 or the character X.

If any of these conditions are not met, well_formed(s) should return False.

For example:

well_formed("1438946686") # True
well_formed("486910583X") # True
well_formed("1438945")    # False (wrong number of digits)
well_formed("1434X88945") # False (only last digit can be X)
well_formed("143418894H") # False (last digit must be integer or X)

Hint: You can check of a character char is a digit by checking whether char in "0123456789". Similarly, you can check whether char is a valid final digit of an ISBN by checking char in "0123456789X".

The Check Digit

The final digit of the ISBN is the check digit. The check digit is calculated according to a specific algorithm, based on the first 9 digits. Its purpose is to guard against possible typos when recording ISBN numbers. If the check digit doesn’t match the first 9 numbers, then it’s very likely that someone made a typo when recording the ISBN. In this case, we say that the ISBN is corrupted. It is likely necessary to check and correct the ISBN.

Here’s how check digits are calculated. Suppose that the first 9 digits of an ISBN are 143894668.

In left-to-right order, multiply the first digit by 10, the second digit by 9, the third digit by 8, and so on. The rightmost-digit is multiplied by 2. We then add up those terms, obtaining a sum that I’ll call s. In our example, we would do:

    s = 10×1 + 9×4 + 8×3 + 7×8 + 6×9 + 5×4 + 4×6 + 3×6 + 2×8 = 258

Next, we take this sum (call it s) and compute s modulo 11. Recall that modulo computes the remainder of the division of two numbers. You can compute s modulo 11 using the syntax s % 11. In our example, 258 % 11 = 5
We then subtract this result from 11 to obtain a result d:

    d = 11 - 5 = 6

Almost there!
- If d == 11, then the check_digit is 0.
- If d == 10, then the check_digit is `X``.
- Otherwise, the check_digit is d.

In our case, d = 6, and so check_digit has value 6. So, the full ISBN would be 1438946686, with 6 being the check_digit.

Activity 2

Write a function called check_digit() whose input is a string containing the first 9 digits of an ISBN. Its output should be the check digit, computed according to the algorithm above. The output should always be a string.

For example:

check_digit("143894668") # returns "6"
check_digit("563872373") # returns "0"
check_digit("203740588") # returns "X"

Check your function in these three cases, and include the checks in your code.

Hints:

Convert an integer to a string like this: str(6).
Given a string s = "143894668", the integer value of the ith digit of s is int(s[i]).
To do the sum in Step 1, you can use the following pattern:

    total = 0
    for i in range(9):
        total = total + int(first_9_digits[i]) * (Y-i)

Here, Y is an integer that you will need think about and fill in.

Suppose now that we’re given a string like "203740588X". We’d like to figure out whether this string is a correct ISBN, a corrupted ISBN (check digit doesn’t match), or not an ISBN at all.

Activity 3

Write a function called classify_single_input(). This function should accept a single string as input. Its output is one of three strings: "correct", "corrupted", or "not_ISBN".

An input string s is "not_ISBN" if well_formed(s) is False. If well_formed(s) is True, AND if s[9] agrees with check_digit(s[:9]), then s is "correct". Otherwise, s is "corrupted". Check your function on the following test cases:

classify_single_input("1438946686") # "correct"
classify_single_input("2037405886") # "corrupted"
classify_single_input("ISB NUMBER") # "not_ISBN"

Include these checks in your code.

Processing Batches of Data

One of the main powers of computation is to process batches of data. In this part of the lab activity, we’ll practice using the map-filter-reduce paradigm to process data. Put simply:

Figure 2: The stages of map-filter-reduce, as a sandwich. Image source

Map operations take lists of data and produce lists of the same length containing new information.
Filter operations take lists of data and produce smaller lists, based on some criterion.
Reduce operations summarize lists of data into a single value.

Suppose now that you are librarian and you have received a large batch of ISBN numbers in a data file. However, you’re not sure which ones are valid! Some of the data are actually not ISBNs at all, while others are ISBNs that may be corrupted in some way. Here’s an example of your input data:

This is just a regular list; I’ve written it vertically to make it easier to see the individual entries. Python will process it normally!

[
    "1438946686",
    "5638723730",
    "203740588X",
    "2037405886",
    "hello",
    "563ZXY3730",
    "ISB NUMBER",
    .
    .
    .
]

Some of these are valid ISBNs, some are corrupted ISBNs with incorrect check digits, and others are not ISBNs at all.

Map Operations

A map operation consists in applying the same function to each entry of a list, resulting in a new list of the same length. Let’s write a map operation to classify each ISBN in a list.

Activity 4

Write a function called classify_many_inputs(). This function should accept a list of strings as input. Its output is a list of strings the same length. The entries of this list should state whether each entry of the input list is "correct", "corrupted", or "not_ISBN".

For example:

input_nums = ["1438946686", "2037405886", "ISB NUMBER"]
classification = classify_input(input_nums) 
print(classification)

>>> ["correct", "corrupted", "not_ISBN"]

You can test your function on the data variable that we imported at the top of lab.py. Here’s a sample function call and the expected output.

print(classify_many_inputs(data[:10]))

>>> ['corrupted', 'correct', 'not_ISBN', 'correct', 'corrupted', 
     'not_ISBN', 'correct', 'corrupted', 'correct', 'correct']

Perform this check and include it in your code.

Filter Operations

A filter operation takes a list as input and creates a new list by choosing some items from that list according to a criterion. The easiest way to do this is with an if statement. Filter operations almost always result in lists that are smaller than their original inputs.

Activity 5

Write two functions: filter_correct_ISBNs() and filter_corrupted_ISBNs(). Each of these functions takes the same input as classify_many_inputs(): a list of strings.

filter_correct_ISBNs() should return a list containing only correct ISBNs (i.e. inputs s such that classify_single_input(s) is "correct").
filter_corrupted_ISBNs() should return a list containing only corrupted ISBNs (i.e. inputs s such that classify_single_input(s) is "corrupted").

You don’t need to include any checks in this part – we’ll check in the next activity.

Reduce Operations

A reduce operation takes a list as input and produces a single value, such as a number or string. A simple example of a reduce operation is computing the length of a list.

Activity 6

Write a function called summarize_ISBNs(). This function should accept a list of strings as input. It should print a message describing the number of "correct", "corrupted", and "not_ISBN" entries in the input. Here’s the expected output for the data variable that we imported earlier.

summarize_ISBNs(data)
>>> There are 678 correct ISBNs
>>> There are 146 corrupted ISBNs
>>> There are 176 entries which are not valid ISBNs

Include this check in your code file.

Activity 7

Great work! One member from your team should submit this file on Gradescope, adding the partner’s name to the submission upload.