Machine problem: Unix utilities (Warmup)

Objectives

  1. Warm up with some C/Unix programming
  2. Understand how to retrieve and submit source code via GitHub
  3. Learn how to use Vagrant and/or Docker to set up a consistent development environment for this and future machine problems

Overview

You will be implementing basic versions of three standard Unix utilities, whose behavior is described below. First, though, you must retrieve the starter codebase and set up your development environment.

Code repository

Each machine problem will require you to access and clone a separate, private Git repository containing the starter codebase, implement and test your solution on your own machine, then push your changes so we can evaluate them.

You will claim your private repository using a GitHub invitation link -- these will always be found next to the assignment writeup links on the course website. For this assignment, the invitation link is https://classroom.github.com/a/W8NBtWN5. After following the link, you'll be prompted to create an account on GitHub (or sign in, if you already have one) and pick your IIT username from a list to accept the assignment. GitHub will then clone the starter code into a private repository and take you to a URL that looks something like this (where USER is your own GitHub username):

https://github.com/cs450os/mp-warmup-USER

This is your repository's homepage on GitHub. You can always come back here to see the status of your work as reflected on GitHub. To submit work you will push your committed changes to this repository, and we will pull them for grading. Remember, if your work isn't here we can't see it!

Next, you should clone this repository on the computer where you'll be doing your work. On your repository's GitHub page, you should see a green button labeled "Clone or download". Click it to obtain the repository URL. You can then use the Git client of your choice (command line or graphical) to clone it locally.

Before continuing, take a moment to edit the "AUTHOR" file in your repository -- in it you will find an honor pledge that you should sign with your information and today's date.

In order to work on your code, you need to first set up a development environment consistent with the one we'll use to test your submissions.

Setting up your development environment

There are a number of tools that you can use to set up your development environment. All of them are widely used in industry, and will be useful additions to your toolkit:

Linux / Mac instructions

For Linux and Mac (macOS 10.14 and newer), we recommend a purely Docker-based workflow. Start by installing Docker (instructions for Linux and Mac).

After installing Docker, open a terminal and cd into the repository you cloned earlier and run the following command:

docker run -itv $PWD:/mp cs450/xv6 bash

This command runs an interactive shell in a container based on the image "cs450/xv6" (which it will download first from Docker Hub, if necessary), and mirrors the contents of the current directory as /mp within the container. If successful, you will get a prompt like this:

root@3e1c6cbe4f79:/#

From there, you can cd mp to get to your repository within the container, and you're ready to build and test! You can edit source code from within the container (we installed a bunch of useful tools for you), or you can edit them in an external IDE -- any changes you make in your host filesystem to the repository will be reflected within the container.

You can exit from the container and start a new one anytime with the docker run command above.

Windows instructions

For Windows, we recommend a VirtualBox/Vagrant/Docker based solution. If you're running Windows 10 and are willing to deal with some potential hiccups you can try the Docker instructions for Linux/Mac above -- otherwise, continue reading.

First, install VirtualBox and install Vagrant. We'll use both of these tools.

Next, open a command prompt and cd into the repository you cloned earlier and run the following command:

vagrant up

This will download and fire up an Ubuntu Linux "box" in VirtualBox, install Docker on it, then download an image used to create containers for working on class assignments. This will likely take a while. If this step hangs indefinitely or you see an error mentioning the unavailability of "VT-x", you may need to turn on virtualization acceleration in your BIOS before trying again. See this link for instructions.

When the command completes, try running vagrant status --- it should produce the following output:

Current machine states:

default                   running (virtualbox)

The VM is running. To stop this VM, you can run `vagrant halt` to
shut it down forcefully, or you can run `vagrant suspend` to simply
suspend the virtual machine. In either case, to restart it again,
simply run `vagrant up`.

This means the virtual machine is now running. You can connect to it using the command vagrant ssh, which will result in the following:

Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-64-generic x86_64)
...

vagrant@ubuntu-focal:/vagrant$

You're now logged into the virtual machine. Now you can run the command:

docker run -itv /vagrant:/mp cs450/xv6 bash

This command runs an interactive shell in a container based on the image "cs450/xv6" (which was downloaded in the previous step), and mirrors the contents of the current directory as /mp within the container. If successful, you will get a prompt like this:

root@3e1c6cbe4f79:/#

From there, you can cd mp to get to your repository within the container, and you're ready to build and test! You can edit source code from within the container (we installed a bunch of useful tools for you), or you can edit them in an external IDE -- any changes you make in your host filesystem to the repository will be reflected within the container.

You can exit the container and virtual machine anytime you wish. The virtual machine will continue running in the background, though, unless you explicitly halt it or restart the host system.

You can halt the virtual machine with the command vagrant halt. If you ever need to restart the virtual machine, run vagrant up again. To get to the container prompt, run vagrant ssh followed by the full docker run command given above.


If you run into build issues on a Windows host that mention the '\r' character, it's likely due to Windows line endings (\r\n). You can fix this by forcing Linux style line endings (\n). Start by running the following commands in your repository:

git config core.eol lf
git config core.autocrlf false

Next, you’ll need to delete and checkout all the files again so that you have the correct line endings. Do this:

git rm --cached -r .
git reset --hard

Implementing UNIX utilities

tr

The tr ("translate") utility, per the manual page, "copies the standard input to the standard output with substitution or deletion of selected characters." It is convenient in situations where we'd like to convert between line ending characters, lower/uppercase text, delete extraneous characters, etc.

When invoking tr, we can provide it with two strings of equal length. The first string is the list of characters to replace, and the second is the list of characters to replace them with.

Here's a typical interaction --- notice that because tr uses standard input and the command line buffers input by line, after invoking the utility it translates input on a line-by-line basis. (The line starting with '$' is the command prompt and entered command; this is followed by alternating lines of input and output text.)

$ tr abc 123
abracadabra
12r131d12r1
A man a plan a canal
A m1n 1 pl1n 1 31n1l

To end input we use the ^D (Ctrl-D) keypress, which sends an end-of-file (EOF) character to tr.

Here's another interaction where we use the -d flag to indicate that we want to delete the characters in the string from the input.

$ tr -d abc
abracadabra
rdr
a man a plan a canal
 mn  pln  nl

When we want to use tr to process the contents of a file, we typically do so using a shell feature known as I/O redirection. Suppose we have a file named "test.txt" with the following data:

apples,bananas,cats
this is not a fruit

We can run tr on it as follows:

$ tr ', ' ' -' < test.txt
apples bananas cats
this-is-not-a-fruit

The '<' character indicates that the shell should take the contents of the named file ("test.txt") and use it as standard input to tr. Also note that the single quotes used around the strings at the command line allow us to include spaces (and other special characters) in the replacement/substitution strings --- the quotes themselves are not sent as part of the command line arguments to the program.

Below we use tr on the same file, with the -d option:

tr -d ', ' < test.txt
applesbananascats
thisisnotafruit

zip

zip is a compression utility. The actual Unix zip utility supports a number of different compression algorithms, but we'll be using a very simple form of compression known as run-length encoding (described below). zip will take a filename when invoked and output the compressed version of that file to standard output.

Because the output of zip is not intended to be human readable, we use I/O redirection again to send the compressed output to a file. Here's how we might use zip to compress the contents of the file "test.txt" into "test.zip".

$ zip test.txt > test.zip

The '>', in this case, tells the shell to send the standard output of zip into the named file on the right.

The run-length compression algorithm works by simply scanning for identical adjacent bytes in the input file and printing just a single copy to the output preceded by a count. For instance, if the input is as follows:

aaaaaaaaaaaaaaaaaaaabbbbbbbbbbcccccddde

Run-length encoding would nominally output:

20a10b5c3d1e

Critically, however, since we need to be able to read and decode the compressed output (say, to obtain the original uncompressed version), the encoder will consistently print out each count as a 4-byte integer. This means that while the input to zip may be ASCII (and therefore human-readable), its output will not be. We can use another Unix utility -- od ("octal dump") -- to view the contents of non-human-readable (aka. binary) files. Assuming the sample input above (aaaaaaaaaaaaaaaaaaaabbbbbbbbbbcccccddde) is saved in a file named "test.txt", here's a sample interaction.

$ od -t x1z test.txt
0000000 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61 61  >aaaaaaaaaaaaaaaa<
0000020 61 61 61 61 62 62 62 62 62 62 62 62 62 62 63 63  >aaaabbbbbbbbbbcc<
0000040 63 63 63 64 64 64 65 0a                          >cccddde.<
0000050

$ zip test.txt  > test.zip

$ od -t x1z test.zip
0000000 14 00 00 00 61 0a 00 00 00 62 05 00 00 00 63 03  >....a....b....c.<
0000020 00 00 00 64 01 00 00 00 65 01 00 00 00 0a        >...d....e.....<
0000036

We start by viewing the contents of "test.txt" using od (read the manual page for od for an explanation of the flags we use). This tells us that the ASCII codes for a, b, c, ... are 61, 62, 63, .... We also see 0a at the end of the file, which is the newline character.

After zip-ping the file, od shows us that the run-length encoded version consists of 30 total bytes. Each 5-byte sequence consists of a 4-byte integer (encoded in little-endian) followed by a 1-byte ASCII code from the uncompressed file. All values are shown in hex (e.g., 0x14 is decimal 20).

Because of the 4-byte integer encoding, the maximum count value that can be written is 4,294,967,296. While this is theoretically a problem, you don't need to worry about it for the assignment (it can also be easily solved by separating over-long runs of identical bytes into separate run-length encodings).

unzip

unzip is invoked with the filename of a file compressed by zip, and prints out the uncompressed version to standard output.

Given the output file "test.zip" from the previous example, here's unzip in action:

$ unzip test.zip
aaaaaaaaaaaaaaaaaaaabbbbbbbbbbcccccddde

Implementation Details

Your implementations of the tr, zip, and unzip utilities described above will go into the source files "mytr.c", "myzip.c", and "myunzip.c" (which compile to the binaries mytr, myzip, and myunzip) found in the source repository. You should not create any additional source files or external dependencies, as our grading script will not copy those (and your program will fail to build/run).

The working specifications of the three utilities are presented in the previous section, but there are some details / edge cases to consider:

  1. When the commands are invoked without any arguments or the incorrect number of arguments, they should print usage information and exit with error code 1. The correct usage output is already included in the provided skeleton code.

  2. If mytr is given replacement and substitution strings of different lengths, it should print the error message "STRING1 and STRING2 must have the same length" and exit with error code 1.

  3. If the specified file doesn't exist (or can't be opened for another reason), both myzip and myunzip should print an error and exit with error code 1.

  4. When the utilities are invoked with valid arguments and run to completion, they should terminate with exit code 0.

I/O and String library functions

A number of standard library functions should prove helpful in your implementation. First, a list of them (below their required header files) for easy reference:

#include <stdio.h>

int    printf(char *format, ...);

FILE  *fopen(char *path, char *mode);
int    fclose(FILE *stream);

int    fgetc(FILE *stream);
int    fputc(int c, FILE *stream);

size_t fread(void *ptr, size_t size, size_t nitems, FILE *stream);
size_t fwrite(void *ptr, size_t size, size_t nitems, FILE *stream);


#include <string.h>

int    strcmp(char *s1, char *s2);
size_t strlen(char *s);

To look up the manual page for a function, use the command man 3 FUNC_NAME. The 3 refers to section 3 of the manual pages, dedicated to library functions. (Section 2 is for system calls, which will come in handy later in the semester.) We'll give you a brief overview of the functions, but you have plenty of manual-page reading ahead of you --- best get started soon!

Testing and Evaluation

Remember that all your testing should be done within the container you set up in the first part of this lab! If you don't do this, it's possible differences between our testing environment and your development environment will make your code fail (this is especially true of later assignments), and you won't get any credit for your work at all!

Build the executables using the default Makefile target --- i.e., by just typing make. This will generate the "mytr", "myzip", and "myunzip" files. You can run them manually with the commands ./mytr, ./myzip, and ./myunzip (the ./ means to look in the current directory for the named executable).

A test script is provided that runs 16 different tests defined in the "tests/" subdirectory. Each test is defined by at least five files, where the filename is the numerical identifier for the test, and the extension is one of "desc", "run", "out", "rc", "err" --- the contents of these files are described below:

To run the test suite, simply run the command make test. The first 6 tests are for "mytr", and the next 10 are divided evenly between "myzip" and "myunzip". If they all succeed, you'll see:

test 1: passed
test 2: passed
test 3: passed
test 4: passed
test 5: passed
test 6: passed
test 7: passed
test 8: passed
test 9: passed
test 10: passed
test 11: passed
test 12: passed
test 13: passed
test 14: passed
test 15: passed
test 16: passed

If a test fails, it will stop testing at that point and print out a brief explanation of the error (and how to go about locating the correct result). If you want to run all tests without stopping at failing ones, run make test-cont.

Each test is worth 2 points; the machine problem has a maximum score of 32 points.

Submission

Make sure that you edited the "AUTHOR" file in your repository and signed the honor pledge with your student information. If you don't do this we won't be able to associate your submission with you!

To submit your work, simply commit all your changes to the "AUTHOR", "mytr.c", "myzip.c", "myunzip.c" files and push your work to your GitHub repository.