Assignment: Scheduling

Objectives

This assignment focuses on the high-level policies used by operating systems to schedule processes. In it, you are asked to perform "paper & pencil" scheduling simulations, to use an existing scheduling simulator to run some experiments, and to complete the implementation of an MLFQ simulator written in a high-level language (Python).

Your submission for this assignment should consist of a single compressed archive (zip or tar.gz) which should extract into three separate directories. The first directory will contain a PDF that has all your solutions for part 1, the second will contain the PDF writeup for part 2 along with all the configuration files you created, and the third will contain your source code and PDF writeup for part 3.

Part 1: Paper & Pencil Simulation (24 points)

Consider the following processes:

Process | CPU Burst | Arrival Time
-----------------------------------
     P0 |         2 |            0
     P1 |        10 |            1
     P2 |         6 |            5
     P3 |         3 |            7
  1. FCFS
    • Draw a Gantt chart illustrating the execution of these processes using the FCFS scheduling algorithm. (4 points)
    • What is the average process turnaround time? (2 points)
    • What is the average process waiting time? (2 points)
  2. Preemptive SJF
    • Repeat question 1 (all parts) using the preemptive shortest job first (SJF) scheduling algorithm.
  3. Round Robin
    • Repeat question 1 (all parts) using the round robin scheduling algorithm with a time quantum \(q=3\).
    • Note that in the event that a process completes its time quantum at the same time that another process arrives, the process just completing its quantum is queued ahead of the arriving process.
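
If you want to sanity-check your hand-computed averages for the FCFS case, the definitions are the standard ones: turnaround time is completion time minus arrival time, and waiting time is turnaround time minus total CPU time. Below is a minimal Python sketch of those definitions for a non-preemptive FCFS schedule (purely an optional checking aid; the function and variable names are illustrative):

def fcfs_metrics(jobs):
    # jobs: list of (arrival, burst) pairs for a non-preemptive FCFS schedule
    time = 0
    turnarounds, waits = [], []
    for arrival, burst in sorted(jobs, key=lambda j: j[0]):
        start = max(time, arrival)            # CPU idles until the job arrives, if necessary
        finish = start + burst
        turnarounds.append(finish - arrival)  # turnaround = completion - arrival
        waits.append(start - arrival)         # waiting = time spent ready but not running
        time = finish
    return sum(turnarounds) / len(jobs), sum(waits) / len(jobs)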

Part 2: Experimenting with a scheduling simulator (30 points)

For this part you'll use an existing process scheduling simulator to help test hypotheses about the behavior of various scheduling algorithms. The simulator and its documentation can be obtained at http://vip.cs.utsa.edu/simulators/guides/ps/ps_doc.html; a zip file download is available from that page.

If you need help getting the simulator up and running, there are a couple of related screencasts posted in the course Vimeo channel.

For each question in this part you should adhere to the following pattern:

  1. You'll make and briefly justify a prediction / hypothesis concerning a scheduling-related question.
  2. You'll design an experiment meant to test and prove/disprove your hypothesis.
  3. Based on your (simulated) experimental results, you'll either justify or revise your original hypothesis.

It's the ol' scientific method at work! You won't be graded on the correctness of your initial hypotheses, but please demonstrate some critical thought and include justification. While running your experiments, note that if you use the same random number seed across different runs the values used to drive the simulation will not vary (though they will be "randomly" generated per the distributions you specify) — this is quite convenient!

Please attach all your setup files (used as input to the simulator), and append all your writeups (containing your hypotheses, analyses, and conclusions) to the same PDF file.

  1. SJF vs. PSJF

    Both SJF and PSJF optimize for wait time, though the latter allows for preemption while doing so. The question, then, is under what conditions one would be preferable to the other.

    More specifically, what load characteristics (as described by the process configuration file) might make it so that SJF performs better than PSJF, or vice versa? Factors to consider include:

    • the average duration of CPU and I/O bursts
    • the statistical distribution of the bursts
    • the mix of CPU and I/O bound processes (you might want to read the MLFQ assignment below for a discussion of this aspect)
    • the ratio of context switch time to CPU burst length (for reference, real modern OSes require tens to hundreds of microseconds to perform a context switch, whilst timeslices are typically measured on the millisecond timescale)
  2. SJFA: tuning alpha

    Next, you are to determine the ideal values of alpha to be used with the SJFA algorithm for different combinations of CPU and I/O burst distributions. Recall that alpha is the parameter used to compute the exponential moving average. You should create trace files that specify sets of processes with:

    • varying constant CPU & I/O burst times
    • varying uniformly distributed CPU & I/O burst times
    • varying exponentially distributed CPU & I/O burst times

    What is (roughly) an ideal value for alpha in each case? Why? (If there isn't an ideal value, justify that, too).

    Remember that "constant" indicates that CPU bursts will always be of the given length, "uniform" indicates CPU bursts that are uniformly distributed over the given range, and "exponential" indicates a memoryless distribution with the specified mean.

    You will probably want to run your simulations for an extended period of time to be able to draw some experimental conclusions. Setting the "duration" in your run file to something like "constant 1000" would be a good start.
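
    For reference, the exponential moving average referred to above is the standard burst predictor \(\tau_{n+1} = \alpha t_n + (1 - \alpha)\tau_n\), where \(t_n\) is the length of the most recent CPU burst and \(\tau_n\) is the previous prediction. As a one-line sketch (the names are illustrative, not taken from the simulator):

        def predict_next_burst(alpha, last_burst, prev_prediction):
            # tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n
            # alpha near 1 weights the most recent burst heavily; alpha near 0 favors history.
            return alpha * last_burst + (1 - alpha) * prev_prediction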

  3. Round Robin (RR): tuning q

    Using the same settings (for different probability distributions) you devised for the previous question, what are ideal values for the round-robin time quantum q? You'll want to choose a reasonable, fixed value for the context switch time across all your runs (see question 1 for a discussion of reasonable values).
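
    As a rough rule of thumb while choosing candidate values of q: if every quantum expires and each expiry incurs a context switch of length \(s\), the fraction of time the CPU spends on useful work is at most \(q/(q+s)\), so very small quanta waste time on switching while very large quanta push behavior toward FCFS and hurt response time. Treat this only as intuition; let your simulation results drive the final choice.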

Part 3: Implementing an MLFQ simulator (50 points)

For this part of the assignment you will implement a multi-level feedback queue (MLFQ) based scheduling simulator.

A (slightly dated) screencast demonstrating the salient bits of the provided template code can be viewed at http://vimeo.com/channels/cs450/37786237.

Some simplifying assumptions

First, we will somewhat simplify our understanding of how an MLFQ scheduler behaves. Recall that when we discussed the MLFQ approach to scheduling in class, the queues being drawn from could be given different slices of CPU time. E.g., given an MLFQ with 3 queues – of high, medium, and low priority – it would be possible to configure the system such that jobs from the high priority queue consume 60% of CPU time, while jobs from the medium and low priority queues consume 30% and 10%, respectively.

In the model we will use for this lab, we assume that if jobs exist on multiple queues then only jobs on the highest priority queue will be scheduled. Only when higher priority queues are "drained" will lower priority queues be drawn from.

Furthermore, it will also be assumed that every queue is scheduled round-robin with a fixed time quantum (where each queue may be assigned a different time quantum). Recall that in an MLFQ, jobs that run to the end of their time quanta are moved "down" (if possible) to a lower priority queue, while those that complete their CPU bursts before exhausting their quanta are moved "up".

Finally, we will also simplify the types of jobs that can enter (and be scheduled in) our system. Two categories of jobs — I/O bound and CPU bound — will be considered. I/O bound jobs, on average, execute shorter CPU bursts and relatively longer I/O bursts. This category includes the interactive processes that are typically prioritized on modern operating systems. CPU bound jobs, on the other hand, require longer CPU bursts and make less use of I/O operations.

Configuration

Your simulator will take as input the following:

  • a list of time quanta to use in the different queues that comprise the MLFQ
  • a trace file that describes the CPU and I/O bursts of running jobs

and produce the following output:

  • average wait time
  • average response time (where response time is defined as the length of time taken to complete a given CPU burst, including time spent in the ready queue)
  • average turnaround time
  • the number of queue "hops" for each job
  • the average length of each queue over the simulation

A sample trace file is shown below:

2
0 0 3 8 20 8
1 1 1 10

The first line indicates that there are 2 processes to be simulated, and each of the following lines describes a single process, where:

  • the first number is the job ID (JID)
  • the second number is the job's arrival time (e.g., job 0 arrives at time 0)
  • the third number specifies the number of bursts in the process's execution trace
  • all following numbers indicate burst lengths; the trace always starts with a CPU burst, is followed by alternating IO and CPU bursts, and terminates with a CPU burst
    • e.g., job 0 has 3 bursts: 2 CPU bursts of length 8 separated by an IO burst of length 20
    • e.g., job 1 has 1 burst: a CPU burst of length 10
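
The format is straightforward to parse. The following sketch (the function name and return structure are illustrative, not part of the provided code) reads a trace file into a list of jobs:

def parse_trace(path):
    # Returns a list of (jid, arrival_time, bursts) tuples, where bursts
    # alternate CPU, I/O, CPU, ... and always begin and end with a CPU burst.
    with open(path) as f:
        num_jobs = int(f.readline())
        jobs = []
        for _ in range(num_jobs):
            fields = [int(x) for x in f.readline().split()]
            jid, arrival, num_bursts = fields[0], fields[1], fields[2]
            bursts = fields[3:3 + num_bursts]
            jobs.append((jid, arrival, bursts))
    return jobs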

Use this job generator to create randomized job trace files that contain a spread of I/O and CPU bound jobs. Note that the configurable options include:

  • the random number generator seed
  • the number of jobs
  • the I/O bound skew – this value determines the fraction of jobs that will be I/O bound
  • the mean number of bursts
  • the CPU burst distribution
    • when set to "Normal", burst lengths will be distributed about the mean according to the normal distribution (with a standard deviation of 20% of the mean)
    • when set to "Exponential", burst lengths will be distributed about the mean according to the exponential (memoryless) distribution

Invocation

After building your code (if necessary), we expect to be able to run your simulator as follows:

./your_sim -q 2 4 10 -c jobs.conf

where the -q flag is followed by a variable number of time quanta to be used for the queues in the MLFQ scheduler (the highest priority queue will use the first listed quantum, the next highest priority queue the following one, and so on), and the -c flag is used to specify a configuration file.

Note that while an arbitrary number of queues is theoretically possible, your implementation need only support between 1 and 5 queues. An MLFQ backed by a single queue will, of course, simply behave as a standard round-robin scheduler.
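
If you end up writing your own driver instead of using the provided one, the invocation above maps naturally onto Python's argparse. A sketch under that assumption (the provided sched_sim.py already does its own argument handling):

import argparse

parser = argparse.ArgumentParser(description='MLFQ scheduling simulator')
parser.add_argument('-q', nargs='+', type=int, required=True,
                    help='time quanta, highest-priority queue first (1 to 5 values)')
parser.add_argument('-c', required=True,
                    help='job trace/configuration file')
args = parser.parse_args()

if not 1 <= len(args.q) <= 5:
    parser.error('supply between 1 and 5 time quanta')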

Python skeleton code

You are provided with a functional scheduling simulation driver written in Python, which should make your life much easier. The most recent version of the code is at https://bitbucket.org/michaelee/cs450-mlfq/. The code includes a working round-robin scheduler and an empty class stub for your own MLFQ implementation.

Note that the operation and output of the provided code should serve as a reference for your own, should you decide to implement the scheduler from scratch. A sample session follows:

> python sched_sim.py -q 5 -s rr -c jobs.conf

  JID |    Avg CPU |   Avg wait | Avg rspnse | Turnaround
---------------------------------------------------------
    0 |       8.00 |       2.50 |      10.50 |         41
    1 |      10.00 |       7.00 |      17.00 |         17
---------------------------------------------------------
  Avg |       9.00 |       4.75 |      13.75 |      29.00


  JID | # Preempts
------------------
    0 |          2
    1 |          1

Avg queue length = 0.50
Max queue length = 1.00

The "python sched_sim.py -q 5 -s rr -c jobs.conf" command runs the simulator through the Python interpreter and invokes a round-robin scheduler with quantum 5 on the jobs specified in "jobs.conf" (which contains the same specification used in the example earlier). Run the simulator with the "-h" flag for more information.

It is also possible to show verbose debugging information, as below:

> python sched_sim.py -q 5 -s rr -c jobs.conf -v

INFO: Time 0
INFO:  - Arrival of [ job 0 ; burst len = 8, total wait = 0 ]
INFO:  - Scheduling [ job 0 ; burst len = 8, total wait = 0 ] with q=5
INFO: Time 1
INFO:  - Arrival of [ job 1 ; burst len = 10, total wait = 0 ]
INFO: Time 5
INFO:  - Job 0 quantum expired
INFO:  - Scheduling [ job 1 ; burst len = 10, total wait = 4 ] with q=5
INFO: Time 10
INFO:  - Job 1 quantum expired
INFO:  - Scheduling [ job 0 ; burst len = 3, total wait = 5 ] with q=5
INFO: Time 13
INFO:  - Job 0 CPU burst complete
INFO:  - Blocking job 0 for 20
INFO:  - Scheduling [ job 1 ; burst len = 5, total wait = 7 ] with q=5
INFO: Time 18
INFO:  - Job 1 terminated
INFO: Time 33
INFO:  - Job 0 completed I/O
INFO:  - Scheduling [ job 0 ; burst len = 8, total wait = 5 ] with q=5
INFO: Time 38
INFO:  - Job 0 quantum expired
INFO:  - Scheduling [ job 0 ; burst len = 3, total wait = 5 ] with q=5
INFO: Time 41
INFO:  - Job 0 terminated

  JID |    Avg CPU |   Avg wait | Avg rspnse | Turnaround
---------------------------------------------------------
    0 |       8.00 |       2.50 |      10.50 |         41
    1 |      10.00 |       7.00 |      17.00 |         17
---------------------------------------------------------
  Avg |       9.00 |       4.75 |      13.75 |      29.00


  JID | # Preempts
------------------
    0 |          2
    1 |          1

Avg queue length = 0.50
Max queue length = 1.00

Before starting your implementation, you should at a minimum read through the round-robin scheduler implementation (found in "rr.py"). The simulator code in "sched_sim.py" may be helpful, though understanding all of it is optional.

Implementation details

As explained earlier, your implementation will need to run through all the jobs in the configuration file in order to produce the following output:

  • average wait time
  • average response time (where response time is defined as the length of time taken to complete a given CPU burst, including time spent in the ready queue)
  • average turnaround time
  • the number of queue switches for a given job
  • the average length of each queue (as seen by incoming jobs)

The first three metrics are automatically computed for you by the simulator, but the last two must be tracked and computed in the MLFQ implementation (its "print_report" method is invoked at the end of the simulation to output the final statistics). A run with 5 jobs and 3 queues might produce output as follows:

> python sched_sim.py -q 5 10 15 -c demo.conf -s mlfq

  JID |    Avg CPU |   Avg wait | Avg rspnse | Turnaround
---------------------------------------------------------
    0 |       3.50 |       4.75 |       8.25 |        232
    1 |       3.32 |       1.21 |       4.54 |       1116
    2 |       4.15 |       0.27 |       4.42 |       3818
    3 |      10.50 |       8.00 |      18.50 |        137
    4 |      15.23 |       2.57 |      17.80 |       2300
---------------------------------------------------------
  Avg |       7.34 |       3.36 |      10.70 |    1520.60


  JID | # Switches
------------------
    0 |          2
    1 |         11
    2 |         30
    3 |          1
    4 |         49

Queue | Avg length
------------------
    0 |       0.06
    1 |       0.11
    2 |       0.02
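
One way to collect the last two statistics is to sample every queue's length whenever a job enters the ready state, and to increment a per-job counter whenever a job changes queues. A self-contained sketch follows; apart from print_report, the class and method names are illustrative and not those of the provided stub:

class MLFQStats:
    # Hypothetical bookkeeping for queue switches and average queue lengths.
    def __init__(self, num_queues):
        self.queues = [[] for _ in range(num_queues)]          # ready queues, highest priority first
        self.length_samples = [[] for _ in range(num_queues)]  # lengths sampled at job arrivals
        self.switches = {}                                     # jid -> number of queue changes

    def record_ready(self):
        # Sample each queue's length as seen by a job entering the ready state.
        for level, queue in enumerate(self.queues):
            self.length_samples[level].append(len(queue))

    def record_switch(self, jid):
        # Count one "switch" each time a job moves to a different queue.
        self.switches[jid] = self.switches.get(jid, 0) + 1

    def print_report(self):
        print('  JID | # Switches')
        print('------------------')
        for jid in sorted(self.switches):
            print('%5d | %10d' % (jid, self.switches[jid]))
        print()
        print('Queue | Avg length')
        print('------------------')
        for level, samples in enumerate(self.length_samples):
            avg = sum(samples) / len(samples) if samples else 0.0
            print('%5d | %10.2f' % (level, avg))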

When jobs first arrive in the system, the MLFQ scheduler will initially schedule their (first) CPU bursts in the top-level queue. If they fail to complete in that queue's time quantum, they will be moved down to the next queue. Moving back up is contingent on completing an entire burst (important: not just the portion of a burst remaining after being preempted!) within the current queue's quantum. When a job completes its subsequent I/O burst and makes it back into the ready state, the scheduler should place it in the last queue it was scheduled in.
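
Expressed in code, one reading of the movement rules above looks roughly like this (the functions and the job.level attribute are illustrative and will differ from the provided mlfq.py stub):

def demote(job, num_queues):
    # Quantum exhausted: drop to the next lower-priority queue, if one exists.
    job.level = min(job.level + 1, num_queues - 1)

def maybe_promote(job, whole_burst_fit_in_quantum):
    # Promote only if the *entire* burst (not merely the remainder left after
    # a preemption) completed within the current queue's quantum.
    if whole_burst_fit_in_quantum:
        job.level = max(job.level - 1, 0)

def reenter_after_io(job, queues):
    # Back in the ready state: re-enter at the job's remembered queue level.
    queues[job.level].append(job)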

It is possible that a job currently scheduled from a low priority queue may be running when a job becomes ready and enters a higher priority queue. In such an event, your scheduler should preempt the lower priority job so as to execute the newly ready, higher priority job. Note that this is enabled in the provided simulator via the scheduler's "needs_resched" and "job_preempted" methods (see the stubbed out "mlfq.py" file for more details).

Important: note that while the scheduler may (and should) maintain a history of past events and process statuses, it should not be cognizant of any future events (in particular, the scheduler shouldn't know of upcoming burst durations).

Analysis & Improvement

After you've completed the implementation of your MLFQ scheduler, we can finally start testing out some hypotheses!

  1. For starters, try to determine a good queue setup (i.e., the number of queues and their associated time quanta) for a reasonably large set of processes. We recommend that you create different job trace files using the job generator, varying the number of jobs from 10 to 20, with a mean number of bursts ≥ 100, and keeping the default skew. Generate traces using both normal and exponential distributions. Use the computed average CPU burst lengths in the simulator's output to help you come up with reasonable queue setups.

    At a minimum, try MLFQ configurations with 2-5 different queues using varying time quanta. Methodically track your configurations and associated results in a spreadsheet, and generate graphs (using your tool of choice) to visualize your results. Below is an example of a graph for one trace file created in Excel (note: the data points are made up!).

    [Figure: mlfq-graph1.png — example graph of MLFQ configurations vs. results for one trace file]

    If you need help with the graphing, you can download a Microsoft Excel document (in untested Office 97 format) containing all the made-up graphs I created for this writeup.

    Briefly summarize your findings by answering the following questions (based on your graphed results):

    • What is the best general setup (# queues and quanta), and why?
    • Does increasing the number of queues help? How and why?
    • What effect does changing the burst distribution have on your results? why?
  2. Now that you know how best to make use of your MLFQ implementation, the next step is to compare the MLFQ scheduler to a basic round-robin (RR) scheduler. In particular, it would be interesting to see how different mixes of I/O and CPU bound processes affect their behavior.

    Keeping the other factors constant, create job trace files with I/O bound skew values ranging from 0.1 to 0.9. Using a reasonable quantum for the RR scheduler (the average CPU burst length is a good start), run your MLFQ and RR schedulers on the job traces to obtain average response and turnaround times. Graph your results so as to easily compare the schedulers over the range of skew values. Your plots should look something like this (again, data points are made up):

    [Figure: mlfq-graph2.png — example comparison of MLFQ and RR over the range of skew values]

    You may also wish to include graphs that describe the difference in response times for one or more trace files, such as the following:

    [Figure: mlfq-graph3.png — example per-trace response time comparison]

    Summarize your findings by answering the following questions:

    • In general, how does the MLFQ scheduler behave differently from RR with respect to I/O and CPU bound jobs in the traces?
    • How do MLFQ and RR compare over the different skew values? In particular, under what situations do they differ significantly? Why?
  3. Now for some fun! Propose a non-trivial improvement to the MLFQ scheduler that can be implemented in the simulation framework you're using. Describe what sort of results you hope to see, and why. Next, implement the proposed improvement and describe the experimental results, using graphs similar to those you already created. Were you correct? Explain.

    Here are some suggested improvements (you only need to implement one):

    • a tunable "backoff" mechanism that prohibits too-frequent queue hopping
    • a heuristic for keeping a job from being moved into a higher priority queue when its CPU burst is less than the quantum but greater than some threshold (e.g., greater than 90% of the quantum)
    • dynamically creating queues (of lower or higher priority) when too many jobs appear to be either under- or over-running the smallest and largest time quanta
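
    To make the second suggestion concrete, it amounts to a small gate on the promotion decision (the threshold value and names are illustrative):

        PROMOTION_THRESHOLD = 0.9   # tunable fraction of the quantum

        def should_promote(burst_len, quantum):
            # Don't move a job up if its burst, though shorter than the quantum,
            # still consumed more than the threshold fraction of it.
            return burst_len <= PROMOTION_THRESHOLD * quantum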

Grading

Implementation

5 points
To establish a base level of correctness, your MLFQ implementation should produce the same results as the given round-robin scheduler when configured with one queue.
10 points
We will compare your results on randomized process traces with our reference implementation – while some deviation is expected (e.g., due to differences in how and when queue lengths are observed), results are expected to be close. We will inspect your implementation for logical flaws and may request an interview if serious, undocumented discrepancies arise.
5 points
Your implementation should be modular and clear, and your MLFQ scheduler should be decoupled from the simulator. (This comes mostly for free if you use the provided template code.)

Analysis & Improvement

10 points for each part
For the first two questions in the analysis section, your approach and writeup should be clear, logical, and ultimately justified by the graphs you include. For question 3, your improvement should be functional, and your conclusion should once again be logical and motivated by the results you gather. In all cases we should be able to reproduce your results (so do include the configuration settings you used for the job generator and your own simulator sessions).

Submission

Your submission will consist of your packaged source code and your analysis writeup in a single PDF document. You should make it easy for us to turn on/off improvements you've made for part 3 of the analysis section (e.g., using a command line argument) — this should be clearly documented in your writeup. So that we can easily test your work, do not change any of the command line arguments supported by the original driver program!