x86 & xv6 overview

CS 450: Operating Systems
Michael Lee <lee@iit.edu>
Agenda

- Motivation
- x86 ISA
- PC architecture
- UNIX
- xv6
Motivation

- OS relies on many low-level hardware mechanisms to do its job
- To work on an OS kernel, we must be intimately familiar with the underlying ISA and PC hardware
  - Hardware may dictate what is or isn’t possible, and influence how we represent and manage system-level structures
- We focus on x86, but all modern ISAs support the mechanisms we need
  - e.g., xv6 has been ported to ARM already
x86
Documentation

- Intel IA-32 Software Developer’s Manuals are complete references
- Volume 1: Architectural Overview
- Volume 2: Instruction Set Reference
- Volume 3: Systems Programming Guide
- Many diagrams in slides taken from them
x86 coverage

- Timeline
- Syntax
- Registers
- Instruction operands
- Instructions and sample usage
- Processor modes
- Interrupt & Exception handling
Timeline

- **1978**: Intel released 8086, a 16-bit CPU
- **1982**: 80186 and 80286 (still 16-bit)
- **1985**: 80386 was the first 32-bit x86 CPU (aka i386/IA-32)
- **2000**: AMD created x86-64: 64-bit ISA compatible with x86
- **2001**: Intel released IA-64 “Itanium” ISA, *incompatible* with x86
  - End-of-life announced in 2019 (i.e., official failure)
x86 ISA

- xv6 uses the IA-32 ISA
  - But we can still build/run it on x86-64!
- x86 is a CISC ISA, so we have:
  - Memory operands for non-load/store instructions
  - Complex addressing modes
  - Relatively large number of instructions
Syntax / Formatting

- Two common variants: Intel and AT&T syntax
  - *Intel syntax* common in Windows world
    - e.g., `mov DWORD PTR [ebp-4], 10 ; format: OP DST, SRC`
  - *AT&T syntax* common in UNIX world (default GCC output)
    - e.g., `movl $10, -4(%ebp) # format: OP SRC, DST`
    - We will use AT&T syntax
Registers

- 8 general-purpose registers
- 6 segment registers for addressing
- Status & Control register
- Program counter / Instruction pointer
- (Many others — including control registers — coming up later)
General purpose registers

- Can be directly manipulated, but some have special applications
- Most can be accessed as full 32-bit values, or as 16/8-bit subvalues
- Each register is, by convention, *volatile* or *non-volatile*
  - A *volatile* register may be clobbered by a function call; i.e., its value should be saved — maybe on the stack — if it must be preserved
  - A *non-volatile* register is preserved (by callees) across function calls
The special uses of general-purpose registers by instructions are described in Chapter 5, "Instruction Set Summary," in this volume. See also: Chapter 3, Chapter 4 and Chapter 5 of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes 2A, 2B & 2C.

The following is a summary of special uses:

- **EAX** — Accumulator for operands and results data
- **EBX** — Pointer to data in the DS segment
- **ECX** — Counter for string and loop operations
- **EDX** — I/O pointer
- **ESI** — Pointer to data in the segment pointed to by the DS register; source pointer for string operations
- **EDI** — Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
- **ESP** — Stack pointer (in the SS segment)
- **EBP** — Pointer to data on the stack (in the SS segment)

As shown in Figure 3-5, the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SI, DI, and SP. Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
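The register aliasing described above can be mimicked in C with masks and shifts. The following is a minimal sketch: the helper names (`reg_ax`, `reg_ah`, `reg_al`, `write_al`) are hypothetical and only model how the names AX, AH, and AL select parts of a 32-bit EAX value.

```c
#include <stdint.h>

/* Hypothetical helpers: `eax` stands for the 32-bit register contents. */
static inline uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }       /* bits 15:0 */
static inline uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 15:8 */
static inline uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 7:0  */

/* Writing AL replaces only the low byte; the upper 24 bits are left untouched. */
static inline uint32_t write_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

For example, with EAX = 0x12345678, AX reads as 0x5678, AH as 0x56, and AL as 0x78.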

### 3.4.1.1 General-Purpose Registers in 64-Bit Mode

In 64-bit mode, there are 16 general-purpose registers and the default operand size is 32 bits. However, general-purpose registers are able to work with either 32-bit or 64-bit operands. If a 32-bit operand size is specified: EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, R8D - R15D are available. If a 64-bit operand size is specified: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8-R15 are available. R8D-R15D/R8-R15 represent eight new general-purpose registers. All of these registers can be accessed at the byte, word, dword, and qword level. REX prefixes are used to generate 64-bit operand sizes or to reference registers R8-R15.

Registers only available in 64-bit mode (R8-R15 and XMM8-XMM15) are preserved across transitions from 64-bit mode into compatibility mode then back into 64-bit mode. However, values of R8-R15 and XMM8-XMM15 are undefined after transitions from 64-bit mode through compatibility mode to legacy or real mode and then back through compatibility mode to 64-bit mode.

#### General-Purpose Registers

<table>
<thead>
<tr>
<th>Register</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>%eax</td>
<td>Return value</td>
</tr>
<tr>
<td>%ebx</td>
<td>—</td>
</tr>
<tr>
<td>%ecx</td>
<td>Counter</td>
</tr>
<tr>
<td>%edx</td>
<td>—</td>
</tr>
<tr>
<td>%ebp</td>
<td>Frame/Base pointer</td>
</tr>
<tr>
<td>%esi</td>
<td>Source index (for arrays)</td>
</tr>
<tr>
<td>%edi</td>
<td>Destination index (for arrays)</td>
</tr>
<tr>
<td>%esp</td>
<td>Stack pointer</td>
</tr>
</tbody>
</table>

%eax, %ecx, %edx are volatile registers
# Instruction Operands

<table>
<thead>
<tr>
<th>Mode</th>
<th>Example(s)</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>Immediate</td>
<td>$0x42, $0xd00d</td>
<td>Literal value</td>
</tr>
<tr>
<td>Register</td>
<td>%eax, %esp</td>
<td>Value found in register</td>
</tr>
<tr>
<td>Direct</td>
<td>0x4001000</td>
<td>Value found in address</td>
</tr>
<tr>
<td>Indirect</td>
<td>(%esp)</td>
<td>Value found at address in register</td>
</tr>
<tr>
<td>Base-Displacement</td>
<td>8(%esp)</td>
<td>Given D(B), value found at address D+B (i.e., address in base register B + numeric offset D)</td>
</tr>
<tr>
<td>Scaled Index</td>
<td>8(%esp,%esi,4)</td>
<td>Given D(B, I, S), value found at address D+B+I×S; S ∈ {1, 2, 4, 8}; D and I default to 0 if left out, S defaults to 1</td>
</tr>
</tbody>
</table>

**Memory references**
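The address arithmetic behind a memory operand of the form D(B, I, S) can be written out in C. This is a sketch only: `effective_address` is a hypothetical helper, and the register contents are passed in as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for the AT&T operand D(B, I, S): D + B + I*S. */
uint32_t effective_address(int32_t disp, uint32_t base, uint32_t index, uint32_t scale) {
    /* The hardware only encodes these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned 32-bit arithmetic wraps modulo 2^32, so a negative
       displacement such as -4 subtracts as expected. */
    return (uint32_t)disp + base + index * scale;
}
```

With %esp = 0x1000 and %esi = 3, the operand `8(%esp,%esi,4)` refers to address 0x1000 + 8 + 3×4 = 0x1014.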
Instructions

- Instructions have 0-3 operands
- For many two-operand instructions, one operand is both read and written
  - e.g., `addl $1, %eax  # %eax = %eax + 1`
- Instruction suffix indicates width of operands (l/w/b → 32/16/8 bits)
- Arithmetic operations populate EFLAGS register bits, including ZF (zero result), SF (signed/neg result), CF (carry-out of MSB occurred), OF (overflow occurred)
- These flags are used by subsequent conditional instructions (e.g., jump if result = zero)
As the IA-32 Architecture has evolved, flags have been added to the EFLAGS register, but the function and placement of existing flags have remained the same from one family of the IA-32 processors to the next. As a result, code that accesses or modifies these flags for one family of IA-32 processors works as expected when run on later families of processors.

### 3.4.3.1 Status Flags

The status flags (bits 0, 2, 4, 6, 7, and 11) of the EFLAGS register indicate the results of arithmetic instructions, such as the ADD, SUB, MUL, and DIV instructions. The status flag functions are:

- **CF (bit 0)** Carry flag
  - Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise. This flag indicates an overflow condition for unsigned-integer arithmetic. It is also used in multiple-precision arithmetic.

- **PF (bit 2)** Parity flag
  - Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise.

- **AF (bit 4)** Auxiliary Carry flag
  - Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result; cleared otherwise. This flag is used in binary-coded decimal (BCD) arithmetic.

- **ZF (bit 6)** Zero flag
  - Set if the result is zero; cleared otherwise.

- **SF (bit 7)** Sign flag
  - Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)

- **OF (bit 11)** Overflow flag
  - Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two's complement) arithmetic.

Of these status flags, only the CF flag can be modified directly, using the STC, CLC, and CMC instructions. Also the bit instructions (BT, BTS, BTR, and BTC) copy a specified bit into the CF flag.
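To make the flag definitions concrete, here is a small C model of how CF, ZF, SF, and OF could be derived for a 32-bit addition. It is a sketch, not how the hardware is built; `struct add_flags` and `add32_flags` are hypothetical names, and PF and AF are left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct add_flags { bool cf, zf, sf, of; };

/* Compute a + b as a 32-bit add and derive four of the status flags. */
struct add_flags add32_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint32_t r = a + b;                 /* wraps modulo 2^32 */
    struct add_flags f;
    f.cf = r < a;                       /* carry out of the most-significant bit */
    f.zf = (r == 0);                    /* result is zero */
    f.sf = ((r >> 31) & 1u) != 0;       /* copy of the result's sign bit */
    /* Signed overflow: both operands share a sign and the result's sign differs. */
    f.of = (((~(a ^ b)) & (a ^ r)) >> 31) != 0;
    *result = r;
    return f;
}
```

Adding 1 to 0xFFFFFFFF sets CF and ZF but not OF (as signed values, -1 + 1 = 0 is fine), while adding 1 to 0x7FFFFFFF sets OF and SF but not CF.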

---

**Figure 3-8. EFLAGS Register**

X ID Flag (ID)  
X Virtual Interrupt Pending (VIP)  
X Virtual Interrupt Flag (VIF)  
X Alignment Check / Access Control (AC)  
X Virtual-8086 Mode (VM)  
X Resume Flag (RF)  
X Nested Task (NT)  
X I/O Privilege Level (IOPL)  
S Overflow Flag (OF)  
C Direction Flag (DF)  
X Interrupt Enable Flag (IF)  
X Trap Flag (TF)  
S Sign Flag (SF)  
S Zero Flag (ZF)  
S Auxiliary Carry Flag (AF)  
S Parity Flag (PF)  
S Carry Flag (CF)  

S Indicates a Status Flag  
C Indicates a Control Flag  
X Indicates a System Flag  

- Reserved bit positions. DO NOT USE.  
- Always set to values previously read.
## Arithmetic

<table>
<thead>
<tr>
<th>Instruction(s)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>{add, sub, imul} src, dst</td>
<td>dst = dst {+, -, x} src</td>
</tr>
<tr>
<td>neg dst</td>
<td>dst = –dst</td>
</tr>
<tr>
<td>{inc, dec} dst</td>
<td>dst = dst {+, -} 1</td>
</tr>
<tr>
<td>{sal, sar, shr} src, dst</td>
<td>dst = dst {&lt;&lt;, &gt;&gt;, &gt;&gt;&gt;} src (arithmetic &amp; logical shifts)</td>
</tr>
<tr>
<td>{and, or, xor} src, dst</td>
<td>dst = dst {&amp;, |, ^} src (bitwise)</td>
</tr>
<tr>
<td>not dst</td>
<td>dst = ~dst (bitwise)</td>
</tr>
</tbody>
</table>

*src* can be an immediate, register, or memory operand; *dst* can be a register or memory operand. But at most one memory operand!
# Conditions and Branches

<table>
<thead>
<tr>
<th>Instruction(s)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>cmp</strong> <em>src</em>, <em>dst</em></td>
<td>dst – src (discard result but set flags)</td>
</tr>
<tr>
<td><strong>test</strong> <em>src</em>, <em>dst</em></td>
<td>dst &amp; src (discard result but set flags)</td>
</tr>
<tr>
<td><strong>jmp</strong> <em>target</em></td>
<td>Unconditionally jump to target (change %eip)</td>
</tr>
<tr>
<td><strong>{je, jne}</strong> <em>target</em></td>
<td>Jump to target if dst equal/not equal src (ZF=1 / ZF=0)</td>
</tr>
<tr>
<td><strong>{jl, jle}</strong> <em>target</em></td>
<td>Jump to target if dst &lt; / ≤ src, signed (SF≠OF / ZF=1 or SF≠OF)</td>
</tr>
<tr>
<td><strong>{jg, jge}</strong> <em>target</em></td>
<td>Jump to target if dst &gt; / ≥ src, signed (ZF=0 and SF=OF / SF=OF)</td>
</tr>
<tr>
<td><strong>{ja, jb}</strong> <em>target</em></td>
<td>Jump to target if dst above/below src (CF=0 and ZF=0 / CF=1)</td>
</tr>
</tbody>
</table>

*Target* is usually an address encoded as an immediate operand (e.g., `jmp 0x4001000`; note that AT&T syntax uses no `$` prefix on a jump target). Addresses may also be stored in a register or in memory, in which case *indirect addressing* is required, marked with the `*` symbol. E.g., `jmp *%eax` (jump to address in %eax), `jmp *0x4001000` (jump to the address found at address 0x4001000).
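The flag conditions in the table can be checked with a small C model of `cmp`. This is an illustrative sketch: `cmp32` and the `takes_*` predicates are hypothetical names, and the model covers only the four flags the jumps above consult.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmp_flags { bool cf, zf, sf, of; };

/* Model `cmp src, dst`: compute dst - src and keep only the flags. */
struct cmp_flags cmp32(uint32_t src, uint32_t dst) {
    uint32_t r = dst - src;                          /* wraps modulo 2^32 */
    struct cmp_flags f;
    f.cf = dst < src;                                /* borrow: unsigned dst below src */
    f.zf = (r == 0);
    f.sf = ((r >> 31) & 1u) != 0;
    /* Signed overflow on subtraction: operands differ in sign and the
       result's sign differs from dst's. */
    f.of = (((dst ^ src) & (dst ^ r)) >> 31) != 0;
    return f;
}

bool takes_jl(struct cmp_flags f) { return f.sf != f.of; }            /* signed less    */
bool takes_jg(struct cmp_flags f) { return !f.zf && f.sf == f.of; }   /* signed greater */
bool takes_jb(struct cmp_flags f) { return f.cf; }                    /* unsigned below */
bool takes_ja(struct cmp_flags f) { return !f.cf && !f.zf; }          /* unsigned above */
```

Comparing dst = 0xFFFFFFFF against src = 1 shows why both families exist: as a signed value dst is -1, so `jl` is taken, but as an unsigned value it is the largest 32-bit number, so `ja` is taken as well.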
E.g., basic control structures

```c
if (cond) {
    // if-clause
} else {
    // else-clause
}
...
```

```assembly
    testl %eax, %eax  # %eax = cond
    je ELSE           # cond == 0: skip to else-clause
    # if-clause
    jmp ENDIF
ELSE:
    # else-clause
ENDIF:
    # ...
```

```c
while (cond) {
    // loop-body
}
...
```

```assembly
    testl %eax, %eax  # %eax = cond
    je ENDLOOP        # cond == 0: skip the loop entirely
LOOP:
    # loop-body
    testl %eax, %eax  # re-test cond
    jne LOOP
ENDLOOP:
    # ...
```
Data movement

<table>
<thead>
<tr>
<th>Instruction(s)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>mov src, dst</td>
<td>Copy data from src to dst (memory → memory moves not possible)</td>
</tr>
<tr>
<td>movzbl src, dst</td>
<td>Copy 8-bit value to 32-bit target (&amp; other variants), using zero-fill</td>
</tr>
<tr>
<td>movsbl src, dst</td>
<td>Copy 8-bit value to 32-bit target (&amp; other variants), using sign-extension</td>
</tr>
<tr>
<td>{cmove/ne} src, dst</td>
<td>Move data from src to dst if ZF=1 / ZF=0</td>
</tr>
<tr>
<td>{cmovg/ge/l/le/a/b/...}</td>
<td>Conditionally move data from src to dst (per jump naming conventions)</td>
</tr>
</tbody>
</table>
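The difference between zero-fill and sign-extension shows up as soon as the top bit of the source byte is set. A minimal C sketch follows; the function names simply borrow the mnemonics and are not a real API.

```c
#include <stdint.h>

/* movzbl: widen an 8-bit value to 32 bits, filling the upper bits with zeros. */
uint32_t movzbl(uint8_t src) {
    return (uint32_t)src;
}

/* movsbl: widen an 8-bit value to 32 bits, copying bit 7 into the upper 24 bits. */
uint32_t movsbl(uint8_t src) {
    return (src & 0x80u) ? (0xFFFFFF00u | src) : (uint32_t)src;
}
```

The byte 0x80 becomes 0x00000080 under `movzbl` (128 as an unsigned value) but 0xFFFFFF80 under `movsbl` (-128 as a signed value).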

Address computation

<table>
<tbody>
<tr>
<td>lea address, dst</td>
<td>dst = address (no memory access! just computes value of address)</td>
</tr>
</tbody>
</table>
## Functions and Call stack

<table>
<thead>
<tr>
<th>Instruction(s)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>push <em>src</em></td>
<td>Push src onto stack</td>
</tr>
<tr>
<td>pop <em>dst</em></td>
<td>Pop top of stack into dst</td>
</tr>
<tr>
<td>call <em>target</em></td>
<td>Push current %eip (address of instruction after call) onto stack, jump to target</td>
</tr>
<tr>
<td>leave</td>
<td>Restore frame pointer (%ebp) and clear stack frame</td>
</tr>
<tr>
<td>ret</td>
<td>Pop top of stack into %eip</td>
</tr>
</tbody>
</table>

All instructions above implicitly adjust %esp and access the stack.

*target* may use *indirect addressing* as well, e.g., `call *%eax` (call function whose address is in %eax)
Function calls

- Functions make extensive use of the call stack — leads to convention-driven prologue and epilogue blocks in assembly code

- Typical function prologue:
  - Save old frame pointer and establish new frame pointer
  - Save non-volatile register values we might clobber (“callee-saved”)
  - Load needed parameters from prior stack frame
  - Allocate stack space for any local data
Function calls

- Typical function epilogue:
  - Place return value in %eax
  - Deallocate any space used for local data
  - Restore/Pop any clobbered non-volatile register values
  - Restore/Pop old frame pointer
  - Return
Function calls (Optimization)

- Many of these steps may be optimized (simplified or neglected altogether) by the compiler!
- Prefer registers to stack-based args or local vars (regs vs. memory)
- %esp doesn’t always reflect the top of the stack (only need to do this if calling another function)
- lea often used in surprising ways (addressing modes as arithmetic)
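As an illustration of the last point, a compiler may use `lea` purely as an adder and multiplier. The C functions below spell out what two such instructions compute; the specific operands are made-up examples, not output from any particular compiler.

```c
#include <stdint.h>

/* What `leal (%eax,%eax,4), %eax` computes: eax + eax*4 = 5*eax.
   No memory is touched; only the address arithmetic is performed. */
uint32_t lea_times5(uint32_t eax) {
    return eax + eax * 4;
}

/* What `leal 7(%edi,%esi,2), %eax` computes: 7 + edi + esi*2. */
uint32_t lea_affine(uint32_t edi, uint32_t esi) {
    return 7 + edi + esi * 2;
}
```

Unlike `add` or `imul`, `lea` leaves EFLAGS unchanged, which is one reason it is attractive for this kind of arithmetic.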
Call Stack

- Maintains dynamic state and context of executing program
- Saved frame pointers (previous values of %ebp) create a chain of stack frames
- Useful to navigate for debugging and tracing! (e.g., gdb “backtrace”)
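The chain of saved frame pointers can be walked with a simple loop, which is roughly what a debugger's backtrace does. The sketch below simulates the stack with an array rather than reading real memory. It assumes each frame stores the caller's %ebp at the address in %ebp and the return address 4 bytes above it, and (an assumption of this sketch) that a saved %ebp of 0 marks the outermost frame. `backtrace_sim` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * mem: simulated memory as 32-bit words; addresses are byte offsets into it.
 * ebp: frame pointer of the innermost frame.
 * Writes up to `max` return addresses into `out`; returns how many were found.
 */
size_t backtrace_sim(const uint32_t *mem, uint32_t ebp, uint32_t *out, size_t max) {
    size_t n = 0;
    while (ebp != 0 && n < max) {
        out[n++] = mem[ebp / 4 + 1];   /* return address sits just above the saved %ebp */
        ebp = mem[ebp / 4];            /* follow the link to the caller's frame */
    }
    return n;
}
```

Each iteration moves one frame outward, so the return addresses come out innermost call first.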
E.g., function calls

```c
int sum(int a, int b);  // declare before use

int main() {
    int x=10, y=20;
    sum(x, y);
    return 0;
}

int sum(int a, int b) {
    int ret = a + b;
    return ret;
}
```

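```asm
main: # (excerpt) load arguments and call
    # ...
    movl  -4(%ebp), %edi    # x
    movl  -8(%ebp), %esi    # y
    call  sum
    # ...

sum: # unoptimized
    pushl %ebp
    movl  %esp, %ebp
    movl  %edi, -4(%ebp)    # spill a
    movl  %esi, -8(%ebp)    # spill b
    movl  -4(%ebp), %eax
    addl  -8(%ebp), %eax    # a + b
    movl  %eax, -12(%ebp)   # ret = a + b
    movl  -12(%ebp), %eax   # return value in %eax
    popl  %ebp
    ret

sum: # optimized
    leal  (%edi,%esi), %eax
    ret
```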
Processor modes

- When an x86 system first boots up, it runs in **16-bit real mode** (8086 compatible) — all addresses reference “real” memory locations

- **16/32-bit protected modes** add privilege levels, virtual memory, and other mechanisms useful to the OS (e.g., for multitasking)

- **64-bit long mode** removes some instructions and adds 64-bit registers and addressing
Real mode addressing

- Only 16-bit registers, but support for **20-bit** addresses (1MB address space) through the use of segment registers: CS, DS, ES, SS

- Left-shift segment number by 4 (i.e., $\times 16$) to obtain base address, and add to offset to compute 20-bit physical address

- Code (via IP) and Stack (via SP and BP) accesses automatically use CS (code segment) and SS (stack segment) to compute addresses

- e.g., if IP=0x4000 and CS=0x1100, $\text{CS:IP}$ refers to physical address $0x1100 \times 16 + 0x4000 = 0x15000$
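The segment:offset computation is short enough to write out directly. A minimal sketch in C, where `real_mode_addr` is a hypothetical helper name:

```c
#include <stdint.h>

/* Real-mode physical address: shift the segment left by 4 bits (x16), add the offset. */
uint32_t real_mode_addr(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}
```

Calling `real_mode_addr(0x1100, 0x4000)` reproduces the CS:IP example above and yields 0x15000. Note that many different segment:offset pairs map to the same physical address, e.g., 0x1500:0x0000 also gives 0x15000.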
Protected mode

- Privileged instructions are only available in supervisor mode

- CPL flag is found in the CS register — will cover mechanism for updating CPL later

- Segment registers (expanded to CS, DS, SS, ES, FS, GS) no longer hold base addresses, but selectors

- Selectors are used to load segment descriptors from a descriptor table which describe location/size/status/etc. of segments
If paging is not used, the processor maps the linear address directly to a physical address (that is, the linear address goes out on the processor's address bus). If the linear address space is paged, a second level of address translation is used to translate the linear address into a physical address. See also: Chapter 4, "Paging."

3.4.1 Logical Address Translation in IA-32e Mode

In IA-32e mode, an Intel 64 processor uses the steps described above to translate a logical address to a linear address. In 64-bit mode, the offset and base address of the segment are 64-bits instead of 32 bits. The linear address format is also 64 bits wide and is subject to the canonical form requirement.

Each code segment descriptor provides an L bit. This bit allows a code segment to execute 64-bit code or legacy 32-bit code by code segment.

3.4.2 Segment Selectors

A segment selector is a 16-bit identifier for a segment (see Figure 3-6). It does not point directly to the segment, but instead points to the segment descriptor that defines the segment. A segment selector contains the following items:

- **Index (Bits 3 through 15)** — Selects one of 8192 descriptors in the GDT or LDT. The processor multiplies the index value by 8 (the number of bytes in a segment descriptor) and adds the result to the base address of the GDT or LDT (from the GDTR or LDTR register, respectively).
- **TI (table indicator) flag (Bit 2)** — Specifies the descriptor table to use: clearing this flag selects the GDT; setting this flag selects the current LDT.
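The selector fields can be pulled apart with shifts and masks. This is a sketch under the layout given above (Index in bits 15:3, TI in bit 2); the requested privilege level occupying the remaining bits 1:0 follows the standard selector layout. The names `selector_fields`, `decode_selector`, and `descriptor_offset` are hypothetical.

```c
#include <stdint.h>

struct selector_fields {
    uint16_t index;  /* bits 15:3, descriptor number within the table */
    uint8_t  ti;     /* bit 2, 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* bits 1:0, requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 1u);
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}

/* Byte offset of the descriptor from the table base: index * 8. */
uint32_t descriptor_offset(uint16_t sel) {
    return (uint32_t)(sel >> 3) * 8u;
}
```

For instance, selector 0x08 names descriptor 1 in the GDT at privilege level 0, which is why kernel code often builds selectors as `index << 3`.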

Figure 3-5. Logical Address to Linear Address Translation (the segment selector in the logical address picks a segment descriptor from a descriptor table; the descriptor's base address is added to the offset, i.e., the effective address, to form the linear address)

Figure 3-6. Segment Selector (fields: Index; Table Indicator, 0 = GDT, 1 = LDT; Requested Privilege Level (RPL))
Segmentation

- Segment descriptors allow arbitrarily complex memory mapping and access control (e.g., restricted access), among other things

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Base 31:24</td>
<td>Segment base address (bits 31:24)</td>
</tr>
<tr>
<td>G</td>
<td>Granularity</td>
</tr>
<tr>
<td>D/B</td>
<td>Default operation size (0 = 16-bit segment; 1 = 32-bit segment)</td>
</tr>
<tr>
<td>L</td>
<td>64-bit code segment (IA-32e mode only)</td>
</tr>
<tr>
<td>AVL</td>
<td>Available for use by system software</td>
</tr>
<tr>
<td>Seg. Limit 19:16</td>
<td>Segment limit (bits 19:16)</td>
</tr>
<tr>
<td>P</td>
<td>Segment present</td>
</tr>
<tr>
<td>DPL</td>
<td>Descriptor privilege level</td>
</tr>
<tr>
<td>S</td>
<td>Descriptor type (0 = system; 1 = code or data)</td>
</tr>
<tr>
<td>Type</td>
<td>Segment type</td>
</tr>
<tr>
<td>Base 23:16</td>
<td>Segment base address (bits 23:16)</td>
</tr>
</tbody>
</table>

Figure 3-8. Segment Descriptor
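Because the base and limit are scattered across the 8-byte descriptor, software has to reassemble them. The C sketch below does this for a descriptor given as two 32-bit words. The bit positions not spelled out in the field list (base 15:0 and limit 15:0 in the low word; P at bit 15, DPL at bits 14:13, and G at bit 23 of the high word) follow the standard IA-32 descriptor layout; the function names are hypothetical.

```c
#include <stdint.h>

/* lo/hi: low and high 32-bit words of an 8-byte segment descriptor. */

/* Base = base 15:0 (lo[31:16]) | base 23:16 (hi[7:0]) | base 31:24 (hi[31:24]). */
uint32_t desc_base(uint32_t lo, uint32_t hi) {
    return (lo >> 16) | ((hi & 0xFFu) << 16) | (hi & 0xFF000000u);
}

/* 20-bit limit = limit 15:0 (lo[15:0]) | limit 19:16 (hi[19:16]). */
uint32_t desc_limit(uint32_t lo, uint32_t hi) {
    return (lo & 0xFFFFu) | (hi & 0x000F0000u);
}

unsigned desc_dpl(uint32_t hi)         { return (hi >> 13) & 3u; }  /* privilege level */
unsigned desc_present(uint32_t hi)     { return (hi >> 15) & 1u; }  /* P flag */
unsigned desc_granularity(uint32_t hi) { return (hi >> 23) & 1u; }  /* G flag */
```

A flat ring-0 code segment is commonly encoded as high word 0x00CF9A00 and low word 0x0000FFFF: base 0, limit 0xFFFFF, and G set so the limit is counted in 4 KB units.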
protection against some kinds of program bugs.

Figure 3-2. Flat Model
More complexity can be added to this protected flat mode to provide more protection. For example, for the paging mechanism to provide isolation between user and supervisor code and data, four segments need to be defined: code and data segments at privilege level 3 for the user, and code and data segments at privilege level 0 for the supervisor. Usually these segments all overlay each other and start at address 0 in the linear address space. This flat segmentation model along with a simple paging structure can protect the operating system from applications, and by adding a separate paging structure for each task or process, it can also protect applications from each other. Similar designs are used by several popular multitasking operating systems.

### 3.2.3 Multi-Segment Model

A multi-segment model (such as the one shown in Figure 3-4) uses the full capabilities of the segmentation mechanism to provide hardware enforced protection of code, data structures, and programs and tasks. Here, each program (or task) is given its own table of segment descriptors and its own segments. The segments can be completely private to their assigned programs or shared among programs. Access to all segments and to the execution environments of individual programs running on the system is controlled by hardware.

Figure 3-3. Protected Flat Model
Access checks can be used to protect not only against referencing an address outside the limit of a segment, but also against performing disallowed operations in certain segments. For example, since code segments are designated as read-only segments, hardware can be used to prevent writes into code segments. The access rights information created for segments can also be used to set up protection rings or levels. Protection levels can be used to protect operating-system procedures from unauthorized access by application programs.

3.2.4 Segmentation in IA-32e Mode

In IA-32e mode of Intel 64 architecture, the effects of segmentation depend on whether the processor is running in compatibility mode or 64-bit mode. In compatibility mode, segmentation functions just as it does using legacy 16-bit or 32-bit protected mode semantics.

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures.

Note that the processor does not perform segment limit checks at runtime in 64-bit mode.

3.2.5 Paging and Segmentation

Paging can be used with any of the segmentation models described in Figures 3-2, 3-3, and 3-4. The processor’s paging mechanism divides the linear address space (into which segments are mapped) into pages (as shown in Figure 3-1). These linear-address-space pages are then mapped to pages in the physical address space. The paging mechanism offers several page-level protection facilities that can be used with or instead of the segment-based protection.
Segment descriptor tables

- Kernel is responsible for maintaining descriptor tables
  - System wide (Global)
  - Task-specific (Local)
- Must be set up before transitioning to protected mode

Each system must have one GDT defined, which may be used for all programs and tasks in the system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for each separate task being run, or some or all tasks can share the same LDT.

The GDT is not a segment itself; instead, it is a data structure in linear address space. The base linear address and limit of the GDT must be loaded into the GDTR register (see Section 2.4, "Memory-Management Registers"). The base address of the GDT should be aligned on an eight-byte boundary to yield the best processor performance. The limit value for the GDT is expressed in bytes. As with segments, the limit value is added to the base address to get the address of the last valid byte. A limit value of 0 results in exactly one valid byte. Because segment descriptors are always 8 bytes long, the GDT limit should always be one less than an integral multiple of eight (that is, $8N - 1$).

The first descriptor in the GDT is not used by the processor. A segment selector to this "null descriptor" does not generate an exception when loaded into a data-segment register (DS, ES, FS, or GS), but it always generates a general-protection exception (#GP) when an attempt is made to access memory using the descriptor. By initializing the segment registers with this segment selector, accidental reference to unused segment registers can be guaranteed to generate an exception.

The LDT is located in a system segment of the LDT type. The GDT must contain a segment descriptor for the LDT segment. If the system supports multiple LDTs, each must have a separate segment selector and segment descriptor in the GDT. The segment descriptor for an LDT can be located anywhere in the GDT. See Section 3.5, "System Descriptor Types", for information on the LDT segment-descriptor type.

An LDT is accessed with its segment selector. To eliminate address translations when accessing the LDT, the segment selector, base linear address, limit, and access rights of the LDT are stored in the LDTR register (see Section 2.4, "Memory-Management Registers").

When the GDTR register is stored (using the SGDT instruction), a 48-bit "pseudo-descriptor" is stored in memory (see top diagram in Figure 3-11). To avoid alignment check faults in user mode (privilege level 3), the pseudo-descriptor should be located at an odd word address (that is, address MOD 4 is equal to 2). This causes the segment descriptor to be accessed as a 48-bit value in memory.
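The GDT limit rule and the 48-bit pseudo-descriptor can be sketched in C. The layout assumed here is a 16-bit limit followed by a 32-bit base, stored little-endian, which is the form `lgdt` and `sgdt` use in 32-bit code; `gdt_limit` and `make_gdt_pseudo_descriptor` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Limit for a GDT holding n descriptors of 8 bytes each: 8n - 1. */
uint16_t gdt_limit(unsigned n) {
    assert(n >= 1 && n <= 8192);       /* a 13-bit index allows at most 8192 entries */
    return (uint16_t)(8u * n - 1u);
}

/* Fill a 6-byte pseudo-descriptor: limit in bytes 0-1, base in bytes 2-5. */
void make_gdt_pseudo_descriptor(uint8_t out[6], uint32_t base, unsigned n) {
    uint16_t limit = gdt_limit(n);
    out[0] = (uint8_t)(limit & 0xFFu);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFFu);
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);
    out[5] = (uint8_t)(base >> 24);
}
```

A three-entry table (null, code, data) therefore has a limit of 23 (0x17).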
Control & System registers

- Transitioning between real & protected mode, and activating/controlling other hardware features are governed by control & system register flags.
When loading a control register, reserved bits should always be set to the values previously read. The flags in control registers are:

- **Cache Disable (bit 30 of CR0)**: Enables or disables caching. When set, caching is restricted. When clear (along with NW), caching is enabled.
- **Not Write-through (bit 29 of CR0)**: Controls the write policy. When the NW and CD flags are both clear, write-back caching is enabled for writes that hit the cache.
Figure: Control Registers (CR0: mode and cache-control flags; CR2: page-fault linear address; CR3: page-directory base (PDBR); CR4: feature flags, including OSXSAVE and OSXMMEXCPT)

When the CD flag is set, caching is restricted as described in Section 11.5.3, "Preventing Cache invalidation." If the CD flag is clear, caching is enabled. Setting the NW flag while the CD flag is clear results in a general protection exception (#GP).

On Intel 64 processors, enabling and disabling IA-32e mode operation also requires modifying CR0.PG.

Refer to Table 11-5 for detailed information about these flags.
```asm
.code16                       # Assemble for 16-bit mode
.globl start
start:
  cli                         # BIOS enabled interrupts; disable

  # Zero data segment registers DS, ES, and SS.
  xorw   %ax,%ax              # Set %ax to zero
  movw   %ax,%ds              # -> Data Segment
  ...

  # Switch from real to protected mode. Use a bootstrap GDT that makes
  # virtual addresses map directly to physical addresses so that the
  # effective memory map doesn't change during the transition.
  lgdt   gdtdesc
  movl   %cr0, %eax
  orl    $CR0_PE, %eax
  movl   %eax, %cr0
  ljmp   $(SEG_KCODE<<3), $start32

.code32                       # Tell assembler to generate 32-bit code now.
start32:
  # Set up the protected-mode data segment registers
  movw   $(SEG_KDATA<<3), %ax # Our data segment selector
  movw   %ax, %ds             # -> DS: Data Segment
  ...

# Bootstrap GDT
gdt:
  SEG_NULLASM                             # null seg
  SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg
  SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg
```
Paging

- Protected mode also enables virtual memory via *paging*
- A much more granular (but potentially expensive) form of virtual memory
  - Will discuss this in detail later!
- Kernel must set up and maintain per-process structures for paging, too
Interrupts & Exceptions

- Note: terminology differs a bit from what we used in CS 351!

- Interrupts and exceptions include all events that require special CPU attention, typically by transferring control from the active task to an interrupt/exception handler

- **Interrupts** are hardware-sourced events requesting CPU attention
  - Typically unrelated to executing instruction
  - Can also be generated by software with `int N` instruction
Exceptions

- Errors/Events arising due to the *currently executing instruction*

- Subclasses:
  - **Faults**: can be corrected — after handler, return to state prior to faulting instruction (e.g., page fault)
  - **Traps**: reported immediately after execution of instruction (e.g., debugging breakpoint, system call), regular return
  - **Aborts**: severe errors; cannot return to task
“Task”

- Represents a context that can be interrupted
- The task-state segment (TSS) holds the state of the currently executing task
  - General purpose registers
  - Control registers (including EFLAGS, EIP, LDTR, etc.)
  - Stack pointers for different privilege levels
7.1.2 Task State

The following items define the state of the currently executing task:

- The task's current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).
- The state of the general-purpose registers.
- The state of the EFLAGS register.
- The state of the EIP register.
- The state of control register CR3.
- The state of the task register.
- The state of the LDTR register.
- The I/O map base address and I/O map (contained in the TSS).
- Stack pointers to the privilege 0, 1, and 2 stacks (contained in the TSS).
- Link to previously executed task (contained in the TSS).

Prior to dispatching a task, all of these items are contained in the task’s TSS, except the state of the task register. Also, the complete contents of the LDTR register are not contained in the TSS, only the segment selector for the LDT.

7.1.3 Executing a Task

Software or the processor can dispatch a task for execution in one of the following ways:

- An explicit call to a task with the CALL instruction.
- An explicit jump to a task with the JMP instruction.
- An implicit call (by the processor) to an interrupt-handler task.
- An implicit call to an exception-handler task.
- A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.

All of these methods for dispatching a task identify the task to be dispatched with a segment selector that points to a task gate or the TSS for the task. When dispatching a task with a CALL or JMP instruction, the selector in the instruction may select the TSS directly or a task gate that holds the selector for the TSS. When dispatching a task

---

**Figure 7-1. Structure of a Task**
The processor updates dynamic fields when a task is suspended during a task switch. The following are dynamic fields:

- **General-purpose register fields** — State of the EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI registers prior to the task switch.
- **Segment selector fields** — Segment selectors stored in the ES, CS, SS, DS, FS, and GS registers prior to the task switch.
- **EFLAGS register field** — State of the EFLAGS register prior to the task switch.
- **EIP (instruction pointer) field** — State of the EIP register prior to the task switch.
- **Previous task link field** — Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task by using the IRET instruction.

The processor reads the static fields, but does not normally change them. These fields are set up when a task is created. The following are static fields:

- **LDT segment selector field** — Contains the segment selector for the task’s LDT.

**Figure 7-2. 32-Bit Task-State Segment (TSS)**
Handling Interrupts/Exceptions

- **Interrupt Descriptor Table (IDT)** contains descriptors (aka “gates”) associating service routines with interrupt/exception numbers
- 256 total indices (aka vector numbers):
  - 0-31: architecture-defined
  - 32-255: user-defined; can be assigned to I/O devices
Because interrupts are delivered to the processor core only once, an incorrectly configured IDT could result in incomplete interrupt handling and/or the blocking of interrupt delivery. IA-32 architecture rules need to be followed for setting up IDTR base/limit/access fields and each field in the gate descriptors. The same applies for the Intel 64 architecture. This includes implicit referencing of the destination code segment through the GDT or LDT and accessing the stack.

### 6.11 IDT DESCRIPTORS

The IDT may contain any of three kinds of gate descriptors:

- **Task-gate descriptor**
- **Interrupt-gate descriptor**
- **Trap-gate descriptor**

Figure 6-2 shows the formats for the task-gate, interrupt-gate, and trap-gate descriptors. The format of a task gate used in an IDT is the same as that of a task gate used in the GDT or an LDT (see Section 7.2.5, “Task-Gate Descriptor”). The task gate contains the segment selector for a TSS for an exception and/or interrupt handler task.

Interrupt and trap gates are very similar to call gates (see Section 5.8.3, “Call Gates”). They contain a far pointer (segment selector and offset) that the processor uses to transfer program execution to a handler procedure in an exception- or interrupt-handler code segment. These gates differ in the way the processor handles the IF flag in the EFLAGS register (see Section 6.12.1.2, “Flag Usage By Exception- or Interrupt-Handler Procedure”).
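
To make the gate format concrete, here is a minimal sketch that packs a 32-bit interrupt or trap gate into its 8-byte form. The function name `make_gate` and the enum constants are hypothetical; the bit positions follow the standard IA-32 gate layout (offset split across both halves, selector in the upper half of the low dword, P/DPL/type in the high dword). The two gate types differ only in the low bit of the type field.

```c
#include <assert.h>
#include <stdint.h>

/* Type field values for 32-bit gates (D = 1). */
enum { GATE_INT32 = 0xE, GATE_TRAP32 = 0xF };

/* Pack a present (P = 1) 32-bit gate descriptor.
 * offset:   entry point of the handler within the code segment
 * selector: segment selector for the destination code segment
 * type:     GATE_INT32 or GATE_TRAP32
 * dpl:      descriptor privilege level (0..3) */
uint64_t make_gate(uint32_t offset, uint16_t selector, unsigned type, unsigned dpl)
{
    assert(dpl <= 3);
    uint32_t lo = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    uint32_t hi = (offset & 0xFFFF0000u)   /* offset 31..16 */
                | (1u << 15)               /* P: present */
                | ((dpl & 3u) << 13)       /* DPL */
                | ((type & 0xFu) << 8);    /* gate type */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, under these assumptions a ring-0 interrupt gate for a handler at `0x12345678` in code segment `0x08` packs to `0x12348E0000085678`.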

**Figure 6-1. Relationship of the IDTR and IDT**

When the processor performs a call to the exception- or interrupt-handler procedure:

- If the handler procedure is going to be executed at a numerically lower privilege level, a stack switch occurs. When the stack switch occurs:
  a. The segment selector and stack pointer for the stack to be used by the handler are obtained from the TSS for the currently executing task. On this new stack, the processor pushes the stack segment selector and stack pointer of the interrupted procedure.
  b. The processor then saves the current state of the EFLAGS, CS, and EIP registers on the new stack (see Figure 6-4).
  c. If an exception causes an error code to be saved, it is pushed on the new stack after the EIP value.

- If the handler procedure is going to be executed at the same privilege level as the interrupted procedure:
  a. The processor saves the current state of the EFLAGS, CS, and EIP registers on the current stack (see Figure 6-4).
  b. If an exception causes an error code to be saved, it is pushed on the current stack after the EIP value.

**Figure 6-3. Interrupt Procedure Call**
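
The push order described above can be modeled in plain C. This is only a simulation on an ordinary array, not how the hardware is implemented; `struct cpu`, `push32`, and `push_trap_frame` are names invented for this sketch. The caller passes the top of whichever stack the handler will use (the new stack from the TSS on a privilege change, otherwise the current one).

```c
#include <assert.h>
#include <stdint.h>

/* State of the interrupted procedure (illustrative names). */
struct cpu { uint32_t ss, esp, eflags, cs, eip; };

/* Push one 32-bit word on a downward-growing stack. */
static uint32_t *push32(uint32_t *sp, uint32_t v) { *--sp = v; return sp; }

/* Model of the words the processor pushes before entering a handler.
 * stack_switch: nonzero if the handler runs at a more privileged level
 * has_err/err:  whether this exception pushes an error code, and its value
 * Returns the stack pointer the handler starts with. */
uint32_t *push_trap_frame(uint32_t *sp, const struct cpu *c,
                          int stack_switch, int has_err, uint32_t err)
{
    if (stack_switch) {              /* only on a privilege-level change */
        sp = push32(sp, c->ss);
        sp = push32(sp, c->esp);
    }
    sp = push32(sp, c->eflags);
    sp = push32(sp, c->cs);
    sp = push32(sp, c->eip);
    if (has_err)
        sp = push32(sp, err);        /* pushed last, so it sits below EIP */
    return sp;
}
```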
Interrupt/Exception Vectors

Table 6-1. Protected-Mode Exceptions and Interrupts

<table>
<thead>
<tr>
<th>Vector</th>
<th>Mnemonic</th>
<th>Description</th>
<th>Type</th>
<th>Error Code</th>
<th>Source</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>#DE</td>
<td>Divide Error</td>
<td>Fault</td>
<td>No</td>
<td>DIV and IDIV instructions.</td>
</tr>
<tr>
<td>1</td>
<td>#DB</td>
<td>Debug Exception</td>
<td>Fault/ Trap</td>
<td>No</td>
<td>Instruction, data, and I/O breakpoints; single-step; and others.</td>
</tr>
<tr>
<td>2</td>
<td>—</td>
<td>NMI Interrupt</td>
<td>Interrupt</td>
<td>No</td>
<td>Nonmaskable external interrupt.</td>
</tr>
<tr>
<td>3</td>
<td>#BP</td>
<td>Breakpoint</td>
<td>Trap</td>
<td>No</td>
<td>INT3 instruction.</td>
</tr>
<tr>
<td>4</td>
<td>#OF</td>
<td>Overflow</td>
<td>Trap</td>
<td>No</td>
<td>INTO instruction.</td>
</tr>
<tr>
<td>5</td>
<td>#BR</td>
<td>BOUND Range Exceeded</td>
<td>Fault</td>
<td>No</td>
<td>BOUND instruction.</td>
</tr>
<tr>
<td>6</td>
<td>#UD</td>
<td>Invalid Opcode (Undefined Opcode)</td>
<td>Fault</td>
<td>No</td>
<td>UD instruction or reserved opcode.</td>
</tr>
<tr>
<td>7</td>
<td>#NM</td>
<td>Device Not Available (No Math Coprocessor)</td>
<td>Fault</td>
<td>No</td>
<td>Floating-point or WAIT/FWAIT instruction.</td>
</tr>
<tr>
<td>8</td>
<td>#DF</td>
<td>Double Fault</td>
<td>Abort</td>
<td>Yes (zero)</td>
<td>Any instruction that can generate an exception, an NMI, or an INTR.</td>
</tr>
<tr>
<td>9</td>
<td>—</td>
<td>Coprocessor Segment Overrun (reserved)</td>
<td>Fault</td>
<td>No</td>
<td>Floating-point instruction.¹</td>
</tr>
<tr>
<td>10</td>
<td>#TS</td>
<td>Invalid TSS</td>
<td>Fault</td>
<td>Yes</td>
<td>Task switch or TSS access.</td>
</tr>
<tr>
<td>11</td>
<td>#NP</td>
<td>Segment Not Present</td>
<td>Fault</td>
<td>Yes</td>
<td>Loading segment registers or accessing system segments.</td>
</tr>
<tr>
<td>12</td>
<td>#SS</td>
<td>Stack-Segment Fault</td>
<td>Fault</td>
<td>Yes</td>
<td>Stack operations and SS register loads.</td>
</tr>
<tr>
<td>13</td>
<td>#GP</td>
<td>General Protection</td>
<td>Fault</td>
<td>Yes</td>
<td>Any memory reference and other protection checks.</td>
</tr>
<tr>
<td>14</td>
<td>#PF</td>
<td>Page Fault</td>
<td>Fault</td>
<td>Yes</td>
<td>Any memory reference.</td>
</tr>
<tr>
<td>15</td>
<td>—</td>
<td>(Intel reserved. Do not use.)</td>
<td>—</td>
<td>No</td>
<td>—</td>
</tr>
<tr>
<td>16</td>
<td>#MF</td>
<td>x87 FPU Floating-Point Error</td>
<td>Fault</td>
<td>No</td>
<td>x87 FPU floating-point or WAIT/FWAIT instruction.</td>
</tr>
<tr>
<td>17</td>
<td>#AC</td>
<td>Alignment Check</td>
<td>Fault</td>
<td>Yes (Zero)</td>
<td>Any data reference in memory.²</td>
</tr>
<tr>
<td>18</td>
<td>#MC</td>
<td>Machine Check</td>
<td>Abort</td>
<td>No</td>
<td>Error codes (if any) and source are model dependent.³</td>
</tr>
<tr>
<td>19</td>
<td>#XM</td>
<td>SIMD Floating-Point Exception</td>
<td>Fault</td>
<td>No</td>
<td>SSE/SSE2/SSE3 floating-point instructions⁴</td>
</tr>
<tr>
<td>20</td>
<td>#VE</td>
<td>Virtualization Exception</td>
<td>Fault</td>
<td>No</td>
<td>EPT violations⁵</td>
</tr>
<tr>
<td>21-31</td>
<td>—</td>
<td>Intel reserved. Do not use.</td>
<td>—</td>
<td>—</td>
<td>—</td>
</tr>
<tr>
<td>32-255</td>
<td>—</td>
<td>User Defined (Non-reserved) Interrupts</td>
<td>Interrupt</td>
<td>No</td>
<td>External interrupt or INT n instruction.</td>
</tr>
</tbody>
</table>

NOTES:
1. This exception can occur only on processors that support the architectural exception mechanism and the x87 FPU.
2. Any data reference in memory.
3. Error codes (if any) and source are model dependent.
4. SSE/SSE2/SSE3 floating-point instructions.
5. EPT violations.
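
A handler's entry code has to know whether an error code sits on the stack below EIP. A minimal sketch of that lookup, covering only the architecture-defined vectors that the table above marks "Yes" (8, 10, 11, 12, 13, 14, 17), is shown here. The function names are hypothetical; external interrupts and `INT n` never push an error code, so every vector from 32 up returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for vectors in the architecture-defined range 0..31. */
bool is_architecture_defined(unsigned vector)
{
    assert(vector < 256);
    return vector < 32;
}

/* True if the processor pushes an error code for this exception vector.
 * Bit n of the mask is set when vector n is marked "Yes" in the table:
 * #DF(8), #TS(10), #NP(11), #SS(12), #GP(13), #PF(14), #AC(17). */
bool pushes_error_code(unsigned vector)
{
    const uint32_t mask = (1u << 8) | (1u << 10) | (1u << 11) | (1u << 12)
                        | (1u << 13) | (1u << 14) | (1u << 17);
    assert(vector < 256);
    return vector < 32 && ((mask >> vector) & 1u) != 0;
}
```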
Gate Descriptors

- Interrupts typically invoke *Interrupt Gates*
- Exceptions typically invoke *Trap Gates*

**Interrupt Gate**

- Offset 31..16
- P
- DPL
- Offset 15..0

**Trap Gate**

- Offset 31..16
- P
- DPL
- Offset 15..0

---

**Legend**

- DPL: Descriptor Privilege Level
- Offset: Offset to procedure entry point
- P: Segment Present flag
- Selector: Segment Selector for destination code segment
- D: Size of gate: 1 = 32 bits; 0 = 16 bits

Reserved
Privilege level checks

- Hardware ensures that interrupts cannot transfer control from more-privileged to less-privileged code
  - i.e., enforce CPL ≥ destination segment DPL
- For interrupts generated with `int` instruction, gate DPL is checked to restrict interrupts
  - i.e., enforce CPL ≤ gate DPL
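
The two checks above can be written out as a small predicate. This is a sketch of the rule as stated on this slide, not of the full hardware algorithm; the function and parameter names are assumptions. Recall that numerically lower levels are more privileged (0 = kernel, 3 = user).

```c
#include <assert.h>
#include <stdbool.h>

/* cpl:          current privilege level of the interrupted code
 * gate_dpl:     DPL field of the interrupt/trap gate
 * target_dpl:   DPL of the handler's destination code segment
 * software_int: true if raised by an int instruction */
bool gate_transfer_allowed(unsigned cpl, unsigned gate_dpl,
                           unsigned target_dpl, bool software_int)
{
    assert(cpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);
    if (software_int && cpl > gate_dpl)
        return false;   /* violates CPL <= gate DPL */
    if (target_dpl > cpl)
        return false;   /* would drop to less-privileged code */
    return true;
}

/* A stack switch happens only when the handler is more privileged. */
bool needs_stack_switch(unsigned cpl, unsigned target_dpl)
{
    return target_dpl < cpl;
}
```

Under this model, user code (CPL 3) can reach a kernel handler through a gate with DPL 3 (as a system-call gate would be set up), but cannot use `int` to invoke a gate whose DPL is 0, even though a hardware interrupt arriving at CPL 3 may still use that same gate.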
Masking Interrupts

- Most external interrupts can be masked (i.e., held off), by clearing the IF (interrupt flag) in EFLAGS
  - cli/sti instructions: clear/set interrupt flag
- IF is automatically cleared when an interrupt (but not a trap) gate is taken
- How is this useful?
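
One way to reason about the question above is to model IF as a bit in a saved EFLAGS value. The sketch below is a pure simulation: the real `cli`/`sti` instructions are privileged and act on the live register. The helper names are hypothetical; `FL_IF` is bit 9 of EFLAGS.

```c
#include <assert.h>
#include <stdint.h>

#define FL_IF 0x00000200u   /* interrupt-enable flag, bit 9 of EFLAGS */
static_assert(FL_IF == (1u << 9), "IF is bit 9");

enum gate_kind { INTERRUPT_GATE, TRAP_GATE };

/* Effect of cli / sti on an EFLAGS value. */
uint32_t model_cli(uint32_t eflags) { return eflags & ~FL_IF; }
uint32_t model_sti(uint32_t eflags) { return eflags | FL_IF; }

/* EFLAGS as seen by the handler: an interrupt gate clears IF,
 * a trap gate leaves it unchanged. */
uint32_t eflags_on_entry(uint32_t eflags, enum gate_kind kind)
{
    return kind == INTERRUPT_GATE ? (eflags & ~FL_IF) : eflags;
}

/* Maskable external interrupts are delivered only while IF is set. */
int maskable_interrupt_deliverable(uint32_t eflags)
{
    return (eflags & FL_IF) != 0;
}
```

Because an interrupt gate clears IF on entry, the handler starts with maskable interrupts held off, so it cannot be interrupted by another maskable interrupt until it re-enables them or returns.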
Interrupt Procedure

To return from an exception- or interrupt-handler procedure, the handler must use the IRET (or IRETD) instruction. The IRET instruction is similar to the RET instruction except that it restores the saved flags into the EFLAGS register. The IOPL field of the EFLAGS register is restored only if the CPL is 0. The IF flag is changed only if the CPL is less than or equal to the IOPL. See Chapter 3, "Instruction Set Reference, A-L," of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, for a description of the complete operation performed by the IRET instruction.

If a stack switch occurred when calling the handler procedure, the IRET instruction switches back to the interrupted procedure's stack on the return.

### 6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures

The privilege-level protection for exception- and interrupt-handler procedures is similar to that used for ordinary procedure calls when called through a call gate (see Section 5.8.4, "Accessing a Code Segment Through a Call Gate"). The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL.

An attempt to violate this rule results in a general-protection exception (#GP). The protection mechanism for exception- and interrupt-handler procedures is different in the following ways:

- Because interrupt and exception vectors have no RPL, the RPL is not checked on implicit calls to exception and interrupt handlers.
- The processor checks the DPL of the interrupt or trap gate only if an exception or interrupt is generated with an INTn, INT3, or INTO instruction. Here, the CPL must be less than or equal to the DPL of the gate. This restriction prevents application programs or procedures running at privilege level 3 from using a software interrupt to access critical exception handlers, such as the page-fault handler, providing that those handlers are...
Returning from interrupt

- Need to restore previous stack/register states, and potentially return to previous privilege level (switch stacks)

- `iret` instruction automatically restores state using the values the interrupt procedure saved on the stack (EIP, CS, EFLAGS, and, if a stack switch occurred, ESP and SS)
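
The return path can be modeled as the mirror image of the entry sequence. This sketch simulates `iret` over an ordinary array and omits the IOPL/IF checks and all protection checks; `struct cpu_state` and `model_iret` are names made up for illustration. It relies on the fact that the CPL is held in the low two bits of CS, and it assumes the handler has already removed any error code from the stack, since `iret` does not pop one.

```c
#include <assert.h>
#include <stdint.h>

/* Register state restored by the return (illustrative names). */
struct cpu_state { uint32_t eip, cs, eflags, esp, ss; };

/* Pop EIP, CS, EFLAGS; if returning to a less-privileged level,
 * also pop ESP and SS to switch back to the interrupted stack.
 * Returns the simulated stack pointer after the pops. */
const uint32_t *model_iret(const uint32_t *sp, struct cpu_state *c)
{
    unsigned old_cpl = c->cs & 3u;      /* privilege level of the handler */
    c->eip    = *sp++;
    c->cs     = *sp++;
    c->eflags = *sp++;
    if ((c->cs & 3u) > old_cpl) {       /* return to outer privilege level */
        c->esp = *sp++;
        c->ss  = *sp++;
    }
    /* Same-level return: SS and the saved ESP field are left untouched. */
    return sp;
}
```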
Figure 2-1. IA-32 System-Level Registers and Data Structures

This page mapping example is for 4-KByte pages and the normal 32-bit physical address size.