
CPSC 313: Computer Hardware & OS

Memory hierarchy, caching, virtual memory, processes, and concurrency. How the hardware-OS boundary actually works.

Tags: C · Memory · OS · Caching · Virtual Memory · Processes

2024-04-20


Overview

CPSC 313 bridges the gap between software and hardware. After this course, you'll understand what actually happens when your program runs — from the CPU fetching instructions to the OS scheduling your process.


Memory Hierarchy

Modern systems use a hierarchy of storage to balance speed and cost:

CPU Registers   → ~1 cycle       (bytes)
L1 Cache        → ~4 cycles      (KB)
L2 Cache        → ~12 cycles     (MB)
L3 Cache        → ~30 cycles     (MB)
RAM (DRAM)      → ~100 cycles    (GB)
SSD             → ~10,000 cycles (TB)
HDD             → ~10M cycles    (TB)

Principle of locality makes caching work:

  • Temporal locality: recently used data is likely to be used again
  • Spatial locality: data near recently used data is likely to be used soon

Cache Organization

A cache is organized into sets, each containing ways (lines).

  • Direct-mapped: 1 way — simple but prone to conflict misses
  • Set-associative: N ways — better hit rate, small LRU overhead
  • Fully associative: all lines in one set — best hit rate, expensive hardware

Cache miss types:

  1. Cold miss — first access, nothing is cached yet
  2. Conflict miss — two addresses map to the same cache set
  3. Capacity miss — working set doesn't fit in cache
Loop order determines the access pattern, and spatial locality rewards the sequential one:

// Cache-friendly: row-major traversal
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += A[i][j];  // Accesses memory sequentially ✓

// Cache-unfriendly: column-major traversal
for (int j = 0; j < N; j++)
    for (int i = 0; i < N; i++)
        sum += A[i][j];  // Jumps N*sizeof(int) bytes each step ✗

Virtual Memory

Virtual memory gives each process the illusion of its own private address space.

Translation: Virtual Address → Physical Address via page table

  • Pages are typically 4KB
  • Page Table Entry (PTE) stores: physical frame number, valid bit, dirty bit, reference bit
  • TLB (Translation Lookaside Buffer) caches recent translations

Page faults: accessing a page not in RAM triggers a fault → OS loads page from disk.

Virtual Address:   [ VPN (20 bits) | Offset (12 bits) ]
                         ↓
                    Page Table
                         ↓
Physical Address:  [ PFN (20 bits) | Offset (12 bits) ]

Processes & the OS

A process is a running program: code + data + stack + open files + registers.

Context switch: the OS saves one process's state (registers, PC) and loads another's.

// fork() creates a child process
char *args[] = {"ls", NULL};  // argv for the child; args[0] is the program name
pid_t pid = fork();
if (pid == 0) {
    // Child: fork() returned 0
    execv("/bin/ls", args);
    _exit(1);                 // Only reached if execv fails
} else {
    // Parent: fork() returned the child's PID
    wait(NULL);               // Block until the child finishes
}

Process states: Running → Blocked → Ready → Running


Signals & Concurrency

Signals are software interrupts sent to processes:

void handler(int sig) {
    // Handle SIGINT (Ctrl+C). Note: printf is not async-signal-safe;
    // fine for a demo, but real handlers should stick to write()
    printf("Caught signal %d\n", sig);
}

signal(SIGINT, handler);  // sigaction() is preferred in modern code

Race conditions occur when multiple threads access shared data without synchronization.

Mutex prevents concurrent access:

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&lock);
shared_counter++;           // Critical section
pthread_mutex_unlock(&lock);

Key Takeaways

  • Locality is everything. Write cache-friendly code: sequential access patterns, small working sets.
  • Virtual memory is an abstraction. Every pointer you dereference goes through a TLB lookup, plus a page table walk on a TLB miss.
  • The OS is a resource manager. It multiplexes CPU (scheduling), memory (paging), and I/O (device drivers) across processes.
  • Concurrency is hard. Always protect shared state. Assume signals can interrupt your code anywhere.
  • Understanding hardware makes you a better programmer. You can't optimize what you don't understand.