Overview
CPSC 313 bridges the gap between software and hardware. After this course, you'll understand what actually happens when your program runs — from the CPU fetching instructions to the OS scheduling your process.
Memory Hierarchy
Modern systems use a hierarchy of storage to balance speed and cost:
CPU Registers → ~1 cycle (bytes)
L1 Cache → ~4 cycles (KB)
L2 Cache → ~12 cycles (MB)
L3 Cache → ~30 cycles (MB)
RAM (DRAM) → ~100 cycles (GB)
SSD → ~10,000 cycles (TB)
HDD → ~10M cycles (TB)
Principle of locality makes caching work:
- Temporal locality: recently used data is likely to be used again
- Spatial locality: data near recently used data is likely to be used soon
Cache Organization
A cache is organized into sets, each containing ways (lines).
- Direct-mapped: 1 way — simple but prone to conflict misses
- Set-associative: N ways — better hit rate, small LRU overhead
- Fully associative: all lines in one set — best hit rate, expensive hardware
Cache miss types:
- Cold miss — first access, nothing is cached yet
- Conflict miss — more addresses map to the same set than it has ways, so lines evict each other even though the rest of the cache has room
- Capacity miss — working set doesn't fit in cache
// Cache-friendly: row-major traversal
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += A[i][j]; // Accesses memory sequentially ✓

// Cache-unfriendly: column-major traversal
for (int j = 0; j < N; j++)
    for (int i = 0; i < N; i++)
        sum += A[i][j]; // Jumps N * sizeof(int) bytes each step ✗
Virtual Memory
Virtual memory gives each process the illusion of its own private address space.
Translation: Virtual Address → Physical Address via page table
- Pages are typically 4KB
- Page Table Entry (PTE) stores: physical frame number, valid bit, dirty bit, reference bit
- TLB (Translation Lookaside Buffer) caches recent translations
Page faults: accessing a page not in RAM triggers a fault → OS loads page from disk.
Virtual Address: [ VPN (20 bits) | Offset (12 bits) ]
↓
Page Table
↓
Physical Address: [ PFN (20 bits) | Offset (12 bits) ]
Processes & the OS
A process is a running program: code + data + stack + open files + registers.
Context switch: the OS saves one process's state (registers, PC) and loads another's.
// fork() creates a child process. It returns twice:
// 0 in the child, the child's PID in the parent, -1 on error.
char *args[] = { "ls", NULL };   // argv must be NULL-terminated
pid_t pid = fork();
if (pid == 0) {
    // Child: replace this process image with /bin/ls
    execv("/bin/ls", args);
} else if (pid > 0) {
    // Parent: pid == child's PID
    wait(NULL); // Wait for child to finish
}
Process states: Running → Blocked (waiting on I/O) → Ready → Running; the scheduler can also preempt a process directly from Running back to Ready.
Signals & Concurrency
Signals are software interrupts sent to processes:
void handler(int sig) {
    // Handle SIGINT (Ctrl+C). printf() is not async-signal-safe,
    // so use write() inside a handler.
    write(STDOUT_FILENO, "Caught SIGINT\n", 14);
}
signal(SIGINT, handler); // sigaction() is preferred in portable code
Race conditions occur when multiple threads access shared data without synchronization.
A mutex enforces mutual exclusion over the critical section:
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&lock);
shared_counter++; // Critical section
pthread_mutex_unlock(&lock);
Key Takeaways
- Locality is everything. Write cache-friendly code: sequential access patterns, small working sets.
- Virtual memory is an abstraction. Every pointer you dereference goes through a TLB lookup, with a page table walk on a TLB miss.
- The OS is a resource manager. It multiplexes CPU (scheduling), memory (paging), and I/O (device drivers) across processes.
- Concurrency is hard. Always protect shared state. Assume signals can interrupt your code anywhere.
- Understanding hardware makes you a better programmer. You can't optimize what you don't understand.