Overview

CPSC 221 is the core data structures course — the difference between a solution that works and one that scales. Understanding why certain structures are fast (and slow) is the skill that separates strong engineers from weak ones.

Complexity Analysis

Big-O Notation

Big-O describes how runtime grows as input size grows. We care about the dominant term:

| Complexity | Name | Example | |---|---|---| | O(1) | Constant | Array index access | | O(log n) | Logarithmic | Binary search | | O(n) | Linear | Linear scan | | O(n log n) | Linearithmic | Merge sort | | O(n²) | Quadratic | Bubble sort | | O(2ⁿ) | Exponential | Naive recursion |

Trees

Binary Search Tree (BST)

Property: For every node, all left descendants < node < all right descendants.

struct Node {
    int val;
    Node* left;
    Node* right;
};

// In-order traversal (gives sorted output)
void inorder(Node* root) {
    if (!root) return;
    inorder(root->left);
    cout << root->val << " ";
    inorder(root->right);
}

Search / Insert / Delete: O(log n) average, O(n) worst (unbalanced)

AVL Trees

Self-balancing BST. Maintains the invariant: |height(left) - height(right)| ≤ 1. Rotations (single/double) restore balance after insertions. All operations guaranteed O(log n).

Hash Tables

Store key-value pairs with O(1) average access.

Hash function maps keys to indices. Collisions are handled by:

Chaining — each bucket holds a linked list
Open addressing — probe for the next empty slot (linear, quadratic, double hashing)

Load factor α = n / m (items / buckets). Keep α < 0.7 for good performance.

// Simple hash for integers
int hash(int key, int tableSize) {
    return key % tableSize;
}

Graphs

A graph G = (V, E) with vertices V and edges E.

Directed vs. undirected
Weighted vs. unweighted
Representations: adjacency matrix O(V²) space, adjacency list O(V+E) space

BFS (Breadth-First Search)

Explores level by level. Uses a queue. Finds shortest path in unweighted graphs.

void bfs(int start, vector<vector<int>>& adj) {
    vector<bool> visited(adj.size(), false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        cout << v << " ";
        for (int u : adj[v]) {
            if (!visited[u]) {
                visited[u] = true;
                q.push(u);
            }
        }
    }
}

DFS (Depth-First Search)

Explores as far as possible before backtracking. Uses a stack (or recursion). Good for cycle detection, topological sort.

Sorting Algorithms

| Algorithm | Best | Average | Worst | Stable? | |---|---|---|---|---| | Bubble sort | O(n) | O(n²) | O(n²) | Yes | | Merge sort | O(n log n) | O(n log n) | O(n log n) | Yes | | Quick sort | O(n log n) | O(n log n) | O(n²) | No | | Heap sort | O(n log n) | O(n log n) | O(n log n) | No |

Merge sort divides the array, sorts each half, then merges — classic divide-and-conquer.

Quick sort picks a pivot, partitions elements around it, recurses. Fast in practice (good cache behavior) but O(n²) worst case with bad pivot choice.

Key Takeaways

Choose the right structure first. Hash table for lookups, BST for ordered data, heap for priority queues.
Amortized cost matters. Dynamic arrays are O(1) amortized even though resizing is O(n).
Balance is everything in trees. An unbalanced BST is just a slow linked list.
Graph problems are everywhere. Social networks, routing, dependency resolution, scheduling — all graph problems.

CPSC 221: Data Structures & Algorithms