← All notes
CPSC 221

CPSC 221: Data Structures & Algorithms

Trees, graphs, hash tables, sorting algorithms, and complexity analysis. The foundational toolkit every CS student needs.

C++TreesGraphsHash TablesBig-OSorting

2023-12-10


Overview

CPSC 221 is the core data structures course — the difference between a solution that works and one that scales. Understanding why certain structures are fast (and slow) is the skill that separates strong engineers from weak ones.


Complexity Analysis

Big-O Notation

Big-O describes how runtime grows as input size grows. We care about the dominant term:

| Complexity | Name | Example | |---|---|---| | O(1) | Constant | Array index access | | O(log n) | Logarithmic | Binary search | | O(n) | Linear | Linear scan | | O(n log n) | Linearithmic | Merge sort | | O(n²) | Quadratic | Bubble sort | | O(2ⁿ) | Exponential | Naive recursion |


Trees

Binary Search Tree (BST)

Property: For every node, all left descendants < node < all right descendants.

struct Node {
    int val;
    Node* left;
    Node* right;
};

// In-order traversal (gives sorted output)
void inorder(Node* root) {
    if (!root) return;
    inorder(root->left);
    cout << root->val << " ";
    inorder(root->right);
}
  • Search / Insert / Delete: O(log n) average, O(n) worst (unbalanced)

AVL Trees

Self-balancing BST. Maintains the invariant: |height(left) - height(right)| ≤ 1. Rotations (single/double) restore balance after insertions. All operations guaranteed O(log n).


Hash Tables

Store key-value pairs with O(1) average access.

Hash function maps keys to indices. Collisions are handled by:

  • Chaining — each bucket holds a linked list
  • Open addressing — probe for the next empty slot (linear, quadratic, double hashing)

Load factor α = n / m (items / buckets). Keep α < 0.7 for good performance.

// Simple hash for integers
int hash(int key, int tableSize) {
    return key % tableSize;
}

Graphs

A graph G = (V, E) with vertices V and edges E.

  • Directed vs. undirected
  • Weighted vs. unweighted
  • Representations: adjacency matrix O(V²) space, adjacency list O(V+E) space

BFS (Breadth-First Search)

Explores level by level. Uses a queue. Finds shortest path in unweighted graphs.

void bfs(int start, vector<vector<int>>& adj) {
    vector<bool> visited(adj.size(), false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        cout << v << " ";
        for (int u : adj[v]) {
            if (!visited[u]) {
                visited[u] = true;
                q.push(u);
            }
        }
    }
}

DFS (Depth-First Search)

Explores as far as possible before backtracking. Uses a stack (or recursion). Good for cycle detection, topological sort.


Sorting Algorithms

| Algorithm | Best | Average | Worst | Stable? | |---|---|---|---|---| | Bubble sort | O(n) | O(n²) | O(n²) | Yes | | Merge sort | O(n log n) | O(n log n) | O(n log n) | Yes | | Quick sort | O(n log n) | O(n log n) | O(n²) | No | | Heap sort | O(n log n) | O(n log n) | O(n log n) | No |

Merge sort divides the array, sorts each half, then merges — classic divide-and-conquer.

Quick sort picks a pivot, partitions elements around it, recurses. Fast in practice (good cache behavior) but O(n²) worst case with bad pivot choice.


Key Takeaways

  • Choose the right structure first. Hash table for lookups, BST for ordered data, heap for priority queues.
  • Amortized cost matters. Dynamic arrays are O(1) amortized even though resizing is O(n).
  • Balance is everything in trees. An unbalanced BST is just a slow linked list.
  • Graph problems are everywhere. Social networks, routing, dependency resolution, scheduling — all graph problems.