CLRS, Studied Rigorously
A self-contained, interactive companion covering foundations through the master method — with formal definitions, worked proofs, live algorithm visualizations, and checked exercises.
This guide is designed for a single purpose: to help you learn the material as a mathematician would. Definitions come first. Theorems follow. Every claim that matters is proved, and the most important proofs appear in full. Running alongside the formal text are interactive elements — a step-through of INSERTION-SORT, a live merge-sort recursion, a recursion-tree builder, and a master-theorem dispatcher — intended not as decoration, but as a second channel for intuition. Work through the quizzes; attempt the exercises before revealing solutions. The ornament ❧ marks the end of a section.
A brief notational remark. Arrays are indexed from $1$ throughout (as CLRS does), so an array $A$ of length $n$ has entries $A[1], A[2], \ldots, A[n]$. The notation $A[i\mathinner{.\,.}j]$ denotes the subarray from index $i$ to $j$ inclusive.
Getting Started
We begin with the most concrete object in the book: a small, specific algorithm, and the tools to say exactly what "efficient" means for it. The two algorithms in this chapter — insertion sort and merge sort — serve as the bedrock upon which all subsequent analysis rests. Master these and you have the vocabulary to attack everything that follows.
2.1 Insertion Sort
Insertion sort is the algorithm most people would invent if handed a pile of index cards to sort. You pick up one card at a time, and slide each new card into its correct position among the cards you have already sorted. Formally:
Input. A sequence of $n$ numbers $\langle a_1, a_2, \ldots, a_n \rangle$.
Output. A permutation $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input such that $a'_1 \leq a'_2 \leq \cdots \leq a'_n$.
The Algorithm
We present insertion sort in the CLRS pseudocode style. The variable $j$ marks the "current" card being inserted; indices $1$ through $j-1$ are already in sorted order; $i$ scans backward through that sorted prefix looking for the correct insertion point.
```
Insertion-Sort(A)
1  for j = 2 to A.length
2      key = A[j]
3      // Insert A[j] into the sorted sequence A[1..j-1].
4      i = j - 1
5      while i > 0 and A[i] > key
6          A[i + 1] = A[i]
7          i = i - 1
8      A[i + 1] = key
```
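The pseudocode translates nearly line for line into Python; a minimal sketch (0-indexed, so the outer loop starts at index 1):

```python
def insertion_sort(a):
    """Sort list a in place, mirroring the CLRS pseudocode (0-indexed)."""
    for j in range(1, len(a)):          # pseudocode: for j = 2 to A.length
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix a[0..j-1] that exceed key.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```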
Interactive Trace
Step through the algorithm on a concrete input. The blue cell holds the current key; amber cells are being compared; green cells are in their final sorted positions relative to the prefix seen so far.
Best and Worst Case Analysis
Let $t_j$ denote the number of times the while-loop test on line 5 is executed for a given value of $j$. We count the cost of each line as a constant $c_k$ multiplied by the number of times that line executes:
| Line | Cost | Times |
|---|---|---|
| 1 for j = 2 to n | $c_1$ | $n$ |
| 2 key = A[j] | $c_2$ | $n-1$ |
| 4 i = j - 1 | $c_4$ | $n-1$ |
| 5 while ... | $c_5$ | $\sum_{j=2}^{n} t_j$ |
| 6 A[i+1] = A[i] | $c_6$ | $\sum_{j=2}^{n}(t_j - 1)$ |
| 7 i = i - 1 | $c_7$ | $\sum_{j=2}^{n}(t_j - 1)$ |
| 8 A[i+1] = key | $c_8$ | $n-1$ |
The total running time $T(n)$ is the sum of these products. We analyze two extremes.
If $A$ is already in ascending order, then for each $j$ the test A[i] > key on line 5 is false on the first evaluation. Hence $t_j = 1$ for all $j = 2, \ldots, n$.
$$T(n) = c_1 n + (c_2 + c_4 + c_5 + c_8)(n-1) = an + b$$
for constants $a, b$. This is a linear function of $n$, so the best-case running time is $\Theta(n)$.
If $A$ is in strictly decreasing order, each key $A[j]$ must be compared with every element $A[1], \ldots, A[j-1]$; the inner loop runs until $i = 0$. Hence $t_j = j$, giving
$$\sum_{j=2}^{n} t_j = \sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1, \qquad \sum_{j=2}^{n}(t_j - 1) = \frac{n(n-1)}{2}.$$
Substituting and collecting constants gives
$$T(n) = \left(\tfrac{c_5}{2} + \tfrac{c_6}{2} + \tfrac{c_7}{2}\right)n^2 + \left(c_1 + c_2 + c_4 + \tfrac{c_5}{2} - \tfrac{c_6}{2} - \tfrac{c_7}{2} + c_8\right)n - (c_2 + c_4 + c_5 + c_8).$$
This is a quadratic function of $n$, so the worst-case running time is $\Theta(n^2)$.
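Both extremes are easy to confirm empirically. A small sketch that instruments the sort to record $t_j$, the number of while-test evaluations for each value of $j$, on a sorted and on a reversed input:

```python
def insertion_sort_counts(a):
    """Sort a in place and return [t_2, ..., t_n]: the number of
    while-test evaluations in each outer iteration."""
    t = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        tests = 1                       # the test that finally fails also counts
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            tests += 1
        a[i + 1] = key
        t.append(tests)
    return t
```

On $\langle 1,2,3,4,5 \rangle$ every $t_j = 1$ (best case); on $\langle 5,4,3,2,1 \rangle$ the counts are $2, 3, 4, 5$, i.e. $t_j = j$ in the 1-based indexing of the analysis (worst case).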
Loop Invariant — Proof of Correctness
A loop invariant is an assertion about the state of a computation that holds at designated points during the execution of a loop. We use it to prove correctness the same way mathematical induction proves statements about the natural numbers: we must establish three properties.
Initialization. The invariant is true prior to the first iteration of the loop.
Maintenance. If the invariant is true before an iteration, it remains true before the next iteration.
Termination. When the loop terminates, the invariant (combined with the reason the loop terminated) yields a useful property that helps show the algorithm is correct.
Invariant (insertion sort). At the start of each iteration of the for-loop of lines 1–8, the subarray $A[1\mathinner{.\,.}j-1]$ consists of the elements originally in $A[1\mathinner{.\,.}j-1]$, but in sorted order.
We verify the three conditions.
Initialization. Before the first iteration, $j = 2$, so $A[1\mathinner{.\,.}j-1] = A[1\mathinner{.\,.}1]$, which is a single-element subarray. It is trivially sorted and trivially a permutation of itself. The invariant holds. (That the base case is $j=2$ and not $j=1$ is a common confusion — the invariant is stated at the start of an iteration, and the first iteration starts with $j=2$.)
Maintenance. Assume the invariant holds at the start of iteration $j$: that is, $A[1\mathinner{.\,.}j-1]$ is a sorted permutation of the original $A[1\mathinner{.\,.}j-1]$. We must show it holds at the start of iteration $j+1$, i.e. that $A[1\mathinner{.\,.}j]$ is a sorted permutation of the original $A[1\mathinner{.\,.}j]$.
The body of the iteration saves $\textit{key} = A[j]$ and then repeatedly shifts elements of $A[1\mathinner{.\,.}j-1]$ one position to the right — replacing $A[i+1]$ with $A[i]$ — for as long as $A[i] > \textit{key}$ and $i > 0$. (A formal proof at this level requires its own inner invariant for the while-loop; informally: after each inner iteration, $A[i+2\mathinner{.\,.}j]$ is a rightward shift of the original $A[i+1\mathinner{.\,.}j-1]$, and every element in that range exceeds $\textit{key}$.) The loop terminates when either $i = 0$ or $A[i] \leq \textit{key}$; in either case, the assignment $A[i+1] = \textit{key}$ places $\textit{key}$ immediately to the right of an element that does not exceed it (or at position $1$). Thus $A[1\mathinner{.\,.}j]$ is sorted and is a permutation of the original $A[1\mathinner{.\,.}j]$.
Termination. The loop terminates when $j$ exceeds $n$, i.e. when $j = n+1$. Substituting into the invariant, the subarray $A[1\mathinner{.\,.}n]$ consists of the elements originally in $A[1\mathinner{.\,.}n]$, in sorted order. But $A[1\mathinner{.\,.}n]$ is the entire array, so the algorithm has correctly sorted $A$.■
Note that insertion sort sorts in place: at any moment it stores at most a constant number of values, namely key (plus loop indices), beyond the array itself.
Selected Exercises
Exercise 2.1-3 — Linear Search, with loop invariant
Problem. Consider the searching problem: given a sequence $A = \langle a_1, \ldots, a_n \rangle$ and a value $v$, return the index $i$ such that $v = A[i]$, or NIL if $v$ does not appear. Write pseudocode for linear search and prove it correct using a loop invariant.
```
Linear-Search(A, v)
  for i = 1 to A.length
      if A[i] == v
          return i
  return NIL
```
Invariant. At the start of each iteration of the for-loop, the value $v$ does not appear in the subarray $A[1\mathinner{.\,.}i-1]$.
Initialization. Before iteration $i=1$, the subarray $A[1\mathinner{.\,.}0]$ is empty, so $v$ vacuously does not appear in it.
Maintenance. If we reach the start of iteration $i+1$, we did not return in iteration $i$, so $A[i] \neq v$. Combined with the invariant, $v$ does not appear in $A[1\mathinner{.\,.}i]$.
Termination. The loop exits either by returning an index $i$ (in which case $A[i] = v$, correct) or by completing all iterations. In the latter case $i = n+1$ and the invariant says $v \notin A[1\mathinner{.\,.}n]$, so returning NIL is correct.■
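A direct Python rendering of the pseudocode, with the invariant recorded as a comment (using `None` for NIL):

```python
def linear_search(a, v):
    """Return the first index holding v, or None (NIL) if v is absent."""
    for i, x in enumerate(a):
        # Invariant: v does not appear in a[0..i-1].
        if x == v:
            return i
    return None
```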
Exercise 2.1-4 — Binary Addition
Problem. Given two $n$-bit integers $A$ and $B$ stored in arrays of length $n$, design an algorithm that produces their $(n+1)$-bit binary sum $C$.
```
Add-Binary(A, B, n)
  carry = 0
  for i = 1 to n
      C[i] = (A[i] + B[i] + carry) mod 2
      carry = (A[i] + B[i] + carry) / 2   // integer division
  C[n + 1] = carry
  return C
```
The running time is $\Theta(n)$ — each bit is touched a constant number of times. A loop invariant (left to you) would state that after iteration $i$, the low $i$ bits of $C$ together with carry correctly represent $A[1\mathinner{.\,.}i] + B[1\mathinner{.\,.}i]$.
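A sketch of the addition in Python. Bits are stored least-significant first, an indexing convention assumed here for illustration; the pseudocode's 1-based index $i$ becomes 0-based:

```python
def add_binary(A, B):
    """Add two n-bit numbers given as lists of bits, least-significant first.

    Returns an (n+1)-bit list C, mirroring the pseudocode (0-indexed).
    """
    n = len(A)
    C = [0] * (n + 1)
    carry = 0
    for i in range(n):
        s = A[i] + B[i] + carry
        C[i] = s % 2
        carry = s // 2      # integer division, as in the pseudocode
    C[n] = carry
    return C
```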
2.2 Analyzing Algorithms
To analyze an algorithm is to predict the resources it will require. Usually we care about time, but memory, bandwidth, or energy may be equally relevant. Before any analysis is possible we must commit to a model of computation; CLRS fixes the Random-Access Machine (RAM).
Instructions execute one after another, with no concurrency. Each instruction takes a constant amount of time. The primitive instructions include arithmetic (add, subtract, multiply, divide, remainder, floor, ceiling), data movement (load, store, copy), and control (conditional and unconditional branch, subroutine call and return). Data types are integer and floating-point, with a word size that grows only modestly — specifically, with $c \log n$ bits for some constant $c \geq 1$ on inputs of size $n$.
Input Size and Running Time
Two quantities need precise definition.
Input size. The natural measure depends on the problem. For sorting, it is the number of items $n$. For integer multiplication, it is the total number of bits in the input. For a graph algorithm, it is often a pair $(|V|, |E|)$.
Running time. On a given input, the running time is the number of primitive operations (or "steps") executed. We follow the convention that each line of pseudocode takes constant time, though the constants may differ from line to line. This makes our analysis machine-independent up to a constant factor — exactly the granularity $\Theta$-notation (developed in Chapter 3) is designed for.
Worst Case, Best Case, Average Case
For a fixed algorithm, the running time depends on the input. We distinguish:
- Worst-case running time $T(n)$: the maximum running time over all inputs of size $n$. This is the default. It gives an upper bound for any input, it occurs often in practice, and the "average case" is frequently as bad as the worst.
- Best-case running time: the minimum over inputs of size $n$. Typically too optimistic to be useful on its own.
- Average-case running time: the expectation over inputs of size $n$, under some distribution — commonly the uniform distribution. Requires probabilistic analysis and a justified distribution.
Order of Growth
Even the exact expression for $T(n)$ depends on machine-specific constants. For comparing algorithms, only the rate of growth matters as $n \to \infty$. For insertion sort we summarize:
$$T_{\text{best}}(n) = \Theta(n), \qquad T_{\text{worst}}(n) = \Theta(n^2), \qquad T_{\text{avg}}(n) = \Theta(n^2).$$
2.3 Designing Algorithms
Insertion sort uses an incremental design: we build up the sorted prefix one element at a time. In this section we introduce a different paradigm that will reappear throughout the book.
2.3.1 The Divide-and-Conquer Approach
Divide the problem into a number of subproblems that are smaller instances of the same problem.
Conquer the subproblems by solving them recursively. If the subproblem size is small enough (the base case), solve it directly.
Combine the solutions to the subproblems into the solution for the original problem.
Merge Sort
Merge sort applies this paradigm to sorting. To sort $A[p\mathinner{.\,.}r]$:
- Divide: split the $n$-element subarray into two subarrays of $n/2$ elements each, namely $A[p\mathinner{.\,.}q]$ and $A[q+1\mathinner{.\,.}r]$ where $q = \lfloor (p+r)/2 \rfloor$.
- Conquer: sort each half recursively using merge sort.
- Combine: merge the two sorted halves into a single sorted array.
The recursion bottoms out when a subarray has just one element (which is trivially sorted). The key subroutine is MERGE, which assumes $A[p\mathinner{.\,.}q]$ and $A[q+1\mathinner{.\,.}r]$ are each already sorted.
```
Merge(A, p, q, r)
  n_1 = q - p + 1
  n_2 = r - q
  let L[1..n_1 + 1] and R[1..n_2 + 1] be new arrays
  for i = 1 to n_1
      L[i] = A[p + i - 1]
  for j = 1 to n_2
      R[j] = A[q + j]
  L[n_1 + 1] = ∞
  R[n_2 + 1] = ∞
  i = 1
  j = 1
  for k = p to r
      if L[i] <= R[j]
          A[k] = L[i]
          i = i + 1
      else A[k] = R[j]
          j = j + 1
```
The use of $\infty$ as a sentinel eliminates a tedious boundary check: once all the real elements of $L$ (or of $R$) have been copied back, its sentinel $\infty$ loses every subsequent comparison, so the remaining iterations draw from the other array, and the loop simply runs exactly $r - p + 1$ times.
```
Merge-Sort(A, p, r)
  if p < r
      q = ⌊(p + r) / 2⌋
      Merge-Sort(A, p, q)
      Merge-Sort(A, q + 1, r)
      Merge(A, p, q, r)
```
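The pair of procedures carries over to Python directly, with `math.inf` playing the role of the $\infty$ sentinel; a sketch:

```python
import math

def merge(A, p, q, r):
    """Merge sorted A[p..q] and A[q+1..r] in place (0-indexed), using sentinels."""
    L = A[p:q + 1] + [math.inf]
    R = A[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

def merge_sort(A, p=0, r=None):
    """Sort A[p..r] in place; call merge_sort(A) to sort the whole list."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)
        merge_sort(A, q + 1, r)
        merge(A, p, q, r)
    return A
```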
Interactive Merge-Sort Trace
Watch the recursion unfold. The tree shows each subproblem; highlighted nodes are being processed; the array below shows the current state of $A$ after each MERGE completes.
Correctness of MERGE
Invariant. At the start of each iteration of the final for-loop, the subarray $A[p\mathinner{.\,.}k-1]$ contains the $k - p$ smallest elements of $L[1\mathinner{.\,.}n_1+1]$ and $R[1\mathinner{.\,.}n_2+1]$, in sorted order; moreover, $L[i]$ and $R[j]$ are the smallest elements of their arrays not yet copied back into $A$.
Initialization. Before the first iteration, $k = p$, so $A[p\mathinner{.\,.}k-1] = A[p\mathinner{.\,.}p-1]$ is empty; it contains the $0$ smallest elements trivially. Since $i = j = 1$, $L[i]$ and $R[j]$ are the first (smallest) elements of their respective sorted arrays.
Maintenance. Suppose $L[i] \leq R[j]$; then $L[i]$ is the smallest element not yet copied back. The body copies $L[i]$ to $A[k]$ and increments both $i$ and $k$; the invariant holds for the new values. The case $L[i] > R[j]$ is symmetric.
Termination. The loop terminates when $k = r+1$. The invariant then says $A[p\mathinner{.\,.}r]$ contains the $r - p + 1 = n_1 + n_2$ smallest elements of $L$ and $R$ in sorted order. The only elements not among these are the two sentinels $\infty$. Hence $A[p\mathinner{.\,.}r]$ is exactly the sorted union of the original $L[1\mathinner{.\,.}n_1]$ and $R[1\mathinner{.\,.}n_2]$.■
2.3.2 Analyzing Divide-and-Conquer Algorithms
The running time of a divide-and-conquer algorithm is naturally described by a recurrence. Let $T(n)$ be the running time on a problem of size $n$. If the problem is small enough (size $\leq c$ for some constant $c$), the direct solution takes constant time $\Theta(1)$. Otherwise:
- The divide step takes $D(n)$ time.
- We solve $a$ subproblems, each of size $n/b$, in total time $a \cdot T(n/b)$.
- The combine step takes $C(n)$ time.
This gives the general recurrence
$$T(n) = \begin{cases} \Theta(1) & \text{if } n \leq c, \\ aT(n/b) + D(n) + C(n) & \text{otherwise.} \end{cases}$$
The Merge-Sort Recurrence
For merge sort on $n$ elements with $n > 1$:
- Divide: compute the midpoint $q$. This is $D(n) = \Theta(1)$.
- Conquer: recursively sort two subproblems of size $n/2$, totalling $2T(n/2)$.
- Combine: the MERGE procedure runs in $\Theta(n)$ on an $n$-element subarray (it is a single for-loop from $p$ to $r$ with constant-time body).
Therefore
$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1, \\ 2T(n/2) + \Theta(n) & \text{if } n > 1. \end{cases}$$
A rigorous derivation appears in §4.4 via the recursion-tree method and in §4.5 as Case 2 of the master theorem. The intuition, however, is easy to state now: the recursion tree for merge sort has depth $\log_2 n$, and each level performs $\Theta(n)$ work in total (the top level does $n$ work, the next level does $n/2 + n/2 = n$ work, and so on). Total work: $\Theta(n) \cdot \Theta(\log n) = \Theta(n \log n)$.
Selected Exercise
Exercise 2.3-5 — Binary search and its recurrence
Problem. Referring back to the searching problem, observe that if $A$ is sorted we can check the midpoint against $v$ and eliminate half of the array. This is binary search. Write iterative and recursive pseudocode and argue that the worst-case running time is $\Theta(\log n)$.
```
Binary-Search(A, v)
  low = 1
  high = A.length
  while low <= high
      mid = ⌊(low + high) / 2⌋
      if A[mid] == v
          return mid
      elseif A[mid] < v
          low = mid + 1
      else high = mid - 1
  return NIL
```
Each iteration either returns or halves the search range (from size $n$ to $\lfloor n/2 \rfloor$ or smaller). The number of halvings required to shrink $n$ to $0$ is $\lfloor \lg n \rfloor + 1$, and each iteration does constant work. Hence worst case $\Theta(\log n)$. The recurrence is $T(n) = T(n/2) + \Theta(1)$, which we will solve formally by the master theorem.
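Both the iterative and the recursive variants are short in Python (0-indexed, `None` for NIL); a sketch:

```python
def binary_search(a, v):
    """Iterative binary search on a sorted list; returns an index or None."""
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] == v:
            return mid
        elif a[mid] < v:
            low = mid + 1
        else:
            high = mid - 1
    return None

def binary_search_rec(a, v, low=0, high=None):
    """Recursive variant, realizing T(n) = T(n/2) + Θ(1)."""
    if high is None:
        high = len(a) - 1
    if low > high:
        return None
    mid = (low + high) // 2
    if a[mid] == v:
        return mid
    if a[mid] < v:
        return binary_search_rec(a, v, mid + 1, high)
    return binary_search_rec(a, v, low, mid - 1)
```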
Growth of Functions
Chapter 2 justified the convention that lines of pseudocode run in constant time — but only up to a constant. To compare algorithms meaningfully we need a vocabulary that ignores constant factors and low-order terms. That vocabulary is asymptotic notation.
3.1 Asymptotic Notation
The symbols $O$, $\Theta$, $\Omega$, $o$, and $\omega$ describe the asymptotic running time of an algorithm — its behavior as $n \to \infty$. Each describes a set of functions; equations like "$T(n) = O(n^2)$" are abuses of notation that should be read as "$T(n)$ is an element of the set $O(n^2)$".
$\Theta$-Notation — Asymptotic Tight Bound
For a given function $g(n)$, we denote by $\Theta(g(n))$ the set of functions $$\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0\ \text{such that}\ 0 \leq c_1 g(n) \leq f(n) \leq c_2 g(n)\ \text{for all } n \geq n_0 \,\}.$$
Informally: $f(n)$ is squeezed between $c_1 g(n)$ and $c_2 g(n)$ for all sufficiently large $n$. Note the requirement $f(n) \geq 0$ — all functions we use with asymptotic notation are asymptotically nonnegative.
Example. We show that $\tfrac{1}{2}n^2 - 3n = \Theta(n^2)$. We must find constants $c_1, c_2, n_0 > 0$ such that $$c_1 n^2 \leq \tfrac{1}{2} n^2 - 3n \leq c_2 n^2 \quad \text{for all } n \geq n_0.$$
Dividing by $n^2$ (valid for $n > 0$): $c_1 \leq \tfrac{1}{2} - \tfrac{3}{n} \leq c_2$. The right inequality holds for all $n \geq 1$ with $c_2 = \tfrac{1}{2}$. The left inequality holds for all $n \geq 7$ with $c_1 = \tfrac{1}{14}$ (since then $\tfrac{1}{2} - \tfrac{3}{7} = \tfrac{1}{14}$). Choosing $n_0 = 7$, $c_1 = \tfrac{1}{14}$, $c_2 = \tfrac{1}{2}$ completes the proof.■
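The chosen witnesses can be checked mechanically; a small sketch using exact rational arithmetic to verify the sandwich for a range of $n \geq n_0$:

```python
from fractions import Fraction

c1, c2, n0 = Fraction(1, 14), Fraction(1, 2), 7
for n in range(n0, 500):
    f = Fraction(1, 2) * n * n - 3 * n
    # The Θ(n²) sandwich: c1·n² ≤ f(n) ≤ c2·n² for all n ≥ n0.
    assert c1 * n * n <= f <= c2 * n * n
```

At $n = 7$ the lower bound holds with equality, confirming that $n_0$ cannot be lowered for this choice of $c_1$.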
$O$-Notation — Asymptotic Upper Bound
For a given function $g(n)$, $$O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0\ \text{such that}\ 0 \leq f(n) \leq c\, g(n)\ \text{for all } n \geq n_0 \,\}.$$ Big-$O$ gives an upper bound only, not necessarily tight. Note $\Theta(g(n)) \subseteq O(g(n))$.
The idiomatic use: "the running time of insertion sort is $O(n^2)$" — meaning no input causes more than a constant times $n^2$ steps. Because $O$ bounds the worst case from above, it is the workhorse of the field.
$\Omega$-Notation — Asymptotic Lower Bound
For a given function $g(n)$, $$\Omega(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0\ \text{such that}\ 0 \leq c\, g(n) \leq f(n)\ \text{for all } n \geq n_0 \,\}.$$
Theorem. For any two functions $f(n)$ and $g(n)$, $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.
Proof. ($\Rightarrow$) If $f = \Theta(g)$, there exist $c_1, c_2, n_0$ with $c_1 g \leq f \leq c_2 g$ for $n \geq n_0$. The upper inequality with constant $c = c_2$ gives $f = O(g)$; the lower with $c = c_1$ gives $f = \Omega(g)$.
($\Leftarrow$) If $f = O(g)$ with constant $c_2$ and $n_0' $, and $f = \Omega(g)$ with constant $c_1$ and $n_0''$, then for all $n \geq \max(n_0', n_0'')$ we have $c_1 g \leq f \leq c_2 g$, establishing $f = \Theta(g)$.■
$o$ and $\omega$ — Non-tight Bounds
$f(n) = o(g(n))$ if for every positive constant $c$, there exists $n_0 > 0$ with $0 \leq f(n) < c g(n)$ for all $n \geq n_0$. Equivalently: $\displaystyle\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$.
$f(n) = \omega(g(n))$ if for every positive constant $c$, there exists $n_0$ with $0 \leq c g(n) < f(n)$ for all $n \geq n_0$. Equivalently: $\displaystyle\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$.
A useful analogy: $O$ is "$\leq$", $\Omega$ is "$\geq$", $\Theta$ is "$=$", $o$ is "$<$", and $\omega$ is "$>$". Examples: $2n = o(n^2)$, $n^2/2 \neq o(n^2)$; $n^2 = \omega(n)$, $n^2 \neq \omega(n^2)$.
Properties — Transitivity, Reflexivity, Symmetry
Transitivity. For any notation in $\{O, \Theta, \Omega, o, \omega\}$: $$f = \Theta(g)\ \text{and}\ g = \Theta(h) \Rightarrow f = \Theta(h),$$ and analogously for the others.
Reflexivity. $f = \Theta(f)$, $f = O(f)$, $f = \Omega(f)$. (Not $o$ or $\omega$!)
Symmetry. $f = \Theta(g) \iff g = \Theta(f)$.
Transpose symmetry. $f = O(g) \iff g = \Omega(f)$; $f = o(g) \iff g = \omega(f)$.
Proof Techniques — Worked Problems
Claim. $6n^3 \neq \Theta(n^2)$.
Proof. Suppose for contradiction that $6n^3 = \Theta(n^2)$. Then there exist $c_2, n_0 > 0$ such that $6n^3 \leq c_2 n^2$ for all $n \geq n_0$, i.e. $6n \leq c_2$ for all $n \geq n_0$. But $6n \to \infty$, so this is false. Contradiction.■
Claim. For asymptotically nonnegative $f(n)$ and $g(n)$, $\max(f(n), g(n)) = \Theta(f(n) + g(n))$.
Proof. We need constants $c_1, c_2, n_0$ such that $c_1(f+g) \leq \max(f,g) \leq c_2(f+g)$ for $n \geq n_0$.
Upper bound. Clearly $\max(f,g) \leq f + g$, so $c_2 = 1$ works.
Lower bound. We have $f + g \leq 2 \max(f, g)$, hence $\max(f,g) \geq \tfrac{1}{2}(f + g)$. So $c_1 = \tfrac{1}{2}$ works.
Both bounds hold as soon as both $f, g \geq 0$, which is true for all $n \geq n_0$ by hypothesis.■
3.2 Standard Notations and Common Functions
Before leaving growth of functions, we record a catalogue of standard objects — monotonicity, floors and ceilings, modular arithmetic, polynomials, exponentials, logarithms, factorials, and iterated logarithms — together with the comparisons between them that recur constantly in analysis.
Monotonicity
A function $f$ is monotonically increasing if $m \leq n \Rightarrow f(m) \leq f(n)$; strictly increasing if the conclusion is $f(m) < f(n)$; and analogously for decreasing.
Floors and Ceilings
$\lfloor x \rfloor$ is the largest integer $\leq x$, $\lceil x \rceil$ the smallest integer $\geq x$. For all real $x$: $x - 1 < \lfloor x \rfloor \leq x \leq \lceil x \rceil < x + 1$. Both are monotonically increasing. A fact we will use repeatedly: for any integer $n$ and positive integers $a$ and $b$,
$$\left\lceil \frac{\lceil n/a \rceil}{b} \right\rceil = \left\lceil \frac{n}{ab} \right\rceil, \qquad \left\lfloor \frac{\lfloor n/a \rfloor}{b} \right\rfloor = \left\lfloor \frac{n}{ab} \right\rfloor.$$
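The nested-division identities are easy to spot-check with exact integer arithmetic. A sketch, where `ceil_div` is a small helper (introduced here for illustration) computing $\lceil n/d \rceil$ without floating point:

```python
def ceil_div(n, d):
    """⌈n/d⌉ for integers, d > 0, using only integer arithmetic."""
    return -(-n // d)

# Spot-check both identities over many (n, a, b) triples, including negative n.
for n in range(-200, 200):
    for a in range(1, 8):
        for b in range(1, 8):
            assert ceil_div(ceil_div(n, a), b) == ceil_div(n, a * b)
            assert (n // a) // b == n // (a * b)   # Python // is floor division
```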
Polynomials, Exponentials, Logarithms
A polynomial in $n$ of degree $d$ is a function $p(n) = \sum_{i=0}^{d} a_i n^i$ with $a_d \neq 0$. If $a_d > 0$, then $p(n) = \Theta(n^d)$. We say $f(n)$ is polynomially bounded if $f(n) = O(n^k)$ for some constant $k$.
The following limit captures the key comparison between exponentials and polynomials: for all real constants $a > 1$ and $b$, $$\lim_{n \to \infty} \frac{n^b}{a^n} = 0,$$ that is, $n^b = o(a^n)$: every exponential with base greater than $1$ outgrows every polynomial.
A symmetric statement holds for logarithms versus polynomials: for all constants $a > 0$ and $b$, $$\lim_{n \to \infty} \frac{\lg^b n}{n^a} = 0,$$ that is, $\lg^b n = o(n^a)$: every positive power of $n$ outgrows every power of the logarithm.
Taken together, these are the backbone of the algorithm designer's intuition:
constants $\prec$ logs $\prec$ polynomials $\prec$ exponentials $\prec$ factorials.
Growth Rate Reference Table
| Notation | Name | Example | Comment |
|---|---|---|---|
| $\Theta(1)$ | constant | array indexing | independent of $n$ |
| $\Theta(\log n)$ | logarithmic | binary search | base is irrelevant (constant factor) |
| $\Theta(n)$ | linear | linear search | touch each input once |
| $\Theta(n \log n)$ | linearithmic | merge sort | optimal comparison sort |
| $\Theta(n^2)$ | quadratic | insertion sort (worst) | nested loops over input |
| $\Theta(n^3)$ | cubic | naïve matrix multiply | triple-nested loops |
| $\Theta(2^n)$ | exponential | subset enumeration | intractable for large $n$ |
| $\Theta(n!)$ | factorial | brute-force TSP | grows faster than any exponential |
Logarithm Identities
The following identities are used without comment throughout CLRS. For $a > 0$, bases $b, c > 0$ with $b, c \neq 1$, and $n$ real:
$a = b^{\log_b a}$ • $\log_c(ab) = \log_c a + \log_c b$ • $\log_b a^n = n \log_b a$
$\log_b a = \dfrac{\log_c a}{\log_c b}$ • $\log_b(1/a) = -\log_b a$ • $\log_b a = \dfrac{1}{\log_a b}$
$a^{\log_b c} = c^{\log_b a}$
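All of these can be spot-checked numerically; a sketch using `math.log(x, base)` and a floating-point tolerance:

```python
import math

a, b, c, n = 3.0, 2.0, 5.0, 4.0
assert math.isclose(a, b ** math.log(a, b))                              # a = b^{log_b a}
assert math.isclose(math.log(a * b, c), math.log(a, c) + math.log(b, c)) # product rule
assert math.isclose(math.log(a ** n, b), n * math.log(a, b))             # power rule
assert math.isclose(math.log(a, b), math.log(a, c) / math.log(b, c))     # change of base
assert math.isclose(math.log(1 / a, b), -math.log(a, b))                 # reciprocal
assert math.isclose(math.log(a, b), 1 / math.log(b, a))                  # base swap
assert math.isclose(a ** math.log(c, b), c ** math.log(a, b))            # a^{log_b c} = c^{log_b a}
```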
Factorials — Stirling's Approximation
Stirling's approximation states $$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + \Theta\!\left(\tfrac{1}{n}\right)\right),$$ from which follow the comparisons $n! = o(n^n)$, $n! = \omega(2^n)$, and $\lg(n!) = \Theta(n \lg n)$.
Functional Iteration and $\log^*$
The notation $f^{(i)}(n)$ means $f$ applied $i$ times to $n$: $f^{(0)}(n) = n$, $f^{(i)}(n) = f(f^{(i-1)}(n))$ for $i \geq 1$. The iterated logarithm is
$$\lg^*(n) = \min\{\, i \geq 0 : \lg^{(i)}(n) \leq 1 \,\}.$$
That is, $\lg^* n$ counts how many times we must apply $\lg$ to $n$ before the result drops to at most $1$. This function grows unimaginably slowly:
| $n$ | $\lg^* n$ |
|---|---|
| $n \leq 1$ | $0$ |
| $n = 2$ | $1$ |
| $n \in \{3, 4\}$ | $2$ |
| $n \in [5, 16]$ | $3$ |
| $n \in [17, 65536]$ | $4$ |
| $n \in [65537, 2^{65536}]$ | $5$ |
For any conceivable physical input, $\lg^* n \leq 5$. This is why algorithms with running times such as $O(m \cdot \alpha(m, n))$ (where $\alpha$ is the inverse Ackermann function — even slower than $\lg^*$) are said to run in "effectively linear" time.
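A direct implementation of $\lg^*$ reproduces the table above; a sketch:

```python
import math

def lg_star(n):
    """Iterated logarithm: how many times lg must be applied before the result ≤ 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```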
Divide & Conquer
When an algorithm is recursive, its running time is naturally expressed as a recurrence — an equation defining $T(n)$ in terms of $T$ at smaller values. This chapter develops three techniques for solving such recurrences: the substitution method (guess and prove by induction), the recursion-tree method (sum work level-by-level to generate a guess), and the master method (apply a theorem). In practice one often uses the recursion tree to guess, then the substitution method to prove, or — when applicable — skips directly to the master theorem.
4.3 The Substitution Method
The substitution method has two steps:
- Guess the form of the solution.
- Use mathematical induction to find the constants and show that the solution works.
The name comes from substituting the guessed solution for the function when the inductive hypothesis is applied to smaller values.
Worked Example — Solving $T(n) = 2T(\lfloor n/2 \rfloor) + n$
We will show that $T(n) = O(n \lg n)$. (Observe that this is the merge-sort recurrence up to notation.)
Guess. $T(n) \leq c n \lg n$ for some constant $c > 0$ and all $n \geq n_0$.
Inductive step. Assume the bound holds for all values $m < n$ (with $m \geq n_0$). We show it holds for $n$. By the recurrence, $$T(n) = 2T(\lfloor n/2 \rfloor) + n.$$ Applying the inductive hypothesis to $\lfloor n/2 \rfloor$: $$T(n) \leq 2 \cdot c \lfloor n/2 \rfloor \lg(\lfloor n/2 \rfloor) + n \leq 2 \cdot c \cdot (n/2) \cdot \lg(n/2) + n = c n \lg(n/2) + n.$$ Since $\lg(n/2) = \lg n - 1$, $$T(n) \leq c n \lg n - c n + n = c n \lg n - (c - 1) n.$$ For the desired bound $T(n) \leq c n \lg n$, we need $(c - 1) n \geq 0$, i.e. $c \geq 1$.
Base case. We need base cases from which the induction can start. We cannot use $n = 1$: $\lg 1 = 0$, so the bound $c \cdot 1 \cdot \lg 1 = 0$ would force $T(1) = 0$. Instead take $n_0 = 2$ with both $n = 2$ and $n = 3$ as base cases: the recurrence sends $n = 2$ and $n = 3$ down to $\lfloor n/2 \rfloor = 1$, below $n_0$, so both must be verified directly, while every $n \geq 4$ stays at or above $2$. (This is why substitution proofs typically take the smallest two values as base cases.) Choosing $c = \max\!\left(1,\ \tfrac{T(2)}{2},\ \tfrac{T(3)}{3 \lg 3}\right)$ ensures $T(2) \leq c \cdot 2 \lg 2$ and $T(3) \leq c \cdot 3 \lg 3$, completing the argument.■
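The completed proof can be sanity-checked numerically. A sketch that evaluates the recurrence exactly, with an arbitrary base cost $T(1) = 1$ (an assumption for illustration), and tests the bound with the constant the proof produces:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """The recurrence T(n) = 2·T(⌊n/2⌋) + n with base cost T(1) = 1."""
    return 1 if n == 1 else 2 * T(n // 2) + n

# With T(1) = 1: T(2) = 4 and T(3) = 5, so c = max(1, T(2)/2, T(3)/(3·lg 3)) = 2.
c = 2
for n in range(2, 5000):
    assert T(n) <= c * n * math.log2(n)   # the proved bound T(n) ≤ c·n·lg n
```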
The Art of Making a Good Guess
There is no recipe — experience and the recursion-tree method (§4.4) are the two chief sources of guesses. When one recurrence resembles another whose solution is known, the known solution is a natural first guess. For example:
- $T(n) = 2T(\lfloor n/2 \rfloor + 17) + n$ looks like merge sort — guess $O(n \lg n)$; the $+17$ perturbation is asymptotically irrelevant.
- If loose bounds are known ($T = \Omega(n)$ and $T = O(n^2)$), work inward: first refine the lower bound to $\Omega(n \lg n)$, then the upper to $O(n \lg n)$.
Subtleties — The Need for Stronger Induction Hypotheses
Sometimes the obvious guess cannot be proved even though it is correct. Consider
$$T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + 1.$$
We might guess $T(n) = O(n)$, and attempt to prove $T(n) \leq c n$. Then $$T(n) \leq c \lfloor n/2 \rfloor + c \lceil n/2 \rceil + 1 = c n + 1,$$ which is not $\leq c n$. The induction fails — but the guess is right! The fix is to strengthen the hypothesis by subtracting a lower-order term: guess $T(n) \leq c n - d$ for constants $c, d > 0$. Then
$$T(n) \leq (c \lfloor n/2 \rfloor - d) + (c \lceil n/2 \rceil - d) + 1 = c n - 2d + 1 \leq c n - d$$
as long as $d \geq 1$. The stronger hypothesis is easier to prove because it leaves more "slack" to absorb the additive $+1$.
Pitfalls
Never conclude $T(n) = O(n)$ from an argument of the form
$T(n) \leq 2(c \lfloor n/2 \rfloor) + n = c n + n = O(n)$.
This is wrong. The inductive step must produce exactly the form of the hypothesis $T(m) \leq c m$ — with the same constant $c$ — not merely some $O(n)$ expression. Constants must be pinned down explicitly.
4.4 The Recursion-Tree Method
In a recursion tree, each node represents the cost of a single subproblem in the set of recursive function invocations. We sum the costs within each level to obtain a per-level total, then sum those over all levels. The recursion tree is especially good at generating a guess that we can then verify by the substitution method.
Case Study — Merge Sort, Revisited
Recall $T(n) = 2T(n/2) + cn$ for $n > 1$ (with $c$ some constant), $T(1) = c$. Assume for exposition that $n$ is a power of $2$.
Level 0 (the root) does $cn$ work and has $1$ node.
Level 1 has $2$ nodes, each doing $c(n/2)$ work — total: $cn$.
Level 2 has $4$ nodes, each doing $c(n/4)$ work — total: $cn$.
Level $i$ has $2^i$ nodes, each doing $c(n/2^i)$ work — total: $cn$.
The recursion bottoms out at subproblem size $1$, which happens at level $\lg n$.
There are thus $\lg n + 1$ levels, each contributing $cn$ work: $$T(n) = cn (\lg n + 1) = cn \lg n + cn = \Theta(n \lg n).$$
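For powers of $2$ the level-by-level sum is exact, not merely asymptotic; a sketch checking $T(n) = cn(\lg n + 1)$ with $c = 1$ (an illustrative choice of constant):

```python
import math

def T(n, c=1):
    """Merge-sort recurrence T(n) = 2T(n/2) + c·n, T(1) = c, for n a power of 2."""
    return c if n == 1 else 2 * T(n // 2, c) + c * n

# The recursion tree predicts exactly c·n·(lg n + 1) at every power of 2.
for k in range(11):
    n = 2 ** k
    assert T(n) == n * (math.log2(n) + 1)
```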
Interactive Recursion Tree — Merge Sort
A Harder Example — $T(n) = 3T(n/4) + cn^2$
Let's use the recursion tree to guess a solution, then verify by substitution.
The root has cost $cn^2$ and $3$ children of size $n/4$; those children have $9$ grandchildren of size $n/16$; and so on. At level $i$ there are $3^i$ nodes, each with cost $c(n/4^i)^2 = c n^2 / 16^i$. The per-level cost is $$3^i \cdot \frac{c n^2}{16^i} = \left(\frac{3}{16}\right)^i c n^2.$$
The tree has depth $\log_4 n$ (since subproblem size $n/4^i = 1$ when $i = \log_4 n$). Summing the per-level costs: $$T(n) = \sum_{i=0}^{\log_4 n - 1} \left(\frac{3}{16}\right)^i c n^2 + \Theta(n^{\log_4 3}).$$ The final additive term is the total cost of the leaves: there are $3^{\log_4 n} = n^{\log_4 3}$ leaves, each costing $\Theta(1)$.
Since $3/16 < 1$, the geometric sum is bounded above by $\sum_{i=0}^{\infty} (3/16)^i c n^2 = \frac{c n^2}{1 - 3/16} = \frac{16}{13} c n^2$. Thus $T(n) = O(n^2)$. The leaf contribution $n^{\log_4 3} \approx n^{0.79}$ is dominated by $n^2$.
We verify by substitution: guess $T(n) \leq d n^2$. The recurrence gives $T(n) \leq 3 d (n/4)^2 + c n^2 = \tfrac{3}{16} d n^2 + c n^2 = (\tfrac{3}{16} d + c) n^2.$ We want this $\leq d n^2$, i.e. $\tfrac{3}{16} d + c \leq d$, i.e. $d \geq \tfrac{16}{13} c$. So choose $d = \tfrac{16}{13} c$ and the bound holds. Conclusion: $T(n) = O(n^2)$ — and a matching lower bound from the root term gives $T(n) = \Theta(n^2)$.
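The $\tfrac{16}{13}$ constant can be checked numerically. A sketch that evaluates the recurrence exactly, taking $c = 1$ and base cost $T(1) = 1$ (assumptions for illustration), and compares against $\tfrac{16}{13} n^2$ using exact rationals:

```python
from fractions import Fraction

def T(n):
    """T(n) = 3·T(n/4) + n² with T(1) = 1, for n a power of 4."""
    return 1 if n == 1 else 3 * T(n // 4) + n * n

d = Fraction(16, 13)
for k in range(9):
    n = 4 ** k
    assert T(n) <= d * n * n      # the substitution bound T(n) ≤ (16/13)·n²
```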
Unbalanced Splits — $T(n) = T(n/3) + T(2n/3) + cn$
Here subproblem sizes shrink at different rates. The longest root-to-leaf path goes through the $2/3$-children at each step; it reaches size $1$ when $(2/3)^k n = 1$, i.e. $k = \log_{3/2} n$. The shortest path goes through the $1/3$-children and has length $\log_3 n$. For a full tree (before any leaf is reached), each level sums to $cn$: a subproblem of size $m$ contributes $cm$, and the subproblems at each level partition the original $n$. So for at least the first $\log_3 n$ levels the per-level total is exactly $cn$. The upper bound is $T(n) = O(cn \log_{3/2} n) = O(n \lg n)$; a matching lower bound from counting the "full" part gives $T(n) = \Omega(n \lg n)$, so $T(n) = \Theta(n \lg n)$.
4.5 The Master Method
The master method is a cookbook for recurrences of the form $$T(n) = a T(n/b) + f(n), \qquad a \geq 1, \quad b > 1, \quad f(n) \text{ asymptotically positive}.$$ Here the problem of size $n$ is divided into $a$ subproblems of size $n/b$, and $f(n)$ is the cost of the divide and combine steps. (We interpret $n/b$ as either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$ throughout — the asymptotic answer is the same.)
- Case 1. If $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant $\varepsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
- Case 2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.
- Case 3. If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some constant $\varepsilon > 0$, and if $a f(n/b) \leq c f(n)$ for some constant $c < 1$ and all sufficiently large $n$ (the regularity condition), then $T(n) = \Theta(f(n))$.
Interpretation
The watershed $n^{\log_b a}$ is the cost contributed by the leaves of the recursion tree: there are $a^{\log_b n} = n^{\log_b a}$ of them, each of $\Theta(1)$ cost.
- Case 1. $f$ is polynomially smaller than the watershed — leaves dominate, $T(n) = \Theta(n^{\log_b a})$.
- Case 2. $f$ matches the watershed up to constants — all levels are equal, add an extra $\lg n$ factor, $T(n) = \Theta(n^{\log_b a} \lg n)$.
- Case 3. $f$ is polynomially larger than the watershed and regular — the root dominates, $T(n) = \Theta(f(n))$.
Interactive Master Theorem Calculator
Enter $a$, $b$, and $f(n)$ (as $n^k$ or $n^k \lg n$) to have the cases mechanically applied.
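A non-interactive sketch of such a dispatcher, restricted to polynomial driving functions $f(n) = n^k$; the `master` helper below is hypothetical, written here only to make the case analysis concrete:

```python
import math

def master(a, b, k):
    """Dispatch T(n) = a·T(n/b) + Θ(n^k) by the master theorem.

    Handles only f(n) = n^k; the n^k·lg n forms and the gaps between
    cases need separate treatment."""
    w = math.log(a, b)                  # the watershed exponent log_b a
    if math.isclose(k, w):
        return f"Case 2: Θ(n^{w:g} · lg n)"
    if k < w:
        return f"Case 1: Θ(n^{w:g})"
    # k > w: regularity a·(n/b)^k = (a/b^k)·n^k holds automatically,
    # since k > log_b a means a/b^k < 1.
    return f"Case 3: Θ(n^{k:g})"
```

For instance, `master(2, 2, 1)` reports Case 2 (merge sort) and `master(7, 2, 2)` reports Case 1 (Strassen).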
Canonical Applications
Merge sort: $T(n) = 2T(n/2) + \Theta(n)$. Here $a = 2, b = 2$, so $\log_b a = 1$, watershed $n$. $f(n) = n = \Theta(n) = \Theta(n^{\log_b a})$. This is Case 2, giving $T(n) = \Theta(n \lg n)$. ✓
Binary search: $T(n) = T(n/2) + \Theta(1)$. Here $a = 1, b = 2$, so $\log_b a = 0$, watershed $n^0 = 1$. $f(n) = \Theta(1) = \Theta(n^0)$. Case 2: $T(n) = \Theta(\lg n)$. ✓
Strassen's matrix multiplication: $T(n) = 7T(n/2) + \Theta(n^2)$. Here $a = 7, b = 2$, $\log_b a = \lg 7 \approx 2.807$. $f(n) = n^2 = O(n^{\lg 7 - \varepsilon})$ for, say, $\varepsilon = 0.8$. Case 1: $T(n) = \Theta(n^{\lg 7})$. This is the celebrated $O(n^{2.81})$ matrix-multiplication bound.
$\log_b a = \log_4 3 \approx 0.793$, watershed $n^{0.793}$. $f(n) = n \lg n = \Omega(n^{0.793 + \varepsilon})$ for, say, $\varepsilon = 0.1$. We verify regularity: $a f(n/b) = 3 \cdot (n/4)\lg(n/4) = (3/4) n \lg n - (3/4) n \lg 4 \leq (3/4) n \lg n = (3/4) f(n)$ for large $n$, with $c = 3/4 < 1$. Case 3: $T(n) = \Theta(n \lg n)$.
$T(n) = 2T(n/2) + n \lg n$: here $\log_b a = 1$, watershed $n$. $f(n) = n \lg n$. Is $n \lg n = \Omega(n^{1+\varepsilon})$ for some $\varepsilon > 0$? No: $\lg n = o(n^\varepsilon)$ for every $\varepsilon > 0$. So we are in the gap between Cases 2 and 3, and the master theorem does not apply. A different technique — recursion tree or substitution — gives $T(n) = \Theta(n \lg^2 n)$.
Summary Table — When to Use Which Method
| Technique | Best for | Proves? |
|---|---|---|
| Substitution | Verifying a specific guess | Yes, with induction |
| Recursion tree | Generating a guess | Not rigorously; use substitution to prove |
| Master method | Recurrences of form $aT(n/b) + f(n)$ where $f$ has a clean form | Yes (when a case applies) |
Selected Exercises
Exercise 4.5-1 — Apply the master method
(a) $T(n) = 2T(n/4) + 1$. $\log_4 2 = 1/2$, watershed $\sqrt{n}$. $f(n) = 1 = O(n^{1/2 - \varepsilon})$ for $\varepsilon = 1/2$. Case 1: $T(n) = \Theta(\sqrt{n})$.
(b) $T(n) = 2T(n/4) + \sqrt{n}$. Watershed $\sqrt{n}$; $f = \sqrt{n} = \Theta(\sqrt{n})$. Case 2: $T(n) = \Theta(\sqrt{n} \lg n)$.
(c) $T(n) = 2T(n/4) + n$. Watershed $\sqrt{n}$; $f = n = \Omega(n^{1/2 + 1/2})$. Regularity: $2 \cdot (n/4) = n/2 = (1/2) f(n)$, with $c = 1/2 < 1$. Case 3: $T(n) = \Theta(n)$.
(d) $T(n) = 2T(n/4) + n^2$. Watershed $\sqrt{n}$; $f = n^2 = \Omega(n^{1/2 + 3/2})$. Regularity: $2(n/4)^2 = n^2/8 = (1/8) f(n)$. Case 3: $T(n) = \Theta(n^2)$.
Exercise 4.5-2 — Beat Strassen
Problem. Your algorithm multiplies two $3 \times 3$ matrices using $k$ scalar multiplications (and $O(n^2)$ work on submatrices). For which values of $k$ is your algorithm asymptotically faster than Strassen's?
The recurrence becomes $T(n) = k T(n/3) + \Theta(n^2)$. Strassen is $\Theta(n^{\lg 7}) \approx \Theta(n^{2.807})$. By the master theorem, if $\log_3 k > 2$ (i.e. $k > 9$), Case 1 gives $T(n) = \Theta(n^{\log_3 k})$, and we beat Strassen iff $\log_3 k < \lg 7$, i.e. $k < 3^{\lg 7} \approx 21.85$. So the interesting range is $k \in \{10, 11, \ldots, 21\}$. (Standard matrix multiplication uses $k = 27$. The best known $3 \times 3$ scheme, Laderman's, uses $k = 23$ scalar multiplications; since $\log_3 23 \approx 2.854 > \lg 7$, it does not beat Strassen, and whether any scheme with $k \leq 21$ exists remains open.)