# Lab 08: Kernel Approximation Methods for Big Data — Instructions
Welcome to Lab 08! You will implement and compare four kernel computation strategies:
- Exact RBF kernel — the gold standard, and why it breaks at scale
- Random Fourier Features (RFF) — the Rahimi & Recht 2007 landmark
- Orthogonal Random Features (ORF) — lower-variance improvement over RFF
- Nyström approximation — landmark-based low-rank decomposition
## Additional Resources

- Tips & Reference Guide — theoretical explanations, formulas, and implementation hints.
## Pre-flight Checklist

- Check out the `main` branch: `git checkout main`
- Pull the latest changes: `git pull`
- Create a local branch: `git checkout -b <your_branch_name>`
- Ensure dependencies are up to date (`scipy` was added for this lab).
- Run the tests (all will fail until you implement the functions).
## Exercise 1: Exact RBF Kernel — Understanding the Problem (TODOs 1–2)

### Objective

Implement the Gaussian kernel and its Gram matrix, and observe how memory requirements grow as O(n²). (At n = 50,000, a float64 Gram matrix already needs 50,000² × 8 bytes = 20 GB.)

You will edit `src/lab08.py`.
- `rbf_kernel(x, y, sigma)` (TODO 1): Compute the scalar RBF kernel value:
  \[K(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)\]
- `rbf_kernel_matrix(X, sigma)` (TODO 2): Compute the full n×n Gram matrix where `K[i, j] = rbf_kernel(X[i], X[j], sigma)`. A Python double loop would call `rbf_kernel` \(n^2\) times — correct but ~100× slower than numpy. Instead, compute all pairwise distances at once using the algebraic identity:
  \[\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i^\top x_j\]
  This breaks into three numpy operations you can assemble into an `(n, n)` matrix:
  - Squared row norms: a vector of length \(n\) whose entry \(i\) is \(\|x_i\|^2\).
  - Outer sum of norms: add that vector to itself transposed — broadcasting turns two `(n,)` vectors into an `(n, n)` matrix.
  - Inner products: `X @ X.T` gives all \(x_i^\top x_j\) at once, also `(n, n)`.

  Clamp the result to zero before the `exp` (floating point can produce values like `-1e-14` on the diagonal). Finally, emit a `ResourceWarning` when `n > 5_000`.
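If it helps to see where the vectorization above leads, here is one possible shape for the two functions. The names match the TODO signatures, but treat this as an unverified sketch rather than the reference solution:

```python
import warnings
import numpy as np

def rbf_kernel(x, y, sigma):
    # TODO 1: scalar Gaussian kernel between two vectors.
    diff = x - y
    return np.exp(-np.dot(diff, diff) / (2 * sigma ** 2))

def rbf_kernel_matrix(X, sigma):
    # TODO 2: full (n, n) Gram matrix with no Python double loop.
    n = X.shape[0]
    if n > 5_000:
        warnings.warn(f"Gram matrix is {n}x{n}; this may exhaust memory",
                      ResourceWarning)
    sq = np.sum(X ** 2, axis=1)                      # squared row norms, shape (n,)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # broadcast outer sum minus inner products
    np.maximum(d2, 0.0, out=d2)                      # clamp tiny negatives (e.g. -1e-14) before exp
    return np.exp(-d2 / (2 * sigma ** 2))
```

Note that `sq[:, None] + sq[None, :]` is exactly the "outer sum of norms" step: broadcasting two `(n,)` vectors into an `(n, n)` matrix.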
Validation:
Try running the script to see memory grow:
## Exercise 2: Random Fourier Features (TODOs 3–5)

### Objective

Implement the Rahimi & Recht kernel approximation, and experience how an O(n·D) feature map approximates an O(n²) kernel matrix.
- `sample_rff_weights(D, d, sigma, rng)` (TODO 3): Sample frequency vectors and phase shifts:
  \[\omega_j \sim \mathcal{N}(0,\, I/\sigma^2), \quad b_j \sim \text{Uniform}[0,\, 2\pi]\]
- `rff_features(X, omega, b)` (TODO 4): Compute the explicit feature map:
  \[Z = \sqrt{2/D}\cdot\cos(X\omega^\top + b), \quad Z \in \mathbb{R}^{n \times D}\]
- `rff_kernel_approx(X, Y, omega, b)` (TODO 5): Approximate \(K(X, Y) \approx Z_X Z_Y^\top\).
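All three TODOs fit in a few lines of numpy. As a hedged sketch (same signatures as above, not checked against the lab's tests):

```python
import numpy as np

def sample_rff_weights(D, d, sigma, rng):
    # TODO 3: omega_j ~ N(0, I / sigma^2), b_j ~ Uniform[0, 2*pi]
    omega = rng.normal(scale=1.0 / sigma, size=(D, d))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return omega, b

def rff_features(X, omega, b):
    # TODO 4: Z = sqrt(2/D) * cos(X omega^T + b), shape (n, D)
    D = omega.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ omega.T + b)

def rff_kernel_approx(X, Y, omega, b):
    # TODO 5: K(X, Y) is approximated by Z_X @ Z_Y^T
    return rff_features(X, omega, b) @ rff_features(Y, omega, b).T
```

The pointwise error of the approximation shrinks like \(1/\sqrt{D}\), so expect noticeably better Gram matrices as you raise D.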
Validation:
Note: the `test_approximation_quality` test requires TODO 13 (`approximation_error`) to be implemented first. Run it once you reach Exercise 5:
## Exercise 3: Orthogonal Random Features (TODOs 6–7)

### Objective

Improve on standard RFF by ensuring the frequency vectors are mutually orthogonal, reducing approximation variance for the same budget D.
- `sample_orf_weights(D, d, sigma, rng)` (TODO 6): Build orthogonal frequency blocks:
  - Draw \(G \sim \mathcal{N}(0, I)\) of shape \((d, d)\) and compute its QR decomposition.
  - Rescale the rows of \(Q^\top\) by independent \(\chi_d\)-distributed norms.
  - Repeat for \(\lceil D/d \rceil\) blocks, stack, trim to \(D\) rows, and divide by \(\sigma\).
- `orf_features(X, omega, b)` (TODO 7): Apply the same feature formula as RFF — the improvement comes entirely from how `omega` was constructed.
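The block construction above can be sketched as follows. This is one plausible reading of the three steps (the \(\chi_d\) norm is drawn as the square root of a \(\chi^2_d\) sample), so verify it against the tests:

```python
import numpy as np

def sample_orf_weights(D, d, sigma, rng):
    # TODO 6: stack ceil(D/d) orthogonal blocks, each the QR of a Gaussian matrix
    # with rows rescaled to chi_d-distributed norms.
    blocks = []
    for _ in range(int(np.ceil(D / d))):
        G = rng.normal(size=(d, d))
        Q, _ = np.linalg.qr(G)                        # orthonormal columns
        norms = np.sqrt(rng.chisquare(df=d, size=d))  # chi_d-distributed row norms
        blocks.append(Q.T * norms[:, None])           # rescale the rows of Q^T
    omega = np.vstack(blocks)[:D] / sigma             # trim to D rows, divide by sigma
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return omega, b
```

Within each block the rows stay mutually orthogonal after the rescaling, which is where the variance reduction comes from.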
Validation:
## Exercise 4: Nyström Approximation (TODOs 8–10)

### Objective

Implement the landmark-based Nyström method. Unlike RFF/ORF (which use random projections), Nyström leverages actual data points to build the approximation.
- `select_landmarks(X, m, strategy, rng)` (TODO 8): Select \(m\) landmark points from \(X\). Support `"random"` (uniform without replacement) and `"stride"` (evenly spaced indices).
- `nystrom_features(X, landmarks, sigma)` (TODO 9): Compute the Nyström feature map:
  \[Z = K_{nm}\, K_{mm}^{-1/2}\]
  where \(K_{mm}^{-1/2} = V \Lambda^{-1/2} V^\top\) (use `scipy.linalg.eigh` for stability, and clip eigenvalues to \(\ge 10^{-10}\)).
- `nystrom_kernel_approx(X, Y, landmarks, sigma)` (TODO 10): Return \(Z_X Z_Y^\top\).
Validation:
## Exercise 5: Kernel Ridge Regression (TODOs 11–13)

### Objective

Use exact and approximate kernel matrices in the same downstream task to fairly compare approximation quality against prediction performance.
- `kernel_ridge_fit(K_train, y, lam)` (TODO 11): Solve the dual system:
  \[\alpha^* = (K + \lambda I)^{-1} y\]
  Use `np.linalg.solve` (do not explicitly invert the matrix).
- `kernel_ridge_predict(K_test_train, alpha)` (TODO 12): Compute predictions:
  \[\hat{y} = K_{\text{test,train}}\, \alpha^*\]
- `approximation_error(K_exact, K_approx)` (TODO 13): Compute the relative Frobenius-norm error:
  \[\text{error} = \frac{\|K - \tilde{K}\|_F}{\|K\|_F}\]
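Each of these three TODOs is essentially a one-liner; a sketch of the intended shape (same signatures, unverified against the tests):

```python
import numpy as np

def kernel_ridge_fit(K_train, y, lam):
    # TODO 11: solve (K + lam*I) alpha = y; solve() avoids forming the inverse.
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), y)

def kernel_ridge_predict(K_test_train, alpha):
    # TODO 12: y_hat = K_test,train @ alpha
    return K_test_train @ alpha

def approximation_error(K_exact, K_approx):
    # TODO 13: relative Frobenius-norm error ||K - K~||_F / ||K||_F
    # np.linalg.norm defaults to the Frobenius norm for matrices.
    return np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

Solving rather than inverting matters here: `np.linalg.solve` is both faster and numerically better conditioned than `np.linalg.inv` followed by a matrix-vector product.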
Validation:
## Exercise 6: Benchmarking (TODOs 14–15)

### Objective

Quantify the time, memory, and accuracy tradeoffs of all four methods across increasing dataset sizes.
- `benchmark_methods(n_values, D_values, d, sigma)` (TODO 14): For each `(n, D)` pair, generate synthetic data, then time and measure peak memory for:
  - Exact kernel (skip for `n > 3_000`)
  - RFF, ORF, Nyström (always)

  Return a list of dicts with keys: `method`, `n`, `D`, `time_s`, `memory_mb`, `approx_error`.
- `plot_results(results)`: Already implemented for you — call it after `benchmark_methods` to visualise the results.
Validation:
Run the full solution and observe the scaling:
## What to Submit

When all tests pass, submit exactly:

- `src/lab08.py` — your completed implementation, with the `STUDENT REFLECTION` filled in.

Do NOT submit `__pycache__` directories, `lab08_benchmark.png`, or the solutions file.

Questions? Check the Tips & Reference Guide or ask your instructor.