# Lab 08: Kernel Approximation Methods for Big Data — Instructions
Welcome to Lab 08! You will implement and compare four kernel computation strategies:
- Exact RBF kernel — the gold standard, and why it breaks at scale
- Random Fourier Features (RFF) — the Rahimi & Recht 2007 landmark
- Orthogonal Random Features (ORF) — lower-variance improvement over RFF
- Nyström approximation — landmark-based low-rank decomposition
## Additional Resources

- Tips & Reference Guide — theoretical explanations, formulas, and implementation hints.
## Pre-flight Checklist

- Check out the `main` branch: `git checkout main`
- Pull the latest changes: `git pull`
- Create a local branch: `git checkout -b <your_branch_name>`
- Ensure dependencies are up to date (`scipy` was added for this lab).
- Run the tests (all will fail until you implement the functions).
## Exercise 1: Exact RBF Kernel — Understanding the Problem (TODOs 1–2)

### Objective

Implement the Gaussian kernel and its Gram matrix, and observe how memory requirements grow as O(n²). (At n = 50,000, a float64 Gram matrix already needs 50,000² × 8 bytes = 20 GB.)

You will edit `src/lab08.py`.
- `rbf_kernel(x, y, sigma)` (TODO 1): Compute the scalar RBF kernel value:
  \[K(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)\]
- `rbf_kernel_matrix(X, sigma)` (TODO 2): Compute the full n×n Gram matrix where `K[i, j] = rbf_kernel(X[i], X[j], sigma)`. A Python double loop would call `rbf_kernel` \(n^2\) times — correct but ~100× slower than numpy. Instead, compute all pairwise distances at once using the algebraic identity:
  \[\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i^\top x_j\]
  This breaks into three numpy operations you can assemble into an `(n, n)` matrix:
  - Squared row norms: a vector of length \(n\) whose entry \(i\) is \(\|x_i\|^2\).
  - Outer sum of norms: add that vector to itself transposed — broadcasting turns two `(n,)` vectors into an `(n, n)` matrix.
  - Inner products: `X @ X.T` gives all \(x_i^\top x_j\) at once, also `(n, n)`.

  Clamp the result to zero before the `exp` (floating point can produce values like `-1e-14` on the diagonal). Finally, emit a `ResourceWarning` when `n > 5_000`.
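If it helps to see where the vectorization above leads, here is one possible shape for the two functions. The names match the TODO signatures, but treat this as an unverified sketch rather than the reference solution:

```python
import warnings
import numpy as np

def rbf_kernel(x, y, sigma):
    # TODO 1: scalar Gaussian kernel between two vectors.
    diff = x - y
    return np.exp(-np.dot(diff, diff) / (2 * sigma ** 2))

def rbf_kernel_matrix(X, sigma):
    # TODO 2: full (n, n) Gram matrix with no Python double loop.
    n = X.shape[0]
    if n > 5_000:
        warnings.warn(f"Gram matrix is {n}x{n}; this may exhaust memory",
                      ResourceWarning)
    sq = np.sum(X ** 2, axis=1)                      # squared row norms, shape (n,)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # broadcast outer sum minus inner products
    np.maximum(d2, 0.0, out=d2)                      # clamp tiny negatives (e.g. -1e-14) before exp
    return np.exp(-d2 / (2 * sigma ** 2))
```

Note that `sq[:, None] + sq[None, :]` is exactly the "outer sum of norms" step: broadcasting two `(n,)` vectors into an `(n, n)` matrix.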
Validation:
Try running the script to see memory grow:
## Exercise 2: Random Fourier Features (TODOs 3–5)

### Objective

Implement the Rahimi & Recht kernel approximation, and experience how an O(n·D) feature map approximates an O(n²) kernel matrix.
- `sample_rff_weights(D, d, sigma, rng)` (TODO 3): Sample frequency vectors and phase shifts:
  \[\omega_j \sim \mathcal{N}(0,\, I/\sigma^2), \quad b_j \sim \text{Uniform}[0,\, 2\pi]\]
- `rff_features(X, omega, b)` (TODO 4): Compute the explicit feature map:
  \[Z = \sqrt{2/D}\cdot\cos(X\omega^\top + b), \quad Z \in \mathbb{R}^{n \times D}\]
- `rff_kernel_approx(X, Y, omega, b)` (TODO 5): Approximate \(K(X, Y) \approx Z_X Z_Y^\top\).
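All three TODOs fit in a few lines of numpy. As a hedged sketch (same signatures as above, not checked against the lab's tests):

```python
import numpy as np

def sample_rff_weights(D, d, sigma, rng):
    # TODO 3: omega_j ~ N(0, I / sigma^2), b_j ~ Uniform[0, 2*pi]
    omega = rng.normal(scale=1.0 / sigma, size=(D, d))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return omega, b

def rff_features(X, omega, b):
    # TODO 4: Z = sqrt(2/D) * cos(X omega^T + b), shape (n, D)
    D = omega.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ omega.T + b)

def rff_kernel_approx(X, Y, omega, b):
    # TODO 5: K(X, Y) is approximated by Z_X @ Z_Y^T
    return rff_features(X, omega, b) @ rff_features(Y, omega, b).T
```

The pointwise error of the approximation shrinks like \(1/\sqrt{D}\), so expect noticeably better Gram matrices as you raise D.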
Validation:
Note: the `test_approximation_quality` test requires TODO 13 (`approximation_error`) to be implemented first. Run it once you reach Exercise 5:
## Exercise 3: Orthogonal Random Features (TODOs 6–7)

### Objective

Improve on standard RFF by ensuring the frequency vectors are mutually orthogonal, reducing approximation variance for the same budget D.
- `sample_orf_weights(D, d, sigma, rng)` (TODO 6): Build orthogonal frequency blocks:
  - Draw \(G \sim \mathcal{N}(0, I)\) of shape \((d, d)\) and compute its QR decomposition.
  - Rescale the rows of \(Q^\top\) by independent \(\chi_d\)-distributed norms.
  - Repeat for \(\lceil D/d \rceil\) blocks, stack, trim to \(D\) rows, and divide by \(\sigma\).
- `orf_features(X, omega, b)` (TODO 7): Apply the same feature formula as RFF — the improvement comes entirely from how `omega` was constructed.
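The block construction above can be sketched as follows. This is one plausible reading of the three steps (the \(\chi_d\) norm is drawn as the square root of a \(\chi^2_d\) sample), so verify it against the tests:

```python
import numpy as np

def sample_orf_weights(D, d, sigma, rng):
    # TODO 6: stack ceil(D/d) orthogonal blocks, each the QR of a Gaussian matrix
    # with rows rescaled to chi_d-distributed norms.
    blocks = []
    for _ in range(int(np.ceil(D / d))):
        G = rng.normal(size=(d, d))
        Q, _ = np.linalg.qr(G)                        # orthonormal columns
        norms = np.sqrt(rng.chisquare(df=d, size=d))  # chi_d-distributed row norms
        blocks.append(Q.T * norms[:, None])           # rescale the rows of Q^T
    omega = np.vstack(blocks)[:D] / sigma             # trim to D rows, divide by sigma
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return omega, b
```

Within each block the rows stay mutually orthogonal after the rescaling, which is where the variance reduction comes from.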
Validation:
## Exercise 4: Nyström Approximation (TODOs 8–10)

### Objective

Implement the landmark-based Nyström method. Unlike RFF/ORF (which use random projections), Nyström leverages actual data points to build the approximation.
- `select_landmarks(X, m, strategy, rng)` (TODO 8): Select \(m\) landmark points from \(X\). Support `"random"` (uniform without replacement) and `"stride"` (evenly spaced indices).
- `nystrom_features(X, landmarks, sigma)` (TODO 9): Compute the Nyström feature map:
  \[Z = K_{nm}\, K_{mm}^{-1/2}\]
  where \(K_{mm}^{-1/2} = V \Lambda^{-1/2} V^\top\) (use `scipy.linalg.eigh` for stability, and clip eigenvalues to \(\ge 10^{-10}\)).
- `nystrom_kernel_approx(X, Y, landmarks, sigma)` (TODO 10): Return \(Z_X Z_Y^\top\).
Validation:
## Exercise 5: Kernel Ridge Regression (TODOs 11–13)

### Objective

Use exact and approximate kernel matrices in the same downstream task to fairly compare approximation quality against prediction performance.
- `kernel_ridge_fit(K_train, y, lam)` (TODO 11): Solve the dual system:
  \[\alpha^* = (K + \lambda I)^{-1} y\]
  Use `np.linalg.solve` (do not explicitly invert the matrix).
- `kernel_ridge_predict(K_test_train, alpha)` (TODO 12): Compute predictions:
  \[\hat{y} = K_{\text{test,train}}\, \alpha^*\]
- `approximation_error(K_exact, K_approx)` (TODO 13): Compute the relative Frobenius-norm error:
  \[\text{error} = \frac{\|K - \tilde{K}\|_F}{\|K\|_F}\]
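Each of these three TODOs is essentially a one-liner; a sketch of the intended shape (same signatures, unverified against the tests):

```python
import numpy as np

def kernel_ridge_fit(K_train, y, lam):
    # TODO 11: solve (K + lam*I) alpha = y; solve() avoids forming the inverse.
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), y)

def kernel_ridge_predict(K_test_train, alpha):
    # TODO 12: y_hat = K_test,train @ alpha
    return K_test_train @ alpha

def approximation_error(K_exact, K_approx):
    # TODO 13: relative Frobenius-norm error ||K - K~||_F / ||K||_F
    # np.linalg.norm defaults to the Frobenius norm for matrices.
    return np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

Solving rather than inverting matters here: `np.linalg.solve` is both faster and numerically better conditioned than `np.linalg.inv` followed by a matrix-vector product.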
Validation:
## Exercise 6: Benchmarking (TODOs 14–15)

### Objective

Quantify the time, memory, and accuracy tradeoffs of all four methods across increasing dataset sizes.
- `benchmark_methods(n_values, D_values, d, sigma)` (TODO 14): For each `(n, D)` pair, generate synthetic data, then time and measure peak memory for:
  - Exact kernel (skip for `n > 3_000`)
  - RFF, ORF, Nyström (always)

  Return a list of dicts with keys: `method`, `n`, `D`, `time_s`, `memory_mb`, `approx_error`.
- `plot_results(results)`: Already implemented for you — call it after `benchmark_methods` to visualise the results.
Validation:
Run the full solution and observe the scaling:
## What to Submit

When all tests pass, submit exactly:

- `src/lab08.py` — your completed implementation, with the `STUDENT REFLECTION` filled in.

Do NOT submit `__pycache__` directories, `lab08_benchmark.png`, or the solutions file.

Questions? Check the Tips & Reference Guide or ask your instructor.