Feature Detection
Module 5 - Corners, Harris, SIFT, ORB, Matching
What are Image Features?
A feature is a compact, meaningful piece of information extracted from an image. Good features are repeatable (found at the same location despite viewpoint/lighting changes), distinctive (distinguishable from each other), and efficient (fast to compute).
Features enable: image stitching (panoramas), 3D reconstruction, object recognition, AR tracking, and more.
Corners vs Edges vs Flat Regions
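Consider a small window over the image and what happens when you shift it slightly:
- Flat region: the window looks the same after a shift in any direction. It carries no information, so it can't be localized.
- Edge: shifting across the edge changes the window, but shifting along it does not. It can be localized perpendicular to the edge but not along it, so it is ambiguous.
- Corner: the window changes for a shift in every direction. It is uniquely localizable, which makes it the best feature.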
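To make the three cases concrete, here is a minimal Python sketch that shifts a 5×5 window by one pixel and measures the sum of squared differences (SSD) between the original and shifted contents. The synthetic 0/1 images and the helper names `shift_ssd` and `make_image` are illustrative assumptions, not part of any library.

```python
# SSD between a (2*half+1)^2 window centred at (cx, cy) and the same window
# shifted by (u, v). Images are lists of rows, indexed img[y][x].
def shift_ssd(img, cx, cy, half, u, v):
    total = 0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            d = img[y + v][x + u] - img[y][x]
            total += d * d
    return total

def make_image(kind, size=12):
    """Hypothetical synthetic images: 'flat', 'edge' or 'corner'."""
    img = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            if kind == "edge" and x >= size // 2:
                img[y][x] = 1  # vertical edge: bright right half
            if kind == "corner" and x >= size // 2 and y >= size // 2:
                img[y][x] = 1  # bright bottom-right quadrant
    return img

shifts = [(1, 0), (0, 1), (1, 1)]  # right, down, diagonal
for kind in ("flat", "edge", "corner"):
    img = make_image(kind)
    # Window centred at (6, 6), where the edge / corner sits
    print(kind, [shift_ssd(img, 6, 6, 2, u, v) for u, v in shifts])
```

On the edge image the downward shift (0, 1) produces no change while the other two shifts do; on the corner image every shift changes the window, and on the flat image none does.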
Harris Corner Detector
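Harris and Stephens (1988) measure how much the intensity inside a window changes when the window is shifted in any direction. The gradient information in the window is summarized by the structure tensor M:
$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$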
| Symbol | Meaning |
|---|---|
| M | structure tensor (2×2 matrix) - summarizes how intensity changes in all directions around a pixel |
| Σ w(x,y) | weighted sum over a local neighborhood window; w is usually a Gaussian that gives more weight to the center |
| Iₓ | horizontal image gradient - how much intensity changes left/right at each pixel |
| Iᵧ | vertical image gradient - how much intensity changes up/down at each pixel |
| Iₓ² | squared horizontal gradient - strong near vertical edges |
| Iᵧ² | squared vertical gradient - strong near horizontal edges |
| IₓIᵧ | cross term - measures how strongly the x and y gradients co-vary; it is also large along diagonal edges, so corners are identified from the eigenvalues of M, not from this term alone |
| [ … ] | 2×2 matrix notation; the off-diagonal IₓIᵧ terms capture the orientation of the dominant gradient |
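The Harris response R is:
$$R = \det(M) - k\,\operatorname{trace}(M)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2$$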
| Symbol | Meaning |
|---|---|
| R | Harris response score - large positive = corner, large negative = edge, near zero = flat region |
| det(M) | determinant of M = λ₁ × λ₂ - large only when intensity changes strongly in both directions (corner) |
| trace(M) | trace of M = λ₁ + λ₂ - sum of eigenvalues, measuring total gradient energy in the window |
| k | sensitivity constant, typically 0.04–0.06 - trades off corner vs edge detection; higher k = fewer corners |
| λ₁, λ₂ | eigenvalues of M - gradient strengths along two orthogonal principal directions |
| λ₁λ₂ | product of eigenvalues - large only if both directions have strong gradients (true corner) |
| (λ₁+λ₂)² | squared sum - multiplied by k and subtracted, it drives R negative when one eigenvalue dominates (edge) |
- R ≫ 0: Corner (both eigenvalues large)
- R ≈ 0: Flat region (both eigenvalues small)
- R ≪ 0: Edge (one eigenvalue much larger)
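The sign pattern can be checked numerically by plugging eigenvalue pairs straight into R = λ₁λ₂ - k(λ₁+λ₂)². The eigenvalue magnitudes below are made up for illustration; only their relative sizes matter.

```python
def harris_response(l1, l2, k=0.04):
    """Harris response from the two eigenvalues of the structure tensor M."""
    det = l1 * l2        # det(M) = λ1 * λ2
    trace = l1 + l2      # trace(M) = λ1 + λ2
    return det - k * trace ** 2

# Hypothetical eigenvalue pairs for the three cases
cases = {"corner": (10.0, 8.0), "edge": (10.0, 0.1), "flat": (0.1, 0.1)}
for name, (l1, l2) in cases.items():
    print(name, round(harris_response(l1, l2), 3))
```

With k = 0.04 this gives a large positive R for the corner pair, a negative R for the edge pair, and a value close to zero for the flat pair.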
After computing R at every pixel, keep the pixels whose response exceeds a threshold, then apply non-maximum suppression so that each corner yields a single location.
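A minimal end-to-end sketch of this pipeline in plain Python: central-difference gradients, a 3×3 binomial window standing in for the Gaussian w(x,y), the response R = det(M) - k·trace(M)², a threshold, and 3×3 non-maximum suppression. The kernel size, the relative threshold, and the function names are assumptions chosen for a tiny synthetic image; real code would normally call a library routine such as OpenCV's `cv2.cornerHarris` instead.

```python
# Minimal Harris corner detector in plain Python (images are lists of rows).

def gradients(img):
    """Central-difference gradients Ix, Iy; left at zero on the 1-pixel border."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy

# 3x3 binomial kernel (sums to 16): a small stand-in for the Gaussian window w(x, y)
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def harris_map(img, k=0.04):
    """Return R = det(M) - k * trace(M)^2 for every interior pixel."""
    h, w = len(img), len(img[0])
    ix, iy = gradients(img)
    r = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    wt = KERNEL[dy + 1][dx + 1] / 16.0
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += wt * gx * gx  # entries of the structure tensor M
                    syy += wt * gy * gy
                    sxy += wt * gx * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r

def corners(r, thresh):
    """Keep pixels with R > thresh that are also 3x3 local maxima (NMS)."""
    h, w = len(r), len(r[0])
    found = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = r[y][x]
            if v <= thresh:
                continue
            neighbours = [r[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(v >= n for n in neighbours):
                found.append((x, y))
    return found

# Synthetic 16x16 test image: bright square covering x, y in [4, 11]
size = 16
img = [[1.0 if 4 <= x <= 11 and 4 <= y <= 11 else 0.0 for x in range(size)]
       for y in range(size)]
r = harris_map(img)
peak = max(max(row) for row in r)
print(corners(r, 0.1 * peak))  # threshold = 10% of the strongest response
```

On this image (a bright square on a dark background) the sketch reports the four corners of the square, while pixels on the square's sides get negative responses and pixels in uniform areas get zero.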
SIFT and ORB
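Harris finds corners, but its response depends on the window size, so it is not scale-invariant. SIFT (Scale-Invariant Feature Transform, Lowe 2004) detects keypoints as extrema of a Difference-of-Gaussian (DoG) pyramid across scales, assigns each keypoint a dominant orientation, then builds a 128-dimensional descriptor from histograms of local gradient orientations (a 4×4 grid of cells with 8 orientation bins each). It is invariant to scale and rotation and robust to moderate illumination changes.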
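The 128 in SIFT's descriptor comes from 4 × 4 cells × 8 orientation bins. The sketch below builds a simplified descriptor of that shape from a 16×16 patch: each pixel votes its gradient magnitude into the orientation bin of its cell, and the result is normalized to unit length. It deliberately omits parts of Lowe's method (Gaussian weighting, interpolation between bins, rotation to the keypoint orientation, clipping of large values), so treat it as an illustration of the layout, not a SIFT implementation; the function name is hypothetical.

```python
import math

def sift_like_descriptor(patch):
    """Simplified SIFT-style descriptor of a 16x16 patch (list of rows).

    A 4x4 grid of 4x4-pixel cells, each with an 8-bin histogram of gradient
    orientations, gives 4 * 4 * 8 = 128 values. Omitted from real SIFT:
    Gaussian weighting, trilinear interpolation, rotation to the keypoint
    orientation, and clipping/renormalization of large values.
    """
    if len(patch) != 16 or any(len(row) != 16 for row in patch):
        raise ValueError("expected a 16x16 patch")
    hist = [0.0] * 128
    for y in range(16):
        for x in range(16):
            # Central differences, clamped at the patch border
            gx = patch[y][min(x + 1, 15)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, 15)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)  # in [0, 2*pi)
            b = int(angle / (2 * math.pi) * 8) % 8       # 8 bins of 45 degrees
            cell = (y // 4) * 4 + (x // 4)                # cell index 0..15
            hist[cell * 8 + b] += mag                     # magnitude-weighted vote
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

ramp = [[float(x) for x in range(16)] for _ in range(16)]  # brighter to the right
desc = sift_like_descriptor(ramp)
brighter = sift_like_descriptor([[2 * v + 10 for v in row] for row in ramp])
print(len(desc), max(abs(a - b) for a, b in zip(desc, brighter)))
```

Because gradients ignore constant offsets and the final normalization cancels a uniform gain, the transformed patch (2·I + 10) yields essentially the same descriptor; this is where SIFT's robustness to moderate illumination changes comes from.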
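ORB (Oriented FAST and Rotated BRIEF) is a fast, patent-free alternative to SIFT. It detects keypoints with FAST, assigns each one an orientation from the intensity centroid of its patch, and describes it with a rotated version of BRIEF, a binary descriptor built from pixel-intensity comparisons. Its authors report it as roughly 100× faster than SIFT, which makes it suitable for real-time use.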
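The binary-descriptor idea behind BRIEF (and hence ORB) fits in a few lines: compare the intensities of fixed pixel pairs inside a patch and pack each result into one bit. This sketch uses 256 uniformly random pairs in a 31×31 patch; those sizes, the sampling scheme, and the helper names are assumptions for illustration. Real BRIEF smooths the patch first and samples pairs from a Gaussian, and ORB additionally rotates the pattern by the keypoint orientation and uses a learned set of pairs.

```python
import random

def make_pairs(n_bits=256, patch_size=31, seed=0):
    """Fixed random sampling pattern, shared by every keypoint (assumption)."""
    rng = random.Random(seed)

    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))

    return [(point(), point()) for _ in range(n_bits)]

def brief_descriptor(patch, pairs):
    """One bit per pair: 1 if the first pixel is darker than the second."""
    bits = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pairs):
        if patch[y1][x1] < patch[y2][x2]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits: XOR, then count the ones."""
    return bin(a ^ b).count("1")

rng = random.Random(1)
patch = [[rng.randrange(256) for _ in range(31)] for _ in range(31)]
brighter = [[v + 20 for v in row] for row in patch]  # uniform brightness shift
other = [[rng.randrange(256) for _ in range(31)] for _ in range(31)]
pairs = make_pairs()
d_patch = brief_descriptor(patch, pairs)
print(hamming(d_patch, brief_descriptor(brighter, pairs)),
      hamming(d_patch, brief_descriptor(other, pairs)))
```

A uniform brightness change leaves every comparison, and therefore every bit, unchanged, whereas an unrelated patch differs in roughly half its bits. Comparing two descriptors costs one XOR and a bit count, which is a large part of why binary descriptors are so cheap to match.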
| Detector | Scale-inv. | Rot-inv. | Speed | License |
|---|---|---|---|---|
| Harris | ✗ | ✓ (no descriptor) | Fast | Free |
| SIFT | ✓ | ✓ | Slow | Free (post-2020) |
| SURF | ✓ | ✓ | Medium | Patented |
| ORB | ~✓ | ✓ | Very fast | Free |
Feature Matching
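Once you have descriptors from two images, find correspondences by comparing them with a distance measure. Common strategies:
- Brute-force: compare every descriptor in one image with every descriptor in the other, which costs O(n·m) distance computations for n and m descriptors. Use Hamming distance for binary descriptors (ORB) and L2 distance for floating-point descriptors (SIFT).
- FLANN (Fast Library for Approximate Nearest Neighbors): approximate nearest-neighbour search using index structures such as randomized k-d trees; much faster for large descriptor sets, at the cost of occasionally missing the true nearest neighbour.
- Ratio test (Lowe's): accept a match only if the nearest neighbour is clearly closer than the second-nearest, i.e. d1 / d2 < 0.75, where d1 and d2 are the distances to the closest and second-closest descriptors. This removes ambiguous matches, such as those on repeated textures.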
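A brute-force matcher with the ratio test can be sketched as below. It handles either descriptor type by taking the distance function as a parameter (L2 for float vectors, Hamming for bit strings stored as Python ints). The function names and toy descriptors are hypothetical; in OpenCV the same idea is usually expressed with `cv2.BFMatcher` and `knnMatch(..., k=2)` followed by the ratio filter.

```python
import math

def l2(a, b):
    """Euclidean distance for float descriptors (e.g. SIFT)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Bit-difference count for binary descriptors stored as ints (e.g. ORB)."""
    return bin(a ^ b).count("1")

def match_ratio_test(desc_a, desc_b, dist, ratio=0.75):
    """Brute-force matching with Lowe's ratio test.

    For each descriptor in desc_a, find its nearest and second-nearest
    neighbours in desc_b (len(desc_a) * len(desc_b) comparisons) and keep
    the match (i, j, d1) only if d1 / d2 < ratio.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs a second-nearest neighbour
    for i, da in enumerate(desc_a):
        dists = sorted((dist(da, db), j) for j, db in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < ratio:
            matches.append((i, j1, d1))
    return matches

# Float descriptors compared with L2
desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (3.0, 4.0), (5.0, 5.2), (5.25, 5.0)]
print(match_ratio_test(desc_a, desc_b, l2))

# Binary descriptors compared with Hamming distance
bin_a = [0b11110000]
bin_b = [0b11110001, 0b00001111, 0b11111111]
print(match_ratio_test(bin_a, bin_b, hamming))
```

In the float example the second query point has two candidates at distances 0.2 and 0.25; their ratio of 0.8 exceeds 0.75, so that match is discarded as ambiguous and only the first query point is matched.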
Quiz
Check your understanding
1. In the Harris response, a pixel with R ≪ 0 (large negative) indicates:
2. What is the main advantage of ORB over SIFT?
3. Lowe's ratio test rejects a match when d1/d2 > 0.75. Why?