Feature Detection

Module 5 - Corners, Harris, SIFT, ORB, Matching

What are Image Features?

A feature is a compact, meaningful piece of information extracted from an image. Good features are repeatable (found at the same location despite viewpoint/lighting changes), distinctive (distinguishable from each other), and efficient (fast to compute).

Features enable: image stitching (panoramas), 3D reconstruction, object recognition, AR tracking, and more.

Corners vs Edges vs Flat Regions

Consider a small image patch and how its appearance changes when the patch is shifted slightly:

Flat region

Looks the same in all directions. No information. Can't localize.

Edge

Can localize perpendicular to edge, not along it. Ambiguous.

Corner

Looks different in all directions. Uniquely localizable. Best feature!
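The three cases can be checked numerically. The sketch below is an illustration only: the synthetic image, the patch size, and the helper name shift_ssd are assumptions made for this example, not part of the module. It shifts a 5×5 patch by one pixel in four directions and measures the sum of squared differences (SSD) against the original patch.

```python
import numpy as np

def shift_ssd(image, y, x, half, dy, dx):
    """SSD between the patch centered at (y, x) and the same patch shifted by (dy, dx)."""
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    moved = image[y + dy - half:y + dy + half + 1, x + dx - half:x + dx + half + 1]
    return float(np.sum((moved - patch) ** 2))

# Hypothetical test image: dark background with a bright square in the bottom-right quadrant.
image = np.zeros((21, 21), dtype=float)
image[10:, 10:] = 1.0

shifts = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
locations = {
    "flat": (4, 4),      # inside the dark background
    "edge": (15, 10),    # on the vertical left edge of the square
    "corner": (10, 10),  # the top-left corner of the square
}

results = {}
for name, (y, x) in locations.items():
    results[name] = [shift_ssd(image, y, x, 2, dy, dx) for dy, dx in shifts]
    print(name, results[name])
```

In this setup the flat patch gives zero for every shift, the edge patch changes only for horizontal shifts (perpendicular to the edge), and the corner patch changes for all four shifts.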

Harris Corner Detector

Harris (1988) measures how much intensity changes when you shift a window in any direction, using the Structure Tensor M:

M = Σ w(x,y) [ Iₓ²    IₓIᵧ ]
             [ IₓIᵧ   Iᵧ²  ]

Symbol guide
M: structure tensor (2×2 matrix) - summarizes how intensity changes in all directions around a pixel
Σ w(x,y): weighted sum over a local neighborhood window; w is typically a Gaussian that gives more weight to the center
Iₓ: horizontal image gradient - how much intensity changes left/right at each pixel
Iᵧ: vertical image gradient - how much intensity changes up/down at each pixel
Iₓ²: squared horizontal gradient - strong in regions with vertical edges
Iᵧ²: squared vertical gradient - strong in regions with horizontal edges
IₓIᵧ: cross term - nonzero when the horizontal and vertical gradients are correlated, as on diagonal edges and at corners
[ … ]: 2×2 matrix notation; the off-diagonal IₓIᵧ terms capture how the two gradients correlate, which encodes edge orientation
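To make the structure tensor concrete, here is a minimal NumPy sketch. It assumes uniform weights w(x,y) = 1 instead of a Gaussian, uses np.gradient (central differences) for Iₓ and Iᵧ, and works on three hypothetical 7×7 patches. The function name structure_tensor is chosen for this example only.

```python
import numpy as np

def structure_tensor(patch):
    """2x2 structure tensor of a patch, assuming uniform weights w(x, y) = 1."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    iy, ix = np.gradient(patch.astype(float))
    return np.array([
        [np.sum(ix * ix), np.sum(ix * iy)],
        [np.sum(ix * iy), np.sum(iy * iy)],
    ])

flat = np.zeros((7, 7))

edge = np.zeros((7, 7))
edge[:, 4:] = 1.0          # vertical edge

corner = np.zeros((7, 7))
corner[3:, 3:] = 1.0       # bright quadrant meeting at a corner

eigs = {}
for name, patch in [("flat", flat), ("edge", edge), ("corner", corner)]:
    m = structure_tensor(patch)
    eigs[name] = np.linalg.eigvalsh(m)  # eigenvalues of the symmetric matrix, ascending
    print(name, eigs[name])
```

The flat patch yields two zero eigenvalues, the edge patch yields one zero and one large eigenvalue, and the corner patch yields two large eigenvalues.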

The Harris response R is:

R = det(M) - k · trace(M)² = λ₁λ₂ - k(λ₁+λ₂)²
Symbol guide
R: Harris response score - large positive = corner, large negative = edge, near zero = flat region
det(M): determinant of M = λ₁ × λ₂ - large when intensity changes strongly in both directions (corner)
trace(M): trace of M = λ₁ + λ₂ - sum of eigenvalues, measuring total gradient energy in the window
k: sensitivity constant, typically 0.04–0.06 - trades off corner vs edge detection; higher k = fewer corners
λ₁, λ₂: eigenvalues of M - represent the dominant gradient strengths in two orthogonal directions
λ₁λ₂: product of eigenvalues - large only if both directions have strong gradients (true corner)
(λ₁+λ₂)²: squared sum - subtracted (scaled by k) so that edges, where one eigenvalue dominates and λ₁λ₂ ≈ 0, receive a negative score

After computing R at every pixel, apply a threshold and non-maximum suppression (keep only local maxima of R) to get the final corner locations.
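The full pipeline (gradients, windowed sums, response, threshold, non-maximum suppression) can be sketched in plain NumPy. This is a simplified illustration, not a reference implementation: it assumes a 3×3 box window instead of a Gaussian, a threshold relative to the maximum response, and a synthetic test image. The names box_sum, harris_response, and detect_corners are hypothetical.

```python
import numpy as np

def box_sum(a, half):
    """Sum over each pixel's (2*half+1)^2 neighborhood, zero-padded at the border."""
    padded = np.pad(a, half)
    out = np.zeros_like(a)
    size = 2 * half + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, k=0.04, half=1):
    """R = det(M) - k * trace(M)^2 at every pixel (box window, an assumption)."""
    iy, ix = np.gradient(image.astype(float))
    sxx = box_sum(ix * ix, half)
    syy = box_sum(iy * iy, half)
    sxy = box_sum(ix * iy, half)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

def detect_corners(response, rel_threshold=0.1, half=1):
    """Threshold R, then keep only pixels that are the maximum of their neighborhood."""
    threshold = rel_threshold * response.max()
    height, width = response.shape
    corners = []
    for y in range(half, height - half):
        for x in range(half, width - half):
            window = response[y - half:y + half + 1, x - half:x + half + 1]
            if response[y, x] > threshold and response[y, x] == window.max():
                corners.append((y, x))
    return corners

# Synthetic image: a bright 8x8 square on a dark background.
image = np.zeros((20, 20))
image[6:14, 6:14] = 1.0

corners = detect_corners(harris_response(image))
print(corners)
```

On this image the detector reports the four corners of the square as (row, column) pairs. Pixels along the square's sides get a negative R and the flat interior gets R = 0, so neither passes the threshold.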

SIFT and ORB

Harris finds corners, but the detections are not scale-invariant. SIFT (Scale-Invariant Feature Transform, Lowe 2004) detects keypoints at multiple scales using a Difference-of-Gaussian (DoG) pyramid, then builds a 128-dimensional descriptor based on local gradient orientations. The result is invariant to scale and rotation, and robust to partial illumination change.
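The idea behind DoG scale selection can be illustrated with a much-reduced sketch. It is not SIFT: it skips octaves, the 3×3×3 extremum search, sub-pixel refinement, and edge rejection, and only asks at which blur level the DoG response at a blob's center is strongest. The blob sizes, the scale list, and all function names are assumptions for this example.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with a kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def dog_scale_at(image, y, x, sigmas):
    """Return the sigma whose DoG level has the strongest response at (y, x)."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    responses = [abs(blurred[i + 1][y, x] - blurred[i][y, x])
                 for i in range(len(sigmas) - 1)]
    return sigmas[int(np.argmax(responses))]

def gaussian_blob(size, spread):
    """Synthetic blob: a 2D Gaussian bump centered in a size x size image."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / (2 * spread ** 2))

# Scales spaced by a constant factor of sqrt(2): 1, 1.41, 2, ..., 8.
sigmas = [2 ** (i / 2) for i in range(7)]

size = 65
center = size // 2
best_small = dog_scale_at(gaussian_blob(size, 1.5), center, center, sigmas)
best_large = dog_scale_at(gaussian_blob(size, 6.0), center, center, sigmas)
print("small blob:", best_small, "large blob:", best_large)
```

The larger blob responds most strongly at a coarser scale than the smaller one, which is the property SIFT uses to assign each keypoint a characteristic scale.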

ORB (Oriented FAST and Rotated BRIEF) is a fast, free alternative to SIFT. It uses FAST for keypoint detection and a rotation-aware variant of BRIEF for binary descriptors. It is roughly 100× faster than SIFT, which makes it suitable for real-time use.
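A large part of ORB's speed comes from binary descriptors, which are compared with Hamming distance (an XOR followed by a bit count). The sketch below is a BRIEF-style toy, not ORB itself: the random sampling pattern, the 8×8 patch size, and the function names are assumptions, and it omits ORB's orientation compensation and learned test pairs.

```python
import random

def make_pairs(patch_size, n_bits, seed=0):
    """Random pixel pairs to compare; a stand-in for BRIEF's sampling pattern."""
    rng = random.Random(seed)
    def point():
        return rng.randrange(patch_size), rng.randrange(patch_size)
    return [(point(), point()) for _ in range(n_bits)]

def brief_descriptor(patch, pairs):
    """Pack one bit per pair: 1 if the first pixel is darker than the second."""
    bits = 0
    for (y1, x1), (y2, x2) in pairs:
        bits = (bits << 1) | (1 if patch[y1][x1] < patch[y2][x2] else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

def random_patch(size, seed):
    rng = random.Random(seed)
    return [[rng.randrange(200) for _ in range(size)] for _ in range(size)]

pairs = make_pairs(patch_size=8, n_bits=256)

patch_a = random_patch(8, seed=1)
patch_b = [[v + 10 for v in row] for row in patch_a]  # same patch, uniformly brighter
patch_c = random_patch(8, seed=2)                     # unrelated patch

desc_a = brief_descriptor(patch_a, pairs)
desc_b = brief_descriptor(patch_b, pairs)
desc_c = brief_descriptor(patch_c, pairs)

print("same patch, brighter:", hamming(desc_a, desc_b))
print("unrelated patch:", hamming(desc_a, desc_c))
```

Because each bit depends only on which of two pixels is darker, a uniform brightness offset leaves the descriptor unchanged, while an unrelated patch differs in many bits.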

Detector  Scale-inv.  Rot-inv.  Speed      License
Harris    ✗           ✓         Fast       Free
SIFT      ✓           ✓         Slow       Free (post-2020)
SURF      ✓           ✓         Medium     Patented
ORB       ~           ✓         Very fast  Free

Feature Matching

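Once you have descriptors from two images, find correspondences by comparing descriptors. Common strategies include nearest-neighbor search on descriptor distance and Lowe's ratio test, which rejects a match when the ratio of the best to the second-best distance, d1/d2, exceeds a threshold such as 0.75.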
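As a minimal sketch of descriptor matching, the code below does brute-force nearest-neighbor search with Euclidean distance and applies Lowe's ratio test with the 0.75 threshold mentioned in the quiz. The 2-D "descriptors" are made-up toy values (real SIFT descriptors have 128 dimensions), and the function name match_with_ratio_test is hypothetical.

```python
import math

def match_with_ratio_test(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match is kept only if the best distance d1 is at most ratio * d2,
    where d2 is the second-best distance. Requires len(desc_b) >= 2.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 <= ratio * d2:  # same as d1 / d2 <= ratio, without dividing by zero
            matches.append((i, j1))
    return matches

# Toy descriptors: the third query sits exactly between two candidates.
desc_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
desc_b = [(0.1, 0.0), (5.0, 5.2), (9.0, 0.0), (11.0, 0.0), (20.0, 20.0)]

matches = match_with_ratio_test(desc_a, desc_b)
print(matches)
```

The first two queries each have one clearly closest candidate and are kept. The third is equally close to two candidates (d1/d2 = 1), so the ratio test discards it as ambiguous.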

Quiz

Check your understanding

1. In the Harris response, a pixel with R ≪ 0 (large negative) indicates:

2. What is the main advantage of ORB over SIFT?

3. Lowe's ratio test rejects a match when d1/d2 > 0.75. Why?