What is Computer Vision?
Module 1 - Conceptual Foundation
The Core Question
Computer vision is the field of study that enables machines to interpret and understand visual information from the world - images, videos, and 3D scans. The goal is to replicate and eventually surpass human visual perception using algorithms.
When you look at a photo of a cat, your brain instantly recognizes the animal, its pose, the background, and even its mood. A computer sees only a grid of numbers. Computer vision is the bridge between those numbers and meaningful understanding.
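To make the "grid of numbers" concrete, here is a minimal sketch using NumPy (an assumption; the module does not name a library). The tiny 2×3 image and its pixel values are made up for illustration.

```python
import numpy as np

# A tiny RGB "image": 2 rows (height), 3 columns (width), 3 color channels.
# Values are 8-bit intensities (0 = darkest, 255 = brightest).
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # red, green, blue
        [[0, 0, 0], [128, 128, 128], [255, 255, 255]],  # black, gray, white
    ],
    dtype=np.uint8,
)

height, width, channels = image.shape

# Total count of stored numbers = height * width * channels.
count = image.size

# NumPy indexes as [row, column], i.e. [y, x], so this is the pixel
# in row 0, column 1 (the green one).
green_pixel = image[0, 1]

print(image.shape, count, green_pixel.tolist())
```

Everything a vision algorithm does starts from an array like this one; the "cat" only exists once some computation assigns meaning to the numbers.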
Brief History
- 1966 - MIT Summer Vision Project: Seymour Papert's memo, from the MIT AI group he ran with Marvin Minsky, proposed building a significant part of a visual system as a summer project. The problem is still not fully solved.
- 1970s–80s: Edge detection, optical flow (Lucas-Kanade), Hough transform, stereo vision.
- 1990s–2000s: Statistical methods, SIFT features (1999), face detection (Viola-Jones, 2001).
- 2012 - AlexNet: Deep CNNs trained on GPUs win ImageNet by a massive margin. The deep learning era begins.
- 2017–now: Transformers (ViT), diffusion models, CLIP, SAM - vision meets language.
The Image Processing Pipeline
Most computer vision systems follow a pipeline:
Interactive Pipeline
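As a rough illustration of what such a pipeline can look like, the sketch below chains four stages on a synthetic image. The stage names and functions (`acquire`, `preprocess`, `extract_features`, `interpret`) and the threshold are illustrative assumptions, not a fixed standard; real systems differ in both the number and the nature of their stages.

```python
import numpy as np

def acquire():
    """Stage 1 (assumed): acquisition. Returns a synthetic 8x8 RGB image
    whose left half is black and right half is white."""
    img = np.zeros((8, 8, 3), dtype=np.uint8)
    img[:, 4:] = 255
    return img

def preprocess(rgb):
    """Stage 2 (assumed): average the channels into grayscale in [0, 1]."""
    return rgb.astype(np.float64).mean(axis=2) / 255.0

def extract_features(gray):
    """Stage 3 (assumed): absolute differences between horizontally
    adjacent pixels, a very crude edge feature."""
    return np.abs(np.diff(gray, axis=1))

def interpret(edges, threshold=0.5):
    """Stage 4 (assumed): report column gaps whose mean edge strength
    exceeds the threshold. Index i means 'between columns i and i+1'."""
    strength = edges.mean(axis=0)
    return [int(i) for i in np.flatnonzero(strength > threshold)]

# Run the stages in order: numbers in, a small "understanding" out.
edge_columns = interpret(extract_features(preprocess(acquire())))
print(edge_columns)  # the vertical edge sits between columns 3 and 4
```

Each stage consumes the previous stage's output, which is what makes it practical to swap one stage (say, a learned feature extractor) without rewriting the rest.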
How Digital Cameras Work
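A camera sensor is a 2D grid of photosensitive cells (pixels). Each cell measures only the intensity of the light hitting it, not its color. To recover color, most consumer cameras place a Bayer filter over the sensor: a repeating 2×2 mosaic of R, G, G, B filters, so each cell records a single color. There are twice as many green cells as red or blue because the human eye is most sensitive to green light.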
Bayer Pattern Visualizer
Each colored square is one sensor cell. The camera "demosaics" these into full RGB pixels.
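The mosaic and a very simple form of demosaicing can be sketched as follows. This assumes an RGGB layout (red in the top-left of each 2×2 block) and even image dimensions. The functions `bayer_mosaic_rggb` and `demosaic_half_resolution` are hypothetical helpers written for this example. Real cameras interpolate missing colors at full resolution rather than collapsing each block into one pixel.

```python
import numpy as np

def bayer_mosaic_rggb(rgb):
    """Simulate an RGGB Bayer sensor: keep one color sample per cell."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red:   even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green: even rows, odd columns
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green: odd rows, even columns
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue:  odd rows, odd columns
    return mosaic

def demosaic_half_resolution(mosaic):
    """Crude demosaic: turn each 2x2 RGGB block into one RGB pixel,
    averaging the two green samples. Output is half the size per axis."""
    r = mosaic[0::2, 0::2].astype(np.float64)
    g = (mosaic[0::2, 1::2].astype(np.float64) + mosaic[1::2, 0::2]) / 2.0
    b = mosaic[1::2, 1::2].astype(np.float64)
    return np.stack([r, g, b], axis=-1)

# A uniform 4x4 test image with (R, G, B) = (200, 100, 50) everywhere.
rgb = np.full((4, 4, 3), [200, 100, 50], dtype=np.uint8)
mosaic = bayer_mosaic_rggb(rgb)
restored = demosaic_half_resolution(mosaic)

green_cells = int((mosaic == 100).sum())  # half of the 16 cells are green
print(mosaic)
print(green_cells, restored.shape)
```

On this uniform test image the half-resolution result reproduces the original color exactly; on real images, edges and fine textures are where demosaicing quality matters.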
I(x, y) = ∫ L(x, y, λ) · R(λ) dλ

| Symbol | Meaning |
| --- | --- |
| I(x, y) | pixel intensity at column x, row y - a single number (0 = black, 255 = white) |
| L(x, y, λ) | radiance - how much light of wavelength λ arrives at pixel position (x, y) |
| λ (lambda) | wavelength of light, e.g. 450 nm = blue, 550 nm = green, 650 nm = red |
| R(λ) | sensor spectral response - how sensitive the sensor is to each wavelength |
| · (dot) | multiplication - light intensity times sensor sensitivity at that wavelength |
| ∫ … dλ | integration over all visible wavelengths - summing contributions from every color of light |
| dλ | infinitesimally small wavelength step (the "slice" being summed in the integral) |
A pixel value is the integral of the light spectrum L weighted by the sensor's response curve R.
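The weighted integral can be approximated numerically by sampling the spectrum and summing. The sketch below is illustrative only: the Gaussian response curves, their centers (450, 550, 650 nm) and widths, and the narrow-band green light source are all assumptions chosen for the example, not measured sensor data.

```python
import numpy as np

# Sample the visible range in 1 nm steps (assumed range: 400-700 nm).
d_lambda = 1.0
wavelengths = np.arange(400.0, 701.0, d_lambda)

def gaussian(lam, center, width):
    """Bell-shaped curve used as a stand-in for real spectral curves."""
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

# Hypothetical sensor responses R(lambda) for the three filter colors.
responses = {
    "blue": gaussian(wavelengths, 450.0, 30.0),
    "green": gaussian(wavelengths, 550.0, 30.0),
    "red": gaussian(wavelengths, 650.0, 30.0),
}

# Hypothetical radiance L(lambda) at one pixel: narrow-band green light.
radiance = gaussian(wavelengths, 550.0, 10.0)

def pixel_response(radiance, response, d_lambda):
    """Riemann-sum approximation of the integral of L * R over lambda."""
    return float(np.sum(radiance * response) * d_lambda)

values = {name: pixel_response(radiance, r, d_lambda)
          for name, r in responses.items()}
print(values)  # the green-filtered cell responds most strongly
```

The result is in arbitrary units; a real camera would additionally apply exposure, gain, and quantization to map it into the 0 to 255 range.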
Real-World Applications
- Pedestrian & lane detection
- Tumor detection, X-rays
- 3D face recognition
- Cashier-less checkout
- Crop health, drone scouting
- Pose estimation, depth sensing
Quiz
Check your understanding
1. A 256×256 RGB image contains how many numbers?
2. Why does the Bayer filter use twice as many green cells as red or blue?
3. What was significant about AlexNet in 2012?