What is Computer Vision?

Module 1 - Conceptual Foundation

The Core Question

Computer vision is the field of study that enables machines to interpret and understand visual information from the world - images, videos, and 3D scans. The goal is to replicate and eventually surpass human visual perception using algorithms.

When you look at a photo of a cat, your brain instantly recognizes the animal, its pose, the background, and even its mood. A computer sees only a grid of numbers. Computer vision is the bridge between those numbers and meaningful understanding.

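Key insight: An image is just a matrix of numbers. A 640×480 grayscale image (640 pixels wide, 480 pixels tall) is a matrix of 480 rows by 640 columns, where each entry is an integer from 0 to 255 for a standard 8-bit image. Color images add a third dimension (3 channels: R, G, B).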
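To make the "matrix of numbers" idea concrete, here is a minimal sketch using plain Python lists. The tiny 3×4 image and the helper name count_values are made up for illustration; real code would usually rely on an array library rather than nested lists.

```python
# A tiny grayscale image: 3 rows (height) x 4 columns (width), values 0-255.
gray = [
    [0, 64, 128, 255],
    [10, 80, 160, 240],
    [20, 96, 192, 224],
]

height, width = len(gray), len(gray[0])

# Pixel access is row first, then column: gray[y][x].
top_right = gray[0][3]

# A color image adds a third axis: each pixel holds an (R, G, B) triple.
# Here every channel repeats the gray value, so the picture stays gray.
color = [[(v, v, v) for v in row] for row in gray]
channels = len(color[0][0])


def count_values(h, w, c=1):
    """Total numbers stored in an h x w image with c channels (hypothetical helper)."""
    return h * w * c


print(height, width, top_right)          # 3 4 255
print(count_values(480, 640))            # 307200
print(count_values(480, 640, channels))  # 921600
```

Note the index order: the row (y) comes first and the column (x) second, which is the opposite of the "width × height" order used when quoting image sizes.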

Brief History

The Image Processing Pipeline

Most computer vision systems follow a pipeline:

📷 Capture → 🔧 Pre-process → 🔍 Features → 🧠 Model → ✅ Output
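The five stages can be sketched as a chain of small functions. Everything here is a toy stand-in under stated assumptions: capture returns a hard-coded image instead of reading a camera, the only feature is mean brightness, and the "model" is a fixed threshold rather than anything trained. All function names are hypothetical.

```python
def capture():
    # Stand-in for a camera read: a 2 x 3 grayscale image with values 0-255.
    return [[200, 220, 35], [180, 90, 250]]


def preprocess(image):
    # Scale 0-255 integers to floats in [0, 1].
    return [[v / 255 for v in row] for row in image]


def extract_features(image):
    # One toy feature: the mean brightness of all pixels.
    pixels = [v for row in image for v in row]
    return {"mean_brightness": sum(pixels) / len(pixels)}


def model(features, threshold=0.5):
    # Toy decision rule, not a trained model.
    return "bright" if features["mean_brightness"] > threshold else "dark"


def run_pipeline():
    # Capture -> Pre-process -> Features -> Model -> Output
    image = capture()
    features = extract_features(preprocess(image))
    return model(features)


print(run_pipeline())
```

In a real system each stage is far more involved, but the shape stays the same: every stage consumes the previous stage's output, so the stages can be developed and tested separately.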

How Digital Cameras Work

A camera sensor is a 2D grid of photosensitive cells (pixels). Each cell measures the intensity of light hitting it. Most consumer cameras use a Bayer filter - a mosaic of R, G, G, B filters (twice as many green, matching human eye sensitivity).

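In a Bayer mosaic, each sensor cell sits behind a single color filter and records only that one color. The camera "demosaics" these readings into full RGB pixels by estimating the two missing colors at each cell from its neighbors.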
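The mosaic layout can be sketched in a few lines. This assumes an RGGB arrangement (red in the top-left of each 2×2 tile); actual sensors may start the tile at a different corner. The function names are hypothetical, and the demosaicing step itself is not shown.

```python
def bayer_color(row, col):
    """Filter color at a sensor cell, assuming an RGGB tile layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def bayer_pattern(height, width):
    """Grid of filter letters for a height x width sensor."""
    return [[bayer_color(r, c) for c in range(width)] for r in range(height)]


def mosaic(rgb_image):
    """Simulate the raw sensor: keep only the channel each cell's filter passes."""
    index = {"R": 0, "G": 1, "B": 2}
    return [
        [pixel[index[bayer_color(r, c)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb_image)
    ]


pattern = bayer_pattern(4, 4)
for row in pattern:
    print(" ".join(row))

counts = {k: sum(row.count(k) for row in pattern) for k in "RGB"}
print(counts)  # {'R': 4, 'G': 8, 'B': 4}
```

The counts confirm the 1:2:1 ratio: half of all cells are green, a quarter red, and a quarter blue.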

I(x, y) = ∫ L(x, y, λ) · R(λ) dλ

Symbol guide
I(x, y): pixel intensity at column x, row y - a single number (after quantization to 8 bits, 0 = black and 255 = white)
L(x, y, λ): radiance - how much light of wavelength λ arrives at pixel position (x, y)
λ (lambda): wavelength of light, e.g. 450 nm = blue, 550 nm = green, 650 nm = red
R(λ): sensor spectral response - how sensitive the sensor is to each wavelength
· (dot): multiplication - light intensity times sensor sensitivity at that wavelength
∫ … dλ: integration over all visible wavelengths - summing contributions from every color of light
dλ: infinitesimally small wavelength step (the "slice" being summed in the integral)

A pixel value is the integral of the light spectrum L weighted by the sensor's response curve R.
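The integral can be approximated numerically by summing L(λ) · R(λ) over small wavelength steps. The sketch below is illustrative only: the Gaussian-shaped response and light spectra, the 400-700 nm range, and all function names are assumptions, not measured sensor data. The result is an unscaled signal; quantization to 0-255 is not shown.

```python
import math


def gaussian(wavelength_nm, center_nm, width_nm):
    """Bell-shaped curve used as a stand-in for a spectrum or response."""
    return math.exp(-0.5 * ((wavelength_nm - center_nm) / width_nm) ** 2)


def pixel_response(radiance, response, start_nm=400, stop_nm=700, step_nm=1):
    """Riemann sum approximating the integral of L(λ) · R(λ) dλ."""
    total = 0.0
    for wavelength in range(start_nm, stop_nm, step_nm):
        total += radiance(wavelength) * response(wavelength) * step_nm
    return total


# Hypothetical green-filtered sensor cell: response peaks at 550 nm.
def green_response(wl):
    return gaussian(wl, 550, 30)


# Two hypothetical light sources with the same peak strength.
def green_light(wl):
    return gaussian(wl, 550, 20)


def red_light(wl):
    return gaussian(wl, 650, 20)


green_signal = pixel_response(green_light, green_response)
red_signal = pixel_response(red_light, green_response)
print(green_signal > red_signal)  # True
```

The same cell produces a much stronger signal for green light than for red light of equal strength, because the red spectrum barely overlaps the cell's response curve.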

Real-World Applications

🚗 Self-driving cars - Pedestrian & lane detection
🏥 Medical imaging - Tumor detection, X-rays
📱 Face ID - 3D face recognition
🛒 Retail - Cashier-less checkout
🌾 Agriculture - Crop health, drone scouting
🎮 AR/VR - Pose estimation, depth sensing

Quiz

Check your understanding

1. A 256×256 RGB image contains how many numbers?

2. Why does the Bayer filter use twice as many green cells as red or blue?

3. What was significant about AlexNet in 2012?