Image Segmentation

Module 6 - Thresholding, K-Means, Semantic & Instance Segmentation

What is Segmentation?

Segmentation partitions an image into meaningful regions. Instead of drawing bounding boxes around objects, we assign a label to every pixel. This answers "which pixels belong to the sky?" rather than just "is there sky in this image?"

Semantic
All pixels of a class get the same label. "All cars = red"
Instance
Each object instance is separate. "Car 1 = red, Car 2 = blue"
Panoptic
Semantic + Instance combined: every pixel gets a class label, and each countable object also gets its own instance ID. Full scene understanding.
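This minimal sketch makes the semantic/instance distinction concrete. It assumes label maps are stored as plain Python lists of rows, and it uses hypothetical IDs (0 = background, 1 = car). If you collapse each instance ID to its class, an instance map becomes a semantic map, and the two cars can no longer be told apart.

```python
BACKGROUND, CAR = 0, 1  # hypothetical class IDs

# Hypothetical instance map: 0 = background, IDs 1 and 2 are two different cars.
instance_map = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]
# Which class each instance ID belongs to.
instance_class = {0: BACKGROUND, 1: CAR, 2: CAR}

def to_semantic(inst_map, lookup):
    """Replace every instance ID by its class ID (instance -> semantic)."""
    return [[lookup[i] for i in row] for row in inst_map]

def count_objects(label_map, background=0):
    """Count distinct non-background labels in a map."""
    return len({v for row in label_map for v in row if v != background})

semantic_map = to_semantic(instance_map, instance_class)
print(semantic_map[0])              # [0, 1, 1, 0, 1, 1]
print(count_objects(instance_map))  # 2: the two cars are told apart
print(count_objects(semantic_map))  # 1: both cars share the label CAR
```

A panoptic label could then be stored per pixel as a (class, instance) pair, for example (CAR, 2).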

Thresholding

The simplest segmentation: pixels at or below threshold T → background (0), above T → foreground (255).

seg(x,y) = 255 if I(x,y) > T, else 0
Symbol guide
seg(x, y): segmentation output at pixel (x, y), either 255 (foreground/white) or 0 (background/black)
I(x, y): grayscale intensity of the input pixel at column x, row y, a value from 0 to 255
T: threshold, the decision boundary; pixels brighter than T are labeled foreground
255 if … else 0: binary output; assigns white (255) to foreground pixels and black (0) to background pixels
> T: comparison; if the pixel is strictly brighter than the threshold, classify it as foreground
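Here is a minimal pure-Python sketch of the rule above. It assumes the image is a list of rows indexed as image[y][x], with integer values from 0 to 255. The `threshold` name and the sample pixels are illustrative only; they are not taken from any particular library.

```python
def threshold(image, T):
    """Binary thresholding: 255 where I(x, y) > T, else 0."""
    return [[255 if pixel > T else 0 for pixel in row] for row in image]

# Tiny hypothetical 3x4 grayscale image, indexed image[y][x].
image = [
    [10,  40, 200, 220],
    [30, 128, 129, 250],
    [ 0,  90, 180,  60],
]

mask = threshold(image, T=128)
for row in mask:
    print(row)
# [0, 0, 255, 255]
# [0, 0, 255, 255]
# [0, 0, 255, 0]
```

The pixel exactly equal to T (128) lands in the background because the comparison is strict.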

Variants:
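One widely used variant picks T automatically. Otsu's method tries every candidate T and keeps the one that maximizes the between-class variance w₀w₁(μ₀ − μ₁)² of the two resulting pixel groups. Here w is each group's share of the pixels and μ its mean intensity. The sketch below is a minimal pure-Python version, assuming 8-bit integer pixels and the same split as the rule above (class 0 is pixels ≤ T, class 1 is pixels > T). The function name is illustrative, and breaking ties toward the smallest T is a choice made here.

```python
def otsu_threshold(image):
    """Return the T that maximizes w0 * w1 * (mu0 - mu1)**2 over all splits.
    Class 0 = pixels <= T, class 1 = pixels > T. Assumes ints in 0..255."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_T, best_var = 0, -1.0
    count0, sum0 = 0, 0
    for T in range(256):
        count0 += hist[T]          # running pixel count of class 0
        sum0 += T * hist[T]        # running intensity sum of class 0
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue               # one class is empty: not a real split
        mu0 = sum0 / count0
        mu1 = (total_sum - sum0) / count1
        w0, w1 = count0 / total, count1 / total
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict: ties keep the smallest T
            best_T, best_var = T, var_between
    return best_T

img = [[20, 25, 30], [200, 210, 220]]  # hypothetical dark and bright pixels
T = otsu_threshold(img)
mask = [[255 if p > T else 0 for p in row] for row in img]
print(T, mask)  # 30 [[0, 0, 0], [255, 255, 255]]
```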

K-Means Color Clustering

K-Means groups pixels by color similarity. Each pixel is assigned to the nearest of K cluster centers; centers are recomputed as the mean of their members. Repeat until convergence.

The result is a color-quantized image: each pixel is replaced by its cluster's center color, so similar colors merge and the dominant color regions stand out. This is a form of segmentation.
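The assign-then-recompute loop described above is usually called Lloyd's algorithm. The minimal pure-Python sketch below assumes pixels arrive as a flat list of (R, G, B) tuples. Instead of the usual random initialization, it uses a deterministic farthest-point seeding: start from the first pixel, then repeatedly add the pixel farthest from every chosen center. That seeding is an assumption made here so the result is reproducible. The function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two color vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def init_farthest(pixels, k):
    """Deterministic seeding (an assumption, not the only option): first pixel,
    then repeatedly the pixel farthest from all centers chosen so far."""
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [tuple(float(v) for v in c) for c in centers]

def kmeans_quantize(pixels, k, max_iters=50):
    """Cluster pixel colors with Lloyd's algorithm; return (quantized, labels)."""
    centroids = init_farthest(pixels, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each pixel joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in pixels]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean color of its members.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    # Replace every pixel by its cluster's centroid color (rounded to ints).
    quantized = [tuple(round(v) for v in centroids[lab]) for lab in labels]
    return quantized, labels

pixels = [(200, 30, 30), (210, 20, 40), (190, 35, 25),   # reddish
          (20, 40, 210), (30, 50, 200), (25, 45, 220)]   # bluish
quantized, labels = kmeans_quantize(pixels, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(quantized)  # [(200, 28, 32), (200, 28, 32), (200, 28, 32), (25, 45, 210), (25, 45, 210), (25, 45, 210)]
```

On a real image you would flatten the rows into one pixel list, run the clustering, and then reshape the quantized list back to the image's height and width.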

minimize Σₖ Σᵢ∈Cₖ ‖xᵢ − μₖ‖²
Symbol guide
minimize: the algorithm's goal; find cluster assignments and centers that make this total as small as possible (the iterative procedure reaches a local minimum, not necessarily the global one)
Σₖ: outer sum over all K clusters (e.g. k = 1, 2, 3 for K = 3 colors)
Σᵢ∈Cₖ: inner sum over all pixels i that belong to cluster k
xᵢ: the color vector of pixel i, e.g. (R, G, B) = (200, 130, 50)
μₖ: centroid of cluster k, the average color of all pixels currently assigned to that cluster
xᵢ − μₖ: difference vector; how far pixel i's color is from its cluster's average color
‖…‖²: squared Euclidean distance, the sum of squared differences across the R, G, B channels
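To see what the objective measures, here is a small worked example with three hypothetical pixels and two candidate groupings. In each grouping, every centroid is the mean of its members. The helper name `kmeans_objective` is illustrative. Grouping the two reddish pixels together gives a far smaller total than pairing a red pixel with the blue one.

```python
def kmeans_objective(pixels, labels, centroids):
    """Sum over all pixels of ||x_i - mu_k||^2, where k is the pixel's cluster."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(x, centroids[k]))
        for x, k in zip(pixels, labels)
    )

pixels = [(200, 30, 30), (210, 20, 40), (20, 40, 210)]

# Grouping A: both reddish pixels in cluster 0, the blue pixel alone in cluster 1.
good = kmeans_objective(pixels, [0, 0, 1], [(205.0, 25.0, 35.0), (20.0, 40.0, 210.0)])

# Grouping B: one red pixel alone, the other red pixel paired with the blue one.
bad = kmeans_objective(pixels, [0, 1, 1], [(200.0, 30.0, 30.0), (115.0, 30.0, 125.0)])

print(good, bad)  # 150.0 32700.0
```

In grouping A, each red pixel is 5 away from its centroid in every channel, contributing 3 × 5² = 75. In grouping B, the mixed cluster's centroid sits between red and blue, so each member contributes 95² + 10² + 85² = 16350.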

Modern Segmentation Methods

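Classical methods such as thresholding and K-Means group pixels by intensity or color alone, with no notion of object shape or context. As a result, they struggle with complex scenes, for example objects whose colors match the background. Modern approaches use deep networks trained on labeled images.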

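Key insight: U-Net's skip connections pass fine-grained spatial information from each encoder stage directly to the decoder stage at the same resolution. This restores detail that the encoder's repeated downsampling discards. Without skip connections, the decoder would have to rebuild a full-resolution mask from coarse features alone.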
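The following toy illustration in pure Python is not a real U-Net. Here, 2×2 max pooling and nearest-neighbour upsampling stand in for the learned encoder and decoder layers; U-Net itself uses convolutions and learned up-convolutions. Pooling a map that contains a one-pixel-wide edge and then upsampling it back smears the edge to two pixels wide. Concatenating the encoder's original map as an extra channel, which is what a skip connection does, hands the decoder the exact edge position again. All names and values are hypothetical.

```python
def max_pool2(fmap):
    """2x2 max pooling with stride 2: halves height and width."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]), 2)]
        for y in range(0, len(fmap), 2)
    ]

def upsample2(fmap):
    """Nearest-neighbour 2x upsampling: doubles height and width."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def concat_channels(*maps):
    """Stack same-sized single-channel maps into a list of channels."""
    h, w = len(maps[0]), len(maps[0][0])
    assert all(len(m) == h and len(m[0]) == w for m in maps), "sizes must match"
    return list(maps)

# Hypothetical 4x4 encoder feature map with a thin edge in column 1.
enc = [[0, 9, 0, 0] for _ in range(4)]

bottleneck = max_pool2(enc)           # [[9, 0], [9, 0]]: position known only to 2 px
decoded = upsample2(bottleneck)       # 4x4 again, but the edge now covers columns 0-1
skip = concat_channels(decoded, enc)  # decoder input: coarse context + exact edge
print(decoded[0], skip[1][0])         # [9, 9, 0, 0] [0, 9, 0, 0]
```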

Quiz

Check your understanding

1. Otsu's thresholding automatically finds T by:

2. You have a photo with a bright lamp in one corner making the background uneven. Which thresholding approach works best?

3. What distinguishes instance segmentation from semantic segmentation?