Image Segmentation
Module 6 - Thresholding, K-Means, Semantic & Instance Segmentation
What is Segmentation?
Segmentation partitions an image into meaningful regions. Instead of bounding boxes, we assign a label to every pixel. This answers "which pixels belong to the sky?" not just "is there sky in this image?".
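Three common formulations:
- Semantic segmentation: All pixels of a class get the same label. "All cars = red"
- Instance segmentation: Each object instance gets its own label. "Car 1 = red, Car 2 = blue"
- Panoptic segmentation: Semantic + instance combined. Every pixel gets a class label and, for countable objects, an instance ID, giving full scene understanding.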
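To make the difference between a semantic map and an instance map concrete, here is a toy sketch in plain Python. It assumes a tiny hand-made mask where 1 means "car", and it separates instances by 4-connected component labeling. That is only a stand-in for illustration: real instance segmentation models do not work this way, and touching objects would be merged by this approach. The helper name `label_instances` is hypothetical.

```python
from collections import deque

def label_instances(mask):
    """Give each 4-connected foreground blob its own id (1, 2, ...); 0 stays background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0 or labels[r][c] != 0:
                continue  # background, or already part of a labeled blob
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels, next_id

# Semantic map: every "car" pixel carries the same label (1).
semantic = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Instance map: the two separate cars receive different ids.
instance, num_instances = label_instances(semantic)
for row in instance:
    print(row)
print("instances found:", num_instances)
```

In the semantic map both cars share label 1; in the instance map the left car is 1 and the right car is 2.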
Thresholding
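The simplest segmentation: compare each pixel's intensity to a threshold T. Pixels strictly brighter than T become foreground (255); all others become background (0).
seg(x, y) = 255 if I(x, y) > T else 0
| Term | Meaning |
|---|---|
| seg(x, y) | segmentation output at pixel (x, y) - either 255 (foreground/white) or 0 (background/black) |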
| I(x, y) | grayscale intensity of the input pixel at column x, row y - a value from 0 to 255 |
| T | threshold - the decision boundary; pixels brighter than T are labeled foreground |
| 255 if … else 0 | binary output - assigns white (255) to foreground pixels and black (0) to background pixels |
| > T | comparison: if the pixel is strictly brighter than the threshold, classify it as foreground |
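The rule in the table above can be sketched in a few lines. This minimal example assumes the grayscale image is a list of rows of integers in 0..255; plain lists are used only to keep the sketch dependency-free, and the function name `threshold` is illustrative.

```python
def threshold(image, T):
    """Binary mask: 255 where I(x, y) > T (strictly brighter), otherwise 0."""
    return [[255 if pixel > T else 0 for pixel in row] for row in image]

gray = [
    [12, 40, 200],
    [90, 128, 129],
    [255, 0, 128],
]

mask = threshold(gray, T=128)
for row in mask:
    print(row)
```

Note that the pixels equal to 128 end up as background, because the comparison is strict.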
Variants:
- Global threshold: One T for the whole image. Works only with uniform lighting.
- Adaptive threshold: T computed locally per region. Better for uneven lighting.
- Otsu's method: Automatically picks T by minimizing the within-class intensity variance of the two resulting groups (equivalently, maximizing the between-class variance). Works best when the histogram is bimodal.
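The last two variants can be sketched as follows, again in plain Python on lists of integers in 0..255. The Otsu sketch searches all 256 candidate thresholds and keeps the one with the largest between-class variance, which is equivalent to the smallest within-class variance. The adaptive sketch compares each pixel to the mean of its local window; its `radius` and `offset` parameters are assumptions made for this illustration, not a fixed standard.

```python
def otsu_threshold(image):
    """Return the T in 0..255 that maximizes between-class variance (background = pixels <= T)."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_T, best_score = 0, -1.0
    w_bg, sum_bg = 0, 0
    for T in range(256):
        w_bg += hist[T]           # number of pixels with intensity <= T
        sum_bg += T * hist[T]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue              # one class is empty, so there is no split to score
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_score, best_T = score, T
    return best_T

def adaptive_threshold(image, radius=1, offset=0):
    """Foreground where a pixel exceeds its local window mean by more than `offset`."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[ny][nx]
                      for ny in range(max(0, y - radius), min(h, y + radius + 1))
                      for nx in range(max(0, x - radius), min(w, x + radius + 1))]
            local_T = sum(window) / len(window) + offset
            out[y][x] = 255 if image[y][x] > local_T else 0
    return out

# Bimodal image: a dark group and a bright group.
bimodal = [[10, 12, 14], [200, 210, 220]]
T = otsu_threshold(bimodal)
print("Otsu T:", T)

# Uneven lighting: brightness ramps up left to right, with two bright marks (60 and 110).
ramp = [[10, 20, 60, 40, 50, 60, 110, 80, 90]]
print(adaptive_threshold(ramp, radius=1, offset=10))
```

On the ramp, no single global T isolates both marks: any T low enough to catch the first mark (60) also lets through the bright end of the ramp. The local comparison picks out both.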
K-Means Color Clustering
K-Means groups pixels by color similarity. Each pixel is assigned to the nearest of K cluster centers; centers are recomputed as the mean of their members. Repeat until convergence.
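The result: a color-quantized image where similar colors are merged, revealing dominant color regions - a form of segmentation.
Formally, K-Means looks for the assignment of pixels to clusters that minimizes the total squared distance between each pixel's color and its cluster centroid:
minimize Σₖ Σᵢ∈Cₖ ‖xᵢ − μₖ‖²
| Term | Meaning |
|---|---|
| minimize | the algorithm's goal - find cluster assignments that make this total as small as possible |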
| Σₖ | outer sum over all K clusters (e.g. k = 1, 2, 3 for K=3 colors) |
| Σᵢ∈Cₖ | inner sum over all pixels i that belong to cluster k |
| xᵢ | the color vector of pixel i, e.g. (R, G, B) = (200, 130, 50) |
| μₖ | centroid of cluster k - the average color of all pixels currently assigned to that cluster |
| xᵢ − μₖ | difference vector - how far pixel i's color is from its cluster's average color |
| ‖…‖² | squared Euclidean distance - the sum of squared differences across R, G, B channels |
Modern Segmentation Methods
Classical methods struggle with complex scenes. Modern approaches use deep learning:
- FCN (2015): Fully Convolutional Network - first end-to-end semantic segmentation with CNNs.
- U-Net (2015): Encoder-decoder with skip connections. Excellent for medical imaging.
- Mask R-CNN (2017): Extends Faster R-CNN with a mask branch. Instance segmentation.
- SAM (2023): Segment Anything Model by Meta. Promptable segmentation at any granularity.
Quiz
Check your understanding
1. Otsu's thresholding automatically finds T by:
2. You have a photo with a bright lamp in one corner making the background uneven. Which thresholding approach works best?
3. What distinguishes instance segmentation from semantic segmentation?