Computer Vision with Marco

Image Segmentation

What is Segmentation?

Segmentation partitions an image into meaningful regions. Instead of drawing bounding boxes around objects, it assigns a label to every pixel. It answers "which pixels belong to the sky?", not just "is there sky in this image?"

Semantic
All pixels of a class get the same label. "All cars = red"

Instance
Each object instance is separate. "Car 1 = red, Car 2 = blue"

Panoptic
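Semantic and instance combined: every pixel gets a class label, and pixels that belong to countable objects also get an instance ID. Full scene understanding.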
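
The three label types can be compared on a toy scene. The sketch below is illustrative only: the 3x6 grid, the class ids (ROAD = 0, CAR = 1), and the convention that instance id 0 means "no countable object" are assumptions made for this example, not part of any standard format.

```python
# Toy 3x6 scene: two cars on a road. Class ids and layout are illustrative.
ROAD, CAR = 0, 1

# Semantic map: one class id per pixel; both cars share the label CAR.
semantic = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]

# Instance map: one id per object; 0 means "no countable object" here.
instance = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
]

# Panoptic map: pair the two, so every pixel carries (class id, instance id).
panoptic = [
    [(c, i) for c, i in zip(sem_row, inst_row)]
    for sem_row, inst_row in zip(semantic, instance)
]

print(panoptic[1][0])  # left car pixel  -> (1, 1)
print(panoptic[1][4])  # right car pixel -> (1, 2)
print(panoptic[0][0])  # road pixel      -> (0, 0)
```

Both car pixels have the same class id but different instance ids, which is exactly the information the semantic map alone cannot express.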

Thresholding

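The simplest segmentation method: pixels with intensity at or below a threshold T become background (0), and pixels strictly above it become foreground (255).

seg(x, y) = 255 if I(x, y) > T else 0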

Symbol guide

seg(x, y)	segmentation output at pixel (x, y) - either 255 (foreground/white) or 0 (background/black)
I(x, y)	grayscale intensity of the input pixel at column x, row y - a value from 0 to 255
T	threshold - the decision boundary; pixels brighter than T are labeled foreground
255 if … else 0	binary output - assigns white (255) to foreground pixels and black (0) to background pixels
> T	comparison: if the pixel is strictly brighter than the threshold, classify it as foreground
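
The rule in the symbol guide can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the image is stored as a list of rows of grayscale values (so a pixel is read as image[y][x]); the function name `threshold` and the sample image are hypothetical, and real code would more likely use an array library.

```python
def threshold(image, T):
    """Return a binary mask: 255 where I(x, y) > T, else 0.

    `image` is assumed to be a list of rows of grayscale values in 0..255.
    """
    return [[255 if pixel > T else 0 for pixel in row] for row in image]


# Hypothetical 3x4 grayscale image: a bright object on a dark background.
image = [
    [12, 40, 200, 210],
    [30, 128, 220, 129],
    [10, 20, 35, 250],
]

mask = threshold(image, T=128)
for row in mask:
    print(row)
# [0, 0, 255, 255]
# [0, 0, 255, 255]
# [0, 0, 0, 255]
```

Note that the pixel with value 128 is labeled background: the comparison is strict, so only pixels brighter than T count as foreground.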

K-Means Color Clustering

K-Means groups pixels by color similarity. Each pixel is assigned to the nearest of K cluster centers, then each center is recomputed as the mean color of its members. The two steps repeat until the assignments stop changing.

The result is a color-quantized image in which similar colors are merged, revealing dominant color regions. This is a form of segmentation.

minimize Σₖ Σᵢ∈Cₖ ‖xᵢ − μₖ‖²

Symbol guide

minimize	the algorithm's goal - find cluster assignments that make this total as small as possible
Σₖ	outer sum over all K clusters (e.g. k = 1, 2, 3 for K=3 colors)
Σᵢ∈Cₖ	inner sum over all pixels i that belong to cluster k
xᵢ	the color vector of pixel i, e.g. (R, G, B) = (200, 130, 50)
μₖ	centroid of cluster k - the average color of all pixels currently assigned to that cluster
xᵢ − μₖ	difference vector - how far pixel i's color is from its cluster's average color
‖…‖²	squared Euclidean distance - the sum of squared differences across R, G, B channels
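
The assign-and-update loop and the objective from the symbol guide can be sketched as follows. This is a small plain-Python illustration, not a production implementation: the function names `sq_dist` and `kmeans_colors`, the five sample pixels, and the hand-picked initial centers are all assumptions for the example. Real implementations usually choose the initial centers randomly or with a seeding scheme.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two color tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans_colors(pixels, centers, max_iters=20):
    """Cluster `pixels` (a list of (R, G, B)) starting from initial `centers`.

    Returns (centers, labels, cost), where cost is the objective:
    the sum over clusters k and members i of ||x_i - mu_k||^2.
    """
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each pixel joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in pixels
        ]
        if new_labels == labels:  # assignments unchanged, so we have converged
            break
        labels = new_labels
        # Update step: each center becomes the mean color of its members.
        for k in range(len(centers)):
            members = [p for p, l in zip(pixels, labels) if l == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    cost = sum(sq_dist(p, centers[l]) for p, l in zip(pixels, labels))
    return centers, labels, cost


# Hypothetical pixels: two orange-ish colors and three blue-ish colors.
pixels = [(200, 130, 50), (210, 120, 60), (20, 30, 200), (30, 20, 210), (10, 40, 190)]
centers, labels, cost = kmeans_colors(pixels, centers=[pixels[0], pixels[2]])

# Quantized image: every pixel is replaced by its cluster's mean color.
quantized = [centers[l] for l in labels]

print(labels)  # [0, 0, 1, 1, 1]
print(cost)    # 750.0
```

With K = 2 the five input colors collapse to two, (205, 125, 55) and (20, 30, 200), which is the color quantization effect described above.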

Modern Segmentation Methods

Classical methods struggle with complex scenes. Modern approaches use deep learning.

Key insight: U-Net's skip connections pass fine-grained spatial information from the encoder directly to the decoder, recovering detail that is lost when an encoder-decoder network downsamples the image.
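
The resolution loss that skip connections address can be seen on a single row of pixels. The sketch below is a toy illustration and not a U-Net: the helper names `downsample` and `upsample` are hypothetical, fixed averaging stands in for the learned convolution and pooling layers, and the "skip connection" is modeled simply as pairing each decoder value with the encoder value at the same position, loosely mirroring how U-Net concatenates feature maps along the channel axis.

```python
def downsample(signal):
    """Average each pair of neighbors (2x pooling); halves the resolution."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Nearest-neighbor 2x upsampling; restores the length, not the detail."""
    return [v for v in signal for _ in range(2)]


# One row of an image with a sharp edge between pixel 2 and pixel 3.
row = [0, 0, 0, 8, 8, 8, 8, 8]

encoded = downsample(downsample(row))  # encoder: 8 -> 4 -> 2 values
decoded = upsample(upsample(encoded))  # decoder without a skip connection

# Skip connection: give the decoder the full-resolution encoder features
# alongside the upsampled ones, so the exact edge position is available again.
with_skip = list(zip(decoded, row))

print(decoded)       # [2.0, 2.0, 2.0, 2.0, 8.0, 8.0, 8.0, 8.0]
print(with_skip[3])  # (2.0, 8)
```

Without the skip path, the decoded row places the edge one pixel too far to the right and blurs its contrast. With it, position 3 carries both the coarse value and the original fine-grained value, which a learned decoder could use to put the boundary back in the right place.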
