Task: Assign a discrete label from a fixed set of categories to an input image.

Why not hard-code?: No explicit algorithm exists to programmatically recognize objects (e.g., "cat") from raw pixels. We must use a data-driven approach:

Collect labeled dataset
Train a classifier using ML
Evaluate on unseen test data

Nearest Neighbor & K-Nearest Neighbor (KNN)

How it works

Memorize all training images & labels.
For a test image, compute distance to all training samples.
Predict label of the most similar example (K=1) or majority vote among K closest (K>1).

Distance Metrics

L1 (Manhattan): $d_1(x,y) = \sum_i |x_i - y_i|$
L2 (Euclidean): $d_2(x,y) = \sqrt{\sum_i (x_i - y_i)^2}$
Note: L1 and L2 can yield identical distances for geometrically different points in pixel space.

Complexity & Limitations

Training: $O(1)$ (just store data)
Prediction: $O(N)$ where N = training set size
Critical flaw: Pixel-wise distance is not semantically meaningful. A 1-pixel shift, occlusion, or color tint can make vastly different images appear equally "close" in pixel space. Thus, KNN on raw pixels is rarely used in practice.

Hyperparameters & Tuning Strategy

Hyperparameters: Algorithm-level choices not learned from data (e.g., K, distance metric).
Bad Practices:
- Tune on training data → Overfits (K=1 always gives 0% train error)
- Tune on test data → Leaks evaluation info; no idea of real generalization
Correct Practice:
- Split: Train → fit model | Validation → tune hyperparameters | Test → final evaluation (use only once!)
- For small datasets: Cross-Validation (e.g., 5-fold) to average performance and reduce variance.

Linear Classifiers

Instead of memorizing data, learn a fixed set of parameters:

$$ f(x, W) = Wx + b $$

$x$: Flattened image vector (e.g., $3072 \times 1$ for CIFAR-10)
$W$: Weight matrix ($C \times D$, $C$=classes, $D$=features)
$b$: Bias vector ($C \times 1$)
Output: Raw class scores (logits)

Interpretations

Visual: Each row of $W$ acts as a template. High score ≈ high template match.
Geometric: Defines linear decision boundaries (hyperplanes) in feature space.
Limitation: Can only model linearly separable data. Fails on complex patterns (e.g., concentric circles, XOR-like distributions).

Softmax Classifier & Loss Function

Softmax Function

Converts raw scores to probabilities:

$$ P(y=j \mid x) = \frac{e^{s_j}}{\sum_{k=1}^{C} e^{s_k}} $$

Ensures $P \geq 0$ and $\sum P = 1$

Cross-Entropy Loss (Negative Log-Likelihood)

$$ L_i = -\log P(y_i \mid x) = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right) $$

Goal: Minimize loss ⇔ Maximize probability of the correct class.
Loss Bounds:
- Minimum: $0$ (perfect confidence in correct class)
- Maximum: $\infty$ (zero probability for correct class)
- At random initialization: All scores ≈ equal → $L_i \approx \log(C)$(e.g., $\log(10) \approx 2.3$ for CIFAR-10)

Key Takeaways

KNN is intuitive but impractical for raw image pixels due to $O(N)$ inference and poor semantic distance metrics.
Linear classifiers introduce parameterization, enabling fast inference and learnable decision boundaries.
Softmax + Cross-Entropy provides a probabilistic, differentiable framework for multi-class classification.
Proper data splitting (train/val/test) and hyperparameter tuning are foundational to reliable ML workflows.

Share this post

Back to home

Image Classification with Linear Classifiers

Nearest Neighbor & K-Nearest Neighbor (KNN)

How it works

Distance Metrics

Complexity & Limitations

Hyperparameters & Tuning Strategy

Linear Classifiers

Interpretations

Softmax Classifier & Loss Function

Softmax Function

Cross-Entropy Loss (Negative Log-Likelihood)

Key Takeaways

Comments

Nearest Neighbor & K-Nearest Neighbor (KNN)

How it works

Distance Metrics

Complexity & Limitations

Hyperparameters & Tuning Strategy

Linear Classifiers

Interpretations

Softmax Classifier & Loss Function

Softmax Function

Cross-Entropy Loss (Negative Log-Likelihood)

Key Takeaways

Comments

Scan to share on WeChat