Pi 5 Digit AI — Edge AI on Raspberry Pi
Overview
A complete embedded AI application for detecting and recognizing handwritten digits (0–9) in real time using a Raspberry Pi 5 and its Camera Module v3. The camera stream is processed inside a Docker container using OpenCV and C/C++, and the resulting annotated video feed is redistributed over TCP for remote viewing.
Architecture
[Camera v3] → rpicam-vid:5000 → [Docker OpenCV/C++]:8554 → [Client VLC]
The system captures video from the Pi camera, processes each frame through a preprocessing pipeline (ROI extraction, binarization, contour detection, digit extraction, MNIST-format normalization), runs inference through either an MLP or CNN model, and streams the annotated result over the network.
Models
- MLP — A minimal single-hidden-layer architecture (512 neurons, ~407K parameters). Extremely fast inference (~0.033 ms) but lower accuracy on real-world data (84%).
- CNN — Two convolutional layers with stride 2, followed by a dense output layer (~29K parameters, 14× lighter than the MLP). Better accuracy (92%) with inference at ~0.6 ms on the Pi, well within the 30 FPS budget.
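As a sanity check on the quoted size, a 784→512→10 MLP has 784·512 + 512 + 512·10 + 10 = 407,050 parameters, matching ~407K. A minimal forward pass is sketched below; the ReLU hidden activation and row-major weight layout are assumptions, and the names are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Sketch of a 784 -> 512 -> 10 MLP forward pass in plain C++.
// w1 is 512x784 row-major, w2 is 10x512 row-major (assumed layout).
struct Mlp {
    std::vector<float> w1, b1; // 512*784 weights, 512 biases
    std::vector<float> w2, b2; // 10*512 weights, 10 biases
};

std::vector<float> forward(const Mlp& m, const std::vector<float>& x) {
    std::vector<float> h(512, 0.0f);
    for (size_t i = 0; i < 512; ++i) {
        float s = m.b1[i];
        for (size_t j = 0; j < 784; ++j) s += m.w1[i * 784 + j] * x[j];
        h[i] = std::max(0.0f, s); // ReLU (assumed activation)
    }
    std::vector<float> out(10, 0.0f);
    for (size_t i = 0; i < 10; ++i) {
        float s = m.b2[i];
        for (size_t j = 0; j < 512; ++j) s += m.w2[i * 512 + j] * h[j];
        out[i] = s; // raw logits
    }
    return out;
}

// The predicted digit is the argmax of the logits.
int predict(const Mlp& m, const std::vector<float>& x) {
    auto logits = forward(m, x);
    return int(std::max_element(logits.begin(), logits.end()) - logits.begin());
}
```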
Models are trained with PyTorch on a hybrid dataset (MNIST + custom handwritten digits), and the weights are exported to plain-text files that the C inference engine loads via sequential reads.
C/C++ Implementation
The application is built around six modular components:
- neural_network (C) — Model loading, forward pass for MLP and CNN, implemented without external math libraries.
- process_frame (C++) — Orchestrates the full pipeline: acquisition, preprocessing, inference, and display.
- contour_detection (C++) — Finds all contours in the binarized image via OpenCV.
- contour_selector (C++) — Selects the best contour candidate based on area, aspect ratio, and fill rate.
- digit_extractor (C++) — Extracts, centers (barycenter alignment), pads, and normalizes the digit to a 28×28 MNIST-compatible input.
- main (C++) — Entry point, GStreamer video pipeline management, main loop.
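The barycenter alignment performed by digit_extractor can be sketched as follows, assuming a 28×28 grayscale crop; the rounding and edge-clipping choices here are illustrative:

```cpp
#include <array>
#include <cstdint>
#include <utility>

constexpr int N = 28; // MNIST grid size

// Intensity-weighted centroid of the image (barycenter).
std::pair<double, double> barycenter(const std::array<uint8_t, N * N>& img) {
    double sum = 0, cx = 0, cy = 0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            double v = img[y * N + x];
            sum += v; cx += v * x; cy += v * y;
        }
    if (sum == 0) return {N / 2.0, N / 2.0}; // empty image: treat as centered
    return {cx / sum, cy / sum};
}

// Shift the digit so its barycenter lands on the grid center, dropping
// pixels that would fall outside the 28x28 frame.
std::array<uint8_t, N * N> recenter(const std::array<uint8_t, N * N>& img) {
    auto [cx, cy] = barycenter(img);
    int dx = N / 2 - int(cx + 0.5), dy = N / 2 - int(cy + 0.5);
    std::array<uint8_t, N * N> out{};
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < N && ny >= 0 && ny < N)
                out[ny * N + nx] = img[y * N + x];
        }
    return out;
}
```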
Results
- CNN achieves 92% accuracy on the custom test set, with ~0.6ms inference time on the Raspberry Pi 5.
- MLP achieves 84% accuracy with ~0.033ms inference time.
- Both models comfortably run within the 30 FPS camera constraint.
- The full application is containerized with Docker for easy deployment via a single run.sh script.
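Concretely, a 30 FPS stream allows 1000/30 ≈ 33.3 ms per frame, so even the slower CNN at ~0.6 ms uses under 2% of the budget:

```cpp
// Per-frame time budget at a given frame rate.
constexpr double frame_budget_ms(double fps) { return 1000.0 / fps; }

constexpr double kCnnMs = 0.6;                            // measured above
constexpr double kBudgetShare = kCnnMs / (1000.0 / 30.0); // ~0.018, i.e. <2%
```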
Tech Stack
- Training: Python, PyTorch (in Docker)
- Inference: C/C++, OpenCV 4, GStreamer
- Deployment: Docker, Raspberry Pi 5, Pi Camera Module v3
- Compilation: Makefile, -O3 optimization, C++17