Pi 5 Digit AI — Edge AI on Raspberry Pi
Overview
A complete embedded AI application for detecting and recognizing handwritten digits (0–9) in real time using a Raspberry Pi 5 and its Camera Module v3. The camera stream is processed inside a Docker container using OpenCV and C/C++, and the resulting annotated video feed is redistributed over TCP for remote viewing.
Architecture
[Camera v3] → rpicam-vid:5000 → [Docker OpenCV/C++]:8554 → [Client VLC]
The system captures video from the Pi camera, processes each frame through a preprocessing pipeline (ROI extraction, binarization, contour detection, digit extraction, MNIST-format normalization), runs inference through either an MLP or CNN model, and streams the annotated result over the network.
Models
- MLP — A minimal single-hidden-layer architecture (512 neurons, ~407K parameters). Extremely fast inference (~0.033 ms) but lower accuracy on real-world data (84%).
- CNN — Two convolutional layers with stride 2, followed by a dense output layer (~29K parameters, 14× lighter than the MLP). Better accuracy (92%) with inference at ~0.6 ms on the Pi, well within the 30 FPS budget.
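As a sanity check on the quoted size, a 784→512→10 MLP has 784·512 + 512 + 512·10 + 10 = 407,050 parameters, matching ~407K. A minimal forward pass is sketched below; the ReLU hidden activation and row-major weight layout are assumptions, and the names are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Sketch of a 784 -> 512 -> 10 MLP forward pass in plain C++.
// w1 is 512x784 row-major, w2 is 10x512 row-major (assumed layout).
struct Mlp {
    std::vector<float> w1, b1; // 512*784 weights, 512 biases
    std::vector<float> w2, b2; // 10*512 weights, 10 biases
};

std::vector<float> forward(const Mlp& m, const std::vector<float>& x) {
    std::vector<float> h(512, 0.0f);
    for (size_t i = 0; i < 512; ++i) {
        float s = m.b1[i];
        for (size_t j = 0; j < 784; ++j) s += m.w1[i * 784 + j] * x[j];
        h[i] = std::max(0.0f, s); // ReLU (assumed activation)
    }
    std::vector<float> out(10, 0.0f);
    for (size_t i = 0; i < 10; ++i) {
        float s = m.b2[i];
        for (size_t j = 0; j < 512; ++j) s += m.w2[i * 512 + j] * h[j];
        out[i] = s; // raw logits
    }
    return out;
}

// The predicted digit is the argmax of the logits.
int predict(const Mlp& m, const std::vector<float>& x) {
    auto logits = forward(m, x);
    return int(std::max_element(logits.begin(), logits.end()) - logits.begin());
}
```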
Models are trained with PyTorch on a hybrid dataset (MNIST + custom handwritten digits), and the weights are exported to plain-text files that the C inference engine loads via sequential reads.
C/C++ Implementation
The application is built around six modular components:
- neural_network (C) — Model loading, forward pass for MLP and CNN, implemented without external math libraries.
- process_frame (C++) — Orchestrates the full pipeline: acquisition, preprocessing, inference, and display.
- contour_detection (C++) — Finds all contours in the binarized image via OpenCV.
- contour_selector (C++) — Selects the best contour candidate based on area, aspect ratio, and fill rate.
- digit_extractor (C++) — Extracts, centers (barycenter alignment), pads, and normalizes the digit to a 28×28 MNIST-compatible input.
- main (C++) — Entry point, GStreamer video pipeline management, main loop.
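The barycenter alignment performed by digit_extractor can be sketched as follows, assuming a 28×28 grayscale crop; the rounding and edge-clipping choices here are illustrative:

```cpp
#include <array>
#include <cstdint>
#include <utility>

constexpr int N = 28; // MNIST grid size

// Intensity-weighted centroid of the image (barycenter).
std::pair<double, double> barycenter(const std::array<uint8_t, N * N>& img) {
    double sum = 0, cx = 0, cy = 0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            double v = img[y * N + x];
            sum += v; cx += v * x; cy += v * y;
        }
    if (sum == 0) return {N / 2.0, N / 2.0}; // empty image: treat as centered
    return {cx / sum, cy / sum};
}

// Shift the digit so its barycenter lands on the grid center, dropping
// pixels that would fall outside the 28x28 frame.
std::array<uint8_t, N * N> recenter(const std::array<uint8_t, N * N>& img) {
    auto [cx, cy] = barycenter(img);
    int dx = N / 2 - int(cx + 0.5), dy = N / 2 - int(cy + 0.5);
    std::array<uint8_t, N * N> out{};
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < N && ny >= 0 && ny < N)
                out[ny * N + nx] = img[y * N + x];
        }
    return out;
}
```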
Results
- CNN achieves 92% accuracy on the custom test set, with ~0.6ms inference time on the Raspberry Pi 5.
- MLP achieves 84% accuracy with ~0.033ms inference time.
- Both models comfortably run within the 30 FPS camera constraint.
- The full application is containerized with Docker for easy deployment via a single run.sh script.
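Concretely, a 30 FPS stream allows 1000/30 ≈ 33.3 ms per frame, so even the slower CNN at ~0.6 ms uses under 2% of the budget:

```cpp
// Per-frame time budget at a given frame rate.
constexpr double frame_budget_ms(double fps) { return 1000.0 / fps; }

constexpr double kCnnMs = 0.6;                            // measured above
constexpr double kBudgetShare = kCnnMs / (1000.0 / 30.0); // ~0.018, i.e. <2%
```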
Tech Stack
- Training: Python, PyTorch (in Docker)
- Inference: C/C++, OpenCV 4, GStreamer
- Deployment: Docker, Raspberry Pi 5, Pi Camera Module v3
- Compilation: Makefile, -O3 optimization, C++17