Pi 5 Digit AI — Edge AI on Raspberry Pi

Download Report (PDF) · View on GitHub
C/C++ · PyTorch · Docker · Raspberry Pi · OpenCV

Overview

A complete embedded AI application for detecting and recognizing handwritten digits (0–9) in real time using a Raspberry Pi 5 and its Camera Module v3. The camera stream is processed inside a Docker container using OpenCV and C/C++, and the resulting annotated video feed is redistributed over TCP for remote viewing.

Architecture

[Camera v3] → rpicam-vid:5000 → [Docker OpenCV/C++]:8554 → [Client VLC]

The system captures video from the Pi camera, processes each frame through a preprocessing pipeline (ROI extraction, binarization, contour detection, digit extraction, MNIST-format normalization), runs inference through either an MLP or CNN model, and streams the annotated result over the network.
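As a rough illustration, the per-frame flow looks like the sketch below (OpenCV/C++). The names select_best_contour, extract_digit, and nn_predict stand in for the project's modules and are not the repository's exact signatures:

    #include <opencv2/opencv.hpp>
    #include <string>
    #include <vector>

    // Placeholders for the modules described in the implementation section.
    int nn_predict(const cv::Mat& input28x28);                                     // neural_network
    int select_best_contour(const std::vector<std::vector<cv::Point>>& contours);  // contour_selector
    cv::Mat extract_digit(const cv::Mat& binary, const std::vector<cv::Point>& c); // digit_extractor

    void process_one_frame(cv::Mat& frame)
    {
        // 1. (Optional ROI crop, then) grayscale conversion and binarization
        cv::Mat gray, binary;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::threshold(gray, binary, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

        // 2. Contour detection
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(binary, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        if (contours.empty()) return;

        // 3. Keep the best candidate (area, aspect ratio, fill rate)
        int best = select_best_contour(contours);
        if (best < 0) return;

        // 4. Extract, center and normalize to a 28x28 MNIST-style input
        cv::Mat digit = extract_digit(binary, contours[best]);

        // 5. Inference (MLP or CNN), then annotate the outgoing stream
        int label = nn_predict(digit);
        cv::Rect box = cv::boundingRect(contours[best]);
        cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);
        cv::putText(frame, std::to_string(label), box.tl(),
                    cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2);
    }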

Models

  • MLP — A minimal architecture with a single hidden layer of 512 neurons (~407K parameters). Extremely fast inference (~0.033ms) but lower accuracy on real-world data (84%).
  • CNN — Two stride-2 convolutional layers followed by a dense output layer (~29K parameters, about 14× lighter than the MLP). Better accuracy (92%) with inference at ~0.6ms on the Pi, well within the 30 FPS budget.

Models are trained with PyTorch on a hybrid dataset (MNIST + custom handwritten digits), then weights are exported to text files and loaded by the C inference engine via sequential reading.
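A minimal sketch of that loading step, assuming one whitespace-separated value per weight in the exported text file (the real file layout and function names may differ; the code compiles as C or C++):

    #include <stdio.h>
    #include <stdlib.h>

    /* Reads `count` floats sequentially from an exported weight file.
     * PyTorch writes the values in a fixed order, so the C side can read them
     * back into the same layout without any metadata. */
    float *load_weights(const char *path, size_t count)
    {
        FILE *f = fopen(path, "r");
        if (!f) return NULL;

        float *w = (float *)malloc(count * sizeof *w);
        if (!w) { fclose(f); return NULL; }

        for (size_t i = 0; i < count; ++i) {
            if (fscanf(f, "%f", &w[i]) != 1) {   /* stop on a short or malformed file */
                free(w);
                fclose(f);
                return NULL;
            }
        }
        fclose(f);
        return w;
    }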

C/C++ Implementation

The application is built around six modular components:

  • neural_network (C) — Model loading and the forward pass for both the MLP and the CNN, implemented without external math libraries (forward-pass sketch below).
  • process_frame (C++) — Orchestrates the full pipeline: acquisition, preprocessing, inference, and display.
  • contour_detection (C++) — Finds all contours in the binarized image via OpenCV.
  • contour_selector (C++) — Selects the best contour candidate based on area, aspect ratio, and fill rate.
  • digit_extractor (C++) — Extracts, centers (barycenter alignment), pads, and normalizes the digit to a 28×28 MNIST-compatible input (normalization sketch below).
  • main (C++) — Entry point, GStreamer video pipeline management, main loop (pipeline sketch below).
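
For the neural_network component, the MLP path boils down to two dense layers written as plain loops. The sketch below illustrates the idea with assumed names (dense_forward, mlp_predict); the 784 → 512 → 10 sizes follow the model description above:

    #include <stddef.h>

    /* One dense layer: out = W * in + b, optional ReLU, no external math library. */
    static void dense_forward(const float *in, size_t n_in,
                              const float *weights, const float *bias,
                              float *out, size_t n_out, int relu)
    {
        for (size_t j = 0; j < n_out; ++j) {
            float acc = bias[j];
            for (size_t i = 0; i < n_in; ++i)
                acc += weights[j * n_in + i] * in[i];   /* row-major W[j][i] */
            out[j] = (relu && acc < 0.0f) ? 0.0f : acc;
        }
    }

    /* MLP: 28*28 = 784 inputs -> 512 hidden (ReLU) -> 10 logits; argmax = digit.
     * (784*512 + 512) + (512*10 + 10) = 407,050 parameters, matching the ~407K above. */
    int mlp_predict(const float *pixels784,
                    const float *w1, const float *b1,
                    const float *w2, const float *b2)
    {
        float hidden[512], logits[10];
        dense_forward(pixels784, 784, w1, b1, hidden, 512, 1);
        dense_forward(hidden, 512, w2, b2, logits, 10, 0);

        int best = 0;
        for (int k = 1; k < 10; ++k)
            if (logits[k] > logits[best]) best = k;
        return best;
    }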
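
The digit_extractor step can be pictured as follows: crop the selected contour, fit it into a 20×20 box, and paste it onto a 28×28 canvas so that its barycenter lands at the center, mirroring MNIST preprocessing. Margins, interpolation, and names here are assumptions, not the exact repository code:

    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    cv::Mat to_mnist_input(const cv::Mat& binary, const std::vector<cv::Point>& contour)
    {
        // Tight crop around the selected contour
        cv::Mat crop = binary(cv::boundingRect(contour)).clone();

        // Fit the longest side into 20 px, preserving aspect ratio
        double scale = 20.0 / std::max(crop.cols, crop.rows);
        int w = std::max(1, cvRound(crop.cols * scale));
        int h = std::max(1, cvRound(crop.rows * scale));
        cv::resize(crop, crop, cv::Size(w, h), 0, 0, cv::INTER_AREA);

        // Barycenter (center of mass) of the resized digit
        cv::Moments m = cv::moments(crop, /*binaryImage=*/true);
        double cx = (m.m00 > 0) ? m.m10 / m.m00 : w / 2.0;
        double cy = (m.m00 > 0) ? m.m01 / m.m00 : h / 2.0;

        // Paste onto a black 28x28 canvas so the barycenter lands at (14, 14)
        cv::Mat canvas = cv::Mat::zeros(28, 28, CV_8UC1);
        int x = std::max(0, std::min(cvRound(14.0 - cx), 28 - w));
        int y = std::max(0, std::min(cvRound(14.0 - cy), 28 - h));
        crop.copyTo(canvas(cv::Rect(x, y, w, h)));

        // Scale pixels to [0, 1] floats, matching the training-time normalization
        cv::Mat input;
        canvas.convertTo(input, CV_32F, 1.0 / 255.0);
        return input;
    }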
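
Finally, main ties the two TCP ends together through OpenCV's GStreamer backend. The pipeline strings below follow the architecture diagram (ports 5000 and 8554) but are an assumption about the exact elements; the repository's strings, resolution, and frame rate may differ:

    #include <opencv2/opencv.hpp>

    int main()
    {
        // Ingest: rpicam-vid serves H.264 over TCP on port 5000 (see diagram above)
        cv::VideoCapture cap(
            "tcpclientsrc host=127.0.0.1 port=5000 ! h264parse ! avdec_h264 ! "
            "videoconvert ! appsink",
            cv::CAP_GSTREAMER);

        // Egress: annotated frames re-encoded and served on port 8554 for VLC
        // (640x480 @ 30 FPS is an assumption, not the repository's setting)
        cv::VideoWriter out(
            "appsrc ! videoconvert ! x264enc tune=zerolatency ! mpegtsmux ! "
            "tcpserversink host=0.0.0.0 port=8554",
            cv::CAP_GSTREAMER, 0, 30.0, cv::Size(640, 480), true);

        if (!cap.isOpened() || !out.isOpened()) return 1;

        cv::Mat frame;
        while (cap.read(frame)) {
            // process_one_frame(frame);   // per-frame pipeline sketched earlier
            out.write(frame);
        }
        return 0;
    }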

Results

  • CNN achieves 92% accuracy on the custom test set, with ~0.6ms inference time on the Raspberry Pi 5.
  • MLP achieves 84% accuracy with ~0.033ms inference time.
  • Both models comfortably run within the 30 FPS camera constraint.
  • The full application is containerized with Docker for easy deployment via a single run.sh script.

Tech Stack

  • Training: Python, PyTorch (in Docker)
  • Inference: C/C++, OpenCV 4, GStreamer
  • Deployment: Docker, Raspberry Pi 5, Pi Camera Module v3
  • Compilation: Makefile, -O3 optimization, C++17