Blog — Tojo Thomas | On-Device AI, Embedded Systems, Mobile ML

May 1, 2026 • 22 min read • Mobile AI • NPU • On-Device Inference

Running Transformer Models on Mobile NPUs: What Actually Works (and What Breaks)

A field-tested engineering report on deploying RoBERTa-base (125M parameters) on the Snapdragon 8 Elite's Hexagon NPU: eight backends tested, real failure cases broken down by Issue/Effect/Fix, a w8a16 quantization breakthrough, and the design decisions behind SentiLog's production AI stack.

Snapdragon 8 Elite • Hexagon NPU • ONNX Runtime • LiteRT • w8a16 Quantization • Qualcomm AI Hub
