Now available · Android & iOS

AI that runs
on the device.

Speech recognition, text-to-speech, and LLM inference, all running entirely on-device. No cloud calls. No network latency. No data leaves the phone. Deploy to millions of Android and iOS devices with OTA model updates and real-time telemetry.

Complete privacy

Data never leaves the device. No cloud processing. GDPR/HIPAA friendly.

Zero network latency

Speech recognition runs 7× faster than real time. No network round-trips. Instant responses.

Zero cloud cost

No per-token API bills. No usage metering. Models run free on-device.

Open source · Apache 2.0 · Production ready

Android SDK: Live · Swift SDK: Live · Flutter: Planned

Why DeviceAI

Enterprise-grade AI.
Zero cloud dependency.

Everything your team needs to ship on-device AI at scale — from a single SDK integration to fleet-wide model management.

Privacy by design

Audio, text, and prompts never leave the device. No cloud processing, no data collection, no compliance risk. GDPR/HIPAA friendly by default.

Sub-100ms inference

whisper.cpp transcribes audio 7× faster than real time. llama.cpp generates 46 tok/s on mid-range hardware. With no network round-trip, responses start immediately.

Works fully offline

No internet required after initial model download. Perfect for field workers, healthcare, aviation, or anywhere connectivity is unreliable.

Hardware-aware

SDK auto-detects RAM, CPU, SoC, and NPU. The backend assigns the right model per device — flagship gets Llama 3B, budget gets SmolLM 135M.
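
As a rough illustration of per-device model assignment, the sketch below maps a device profile to a model name. The `DeviceProfile` type, the `pickModel` function, and the RAM thresholds are assumptions made for this example, not the DeviceAI backend's actual rules. Only the two model names come from the text above.

```kotlin
// Hypothetical device profile; field names are illustrative, not the SDK's.
data class DeviceProfile(val ramMb: Int, val hasNpu: Boolean)

// Assumed tiering rule, for illustration only:
// plenty of RAM, or moderate RAM plus an NPU, counts as "flagship".
fun pickModel(device: DeviceProfile): String = when {
    device.ramMb >= 8_192 -> "Llama 3B"
    device.ramMb >= 6_144 && device.hasNpu -> "Llama 3B"
    else -> "SmolLM 135M"
}

fun main() {
    val flagship = DeviceProfile(ramMb = 12_288, hasNpu = true)
    val budget = DeviceProfile(ramMb = 3_072, hasNpu = false)
    println(pickModel(flagship))
    println(pickModel(budget))
}
```

A real assignment would likely also weigh SoC model and available storage; this sketch keeps to two signals so the tiering idea stays visible.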

OTA model updates

Push new models to devices without app store updates. Canary rollouts, percentage-based targeting, and instant kill-switch if something goes wrong.

Production telemetry

Track inference latency, TTFT, tokens/sec, and model load times across your entire fleet. Network-aware batching. Custom analytics sink support.

How It Works

From zero to on-device AI
in five minutes

Add the SDK

One line in your build.gradle. Works with any Android app — Kotlin, Java, Compose, or XML.

implementation("dev.deviceai:core:0.0.1")

Initialize

One call at startup. The SDK auto-detects device hardware: RAM, CPU, SoC model, and NPU availability. No manual configuration.

DeviceAI.initialize(context, apiKey = "<YOUR_API_KEY>")

Run inference

Call the SDK. Speech recognition, text-to-speech, or LLM chat — all running locally on the device. Zero cloud calls.

session.send("Hello").collect { print(it) }

Monitor performance

Inference latency, time-to-first-token, tokens/sec, and model load times stream to your dashboard. No prompts or audio collected — ever.

telemetry = TelemetryLevel.Minimal
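
To make the metrics concrete, here is a minimal sketch of how time-to-first-token and tokens/sec could be derived from three timestamps and a token count. The `GenerationTiming` type and both functions are hypothetical and are not part of the DeviceAI SDK. Note that only timings and counts are involved, never prompt or audio content.

```kotlin
// Hypothetical timing record for one generation; not an SDK type.
data class GenerationTiming(
    val requestAtMs: Long,    // when the prompt was submitted
    val firstTokenAtMs: Long, // when the first token arrived
    val lastTokenAtMs: Long,  // when the final token arrived
    val tokenCount: Int       // total tokens generated
)

// Time-to-first-token in milliseconds.
fun ttftMs(t: GenerationTiming): Long = t.firstTokenAtMs - t.requestAtMs

// Decode throughput: tokens produced after the first one, per second of decode time.
fun tokensPerSecond(t: GenerationTiming): Double {
    val decodeMs = t.lastTokenAtMs - t.firstTokenAtMs
    if (decodeMs <= 0 || t.tokenCount < 2) return 0.0
    return (t.tokenCount - 1) * 1000.0 / decodeMs
}

fun main() {
    val t = GenerationTiming(requestAtMs = 0, firstTokenAtMs = 80, lastTokenAtMs = 1_080, tokenCount = 47)
    println("TTFT: ${ttftMs(t)} ms")
    println("Throughput: ${tokensPerSecond(t)} tok/s")
}
```

Excluding the first token from the throughput figure keeps prompt-processing time out of the decode rate, which is one common convention; other definitions divide by total elapsed time.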

Push model updates

Deploy new models to specific device tiers without an app update. Canary to 5%, observe telemetry, roll out to 100% — or instant rollback.

canary → rollout → full (or rollback)
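
Percentage-based targeting is commonly implemented by hashing a stable device identifier into a fixed number of buckets. The sketch below shows that general technique; `bucketOf` and `inRollout` are hypothetical names, and nothing here describes how DeviceAI actually selects canary devices.

```kotlin
// Maps a device ID to a stable bucket in 0..99.
// String.hashCode() is deterministic on the JVM, so a device always lands in the same bucket.
fun bucketOf(deviceId: String): Int = deviceId.hashCode().mod(100)

// A device is in the rollout when its bucket falls below the current percentage.
// percent = 0 acts as a kill switch; percent = 100 is a full rollout.
fun inRollout(deviceId: String, percent: Int): Boolean = bucketOf(deviceId) < percent

fun main() {
    val devices = listOf("device-a", "device-b", "device-c", "device-d")
    for (percent in listOf(0, 5, 100)) {
        val selected = devices.filter { inRollout(it, percent) }
        println("$percent% -> $selected")
    }
}
```

Because buckets are fixed per device, raising the percentage only adds devices and never swaps them, so any device in the 5% canary is still included at 100%. A production system would likely use a salted, well-distributed hash rather than `hashCode()`.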

Ready to ship AI that respects your users' privacy?

Get access
