SDK Quickstart
Start local first. Add managed mode when you need device registration, rollouts, and telemetry.
1. Install
Add the dependencies to your build.gradle.kts.
build.gradle.kts
implementation("dev.deviceai:core:0.0.1")
implementation("dev.deviceai:speech:0.0.1")
implementation("dev.deviceai:llm:0.0.1")
2. Initialize
The SDK auto-detects device hardware. No manual configuration needed.
Initialize (local mode — no API key needed)
// Application.onCreate()
PlatformStorage.initialize(this)
DeviceAI.initialize(context = this)
3. Speech-to-Text
Powered by whisper.cpp. 7x faster than real-time on mid-range hardware.
Speech-to-Text
SpeechBridge.initStt(modelPath, SttConfig(language = "en", useGpu = true))
val text = SpeechBridge.transcribeAudio(samples) // FloatArray, 16kHz mono
SpeechBridge.shutdownStt()
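whisper.cpp consumes 32-bit float PCM, while microphone capture on Android commonly yields 16-bit samples. A minimal conversion sketch, assuming transcribeAudio expects values normalized to [-1.0, 1.0) (this page does not state the range) and that the input is already 16 kHz mono. pcm16ToFloat is a hypothetical helper, not an SDK function:

```kotlin
// Hypothetical helper, not part of the DeviceAI SDK.
// Converts signed 16-bit PCM samples to floats in [-1.0, 1.0).
// Assumption: the audio is already 16 kHz mono; no resampling or downmixing is done here.
fun pcm16ToFloat(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768.0f }
```

The resulting array can be passed as the samples argument in the snippet above. Audio recorded at another sample rate or in stereo would need resampling and downmixing first.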
4. Text-to-Speech
Powered by sherpa-onnx. Supports VITS and Kokoro voice models.
Text-to-Speech
SpeechBridge.initTts(modelPath, tokensPath, TtsConfig(speechRate = 1.0f))
val pcm: ShortArray = SpeechBridge.synthesize("Hello from DeviceAI.")
SpeechBridge.shutdownTts()
5. LLM Chat
Powered by llama.cpp. Runs any GGUF model, with Vulkan GPU acceleration.
LLM Chat (streaming)
val session = DeviceAI.llm.chat("/path/to/model.gguf") {
    systemPrompt = "You are a helpful assistant."
    temperature = 0.7f
}
session.send("What is Kotlin?").collect { token ->
    print(token)
}
session.close()
6. Offline RAG
BM25 keyword retrieval — no embedding model needed. Runs entirely on-device.
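BM25 ranks chunks by term overlap with the query, weighting rare terms more heavily and normalizing for chunk length, which is why no embedding model is required. A rough sketch of the scoring, assuming the common defaults k1 = 1.2 and b = 0.75 and a simple lowercase alphanumeric tokenizer. TinyBm25 and tokenize are hypothetical names, and the internals of BM25RagStore may differ:

```kotlin
import kotlin.math.ln

// Hypothetical tokenizer: lowercase, split on anything that is not a-z or 0-9.
fun tokenize(text: String): List<String> =
    text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() }

// Illustrative BM25 scorer (hypothetical class, not the SDK's BM25RagStore).
class TinyBm25(
    chunks: List<String>,
    private val k1: Double = 1.2, // term-frequency saturation
    private val b: Double = 0.75  // strength of length normalization
) {
    private val docs: List<List<String>> = chunks.map { tokenize(it) }
    private val avgLen: Double = docs.map { it.size }.average()

    // Document frequency: in how many chunks each term appears.
    private val df: Map<String, Int> =
        docs.flatMap { it.distinct() }.groupingBy { it }.eachCount()

    // BM25 score of one chunk for the query; 0.0 when no query term matches.
    fun score(query: String, docIndex: Int): Double {
        val doc = docs[docIndex]
        val tf = doc.groupingBy { it }.eachCount()
        var total = 0.0
        for (term in tokenize(query).distinct()) {
            val f = tf[term] ?: continue
            val n = df.getValue(term)
            val idf = ln(1.0 + (docs.size - n + 0.5) / (n + 0.5))
            total += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc.size / avgLen))
        }
        return total
    }

    // Indices of the k highest-scoring chunks, best first.
    fun topK(query: String, k: Int): List<Int> =
        docs.indices.sortedByDescending { score(query, it) }.take(k)
}
```

Because matching is purely lexical, a query only retrieves chunks that share tokens with it: in this sketch "use" does not match "uses", since no stemming is applied.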
Offline RAG
val store = BM25RagStore(rawChunks = listOf(
    "DeviceAI runs inference on-device.",
    "Uses llama.cpp with Vulkan GPU acceleration."
))
val session = DeviceAI.llm.chat(modelPath) { ragStore = store }
session.send("What GPU does DeviceAI use?").collect { print(it) }
Scale across devices
Add managed mode
The SDK is production-ready without a backend. When you need orchestration across your device fleet, add an API key to unlock:
Device registration: automatic hardware profiling and capability tier assignment
Model manifest: assign the right model per device tier, synced every 6h
OTA rollouts: canary → rollout → full, or instant rollback
Telemetry: latency, TTFT, tokens/sec; no prompts or audio collected
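For reference, TTFT (time to first token) and tokens/sec can both be derived from three timestamps per generation. The sketch below shows one common way to compute them. GenerationTiming and both functions are hypothetical, and the SDK's actual telemetry schema and formulas are not documented on this page:

```kotlin
// Hypothetical record of one generation, with timestamps from a monotonic clock
// such as System.nanoTime(). Not the SDK's telemetry schema.
data class GenerationTiming(
    val requestNanos: Long,    // when the prompt was submitted
    val firstTokenNanos: Long, // when the first token arrived
    val lastTokenNanos: Long,  // when the last token arrived
    val tokenCount: Int        // total tokens generated
)

// Time to first token, in milliseconds.
fun ttftMillis(t: GenerationTiming): Double =
    (t.firstTokenNanos - t.requestNanos) / 1e6

// Decode throughput: tokens after the first, divided by the time since the first token.
// Assumption: TTFT is excluded from the rate; other definitions include it.
fun tokensPerSecond(t: GenerationTiming): Double {
    val decodeSeconds = (t.lastTokenNanos - t.firstTokenNanos) / 1e9
    return if (t.tokenCount > 1 && decodeSeconds > 0.0) (t.tokenCount - 1) / decodeSeconds else 0.0
}
```

Only timings and counts appear in this record, which is consistent with the statement that no prompts or audio are collected.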
Kotlin — one line change
DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    telemetry = TelemetryLevel.Minimal
    appVersion = BuildConfig.VERSION_NAME
}
Invite-only during alpha. We review every request.