Have you checked out the Gemini Live API? It’s a total game-changer for building real-time, interactive experiences in Android. Forget managing a whole backend just to stream audio or video to an LLM — Gemini Live makes it effortless.
Imagine building an app where the user can talk to a chatbot and get instant responses, just like a real conversation. That’s what the Live API enables.
What Makes an App ‘Interactive’?
When we talk about an interactive app in this context, especially with the Gemini Live API, we’re talking about an application that doesn’t just listen and reply — it actually acts on the user’s instructions. It goes beyond a simple question-and-answer chatbot.
Think of it this way:
Standard Chatbot App: You say, “What’s the weather like?” The model figures out the answer and replies. That’s a back-and-forth conversation.
Interactive App (with Function Calling): You say, “Please add coffee to my shopping list.”
The model doesn’t just say, “Okay, I’ve added coffee.”
It recognizes that “add to shopping list” is an action this app can perform.
It executes the function call that triggers the addListItem function in the actual Android code.
The app’s internal state (the shopping list) actually changes.
Then, the model gets confirmation and tells you: “Done. I’ve added coffee to your shopping list.”
The key is that the user’s voice prompt is translated directly into app-logic execution. The app is no longer just a passive interface; it’s an agent that can manipulate its own data and features based on a natural language command. It creates a seamless, hands-free experience where the AI is integrated directly into the core functionality of the app — that’s what makes it truly ‘interactive’ in the most powerful sense.
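To make this concrete, here’s a minimal, hypothetical sketch of how the shopping-list action above could be described to the model. The addListItem name and itemName parameter are invented for illustration; the declaration style matches what you’ll build later in this chapter:
// Hypothetical declaration for the shopping-list example.
// The model uses the name and description to decide when to call it.
val addListItemDeclaration = FunctionDeclaration(
    name = "addListItem",
    description = "Adds an item to the user's shopping list",
    parameters = mapOf(
        "itemName" to Schema.string(
            description = "The item to add, for example coffee"
        )
    )
)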
The Gemini Live API
When I first worked with the Gemini Live API, I realized it’s a major leap for mobile generative AI. Instead of the old request–response model, it now supports real-time, two-way streaming. That means the client and model can send and receive data simultaneously — creating a live conversation rather than a sequence of turns.
It provides an optimized low-latency stream for both the audio you send to the model (your speech or requests) and the audio/text the model sends back (its response). Since you’ve used Firebase AI Logic in earlier chapters, you can enable this directly from your Android app — with no need for a custom server. It’s essentially a bidirectional, real-time audio channel connecting directly to a Gemini model.
Voice Activity Detection (VAD) to detect pauses in speech
Multimodal streaming in both directions
Note: Firebase AI Logic for the Gemini Live API is currently in developer preview, meaning that non-backward-compatible changes may occur in future releases.
Hands On Gemini Live
Let’s extend the Firebase AI Logic app from the previous chapter with Gemini Live bidirectional streaming.
Download the starter project, and open it in Android Studio.
Project Setup and Dependencies
First things first, ensure you’re targeting Android API level 23+ and the app is connected to Firebase.
Open the app-level build.gradle file, and add the Firebase AI Logic dependency at the end of the dependencies block:
// Firebase AI Logic: Gemini Live Dependency
var firebaseAiLogicVersion = "17.6.0"
implementation "com.google.firebase:firebase-ai:$firebaseAiLogicVersion"
Since you’ll be interacting with Gemini Live through audio, add this permission in AndroidManifest.xml:
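This is the standard Android microphone permission declaration:
<uses-permission android:name="android.permission.RECORD_AUDIO" />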
Build and run the app, and select a cat breed from the list. On the item-detail screen, you’ll see the status at the bottom showing:
"UNKNOWN: Gemini Live Not Initialized"
Before Initialization
Model Initialization and Configuration
The first step in using Gemini Live is initializing the backend service and creating a LiveGenerativeModel instance. The Live API configuration is handled through the liveGenerationConfig object, which determines the model’s behavior and the nature of the streaming output.
To handle initialization cleanly, the best practice is to create a utility/manager class. The starter project already has this for your convenience.
// The core Gemini Live model instance.
private lateinit var liveModel: LiveGenerativeModel
// Mutable state flow holding the current state of the live session.
private val _liveSessionState = MutableStateFlow<LiveSessionState>(LiveSessionState.Unknown())
val liveSessionState = _liveSessionState.asStateFlow()
Now, add the initializeGeminiLive() function as follows:
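Here’s a minimal sketch of what that function can look like, using the LiveSessionState defined next. The backend choice (googleAI()) and the model name are assumptions; check the Firebase AI Logic documentation for the currently supported Live model.
// Sketch: creates the LiveGenerativeModel and marks the state as ready.
@OptIn(PublicPreviewAPI::class)
fun initializeGeminiLive() {
    try {
        liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
            // Assumed model name: use the Live-capable model from the Firebase docs.
            modelName = "gemini-live-2.5-flash-preview",
            generationConfig = liveGenerationConfig {
                // Ask the model to reply with spoken audio instead of text.
                responseModality = ResponseModality.AUDIO
            }
        )
        _liveSessionState.value = LiveSessionState.Ready()
    } catch (e: Exception) {
        _liveSessionState.value = LiveSessionState.Error(message = e.localizedMessage ?: "Initialization failed")
    }
}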
The LiveSessionState is a sealed interface containing data classes that describe the current state of the Gemini Live session. The LiveSessionState is located in the same package and defined as below:
sealed interface LiveSessionState {
    data class Unknown(val message: String = "UNKNOWN: Gemini Live Not Initialized") : LiveSessionState
    data class Ready(val message: String = "READY: Ask Gemini Live") : LiveSessionState
    data class Running(val message: String = "RUNNING: Gemini Live Speaking...") : LiveSessionState
    data class Error(val message: String = "ERROR: Failed to initiate Gemini Live") : LiveSessionState
}
This is used to reflect updates of the session on the screen.
The UI interactions are managed by the MainViewModel class, which has a LiveModelManager instance as shown below:
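A sketch of that wiring; the constructor parameters of LiveModelManager (a context and a coroutine scope) and the place where initializeGeminiLive() is called are assumptions based on how the manager is used in this chapter:
class MainViewModel(application: Application) : AndroidViewModel(application) {

    // Owns the LiveGenerativeModel and the LiveSession.
    private val liveModelManager = LiveModelManager(
        context = application,
        coroutineScope = viewModelScope
    )

    // The UI observes this to show the UNKNOWN / READY / RUNNING / ERROR status.
    val liveSessionState = liveModelManager.liveSessionState

    init {
        // One reasonable place to initialize the live model.
        liveModelManager.initializeGeminiLive()
    }
}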
Rebuild and run the app, and then navigate from the list to the detail screen once more. You’ll see the Gemini Live status now displaying the READY state!
Model Initialized
Real-Time Connection: Starting The Live Session
At this point, the app can connect to Gemini and start the live session. You need to use LiveModelManager for that.
Open the LiveModelManager class, and declare a LiveSession instance.
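For example (nullable, since the session only exists while a conversation is active):
// The active live session; null until connect() is called.
private var session: LiveSession? = null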
The Live API is bidirectional, and you’ll use it for:
Sending voice commands.
Generating speech from text.
Transcribing audio to text.
Eventually, you will also be able to send images and a live video stream to the model. In the sample app, you’ll implement text -> speech in this flow:
List of Cat Breeds > Select a breed > Gemini Live describes it
To do so, add the startSessionFromText() function in the LiveModelManager:
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun startSessionFromText(catBreed: String) {
val text = "Tell me about $catBreed cats in maximum 80 words."
coroutineScope.launch(Dispatchers.IO) {
try {
// Start the conversation
session = liveModel.connect()
session?.send(text)
session?.startAudioConversation()
// Update State
_liveSessionState.value = LiveSessionState.Running()
} catch (e: Exception) {
_liveSessionState.value = LiveSessionState.Error(message = e.localizedMessage ?: "Unknown error")
}
}
}
This function accepts a single string argument, catBreed, that you have selected, and then operates as described below:
val text = "Tell me about $catBreed cats.": It constructs a text prompt using the input catBreed. For example, if "Siamese" is passed to the function, this text will be "Tell me about Siamese cats in maximum 80 words."
session = liveModel.connect(): It establishes a persistent WebSocket connection to the Gemini model to start a new session. This LiveSession object allows for real-time, low-latency streaming of input and output.
session?.startAudioConversation(): This is a key step that begins the audio part of the conversation. Following permission acknowledgment, the live audio interaction is initiated by calling it. This effectively signals to the model that the client is ready to begin streaming microphone data.
_liveSessionState.value = LiveSessionState.Running(): This updates the live session’s state to "Running" as soon as the session successfully starts. This value is exposed as a StateFlow (backed by MutableStateFlow) so that the UI layer can observe it and react in real time.
Note that the @RequiresPermission(Manifest.permission.RECORD_AUDIO) annotation at the top ensures this function only runs when the RECORD_AUDIO permission is granted - necessary for bidirectional audio streaming.
Lifecycle Management: Toggling Session Start/Stop
You learned how to start a session, but you also need to know how to stop the session. The session should be explicitly closed when the microphone is deactivated or when the user navigates away from the screen. Even when you start a new session, the right approach is to stop any ongoing session before starting a new one.
Stopping a session is simple. You can do that by adding this function to LiveModelManager:
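A sketch of that function; stopAudioConversation() is the counterpart of the startAudioConversation() call you used earlier:
fun stopSession() {
    // Stop streaming microphone audio and model playback for the current session.
    session?.stopAudioConversation()
    // Reflect the idle state in the UI.
    _liveSessionState.value = LiveSessionState.Ready()
}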
This is simply updating the UI state, executing _liveSessionState.value = LiveSessionState.Ready(). This indicates to the user that there’s no ongoing session and you’re ready to start a new one.
This will be handy when you tap on the Mic button:
In the Ready state, the app will start a new session.
Once the session is Running, it will stop the session if you tap the same button.
As MainViewModel is responsible for handling user interactions, update the askAbout() function in MainViewModel.kt to implement the session start/stop toggle:
@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun askAbout(catBreed: String) {
when (val state = liveSessionState.value) {
is LiveSessionState.Ready -> {
liveModelManager.startSessionFromText(catBreed)
}
is LiveSessionState.Running -> {
liveModelManager.stopSession()
}
else -> {
Log.d(TAG, "Live session state: $state")
}
}
}
The above function starts or stops a live audio session to ask Gemini Live about a specific cat breed. The catBreed parameter is the name of the cat breed to ask about. This is used as the initial prompt when starting a new session.
Then it checks the current state of the liveSessionState and performs the following:
If the state is Ready, it starts a new live session.
If the state is Running, it stops the currently active session.
In other states (like Error or Unknown), it logs the current state.
Ready to try this out? Build and run the app now. Navigate to the detail screen by selecting a cat breed and then tap the Mic button at the bottom.
You’ll see the state change to RUNNING, and Gemini Live will start talking about the cat breed you selected!
Model Running
Function Calling: Making Gemini Your App’s Agent
Now you know how to turn your app into a voice assistant using the Gemini Live API. The next big step is Function Calling - the superpower that lets the model actually interact with the logic and functionality of an Android app. It’s what makes the voice assistant an agent for your app.
Function calling allows the model to determine when an action is necessary to satisfy a user request. The implementation follows a standard multi-step process.
Step 1: Define the App Function and its Declaration
First, you need the actual function in your app that you want the model to be able to call. In the sample app, you may want the user to ask for pictures of a specific cat breed - which means opening a Google Image search.
fun showPicture(catBreed: String) {
    coroutineScope.launch(Dispatchers.Default) {
        // Build a Google Images search URL for the requested breed.
        val query = Uri.encode("$catBreed cat pictures")
        val url = "https://www.google.com/search?q=$query&tbm=isch"
        val intent = Intent(Intent.ACTION_VIEW)
        intent.data = Uri.parse(url)
        // Required when starting an activity from a non-Activity context.
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        try {
            context.startActivity(intent)
        } catch (e: Exception) {
            Log.e(TAG, "Error opening Google Images", e)
        }
    }
}
Next, create a FunctionDeclaration to describe this function to the Gemini model. This is like creating an API reference for the model. It needs a name, a plain-English description (which is crucial for the model to understand when to use it), and the required parameters.
// The FunctionDeclaration for the model
val showPictureFunctionDeclaration = FunctionDeclaration(
name = "showPicture",
description = "Function to show picture of cat breed",
parameters = mapOf(
"catBreed" to Schema.string(
description = "A short string describing the cat breed to show picture"
)
)
)
Step 2: Pass the Tool to the LiveModel
The Gemini model needs to know what tools (functions) it has available before the conversation even starts. You need to package the FunctionDeclaration into a Tool object and pass it to the liveModel initialization.
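A sketch of that change, assuming liveModel() accepts a tools parameter; the rest of the initialization stays the same as in initializeGeminiLive():
liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-live-2.5-flash-preview", // assumed model name, as before
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    // Register the declaration so the model knows this tool exists.
    tools = listOf(Tool.functionDeclarations(listOf(showPictureFunctionDeclaration)))
)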
Now, the model knows that if a user asks something like "Can you show me pictures of a Siamese cat?", it has a tool named showPicture that can handle that request.
Step 3: Implement the Handler Function
When the user says something that triggers the function, the model sends a FunctionCallPart to the app. You need a special function — a handler, to intercept this call, execute the app logic, and send the result back to the model.
Below showPicture(), add the following function to act as the handler:
fun functionCallHandler(functionCall: FunctionCallPart): FunctionResponsePart {
return when (functionCall.name) {
"showPicture" -> {
val catBreed = functionCall.args["catBreed"]!!.jsonPrimitive.content
showPicture(catBreed = catBreed)
val response = JsonObject(
mapOf(
"success" to JsonPrimitive(true),
"message" to JsonPrimitive("Showing pictures of $catBreed")
)
)
FunctionResponsePart(functionCall.name, response)
}
else -> {
val response = JsonObject(
mapOf(
"error" to JsonPrimitive("Unknown function: ${functionCall.name}")
)
)
FunctionResponsePart(functionCall.name, response)
}
}
}
Connecting the model and logic: This is where showPicture() is executed, opening the browser.
Sending the response back: You return a FunctionResponsePart confirming the operation. The model uses that confirmation to generate its spoken reply (e.g., "I'm on it! Showing pictures of Siamese cats now").
Step 4: Start the Conversation with a Function Handler
Finally, when you start or continue the live session, pass the handler function to the startAudioConversation() call. This tells the LiveSession which function to invoke when the model decides to use a tool.
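In startSessionFromText(), that’s a one-line change; passing a function reference works because the handler’s signature matches what startAudioConversation() expects:
// Let the model call back into the app's logic through the handler.
session?.startAudioConversation(::functionCallHandler)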
You’ll be amazed to hear Gemini answer the description with follow-up questions like: "… Do you want me to show pictures of Siamese cats?"
Reply something affirmative, like "Yes, I’m curious!"
Then the magic happens – Gemini Live handles all the real-time voice streaming and the function calling handshake, resulting in the image search results opening right on your phone.
Function Calling
Conclusion
To wrap this up, what you’ve done with the Gemini Live API and Function Calling isn’t just an evolutionary step; it’s a massive leap forward in how we build mobile AI experiences.
The chapter started with the core idea: using the Gemini Live API to achieve low-latency, real-time voice streaming without needing a complex backend. That alone takes us beyond the clunky "stop-and-start" experience of older chatbots. Working with a "live" model that can hold an open session makes real-time interaction possible!
But Function Calling is where the true power of an interactive app shines. You’ve turned Gemini into a genuine agent for your app just by adding a few things:
Defining a function.
Declaring it as a tool to be used with a model.
Implementing a simple function call handler.
Now, when a user says, "Show me pictures of a Maine Coon cat," it’s not just a conversation; it’s a powerful command that triggers native Android code, opening the search Intent directly!
This ability to merge natural language with your app’s specific logic is what unlocks next-generation, hands-free voice control on Android. It’s time to stop building apps that just talk - and start building apps that act.