Self-Made Gemini AI Agent with Termux, All on Your Smartphone!
📅 February 19, 2026 | 📁 Development Experience
1. Introduction: Genspark's cost was a concern
This blog, "Genspark Development Struggles," is built using Genspark for the blog system itself. I entrust Genspark with everything from formatting articles to posting them. It's very convenient, but there's a cost associated with its use. The fact that credits decrease every time I add an article was honestly a source of worry.
So, I started looking for various tools to see if I could handle "maintenance and article additions with a different tool." However, perhaps because my requirements were unusual, I couldn't find anything that quite fit.
- I absolutely want to complete everything on my smartphone.
- The blog's source is managed on GitHub, published on Cloudflare, and articles are stored in a D1 database.
- I want to give instructions in natural language and automate the entire process up to deployment.
Cursor and Claude Code are both PC-based, and paid in any case. Firebase seemed promising as an environment, but its built-in AI agent had limited capabilities and couldn't access external services like GitHub.
2. Searching for a smartphone-only development environment
The desire to complete development work solely on a smartphone might sound unreasonable at first glance. However, having been a software engineer for 20 years, I had the selfish reason of "not wanting to open my computer anymore."
I tried several existing mobile development tools, but none met the requirements, either lacking GitHub access or the ability to execute commands with natural language. That led me to Termux, an app that allows you to set up a Linux environment on an Android smartphone.
3. Encountering Termux and the struggle of installation
Termux is an app that allows you to run a Linux terminal environment on Android. Since it supports standard tools like bash, python3, git, and curl, theoretically, much of what can be done on Linux can be achieved on a smartphone.
The installation itself was quite a hurdle. I initially installed the Play Store version, but after learning that its development had stalled, I reinstalled the F-Droid version. The installation froze midway, with nothing happening for long stretches, yet for some reason it completed smoothly on the second attempt. After enduring several security warnings from Google, I finally had the environment set up.
After installation, update the packages with pkg update && pkg upgrade, then install the basic tools with pkg install git python curl.
4. Decided to build my own AI agent
Once the environment was set up, the next step was a tool I could control with natural language. I looked for AI chat tools that could run under Termux, but none seemed compatible or installable. The official Gemini CLI tool was also a no-go at the time.
So, I changed my approach and decided to "directly hit the Gemini API and build my own agent." The only required ingredients were bash, python3, and curl. All of these are standard in Termux.
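As a taste of how little is needed, here is a minimal sketch of building the request body with python3 alone, in the shape the Gemini generateContent endpoint expects. No network call is made here, and the message text is just a placeholder:

```shell
# Build the generateContent request body with python3 --
# no jq or frameworks needed, matching the bash + python3 + curl approach.
payload=$(python3 -c "
import json, sys
msg = sys.argv[1]
print(json.dumps({'contents': [{'role': 'user', 'parts': [{'text': msg}]}]}))
" "Hello")
echo "$payload"
```

Passing the message as an argument to python3 (rather than interpolating it into the JSON with bash) sidesteps all quoting and escaping headaches, which is exactly why the full script below does the same.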
The AI agent itself was developed in consultation with Claude (Anthropic). After much trial and error, debugging, and rethinking the design, it finally reached a working state as v1.08.
5. Agent processing flow
The basic processing flow of the completed agent is as follows:
User instruction → ① Gemini analyzes instruction and determines command → Command execution (git, curl, npm, etc.) → ② Gemini judges/explains result → User selects continue/terminate/additional instruction
The key is the design that "calls Gemini twice." The first call (①) determines what to do and executes the command. The second call (②) evaluates the result and determines if the task is complete or if the next step is needed. Based on Gemini's judgment in ②, the user can choose y (continue) / n (exit) / or provide additional instructions with a free message.
- TYPE: COMMAND — Execute a shell command
- TYPE: SEARCH — Search with the Brave API or DuckDuckGo
- TYPE: IMAGE — Analyze a screenshot
- TYPE: NEED_INPUT — Prompt the user for input
- TYPE: DONE — Task completed
- TYPE: NEXT — Continue to the next step
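The TYPE dispatch relies on nothing more than grep and sed over the model's reply. A minimal sketch, using an invented sample response in the expected format:

```shell
# Hypothetical model reply in the agent's TYPE format.
response=$'TYPE: COMMAND\ngit status'
# Extract the TYPE line and the first non-empty content line,
# the same way the script's dispatcher does.
task_type=$(echo "$response" | grep "^TYPE:" | head -1 | sed 's/^TYPE: *//' | tr -d '\r')
task_content=$(echo "$response" | grep -v "^TYPE:" | sed '/^[[:space:]]*$/d' | head -1)
echo "$task_type"     # COMMAND
echo "$task_content"  # git status
```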
Initially, I used Gemini 2.5 Flash, but I frequently encountered situations where its capabilities weren't sufficient, so I switched to Gemini 3.0 Flash Preview (gemini-3-flash-preview). Since one API key supports all models, you can switch by simply changing the model name.
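Switching models really is a one-line change, because the model name is just a segment of the endpoint URL:

```shell
# The same API key works for every model; only the URL segment changes.
MODEL="gemini-3-flash-preview"   # e.g. swap in a 2.5-series model name here
API_URL="https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent"
echo "$API_URL"
```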
6. What was achieved and what wasn't
What was achieved:
- Access to GitHub (clone, push, pull, etc.)
- Automated deployment via GitHub Actions (changes are reflected to Cloudflare automatically on push)
- Per-project configuration and history management (.ai_config / .ai_history)
- Minor source code modifications and file operations
What wasn't achieved:
- Creating long articles (due to Gemini API token limits, article creation is still done with Genspark's AI chat)
- Direct deployment to Cloudflare with Wrangler (abandoned because Wrangler was difficult to install on Termux; replaced with CI/CD via GitHub Actions)
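The per-project .ai_config mentioned above is just key=value lines. A sketch of how such a file is read, using invented keys for illustration (load_config in the full script below uses the same parsing):

```shell
# Write a hypothetical .ai_config, then parse it the way load_config does:
# skip comment and empty lines, strip spaces and quotes, export each pair.
cat > /tmp/demo_ai_config << 'EOF'
# project settings
DEPLOY_BRANCH=main
CF_PROJECT="my-blog"
EOF
while IFS='=' read -r key value; do
  [[ "$key" =~ ^[[:space:]]*# ]] && continue
  [[ -z "$key" ]] && continue
  key=$(echo "$key" | tr -d ' ')
  value=$(echo "$value" | tr -d '"' | tr -d "'")
  export "$key=$value"
done < <(grep '=' /tmp/demo_ai_config)
echo "$DEPLOY_BRANCH $CF_PROJECT"
```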
At first, I felt like "I wanted a big motorcycle, but after much effort, I ended up with a moped" (lol). However, being able to instruct GitHub integration, build, and deployment processes from a smartphone alone using natural language is quite practical.
7. Who is this recommended for?
I've considered who this approach might be particularly suitable for.
- SEs and engineers who travel frequently — Can be used for minor corrections and verification tasks on trains or during business trips.
- Developers who don't want to carry a PC — Those who want to complete minor development tasks with just one smartphone.
- Those familiar with Linux/bash — You can customize the script to tailor the environment to your liking.
- Those who want to try the Gemini API — You can easily check the API's behavior without needing a framework.
Conversely, it's not suitable for large-scale CLI automation or complex frontend development. It fits the purpose of "lightly operating GitHub or changing settings on a smartphone."
8. Complete script (full disclosure)
Here is the complete script, finished after much trial and error with Claude's help. I haven't thoroughly checked its contents, so please **use it at your own risk.**
Setup steps:
- Set GOOGLE_API_KEY as an environment variable: add export GOOGLE_API_KEY='your-api-key' to ~/.bashrc and run source ~/.bashrc. You can obtain an API key from Google AI Studio.
- Save the script as ~/bin/ai and grant execute permission with chmod +x ~/bin/ai.
#!/bin/bash
# AI Assistant v1.08
# Mobile Development AI Assistant powered by Gemini
# ============================================================
# Constants / Environment Variables
# ============================================================
GOOGLE_API_KEY="${GOOGLE_API_KEY:-}"
BRAVE_API_KEY="${BRAVE_API_KEY:-}"
MODEL="gemini-3-flash-preview"
API_URL="https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent"
SCREENSHOT_DIR="${HOME}/storage/pictures/Screenshots"
CURRENT_PROJECT_FILE="${HOME}/.ai_current_project"
IMAGE_SCRIPT="${HOME}/bin/ai_image.py"
TASK_RESULT_FILE="${HOME}/tmp/ai_task_result.tmp"
HISTORY_FILE="${HOME}/.ai_history"
TMP_DIR="${HOME}/tmp"
PAYLOAD_FILE="${TMP_DIR}/ai_payload.tmp"
# ============================================================
# Project Management
# ============================================================
PROJECT_NAME_CURRENT=""
PROJECT_DIR_CURRENT=""
PROJECT_CONFIG=""
PROJECT_CONTEXT=""
load_config() {
PROJECT_NAME_CURRENT=""
PROJECT_DIR_CURRENT=""
PROJECT_CONFIG=""
PROJECT_CONTEXT=""
HISTORY_FILE="${HOME}/.ai_history"
if [[ -f "$CURRENT_PROJECT_FILE" ]]; then
local project_path
project_path=$(cat "$CURRENT_PROJECT_FILE")
if [[ -n "$project_path" && -d "$project_path" ]]; then
PROJECT_DIR_CURRENT="$project_path"
PROJECT_NAME_CURRENT=$(basename "$project_path")
HISTORY_FILE="${PROJECT_DIR_CURRENT}/.ai_history"
[[ -f "${PROJECT_DIR_CURRENT}/.ai_config" ]] && PROJECT_CONFIG=$(cat "${PROJECT_DIR_CURRENT}/.ai_config")
[[ -f "${PROJECT_DIR_CURRENT}/.ai_context" ]] && PROJECT_CONTEXT=$(cat "${PROJECT_DIR_CURRENT}/.ai_context")
while IFS='=' read -r key value; do
[[ "$key" =~ ^[[:space:]]*# ]] && continue
[[ -z "$key" ]] && continue
key=$(echo "$key" | tr -d ' ')
value=$(echo "$value" | tr -d '"' | tr -d "'")
export "$key=$value"
done < <(grep '=' "${PROJECT_DIR_CURRENT}/.ai_config" 2>/dev/null)
fi
fi
}
restore_last_project() { load_config; }
detect_project_switch() {
local input="$1"
echo "$input" | grep -qiE "プロジェクト.*(切|替|変)|switch.*project|project.*switch"
}
# ============================================================
# System Prompt
# ============================================================
build_system_prompt() {
cat << SYSPROMPT
You are a mobile development support AI assistant.
## [Configuration File] Project Information
- Project Name: ${PROJECT_NAME_CURRENT:-Not selected}
- Project Directory: ${PROJECT_DIR_CURRENT:-Not set}
- History File: ${HISTORY_FILE}
## [Configuration File] .ai_config
${PROJECT_CONFIG:-No settings}
## [Configuration File] .ai_context
${PROJECT_CONTEXT:-No context}
## Command Execution Rules (Most important, strict observance)
- You must execute git, curl, npm, cp, mv, etc., by yourself in COMMAND format.
- "Please execute it yourself" is absolutely forbidden.
- Do not use the cd command alone; use the format "cd /path && next command".
## Response Format (Output TYPE line first)
TYPE: COMMAND / SEARCH / IMAGE / NEED_INPUT / DONE / NEXT
SYSPROMPT
}
build_continuation_prompt() {
local original_task="$1"
local last_result="$2"
cat << CONTPROMPT
## [Original Task]
${original_task}
## [Previous Execution Result]
${last_result}
Is the task complete? Explain the result clearly and then decide on continuation.
Response Format: TYPE: DONE / NEXT / NEED_INPUT
CONTPROMPT
}
# ============================================================
# Gemini API Call
# ============================================================
_call_gemini_with_prompt() {
local user_message="$1"
local system_prompt history_content payload response text
system_prompt=$(build_system_prompt)
[[ -f "$HISTORY_FILE" ]] && history_content=$(tail -50 "$HISTORY_FILE")
payload=$(python3 -c "
import json, sys
sp = sys.argv[1]; msg = sys.argv[2]; hist = sys.argv[3]
full_sys = sp + '\n\n## [Past Conversation History]\n' + (hist if hist else 'No history') + '\n---End of past history---'
print(json.dumps({'system_instruction':{'parts':[{'text':full_sys}]},'contents':[{'role':'user','parts':[{'text':msg}]}],'generationConfig':{'temperature':0.7,'maxOutputTokens':2048}}))
" "$system_prompt" "$user_message" "$history_content" 2>/dev/null)
[[ -z "$payload" ]] && echo "ERROR: payload build failed" >&2 && return 1
echo "$payload" > "$PAYLOAD_FILE"
response=$(curl -s -X POST "${API_URL}?key=${GOOGLE_API_KEY}" \
-H "Content-Type: application/json" -d @"$PAYLOAD_FILE" 2>/dev/null)
[[ -z "$response" ]] && echo "ERROR: empty response" >&2 && return 1
text=$(echo "$response" | python3 -c "
import json,sys
try:
d=json.load(sys.stdin); print(d['candidates'][0]['content']['parts'][0]['text'])
except Exception as e:
print('ERROR:'+str(e),file=sys.stderr); sys.exit(1)
" 2>/dev/null)
[[ -z "$text" ]] && echo "ERROR: text extract failed" >&2 && return 1
echo "$text"
}
call_gemini() {
_call_gemini_with_prompt "$(printf '## [Latest Instruction]\n%s' "$1")"
}
call_gemini_continuation() {
_call_gemini_with_prompt "$(build_continuation_prompt "$1" "$2")"
}
# ============================================================
# History Management
# ============================================================
save_history() {
printf "[%s] %s: %s\n" "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" >> "$HISTORY_FILE"
local lc; lc=$(wc -l < "$HISTORY_FILE" 2>/dev/null || echo 0)
[[ "$lc" -gt 200 ]] && tail -100 "$HISTORY_FILE" > "${HISTORY_FILE}.tmp" && mv "${HISTORY_FILE}.tmp" "$HISTORY_FILE"
}
# ============================================================
# Web Search
# ============================================================
search_web() {
local query="$1" results=""
local enc; enc=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$query" 2>/dev/null)
[[ -n "$BRAVE_API_KEY" ]] && results=$(curl -s \
-H "X-Subscription-Token: ${BRAVE_API_KEY}" \
"https://api.search.brave.com/res/v1/web/search?q=${enc}&count=3" 2>/dev/null | \
python3 -c "
import json,sys
try:
d=json.load(sys.stdin)
for r in d.get('web',{}).get('results',[])[:3]:
print(f'Title: {r.get(\"title\",\"\")}\nURL: {r.get(\"url\",\"\")}\nDescription: {r.get(\"description\",\"\")}\n---')
except: pass
" 2>/dev/null)
[[ -z "$results" ]] && results=$(curl -s "https://html.duckduckgo.com/html/?q=${enc}" 2>/dev/null | \
grep -o '<a class="result__a"[^>]*>[^<]*</a>' | head -3 | sed 's/<[^>]*>//g')
echo "${results:-No search results}"
}
# ============================================================
# Image Analysis
# ============================================================
handle_image_request() {
local image_path="$1"
[[ "$image_path" == "latest" || -z "$image_path" ]] && \
image_path=$(ls -t "$SCREENSHOT_DIR"/*.png "$SCREENSHOT_DIR"/*.jpg 2>/dev/null | head -1)
[[ -z "$image_path" || ! -f "$image_path" ]] && echo "Image file not found" && return 1
python3 "$IMAGE_SCRIPT" "$image_path" "Please describe this image in detail" 2>/dev/null || echo "Image analysis error"
}
# ============================================================
# Command Execution
# ============================================================
execute_command() {
local cmd="$1"
local dangerous_patterns=("rm -rf /" "mkfs" "dd if=" ":(){ :|:& };" "> /dev/sda")
for pattern in "${dangerous_patterns[@]}"; do
echo "$cmd" | grep -qF "$pattern" && echo "⚠️ Dangerous command blocked: $cmd" && return 1
done
if echo "$cmd" | grep -qE "^rm "; then
echo -n "⚠️ Do you want to execute a delete command? ($cmd) [y/N]: "
read -r confirm
[[ "$confirm" != "y" && "$confirm" != "Y" ]] && echo "Canceled" && return 1
fi
echo "🔧 Executing: $cmd"
local result exit_code
result=$(eval "$cmd" 2>&1); exit_code=$?
echo "$result"; echo "Exit code: $exit_code"
printf "%s\nExit code: %d" "$result" "$exit_code" > "$TASK_RESULT_FILE"
return $exit_code
}
# ============================================================
# Response Parsing / Dispatch
# ============================================================
dispatch_response() {
local response="$1"
local task_type task_content
task_type=$(echo "$response" | grep "^TYPE:" | head -1 | sed 's/^TYPE: *//' | tr -d '\r')
task_content=$(echo "$response" | grep -v "^TYPE:" | sed '/^[[:space:]]*$/d' | head -1)
echo "📌 [DEBUG] task_type=[$task_type]"
case "$task_type" in
COMMAND)
save_history "AI" "COMMAND execution: $task_content"
execute_command "$task_content"
save_history "SYSTEM" "Execution result: $(cat "$TASK_RESULT_FILE" 2>/dev/null | head -c 300)"
return 3 ;;
SEARCH)
echo "🔍 Searching: $task_content"
local results; results=$(search_web "$task_content")
echo "$results"; printf "%s" "$results" > "$TASK_RESULT_FILE"
save_history "SYSTEM" "Search results: ${results:0:300}"
return 3 ;;
IMAGE)
local img_result; img_result=$(handle_image_request "$task_content")
echo "$img_result"; printf "%s" "$img_result" > "$TASK_RESULT_FILE"
return 3 ;;
NEED_INPUT)
echo "❓ $task_content"; echo -n "Answer: "; read -r user_answer
save_history "USER" "$user_answer"
local new_response; new_response=$(call_gemini "$user_answer")
dispatch_response "$new_response"; return $? ;;
DONE)
echo "✅ $task_content"; save_history "AI" "Completed: $task_content"; return 0 ;;
NEXT)
echo "⏭️ Next: $task_content"
printf "%s" "$task_content" > "$TASK_RESULT_FILE"; return 3 ;;
*)
echo "💬 $response"; save_history "AI" "${response:0:200}"; return 0 ;;
esac
}
# ============================================================
# Main Task Loop
# ============================================================
execute_task() {
local user_input="$1" max_iterations=10 iteration=0
local original_task="$user_input" current_input="$user_input"
save_history "USER" "$user_input"
detect_project_switch "$user_input" && echo "🔄 Project switch detected..." && load_config
while [[ $iteration -lt $max_iterations ]]; do
((iteration++))
echo ""; echo "═══ Step $iteration ═══"
echo "🌐 ①Analyzing with Gemini..."
local response; response=$(call_gemini "$current_input")
[[ -z "$response" ]] && echo "❌ ①Gemini error" && return 1
dispatch_response "$response"
local action_status=$?
case $action_status in
0) echo "✅ Task completed"; return 0 ;;
1) echo "❌ Error"; return 1 ;;
3)
echo ""; echo "🌐 ②Gemini continuing judgment..."
local last_result; last_result=$(cat "$TASK_RESULT_FILE" 2>/dev/null || echo "No result")
local cont_response; cont_response=$(call_gemini_continuation "$original_task" "$last_result")
[[ -z "$cont_response" ]] && echo "❌ ②Gemini error" && return 1
local cont_type cont_content
cont_type=$(echo "$cont_response" | grep "^TYPE:" | head -1 | sed 's/^TYPE: *//' | tr -d '\r')
cont_content=$(echo "$cont_response" | grep -v "^TYPE:" | sed '/^[[:space:]]*$/d' | head -1)
echo ""; echo "── ②Gemini judgment: [$cont_type] ──"; echo "$cont_content"; echo ""
echo -n "Continue? [y=continue / n=exit / message=add instruction]: "; read -r user_choice
case "$user_choice" in
n|N) echo "Interrupted"; return 0 ;;
""|y|Y)
case "$cont_type" in
DONE) echo "✅ Completed: $cont_content"; save_history "AI" "Completed: $cont_content"; return 0 ;;
NEXT|COMMAND|SEARCH|IMAGE) current_input="$cont_content" ;;
NEED_INPUT)
echo "❓ $cont_content"; echo -n "Answer: "; read -r user_answer
save_history "USER" "$user_answer"; current_input="$user_answer" ;;
*) echo "💬 $cont_response"; return 0 ;;
esac ;;
*) save_history "USER" "$user_choice"; current_input="$user_choice" ;;
esac ;;
esac
done
echo "⚠️ Max steps ($max_iterations) reached"; return 0
}
# ============================================================
# Chat Mode
# ============================================================
chat_mode() {
echo "🤖 AI Assistant v1.08 | Project: ${PROJECT_NAME_CURRENT:-Not selected}"
echo "exit=Exit / clear=Clear history / history=Display history / project <dir>=Switch project"
echo "════════════════════════════════════════"
while true; do
echo ""; echo -n "You: "; read -r user_input
case "$user_input" in
exit|quit) echo "Exiting"; break ;;
clear) > "$HISTORY_FILE"; echo "✅ History cleared (${HISTORY_FILE})" ;;
history) cat "$HISTORY_FILE" 2>/dev/null || echo "No history" ;;
project\ *)
local np="${user_input#project }"
if [[ -d "$np" ]]; then echo "$np" > "$CURRENT_PROJECT_FILE"; load_config
echo "✅ Project switched: $PROJECT_NAME_CURRENT | History: $HISTORY_FILE"
else echo "❌ Directory not found: $np"; fi ;;
"") continue ;;
*) execute_task "$user_input" ;;
esac
done
}
# ============================================================
# Image Script Generation
# ============================================================
generate_image_script() {
cat > "$IMAGE_SCRIPT" << 'IMGSCRIPT'
#!/usr/bin/env python3
import sys, base64, json, urllib.request, os
def analyze_image(path, inst="Please describe this image in detail"):
key = os.environ.get("GOOGLE_API_KEY","")
if not key: print("GOOGLE_API_KEY not set"); return
with open(path,"rb") as f: img=base64.b64encode(f.read()).decode()
ext=path.rsplit(".",1)[-1].lower()
mime={"jpg":"image/jpeg","jpeg":"image/jpeg","png":"image/png","gif":"image/gif","webp":"image/webp"}.get(ext,"image/jpeg")
payload={"contents":[{"parts":[{"inline_data":{"mime_type":mime,"data":img}},{"text":inst}]}],"generationConfig":{"temperature":0.7,"maxOutputTokens":1024}}
url=f"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent?key={key}"
req=urllib.request.Request(url,data=json.dumps(payload).encode(),headers={"Content-Type":"application/json"})
try:
with urllib.request.urlopen(req) as r: print(json.loads(r.read())["candidates"][0]["content"]["parts"][0]["text"])
except Exception as e: print(f"Error: {e}")
if __name__=="__main__":
if len(sys.argv)<2: print("Usage: ai_image.py <path> [instruction]"); sys.exit(1)
analyze_image(sys.argv[1], sys.argv[2] if len(sys.argv)>2 else "Please describe this image in detail")
IMGSCRIPT
chmod +x "$IMAGE_SCRIPT"
}
# ============================================================
# Main
# ============================================================
main() {
[[ -z "$GOOGLE_API_KEY" ]] && echo "❌ GOOGLE_API_KEY not set" && exit 1
mkdir -p "$TMP_DIR"
[[ ! -f "$IMAGE_SCRIPT" ]] && generate_image_script
restore_last_project
[[ $# -gt 0 ]] && execute_task "$*" || chat_mode
}
main "$@"
9. Summary
Seeking a smartphone-only development environment, I arrived at the combination of Termux × Gemini API. While its uses are limited for the effort invested, once you experience "operating GitHub in natural language from a smartphone terminal," you won't want to let it go.
- Minimal AI agent operating only with bash + python3 + curl
- Ensures autonomy through a two-call Gemini design (① analysis, ② result judgment)
- Separates settings and history per project (.ai_config / .ai_history)
- Achieves automatic deployment with GitHub Actions even without Wrangler
This script is not exclusive to Termux; it will also run on regular Linux/Mac by simply changing the SCREENSHOT_DIR path. I hope it serves as a reference for those who want to easily try the Gemini API without a framework.
📚 Related Articles
- External Tools Frequently Used with Genspark (GitHub / Cloudflare / Cron)
- Auto-generating CLI Tools with Genspark: Practical Automation Techniques to Eliminate Repetitive Tasks
- The Fear of Code Loss: The Importance of Git Version Control in AI Development
- Is Claude Code Still Too Early? How I Built a "Blog System" with Free Genspark
- Is Genspark Full of Bugs? The "Seven Hells" Developers Faced and Workarounds to Overcome Them
📅 2026/02/20 Update: The Official Gemini CLI Finally Works!
After completing my custom script, I excitedly tried to register this article in the database, but I hit a major wall. The HTML file size was too large, and my custom script couldn't handle it, failing to register the article... As expected, there were limits to a makeshift tool.
I felt frustrated, thinking "Is this the limit for smartphone development?", but I persistently asked Genspark again about "how to run Gemini CLI on Termux". Surprisingly, a solution was presented!
npm install -g @google/gemini-cli --ignore-scripts
With this magic option, the installation succeeded! When I launched it and started chatting, though, the model shown in the bottom-right corner was "Gemini 2.5," and its answers were slightly disappointing. When I asked "Can I change the model to 3?", it told me there was no direct command, but that setting general.previewFeatures to true in the configuration might work.
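For reference, my understanding is that Gemini CLI reads its settings from ~/.gemini/settings.json (the path is my assumption; check your own install), so the change amounts to a fragment like this:

```json
{
  "general": {
    "previewFeatures": true
  }
}
```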
After changing the setting and restarting as instructed, **Gemini 3 (Flash Preview)** launched successfully! It's far smarter than my custom script and handles large files effortlessly.
In fact, the registration of this article to the D1 database is being done from my smartphone using this **Gemini CLI**. The database registration was completed successfully, and I have finally obtained my ideal smartphone development environment!