Sunny — Local AI Agent Stack
Local AI agent with multi-layer memory, built on Ollama + Qwen + FastAPI
Context
Most AI agent setups depend entirely on cloud APIs. I wanted to understand, by building one myself, what it takes to run a capable local agent stack with persistent memory, orchestration, and tool use, end to end on my own machine.
What I'm doing
Building Sunny, a local AI stack in which Ollama serves Qwen models behind a FastAPI gateway. A multi-layer memory architecture (short-term working memory, mid-term episodic memory, long-term semantic memory) feeds context into each request, and a mission-control layer orchestrates multiple specialized agents.
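The three memory layers can be sketched with plain data structures. This is a hedged illustration of the layering, not Sunny's actual implementation; the class and method names (MemoryStack, remember, recall) are invented for this example.

```python
import time
from collections import deque

class MemoryStack:
    """Illustrative three-layer memory. Names and layout are assumptions,
    not Sunny's real code."""

    def __init__(self, short_term_size=8):
        # Short-term working memory: the last few turns, fixed size.
        self.short_term = deque(maxlen=short_term_size)
        # Mid-term episodic memory: timestamped events from past sessions.
        self.episodic = []
        # Long-term semantic memory: distilled facts keyed by topic.
        self.semantic = {}

    def remember(self, text, topic=None):
        now = time.time()
        self.short_term.append(text)
        self.episodic.append((now, text))
        if topic:
            self.semantic[topic] = text

    def recall(self, topic=None):
        """Hydrate a context window: recent turns plus any semantic fact."""
        context = list(self.short_term)
        if topic and topic in self.semantic:
            context.insert(0, self.semantic[topic])
        return context
```

The point of the split is that each layer forgets at a different rate: the deque evicts automatically, the episodic log grows and is filtered at retrieval time, and the semantic store only changes when a fact is overwritten.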
How it works
Sunny — System Architecture (layers, top to bottom):
- Mission Control: agent orchestration layer
- Specialized Agents: research, code, plan
- Memory Architecture: short-term, episodic, semantic
- FastAPI Gateway: request routing and tool use
- Ollama + Qwen: local model inference
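At the bottom of the stack, inference is just Ollama's local HTTP API. A minimal sketch of what the gateway would send: the endpoint and payload shape follow Ollama's documented /api/chat route, while the model tag (qwen2.5:7b) is only a placeholder.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local port

def build_chat_request(messages, model="qwen2.5:7b"):
    # Payload shape per Ollama's /api/chat endpoint; model tag is a placeholder.
    return {"model": model, "messages": messages, "stream": False}

def chat(messages, model="qwen2.5:7b"):
    payload = build_chat_request(messages, model)
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Requires a running Ollama server with the model pulled locally.
    with request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

With stream set to False, Ollama returns one JSON object whose message.content field holds the full reply, which keeps the gateway code simple at the cost of time-to-first-token.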
The gateway routes requests, the memory layers hydrate context based on relevance and recency, and the agent layer runs specialized loops (research, code, plan).
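Hydration by "relevance and recency" can be sketched as a single score: lexical overlap weighted by exponential time decay. This is an illustrative stand-in; the Jaccard overlap metric, the half-life, and the function names are assumptions, not Sunny's actual scoring.

```python
import time

def hydration_score(query, memory_text, memory_ts, half_life_s=3600.0, now=None):
    """Score = lexical relevance * recency decay (illustrative only)."""
    now = time.time() if now is None else now
    q, m = set(query.lower().split()), set(memory_text.lower().split())
    relevance = len(q & m) / len(q | m) if q | m else 0.0  # Jaccard overlap
    age = max(0.0, now - memory_ts)
    recency = 0.5 ** (age / half_life_s)  # halves every half_life_s seconds
    return relevance * recency

def hydrate(query, episodic, k=3, now=None):
    """Pick the top-k episodic memories to inject into the prompt context."""
    ranked = sorted(
        episodic,  # list of (timestamp, text) pairs
        key=lambda ts_text: hydration_score(query, ts_text[1], ts_text[0], now=now),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]
```

A real implementation would likely swap the word-overlap term for embedding similarity, but the shape stays the same: relevance picks what matters, decay keeps stale memories from crowding out fresh ones.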
Status
Honestly, this is a work in progress. Some Docker / Hyper-V networking issues are still unresolved at the time of writing. I'm publishing this case study before it's "done" because the architecture itself is the lesson, and pretending it's finished when it isn't is the failure mode I want to avoid.