itschijong//v.04
Tooling / Infrastructure · Active

Sunny — Local AI Agent Stack

Local AI agent with multi-layer memory, built on Ollama + Qwen + FastAPI

Solo build · 2025 · Active
Ollama · Qwen · FastAPI · Docker · Multi-layer memory architecture

Context

Most AI agent setups depend entirely on cloud APIs. I wanted to understand — by building it — what it takes to run a capable local agent stack with persistent memory, orchestration, and tool use, end to end on my own machine.

What I'm doing

Building Sunny, a local AI stack in which Ollama serves Qwen models behind a FastAPI gateway, backed by a multi-layer memory architecture (short-term working memory, mid-term episodic memory, long-term semantic memory) and a mission-control layer that orchestrates multiple specialized agents.
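The three memory layers can be sketched as a single container with different retention rules per layer. This is a minimal illustration, not Sunny's actual implementation; the class and method names are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
import time

@dataclass
class MemoryStack:
    """Illustrative sketch of a three-layer memory: names are hypothetical."""
    # Short-term working memory: bounded, only the most recent turns survive
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))
    # Mid-term episodic memory: timestamped events, grows per session
    episodic: list = field(default_factory=list)
    # Long-term semantic memory: distilled facts keyed by topic
    semantic: dict = field(default_factory=dict)

    def observe(self, turn: str) -> None:
        """Record a conversation turn in working and episodic memory."""
        self.short_term.append(turn)
        self.episodic.append({"t": time.time(), "event": turn})

    def learn(self, topic: str, fact: str) -> None:
        """Promote a distilled fact into long-term semantic memory."""
        self.semantic[topic] = fact

mem = MemoryStack()
mem.observe("user asked about Docker networking")
mem.learn("docker", "containers run behind Hyper-V NAT on Windows")
```

The key design point is that each layer has its own eviction policy: the deque's `maxlen` silently drops old working memory, while episodic and semantic layers persist.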

How it works

Sunny — System Architecture

1. Mission Control: agent orchestration layer
2. Specialized Agents: research · code · plan
3. Memory Architecture: short-term · episodic · semantic
4. FastAPI Gateway: request routing + tool use
5. Ollama + Qwen: local model inference

The gateway routes requests, the memory layers hydrate context based on relevance and recency, and the agent layer runs specialized loops (research, code, plan).
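"Relevance and recency" hydration can be sketched as a simple ranking: term overlap with the query, decayed exponentially by age. The scoring formula below is illustrative, not Sunny's actual one.

```python
import math
import time

def hydrate(query_terms: set, entries: list, now: float, k: int = 3) -> list:
    """Return the top-k memory entries by relevance decayed by recency.

    relevance = number of query terms appearing in the entry text;
    recency   = exponential decay with a ~24-hour time constant.
    """
    def score(entry: dict) -> float:
        overlap = len(query_terms & set(entry["text"].lower().split()))
        age_hours = (now - entry["t"]) / 3600
        return overlap * math.exp(-age_hours / 24)

    return sorted(entries, key=score, reverse=True)[:k]

now = time.time()
entries = [
    {"t": now - 3600,  "text": "docker bridge network broke under hyper-v"},
    {"t": now - 90000, "text": "qwen model pulled via ollama"},
    {"t": now - 60,    "text": "fastapi gateway routes tool calls"},
]
top = hydrate({"docker", "network"}, entries, now, k=2)
```

Here the hour-old Docker entry outranks everything else because it matches both query terms and is still recent.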

Status

Honestly, this is a work in progress. Some Docker / Hyper-V networking issues are still unresolved at the time of writing. I'm publishing this case study before it's "done" because the architecture itself is the lesson, and pretending it's finished when it isn't is the failure mode I want to avoid.