itschijong//v.04
Tooling / Infrastructure · Active

Sunny — Local AI Agent Stack

Local AI agent with multi-layer memory, built on Ollama + Qwen + FastAPI

Solo build · 2025 · Active
Ollama · Qwen · FastAPI · Docker · Multi-layer memory architecture

Context

Most AI agent setups depend entirely on cloud APIs. I wanted to understand — by building it — what it takes to run a capable local agent stack with persistent memory, orchestration, and tool use, end to end on my own machine.

What I'm doing

Building Sunny, a local AI stack in which Ollama serves Qwen models behind a FastAPI gateway, backed by a multi-layer memory architecture (short-term working memory, mid-term episodic memory, long-term semantic memory) and a mission-control layer that orchestrates multiple specialized agents.
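The three memory layers can be sketched as a single container with different retention rules per layer. This is a minimal illustration, not Sunny's actual implementation; the class and method names are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
import time

@dataclass
class MemoryStack:
    """Illustrative sketch of a three-layer memory: names are hypothetical."""
    # Short-term working memory: bounded, only the most recent turns survive
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))
    # Mid-term episodic memory: timestamped events, grows per session
    episodic: list = field(default_factory=list)
    # Long-term semantic memory: distilled facts keyed by topic
    semantic: dict = field(default_factory=dict)

    def observe(self, turn: str) -> None:
        """Record a conversation turn in working and episodic memory."""
        self.short_term.append(turn)
        self.episodic.append({"t": time.time(), "event": turn})

    def learn(self, topic: str, fact: str) -> None:
        """Promote a distilled fact into long-term semantic memory."""
        self.semantic[topic] = fact

mem = MemoryStack()
mem.observe("user asked about Docker networking")
mem.learn("docker", "containers run behind Hyper-V NAT on Windows")
```

The key design point is that each layer has its own eviction policy: the deque's `maxlen` silently drops old working memory, while episodic and semantic layers persist.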

How it works

Sunny — System Architecture

1. Mission Control: agent orchestration layer
2. Specialized Agents: research · code · plan
3. Memory Architecture: short-term · episodic · semantic
4. FastAPI Gateway: request routing + tool use
5. Ollama + Qwen: local model inference

The gateway routes requests, the memory layers hydrate context based on relevance and recency, and the agent layer runs specialized loops (research, code, plan).
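"Relevance and recency" hydration can be sketched as a simple ranking: term overlap with the query, decayed exponentially by age. The scoring formula below is illustrative, not Sunny's actual one.

```python
import math
import time

def hydrate(query_terms: set, entries: list, now: float, k: int = 3) -> list:
    """Return the top-k memory entries by relevance decayed by recency.

    relevance = number of query terms appearing in the entry text;
    recency   = exponential decay with a ~24-hour time constant.
    """
    def score(entry: dict) -> float:
        overlap = len(query_terms & set(entry["text"].lower().split()))
        age_hours = (now - entry["t"]) / 3600
        return overlap * math.exp(-age_hours / 24)

    return sorted(entries, key=score, reverse=True)[:k]

now = time.time()
entries = [
    {"t": now - 3600,  "text": "docker bridge network broke under hyper-v"},
    {"t": now - 90000, "text": "qwen model pulled via ollama"},
    {"t": now - 60,    "text": "fastapi gateway routes tool calls"},
]
top = hydrate({"docker", "network"}, entries, now, k=2)
```

Here the hour-old Docker entry outranks everything else because it matches both query terms and is still recent.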

Status

Honestly, this is a work in progress. Some Docker / Hyper-V networking issues are still unresolved at the time of writing. I'm publishing this case study before it's "done" because the architecture itself is the lesson, and pretending it's finished when it isn't is the failure mode I want to avoid.