At a glance
- Client: SOLO TECH LTD (in-house, self-funded product — published as proof of delivery).
- Role: end-to-end — product, architecture, implementation, deployment, and ongoing operation.
- Team size: one founder, supported by continuous code-review and testing with AI coding assistants.
- Timeline: company incorporated 11 March 2025; first public demo 2 September 2025; product has been running continuously since, with monthly product updates published on Insights.
- Live at: arena.solo.pm (free to spectate; no account required).
- Stack: Next.js 16 (App Router) · TypeScript (strict) · React 19 · Tailwind CSS v4 · Radix UI · Supabase (PostgreSQL + Auth) · Server-Sent Events for the spectator stream · Fly.io for the long-running orchestrator process · a custom multi-vendor LLM router built on native fetch.
The problem we set out to solve
We wanted a working product that would exercise every hard part of running multi-agent AI systems in production — and do it in the open, at a cost a small consultancy can actually sustain. Social deduction was the obvious domain: a dozen agents talking at each other is a worst-case for prompt design, latency, and cost in the same way that a customer support desk is — except with adversarial incentives and a clear, measurable win condition.
The constraints we built around were deliberately small-business constraints, not enterprise ones:
- No platform team. One founder had to be able to own the orchestrator, the prompt system, the frontend, the database, and the incident pager.
- Predictable per-match cost. A hobbyist-scale product that burns through a credit card in a week is not a product — it is a bad month.
- Observable in real time. We needed to watch matches live without bolting on a separate analytics stack we would then have to operate.
- Learnable across matches. An arena where nothing carries over from match to match is a toy; we wanted agents to accumulate lessons.
Approach — the architectural bets
A layered prompt system, cached at the right boundaries
Every agent call assembles a prompt in seven layers — rules and terminology at the bottom, short-lived situational context at the top. We pushed the cache breakpoints to the boundaries where invalidation is predictable: long-lived layers (game rules, terminology, board configuration) get hour-scale caching; mid-lived layers (role information for this particular match) get shorter TTLs; everything above that is rebuilt on every call. The effect is that the stable part of each prompt is paid for once per hour, not once per turn, and the variable part stays small.
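The layering can be sketched in a few lines of TypeScript. This is a minimal illustration, not the production code: the layer names, the three stability classes, and the `assemblePrompt` helper are all illustrative. The one real idea it carries is that layers are ordered most-stable first, so a provider-side prefix cache covers the long-lived part, and a breakpoint is recorded wherever the stability class changes.

```typescript
// Illustrative stability classes: hour-scale (rules, board), match-scale
// (role info), and per-turn (situational context). Names are assumptions.
type Stability = "hour" | "match" | "turn";

interface PromptLayer {
  name: string;
  stability: Stability; // how long this layer stays byte-identical
  content: string;
}

// Order layers most-stable first, then mark a cache breakpoint at every
// boundary where the stability class changes. Everything before a
// breakpoint is safe to cache for at least that class's lifetime.
function assemblePrompt(layers: PromptLayer[]): {
  text: string;
  breakpoints: number[];
} {
  const order: Stability[] = ["hour", "match", "turn"];
  const sorted = [...layers].sort(
    (a, b) => order.indexOf(a.stability) - order.indexOf(b.stability),
  );
  const breakpoints: number[] = [];
  let text = "";
  let prev: Stability | null = null;
  for (const layer of sorted) {
    if (prev !== null && layer.stability !== prev) {
      breakpoints.push(text.length); // cache boundary: prefix is more stable
    }
    text += layer.content + "\n";
    prev = layer.stability;
  }
  return { text, breakpoints };
}
```

With one layer of each class, the assembled prompt begins with the hour-scale content and carries two breakpoints — one after the hour-scale prefix, one after the match-scale middle.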
A multi-vendor LLM router, not an SDK wrapper
The router speaks three API formats directly — OpenAI-compatible chat completions, Anthropic messages, and streaming Responses endpoints — on top of a three-provider fallback chain. No vendor SDK, no dependency on any one provider's client library, and no cold-start penalty from loading code we would not use. When a provider has an outage or a regional throttle, the next one in the chain picks up the call and the match keeps moving.
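The fallback chain itself reduces to a small loop. The sketch below is an assumption about shape, not a copy of the router: provider names and the adapter interface are illustrative, and each adapter would wrap a plain `fetch` call to its vendor's wire format. What it shows is the core behaviour — walk the chain, record each failure, and only give up when every provider has failed.

```typescript
// Illustrative provider adapter: each one translates a common request into
// its vendor's API format and throws on outage or throttle.
interface Provider {
  name: string;
  call: (prompt: string) => Promise<string>;
}

// Walk the chain in order; the first provider that answers wins.
// Failures are collected so the eventual error names every provider tried.
async function routeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<string> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return await p.call(prompt);
    } catch (err) {
      errors.push(`${p.name}: ${String(err)}`); // fall through to next provider
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```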
This matters to a consulting engagement as much as it matters here. A client whose AI workflow is married to a single vendor's SDK is one pricing memo away from a rebuild; a client whose workflow targets a thin router is portable by default.
Server-Sent Events for the spectator stream, not WebSockets
Spectators don't vote, don't chat, and don't send input. SSE is a one-way stream from server to client — exactly the shape of the problem — and the browser handles reconnection, backoff, and last-event-ID replay natively. The code on both ends is a fraction of what a WebSocket implementation would need, and the failure modes are easier to reason about. The general rule we apply to client engagements applies here: pick the boring thing that fits, not the clever thing that impresses.
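The wire format is part of why SSE stays small. A frame is just a few text fields terminated by a blank line — the field names (`id`, `event`, `data`) are fixed by the SSE specification, while the payload shape below is an illustrative assumption. Because each frame carries a monotonic id, a reconnecting browser sends it back as `Last-Event-ID` and the server can replay from there.

```typescript
// One SSE frame: id + event name + JSON payload, terminated by a blank
// line. The browser's native EventSource parses this with no client code.
function sseFrame(id: number, event: string, data: unknown): string {
  return `id: ${id}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

On the client, consuming the stream is a one-liner: `new EventSource("/api/stream")` plus an event listener, with reconnection and backoff handled by the browser.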
Long-running orchestrator on Fly.io, not serverless functions
The match loop is a state machine that runs for several minutes and holds in-memory state between turns. That is not a serverless workload — the moment you try to rehydrate an FSM on every function invocation you pay more than you save and lose any timing guarantee the match depends on. We deploy the orchestrator as a long-running process on Fly.io; Supabase holds the durable record once the match ends. The split between "hot in memory during the match" and "persisted after the match" is the cheapest shape we have found for this kind of workload.
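The "hot in memory during the match, persisted after the match" split can be made concrete with a toy state machine. Everything below is a simplification for illustration — the phase names, the turn-limit end condition, and the `persist` hook are assumptions, not the production FSM. The point is structural: state lives in one in-memory object for the whole match, and storage is touched exactly once, at the end.

```typescript
// Illustrative match phases; the real FSM is richer than this.
type Phase = "night" | "day" | "vote" | "ended";

interface MatchState {
  phase: Phase;
  turn: number;
  log: Phase[];
}

const NEXT: Record<Phase, Phase> = {
  night: "day",
  day: "vote",
  vote: "night",
  ended: "ended",
};

// Advance one phase; end the match once the turn limit is exceeded.
function step(state: MatchState, maxTurns: number): MatchState {
  if (state.phase === "ended") return state;
  const phase = NEXT[state.phase];
  const turn = phase === "night" ? state.turn + 1 : state.turn;
  const done = phase === "night" && turn > maxTurns;
  return { phase: done ? "ended" : phase, turn, log: [...state.log, state.phase] };
}

// The match loop holds state in memory; persist fires once, after the end.
// In production the persist hook would be an insert into Postgres/Supabase.
async function runMatch(
  persist: (s: MatchState) => Promise<void>,
  maxTurns = 2,
): Promise<MatchState> {
  let state: MatchState = { phase: "night", turn: 1, log: [] };
  while (state.phase !== "ended") state = step(state, maxTurns);
  await persist(state);
  return state;
}
```

Rehydrating that `state` object from storage on every serverless invocation is exactly the cost the long-running process avoids.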
Post-match reflection, so the agents actually learn
When a match ends, a separate "critic" call reviews what each agent did, generates lessons keyed to the agent's role and strategy, and nudges the agent's internal parameters for next time. Lessons are persisted; agents read their three most important lessons back on subsequent matches. The loop is deliberately simple — lessons in, lessons out, capped at three per prompt — because we would rather ship a legible learning loop than a sophisticated one we cannot debug when it misbehaves.
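The read-back side of that loop fits in one function. The sketch below assumes an importance score on each persisted lesson — the scoring field and function name are illustrative; the role keying and the cap of three per prompt come from the design described above.

```typescript
// A persisted lesson, keyed to the agent's role. The importance score is
// an assumed field used here to rank lessons; the real scheme may differ.
interface Lesson {
  role: string;
  text: string;
  importance: number;
}

// Select the three most important lessons for this role, ready to fold
// back into the agent's prompt on the next match.
function lessonsForPrompt(all: Lesson[], role: string, cap = 3): string[] {
  return all
    .filter((l) => l.role === role)
    .sort((a, b) => b.importance - a.importance)
    .slice(0, cap)
    .map((l) => l.text);
}
```

Because the selection is a filter, a sort, and a cap, a misbehaving lesson is a one-query diagnosis rather than a debugging session — which is the whole argument for keeping the loop legible.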
What shipped, and when
- March 2025 — SOLO TECH LTD incorporated in England & Wales (Companies House no. 16308891); initial product design begins.
- September 2025 — first public demo. Retrospective published: AI LangRenSha — first public demo retrospective.
- February 2026 — memory-window widening, cost controls, and spectator replay. Retrospective published: February arena update.
- Continuous — monthly product logs on Insights, traceable against the live arena.
We are deliberately not publishing user counts, match counts, or cost-per-match figures at this stage of the product. The reason is the same as in the February update: the numbers move faster than a static page can honestly reflect, and a single representative figure would imply more stability than the system currently has. When those numbers do stabilise, they will go on this page, not into a deck.
What this demonstrates for a prospective engagement
If you are reading this as a prospective client, the specific arena features are less important than the posture behind them. Five points apply equally to an AI LangRenSha match and to an AI workflow we would build for your team:
- End-to-end ownership. The same person designs the system, writes the code, deploys it, pages themselves when it breaks, and writes the post-mortem. Nothing is handed to a "platform team" that does not exist.
- Boring infrastructure. Postgres, plain fetch, SSE, a documented state machine. We reach for the clever option only when the boring one is measurably worse.
- Vendor portability. Nothing we build is welded to a single AI provider. If the numbers change, the workflow survives the change.
- Cost as a first-class constraint. We cache at the right boundary, route to a smaller model when a smaller model is enough, and cap the unbounded parts of the prompt. Cost is a design variable, not an afterthought.
- Public documentation. If a reviewer wants to check what we shipped against what we said we shipped, there is a dated public record. We extend the same discipline to every client engagement — written scope up front, written handover at the end.
Verification
Everything in this page is independently checkable:
- The company: Companies House — SOLO TECH LTD, no. 16308891.
- The live product: arena.solo.pm — load the page, open a match, watch it run.
- The editorial trail: six dated posts on Insights between June 2025 and March 2026, four of them directly about the arena.
- The operator: the About page carries full director and registered-office details; the founder's public profile is at linkedin.com/in/solozheng.
If you would like us to do for your team what we have done for this product — design it, ship it, operate it, and hand it over legibly — start a conversation through the contact form. We reply within .
— Zheng Zhong, Founder, SOLO TECH LTD