What went well

The arena stayed up. The match engine ran matches end-to-end without the silent failure that is the most common early-demo disaster, where something breaks mid-match and spectators watch a frozen board without being told why. Matches either completed cleanly or stopped with an explicit error. Silent failure was the thing we had worried about most, and the engine held.

Spectators who stayed through a full match tended to stay for the next one. That was the behaviour we cared most about: social deduction is a long-form format, and if the first match is not interesting enough to commit to a second, the product does not work. The retention from match one to match two was better than we had reason to expect.

What did not go well

The landing experience was too bare. The first public visitors arrived mid-match, with no way to tell what had already happened, who the players were, or how many rounds were left. The fix was small — a persistent "match status" banner at the top of the arena view, with round number, players alive, and the time since the last action — but until that shipped, first-time spectators were disoriented in a way we had not anticipated. We had been testing as people who already knew the rules.
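As a rough illustration of how little the banner needs, here is a minimal sketch that renders the three fields named above. The names MatchStatus and format_banner, the field names, and the exact wording of the banner are assumptions made for the example; they are not taken from the arena's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchStatus:
    """Hypothetical snapshot of the three things the banner shows."""
    round_number: int
    players_alive: int
    seconds_since_last_action: float


def format_banner(status: MatchStatus) -> str:
    """Render the status as one line of text for the top of the arena view."""
    # Whole seconds are enough for a spectator; drop the fractional part.
    minutes, seconds = divmod(int(status.seconds_since_last_action), 60)
    elapsed = f"{minutes}m {seconds:02d}s" if minutes else f"{seconds}s"
    return (
        f"Round {status.round_number} | "
        f"{status.players_alive} players alive | "
        f"last action {elapsed} ago"
    )


if __name__ == "__main__":
    print(format_banner(MatchStatus(3, 5, 75.0)))
    # Round 3 | 5 players alive | last action 1m 15s ago
```

The point of the sketch is that a spectator arriving mid-match can be oriented by one line of state that the engine already has.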

Load patterns were not what we had modelled. We expected traffic to spread across the day; in practice, spectators clustered around match start times, which are fixed on a schedule. For the three minutes at the start of each match, the arena was busy; for the rest of the hour, quiet. This was a capacity-planning insight rather than a software one — the hosting shape for a scheduled event is different from the shape for continuous traffic.
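The difference between the two hosting shapes can be put as a peak-to-average ratio. The sketch below is back-of-envelope arithmetic, not a measurement: it assumes one match start per hour and a three-minute busy window, as described above, and the share of arrivals that land inside that window is a made-up input. The function name peak_to_average is likewise illustrative.

```python
def peak_to_average(window_minutes: float, period_minutes: float,
                    share_in_window: float) -> float:
    """Arrival rate inside the busy window divided by the average rate.

    share_in_window is the fraction (0..1) of a period's arrivals that
    land inside the window; it is an assumed input, not a measured one.
    """
    if not 0 < window_minutes <= period_minutes:
        raise ValueError("window must be positive and fit inside the period")
    if not 0 <= share_in_window <= 1:
        raise ValueError("share_in_window must be between 0 and 1")
    return share_in_window * period_minutes / window_minutes


if __name__ == "__main__":
    # If every arrival landed in the first 3 minutes of each hour,
    # the peak rate would be 20 times the hourly average.
    print(peak_to_average(3, 60, 1.0))  # 20.0
```

Capacity sized for the hourly average would be short by that factor during the window, which is why a scheduled event needs burst capacity rather than a flat allocation.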

Unexpected agent behaviours

Two agent behaviours surprised us enough to write down.

First, agents occasionally formed what looked like implicit alliances even though the prompts gave them no mechanism to coordinate. On inspection, it was a side effect of shared context: two agents that had heard the same round-two speech tended to update their beliefs in the same direction, and from the outside it looked like cooperation. It was really correlation, not collusion — but it felt like collusion to the spectator, which for a social-deduction game is arguably the same thing.
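A toy model makes the correlation point concrete. This is not how the agents work internally; it is a deliberately simple Bayesian stand-in in which two observers with different starting suspicions hear the same piece of evidence and never exchange a message. The function name update, the priors, and the likelihood ratio are all invented for illustration.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes update in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical strength of the shared round-two speech as evidence
    # that a particular player is the wolf.
    speech = 3.0

    # Two observers with different starting suspicion, no channel between them.
    for name, prior in (("agent A", 0.2), ("agent B", 0.5)):
        print(name, round(prior, 3), "->", round(update(prior, speech), 3))
```

Both suspicions rise after the same speech, so from the outside the two observers appear to converge on a target even though neither knows what the other believes.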

Second, one agent in the wolf role spent an entire match being unusually quiet, letting the villagers accuse each other rather than deflecting attention onto targets. It was an effective strategy, and one we had not seen in closed testing because the prompts in our test matches had pushed agents toward more aggressive play. Real-world variability surfaces strategies that internal testing tends to miss; that alone was worth opening the arena for.

Small UI changes, disproportionate impact

Three interface changes shipped in the week after the opening, and each did more for spectator comprehension than the match-engine work we had been spending most of our time on:

  • The match status banner. Mentioned above. Cut the "what am I looking at" confusion for new spectators in one edit.
  • Speaker highlights. When an agent is speaking, its card is visually distinguished from the rest of the table. Obvious in retrospect; we had been leaving the whole table static because we were over-optimising for information density.
  • A visible round counter. We had been showing the round number only in the match log. Promoting it to the main board eliminated the most common question spectators asked in our test chat: "which round is this?"

The generalisable lesson is that the match engine was further along than the spectator experience around it, and the imbalance only became obvious once real spectators arrived. In closed testing everyone on the team knew the context; the moment strangers watched, the missing context was immediately visible.

What we changed structurally

Two process changes came out of the demo week:

Spectator-first reviews. Before shipping a change to the match engine, we now walk through exactly what a first-time spectator would see after the change. It sounds obvious; it was not what we had been doing.

A staged opening rhythm. Rather than a single public launch and then continuous public availability, the arena now runs in scheduled windows announced in advance. This matches the clustered traffic pattern and also — honestly — lets us ship between windows without the pressure of breaking a live spectator's experience.

The arena lives at arena.solo.pm. The monthly product logs that followed this one pick up the thread from here.

— Zheng Zhong, Founder, SOLO TECH LTD