The Runtime is the AI Infra Advantage

Senpi · May 20, 2026 · 17 min read
Inside Senpi Trading Runtime 1.1.0 -- technical deep dive

Last week we introduced @senpi_ai Runtime 1.1.0: 30× faster reaction time, 75% cheaper trades, 80% lower token costs, agents that learn from every move. This week, a deep dive into how it actually works under the hood. If you’re building autonomous trading agents on any venue, this is the architecture that took us six months of live losses to figure out.

Why the runtime is the moat

Every team building autonomous trading agents in 2026 hits the same wall.

You can take any frontier LLM — Claude Opus, GPT-5, Gemini 2.5 Pro — and prompt it to trade. The model will write plausible-sounding trade theses. It will pick reasonable-looking entries. It will even propose stop-loss levels. Then you put real money behind it and watch it lose, slowly and stupidly, for reasons that have nothing to do with whether the model is “smart.”

Alpha Arena ran the experiment in public earlier this year — eight frontier models, $10K each, two-week competitions on US tech stocks. The portfolio lost about a third of its capital. Only 6 of 32 contests ended in profit. Jay Azhang at Nof1 summarized the finding in one line that we’ve quoted everywhere since:

“LLMs can’t really make money by themselves. You need a very sophisticated harness and scaffolding and data platform in order to even give them a chance.” -- @jay_azhang

Two terms worth distinguishing upfront, because we use both:

  • The runtime: the engine room of an agent. The layer that handles execution, risk, exits, and telemetry deterministically, so the LLM can focus on what it’s actually good at: deciding what to trade and when. Senpi Trading Runtime 1.1.0 is our implementation, shipped May 12. This article is about the runtime.

  • The harness: the full vertical AI product. Runtime + native deployment + native chat + Senpi Model + compounding learning loops. The harness comes online over the next 60 days as we bring the entire system in-house. Covered in the “What’s next” section at the end of this article.

The runtime is the foundation.

Here’s what’s actually in ours. 👇

The architectural commitment: separate the deterministic from the judgmental

The most important design decision in Senpi Runtime 1.1.0 is structural, not technical. It is the separation of concerns between what an LLM is good at and what an LLM is catastrophic at.

The LLM appears in exactly one place in this stack: the entry gate. Every other layer runs on hardened code. The result is an agent that acts with the speed and reliability of a quant system, but decides with the contextual judgment of a frontier model.

This is the opposite of how most autonomous trading agents are built. The default pattern — LLM at the center, orchestrating every operation — fails in production for predictable reasons: 5-second LLM scan loops miss moves, ad-hoc Python risk gates silently crash, exit prices get calculated wrong because the LLM misread a number in a tool response, agents enter the same trade twice because two parallel reasoning threads didn’t know about each other.

The runtime exists to make every one of those failure modes structurally impossible.

Layer 1: The Producer SDK — sub-second scanners with no LLM in the loop

A producer is a Python script that runs on a fixed interval, evaluates a trading thesis, and pushes signals to the runtime. It is where every Senpi strategy embodies its edge: what to trade, when to trade it, and why.

In Runtime 1.0, producers were ad hoc. Every strategy author rebuilt the same five primitives — MCP client, signal envelope, reentrancy lock, cache, batch fan-out — from scratch. Twelve strategies meant twelve different implementations, twelve different failure modes, and no shared pattern for new authors to inherit.

Runtime 1.1.0 ships senpi_runtime_helpers, the canonical Python Producer SDK. One import; no subprocesses; stdlib only.

The core skeleton every producer now follows:

import os

from senpi_runtime_helpers import (
    SenpiClient, scanner_lock, tick_cache, producer_daemon,
)

WALLET = os.environ["GRIZZLY_WALLET"]
SCANNER_NAME = "grizzly_signals"
LOCK_NAME = f"grizzly-{WALLET[2:10]}"  # per-wallet — multi-wallet host safe

client = SenpiClient()
mcp = tick_cache(client)  # per-tick TTL memoization

def run_one_tick():
    with scanner_lock(LOCK_NAME):
        ch = mcp("strategy_get_clearinghouse_state", strategy_wallet=WALLET)
        markets = mcp("leaderboard_get_markets", limit=100)
        # ... scoring logic ...
        if signal_ready:
            client.push_signal(
                address=WALLET, scanner=SCANNER_NAME,
                asset="BTC", direction="LONG",
                score=0.85,
                signal_type="MOMENTUM",
                data={"funding_bps": 18, "sm_pct": 22.5},
            )

if __name__ == "__main__":
    producer_daemon(
        fn=run_one_tick,
        interval_seconds=300,
        name=LOCK_NAME,
        wallet=WALLET,
        scanner=SCANNER_NAME,
    )

That’s the entire surface area. The five primitives behind it are where the engineering lives.

SenpiClient — direct HTTPS, no mcporter subprocess

In Runtime 1.0, every MCP call spawned a mcporter subprocess, a 6-process tree per invocation. A producer that called 8 MCP tools per tick spawned 48 processes per tick. At 5-minute intervals (12 ticks per hour) across a fleet of 30 agents, that’s 48 × 12 × 30 ≈ 17,000 process spawns per hour.

SenpiClient is a persistent in-process HTTPS client. Same MCP calls, no subprocess startup, no JSON marshaling overhead, no race conditions between concurrent forks. A single MCP call dropped from ~800ms to under 50ms. Tick-level latency dropped from 5–8 seconds to under 300ms; that drop is the headline “30× faster” number from last week’s announcement.

scanner_lock — fcntl + PID-aliveness, no fork-storm

Every producer tick wraps in scanner_lock(LOCK_NAME). This is an fcntl.flock exclusive lock with one critical addition: it records the holder’s PID in a metadata file alongside the lock.

If a previous tick crashed mid-execution (kill -9, OOM, container restart), the lock file persists but the kernel auto-released the flock. The next tick acquires the lock cleanly. If the previous tick is still running (long-running scan, slow MCP call), the new tick fails with BlockingIOError and skips — preventing the classic “same trade entered twice because two parallel ticks raced” bug that cost real money in Runtime 1.0.

# From senpi_runtime_helpers/lock.py
"""scanner_lock — fcntl lock with PID-aliveness stale recovery.

The unlink-then-open pattern is deliberately not used: it would create a
new inode and let two callers flock different inodes for the same path.
"""

That second sentence is the kind of bug that takes a real-money loss to learn. We learned it once.
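A minimal sketch of the locking pattern described above, using only the standard library. This is not the SDK's implementation: the name `sketch_scanner_lock`, the `.meta` sidecar file, and the lock directory are assumptions made for illustration. It shows the two properties the text relies on: a non-blocking `fcntl.flock` that raises `BlockingIOError` when another tick holds the lock, and a lock file that is never unlinked, so every caller locks the same inode.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def sketch_scanner_lock(name, lock_dir=None):
    """Hypothetical stand-in for scanner_lock (Unix only)."""
    lock_dir = lock_dir or tempfile.gettempdir()
    path = os.path.join(lock_dir, f"{name}.lock")
    # Open (or create) the lock file. It is never unlinked, so the inode
    # stays stable and all callers contend on the same file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock: raises BlockingIOError if already held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        # Record the holder's PID next to the lock (assumed sidecar format).
        with open(path + ".meta", "w") as meta:
            json.dump({"pid": os.getpid()}, meta)
        yield path
    finally:
        # Closing the descriptor releases the flock. The kernel does the same
        # when a process dies, so a crashed tick cannot wedge the lock.
        os.close(fd)
```

A caller would wrap each tick in `with sketch_scanner_lock(...)` and treat `BlockingIOError` as "skip this tick".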

tick_cache — per-tick memoization

Most producers call the same MCP tool multiple times per tick — strategy_get_clearinghouse_state for position context, leaderboard_get_markets for universe screening, market_get_funding_regime for funding context. Naive code calls each tool fresh every time it needs the data.

tick_cache(client) wraps the client with TTL memoization scoped to a single tick. A producer that calls strategy_get_clearinghouse_state four times during one tick pays for one MCP call, not four. Combined with parallel() for fan-out across asset universes, an 80-asset scoring loop that used to take 12 seconds now completes in 1.4 seconds.
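As a rough illustration of per-tick memoization, here is a sketch of a TTL cache wrapped around any callable with the shape `call(tool, **kwargs)`. The class name `TickCacheSketch`, the `ttl_seconds` default, and the injectable `clock` are assumptions; the real `tick_cache` may key and expire entries differently.

```python
import time


class TickCacheSketch:
    """Hypothetical per-tick memoizer around a callable `call(tool, **kwargs)`."""

    def __init__(self, call, ttl_seconds=60.0, clock=time.monotonic):
        self._call = call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (tool, sorted kwargs) -> (expires_at, result)

    def __call__(self, tool, **kwargs):
        # Keyword values must be hashable for this simple key to work.
        key = (tool, tuple(sorted(kwargs.items())))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # served from cache: no second MCP call
        result = self._call(tool, **kwargs)
        self._entries[key] = (now + self._ttl, result)
        return result

    def clear(self):
        """Drop all entries, for example at the start of a new tick."""
        self._entries.clear()
```

Repeated calls with the same tool name and arguments inside the TTL window return the first result; a different tool or different arguments always triggers a fresh call.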

producer_daemon — long-lived loop replaces openclaw cron

The old pattern was openclaw cron add + agentTurn — every tick spawned an LLM inference just to dispatch a Python script. Cron + LLM + agent boot ≈ 8 seconds of latency and a paid token bill per tick.

producer_daemon(fn, interval_seconds, ...) is a plain Python loop. One process per producer, lives across ticks, fires run_one_tick() directly. No LLM in the dispatch path. Per-tick wall-clock timeout via SIGALRM. SIGTERM/SIGINT trigger a graceful drain (current tick finishes, then loop exits). Built-in alive_check — daemon self-terminates if the runtime is deleted or the scanner is renamed, preventing “ghost producer” bugs where a strategy was un-installed but its producer kept pushing signals into a runtime that no longer cared.

This single change is the source of the 80% lower token costs figure. We stopped paying for an LLM inference to dispatch a Python script.
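The loop described above can be sketched with the standard `signal` module. This is an illustrative approximation, not the SDK's `producer_daemon`: the function name, the `max_ticks` parameter (added so the loop can be exercised in a test), and the returned counters are assumptions. It must run in the main thread on a Unix system, because that is where Python delivers signals.

```python
import signal
import time


class TickTimeout(Exception):
    """Raised inside a tick when its wall-clock budget is exceeded."""


def run_daemon_sketch(fn, interval_seconds, tick_timeout_seconds, max_ticks=None):
    """Hypothetical long-lived loop with per-tick timeout and graceful drain."""
    state = {"stop": False}

    def on_alarm(signum, frame):
        raise TickTimeout()

    def on_stop(signum, frame):
        state["stop"] = True  # drain: the current tick is allowed to finish

    previous = {
        signal.SIGALRM: signal.signal(signal.SIGALRM, on_alarm),
        signal.SIGTERM: signal.signal(signal.SIGTERM, on_stop),
        signal.SIGINT: signal.signal(signal.SIGINT, on_stop),
    }
    stats = {"ok": 0, "errors": 0}
    ticks = 0
    try:
        while not state["stop"] and (max_ticks is None or ticks < max_ticks):
            # Arm a wall-clock timer for this tick only.
            signal.setitimer(signal.ITIMER_REAL, tick_timeout_seconds)
            try:
                fn()
                stats["ok"] += 1
            except Exception:  # includes TickTimeout
                stats["errors"] += 1
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
            ticks += 1
            if not state["stop"] and (max_ticks is None or ticks < max_ticks):
                time.sleep(interval_seconds)
    finally:
        # Restore whatever handlers were installed before the loop started.
        for signum, handler in previous.items():
            signal.signal(signum, handler if handler is not None else signal.SIG_DFL)
    return stats
```

Note that no model call appears anywhere in the dispatch path: the loop invokes `fn` directly. In this sketch a stop signal received during the sleep takes effect only once the sleep finishes.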

Layer 2: The Signal Schema — strict envelope, fail-fast validation

When a producer pushes a signal, it hits the runtime’s /signals endpoint with a strictly-validated envelope. The schema is additionalProperties: false — unknown fields are rejected with INVALID_REQUEST. No silent passthrough, no “well the field was in there but downstream ignored it.”

The envelope splits routing fields (top-level) from scanner-specific payload (data block):

external_scanners:
  - name: my_signals
    config:
      fields:
        funding_bps: {type: number, required: true}
        sm_pct: {type: number, required: true}
        rank_velocity: {type: number, required: false}

This is mundane until you’ve shipped twelve agents whose producers all emit slightly different signal shapes. Then it becomes the difference between “we can read every signal across the fleet with one parser” and “every audit query requires per-agent code.” The strict schema is what makes the learning corpus (Layer 6) tractable.

There is one footgun the SDK exists partly to prevent: putting asset or direction inside data instead of at the top level. The runtime stores the two locations independently. Downstream consumers read inconsistently. The Pangolin TST incident on 2026-05-05 was a real-money loss caused by exactly this bug. The SDK now makes the right pattern the path of least resistance.
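To make the strict-envelope idea concrete, here is a sketch of a fail-fast validator. The routing field names come from the `push_signal` example earlier in the article; which of them are required, the `InvalidRequest` exception class, and the shape of `data_fields` are assumptions for illustration, not the runtime's actual schema.

```python
ROUTING_FIELDS = {"address", "scanner", "asset", "direction", "score", "signal_type", "data"}
REQUIRED_FIELDS = {"address", "scanner", "asset", "direction", "data"}  # assumed
RESERVED_IN_DATA = {"asset", "direction"}  # must live at the top level only


class InvalidRequest(ValueError):
    """Mirrors the INVALID_REQUEST rejection named in the text."""


def validate_envelope(envelope, data_fields):
    """data_fields maps name -> {"type": "number", "required": bool}."""
    unknown = set(envelope) - ROUTING_FIELDS
    if unknown:
        raise InvalidRequest(f"unknown top-level fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS - set(envelope)
    if missing:
        raise InvalidRequest(f"missing top-level fields: {sorted(missing)}")
    data = envelope["data"]
    misplaced = RESERVED_IN_DATA & set(data)
    if misplaced:
        raise InvalidRequest(f"routing fields inside data: {sorted(misplaced)}")
    extra = set(data) - set(data_fields)
    if extra:
        raise InvalidRequest(f"undeclared data fields: {sorted(extra)}")
    for name, spec in data_fields.items():
        if name not in data:
            if spec.get("required"):
                raise InvalidRequest(f"missing data field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if spec.get("type") == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            raise InvalidRequest(f"data field {name} must be a number")
    return envelope
```

The check on `RESERVED_IN_DATA` is the part aimed at the footgun above: an envelope that carries `asset` or `direction` inside `data` is rejected outright rather than stored in two places.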

Layer 3: The LLM Gate — the only place an LLM appears in the trade flow

Every signal that reaches the runtime is evaluated by an LLM gate before it becomes a trade. The gate has access to:

  • The signal envelope (asset, direction, score, scanner reasoning)

  • Current portfolio state (open positions, today’s PnL, recent trades)

  • Market context (funding regime, recent volatility)

  • The agent’s running risk gate status (any cooldowns active, any caps approached)

The LLM produces a structured decision: approve, reject, or request more context. That’s it. The LLM does not place the order. It does not calculate the size. It does not set the stop-loss. It does not handle the exit. It approves a decision that the runtime then executes deterministically.
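One way to keep the gate's output bounded is to parse it into a closed set of decisions and treat anything else as a rejection. The sketch below assumes the model replies with a JSON object containing `decision` and `reasoning` keys, and that malformed output should be rejected; the key names, the `request_context` label, and the reject-on-error behavior are all assumptions, not the runtime's documented contract.

```python
import json

ALLOWED_DECISIONS = {"approve", "reject", "request_context"}  # assumed labels


def parse_gate_decision(raw_text):
    """Turn raw model output into a bounded decision record."""
    try:
        payload = json.loads(raw_text)
    except (TypeError, ValueError):
        return {"decision": "reject", "reasoning": "unparseable gate output"}
    if not isinstance(payload, dict):
        return {"decision": "reject", "reasoning": "gate output is not an object"}
    decision = payload.get("decision")
    if decision not in ALLOWED_DECISIONS:
        return {"decision": "reject", "reasoning": f"unknown decision: {decision!r}"}
    return {"decision": decision, "reasoning": str(payload.get("reasoning", ""))}
```

Nothing in the returned record carries a size, price, or stop level: under this design those are computed by deterministic code after an approval.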

This is the design choice that makes Senpi agents fast in production. The LLM isn’t in the critical path of execution — only in the critical path of judgment. A signal can fire, gate-approve, and have a market order on Hyperliquid in under 2 seconds. Pure-LLM trading agents take 30-90 seconds for the same path.

It’s also the design choice that lets us use cheaper models without losing performance. The LLM gate is a bounded decision (approve/reject with reasoning), not an open-ended planning loop. Our Round 3 model bake-off showed Gemma 4 31B hitting 74% on the same scenarios where Opus 4.7 hit 91%, at 11× lower inference cost. The harness lets us choose the right model per task instead of paying frontier rates for every operation.

Layer 4: The Risk Gates — five gates, fail-closed, real-time evaluated

Risk in Runtime 1.0 was Python code each strategy author wrote themselves. Twelve agents meant twelve interpretations of “daily loss limit.” One of them had a bug where the limit was tracking the wrong asset. Another silently failed on a network error and kept trading.

Runtime 1.1.0 consolidates risk into five gates, evaluated by the runtime engine before every open. Closes are not gated — protective exits always run.

The configuration lives in the strategy’s risk.guard_rails block:

risk:
  guard_rails:
    daily_loss_limit_pct: 4
    max_entries_per_day: 6
    bypass_max_entries_per_day_on_profit: false
    max_consecutive_losses: 3
    cooldown_minutes: 90
    drawdown_halt_pct: 20
    per_asset_cooldown_minutes: 45

Three engineering choices behind this layer matter:

1. Fail-closed. If any MCP call backing a risk evaluation fails (network error, timeout, missing snapshot), halt-class gates return CLOSED and asset-specific checks return COOLDOWN. Trading is suspended whenever risk state is unknown. There is no permissive fallback. The cost of accidentally pausing for two minutes is much lower than the cost of accidentally trading with no risk visibility.

2. Real-time, not cached. Every checkGate() invocation fetches fresh data. No background polling, no cached verdicts that go stale. A daily-loss gate that was OPEN at 12:00:00 is re-evaluated at 12:00:01.

3. Priority-ordered. CLOSED > COOLDOWN > OPEN. In default mode the runtime short-circuits on the first non-OPEN verdict — no point evaluating per-asset cooldowns if daily loss has already halted trading.
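The three choices above can be sketched in a few lines. This is an illustration under assumptions, not the runtime's `checkGate()`: the function name, the `(name, kind, check)` tuple shape, and the `"halt"` / `"asset"` labels are hypothetical. Each `check` is expected to fetch fresh data whenever it is called.

```python
OPEN, COOLDOWN, CLOSED = "OPEN", "COOLDOWN", "CLOSED"
PRIORITY = {OPEN: 0, COOLDOWN: 1, CLOSED: 2}


def evaluate_gates(gates, short_circuit=True):
    """gates: ordered list of (name, kind, check); kind is "halt" or "asset".

    Returns (overall_verdict, audit_entries).
    """
    worst = OPEN
    audit = []
    for name, kind, check in gates:
        error = None
        try:
            verdict = check()  # no caching: evaluated on every call
            if verdict not in PRIORITY:
                raise ValueError(f"unknown verdict {verdict!r}")
        except Exception as exc:
            # Fail-closed: unknown risk state never yields OPEN.
            verdict = CLOSED if kind == "halt" else COOLDOWN
            error = repr(exc)
        audit.append({"gate": name, "verdict": verdict, "error": error})
        if PRIORITY[verdict] > PRIORITY[worst]:
            worst = verdict
        if short_circuit and verdict != OPEN:
            break  # later gates are not evaluated in default mode
    return worst, audit
```

With `short_circuit=False` every gate is evaluated and the highest-priority verdict wins, which can be useful when the full picture is wanted for an audit entry.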

Every gate evaluation appends a structured JSONL audit entry. The audit trail isn’t optional; it’s the substrate for the learning corpus.

Layer 5: The DSL Exit Engine — deterministic two-phase trailing stops

The DSL (Dynamic Stop-Loss) is the part of the runtime we’ve iterated on most. Six months of live trading produced a two-phase design that protects from immediate loss in Phase 1 and locks in profits as they accumulate in Phase 2.

Phase 1: Initial Defense

Active from position open until the first profit tier is triggered. Two floors run simultaneously, and the stricter one wins:

  • Absolute loss floor — derived from max_loss_pct. The position can never lose more than this percentage of margin. The runtime places a stop-loss order on the exchange at this exact level. If price wicks past this level faster than the runtime’s polling can react, the exchange-side SL closes the position automatically with reason exchange_sl_hit.

  • Trailing retrace floor — derived from retrace_threshold. Tracks the running high-water mark and ratchets the floor up at retrace_threshold% ROE below the high. As the position gains, the floor tightens.

A subtle but critical detail: the runtime uses consecutive breach counting with a configurable tolerance (consecutive_breaches_required, typically 3). A single wick below the floor doesn’t trigger an exit; three consecutive ticks below the floor do. This filters out the wick-and-recover patterns that previously cost us real money through premature exits.
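A compact sketch of the Phase 1 logic for a long position, with everything expressed in ROE percent. The class name and the way the two floors are combined (the higher ROE floor is treated as the stricter one) are assumptions drawn from the description, and the exchange-side stop order is deliberately left out.

```python
class Phase1Sketch:
    """Hypothetical Phase 1 tracker: two floors plus consecutive-breach counting."""

    def __init__(self, max_loss_pct, retrace_threshold, consecutive_breaches_required=3):
        self.max_loss_roe = -abs(max_loss_pct)   # absolute loss floor, in ROE %
        self.retrace_threshold = retrace_threshold
        self.required = consecutive_breaches_required
        self.high_water_roe = 0.0                # starts at entry (0% ROE)
        self.breaches = 0

    def floor_roe(self):
        trailing = self.high_water_roe - self.retrace_threshold
        # The stricter floor is the higher one.
        return max(self.max_loss_roe, trailing)

    def on_tick(self, roe):
        """Feed the current ROE; returns True when the runtime should exit."""
        self.high_water_roe = max(self.high_water_roe, roe)
        if roe <= self.floor_roe():
            self.breaches += 1
        else:
            self.breaches = 0  # a recovery resets the count
        return self.breaches >= self.required
```

For example, with `max_loss_pct=10` and `retrace_threshold=5`, a position that peaks at +3% ROE has a floor at -2%. One tick at -3% followed by a recovery does not exit; three ticks in a row at or below the floor do.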

Phase 2: Profit Lock

Active from when the first tier’s trigger_pct is crossed. Each tier defines a lock_hw_pct — the percentage of high-water ROE to lock in as the trailing floor.

Entry: $100, 10× leverage, Tier 1: trigger_pct=10, lock_hw_pct=40

Position hits ROE +10% → Tier 1 activates → Phase 2 begins
  high_water_roe = 10% → floor_roe = 10 × 0.40 = 4% → floor at $100.40
  Exchange SL placed at $100.40

Position climbs → high_water_roe = 18%
  floor_roe = 18 × 0.40 = 7.2% → floor at $100.72
  Exchange SL updated to $100.72

Tier 2: trigger_pct=20, lock_hw_pct=70 activates
  high_water_roe = 22% → floor_roe = 22 × 0.70 = 15.4% → floor at $101.54
  Exchange SL updated to $101.54

Price falls to $101.54 → exchange SL executes
  Close, reason: exchange_sl_hit

The floor only moves up. When price makes a new high, the floor tightens. When price retraces, the floor stays fixed.
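The worked example above reduces to one formula for a long position: floor_roe = high_water_roe × lock_hw_pct / 100, and floor_price = entry × (1 + floor_roe / 100 / leverage). The function below is a sketch of that arithmetic; its name, the tuple layout of `tiers`, and the rule that the tier with the highest crossed trigger applies are assumptions consistent with the example.

```python
def phase2_floor_price(entry_price, leverage, high_water_roe, tiers, previous_floor=None):
    """tiers: list of (trigger_pct, lock_hw_pct). Long position assumed.

    Returns the new floor price, or previous_floor if no tier has triggered.
    """
    crossed = [tier for tier in tiers if high_water_roe >= tier[0]]
    if not crossed:
        return previous_floor  # still in Phase 1
    _, lock_hw_pct = max(crossed, key=lambda tier: tier[0])
    floor_roe = high_water_roe * lock_hw_pct / 100.0
    floor_price = entry_price * (1 + floor_roe / 100.0 / leverage)
    if previous_floor is not None:
        floor_price = max(floor_price, previous_floor)  # the floor only ratchets up
    return floor_price


tiers = [(10, 40), (20, 70)]
floor = phase2_floor_price(100.0, 10, 10, tiers)          # about 100.40
floor = phase2_floor_price(100.0, 10, 18, tiers, floor)   # about 100.72
floor = phase2_floor_price(100.0, 10, 22, tiers, floor)   # about 101.54
```

Passing the previous floor back in on each update is what keeps the stop from loosening when the high-water mark stops advancing.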

The Phase 2 exit mechanism is entirely exchange-driven. The runtime’s role is only to keep the SL order updated as the floor ratchets. This is intentional: an exchange-resident SL fires even if the runtime is mid-restart, mid-deploy, mid-anything. The protective path doesn’t depend on the runtime being awake at the millisecond price hits the floor.

Fee-optimized exits

Every exit can now use FEE_OPTIMIZED_LIMIT orders — maker-first execution that falls back to taker only if the maker order doesn’t fill within a configurable timeout.

exit:
  engine: dsl
  interval_seconds: 30
  order_type: FEE_OPTIMIZED_LIMIT
  fee_optimized_limit_options:
    ensure_execution_as_taker: true
    execution_timeout_seconds: 15
  dsl_preset:
    ...

Hyperliquid charges ~0.02% maker fee vs ~0.05% taker, a 2.5× spread. For an agent doing 30 round trips per week at $5K notional each, that’s $30/week in exit fees (30 × $5,000 × 0.02%) instead of $75/week (30 × $5,000 × 0.05%), every week, with no thesis change. Multiply across the fleet and you get the 75% cheaper trades headline from last week.

In Runtime 1.0, every exit was taker. Strategies with good signals were bleeding to fees and didn’t know it until we built the telemetry that surfaced it.
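The weekly fee figures can be checked with a few lines of arithmetic. The sketch assumes one fee-bearing leg (the exit) per round trip, which is the reading that reproduces $30 and $75; the function name is hypothetical, and the fee rates are the approximate ones quoted in the text.

```python
MAKER_RATE = 0.0002  # ~0.02%, as quoted in the text
TAKER_RATE = 0.0005  # ~0.05%, as quoted in the text


def weekly_exit_fees(exits_per_week, notional_usd, fee_rate):
    """Fees paid on the exit leg only (assumption)."""
    return exits_per_week * notional_usd * fee_rate


maker_fees = weekly_exit_fees(30, 5_000, MAKER_RATE)   # about 30 USD
taker_fees = weekly_exit_fees(30, 5_000, TAKER_RATE)   # about 75 USD
savings_fraction = 1 - maker_fees / taker_fees         # about 0.60
```

For this single-agent example the exit leg comes out roughly 60% cheaper when every exit fills as maker. The 75% headline is a fleet-level figure from the announcement and is not derived by this calculation.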

Layer 6: The Telemetry & Learning Layer — every trade is a training example

This is the layer that compounds.

Every signal the producer emits, every gate evaluation the runtime runs, every entry the LLM gate approves, every DSL tick, every exit, every fee paid — all of it is captured as structured JSONL telemetry, queryable via audit_query on the Senpi MCP. Each entry includes ai_reasoning — the structured thought process the LLM gate used to make the decision.

A sample audit entry from a live Cheetah trade:

score 11 CHEETAH SOL LONG —
  SM_STRONG 12.7%/179t |
  VELOCITY 15m=1.76/1h=1.26 |
  ACCEL 15m(1.76)>1h(1.26) |
  QUALITY_ALIGN 2_traders

This isn’t an explanation written for humans. It’s a structured reasoning record the next agent can read. Cheetah can read its own trade chain to identify which signal components correlate with winners. Grizzly can read Cheetah’s reasoning to learn what BTC patterns Cheetah’s pattern detector finds. Every agent has access to the reasoning behind every decision every other agent has made.
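As an illustration of how another agent might consume such a record, the sketch below splits the sample entry into a header and named components. The parsing rules (a dash separating header from components, `|` between components, the component name as the first token) are inferred from this single sample and are assumptions; the real audit format may differ.

```python
def parse_reasoning(entry):
    """Parse a reasoning line like the sample above into a dict."""
    text = " ".join(entry.split())             # collapse newlines and indentation
    header, _, body = text.partition("\u2014")  # the dash after the header
    tokens = header.split()
    # Header shape assumed from the sample: "score <n> <AGENT> <ASSET> <DIRECTION>"
    record = {
        "score": float(tokens[1]),
        "agent": tokens[2],
        "asset": tokens[3],
        "direction": tokens[4],
        "components": {},
    }
    for part in body.split("|"):
        part = part.strip()
        if not part:
            continue
        name, _, detail = part.partition(" ")
        record["components"][name] = detail
    return record
```

Once entries are in this shape, questions such as "which components appear most often on winning trades" become simple aggregations over the `components` keys.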

This is what we mean by “the runtime is the moat.” A competitor can copy our scanner code, our YAML schema, even our DSL configuration. They can’t copy six months of compounded reasoning data across 75 live agents trading real money on Hyperliquid. That corpus is structurally impossible to replicate without spending the same six months in the market with the same exposure.

It is also the training data for the Senpi Model — the fine-tuned open-source model that will replace generic frontier LLMs as the gate inference engine over the next 90 days. The harness produces the corpus; the corpus trains the model; the model improves the harness; the cycle compounds.


Layer 7: Liveness & Operator Visibility — runtime ≠ operating

A runtime can report status: running while every component inside it is silent. A scanner can be registered but never scheduled. A producer can be installed but never push a signal. An LLM gate can be wired but never invoked because no signal ever reaches it.

Runtime 1.1.0 ships explicit liveness verification through senpi-helpers — a CLI that reads the daemon’s self-describing state files (pid.json, boot.json, heartbeat.json) and surfaces per-component health.

$ senpi-helpers list
NAME         PID      RUNNING   WALLET            SCANNER          TICKS  ERRORS  LAST_TICK
grizzly      1149375  true      0xabc…1234        grizzly_signals  847    0       2026-05-15T18:42:11Z
spider       1150299  true      0xdef…5678        spider_signals   1204   2       2026-05-15T18:42:08Z
kestrel      —        false     0x618b…daac2      kestrel_signals  0      0       —

Every component declares its expected behavior, exposes structured counters (tickSuccessCount, lastTickFinishedAt, consecutiveErrorCount), and surfaces failure signatures the moment they happen. Old agents would crash, hang, or quietly stop trading with no warning — operators would find out days later when they checked the P&L. New agents fail loudly within minutes.

There is one engineering rule the runtime enforces: “running” ≠ “operating.” A daemon that boots successfully but never completes a tick is not operating. The CLI distinguishes between the two. So does the runtime’s own internal health check. So do the alerts.
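The running-versus-operating distinction can be expressed as a small classifier over the counters named above. This is a sketch: the function name, the three status labels, the `max_silence` threshold, and the idea of passing the heartbeat as a plain dict (rather than reading heartbeat.json from disk) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def classify_liveness(pid_alive, heartbeat, now, max_silence):
    """Return "not_running", "running_not_operating", or "operating"."""
    if not pid_alive:
        return "not_running"
    last = heartbeat.get("lastTickFinishedAt")
    if heartbeat.get("tickSuccessCount", 0) == 0 or last is None:
        # Booted, but has never completed a tick.
        return "running_not_operating"
    finished = datetime.fromisoformat(last.replace("Z", "+00:00"))
    if now - finished > max_silence:
        # Process exists, but ticks have gone quiet for too long.
        return "running_not_operating"
    return "operating"
```

A daemon whose process exists but whose last completed tick is older than the threshold lands in the middle bucket, which is the state an alert would most want to surface.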

This is mundane infrastructure work that nobody puts on a slide. It is also the difference between an agent fleet you can trust with real capital and one you can’t.

The runtime as a moat

Six months of live trading. Real losses. Every failure documented and fixed. Twelve homegrown risk implementations replaced by one battle-tested layer. 5-second LLM scan loops replaced by 300ms deterministic scans. Taker exits at 2.5× the maker fee replaced by maker-first execution. Silent crashes replaced by structured liveness verification. Ad-hoc audit trails replaced by a queryable corpus that compounds across the fleet.

The result, in numbers from last week’s announcement: 30× faster, 75% cheaper, 80% lower token costs, agents that learn from every move.

The result, structurally: the runtime is the moat — today. The full harness, coming next, is what compounds it.

Every Senpi agent shipped after May 12, 2026 inherits this entire stack from day one. Twelve strategy authors no longer rebuild the same five primitives twelve different ways. The fee optimization isn’t optional — it’s the default. The risk gates aren’t strategy-author Python code — they’re runtime-enforced. The audit trail isn’t a logging best practice — it’s the substrate of the learning loop.

The architectural commitment behind all of it is simple: separate the deterministic from the judgmental, then run each one on the substrate it deserves. Code for execution. Models for decisions. Runtime for both.

What’s next: the next 90 days

What’s described above is what’s live today — Runtime 1.1.0, the hardened execution stack, the Producer SDK, the risk gates, the DSL engine, the learning telemetry. It’s the foundation. It is not the destination.

The destination is the full Senpi Harness: runtime + native deployment on senpi.ai + native chat surface + Senpi Model fine-tuned on our trade-chain corpus + compounding learning loops, integrated into one consumer product. The runtime makes the agents work. The harness makes the product win.

Three workstreams are in flight right now that turn the runtime into the harness.

1. Bring the entire system in-house

Today, deploying a Senpi agent means deploying to Railway (third-party hosted), chatting via Telegram (third-party hosted), and bringing your own LLM key (third-party model, third-party billing). It works — but it’s a workflow built for early adopters comfortable with command-line ops.

The next iteration brings every layer in-house. Deployment moves to senpi.ai native: 1-click deploy instead of Railway provisioning. Chat moves to senpi.ai native: same conversation surface on web and mobile, no Telegram detour. The current external dependencies become history.

Status: In build. Ships June 1 as Senpi Personal Agents v1.

2. Own-model vs. third-party — the Senpi Model

The runtime’s LLM gate today calls out to Gemini, Claude, GPT, or whichever provider the operator configured. It works — but it’s also where the unit economics fight the product. Frontier model inference at scale dominates the cost stack. And no commodity model is specifically trained on the reasoning patterns the Senpi fleet has generated.

We’re fine-tuning our own. Open-source base (the bake-off across nine candidates is in its final round), trained on the Predator corpus — every trade, every reasoning chain, every outcome the fleet has produced over six months of live trading. The Senpi Model becomes the default gate inference engine, replacing the third-party provider line on every agent.

Why this matters beyond cost: the Senpi Model is the only model in the world trained specifically on Hyperliquid trading reasoning. Every agent that runs on it gets reasoning quality no generic frontier model can match.

Status: In test/design. Ships June (TBD) as the Senpi Model.

3. Learning agents wired into the runtime

The telemetry layer (Layer 6 above) produces the corpus. The next step is closing the loop: agents that actively read their own history and the fleet’s history to propose their own improvements.

Three components:

  • Self-retrospection — every agent reads its own complete trade chain weekly and proposes its own config changes

  • Cross-agent reasoning — agents read each other’s reasoning, not just outcomes. Wolverine learns what Kodiak’s pattern detector finds. Grizzly learns Cheetah’s BTC patterns.

  • Outcomes loop — every proposed change is measured against live outcomes and fed back into system calibration. Changes that improve PnL stay. Changes that don’t get reverted automatically.

Status: In test. Wired into Runtime 2.0.

The compounding flywheel

This is where the runtime turns from an infra advantage into a structural moat.

              More users deploy agents
                       ↓
         Every trade generates telemetry
        (reasoning · outcome · decision chain)
                       ↓
          Telemetry trains the Senpi Model
        (fine-tuning + reinforcement on real outcomes)
                       ↓
            Smarter model → smarter agents
        (every agent in the fleet upgrades simultaneously)
                       ↓
                 Users win more
       (better PnL, more retention, more referrals)
                       ↓
              [back to: more users]

Every cycle compounds. 1× becomes 10× becomes 100×.

Dig deeper

  • 🛠 GitHub: senpi-trading-runtime — full source, references, and the senpi_runtime_helpers Python Producer SDK

  • 🥷 Launch your Hyperliquid agent: senpi.ai