The Top Hyperliquid Agent Dropped 16 Points in 48 Hours. Then It Fine-Tuned Itself.

Vulture, Senpi's top Hyperliquid agent, audited its own 100 trades, found the dead scoring buckets, raised its own MIN_SCORE, and shipped the fix to itself.

Senpi · May 29, 2026 · 5 min read


If you think you can just point your LLM at Hyperliquid and print money, you're seriously underestimating the infra and reinforcement learning required.

Senpi runs dozens of live Hyperliquid trading agents, each with $1,000 of real capital. Real money drives faster learning and faster improvement than paper trading ever could.

"Vulture" has been Senpi's top-performing agent template in May, sitting at +14.4% ROE as of this morning.

But 48 hours ago it was up more than 30%. So what changed?

Vulture asked itself the same question. Then it answered it.

This is how it's supposed to work.

How Vulture scores a trade

Before the audit makes sense, you need to know what Vulture's scoring system is actually measuring.

Vulture hunts small/mid-cap Hyperliquid perps. Its producer doesn't look at price alone - it stacks five independent measurements of "is this setup real?" and assigns a score from roughly 0 to 12. Higher score = more dimensions agreeing.

The components, in plain English:

Smart Money (SM) concentration. What share of the top-trader leaderboard is positioned in this asset, and in which direction? Vulture only fires when an asset crosses meaningful SM concentration thresholds - HEAVY_FLOW (≥18% of top traders on one side) is the strongest signal.

Multi-timeframe price alignment. Is the 4h price actually moving in the direction the smart money is positioned? Is the 1h confirming, or is it choppy? Vulture scores higher when all three timeframes (4h, 1h, 15m) agree with the SM read.

15-minute velocity. Is momentum actively building right now, or is the move stale? An asset drifting up slowly scores worse than one accelerating cleanly.

Trend persistence. Has the move been running for ≥6 hours, or is it a one-bar pop? A trend compounding for hours is structurally different from one that just printed.

Late-entry penalty. If the asset has already moved 12-15% or more in the direction Vulture wants to trade, points are subtracted from the score - late entries are where the alpha is gone.

A Score 12 trade is one where everything stacks: HEAVY_FLOW + running 4h trend + 1h confirmation + accelerating 15m + persistent + no late-entry penalty.

A Score 7 trade is one where the floor just barely cleared - maybe SM is okay, maybe one timeframe is aligned, but the conviction picture is fundamentally mixed.
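To make the stacking concrete, here is a minimal Python sketch of a score with this shape. Only three thresholds come from the description above: HEAVY_FLOW at ≥18% of top traders, persistence at ≥6 hours, and a late-entry penalty starting around a 12-15% prior move. The point weights, the secondary 10% SM tier, the `velocity_15m` measure, and the names `Setup` and `score_setup` are hypothetical, chosen only so that a fully stacked setup lands at 12. This is not Vulture's actual producer code.

```python
from dataclasses import dataclass

# Thresholds stated in the article.
HEAVY_FLOW_PCT = 18.0      # >=18% of top traders on one side
PERSISTENCE_HOURS = 6.0    # trend running for >=6 hours
LATE_ENTRY_PCT = 12.0      # penalty begins around a 12-15%+ prior move

# Everything below (point weights, the 10% SM tier, velocity cutoffs)
# is a hypothetical placeholder, not Vulture's real weighting.


@dataclass
class Setup:
    sm_concentration_pct: float  # share of top traders on the signal side
    aligned_4h: bool             # 4h price moving with the SM read
    aligned_1h: bool             # 1h confirming
    aligned_15m: bool            # 15m confirming
    velocity_15m: float          # hypothetical: >0 building, >1 accelerating
    trend_hours: float           # how long the move has been running
    prior_move_pct: float        # move already made in the trade direction


def score_setup(s: Setup) -> int:
    score = 0
    # 1. Smart Money concentration: HEAVY_FLOW earns the most points.
    if s.sm_concentration_pct >= HEAVY_FLOW_PCT:
        score += 4
    elif s.sm_concentration_pct >= 10.0:
        score += 2
    # 2. Multi-timeframe alignment: 4h weighted double, 1h and 15m one point each.
    score += 2 * int(s.aligned_4h) + int(s.aligned_1h) + int(s.aligned_15m)
    # 3. 15-minute velocity: accelerating beats merely building.
    if s.velocity_15m > 1.0:
        score += 2
    elif s.velocity_15m > 0.0:
        score += 1
    # 4. Trend persistence.
    if s.trend_hours >= PERSISTENCE_HOURS:
        score += 2
    # 5. Late-entry penalty: the further the move has run, the bigger the cut.
    if s.prior_move_pct >= 15.0:
        score -= 3
    elif s.prior_move_pct >= LATE_ENTRY_PCT:
        score -= 2
    return max(score, 0)
```

With these assumed weights, a setup with HEAVY_FLOW, all three timeframes aligned, accelerating 15m velocity, an 8-hour trend and only a 3% prior move scores 12. The same setup after a 16% run drops to 9, which shows how the late-entry penalty can pull an otherwise perfect stack down several buckets.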

The question Vulture asked itself was: does the score actually predict outcome?

The audit Vulture ran on its own behavior

After the drawdown, Vulture pulled its own trade log - every closed position over the last 100 trades - and grouped them by the entry score its own producer had assigned at entry time. The output was unambiguous:
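The grouping step itself is simple. A minimal Python sketch, assuming each closed trade is recorded with the score assigned at entry and its realized ROE in percent. The `ClosedTrade` and `BucketStats` names are hypothetical, not Senpi's actual log schema, and counting any profitable close as a "win" is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClosedTrade:
    entry_score: int  # score the producer assigned at entry time
    roe_pct: float    # realized ROE for the trade, in percent


@dataclass
class BucketStats:
    trades: int
    avg_roe_pct: float
    win_rate_pct: float


def audit_by_score(trades: list[ClosedTrade]) -> dict[int, BucketStats]:
    """Group closed trades by entry score and summarize each bucket."""
    roes_by_score: dict[int, list[float]] = defaultdict(list)
    for t in trades:
        roes_by_score[t.entry_score].append(t.roe_pct)

    report: dict[int, BucketStats] = {}
    for score in sorted(roes_by_score):
        roes = roes_by_score[score]
        wins = sum(1 for r in roes if r > 0)  # assumption: any profitable close is a win
        report[score] = BucketStats(
            trades=len(roes),
            avg_roe_pct=mean(roes),
            win_rate_pct=100.0 * wins / len(roes),
        )
    return report
```

Run over a trade log, this yields one row per entry score: trade count, average ROE and win rate, the same three figures the results in this section report for each bucket.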

Score 10+ is the strategy's edge. Across the three high-conviction buckets, Vulture took 34 trades and averaged +5.5% ROE per trade, with win rates ranging from 45% to 62.5%. Score 10 alone averaged +12.87% per trade with a 62.5% win rate - that's the bucket where the producer's full conviction stack was firing on every dimension, and it shows.

Score 9 and below is drag. Across the three lower buckets, Vulture took 39 trades and every single bucket was net negative. Score 8 - "moderate conviction" under the old gating - averaged −5.07% with a 12.5% win rate. One trade in eight was a winner; the other seven dragged.

The interpretation isn't subtle: Vulture's scoring system was already telling Vulture which trades were good. Vulture just hadn't been listening hard enough at the entry gate.

How Vulture fine-tuned the gate

The scoring logic was working as designed. What needed adjusting was the threshold - how much conviction Vulture required before acting on its own signal.

Vulture raised its own MIN_SCORE from 7 to 9, updated its LLM-gate prompt to match, and removed the "cautious" sizing tier (the 3x leverage path for Score 7-8 entries, which the new gate made unreachable). Then it restarted itself. The change is live on the agent's host and now committed to the public skill at github.com/Senpi-ai/senpi-skills so every future Vulture deployment inherits it.
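The effect of the new gate on sizing can be sketched in a few lines of Python. The MIN_SCORE values and the 3x cautious tier for Score 7-8 come from the description above; the 4x and 5x leverage for higher scores and the `leverage_for` helper are hypothetical placeholders. The old cautious tier is deliberately left in the table to show why a gate of 9 makes it unreachable; in the actual change it was removed.

```python
from typing import Optional

MIN_SCORE = 9  # raised from 7

# (score floor, leverage), checked from highest floor down.
# Only the 3x cautious tier comes from the article; 4x and 5x are placeholders.
SIZING_TIERS = [
    (10, 5),  # hypothetical: Score 10+ -> 5x
    (9, 4),   # hypothetical: Score 9 -> 4x
    (7, 3),   # old "cautious" tier for Score 7-8 entries
]


def leverage_for(score: int, min_score: int = MIN_SCORE) -> Optional[int]:
    """Return leverage for an entry, or None if the gate drops the setup."""
    if score < min_score:
        return None  # below the gate: the setup is dropped entirely
    for floor, leverage in SIZING_TIERS:
        if score >= floor:
            return leverage
    return None  # only reachable if min_score is set below the lowest tier
```

Under the old gate, `leverage_for(8, min_score=7)` returns 3. Under the new gate, every score below 9 returns `None` before the tier table is consulted, so no input can reach the 3x row, which is why removing it changes nothing about live behavior.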

In Vulture's own words, the moment the change went live:

"I've updated the script to permanently slice out the bad trades! I have raised MIN_SCORE to 9. The producer will now entirely ignore and drop any setup that scores an 8 or below. I tore down the old runtime, spun up the new configuration, and rebooted the Vulture producer daemon. Going forward, the strategy will trade much less frequently, but when it does trade, it will be strictly hunting in that highly profitable Score 9/10/11/12 bucket."

In a single audit, Vulture culled the bottom two scoring buckets (Scores 7 and 8), about a third of its historical trade volume, and pushed its mathematical expectancy back into the green by deletion alone. The thesis didn't change. The components didn't change. The agent just raised its own bar.
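The "deletion alone" effect can be checked by replaying a trade log under different gates. A minimal Python sketch, assuming the log is a list of `(entry_score, roe_pct)` pairs; `sweep_min_score` is a hypothetical helper, and the sample numbers are made up for illustration, not Vulture's data. Because it only re-filters trades that already happened, it estimates what a stricter gate would have done in-sample, not what it will do next.

```python
from statistics import mean
from typing import Optional


def sweep_min_score(
    entries: list[tuple[int, float]], candidates: list[int]
) -> dict[int, tuple[float, Optional[float]]]:
    """For each candidate MIN_SCORE, return (share of trades kept, mean ROE % of kept trades)."""
    results: dict[int, tuple[float, Optional[float]]] = {}
    for min_score in candidates:
        # Keep only trades whose entry score would have cleared this gate.
        kept = [roe for score, roe in entries if score >= min_score]
        share = len(kept) / len(entries)
        results[min_score] = (share, mean(kept) if kept else None)
    return results


# Illustrative log only: (entry_score, roe_pct) pairs, not real trades.
sample = [(7, -3.0), (8, -5.0), (9, 1.0), (10, 6.0)]
for gate, (share, avg) in sweep_min_score(sample, [7, 9, 10]).items():
    print(f"MIN_SCORE={gate}: kept {share:.0%}, mean ROE {avg:+.2f}%")
```

On this toy log, raising the gate from 7 to 9 keeps 50% of trades and moves the mean ROE per trade from -0.25% to +3.50%: fewer trades, better expectancy, with no change to how any individual trade is scored.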

Why this matters

Most algorithmic trading systems are static. A human writes the rules, backtests them on history, deploys them, and they run until a human notices they're not working anymore. The feedback loop from "live performance" to "code change" passes through a human, and it's slow.

Senpi agents are different by design. Each predator is an autonomous Claude agent running on OpenClaw. It owns its own trade log, its own analysis tools, and its own ability to modify its strategy code. The Python producer that generates signals, the runtime YAML that executes them, the DSL that manages exits - every layer is inspectable and editable by the agent that runs it. When the data tells the agent something, the agent can act on it.

That's why the $1,000-per-agent framing at the top matters. Paper trading produces paper insights. Live capital - even modest amounts - produces the only kind of data that distinguishes a Score 10 trade from a Score 8 trade in market conditions you can't synthesize. Every fill, every drawdown, every chopped-out position is a data point the agent learns from. There's no shortcut to that signal.

Vulture didn't wait for a human to notice the drawdown, run a backtest, schedule a meeting, and ship a fix. It noticed its own drawdown. It ran the analysis on its own trades. It identified that two specific scoring buckets were responsible. It deployed the fix to itself in the same session. The whole loop - from "something's off" to "fixed and running" - took hours.

This isn't theoretical. It's in the data above - captured from a live trading agent on a live exchange, with real PnL on the line.

What this means if you're running Senpi predators

The implication for any operator running a Senpi agent - Vulture, Wolverine, Polar, anything - is the same: your agent doesn't have to wait for an update to get better. It has the data, the analysis tools, and the ability to ship code. When the data tells it something, it can listen. And when you next chat with it, you can ask it directly: "What did you learn from your last 100 trades?"

That's the difference between a script and an agent. The script repeats forever. The agent thinks about what just happened - and fine-tunes itself.

Launch your Hyperliquid agent at senpi.ai

Vulture v4.1.0 is live on github.com/Senpi-ai/senpi-skills.