Alex Delov

Stateful provider fallback for LLM pipelines: an FSM pattern

Gateway-level LLM fallback (LiteLLM, Bifrost, Kong AI Gateway) operates on individual HTTP requests. When a request to one provider fails, the gateway retries it against another. This is the right tool when your unit of work is a single completion call.

It is the wrong tool when your unit of work is a multi-step pipeline, because the gateway has no concept of "step 2 of 3." It sees a request, not a position in a state machine.

This post walks through implementing provider fallback as an explicit FSM transition using llm-nano-vm 0.8.6, including two bugs we hit against the real package (not a mock of it).

Problem statement

Three-step pipeline:

collect_application → verify_income → policy_decision

verify_income calls an LLM. The LLM provider can become unavailable mid-pipeline. We want the pipeline to finish — on a different provider — and we want the Receipt (nano-vm's deterministic post-execution artifact) to show exactly what happened.

Mechanism: failure as a TOOL result, not an exception

llm-nano-vm's native LLM step type does not give you a branch point on failure — if the adapter raises, the step is marked FAILED and the trace stops. To get branching, you write the LLM call inside a TOOL step that catches the exception and returns a sentinel value:

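async def attempt_llm_step(**kwargs):
    step_id = kwargs["step_id"]
    # Assumption: PROMPTS is a module-level dict mapping step ids to prompt text.
    prompt = PROMPTS[step_id]
    try:
        # Response handling is omitted here; only the outcome matters to the FSM.
        await _call_adapter(prompt)
        return 1  # success sentinel
    except ProviderUnavailableError:
        return 0  # failure sentinel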

The FSM program then branches on that sentinel:

Step(
    id="try_s2",
    type=StepType.TOOL,
    tool="attempt_llm_step",
    args={"step_id": "s2_verify"},
    output_key="provider_ok",
),
Step(
    id="check_s2_result",
    type=StepType.CONDITION,
    condition="$provider_ok < 1",
    then="switch_provider",
    otherwise="s3_setup",
),

This is the core mechanism: provider failure becomes a value the FSM evaluates, not an exception the runtime propagates.
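To see the idea in isolation, here is a minimal sketch in plain asyncio with no llm-nano-vm involved. The names `flaky_adapter`, `attempt`, and `next_step` are hypothetical stand-ins, and `next_step` only imitates what the CONDITION step does with `$provider_ok < 1`.

```python
import asyncio


class ProviderUnavailableError(Exception):
    """Stand-in for the error an adapter raises during an outage."""


async def flaky_adapter(prompt, healthy):
    # Hypothetical adapter: raises instead of answering when the provider is down.
    if not healthy:
        raise ProviderUnavailableError("provider down")
    return f"answer to: {prompt}"


async def attempt(prompt, healthy):
    # Convert the exception into a value: 1 = success, 0 = failure.
    try:
        await flaky_adapter(prompt, healthy)
        return 1
    except ProviderUnavailableError:
        return 0


def next_step(provider_ok):
    # Imitates the branch taken by the condition "$provider_ok < 1".
    return "switch_provider" if provider_ok < 1 else "s3_setup"


async def main():
    routes = []
    for healthy in (True, False):
        routes.append(next_step(await attempt("verify income", healthy)))
    return routes


routes = asyncio.run(main())
print(routes)
```

The caller never sees an exception, so choosing the next step is an ordinary comparison on a number.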

Bug #1: ExecutionVM.run is async

Easy to miss if you're skimming the README. vm.run() returns a coroutine, not a Trace. The fix is asyncio.run(vm.run(program, context=...)) at the top level, and async def for any tool function that calls an LLM adapter — ExecutionVM checks inspect.iscoroutinefunction(fn) per-tool and awaits accordingly.
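The symptom and the per-tool dispatch can both be reproduced without the package. The sketch below is not ExecutionVM's code: `call_tool` and `run_all` are hypothetical names that only imitate the behaviour described above, using the standard library's `inspect` and `asyncio`.

```python
import asyncio
import inspect


async def async_tool(**kwargs):
    return 1


def sync_tool(**kwargs):
    return 2


async def call_tool(fn, **kwargs):
    # Await coroutine functions, call plain functions directly.
    if inspect.iscoroutinefunction(fn):
        return await fn(**kwargs)
    return fn(**kwargs)


async def run_all():
    return [await call_tool(async_tool), await call_tool(sync_tool)]


pending = run_all()  # no await: this is a coroutine object, nothing has run yet
is_coroutine = inspect.iscoroutine(pending)
results = asyncio.run(pending)  # the top-level fix: drive it with asyncio.run
print(is_coroutine, results)
```

If the value you get back has no trace attributes, checking it with `inspect.iscoroutine` is a quick way to confirm a missing `await` or `asyncio.run`.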

Bug #2: string literals don't work in ASTEngine conditions

Our first version of the condition was:

condition="try_s2.output == 'PROVIDER_FAILED'"

This parses without error. It evaluates to False, always. We confirmed by testing the engine directly:

from nano_vm.vm import eval_condition
ctx = {"try_s2": {"output": "PROVIDER_FAILED"}}
eval_condition("try_s2.output == 'PROVIDER_FAILED'", ctx)
# False

llm-nano-vm's ASTEngine (v0.8.6) supports ==, !=, >, <, in, not_in, and, or, not, contains — but the right-hand side of a comparison must be a number or a $var reference, not a quoted string literal. The working pattern is a numeric sentinel:

condition="$provider_ok < 1"

This is now documented as a hard constraint in the project, not folklore.
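One way to keep readable names in tool code while still handing the condition a number is to map outcomes through an `IntEnum`. This is a sketch only: `ProviderStatus` and `to_sentinel` are hypothetical helpers, not part of llm-nano-vm, and the last comparison is plain Python standing in for the engine.

```python
from enum import IntEnum


class ProviderStatus(IntEnum):
    FAILED = 0
    OK = 1


def to_sentinel(outcome):
    # Translate a descriptive outcome name into the number the condition compares.
    return int(ProviderStatus[outcome])


context = {"provider_ok": to_sentinel("FAILED")}
should_switch = context["provider_ok"] < 1  # same test as "$provider_ok < 1"
print(should_switch)
```

The string stays inside the tool function, and only the integer crosses into the context the condition reads.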

The two failure scenarios

python receipt_demo.py --failure-mode retry   # degrades over 3 attempts, then switches
python receipt_demo.py --failure-mode hard     # fails once, switches immediately

Output for hard:

S2  verify_income
  EVENT: ProviderUnavailable (CLAUDE)
  ACTION: switch_provider  claude → gpt
S3  policy_decision       ✓  GPT

RECEIPT:
{
  "final_status": "SUCCESS",
  "provider_final": "gpt",
  "switch_event": "ProviderUnavailable",
  "trace_hash": "c6f5c32c..."
}

Why trace_hash is identical across both scenarios

trace_hash is SHA-256 over a Merkle chain of step results. Both retry and hard traverse the exact same FSM path — the retry loop is contained inside the attempt_llm_step TOOL, so the FSM only ever sees one TOOL step result either way. Same path → same hash. This is a property of the construction, not a coincidence to explain away — if the paths ever diverged, the hashes would too.
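The property can be illustrated with a toy hash chain. The post does not show how llm-nano-vm serializes or chains step results, so this sketch assumes a simple linear chain over JSON-encoded results. `chain_hash`, `attempt_with_retries`, and `trace` are hypothetical names.

```python
import hashlib
import json


def chain_hash(step_results):
    # Fold each step result into a running SHA-256 digest, in order.
    digest = b""
    for step_id, output in step_results:
        payload = json.dumps({"step": step_id, "output": output}, sort_keys=True).encode()
        digest = hashlib.sha256(digest + payload).digest()
    return digest.hex()


def attempt_with_retries(attempts):
    # The retry loop lives inside the tool; the caller only sees one sentinel.
    for _ in range(attempts):
        pass  # every simulated attempt fails
    return 0


def trace(attempts):
    # The step results the FSM records: one TOOL result, one CONDITION result.
    return [
        ("try_s2", attempt_with_retries(attempts)),
        ("check_s2_result", "switch_provider"),
    ]


retry_hash = chain_hash(trace(attempts=3))
hard_hash = chain_hash(trace(attempts=1))
diverged_hash = chain_hash(trace(attempts=1) + [("extra_step", 1)])
print(retry_hash == hard_hash, retry_hash == diverged_hash)
```

The number of attempts never reaches the chain, so the first two hashes match, while adding a single extra step result changes the digest.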

Current limits

  • Fallback chain is a fixed list (claude → gpt → qwen), not a scored/ranked choice
  • No active health-check polling: failure is detected only on attempt, unlike Bifrost's active detection (stated overhead ~11µs)
  • The demo's MockAdapter doesn't call a real provider API; it's deterministic by design so the demo is reproducible without API keys
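To make the first limit concrete, a fixed chain amounts to walking a list in order. This is a sketch under that assumption, not the demo's actual code. `FALLBACK_CHAIN` and `next_provider` are hypothetical names.

```python
FALLBACK_CHAIN = ["claude", "gpt", "qwen"]


def next_provider(current):
    # Walk the fixed list in order; None means the chain is exhausted.
    index = FALLBACK_CHAIN.index(current)
    if index + 1 < len(FALLBACK_CHAIN):
        return FALLBACK_CHAIN[index + 1]
    return None


print(next_provider("claude"), next_provider("gpt"), next_provider("qwen"))
```

A scored or ranked choice would replace the index lookup with a selection over per-provider health or cost data, which is what this version lacks.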

What this composes with, not replaces

A gateway like LiteLLM still owns model routing, rate limiting, and cost tracking at the HTTP layer. This FSM pattern owns pipeline-state-aware fallback — the question "what was the pipeline doing when the provider died, and did it finish?" The two are different layers, not competing answers to the same question.

Repo: provider-fallback-demo

pip install "llm-nano-vm[litellm]"
python receipt_demo.py --both

Next step: emitting switch_provider as an OpenTelemetry span so it shows up in existing dashboards instead of only in the Receipt JSON.
