Gateway-level LLM fallback (LiteLLM, Bifrost, Kong AI Gateway) operates on individual HTTP requests. When a request to one provider fails, the gateway retries it against another. This is the right tool when your unit of work is a single completion call.
It is the wrong tool when your unit of work is a multi-step pipeline, because the gateway has no concept of "step 2 of 3." It sees a request, not a position in a state machine.
This post walks through implementing provider fallback as an explicit FSM transition using llm-nano-vm 0.8.6, including two bugs we hit against the real package (not a mock of it).
Problem statement
Three-step pipeline:
collect_application → verify_income → policy_decision
verify_income calls an LLM. The LLM provider can become unavailable mid-pipeline. We want the pipeline to finish on a different provider, and we want the Receipt (nano-vm's deterministic post-execution artifact) to show exactly what happened.
Mechanism: failure as a TOOL result, not an exception
llm-nano-vm's native LLM step type does not give you a branch point on failure: if the adapter raises, the step is marked FAILED and the trace stops. To get branching, you write the LLM call inside a TOOL step that catches the exception and returns a sentinel value:
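async def attempt_llm_step(**kwargs):
    step_id = kwargs["step_id"]
    # Assumption: the prompt is looked up per step; the original snippet
    # elides this, and PROMPTS is a hypothetical mapping.
    prompt = PROMPTS[step_id]
    try:
        # The adapter's response would be stored for later steps here.
        await _call_adapter(prompt)
        return 1  # success sentinel
    except ProviderUnavailableError:
        # Swallow the outage so the FSM can branch on the value.
        return 0  # failure sentinel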
The FSM program then branches on that sentinel:
Step(
    id="try_s2",
    type=StepType.TOOL,
    tool="attempt_llm_step",
    args={"step_id": "s2_verify"},
    output_key="provider_ok",
),
Step(
    id="check_s2_result",
    type=StepType.CONDITION,
    condition="$provider_ok < 1",
    then="switch_provider",
    otherwise="s3_setup",
),
This is the core mechanism: provider failure becomes a value the FSM evaluates, not an exception the runtime propagates.
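To make the control flow concrete without depending on the library, here is a minimal plain-asyncio sketch of the same idea. It does not use llm-nano-vm at all: the `run_pipeline` loop, `flaky_adapter`, and the local `ProviderUnavailableError` class are hypothetical stand-ins, and only the step ids and the `provider_ok < 1` test mirror the program above.

```python
import asyncio


class ProviderUnavailableError(Exception):
    """Local stand-in for the adapter's outage error."""


async def flaky_adapter(prompt: str) -> str:
    # Hypothetical adapter that always fails, to exercise the fallback branch.
    raise ProviderUnavailableError("claude is down")


async def attempt_llm_step(ctx: dict) -> int:
    # Catch the outage and turn it into a numeric sentinel.
    try:
        await flaky_adapter("verify income")
        return 1
    except ProviderUnavailableError:
        return 0


async def run_pipeline() -> dict:
    ctx = {"provider": "claude", "path": []}
    step = "try_s2"
    while step != "done":
        ctx["path"].append(step)
        if step == "try_s2":
            ctx["provider_ok"] = await attempt_llm_step(ctx)
            step = "check_s2_result"
        elif step == "check_s2_result":
            # Same shape as the CONDITION step: branch on the sentinel value.
            step = "switch_provider" if ctx["provider_ok"] < 1 else "s3_setup"
        elif step == "switch_provider":
            ctx["provider"] = "gpt"
            step = "s3_setup"
        elif step == "s3_setup":
            step = "done"
    return ctx


result = asyncio.run(run_pipeline())
print(result["path"], result["provider"])
```

The exception never leaves `attempt_llm_step`, so the loop that plays the role of the runtime only ever sees an integer and a next-step decision.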
Bug #1: ExecutionVM.run is async
Easy to miss if you're skimming the README. vm.run() returns a coroutine, not a Trace. The fix is asyncio.run(vm.run(program, context=...)) at the top level, and async def for any tool function that calls an LLM adapter. ExecutionVM checks inspect.iscoroutinefunction(fn) per-tool and awaits accordingly.
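The dispatch rule is easy to reproduce in isolation. The sketch below is not ExecutionVM's code; `call_tool`, `sync_tool`, and `async_tool` are hypothetical names, and the only real APIs involved are `inspect.iscoroutinefunction`, `inspect.iscoroutine`, and `asyncio.run` from the standard library.

```python
import asyncio
import inspect


async def call_tool(fn, **kwargs):
    # Await coroutine functions; call plain functions directly.
    if inspect.iscoroutinefunction(fn):
        return await fn(**kwargs)
    return fn(**kwargs)


def sync_tool(**kwargs):
    return kwargs["x"] + 1


async def async_tool(**kwargs):
    await asyncio.sleep(0)  # stands in for an awaited adapter call
    return kwargs["x"] * 2


async def main():
    return await call_tool(sync_tool, x=1), await call_tool(async_tool, x=2)


# Calling main() alone only builds a coroutine object; nothing has run yet.
pending = main()
is_coroutine = inspect.iscoroutine(pending)
pending.close()  # discard it cleanly to avoid a "never awaited" warning

# asyncio.run drives the coroutine to completion and returns its result.
results = asyncio.run(main())
print(is_coroutine, results)
```

Forgetting the `asyncio.run` wrapper leaves you holding the `pending` object from this sketch, which is the same symptom as getting a coroutine back where a Trace was expected.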
Bug #2: string literals don't work in ASTEngine conditions
Our first version of the condition was:
condition="try_s2.output == 'PROVIDER_FAILED'"
This parses without error. It evaluates to False, always. We confirmed by testing the engine directly:
from nano_vm.vm import eval_condition
ctx = {"try_s2": {"output": "PROVIDER_FAILED"}}
eval_condition("try_s2.output == 'PROVIDER_FAILED'", ctx)
# False
llm-nano-vm's ASTEngine (v0.8.6) supports ==, !=, >, <, in, not_in, and, or, not, contains, but the right-hand side of a comparison must be a number or a $var reference, not a quoted string literal. The working pattern is a numeric sentinel:
condition="$provider_ok < 1"
This is now documented as a hard constraint in the project, not folklore.
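Because the failure is silent, one option is to reject quoted literals before a condition ever reaches the engine. The guard below is a hypothetical helper, not part of llm-nano-vm; `check_condition` and its regex are assumptions made for illustration, and it only looks for quoted substrings rather than parsing the condition grammar.

```python
import re

# Matches any single- or double-quoted substring, including empty ones.
_QUOTED = re.compile(r"""(['"]).*?\1""")


def check_condition(condition: str) -> str:
    """Return the condition unchanged, or raise if it contains a quoted literal."""
    if _QUOTED.search(condition):
        raise ValueError(
            f"quoted string literal in condition {condition!r}; "
            "use a numeric sentinel or a $var reference instead"
        )
    return condition


safe = check_condition("$provider_ok < 1")

try:
    check_condition("try_s2.output == 'PROVIDER_FAILED'")
    rejected = False
except ValueError:
    rejected = True

print(safe, rejected)
```

Running every condition string through a check like this at program-construction time turns an always-False branch into an immediate error.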
The two failure scenarios
python receipt_demo.py --failure-mode retry # degrades over 3 attempts, then switches
python receipt_demo.py --failure-mode hard # fails once, switches immediately
Output for hard:
S2 verify_income
EVENT: ProviderUnavailable (CLAUDE)
ACTION: switch_provider claude → gpt
S3 policy_decision → GPT
RECEIPT:
{
  "final_status": "SUCCESS",
  "provider_final": "gpt",
  "switch_event": "ProviderUnavailable",
  "trace_hash": "c6f5c32c..."
}
Why trace_hash is identical across both scenarios
trace_hash is SHA-256 over a Merkle chain of step results. Both retry and hard traverse the exact same FSM path: the retry loop is contained inside the attempt_llm_step TOOL, so the FSM only ever sees one TOOL step result either way. Same path → same hash. This is a property of the construction, not a coincidence to explain away; if the paths ever diverged, the hashes would too.
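The argument can be checked with a toy hash chain. This is a sketch under stated assumptions, not llm-nano-vm's hashing scheme: `chain_hash` uses a simple linear SHA-256 chain over JSON-encoded `(step_id, result)` pairs, and `attempt_with_retries` and `run` are hypothetical stand-ins for the demo's tool and program.

```python
import hashlib
import json


def chain_hash(step_results):
    # Fold each (step_id, result) pair into a running SHA-256 digest.
    digest = b""
    for step_id, result in step_results:
        payload = json.dumps([step_id, result]).encode()
        digest = hashlib.sha256(digest + payload).digest()
    return digest.hex()


def attempt_with_retries(attempts: int) -> int:
    # The retry loop lives inside the tool; only the final sentinel escapes.
    for _ in range(attempts):
        pass  # each iteration stands in for one failed provider call
    return 0


def run(mode: str):
    attempts = 3 if mode == "retry" else 1
    provider_ok = attempt_with_retries(attempts)
    # What the FSM records: one result per step, regardless of retry count.
    return [
        ("try_s2", provider_ok),
        ("check_s2_result", "switch_provider"),
        ("switch_provider", "gpt"),
        ("s3_setup", 1),
    ]


retry_hash = chain_hash(run("retry"))
hard_hash = chain_hash(run("hard"))
success_hash = chain_hash([("try_s2", 1), ("check_s2_result", "s3_setup"), ("s3_setup", 1)])
print(retry_hash == hard_hash, retry_hash == success_hash)
```

The number of attempts never enters the hashed input, so the two failure modes collide by construction, while a run that takes the success branch hashes differently.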
Current limits
- Fallback chain is a fixed list (claude → gpt → qwen), not a scored/ranked choice
- No active health-check polling; failure is detected only on attempt, unlike Bifrost, which states that its active detection adds roughly 11µs of overhead
- The demo's MockAdapter doesn't call a real provider API; it's deterministic by design so the demo is reproducible without API keys
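For reference, a fixed chain of this kind amounts to an ordered lookup. The sketch below is an assumption about shape only; `FALLBACK_CHAIN` and `next_provider` are hypothetical names, not the demo's actual code.

```python
from typing import Optional

# Fixed order, mirroring the chain named in the list above.
FALLBACK_CHAIN = ["claude", "gpt", "qwen"]


def next_provider(current: str) -> Optional[str]:
    """Return the provider after `current`, or None when the chain is exhausted."""
    position = FALLBACK_CHAIN.index(current)  # raises ValueError for unknown names
    if position + 1 < len(FALLBACK_CHAIN):
        return FALLBACK_CHAIN[position + 1]
    return None


print(next_provider("claude"), next_provider("gpt"), next_provider("qwen"))
```

A scored or ranked choice would replace the positional lookup with a selection over per-provider metrics, which is exactly what the current version does not attempt.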
What this composes with, not replaces
A gateway like LiteLLM still owns model routing, rate limiting, and cost tracking at the HTTP layer. This FSM pattern owns pipeline-state-aware fallback: the question "what was the pipeline doing when the provider died, and did it finish?" The two are different layers, not competing answers to the same question.
Repo: provider-fallback-demo
pip install "llm-nano-vm[litellm]"
python receipt_demo.py --both
Next step: emitting switch_provider as an OpenTelemetry span so it shows up in existing dashboards instead of only in the Receipt JSON.