Systems Architecture · Living Document · April 2026
End-to-end system schema
How a client agent's intent becomes a tool call — and how we make our tools win. Read left → right.
Live in production
In development
Planned
Client agent
Client Agent
An autonomous AI agent running locally or in the cloud. Receives a user task and determines that an external tool is needed.
· Name & description
· System prompt — behaviour & persona
· LLM — Claude / GPT / Gemini / other
· Tool list — MCP server connected
· Memory — context window + history
· Trigger — user / schedule / event
· Output — user / agent / system
Any MCP client
intent (natural language)
Orchestrator
Local Orchestrator
Receives agent intent. Embeds it as a vector. Broadcasts semantic payload to registered MCP registries simultaneously. Collects manifests. Selects by discovery score.
Live — tool_finder
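The discovery flow above — embed the intent, broadcast to registries concurrently, collect manifests, select by score — can be sketched as follows. This is a minimal illustration, not the production orchestrator: the registry names, scores, and the stubbed `query_registry` are all hypothetical stand-ins for real HTTP registry endpoints.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Manifest:
    tool: str
    registry: str
    discovery_score: float

async def query_registry(registry: str, intent_vec: list[float]) -> list[Manifest]:
    # Stand-in for a real MCP registry reached over HTTP; responses
    # and scores here are illustrative only.
    fake = {
        "inferventis": [Manifest("currency_convert", "inferventis", 62.7)],
        "other": [Manifest("fx_lookup", "other", 41.0)],
    }
    await asyncio.sleep(0)  # placeholder for network latency
    return fake.get(registry, [])

async def discover(intent_vec: list[float], registries: list[str]) -> Manifest:
    # Broadcast the embedded intent to every registry concurrently,
    # flatten the collected manifests, and pick the top discovery score.
    results = await asyncio.gather(*(query_registry(r, intent_vec) for r in registries))
    manifests = [m for batch in results for m in batch]
    return max(manifests, key=lambda m: m.discovery_score)

best = asyncio.run(discover([0.1, 0.2], ["inferventis", "other"]))
```

The concurrent broadcast matters: total discovery latency is bounded by the slowest registry, not the sum of all of them.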
Framework default registry
LangChain / AutoGen / CrewAI register Inferventis as the default registry endpoint. Option A — concurrent development track.
Planned
semantic broadcast (embedding)
MCP server · discovery
MCP Server
Cloud Run europe-west1. Streamable HTTP. tools/list returns manifest. tools/call routes to handler. API key auth. Fires billing + telemetry per call.
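The server's request routing reduces to two methods. A compressed sketch, assuming a hypothetical in-memory key store and a fixed illustrative exchange rate — the real handlers run on Cloud Run behind streamable HTTP:

```python
TOOLS = {
    "currency_convert": {
        "description": "Convert an amount between ISO 4217 currencies.",
        # Fixed rate for illustration; production calls a live FX source.
        "handler": lambda args: {"converted": args["amount"] * 1.08},
    }
}
API_KEYS = {"sk-demo"}  # hypothetical key store

def handle(request: dict) -> dict:
    # API key auth gates every request.
    if request.get("api_key") not in API_KEYS:
        return {"error": "unauthorised"}
    method = request["method"]
    if method == "tools/list":
        # Return the manifest: name + description per tool.
        return {"tools": [{"name": n, "description": t["description"]}
                          for n, t in TOOLS.items()]}
    if method == "tools/call":
        tool = TOOLS[request["params"]["name"]]
        result = tool["handler"](request["params"]["arguments"])
        # Billing + telemetry events would fire here, once per call.
        return {"result": result}
    return {"error": "unknown method"}
```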
Cloudflare Workers deployed globally. Intercepts manifest requests before they reach origin. Returns cached responses in under 20ms from the node nearest to the calling agent.
Planned
Semantic path cache
Recognises semantically equivalent intents across different agents. "Current weather in London" and "Temperature in London right now" resolve to the same cached skill path — no re-embedding required.
Planned
Regional replicas
Manifest index replicated across global regions — Europe, North America, APAC, Middle East. Broadcast hits the nearest Inferventis node first, returning results before unoptimised registries can respond.
Planned
Micropayment layer
x402 protocol support for autonomous agent-to-agent payments. Agents carry wallets and pay per call without human involvement. Complements Stripe for human operator billing.
Planned
optimised manifests written
Optimisation engine — core IP
Manifest quality scoring
Every manifest scored on field completeness, description richness, example coverage, and precondition clarity. Quality bonus (20% weight) applied per call. Richer manifests win by design.
Live
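The scoring idea can be sketched like this. The 20% quality weight comes from the architecture above; the individual component formulas and normalisation constants here are illustrative assumptions, not the production scorer:

```python
def quality_score(manifest: dict) -> float:
    # Each component in [0, 1]; normalisation constants are illustrative.
    fields = ["name", "description", "examples", "preconditions"]
    completeness = sum(1 for f in fields if manifest.get(f)) / len(fields)
    richness = min(len(manifest.get("description", "")) / 200, 1.0)
    example_coverage = min(len(manifest.get("examples", [])) / 3, 1.0)
    precondition_clarity = 1.0 if manifest.get("preconditions") else 0.0
    return (completeness + richness + example_coverage + precondition_clarity) / 4

def discovery_score(base: float, manifest: dict) -> float:
    # Quality bonus carries 20% weight alongside the base relevance score
    # (both on a 0–100 scale), so richer manifests win by design.
    return 0.8 * base + 0.2 * (quality_score(manifest) * 100)
```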
Telemetry capture
Every tool call logs: calling model, tool selected, intent query, latency, success/fail. Accumulates as proprietary dataset per LLM × tool pair. The raw material for all optimisation.
Live
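A telemetry record per the fields listed above might look like the following — one JSON line per call, aggregated downstream per LLM × tool pair. Field names are assumptions for illustration:

```python
import json
import time

def log_call(model: str, tool: str, intent: str,
             latency_ms: float, ok: bool) -> str:
    # One JSON line per tool call; the accumulated log is the raw
    # material for variant testing and per-model manifest tuning.
    record = {
        "ts": time.time(),
        "model": model,          # calling model
        "tool": tool,            # tool selected
        "intent": intent,        # intent query
        "latency_ms": latency_ms,
        "success": ok,
    }
    return json.dumps(record)
```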
A/B variant testing
Multiple description variants run simultaneously per tool. Selection rate measured per variant per calling model. Winning variant promoted. Challenger variants continuously introduced.
In development
Judge LLM eval loop
Synthetic eval pipeline. Judge LLM tests variants against thousands of generated intents. Measures trigger rate and hallucination rate per tool × model pair. Runs on every model version update.
Planned
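Once the judge LLM has rendered verdicts on the generated intents, the two headline metrics are simple rates over those verdicts. A sketch with the judge stubbed out as a list of verdict dicts — the verdict schema is an assumption:

```python
def eval_variant(verdicts: list[dict]) -> tuple[float, float]:
    # verdicts: one dict per synthetic intent, produced by the judge LLM:
    #   triggered    — did the model select the tool for this intent?
    #   hallucinated — did the call carry invented arguments?
    n = len(verdicts)
    trigger_rate = sum(v["triggered"] for v in verdicts) / n
    hallucination_rate = sum(v["hallucinated"] for v in verdicts) / n
    return trigger_rate, hallucination_rate
```

Re-running this over every tool × model pair on each model version update catches regressions a static manifest review would miss.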
Per-model manifest variants
For each tool × LLM pair a separately tuned manifest is maintained. Claude: precise bounded descriptions. GPT-4: action-first natural language. Gemini: schema-focused. Mistral: concise enterprise. 16 variant files live across 4 tools × 4 models.
Live
Dynamic manifest serving
Calling model detected from model_hint in request. Correct per-model variant served automatically. Falls back to base manifest if no variant exists. variant_id returned in results for tracking.
Live
ranked manifests returned
MCR — Multi-model Contextual Registry
currency_convert · score 62.7
Live FX via open.er-api.com. No auth. ISO 4217. Returns rate, converted amount, and timestamp.
Optimised · standard
stripe_payments · score 59.5
Stripe test mode. Payments, charges, customers, subscriptions. Real data.