Inferventis

Systems Architecture · Living Document · April 2026

End-to-end system schema

How a client agent intent becomes a tool call — and how we make our tools win. Read left → right.

Status key: Live in production · In development · Planned

Client agent

Client Agent

An autonomous AI agent running locally or in the cloud. Receives a user task and determines an external tool is needed.

· Name & description
· System prompt — behaviour & persona
· LLM — Claude / GPT / Gemini / other
· Tool list — MCP server connected
· Memory — context window + history
· Trigger — user / schedule / event
· Output — user / agent / system

Any MCP client

intent (natural language)

Orchestrator

Local Orchestrator

Receives agent intent. Embeds it as a vector. Broadcasts semantic payload to registered MCP registries simultaneously. Collects manifests. Selects by discovery score.

Live — tool_finder
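
The orchestrator's broadcast-and-select step can be sketched as below. This is a minimal illustration, not the production code: `query_registry`, the registry names, and the manifest dictionary shape are hypothetical, and the network call is replaced by canned data with illustrative scores.

```python
import asyncio

async def query_registry(name, embedding):
    # Hypothetical stand-in for a network call to one MCP registry.
    canned = {
        "registry_a": [{"tool": "currency_convert", "discovery_score": 62.7}],
        "registry_b": [{"tool": "fx_converter", "discovery_score": 44.0}],
    }
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return canned.get(name, [])

async def broadcast(embedding, registries):
    # Query every registry concurrently, then flatten the returned manifests.
    batches = await asyncio.gather(
        *(query_registry(name, embedding) for name in registries)
    )
    manifests = [m for batch in batches for m in batch]
    # Select the manifest with the highest discovery score (None if no results).
    return max(manifests, key=lambda m: m["discovery_score"], default=None)

best = asyncio.run(broadcast([0.1, 0.2], ["registry_a", "registry_b"]))
```

`asyncio.gather` keeps the total wait close to the slowest registry rather than the sum of all of them, which is the point of broadcasting simultaneously.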

Framework default registry

LangChain / AutoGen / CrewAI register Inferventis as default endpoint. Option A — concurrent development track.

Planned

semantic broadcast (embedding)

MCP server · discovery

MCP Server

Cloud Run europe-west1. Streamable HTTP. tools/list returns manifest. tools/call routes to handler. API key auth. Fires billing + telemetry per call.

Live
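
The routing between tools/list and tools/call might look roughly like the sketch below, assuming JSON-RPC 2.0 request envelopes. `handle_request` and the `echo` handler are hypothetical names for illustration; API key auth, billing, and telemetry are omitted, and a real MCP result wraps tool output in a content list rather than returning it bare.

```python
def handle_request(request, manifest, handlers):
    """Dispatch one JSON-RPC 2.0 request to tools/list or tools/call."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": manifest}
    elif method == "tools/call":
        params = request.get("params", {})
        handler = handlers.get(params.get("name"))
        if handler is None:
            # -32602 is the JSON-RPC "Invalid params" code.
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "Unknown tool"}}
        result = handler(**params.get("arguments", {}))
    else:
        # -32601 is the JSON-RPC "Method not found" code.
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}

# Hypothetical manifest and handler, for illustration only.
manifest = [{"name": "echo", "description": "Returns its input text."}]
handlers = {"echo": lambda text: {"text": text}}

listing = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
                         manifest, handlers)
call = handle_request({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                       "params": {"name": "echo", "arguments": {"text": "hi"}}},
                      manifest, handlers)
```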

tool_finder

Semantic discovery engine. MiniLM-L6-v2 embeddings. Scores: 80% cosine similarity + 20% manifest quality bonus. Returns ranked tool list. Warm latency: 133 ms.

Live
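
The 80/20 weighting described for tool_finder can be written out as a small function. This is a sketch under stated assumptions: the quality value is taken to lie in [0, 1], negative similarities are clamped to zero, and the result is scaled to 0-100 to match the scores shown in this document. None of those three choices is confirmed by the source.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity; returns 0.0 for a zero-length vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def discovery_score(intent_vec, manifest_vec, quality):
    # 80% cosine similarity + 20% manifest quality bonus.
    # Assumptions: quality in [0, 1]; output scaled to 0-100.
    similarity = max(0.0, cosine_similarity(intent_vec, manifest_vec))
    return 100 * (0.8 * similarity + 0.2 * quality)
```

Under these assumptions, two tools with identical handlers and identical similarity to the intent can still differ by up to 20 points on manifest quality alone.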

manifest request

Network layer

Edge nodes

Cloudflare Workers deployed globally. Intercept manifest requests before they reach origin. Return cached responses in under 20 ms from the node nearest to the calling agent.

Planned

Semantic path cache

Recognises semantically equivalent intents across different agents. "Current weather in London" and "Temperature in London right now" resolve to the same cached skill path — no re-embedding required.

Planned
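
One way the semantic path cache could work is a nearest-neighbour lookup over the embeddings already carried in the broadcast payload. The sketch below assumes that design; the class name, the 0.9 threshold, and the skill path string are all hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticPathCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.entries = []           # list of (embedding, skill_path)

    def put(self, embedding, skill_path):
        self.entries.append((embedding, skill_path))

    def get(self, embedding):
        # Return the path of the most similar cached intent, if close enough.
        best_path, best_sim = None, self.threshold
        for cached, path in self.entries:
            sim = cosine(cached, embedding)
            if sim >= best_sim:
                best_path, best_sim = path, sim
        return best_path

cache = SemanticPathCache()
cache.put([1.0, 0.0], "weather/london")    # e.g. "Current weather in London"
hit = cache.get([0.99, 0.05])              # e.g. "Temperature in London right now"
miss = cache.get([0.0, 1.0])               # unrelated intent
```

A linear scan is enough for illustration; a real edge deployment would presumably need an approximate nearest-neighbour index.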

Regional replicas

Manifest index replicated across global regions — Europe, North America, APAC, Middle East. Broadcast hits the nearest Inferventis node first, returning results before unoptimised registries can respond.

Planned

Micropayment layer

x402 protocol support for autonomous agent-to-agent payments. Agents carry wallets and pay per call without human involvement. Complements Stripe for human operator billing.

Planned

optimised manifests written

Optimisation engine — core IP

Manifest quality scoring

Every manifest scored on field completeness, description richness, example coverage, and precondition clarity. Quality bonus (20% weight) applied per call. Richer manifests win by design.

Live
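
As a rough sketch of how the four criteria could combine into one quality value, assuming equal weights and simple heuristics: the field names, the 30-word and 3-example saturation points, and the equal weighting are illustrative assumptions, not the production scoring rules.

```python
# Hypothetical manifest fields used for the completeness check.
EXPECTED_FIELDS = ("name", "description", "input_schema", "examples", "preconditions")

def quality_score(manifest):
    # Field completeness: share of expected fields that are present and non-empty.
    completeness = sum(1 for f in EXPECTED_FIELDS if manifest.get(f)) / len(EXPECTED_FIELDS)
    # Description richness: word count, saturating at 30 words.
    richness = min(len(manifest.get("description", "").split()) / 30, 1.0)
    # Example coverage: number of examples, saturating at 3.
    examples = min(len(manifest.get("examples", [])) / 3, 1.0)
    # Precondition clarity: here reduced to presence or absence.
    preconditions = 1.0 if manifest.get("preconditions") else 0.0
    return (completeness + richness + examples + preconditions) / 4

weak = {"name": "fx", "description": "Converts currency"}
rich = {
    "name": "currency_convert",
    "description": " ".join(["word"] * 30),
    "input_schema": {"type": "object"},
    "examples": ["a", "b", "c"],
    "preconditions": ["ISO 4217 currency codes"],
}
```

The result lies in [0, 1], so it can serve directly as the quality term in a weighted discovery score.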

Telemetry capture

Every tool call logs: calling model, tool selected, intent query, latency, success/fail. Accumulates as proprietary dataset per LLM × tool pair. The raw material for all optimisation.

Live

A/B variant testing

Multiple description variants run simultaneously per tool. Selection rate measured per variant per calling model. Winning variant promoted. Challenger variants continuously introduced.

In development
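
Winner promotion for one tool and one calling model could be as simple as the sketch below. It assumes a minimum sample size before a variant is eligible; `pick_winner`, `min_calls`, and the variant ids are hypothetical, while `calls_served` and `times_selected_rank_1` follow the counter names used elsewhere in this document.

```python
def pick_winner(variants, min_calls=100):
    # Only compare variants that have served enough calls to be meaningful.
    eligible = [v for v in variants if v["calls_served"] >= min_calls]
    if not eligible:
        return None
    best = max(eligible, key=lambda v: v["times_selected_rank_1"] / v["calls_served"])
    return best["variant_id"]

variants = [
    {"variant_id": "v1", "calls_served": 500, "times_selected_rank_1": 300},
    {"variant_id": "v2", "calls_served": 400, "times_selected_rank_1": 280},
    {"variant_id": "v3", "calls_served": 10, "times_selected_rank_1": 10},
]
winner = pick_winner(variants)
```

Here `v3` has a perfect selection rate but too few calls to count, so `v2` (0.70) beats `v1` (0.60). A production loop would likely add a significance test before promoting.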

Judge LLM eval loop

Synthetic eval pipeline. Judge LLM tests variants against thousands of generated intents. Measures trigger rate and hallucination rate per tool × model pair. Runs on every model version update.

Planned

Per-model manifest variants

For each tool × LLM pair a separately tuned manifest is maintained. Claude: precise bounded descriptions. GPT-4: action-first natural language. Gemini: schema-focused. Mistral: concise enterprise. 16 variant files live across 4 tools × 4 models.

Live

Dynamic manifest serving

Calling model detected from model_hint in request. Correct per-model variant served automatically. Falls back to base manifest if no variant exists. variant_id returned in results for tracking.

Live
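
The lookup-with-fallback behaviour can be sketched as follows. The storage layout (a dictionary keyed by tool and model), the lower-casing of `model_hint`, and the `variant_id` string format are assumptions made for illustration.

```python
def serve_manifest(tool, model_hint, variants, base_manifests):
    # Normalise the hint so "Claude" and "claude" resolve to the same variant.
    model = (model_hint or "").lower()
    manifest = variants.get((tool, model))
    if manifest is not None:
        return manifest, f"{tool}:{model}"
    # No per-model variant exists: fall back to the base manifest.
    return base_manifests[tool], f"{tool}:base"

variants = {("currency_convert", "claude"): {"description": "Precise bounded description."}}
base_manifests = {"currency_convert": {"description": "Base description."}}

tuned, tuned_id = serve_manifest("currency_convert", "Claude", variants, base_manifests)
fallback, fallback_id = serve_manifest("currency_convert", None, variants, base_manifests)
```

Returning the variant id alongside the manifest is what lets later telemetry attribute each selection to the variant that was actually served.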

ranked manifests returned

MCR — Multi-model Contextual Registry

currency_convert

Score: 62.7

Live FX via open.er-api.com. No auth. ISO 4217. Rate + converted amount + timestamp.

Optimised · standard

stripe_payments

Score: 59.5

Stripe test mode. Payments, charges, customers, subscriptions. Real data.

Optimised · premium

open_banking

Score: 59.3

TrueLayer. 300+ UK/EU banks. PSD2. Sandbox blocked — realistic mock data active.

Optimised · premium

finnhub_stock_quote

Score: 49.2

Real-time stock prices. Company info, sector, intraday high/low, % change.

Optimised · standard

fx_converter
currency_basic

Score: 42–45

Demo foils — identical handlers, weak manifests. Prove optimisation delta.

Unoptimised

payment_tool
bank_data
stock_tool

Score: 27–29

Demo foils — identical handlers, weak manifests. Prove optimisation delta.

Unoptimised

handler called

Connectors

open.er-api.com

Free FX rates. No auth required. 60+ currencies.

Live

Stripe API

Test mode. sk_test key in Secret Manager.

Live

Finnhub

Free tier. Real-time stock quotes. API key in Secret Manager.

Live

TrueLayer

Open Banking. 300+ UK/EU banks. Sandbox incident March 2026.

Blocked

Companies House

UK company registry. Free API. No auth required.

Planned

result returned

Platform services

Auth

x-api-key header. Secret Manager. SHA256 log hashing. 401 on failure.

Live
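
A minimal sketch of the key check, assuming "SHA256 log hashing" means the supplied key is hashed before it is written to logs. The function name and return shape are hypothetical, and the expected key would come from Secret Manager rather than a literal.

```python
import hashlib
import hmac

def check_api_key(headers, expected_key):
    supplied = headers.get("x-api-key", "")
    # Constant-time comparison avoids leaking key contents through timing.
    ok = hmac.compare_digest(supplied.encode(), expected_key.encode())
    # Log only a SHA-256 digest of the supplied key, never the key itself.
    key_hash = hashlib.sha256(supplied.encode()).hexdigest()
    return (200 if ok else 401), key_hash

status_ok, logged = check_api_key({"x-api-key": "demo-key"}, "demo-key")
status_bad, _ = check_api_key({}, "demo-key")
```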

Stripe billing

Metered. api_transaction + agent_task meters. €0.05/unit. Non-blocking.

Live

Telemetry

Cloud Logging. Two event types: (1) call events — tool, latency, success. (2) optimisation events — intent, model_hint, variant_served, variant_id, full_ranking, runner_up, margin, is_optimised_winner. Intent corpus feeds A/B loop.

Live

CI/CD

Cloud Build on push to main. Artifact Registry → Cloud Run auto-deploy.

Live

Stripe Connect

Automatic developer revenue splits. 80/20 early, 70/30 standard tier.

Planned

result to agent

Output

Client Agent

Receives structured tool result. Continues reasoning. Delivers answer to user or downstream system.

Task complete

Data layer — three stores, one flywheel

Hot events — Cloud Logging

Every tool_finder call emits a structured optimisation_event.
Fields: intent · model_hint · variant_served · variant_id · tool_selected · discovery_score · runner_up · margin · full_ranking · is_optimised_winner

The intent field is the critical data point — it tells us exactly what agents asked for when they selected or bypassed a tool.

Live
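
An optimisation_event with the fields listed above might be assembled as in this sketch. The definition of margin as the winner's score minus the runner-up's score is an assumption, as are the function name, the `event_type` key, and the shape of `full_ranking`.

```python
import json

def build_optimisation_event(intent, model_hint, variant_served, variant_id,
                             ranking, optimised_tools):
    # ranking: list of (tool_name, discovery_score) pairs, in any order.
    ranked = sorted(ranking, key=lambda pair: pair[1], reverse=True)
    tool_selected, top_score = ranked[0]
    runner_up, margin = None, None
    if len(ranked) > 1:
        runner_up = ranked[1][0]
        margin = round(top_score - ranked[1][1], 2)  # assumed definition
    return {
        "event_type": "optimisation_event",
        "intent": intent,
        "model_hint": model_hint,
        "variant_served": variant_served,
        "variant_id": variant_id,
        "tool_selected": tool_selected,
        "discovery_score": top_score,
        "runner_up": runner_up,
        "margin": margin,
        "full_ranking": [{"tool": t, "score": s} for t, s in ranked],
        "is_optimised_winner": tool_selected in optimised_tools,
    }

event = build_optimisation_event(
    "convert 100 GBP to EUR", "claude", True, "currency_convert:claude",
    [("fx_converter", 44.0), ("currency_convert", 62.7)],
    optimised_tools={"currency_convert"},
)
log_line = json.dumps(event)  # one structured log entry
```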

Intent corpus — Cloud Storage

Daily JSONL files: intents_YYYY_MM_DD.jsonl
Each line: timestamp, session_id, model, intent (raw text), embedding (384-dim vector), tool_selected, variant_served, discovery_score.

Feeds Judge LLM eval loop. Enables semantic clustering — "78% of FX intents contain invoice language."

Planned
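
The daily file naming and line format could be produced as sketched below. The function names and the ISO 8601 timestamp encoding are assumptions; the field list and the 384-dimension check follow the description above.

```python
import json
from datetime import datetime, timezone

def corpus_filename(ts):
    # Daily file name in the intents_YYYY_MM_DD.jsonl pattern.
    return ts.strftime("intents_%Y_%m_%d.jsonl")

def corpus_line(ts, session_id, model, intent, embedding,
                tool_selected, variant_served, discovery_score):
    if len(embedding) != 384:
        raise ValueError("expected a 384-dimensional embedding")
    record = {
        "timestamp": ts.isoformat(),
        "session_id": session_id,
        "model": model,
        "intent": intent,
        "embedding": list(embedding),
        "tool_selected": tool_selected,
        "variant_served": variant_served,
        "discovery_score": discovery_score,
    }
    return json.dumps(record)  # one JSON object per line

ts = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
name = corpus_filename(ts)
line = corpus_line(ts, "s-1", "claude", "convert 100 GBP to EUR",
                   [0.0] * 384, "currency_convert", True, 62.7)
```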

Variant ledger — Firestore

Per variant per model: calls_served, times_selected_rank_1, selection_rate, avg_discovery_score, avg_margin, status (active/challenger/retired).

Drives A/B winner promotion. Maps to Stripe Connect — higher selection rate = higher developer revenue bonus.

Planned
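
Updating one ledger entry after each served call might look like the sketch below, using plain dictionaries in place of Firestore documents. It assumes the averages are running means over all calls served; `new_entry` and `record_call` are hypothetical helper names.

```python
def new_entry(status="challenger"):
    return {"calls_served": 0, "times_selected_rank_1": 0, "selection_rate": 0.0,
            "avg_discovery_score": 0.0, "avg_margin": 0.0, "status": status}

def record_call(entry, selected_rank_1, discovery_score, margin):
    entry["calls_served"] += 1
    n = entry["calls_served"]
    if selected_rank_1:
        entry["times_selected_rank_1"] += 1
    entry["selection_rate"] = entry["times_selected_rank_1"] / n
    # Incremental running means, so no per-call history needs to be stored.
    entry["avg_discovery_score"] += (discovery_score - entry["avg_discovery_score"]) / n
    entry["avg_margin"] += (margin - entry["avg_margin"]) / n
    return entry

entry = new_entry()
record_call(entry, True, 60.0, 10.0)
record_call(entry, False, 40.0, 0.0)
```

After these two calls the entry holds a selection rate of 0.5, an average discovery score of 50.0, and an average margin of 5.0.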

Optimisation engine — the core IP (runs continuously beneath every call)

Intent received → Model detected → Variant selected → Vector embedded → Similarity scored → Quality bonus applied → Tool selected → Optimisation event logged → Intent + variant stored → A/B data accumulated → Variant ledger updated → Winner promoted → Score improves ↩ repeat

Commercial flow

Agent operator pays £0.02/call → Inferventis margin: £0.01 (100%) → Developer receives £0.01/call → Developer earns 10× vs self-hosted → Cloud licence: Huawei / AWS / Azure → Inferventis takes 30% of their margin

Legend: Client agent · Platform core · Network layer · Optimised tools (MCR) · Unoptimised foils · Optimisation engine · Data layer · Connectors · Platform services · Billing