Every MCP tool call was flooding our context window with raw JSON. By shifting to server-side code execution, we reduced token consumption by 95-99% across our energy industry API integrations.
## The Problem: Death by Documentation
If you're building AI agents that talk to APIs, you've probably run into the same wall we did: token bloat. Every tool call returns massive JSON payloads that flood your context window, burn through your budget, and slow everything down. We fixed it by fundamentally rethinking how our MCP server handles API interactions — shifting from traditional tool calling to server-side code execution. The results were dramatic.
PatchOps is our unified MCP server that gives AI agents access to energy industry APIs — Corva for drilling operations, Enverus for market intelligence, WellDatabase for well data, GeoForce for device tracking, and Microsoft 365 tools like Outlook, Teams, and Planner.
In the original architecture, every MCP tool call followed the standard pattern: Claude sends a request, the server calls the API, and the full response gets stuffed back into the context window. Simple enough. Except it was hemorrhaging tokens.
The worst offender? Every single PatchOps response included an "LLM Connector Guide" — documentation for all connectors — regardless of which API you were actually querying. Ask for one Outlook email? Here's 8,000+ tokens of documentation for WellDatabase, GeoForce, Corva, Enverus, and Snowflake too.
Wake-up call: In one real conversation, retrieving a few emails consumed 68,000 tokens. Only about 2,000 of those were actually useful data. That's a useful-data ratio of about 2.9%.
A single GeoForce query returning 118 devices would dump ~47KB of raw JSON into the context — roughly 12,000 tokens of coordinates, battery statuses, timestamps, and metadata that Claude then had to parse through just to answer "how many active devices do I have?"
## The Fix: Let the Server Do the Work
Instead of returning raw API responses for Claude to digest, we moved to a code execution model. Claude writes a small JavaScript snippet describing what it wants, the PatchOps server executes it against the API, and only the processed result comes back.
Before:

```
Claude → "list all GeoForce devices" → PatchOps → GeoForce API → 47KB raw JSON → Claude context
Result: ~12,000 tokens consumed
```

After:

```javascript
const devices = await geoforce.listDevices();
return {
  total: devices.length,
  active: devices.filter(d => d.state === 'ACTIVE').length
};
// Result: ~150 tokens consumed
```
The AI generates targeted code that fetches, filters, and aggregates on the server side. Only the answer comes back — not the raw material.
## The Numbers
We ran extensive benchmarks comparing both approaches across real workloads.
### Token Reduction by Scenario
| Scenario | Tool Calling | Code Execution | Reduction |
|---|---|---|---|
| GeoForce: 118 devices | 12,000 tokens | 150 tokens | 98.75% |
| Outlook: Email retrieval | 68,000 tokens | 1,726 tokens | 97.5% |
| Corva: Rig list | 8,000 tokens | 1,107 tokens | 86% |
| Enverus: Basin query | ~8,000 tokens | 1,361 tokens | 83% |
| Multi-API dashboard | 24,000 tokens | 300 tokens | 98.75% |
The GeoForce case is the clearest illustration. A query that previously returned 47KB of device telemetry now returns a 464-byte summary. That's a 100x reduction in context size.
For the Outlook case, the improvement was even more striking in absolute terms. We went from 68,000 tokens — where 48,000 were wasted on connector documentation repeated six times — down to 1,726 tokens of actual email content. A 39x improvement in efficiency.
### Cost Impact
At Claude API pricing ($0.80/1M input tokens), the per-call savings look small but scale fast:
- Per call: $0.0096 → $0.00012 (99% reduction)
- Monthly (30 calls/user): $0.288 → $0.0036
- Annual across 3 APIs, 50 users: ~$512/year saved
These numbers get more interesting when you factor in that the reduced context also means faster responses and fewer cases where conversations hit context limits — which means fewer retries and restarts.
### Speed Tradeoff
There is one tradeoff: latency. Direct tool calls return in ~100-120ms. Code execution takes 6-7 seconds because it includes runtime boot, API call, and server-side processing.
| Method | Latency | Tokens | Best For |
|---|---|---|---|
| Direct tool call | ~120ms | 12,000 | Single lookups, real-time dashboards |
| Code execution | ~6,000ms | 150 | Bulk operations, analytics, reports |
For a single device lookup where a user is waiting, 120ms wins. For pulling a fleet summary, generating a report, or orchestrating across multiple APIs, the 6-second wait is negligible compared to the 98% token savings.
## The Hybrid Approach
We didn't go all-in on either method. PatchOps now uses a hybrid strategy:
Code execution for bulk data, aggregations, multi-API orchestration, filtering, and any operation where you don't need every field from every record. This is the default path for analytics-style queries.
Direct tool calling for single-item lookups, time-critical operations, and cases where complete raw data is needed for downstream processing.
Routing heuristic: If the query smells like "get me this specific thing," use a direct call. If it smells like "tell me about all the things," use code execution.
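That heuristic can be sketched as a small classifier. This is an illustrative router, not the published PatchOps logic; the signal keywords and the `connectors` field are assumptions:

```javascript
// Classify a request as a direct tool call ("this specific thing")
// or code execution ("all the things").
function chooseExecutionPath(request) {
  const bulkSignals = [
    /\ball\b/i, /\bevery\b/i, /\bsummar/i, /\breport\b/i,
    /\bcount\b/i, /\baverage\b/i, /\bdashboard\b/i,
  ];
  const isBulk = bulkSignals.some(re => re.test(request.query));
  const spansMultipleApis = (request.connectors ?? []).length > 1;

  if (isBulk || spansMultipleApis) {
    return 'code-execution'; // aggregate server-side, return only the answer
  }
  return 'direct-tool-call'; // single lookup, ~120ms round trip
}

console.log(chooseExecutionPath({
  query: 'get device D-118 location', connectors: ['geoforce'],
})); // → direct-tool-call

console.log(chooseExecutionPath({
  query: 'summarize all active devices', connectors: ['geoforce'],
})); // → code-execution
```

A production router would likely also consider expected result size and whether the caller needs raw fields downstream, per the criteria above.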
## What Actually Changed Architecturally
The shift required three things:
- Wrapping each API connector as an executable context. Each connector (Corva, Enverus, GeoForce, Outlook, etc.) is available as a pre-authenticated client inside the code execution sandbox. Claude doesn't need to know about auth tokens, endpoints, or pagination — it just calls `await corva.getRigs()`.
- Eliminating the connector guide from responses. The biggest single win was removing the 8,000-token documentation payload that shipped with every response. Connector method signatures are now discovered through a lightweight `getConnectorDocs` call (~50 tokens) that returns only the relevant connector.
- Server-side JavaScript execution via Azure Functions. The code runs in a sandboxed runtime with access to pre-configured API clients. Boot time is ~5ms, runtime initialization ~150ms, and the actual API call typically ~90ms. The bulk of the 6-second latency is data processing for large result sets.
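The discovery step can be as simple as a per-connector lookup. In this sketch, `getConnectorDocs` is the call named in the text, while the registry contents and method signatures are invented for illustration:

```javascript
// Illustrative registry of connector method signatures; a real server
// would generate these from the actual client interfaces.
const CONNECTOR_DOCS = {
  geoforce: ['listDevices(): Device[]', 'getDevice(id: string): Device'],
  corva:    ['getRigs(): Rig[]'],
  outlook:  ['listEmails(folder: string, limit: number): Email[]'],
};

// Return docs for one connector only (~50 tokens), instead of shipping
// the full multi-connector guide (~8,000 tokens) with every response.
function getConnectorDocs(connector) {
  const docs = CONNECTOR_DOCS[connector];
  if (!docs) throw new Error(`Unknown connector: ${connector}`);
  return docs.join('\n');
}

console.log(getConnectorDocs('geoforce'));
```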
## Lessons Learned
Token cost is the real bottleneck, not latency. We initially worried about the 6-second execution time. In practice, nobody cares about 6 seconds when they're asking for a fleet summary or generating a report. But everyone cares when their conversation hits the context limit halfway through a workflow because earlier API calls ate 68,000 tokens returning documentation nobody asked for.
The 35ms MCP protocol overhead is irrelevant. We spent time benchmarking whether calling functions directly vs. through MCP tool protocol made a difference. It's 35ms. Not worth optimizing when your real savings are in the 98% token reduction column.
Useful data ratio matters more than total token count. The Outlook conversation was the wake-up call. 68,000 tokens consumed, 2,000 tokens of useful data. That's a 2.9% efficiency rate. After code execution, we're consistently above 90% useful data in every response.
Let the server aggregate. The instinct with LLM tool use is to bring data to the model and let it reason over raw information. For structured API data, that's backwards. The server can filter, count, and summarize far more efficiently than burning tokens to have the model parse JSON.
## Bottom Line
Moving from traditional MCP tool calling to server-side code execution gave us a 95-99% reduction in token consumption across our energy industry API integrations. The architecture is simple: pre-authenticated API clients in a sandboxed runtime, lightweight method discovery, and letting Claude write targeted extraction code instead of drowning in raw payloads.
If you're building MCP servers that wrap data-heavy APIs, the code execution pattern is worth serious consideration. The token savings alone justify the architectural shift — and your users will thank you when their conversations stop hitting context limits mid-workflow.
