PatchOps is building a connector eval system that applies the same task-and-verify loop behind modern AI training to continuously improve the quality of our MCP tools for the energy industry.
The models powering AI coding assistants got dramatically better over the last two years. The reason isn't bigger datasets or more parameters — it's reinforcement learning. Give a model a task, let it try, check whether the output actually works, and use that signal to get better. Repeat millions of times.
Code is the ideal domain for this because verification is cheap and unambiguous. Does it compile? Do the tests pass? That binary signal — pass or fail — is all you need to close the loop.
At PatchOps, we're not training models. We're building the tools that models use — MCP connectors that give AI agents access to real-world data across the energy industry. Regulatory filings from the Texas Railroad Commission. Well records from WellDatabase. Production analytics from Corva. Market data from ERCOT, CAISO, and EIA. Environmental monitoring from USGS, EPA, and NOAA. Pipeline intelligence from Enverus. Fleet tracking from Samsara and Geoforce. Over 50 connectors and growing.
The same RL principle applies to all of them: if you can define what "correct" looks like and verify it automatically, quality only goes up.
The Problem: Scale Creates a Quality Challenge
When you have a handful of tools, you can test them manually. When you have hundreds of tools spread across 50+ connectors — each with different APIs, auth models, data formats, and domain semantics — manual testing doesn't scale.
Every connector has its own quality questions:
- Does WellDatabase's `searchWells` return the fields an AI agent needs — well name, API number, operator, coordinates, lateral length?
- Does Corva's `getWellDetails` include real-time drilling data when a well is active?
- Does ERCOT's `getGridConditions` return current load data within 5 seconds?
- Does the RRC connector handle 977,000+ well records without timeouts?
- Does Enverus return production data with the right units and date formats?
- Do environmental connectors (USGS water, EPA air quality, NOAA weather) return valid GeoJSON?
- When you add a new dataset to one connector, did you break another?
Without a systematic way to answer these questions across every connector, you're relying on user reports to find problems. That's reactive. We wanted to be proactive.
The Solution: Eval Suites for Every Connector
We built a connector eval system that borrows directly from the RL playbook: task + verifier, pointed at our tools instead of at model outputs.
An eval suite is a collection of test cases grouped by connector. Each case specifies:
- A tool to call — e.g., `searchWells` on WellDatabase, `getGridConditions` on ERCOT, `searchInspections` on RRC
- Input arguments — the exact parameters an AI agent would pass
- Assertions — verifiable claims about what the response should look like
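Concretely, a case can be written as a small declarative record. The shape below is a sketch — the field names (`connector`, `tool`, `args`, `assertions`) and the sample arguments are illustrative, not PatchOps' actual schema:

```typescript
// Hypothetical eval-case shape. Field names and sample values are
// illustrative, not the real PatchOps schema.
type Assertion =
  | { type: "no_error" }
  | { type: "has_data"; path: string }
  | { type: "field_exists"; path: string }
  | { type: "count_gte"; path: string; min: number }
  | { type: "response_time_ms"; max: number };

interface EvalCase {
  connector: string;
  tool: string;
  args: Record<string, unknown>;
  assertions: Assertion[];
}

// A WellDatabase search case: call the tool with the same arguments an
// agent would pass, then make five verifiable claims about the response.
const searchWellsCase: EvalCase = {
  connector: "welldatabase",
  tool: "searchWells",
  args: { operator: "XTO Energy", state: "TX", limit: 25 },
  assertions: [
    { type: "no_error" },
    { type: "has_data", path: "data" },
    { type: "field_exists", path: "data[0].wellName" },
    { type: "count_gte", path: "data", min: 1 },
    { type: "response_time_ms", max: 5000 },
  ],
};
```

Because a case is plain data, suites can be stored, diffed, and generated without touching connector code.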
The assertion engine supports nine verification types that cover the patterns we care about across all connectors:
- `no_error` — the tool call succeeded
- `has_data` — a path in the response has non-empty data
- `field_exists` — a specific field exists (e.g., `data[0].wellName`)
- `field_equals` — a field matches an expected value
- `field_contains` — a field contains a substring
- `count_gte` — an array has at least N items
- `count_lte` — an array has at most N items
- `response_time_ms` — the response completed within a time budget
- `response_shape` — the response has the expected top-level keys
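Several of these types reduce to resolving a path into the response and testing the value. Here is a minimal sketch of how a few of them could be evaluated — the helper names (`checkAssertion`, `getPath`) and result shape are assumptions, not the real engine:

```typescript
// Minimal sketch of an assertion checker covering a few of the nine
// types. checkAssertion and getPath are illustrative names.
type Result = { pass: boolean; detail: string };

// Resolve dotted/indexed paths like "data[0].wellName".
function getPath(obj: unknown, path: string): unknown {
  return path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .reduce<any>((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

function checkAssertion(a: any, response: any, elapsedMs: number): Result {
  switch (a.type) {
    case "no_error":
      return { pass: !response.error, detail: String(response.error ?? "ok") };
    case "has_data": {
      const v = getPath(response, a.path);
      const nonEmpty = Array.isArray(v) ? v.length > 0 : v != null;
      return { pass: nonEmpty, detail: `${a.path}: ${nonEmpty ? "has data" : "empty"}` };
    }
    case "field_exists":
      return { pass: getPath(response, a.path) !== undefined, detail: a.path };
    case "count_gte": {
      const arr = getPath(response, a.path);
      const n = Array.isArray(arr) ? arr.length : 0;
      return { pass: n >= a.min, detail: `count ${n} >= ${a.min}` };
    }
    case "response_time_ms":
      return { pass: elapsedMs <= a.max, detail: `${elapsedMs}ms <= ${a.max}ms` };
    default:
      return { pass: false, detail: `unknown assertion type: ${a.type}` };
  }
}
```

The `detail` string is what surfaces in the diagnostics: a failing case tells you which claim broke and by how much, not just that something went wrong.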
When you hit "Run All," each case executes against the real connector — same code path as a Claude or ChatGPT user calling the tool. The assertion engine checks every claim and records pass/fail with detailed diagnostics.
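The "Run All" step itself is a simple loop: time each call, catch failures as error responses, and apply every assertion. A self-contained sketch, where `callTool` stands in for the real connector dispatch and the assertion functions are placeholders:

```typescript
// Hypothetical "Run All" loop. callTool stands in for the real
// connector dispatch; assertions are predicate functions here.
interface RunCase {
  tool: string;
  args: object;
  assertions: ((response: any, elapsedMs: number) => boolean)[];
}

async function runSuite(
  cases: RunCase[],
  callTool: (tool: string, args: object) => Promise<any>,
) {
  const results: { tool: string; passed: boolean; elapsedMs: number }[] = [];
  for (const c of cases) {
    const start = Date.now();
    let response: any;
    try {
      // Same code path an AI agent would hit when calling the tool.
      response = await callTool(c.tool, c.args);
    } catch (e) {
      // A thrown error becomes an error response so no_error-style
      // assertions can fail cleanly instead of crashing the run.
      response = { error: String(e) };
    }
    const elapsedMs = Date.now() - start;
    const passed = c.assertions.every((assert) => assert(response, elapsedMs));
    results.push({ tool: c.tool, passed, elapsedMs });
  }
  return results;
}
```

Running cases sequentially keeps timing assertions honest; a real runner might parallelize across connectors but not within one, to avoid rate-limit noise.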
What This Looks Like Across the Platform
Every connector category has its own quality concerns. The eval system handles all of them with the same framework.
Oil & Gas Data Connectors
WellDatabase — search returns wells with complete metadata; operator queries match known operators; production data has valid numeric fields
Corva — real-time drilling data includes WIT streams; well details have survey and completion data; rig assignments resolve correctly
Enverus — production queries return data with correct units; lease records include all required regulatory fields
RRC — 37 tools covering wells, permits, inspections, pipelines, gas plants, and production across 977K+ records
Energy Markets
ERCOT — grid conditions return current load and frequency; LMP prices have valid node identifiers; fuel mix totals sum correctly
CAISO — day-ahead prices return for requested dates; renewable curtailment data includes MW values
EIA — petroleum supply data matches known reporting periods; natural gas storage reports have regional breakdowns
Environmental & Geospatial
USGS Water — streamflow queries return valid gauge data with timestamps; site lookups resolve by HUC code
EPA / AirNow — air quality index returns current readings; monitoring station data includes lat/lng
NOAA / NWS — forecast data returns for valid coordinates; historical weather has temperature and precipitation fields
Wetlands / Floodzone / Soils — spatial queries return valid GeoJSON with proper feature properties
Enterprise & Productivity
Snowflake — query execution returns results with correct column types; schema introspection lists all tables
Samsara / Geoforce — fleet queries return vehicle positions with timestamps; geofence lookups resolve correctly
GitHub — repository queries return valid issue and PR data; project board cards have correct status fields
The same nine assertion types work everywhere. `has_data` verifies a Corva drilling response just as well as an EPA air quality reading. `response_shape` validates an ERCOT grid response the same way it validates a WellDatabase search.
The Feedback Loop
The real power isn't in running evals once — it's in the loop they create across the entire platform:
Today this loop is manual — you see a failure, open the handler code, fix it, re-run. But the architecture supports automation at every step. A failing eval can be fed to an AI agent along with the handler source code to propose a fix. Evals can run on every pull request so connector quality never regresses.
When we recently audited our RRC connector, the eval system immediately surfaced that 26 of 37 tools returned empty results (ETL still loading), 2 had a date parsing bug, and 9 were fully operational. That's the kind of visibility you need when you're scaling a platform — not guessing, knowing.
Where We're Headed
We're building eval suites for every connector on the platform, starting with the highest-traffic ones and expanding outward. The roadmap:
- CI-gated evals — every PR that touches a connector handler must pass its eval suite before merging. Break a WellDatabase search? The PR is blocked.
- AI-assisted diagnosis — failing evals automatically generate fix proposals by analyzing the handler code against the expected output
- Coverage dashboard — track which connectors and tools have evals, which don't, and prioritize the gaps
- Cross-connector consistency — ensure that patterns like pagination, error handling, field naming, and GeoJSON output are consistent across all 50+ connectors
- Regression detection — when an upstream API changes (ERCOT updates their schema, WellDatabase adds a field), the eval catches it before users do
The insight from RL applies directly: if you can define what "correct" looks like and check it automatically, you can improve continuously. We're applying that principle to every connector on the PatchOps platform — oil and gas, energy markets, environmental data, enterprise tools — and the result is an ecosystem of AI tools that gets more reliable with every iteration.
The eval system is live on PatchOps today. If you're building MCP tools for any industry and want to see how assertion-based verification can improve your connector quality, we'd love to talk.
