2026-05-11 · project

Three tools for the Pacific data ecosystem

Work in progress. Longer technical write-ups for each of the three are coming; this is the orientation post.

A short overview of three connected tools I’ve been working on at SPC over the last year and a half, with collaborators across the SIS-CC community.

Context

The Pacific Community runs PDH .Stat, the regional SDMX hub for the Pacific island nations: censuses, labour-force surveys, fisheries indicators, education statistics. Real data, used by real policy people, underrepresented in the broader data-engineering conversation. My role is to look after it and the infrastructure around it.

SDMX (Statistical Data and Metadata eXchange) is the format the international statistical community uses to publish official numbers. It’s powerful, it’s mature, it’s interoperable across organisations, and it’s a pain to query if you don’t already speak its vocabulary. The three tools below address different parts of that pain.

1. SDMX MCP Gateway

github.com/Baffelan/sdmx-mcp-gateway

A Model Context Protocol server that exposes SDMX .Stat endpoints to AI agents through small, discovery-first tools: list_dataflows, get_dataflow_structure, get_dimension_codes, check_time_availability, probe_data_url, suggest_nonempty_queries. The design principle is progressive disclosure. Instead of dumping 100 KB of XML metadata into an agent’s context window in one shot, each tool returns a few hundred bytes of relevant information that the agent can use to decide what to ask next.

Three things made the engineering hard.

First, summarising SDMX metadata into a form an LLM can plan against. A single dataflow can carry tens of dimensions, each with codelists in the hundreds or thousands, plus hierarchies, attributes, and constraints. Raw XML overflows any sensible context window. Most of the design work went into deciding what to surface, when, and in what shape, so the model sees a digestible view of the catalogue at every step without losing the information it actually needs to act.
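To make the summarising idea concrete, here is a minimal Python sketch. It is not the gateway’s code: the input shape (dimension id mapped to a list of codes), the function name summarise_dimensions, the preview size, and the example ids are all assumptions made for this post.

```python
from typing import Dict, List


def summarise_dimensions(
    dimensions: Dict[str, List[str]], preview: int = 3
) -> List[dict]:
    """Return a compact per-dimension view: id, code count, a few sample codes."""
    summary = []
    for dim_id, codes in dimensions.items():
        summary.append(
            {
                "id": dim_id,
                "n_codes": len(codes),
                "sample": codes[:preview],
                # Tells the caller that more codes exist and can be requested.
                "truncated": len(codes) > preview,
            }
        )
    return summary


# Hypothetical dataflow: one small dimension and one large one.
example = {
    "FREQ": ["A", "Q", "M"],
    "GEO": [f"C{i:03d}" for i in range(500)],
}
compact = summarise_dimensions(example)
```

The large dimension collapses to a count and three sample codes, with a flag telling the agent that a follow-up call would be needed to see the rest.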

Second, the shape of the tool set itself. A naïve “expose everything” interface made agents thrash. We iterated on a small set of progressive-disclosure tools whose return shapes nudge the agent toward the next useful call. Getting the granularity and the flow right, so the agent rarely has to retry or backtrack, took several rounds with real models.

Third, the more technical layer: provider variation. OECD, UNICEF, Eurostat, the World Bank, and SPC’s own deployment all interpret the SDMX 2.1 specification slightly differently: they use different conventions for time codes, for empty constraints, for trailing slashes, and for what counts as a valid key. The gateway smooths those differences so the consuming agent doesn’t have to know which provider is on the other end.
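For readers new to SDMX, a small Python sketch of what a key looks like may help. In the SDMX 2.1 REST syntax a data key has one dot-separated slot per dimension, "+" joins alternative codes, and an empty slot means all values. The helper names, the base URL, and the dataflow and dimension ids below are illustrative assumptions, and the gateway’s actual normalisation covers far more than the trailing slash handled here.

```python
from typing import Dict, List, Sequence


def build_key(dim_order: Sequence[str], selection: Dict[str, List[str]]) -> str:
    """Build an SDMX 2.1 REST key: one slot per dimension, in order,
    '+' between alternative codes, empty slot for 'all values'."""
    unknown = set(selection) - set(dim_order)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return ".".join("+".join(selection.get(dim, [])) for dim in dim_order)


def build_data_url(base_url: str, flow: str, key: str) -> str:
    """Join base URL, dataflow id and key, tolerating a trailing slash."""
    return f"{base_url.rstrip('/')}/data/{flow}/{key}"


# Hypothetical dataflow with three dimensions.
key = build_key(["FREQ", "GEO", "INDICATOR"], {"FREQ": ["A"], "GEO": ["FJ", "VU"]})
url = build_data_url("https://example.org/rest/", "DF_EXAMPLE", key)
```

Here key is "A.FJ+VU." (annual, two countries, every indicator), and the URL comes out the same whether or not the base ends in a slash.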

A smaller but load-bearing quirk worth mentioning: the empty-query trap. SDMX endpoints will happily validate a query without confirming it returns rows, so probe_data_url actually fetches the URL and either yields data or proposes minimal fixes when the result is empty. Most “build a dashboard for X” failures live precisely there.
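One way to picture the “propose minimal fixes” step is the Python sketch below. It is an assumption about a plausible strategy, not a description of how probe_data_url works internally: it loosens one dimension at a time to a wildcard and keeps the candidates that return rows. The count_rows callable stands in for a real HTTP fetch, and the keys and row counts are made up.

```python
from typing import Callable, Dict, List


def probe(key: str, count_rows: Callable[[str], int]) -> Dict[str, object]:
    """Check that a key returns rows; if not, try replacing one
    dimension at a time with the empty wildcard slot."""
    if count_rows(key) > 0:
        return {"ok": True, "key": key, "suggestions": []}
    parts = key.split(".")
    suggestions: List[str] = []
    for i, part in enumerate(parts):
        if part == "":
            continue  # already a wildcard, nothing to loosen
        candidate = ".".join(parts[:i] + [""] + parts[i + 1:])
        if count_rows(candidate) > 0:
            suggestions.append(candidate)
    return {"ok": False, "key": key, "suggestions": suggestions}


# Fake endpoint: made-up row counts per key, zero for anything else.
fake_counts = {"A..POP": 12, "A.FJ.": 3}
result = probe("A.FJ.POP", lambda k: fake_counts.get(k, 0))
```

With the fake endpoint above, the original key is empty and the sketch suggests "A..POP" and "A.FJ." as the smallest relaxations that return rows.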

Architecture diagram: AI Agent sends MCP calls to the SDMX MCP Gateway, which exposes six progressive-disclosure tools and fans out to five SDMX endpoints (SPC/PDH, OECD, UNICEF, Eurostat, World Bank). — The gateway sits between an AI agent and any SDMX 2.1 endpoint. Provider differences are normalised; the agent sees a uniform interface.

Status: production. Used internally for several lines of work.

2. SDMX Surfer

github.com/PacificCommunity/sdmx-surfer

A Next.js application where a user describes a dashboard in natural language and an agent, backed by the MCP gateway above, finds the right dataflow, queries the right slice, and assembles a dashboard configuration in real time.

The interesting architectural piece is a three-tier context cache:

Tier 1: stable SDMX knowledge (the spec, our component library docs), around 10 to 15K tokens, cached across sessions.
Tier 2: dataflows discovered during the current session, summarised.
Tier 3: per-turn fresh MCP calls.

That structure keeps token costs predictable while still letting the agent build a working knowledge of the catalogue over a session. The output is JSON configuration consumed by an existing React component library (sdmx-dashboard-components) we built for the Pacific Data Hub.
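As a rough Python illustration of the three tiers (the Surfer itself is a Next.js application, so this is a sketch of the idea rather than its code): the class name, method names, and the ordering of tiers in the assembled context are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextCache:
    stable: str  # tier 1: fixed text, reused across sessions
    session: Dict[str, str] = field(default_factory=dict)  # tier 2

    def remember_dataflow(self, flow_id: str, summary: str) -> None:
        """Keep a summarised dataflow for the rest of the session."""
        self.session[flow_id] = summary

    def build_context(self, fresh: List[str]) -> str:
        """Assemble stable text, then session summaries, then this turn's
        fresh results (tier 3), which are never stored."""
        parts = [self.stable]
        parts += [f"[{fid}] {text}" for fid, text in sorted(self.session.items())]
        parts += fresh
        return "\n".join(parts)


cache = ContextCache(stable="SDMX basics")
cache.remember_dataflow("DF_B", "summary b")
cache.remember_dataflow("DF_A", "summary a")
context = cache.build_context(["result of this turn's call"])
```

Only tiers 1 and 2 persist between turns, so the context grows with the number of dataflows discovered, not with the number of calls made.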

By design, the agent works entirely at the metadata layer. It uses MCP discovery to find the right dataflow, dimensions, codes, availability, and query shape, then produces query keys (SDMX selectors that point at a slice of a dataflow) and plot configurations (which chart, which axes, which formatting). The dashboard runtime then fetches observations from the SDMX endpoint directly. That closes off the worst-case hallucination, the one where a model produces a value and presents it as data.
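A small Python sketch of what “metadata only” can mean in practice. The field names below are hypothetical, and the real sdmx-dashboard-components schema may look quite different. The point is only that a panel carries pointers to data (a dataflow and a key) and chart settings, and that anything resembling embedded values can be rejected mechanically.

```python
import json
from typing import Dict, List

# Hypothetical field names, chosen for this illustration only.
ALLOWED_FIELDS = {"title", "chart_type", "dataflow", "key", "x_axis", "y_axis"}
REQUIRED_FIELDS = ("dataflow", "key", "chart_type")


def check_panel(panel: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the panel only
    points at data and does not carry any fields outside the whitelist."""
    problems = []
    for name in sorted(set(panel) - ALLOWED_FIELDS):
        problems.append(f"unexpected field: {name}")
    for name in REQUIRED_FIELDS:
        value = panel.get(name)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {name}")
    return problems


panel = json.loads("""
{
  "title": "Population by country",
  "chart_type": "line",
  "dataflow": "DF_EXAMPLE",
  "key": "A.FJ+VU.",
  "x_axis": "TIME_PERIOD",
  "y_axis": "OBS_VALUE"
}
""")
problems = check_panel(panel)
```

A panel that tried to smuggle in a "values" array would come back with "unexpected field: values" instead of reaching the dashboard.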

That design choice extends beyond the hallucination problem. LLMs are trained predominantly on data from high-income, English-language, and Western-institutional contexts. Pacific island nations are among the most underrepresented groups in those corpora. A model reasoning about a Vanuatu labour survey or a Kiribati fisheries series draws on patterns learned elsewhere, patterns that carry embedded assumptions about which trends are normal, which movements are surprising, and what a number means in context. Those assumptions are culturally specific, and they are invisible to the model itself.

This matters in particular for the Pacific region, where Indigenous Data Sovereignty frameworks are well established. The CARE Principles (Collective Benefit, Authority to Control, Responsibility, Ethics), developed by the Global Indigenous Data Alliance, and the principles of Te Mana Raraunga (the Māori Data Sovereignty Network) both assert that communities hold the right to govern how their data is interpreted and narrated. Delegating that interpretive authority to a language model is a governance question with practical consequences.

The Surfer’s architecture is built around that constraint. The agent picks a dataflow, assembles a query key, and selects a chart type and axis configuration: mainly technical choices, though never purely so. Its output is a JSON configuration file, exactly the kind that dashboard builders at PDH were assembling by hand before this tool existed. That provenance matters: the format was designed for human readability, and it stays human-editable after the agent produces it. A user can inspect every choice the agent made, adjust any of it, and confirm the configuration before data is fetched.

The values that appear on screen come from the client-side dashboard component calling the SDMX endpoint, not from the model. The model never generates a sentence about what those numbers mean. Interpretation stays with the person using the tool: the statistician from a Pacific national statistics office, the policy analyst at SPC, the government minister who commissioned the indicator. They have the context the model lacks, and the standing to draw conclusions from their own data.

Architecture diagram: the LLM agent uses MCP to discover SDMX metadata, writes a human-readable JSON dashboard configuration, the dashboard library reads that configuration, and the resulting interactive dashboard fetches SDMX data values directly from the endpoint. — The agent produces configuration; the dashboard library turns it into an interface; the interactive dashboard fetches values. The model writes the question and chart specification, not the numbers shown in the chart.

Status: invite-only alpha; public release rolling out in the coming weeks.

3. SDMX Mapper

gitlab.com/sis-cc/experiments/ai-based-modelling-tool, joint with StatsNZ, the OECD, and the ILO. The repo name is the long-form description of the work; the tool itself we call SDMX Mapper.

An AI-assisted codelist harmoniser, built primarily for StatsNZ and scoped to their workflow. The hope, once it matures enough, is for it to be picked up across the Pacific region and ideally further afield.

The upstream problem. Source codelists in country surveys (employment status, age groups, occupation categories, education levels) get mapped into international standard codelists by hand. The work is slow, error-prone, and a known bottleneck for producing comparable statistics.

We’re evaluating six different LLM pipeline architectures (end-to-end, parse-then-compute, hierarchy-augmented, multi-step, hybrid, and a fan-out / fan-in orchestration) across a 675-configuration grid: 5 models × 3 prompt strategies × 5 concepts. Results are scored against an expert-curated gold standard with a target around 75% F1 on at least three concepts.
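For readers who want the scoring spelled out, here is a minimal Python sketch of F1 over codelist mappings. It assumes one plausible reading, in which each (source code, target code) pair counts as one prediction; the project’s exact definition (per pair, per concept, micro or macro averaged) is not stated here, and the example codes are invented.

```python
from typing import Dict


def mapping_f1(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """F1 over (source, target) pairs: harmonic mean of precision and recall."""
    pred_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    true_pos = len(pred_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0  # also covers empty predictions or an empty gold standard
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Invented employment-status codes, for illustration only.
gold = {"employee": "1", "employer": "2", "own-account": "3", "family worker": "4"}
predicted = {"employee": "1", "employer": "2", "own-account": "4"}
score = mapping_f1(predicted, gold)
```

In this example two of three predictions are right (precision 2/3) and two of four gold pairs are found (recall 1/2), giving F1 = 4/7, about 0.57, which would fall short of a 75% target.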

The work is about measurement. We want to know which approaches are trustworthy enough at what accuracy to put into production, and which aren’t. Some approaches will fail; that is also useful information.

Architecture diagram: a source codelist with country survey terms feeds into six LLM pipeline architectures (end-to-end, parse-then-compute, hierarchy-augmented, multi-step, hybrid, fan-out/fan-in), evaluated across 675 configurations (5 models × 3 prompts × 5 concepts) against a target of 75% F1, producing a standard international codelist. — Six pipeline architectures, 675 configurations. A measurement of which approaches are trustworthy at what accuracy level.

Status: research-stage. Planning complete, production scaffold in place, grid run in progress.

Common threads

The three projects share methodology and infrastructure. The Mapper is StatsNZ’s, scoped to their own data pipeline; the Gateway and the Surfer run at SPC, oriented to the Pacific Data Hub. Underneath all three is the same approach: progressive-disclosure tool design for LLM clients, a strict separation between what the model decides and what the model touches, and SDMX as the common substrate.

Each tool is independently useful. The longer-term aim, taking the three together, is a setting where statistical offices in the region (and beyond) can produce and use comparable, standards-based data without first having to become SDMX experts themselves.

Technical follow-ups for each of the three are on the way. If any of this overlaps with what you’re working on, I’d love to compare notes.