Section 01

Innovation Research & Development

Investigating capabilities at the edge of what GenAI can and cannot do.

Research Area 1.2

Data-Intensive Operations

Multi-agent systems that work with data the way human teams do

The Research

We're developing multi-agent systems that work with data the way human teams do: database engineers and analysts collaborating to answer questions. The AI versions are faster, more precise, and continuously available. This shifts what's practical: deeper analysis, more frequently, on demand.

While the concept was straightforward, the implementation came with many challenges. Providing intuitive conversational interfaces while working against infrastructure and programming-language constraints required extensive exploration: finding setups that work, understanding what data platforms can actually do, and repurposing features for purposes their designers didn't anticipate.

The research splits into three focus areas:

1. Data exploration across large, diverse data lakes.

We tested against Eurostat (the EU statistical office's datasets) to understand how agents navigate unfamiliar, heterogeneous data at scale. The focus was mainly on how reliably the exploration can surface niche information in such large data lakes (trust us, Eurostat holds plenty of obscure datasets). The results are not perfect, but they are orders of magnitude better than what a human can do. During development we came up with several interesting techniques that can improve quality over time. On top of all this, the system generates ad-hoc visualizations on the fly, presenting data it has never seen before in real time rather than just returning massive query results.
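The first step of that exploration can be sketched roughly as follows: before issuing any queries, the agent shortlists candidate datasets by scoring catalog descriptions against the user's question. This is a minimal, stdlib-only illustration; the dataset codes mimic Eurostat naming, but the toy catalog and the `shortlist` helper are assumptions for illustration, not the production retrieval logic.

```python
# Hypothetical sketch of the catalog-navigation step: score dataset
# descriptions against a question so an agent can shortlist candidates
# before querying anything. Real Eurostat metadata is far richer.
CATALOG = {
    "demo_pjan": "population on 1 january by age and sex",
    "nrg_bal_c": "complete energy balances by product and sector",
    "ilc_li02": "at-risk-of-poverty rate by age and sex",
}

def shortlist(question, catalog, k=2):
    """Rank dataset codes by naive term overlap with the question."""
    q = set(question.lower().split())
    return sorted(
        catalog,
        key=lambda code: len(q & set(catalog[code].split())),
        reverse=True,
    )[:k]

print(shortlist("poverty rate by age", CATALOG))
# → ['ilc_li02', 'demo_pjan']
```

In practice this naive term overlap would be replaced by embedding search or an LLM-ranked pass, but the shape is the same: narrow a huge catalog down to a handful of candidates the agent can afford to inspect in depth.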

2. Accurate answers from databases on demand.

Not simple query generation for SQL, MongoDB, or BigQuery, but systems that fully utilize database engine capabilities. The principle was to delegate computation to the database and minimize what the AI has to reason over. We tested with massive telecom traffic data, answering fairly complex operational questions with real-time visualization of results.
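The delegation principle can be shown in miniature. The snippet below uses an in-memory SQLite database and a hypothetical telecom-style `calls` table (both assumptions for illustration); the point is the contrast between pulling raw rows into the model's context and letting the engine return a small, already-aggregated answer.

```python
import sqlite3

# Hypothetical telecom-style table: one row per call, keyed by cell.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (cell_id TEXT, duration_s REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("A", 30.0), ("A", 90.0), ("B", 45.0), ("B", 15.0), ("B", 60.0)],
)

# Anti-pattern: fetch every row and make the AI aggregate in-context.
rows = conn.execute("SELECT cell_id, duration_s FROM calls").fetchall()

# Preferred: delegate the aggregation to the database engine, so the
# agent only reasons over a handful of summary rows.
summary = conn.execute(
    "SELECT cell_id, COUNT(*) AS n, AVG(duration_s) AS avg_s "
    "FROM calls GROUP BY cell_id ORDER BY cell_id"
).fetchall()
print(summary)  # → [('A', 2, 60.0), ('B', 3, 40.0)]
```

With real telecom volumes the first query returns millions of rows and the second still returns a few; that asymmetry is what keeps the AI's reasoning load flat as the data grows.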

3. Autonomous transformation of ad-hoc datasets.

For example, Excel files that arrive from third parties. The system we developed explores the structure, understands the content, and produces documented, clearly structured databases without manual intervention. This autonomous (but human-validated) transformation makes the data available for the actual work. Audit and accounting datasets proved ideal for testing this.
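One core sub-step, inferring a typed schema from untyped spreadsheet cells, can be sketched as below. The `raw` rows, the `infer_type` heuristic, and the `invoices` table name are all assumptions standing in for what an agent would extract from a real third-party file; the actual system also documents the schema and routes it through human validation.

```python
import sqlite3

# Hypothetical rows as they might arrive from an ad-hoc third-party
# spreadsheet: a header followed by string-typed cells.
raw = [
    ["invoice_id", "amount", "vendor"],
    ["1001", "250.50", "Acme"],
    ["1002", "99.25", "Globex"],
]

def infer_type(values):
    """Guess a SQL type from string cells (deliberately simplified)."""
    try:
        for v in values:
            float(v)
        return "REAL"
    except ValueError:
        return "TEXT"

header, *rows = raw
types = [infer_type(col) for col in zip(*rows)]
ddl = ", ".join(f"{name} {t}" for name, t in zip(header, types))

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE invoices ({ddl})")  # typed, queryable schema
conn.executemany(
    f"INSERT INTO invoices VALUES ({','.join('?' * len(header))})", rows
)
total = conn.execute("SELECT SUM(amount) FROM invoices").fetchone()[0]
print(total)  # → 349.75
```

Once the data sits in a typed table rather than a spreadsheet, the question-answering machinery from the previous focus area applies to it unchanged.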

Building Cognition on Data

When reliable data access becomes a solved primitive, you can build cognition on top of it.

The shift is from "ask question, get answer" to systems that actually know what's happening across your data and can act on it. Decision systems that monitor, detect, and investigate on their own. Systems that can pull context from multiple sources to answer questions no single analyst could answer as quickly. When asking becomes cheap, organizations ask more — and learn things they never thought to look for.

Our SOC research (detailed in the Domain Research section) demonstrates this directly. A synthetic threat hunter connected to a SIEM system, proactively hunting on its own. The data access layer makes it possible; what you build on top determines the value.

Jevons' paradox applies. When analysis becomes fast and cheap, organizations don't do the same amount faster, they do more of it. Questions that weren't worth the analyst time before now get asked routinely.