The Source of Truth Problem is Getting Worse with Agents

Intelligence is an emergent, social phenomenon [1]. Knowledge is socially produced, and when that knowledge fails to be socialized, even an intelligent process will draw dumb conclusions. One cannot be intelligent in a vacuum: every intelligent life form is nurtured in a world of physical processes and living things. Likewise, every language model bootstraps its intelligence off the frenetic activity of the internet. This point is often overlooked in a discourse focused on model capabilities, but it has a direct implication: a model’s intelligence depends deeply on how well it can observe and react to the world as it changes [2].


Consider these two failure modes in deploying AI that no amount of improvement in reasoning capabilities can address: 

  1. Domain knowledge is fractured across silos: platforms, databases, documents, PDFs and people’s heads. No amount of intelligence is useful when working from false assumptions.
  2. Modern databases enforce strict schemas that overly constrain what models can write back into the organization's memory.

These technical conditions create what we call the source of truth problem: when knowledge is created faster than it can be unified, competing versions of reality multiply and no one, human or machine, knows which one to trust. It’s a big reason why the text file has become a popular storage format for agents. Databases are too rigid and tend to lock data in, whereas text files are open, portable and shareable, with no preset schema or constraints.

And it would be great if we could stop at files, but there's just one problem: files don't scale. Ten files work. A hundred files is OK. A thousand files is getting sketchy. Thousands more and I'm going to ask if this is just a database with extra steps [3]. The other obvious problem is synchronization: if you have a copy of a file and I have a copy and I change something, how does my change reach you? Couldn’t we use something like Git? That works in the simple case, but what if 100 people are all trying to update the same file at once [4]? That shared file becomes a bottleneck, so to keep moving we split it into local copies, which become competing versions of reality. Reading in many competing versions of reality is very confusing for a model! At scale, files recreate the same problems they were meant to solve.
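To make the synchronization problem concrete, here is a minimal sketch that compares two ways of reconciling local copies of one shared file. It assumes a deliberately simplified world where a file is a list of lines and edits never add or remove lines; `last_writer_wins` and `three_way_merge` are hypothetical helpers written for illustration, not the behavior of Git or of any tool mentioned here. Whole-file last-writer-wins silently drops an edit, while a Git-style three-way merge keeps both edits until two people change the same line.

```python
# Hypothetical sketch: reconciling local copies of one shared file.
# Assumes files are lists of lines and edits never add or remove lines.


def last_writer_wins(sync_order: list[list[str]]) -> list[str]:
    """Whole-file sync: whichever copy is synced last replaces everything."""
    return list(sync_order[-1])


def three_way_merge(base: list[str], ours: list[str], theirs: list[str]):
    """Line-by-line merge against a common ancestor, in the spirit of Git.

    Returns the merged lines and the indices of lines both sides changed differently.
    """
    merged: list[str] = []
    conflicts: list[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:  # same change on both sides, or only "ours" changed
            merged.append(o)
        elif o == b:  # only "theirs" changed
            merged.append(t)
        else:  # both changed the same line differently
            merged.append(o)  # keep ours, but record the conflict
            conflicts.append(i)
    return merged, conflicts


base = ["owner: design-systems", "button-color: blue"]
alice = ["owner: design-systems", "button-color: coral"]  # edits line 1
bob = ["owner: payments-team", "button-color: blue"]  # edits line 0
carol = ["owner: design-systems", "button-color: teal"]  # also edits line 1

print(last_writer_wins([alice, bob]))
# ['owner: payments-team', 'button-color: blue']  (Alice's edit is lost)
print(three_way_merge(base, alice, bob))
# (['owner: payments-team', 'button-color: coral'], [])
print(three_way_merge(base, alice, carol))
# (['owner: design-systems', 'button-color: coral'], [1])
```

With two editors touching different lines, the merge is clean; every additional concurrent editor raises the odds that two of them touch the same line, and each such conflict needs a human (or an agent) to pick a winner.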

A screenshot of bond, Tonk’s internal agent orchestration framework, which uses Carry to synchronize task information. In this demo, six parallel agents edit files simultaneously. Carry is built on Dialog-DB.

This isn't just a technical problem; it's a fundamental property of how knowledge works at scale. In "The Use of Knowledge in Society", the economist and philosopher Friedrich Hayek argues that no single mind can access all the information needed to make optimal decisions. Knowledge is local, dispersed and contextual [5]. This remains true under computerization: computer systems organize knowledge, but they also accelerate its creation across contexts. When I worked on the design systems team at Airbnb, we had our own "source of truth" problem [6].

The issue was that the Airbnb app needed to feel like it was made by a unified team when in reality it had hundreds of designers and thousands of engineers spread across every part of the application experience. The solution was to form a dedicated team that could build a unified design system and share it with the rest of the organization. But this created a new problem: that team couldn’t keep up with needs as they evolved across the organization. As a result, designers and engineers would implement bespoke modifications of the design system to fit their needs. Each time this happened, a new "source of truth" was born. There was the official design system and the many bespoke, unofficial design systems tucked away in the corners. The fragmentation was fractal: even within a single design system category, you'd have differences across target platforms like Android and iOS. Which version was right? Which components should we choose when building a new feature? How could we merge them back together into a single representation?

Near the end of my time at Airbnb, I began to understand that the primary bottleneck was the ability to write back to the system. Agentic systems will only exacerbate the source of truth issue by accelerating knowledge creation at the edges [7], thus rendering themselves and the organization as a whole less intelligent over time as context fractures. The Airbnb experience made it clear to me that what is really needed is something that accepts the distributed nature of knowledge rather than fighting it. That's what Dialog is.

Dialog-DB is a portable, embeddable, local-first database. It’s an open source project at the heart of the Tonk substrate. Dialog solves the information problem Hayek describes by cutting the Gordian knot: it simply accepts that knowledge will be held at the edge. Once we accept that it is impossible to centralize all knowledge, we can reframe the problem as one of making knowledge easy to share.

Dialog does this by taking ideas from Git and from the web and applying them to a database. A Dialog repository lives on your machine just like a file, or more accurately, like a Git repository. However, it’s still a database, designed to handle hundreds of thousands of entries. Data can be written back in whatever schema is needed at write time and read in whatever schema is needed at read time. There is no one-size-fits-all schema; instead, a single data attribute (e.g. “name”) can be connected to many different schemas (e.g. “employee”, “designer”, “volunteer”) [8]. Dialog is also syncable: when knowledge is updated in several places at once, each data entry can merge with remote copies of that entry. If any two repositories are connected, they can sync.
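The following toy sketch makes the ideas of schema-on-read and syncable repositories concrete. It is not Dialog's API or data model. For illustration only, it assumes that knowledge is stored as (entity, attribute, value) facts, that a "schema" is just a tuple of attribute names chosen at read time, and that syncing two replicas is a set union of their facts. `Fact`, `Repo`, `assert_fact`, `sync`, `read` and all attribute names are hypothetical.

```python
from dataclasses import dataclass

# Toy model (hypothetical, not Dialog's API): knowledge is a set of
# (entity, attribute, value) facts, and no schema is enforced on write.


@dataclass(frozen=True, order=True)
class Fact:
    entity: str
    attribute: str
    value: str


class Repo:
    """A hypothetical local replica holding a set of facts."""

    def __init__(self) -> None:
        self.facts: set[Fact] = set()

    def assert_fact(self, entity: str, attribute: str, value: str) -> None:
        # Schema-free write: any attribute can be attached to any entity.
        self.facts.add(Fact(entity, attribute, value))

    def sync(self, other: "Repo") -> None:
        # Set union is commutative, associative and idempotent, so replicas
        # that exchange facts converge to the same set regardless of order.
        merged = self.facts | other.facts
        self.facts, other.facts = set(merged), set(merged)

    def read(self, schema: tuple[str, ...]) -> dict[str, dict[str, str]]:
        """Schema-on-read: project every entity that has all attributes in `schema`."""
        by_entity: dict[str, dict[str, str]] = {}
        # Sorted iteration makes the result deterministic; if one entity has
        # several values for an attribute, the lexicographically largest wins.
        # A real system would need an explicit conflict-resolution rule here.
        for f in sorted(self.facts):
            by_entity.setdefault(f.entity, {})[f.attribute] = f.value
        return {
            entity: {a: attrs[a] for a in schema}
            for entity, attrs in by_entity.items()
            if all(a in attrs for a in schema)
        }


# One shared attribute, "name", participates in three read-time schemas.
EMPLOYEE = ("name", "employee/team")
DESIGNER = ("name", "designer/specialty")
VOLUNTEER = ("name", "volunteer/event")

# Two teams write into their own replicas without agreeing on a schema.
hr = Repo()
hr.assert_fact("person:1", "name", "Ada")
hr.assert_fact("person:1", "employee/team", "Design Systems")

design = Repo()
design.assert_fact("person:1", "designer/specialty", "Typography")
design.assert_fact("person:2", "name", "Grace")
design.assert_fact("person:2", "volunteer/event", "Hackathon")

hr.sync(design)

print(hr.read(EMPLOYEE))  # {'person:1': {'name': 'Ada', 'employee/team': 'Design Systems'}}
print(hr.read(DESIGNER))  # {'person:1': {'name': 'Ada', 'designer/specialty': 'Typography'}}
print(design.read(VOLUNTEER))  # {'person:2': {'name': 'Grace', 'volunteer/event': 'Hackathon'}}
```

Set union is used here because it is the simplest merge that always converges: it never has to choose between two writes, so the order in which replicas sync does not matter. The trade-off is that it only grows, and it sidesteps rather than solves the case where two replicas assign different values to the same attribute, which this toy resolves with a naive deterministic rule.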

Had Dialog existed at Airbnb, each team would still make their own components, but those components would be added into a shared substrate where they could be related to one another and integrated into the whole. The system would create the right contact points for the orderly co-evolution of the design system across the whole organization in real time. In this regime, agents would help unify context by identifying related parts and facilitating their integration. With Dialog, the agents, and the whole organization, would become more intelligent over time as the web of knowledge grows.

If the source of truth problem resonates with you, whether you're building agentic systems, managing knowledge at scale, or just tired of watching context fracture, come talk about it in our Discord.


References and Notes

  1. Negarestani, Reza. Intelligence and Spirit. Urbanomic, 2018.
  2. Concerns about the full replacement of humans by AI are likely overstated. Human intelligence is deeply social and context-aware in ways current large language model architectures are not.
  3. “Your AI Agent’s Memory Is Just a File? That’s the Problem.” Mem0, https://mem0.ai/blog/your-ai-agents-memory-is-just-a-file-thats-the-problem.
  4. “AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub.” arXiv, 2026, https://arxiv.org/abs/2604.03551.
  5. Hayek, Friedrich A. “The Use of Knowledge in Society.” The American Economic Review, vol. 35, no. 4, 1945, pp. 519–530.
  6. Chukreiev, Andrey. “Airbnb Design System: How an Industry Standard Exists Even If No One Has Ever Seen It.” Medium, https://medium.com/@chukreiev/airbnb-design-system-how-an-industry-standard-exists-even-if-no-one-has-ever-seen-it-2fa9da2829cb.
  7. Allard, Antoine, et al. “Hierarchical Team Structure and Multidimensional Localization (or Siloing) on Networks.” arXiv, 2022, https://arxiv.org/abs/2203.00745.
  8. “Dialog: What Becomes Possible.” GitHub, Tonk Labs, https://github.com/tonk-labs/rfc/blob/909082fd4089f63f22d260369e47f63c51886221/rfc/dialog-capabilities.md.