Four Vaults, One Job: The Knowledge Heist I Called in 2019

More than six years ago I wrote a trilogy for Forbes about why, in the age of AI, we still spend our days gathering instead of analyzing. I named the marks, mapped the vaults, and called the getaway. Here is the same story, told now that the crew is finally assembled.

Let me tell you about a job. Not a glamorous one. No tuxedos, no fountains at the Bellagio. Just you, a deadline, and an answer that is sitting in plain sight across four different buildings, each with its own alarm system, its own guard, and its own strong opinion about what a "search box" should do.

In December 2019 I wrote the first of three pieces for Forbes about this job. I want to run it back, because in 2019 I could only describe the heist. I could name the marks and sketch the floor plan. What I could not do was assemble the crew. In 2026 the crew finally walked in the front door, and I want to introduce you to them.

The mark: four vaults, four alarm systems

Every knowledge worker is casing the same four vaults. I called them the four P's, and six years later they have only grown taller.

Public. The open web. Wonderful for finding a movie cast or a product spec. Less wonderful when you need the one chart inside the report, because it hands you a stack of documents to read and wishes you luck. It finds the container, not the contents.

Private. The corporate super-silos: Microsoft 365, Box, Salesforce, ServiceNow, and the long tail of CRMs, ERPs, ticketing systems and databases behind the firewall. Each runs a tidy search over its own little kingdom and politely declines to look next door. In 2019 they were silos. In 2026 they are walled cities.

Paid. The professional sources: case law, peer-reviewed research, patents, market and industry data. Priceless, and each one speaks a different dialect of advanced-search-form, taxonomy and special code. Knowing how to query them is its own profession.

Personal. Your own trail: the PDFs you keep re-saving, the bookmarks from a service that shut down, and the colleague three desks over who simply knows. The most valuable vault, and the one that walks out the door the day that colleague resigns.

Here is the tell that makes it a job and not an errand: the answer is almost never in one vault. It is a public ruling, plus a private contract, plus a paid dataset, plus the thing only you remember. Four vaults, one answer, and a clock running.

The long con that never pays

For thirty years the industry sold one way to crack this: build a bigger vault. Stand up an enterprise search engine, move all the data into it, remodel it, index it, and reindex it. I described that project in the second piece, and the lifecycle is grimly predictable. A year or more to buy or build it: requirements, an RFI, consultants, a team, pilots, schema mapping, connector development, a security review or two. Then the grand opening, which actually meets the sky-high expectations... for a year, maybe two.

Then the drift. The team that built it moves on. New sources get deferred. Schemas change. The survey scores curdle, someone declares "a clear problem to solve," and the whole procurement starts over. You did not solve the silo problem. You built a new silo and scheduled its funeral.

And here is the 2026 update, because the con came back wearing a sharper suit: the vector database. Same move, new vocabulary. Copy every document into an index, embed it, secure the copy, govern the copy, and then re-embed and re-secure it forever as the originals change underneath you. It is the same three verbs I spent a trilogy warning you about: move, remodel, reindex. The moment you make a second copy of the truth, you own something that is wrong by Tuesday.

The tell I kept circling

In the third piece I finally said the quiet part. Knowledge is not the same as files. It is smaller, messier, harder to capture, and it does not paste into Word correctly. A snippet with no source is not insight; it is evidence with no chain of custody. You cannot trust it, cannot quote it, cannot act on it.

What is actually worth keeping is not the document. It is the endorsed answer plus the lineage that makes it trustworthy: where it came from, who approved it, which version is official. Strip that away and you are back to gathering raw data you will lose and re-find and lose again. That was the real heist all along: not stealing the documents, but walking out with the one answer everyone agrees is right.

The line I almost got too cute with

At the end of that 2020 article I made a prediction I was a little too pleased with. I wrote that these knowledge streams would become "the crude oil for a variety of AI and machine learning agents" who would automate the gathering and let humans finally get to the analyzing.

Reader, the agents showed up. They have names now: Claude, Copilot, ChatGPT. They have a protocol: MCP. The prediction was not wrong; it was just early, and it was missing one thing. An agent pointed at four locked vaults is exactly as stuck as you are. The oil was real. What was missing was the crew that could get it out of the ground without blowing up the building.

The crew

So here is the team I could only dream about in 2019. Every member has one job, and no one makes a second copy of anything.

Diagram: the four vaults (Public, Private, Paid, Personal) are queried live by SWIRL, which federates with your permissions, re-ranks with no vector database, and endorses the canonical answer, then delivers it to your people in the Galaxy UI and your agents over the MCP server. No second copy, no index to drift.
The four-vault problem, solved: query live, never copy, endorse the answer, hand it to people and agents alike.

The driver: federation. SWIRL queries all four vaults at once, live, using your existing permissions, and brings back pointers, not copies. Nothing is moved, remodeled or reindexed. The three cursed verbs never happen.
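
For readers who want to see the shape of the driver's job, here is a minimal sketch of a federated fan-out. It is not SWIRL's code: `search_vault`, `Pointer`, the vault names and the `example` URLs are all hypothetical stand-ins, and a real connector would call each source's own search API with the asker's credentials.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    vault: str  # which of the four vaults answered
    title: str
    url: str    # link back to the original record; nothing is copied


async def search_vault(vault: str, query: str, token: str) -> list[Pointer]:
    # Hypothetical connector: stands in for a live call to the source's own
    # search API, made with the asker's token so existing permissions apply.
    await asyncio.sleep(0)  # placeholder for the network round trip
    return [Pointer(vault, f"{query} ({vault})", f"https://{vault}.example/doc/1")]


async def federate(query: str, token: str) -> list[Pointer]:
    vaults = ["public", "private", "paid", "personal"]
    # Query every vault at once; one failing vault should not sink the job.
    results = await asyncio.gather(
        *(search_vault(v, query, token) for v in vaults),
        return_exceptions=True,
    )
    pointers: list[Pointer] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # skip the vault that failed, keep the rest
        pointers.extend(result)
    return pointers


pointers = asyncio.run(federate("renewal terms", token="asker-token"))
for p in pointers:
    print(p.vault, p.url)
```

The point of the sketch is what it leaves out: there is no index, no copy step and no sync job, only live queries and links back to where each record already lives.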

The ranker: a three-pass relevancy pipeline. Keyword and BM25 first, so exact terms are honored; then embeddings with hybrid fusion for meaning; then a cross-encoder that reads the query and the document together to score what is actually relevant. It runs locally, with no vector database to build or secure. It finds the best part to read, which is the one thing the open web would never do for you in 2019.
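
To make "hybrid fusion" concrete, here is a small sketch of reciprocal rank fusion, one common way to merge a keyword ranking with an embedding ranking. Treat it as an illustration under assumptions: the article does not say which fusion method SWIRL uses, the document ids are invented, and the constant `k=60` is simply a conventional default. The cross-encoder pass would then re-score the top of the fused list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranked higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the first two passes.
keyword_hits = ["contract-7", "ruling-2", "memo-9"]     # keyword / BM25 order
semantic_hits = ["ruling-2", "dataset-4", "contract-7"]  # embedding order

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['ruling-2', 'contract-7', 'dataset-4', 'memo-9']
```

A document that both passes like, such as `ruling-2` here, rises above one that only a single pass ranked first.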

The fixer: canonical answers. Teams pin the organization-approved result for a question, and every later search, by a person or an agent, returns the endorsed answer with its provenance attached. That is the context and lineage I said raw data always lacked, finally built into the result.
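
A pinned answer is, at heart, a lookup that carries its receipts. The sketch below is a hypothetical data structure, not SWIRL's API: `CanonicalAnswer`, `CanonicalStore`, the field names and the sample values are all invented to show an endorsed answer traveling with its lineage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalAnswer:
    answer: str
    source_url: str   # where it came from
    approved_by: str  # who approved it
    version: str      # which version is official


def normalize(question: str) -> str:
    # Collapse case, whitespace and trailing punctuation so that small
    # variations of the same question land on the same pin.
    return " ".join(question.lower().split()).rstrip(" ?!.")


class CanonicalStore:
    def __init__(self):
        self._pins = {}

    def pin(self, question: str, answer: CanonicalAnswer) -> None:
        self._pins[normalize(question)] = answer

    def lookup(self, question: str):
        # Returns the endorsed answer with its provenance, or None.
        return self._pins.get(normalize(question))


store = CanonicalStore()
store.pin(
    "What is our refund window?",
    CanonicalAnswer(
        answer="30 days from delivery",
        source_url="https://policies.example/refunds",
        approved_by="Legal",
        version="v3",
    ),
)
hit = store.lookup("  what is OUR refund window ")
print(hit.answer, hit.approved_by, hit.version)
```

Exact-match lookup on a normalized question is the simplest possible design; a production system would presumably match by meaning rather than by string.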

The inside man: the MCP server. SWIRL 5 is headless and API-first, so your agents call it as their governed knowledge layer and get ranked, permissioned, organization-approved answers instead of the model's best guess. The crude oil, on tap, for whichever agent you trust.

The lookout: the hallucination warning. Every generated answer is checked against the sources it cited. When a claim is not supported by what was actually retrieved, SWIRL flags it before it walks out the door. No fabricated citations, no confident nonsense.
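
Here is a deliberately crude sketch of what the lookout does: compare each claim in a generated answer to the text that was actually retrieved, and flag what the sources do not back up. Word overlap is an assumption chosen to keep the example short; the article does not describe how SWIRL performs this check. The function name, the `0.6` threshold and the sample sentences are likewise hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_claims(claims, sources, threshold=0.6):
    """Return the claims whose words are not covered well by any source."""
    source_tokens = [tokens(s) for s in sources]
    flagged = []
    for claim in claims:
        words = tokens(claim)
        if not words:
            continue
        # Best share of the claim's words found in a single retrieved source.
        best = max(
            (len(words & st) / len(words) for st in source_tokens),
            default=0.0,
        )
        if best < threshold:
            flagged.append(claim)
    return flagged


sources = ["The contract renews on 1 March 2026 for twelve months."]
claims = [
    "The contract renews on 1 March 2026.",
    "The vendor offers a 20 percent discount.",
]
flagged = unsupported_claims(claims, sources)
print(flagged)  # ['The vendor offers a 20 percent discount.']
```

The first claim is fully covered by the retrieved text and passes; the second appears nowhere in the sources, so it is flagged before it walks out the door.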

The score

Remember the numbers that started all of this. IDC, 2001: two and a half hours a day looking for information. McKinsey, 2012: nineteen percent of the week. IDC again, 2018: data professionals losing half of every week to finding, governing and duplicating. A quarter century of cloud, mobile and connectivity, and the needle on gathering-versus-analyzing barely moved.

You do not move that needle by building a bigger vault. You move it by never moving the money in the first place. Leave every record exactly where it lives. Query it live, with the asker's own permissions. Rank it honestly, endorse the right answer, and hand it to the person or the agent who asked, with the receipts attached.

That was the plan in 2019. I just had to wait for the crew. They are assembled now, and the vault doors are open. You can spend tomorrow analyzing instead of gathering. Walk in the front door.

SWIRL 5 is the private knowledge layer for enterprise AI. To see it run against your own four vaults, request preview access, which includes a 30-minute guided stand-up on a slice of your systems.