Tragedy of the Cognitive Commons

When the shared mind becomes proprietary fuel

For decades, we built something extraordinary together. Through forums, collaborative projects, blog posts, shared datasets, and open-source code, we collectively shaped a living knowledge system — what could be called the cognitive commons. It wasn’t just a repository of facts or tools, but a dynamic ecosystem: a culture of inquiry, experimentation, and iterative learning. This commons was the infrastructure behind the internet’s most powerful promise — open participation in the creation of shared understanding.

But while we were busy contributing, something changed. The very content we offered freely — questions, insights, stories, code, essays, art — was quietly being harvested to train the large-scale language models and generative systems now dominating headlines and markets. Today, those same systems can mimic our voices, replicate our artistic styles, and summarize our ideas with uncanny fluency. What was once open and participatory has been repackaged, fenced off, and sold back to us under license.

It is theft, plain and simple. Not just of style or data, but of the collective effort behind meaning itself.

We Didn’t See the Theft

There was no singular moment of violation — no breach, no hacked gates.

Instead, the extraction was slow and systemic, enabled by terms of service most users never read and data collection practices that were often barely visible. Bots scraped the public internet — forums like Reddit and Stack Overflow, open-access research, social media posts, and creative platforms.

Content generated over decades by millions of people was absorbed into proprietary training datasets and transformed into high-value commercial assets.

Now, the very models built on these contributions are locked behind paywalls, metered APIs, and enterprise licensing agreements. The commons was not only mined — it was inverted. What we gave away freely is now being sold back to us, stripped of attribution and divorced from the networks of human dialogue that gave it meaning in the first place.

This is not innovation in the spirit of shared advancement. It is a privatization of the collective mind.

A New Kind of Extraction

Unlike resource extraction in the industrial age, this cognitive extraction leaves no visible crater. But the damage is real. Platforms that once thrived on contribution and discussion are imploding under the weight of unreciprocated use. Artists are increasingly displaced by synthetic imitations of their work. Writers are outranked by derivative outputs of their own voice. What we once called the public domain is being hollowed out — not through neglect, but through commodification.

The original spirit of the commons was not just about availability, but about process. It was built on intellectual friction — on public discourse, challenge, failure, refinement, and learning in the open. It was never intended to be mined passively. The depth and value of the commons came from the effort of collaborative thinking, not from the content alone.

It was a place to build together. Not a resource pool for trillion-dollar model pipelines.

The Inversion of Value

It is difficult to overstate the absurdity of the current arrangement. The models we now pay to use were trained on knowledge we created — not individually, but collectively. They learned from our culture, our language, our art, our problem-solving heuristics. They did not emerge in isolation. They are aggregations of human expression at scale.

The relationship has been reversed. The contributors — once the lifeblood of the open internet — are now positioned as consumers.

Their participation is no longer seen as the foundation of intelligence, but as marketable behavior. The commons itself has been repositioned from a creative substrate to a data supply chain. And the loop that once allowed thinkers, artists, and researchers to learn from each other has been severed. The models provide answers, but they don’t link back. They don’t credit. They don’t participate.

This matters — not just ethically, but structurally. A machine trained on collective thought should belong, at least in part, to the structure that enabled it. When research is conducted within a university, the institution is cited. When breakthroughs occur in corporate labs, the company is credited. Why should it be any different when the source is a distributed, participatory commons?

What We Lose When We Simulate Ourselves

There is something uniquely destabilizing about simulation without source.

When AI systems produce responses, they do so with increasing fluency and authority. But fluency is not fidelity. These outputs are not grounded in ongoing conversation or subject to revision. They cannot be challenged, refined, or debated in real time. And this has consequences for the epistemic health of our digital culture.

The commons once operated as a living knowledge engine: people contributed, challenged, improved, and iterated. But machine-generated synthesis flattens this into something static — an answer, not a process.

Over time, the expectation of certainty replaces the habit of inquiry. And slowly, we lose our tolerance for the ambiguity, conflict, and evolution that real knowledge demands. This is not simply about quality control or hallucination. It is about how thought degrades when it is separated from its context. We move from thinking in networks to consuming in fragments. From community to completion.

Rebuilding the Cognitive Contract

If the commons is to survive — let alone evolve — we need to re-establish a baseline of reciprocity between those who contribute to knowledge systems and those who build on them.

  • That begins with consent. No more silent scraping.

  • It continues with traceability. Outputs should carry with them the imprint of the sources they emerged from, wherever possible (a brief sketch of what that could look like follows this list).

  • And it must include compensation — not as charity, but as structural recognition that value has been created and transferred.
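To make the traceability point a little more concrete, here is a minimal sketch in Python. Every name in it is hypothetical — the Source and AttributedOutput classes and the retrieve/generate stand-ins are illustrative, not any existing system's API. The only idea it demonstrates is that a generated answer can keep its provenance attached instead of discarding it.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Source:
    """Provenance record for one contribution the output drew on."""
    title: str
    author: str
    url: str
    license: str

@dataclass
class AttributedOutput:
    """A generated answer that travels with its sources."""
    text: str
    sources: List[Source] = field(default_factory=list)

    def citation_block(self) -> str:
        """Render the provenance as a human-readable footer."""
        if not self.sources:
            return "No traceable sources recorded."
        lines = [f"- {s.title} ({s.author}), {s.url} [{s.license}]"
                 for s in self.sources]
        return "Sources:\n" + "\n".join(lines)

def answer_with_provenance(
    question: str,
    retrieve: Callable[[str], List[Source]],
    generate: Callable[[str, List[Source]], str],
) -> AttributedOutput:
    """Compose an answer while keeping the retrieved sources attached.

    `retrieve` and `generate` are stand-ins for whatever retrieval and
    generation steps a given system uses; the point is only that the
    sources stay bound to the text they informed.
    """
    sources = retrieve(question)        # documents with known authors and licenses
    text = generate(question, sources)  # answer grounded in those documents
    return AttributedOutput(text=text, sources=sources)
```

None of this solves attribution on its own, but it shows the shape of the obligation: provenance as a first-class part of the output, not an afterthought.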

We’ve navigated similar territory before. Digital Rights Management (DRM) was an early attempt to place boundaries around digital goods: to define who could use what, how, and under what terms. The system was flawed, but the underlying principle was clear: creative output has structure, provenance, and value, and that value deserves protection.

The same logic should apply to the building blocks of synthetic intelligence.

  • Our writing.

  • Our images.

  • Our language patterns.

  • Our collective discourse.

These are not raw materials to be extracted without limit. They are the scaffolding of culture — and they deserve the same consideration we once fought to give to music, film, software, and art.

Of course, it isn’t always straightforward. Some AI outputs feel original not because they are, but because they are synthesized from countless untraceable fragments. Even the developers of these systems admit they can’t fully unpack what is being accessed, transformed, or remixed inside the model. That’s part of the reality we’re dealing with: artificial cognition that draws from human insight in ways we don’t fully understand.

But this complexity is not an excuse to abandon accountability. It is a call to design for humility. To build systems that acknowledge their sources and maintain the thread of intellectual lineage wherever they can. To shift from opacity to transparency. From erasure to attribution.

The Real Tragedy — and the Possibility Ahead

The tragedy is not merely that the commons has been exploited. It is that the very systems we built to expand knowledge are now contributing to its erosion.

The more we prompt, the less we participate.

The more we outsource, the more we forget how to originate.

The cognitive commons won't shrink because of use — it will shrink because of disconnection.

But it doesn’t need to be this way. The same systems that have diminished participation can be redesigned to support it. AI doesn’t have to exploit the commons — it can strengthen and expand it.

As I wrote in The Real Singularity Is Human:

“The future belongs to tools that extend human agency — not simulate it into irrelevance. The best use of intelligence is not to mimic us, but to amplify the systems we’re part of.”

That is the opportunity: not just to regulate AI’s excesses, but to use it as a mechanism for restoring integrity, participation, and continuity in how we build knowledge and culture.

At Vital Ocean, We Choose a Different Direction

Instead of scraping the internet, we trained OceanGPT on peer-reviewed ocean science — every output traceable to a study, a researcher, a dataset. Not to limit intelligence, but to ground it. Not just to reduce hallucinations, but to preserve the scientific method in how we ask and answer.

This isn’t just about cleaner data. It’s about restoring trust — in knowledge, in dialogue, in the connective tissue between disciplines. OceanGPT was built to surface what’s overlooked, link what’s fragmented, and expand the impact of real research — not replace it.

We believe models like this can unlock entirely new forms of collaboration. But more importantly, they remind us that AI isn’t inevitable. It’s directional. The infrastructure we build reflects the choices we make.

We don’t need to simulate participation. We can design for it.

And that’s the real test — not what these systems can say, but what they can help us do together.

📥 Let’s Keep This Going

I’d love to hear your thoughts — or connect if you’re working on commons-based infrastructure, AI accountability, or regenerative knowledge systems.
