The Primitive the Internet Forgot

May 2026 15 min read

We didn't have a word for it until 1960.

But the instinct (to describe, label, and embed context directly into the things we create so they can be found, understood, and trusted long after we're gone) is at least four millennia old.

This is the history nobody tells. And it ends with a problem AI is about to compound.

Nippur, 2000 BCE

Oral cultures had memory systems. Bards, griots, shamans, entire civilizations preserved knowledge through rhythm, repetition, and formula. But memory isn't metadata. It's storage. Metadata is what you create when storage alone stops being enough.

In the ancient Sumerian city of Nippur, scribes maintained a clay tablet unlike the others.

It didn't contain a poem, a hymn, or a legal record. It contained a list: 68 literary works, each identified by its first line. No titles existed yet. So the scribes used what they had, the opening words of each text as its unique identifier. A system called an incipit.

If we used it today, The Hobbit would be cataloged as: "In a hole in the ground there lived a hobbit."

It was a way to browse an entire collection without reading a single piece in it. A catalog. And it is the earliest known act of descriptive metadata.

When Assyriologist Samuel Noah Kramer translated this tablet in 1942, he understood immediately what he was looking at. Not a poem, not a prayer, but a structured index. An ancient librarian's tool for making knowledge findable.

The moment human knowledge began to outgrow what a single memory could hold, we immediately invented metadata to manage it.

“The moment human knowledge began to outgrow what a single memory could hold, we immediately invented metadata to manage it.”

The Colophon, 1400-600 BCE

The Nippur catalog was a separate object, a list that lived apart from the works it described. Babylonian and Assyrian scribes took the next logical step: they embedded the metadata directly into the artifact itself.

They called it the colophon. A structured block of information pressed into the clay at the end of every tablet, separated from the main text by a deeply scored line. Not an index sitting on a shelf somewhere. Identity information baked into the object.

A colophon recorded the title, the series number ("Tablet 4 of He Who Saw the Deep"), the scribe's name, and the owner. But its most sophisticated element was what scribes called source attestation: a declaration of where the tablet came from and whether it could be trusted.

A standard formula read: "Written and collated according to its original; copied from a tablet from [City Name]."

If the source document was damaged, the scribe didn't guess. They wrote hepu ("broken") or ulidi ("I do not know") directly in the colophon. An honest declaration that a gap existed in the master original. Data corruption, documented in clay, three thousand years before anyone used the phrase.

The Library of Ashurbanipal took it further still. Tablets there carried curses in their colophons: "He who steals this tablet, may the gods strike down his name." Early DRM. Crude, but the instinct is identical. Declare ownership, assert rights, attach consequences to violation.

The Epic of Gilgamesh, the world's oldest surviving work of literature, uses this system across all twelve tablets. Every physical copy carries embedded attribution, sequence metadata, and provenance. The colophon is why we know it as a unified work at all.

Oracle Bones, 1300 BCE

On the other side of the world, with no contact with Mesopotamia, Shang Dynasty China independently arrived at the same conclusion.

Oracle bones (ox shoulder blades and turtle shells used for divination) were inscribed with a four-part schema that would be entirely recognizable to a modern database engineer:

Preface: the exact date and the name of the diviner. Timestamp and operator attribution.

Charge: the question being posed. Often carved as a binary pair, the positive charge on the right side of the shell, the negative on the left. "It will rain today." "It will not rain today." The Shang Dynasty was running split tests in clay three thousand years before the concept had a name.

Prognostication: the king's interpretation of the cracks. Always attributed explicitly: "The King read the cracks and said..."

Verification: what actually happened, carved days or weeks later. The audit log. Closed-loop feedback on the accuracy of the system.

The verification is the most remarkable element. Consider a fully intact inscription from the reign of King Wu Ding, around 1200 BCE, recording a royal pregnancy:

Preface: "On the day jichou, Que divined."

Charge: "Lady Hao will give birth, and it will be good."

Prognostication: "The King read the cracks and said: if she gives birth on a ding day, it will be auspicious."

Verification: "Three weeks and one day later, she gave birth. It was not good. It was a girl."

They recorded the failure. In a royal archive, under a king's name. Because the integrity of the system mattered more than the convenience of the prediction.

These bones weren't discarded after the ceremony. They were stored in underground archives. Pit YH127 at the Yinxu site held thousands of them, some drilled with holes and bound together with cord. A persistent, organized, re-accessible data repository. Not a ritual object. An archive.

Different medium, different civilization, same instinct: embed enough context into the artifact that it can be understood on its own, audited against reality, and preserved for whoever comes next.

“Different medium, different civilization, same instinct: embed enough context into the artifact that it can be understood on its own, audited against reality, and preserved for whoever comes next.”

Alexandria, 280 BCE

The Library of Alexandria is remembered for what it contained. Less remembered is how it was navigated, or how close it came to being completely unusable.

At its peak, the Library held an estimated 400,000 to 700,000 papyrus scrolls. They were stored horizontally in pigeonhole grids. To find out what was in a scroll, you had to physically pull it out, hold it with both hands, and unroll it. Doing that repeatedly destroyed the fragile papyrus within months. At that scale, without a navigation system, the collection would consume itself.

The solution was the sillybos: a small strip of leather or parchment attached to each scroll, dangling outward from the shelf. On it, author, title, and often a line count. Scholars could walk the halls and read the tags like a dashboard. Browse without unrolling. Scan without reading. The ancient equivalent of a thumbnail and a filename.

But the deeper innovation was the Pinakes. Around 250 BCE, the chief cataloger Callimachus compiled a 120-volume master index of the entire library's contents. He didn't just list the works. He invented a classification architecture. Genres as top-level categories. Authors alphabetized within each. Under each author: biographical metadata, a complete list of works, and a critical assessment of authenticity, whether each text was genuinely written by that author or a forgery.

Provenance and authenticity metadata. At library scale. In 250 BCE.

The line counts on the sillyboi served a second purpose: scribes were paid per line copied, so stichometry (counting lines) was also file-size metadata and a payment verification system. Metadata as infrastructure, not annotation.

The Long Middle

For roughly two thousand years after Alexandria, the structural logic of metadata didn't fundamentally change. Format shifted as scrolls became codices, codices became printed books, but the underlying architecture stayed Alexandrian. A central index, categories, attribution, sequence.

A few landmarks worth noting:

—1876 Melvil Dewey's Decimal Classification gives libraries a universal organizing language
—1901 Library of Congress begins printing standardized 3x5 index cards, physical metadata at industrial scale
—1968 A watershed year: Philip Bagley formally coins the term "metadata," and Henriette Avram develops MARC (Machine-Readable Cataloging) at the Library of Congress, the first metadata standard designed explicitly for machines to read, not humans. The word and the machine-readable practice arrive in the same year.
—1991/1994 The IPTC Information Interchange Model is approved, defining a structured schema for embedding captions, credits, and copyright directly into press photographs. Adobe Photoshop's 1994 adoption makes it practical. For the first time, a descriptive metadata layer travels inside the image file itself.
—1995 Dublin Core establishes a universal 15-element set for describing digital resources
—1998 W3C's XML gives the web a machine-readable language for metadata exchange

By the end of the twentieth century, the infrastructure for describing digital content was mature, standardized, and machine-readable. The stage was set for everything to finally work at scale.

Then the web split in two.

The Fork in the Road, 2011

In 2011, Google, Microsoft, Yahoo, and Yandex jointly launched Schema.org, a shared vocabulary for structured data that any website could embed in its pages.

The idea was simple: if you describe your content in a language machines understand, machines will understand your content. A recipe page could declare its ingredients, cook time, and rating in a format search engines could read directly. A product page could declare its price, availability, and reviews. An article could declare its author, publication date, and topic.

Text content got a semantic layer, a machine-readable identity that traveled with it across the open web. Images had IPTC inside the file. But embedded metadata is only as good as the systems that preserve and read it. On the open web, those systems never materialized for images.

“But embedded metadata is only as good as the systems that preserve and read it. On the open web, those systems never materialized for images.”

The Great Erasure, 2010s

The infrastructure for image metadata existed. Cameras had been embedding EXIF data (shutter speed, aperture, GPS coordinates, timestamp) automatically since the early 2000s. The IPTC standard meant descriptive metadata (captions, credits, copyright, keywords) could travel inside the image file too. A creator could embed their name, their rights declaration, their intent, directly into every file they exported.

Then the platforms arrived and deleted it.

Instagram, Facebook, X, Pinterest, every major social network stripped embedded image metadata on upload to save server bandwidth and simplify processing. Your copyright notice: gone. Your name: gone. Your description, your keywords, your rights declaration: gone. The file that left your camera or your studio carrying a genealogical record of its creation arrived on the platform as an anonymous object.

The standard existed. The platforms deleted it. And the AI scrapers that followed inherited a web full of images that had been deliberately blinded.

A benchmark study by Imatag analyzing over 40 million online images confirmed the scale of this digital amnesia: 85% of images on the web contain no embedded metadata at all. Of the 15% that do, only one in five contains anything meaningful (author, description, rights). Combined, roughly 97% of images on the web carry no meaningful creator attribution or context.

Four thousand years of accumulated instinct about how to make information findable and part of collective memory, stripped at the upload button.

“Four thousand years of accumulated instinct about how to make information findable and part of collective memory, stripped at the upload button.”

The Crisis, 2022-Present

For most of the web's history, the absence of image metadata was a quiet crisis, because a human was always the final interpreter. The model was a manual loop: you searched, scanned results, clicked, and used your own eyes to judge what you found. The machine's job was routing content toward you, and if that routing was imperfect, your brain corrected for it without thinking.

That loop is breaking.

Modern AI systems don't present options and wait for a choice. They interpret, select, and deliver: a single answer, a single result, a single image. When an agent answers a question or recommends a product, the machine's reading is the outcome. There's no human pass afterward. If the image has no structured context to work with, it doesn't exist in the result at all.

Vision models can identify what's physically inside an image with remarkable accuracy. What they cannot do is know what it's for, who made it, what it means in context, or how it's permitted to be used. Those things don't live in pixels. They have to be declared.

This problem didn't start with AI generation. The moment digital cameras and smartphones removed the friction of film, image volume outpaced any manual metadata workflow. Some photographers built workarounds (batch scripts, preset templates, tools to automate attribution) but it was always friction against scale. A gap that nobody closed.

The Image Chapter, Now

The 4,000 year arc of metadata history is a story about a single recurring human problem: how do you make information findable, attributable, and part of collective memory, beyond the lifespan of any single mind? Every civilization in this story faced it at a different scale and solved it the same way. Embed enough context into the artifact that it survives without you. In clay, in parchment, in card catalogs, in machine-readable standards.

AI has compounded an already overwhelming problem. The volume of visual content produced today (by cameras, by studios, by generative systems) eclipses anything that came before it. And almost none of it carries any memory of where it came from, who made it, or what it means.

We communicate in images now at a scale that dwarfs the written word, which makes the absence of structured context around them not a minor inconvenience but a foundational gap in how the web understands the world, and in how we preserve the memory of it.

“We communicate in images now at a scale that dwarfs the written word, which makes the absence of structured context around them not a minor inconvenience but a foundational gap in how the web understands the world, and in how we preserve the memory of it.”

The instinct that drove a Shang Dynasty scribe to record the diviner's name, the question posed, and the outcome verified, that instinct is exactly what's missing from 97% of images on the web today.

The primitive the internet forgot was the one it needs most.

Sources

—Kramer, S.N. (1942). "The Oldest Literary Catalogue." Bulletin of the American Schools of Oriental Research, 88, pp. 10-19.
—Kramer, S.N. (1956). History Begins at Sumer. University of Pennsylvania Press.
—Leichty, E. (1964). "The Colophon." Studies Presented to A. Leo Oppenheim. Oriental Institute of Chicago.
—Hunger, H. (1968). Babylonische und assyrische Kolophone. Neukirchener Verlag.
—George, A.R. The Babylonian Gilgamesh Epic. Oxford University Press.
—Keightley, D.N. (1978). Sources of Shang History: The Oracle-Bone Inscriptions of Bronze Age China. University of California Press.
—Blum, R. (1991). Kallimachos: The Alexandrian Library and the Origins of Bibliography. University of Wisconsin Press.
—Casson, L. (2001). Libraries in the Ancient World. Yale University Press.
—Imatag. State of Metadata on the Web. Benchmark study analyzing 40M+ online images.

VISID embeds structured intelligence into images — giving them authorship, readability, and persistent identity in an AI-native internet. Learn more at visid.app.