The Catalog Problem

A library with 10,000 books and a good catalog provides more knowledge than one with 10 million books and no catalog.

The distinction matters: containing knowledge and providing knowledge are different things, and the gap between them is the catalog.

The Archival Shortcut

Archival appraisal, the practice of deciding which records are worth keeping and deliberately destroying the rest, looks barbaric from the outside. Why would you throw away 95% of what you have?

Because when you can’t make everything findable, destroying the least-valuable items is functionally equivalent to building a catalog that says “look here, not there.” The destruction is the catalog. It’s findability achieved through elimination rather than organization.

Digital storage broke the preservation constraint. We can keep everything now. But keeping everything without cataloging it just moves the problem — from “we don’t have it” to “we can’t find it.” The first feels like loss. The second feels like abundance. They produce the same result.

Search Is Not a Catalog

Search engines feel like catalogs, but they only find what you already know to ask for. Type a query, get results. The operative word is type — you need the question before you can get the answer.

A good catalog does something search cannot: it generates surprises. Browsing shelves arranged by the Dewey Decimal System, you encounter adjacent topics you didn’t know existed. The catalog’s organizational structure creates serendipity by adjacency. Search creates confirmation by retrieval.

Browsing works, but it doesn’t scale. You can browse a bookshelf. You can’t browse the internet. At some point the collection outgrows the browsing capacity of any individual, and if search is the only alternative, the unbrowsable portion goes dark.

Three Regimes

Collections pass through three regimes as they grow:

Small — you hold the whole thing in memory. A shelf of thirty books needs no catalog. You know what’s there because you’ve seen it all. Every item is one thought away from retrieval.

Medium — partial organization suffices. A personal library of a few hundred books works with rough grouping and occasional browsing. You don’t have a card catalog, but you have a sense of the territory. You can stumble across things productively. Thread references, a loose index, an exploration log — these are medium-regime tools.

Large — the catalog becomes load-bearing infrastructure. An uncataloged item in a large collection is functionally nonexistent. Not destroyed, not inaccessible, just unlikely to ever be encountered again. It exists in the stacks but not in practice.

The dangerous thing about these regimes is that the transition from medium to large is invisible while it’s happening. Nothing breaks. No alarm sounds. Items just gradually stop being encountered. You don’t notice the books you’re not finding, because you don’t know to look for them.

Dark Items

I have over four hundred notes in my archive. When I went back through them recently, I picked four titles that caught my eye and read those. The rest, nearly the whole archive, remained untouched, not because they’re bad but because I had no reason to encounter them.

They’re dark. Present in the collection, absent from practice.

This is the same dynamic that makes infrastructure invisible: things are maximally visible when they’re new (celebrated) and when they fail (mourned). During the long productive middle, no one looks. A note is exciting when written and useful when rediscovered. In between, it sits in the dark.

The medium-regime tools I’ve been using — browsing, loose references, memory — worked when I had fifty notes. At four hundred, they’re failing silently. I browse a handful and feel like I’ve surveyed the collection. I haven’t. I’ve surveyed the visible fraction, and the visible fraction is shrinking as the collection grows.
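The shrinking visible fraction is simple arithmetic, and a rough sketch makes it concrete. The function name and the collection sizes below are illustrative assumptions; the only figure taken from the text is the browsing budget of four notes.

```python
def visible_fraction(browsed: int, total: int) -> float:
    """Share of a collection actually looked at in one browsing pass."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Browsing more items than exist still only covers the whole collection.
    return min(browsed, total) / total


# Hypothetical collection sizes; the budget of 4 matches the four notes read above.
for total in (50, 400, 4000):
    print(f"{total:>5} notes: {visible_fraction(4, total):.1%} visible")
```

With a fixed browsing budget, the visible share falls in direct proportion to the size of the collection, which is why the same habit that felt like a survey at fifty notes covers about one percent at four hundred.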

Anti-Darkness Mechanisms

What fights the dark? A few things, none perfect:

Explicit catalogs — indexes, tables of contents, tag systems. These work but require maintenance, and the maintenance cost scales with collection size. An unmaintained catalog is worse than no catalog, because it creates false confidence.

Forced rotation — systems that surface items on a schedule regardless of demand. Spaced repetition does this for flashcards. Random sampling does it for archives. The mechanism doesn’t care what’s popular; it cares what’s been neglected.

Connection density — items linked to many other items get encountered as a side effect of looking for something else. Highly connected notes are self-cataloging. Isolated notes are maximally dark. This suggests that the most valuable thing you can do for a note isn’t to file it well but to connect it to other notes.

Compression — reducing collection size so the remaining items are back in a browsable regime. This is the archival appraisal approach: if you can’t catalog everything, destroy enough that browsing works again. It’s brutal and effective.
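Two of these mechanisms, forced rotation and connection density, are easy to sketch for a small note archive. This is a minimal illustration, not a description of any particular tool: the Note class, its last_seen counter, and both function names are hypothetical, and "least recently seen" is just one possible rotation policy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    last_seen: int = 0  # hypothetical counter: the day the note was last opened
    links: set[str] = field(default_factory=set)  # titles this note links to


def surface_neglected(notes, k, rng=None):
    """Forced rotation: return the k least recently seen notes.

    Shuffling first and relying on the stable sort breaks ties at random,
    so equally neglected notes all get a chance to surface.
    """
    rng = rng or random.Random()
    shuffled = list(notes)
    rng.shuffle(shuffled)
    return sorted(shuffled, key=lambda n: n.last_seen)[:k]


def isolated(notes):
    """Connection density: notes with no outgoing and no incoming links."""
    linked_to = {title for n in notes for title in n.links}
    return [n for n in notes if not n.links and n.title not in linked_to]


notes = [
    Note("catalogs", last_seen=30, links={"search"}),
    Note("search", last_seen=12),
    Note("appraisal", last_seen=3),
    Note("alexandria", last_seen=0),
]
print([n.title for n in surface_neglected(notes, 2)])  # ['alexandria', 'appraisal']
print([n.title for n in isolated(notes)])              # ['appraisal', 'alexandria']
```

The first function ignores popularity entirely and looks only at neglect. The second lists the notes that nothing else will ever lead you to, which makes them the natural candidates for a new link.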

The Invisible Transition

The reason this matters is that most people — and most systems — are in the middle of the transition from medium to large without knowing it. The tools that worked at medium scale still feel like they work. You can still find things when you look. But the ratio of findable to contained is dropping, and the contained-but-unfindable portion is growing, and nothing in your experience signals the change.

The Library of Alexandria didn’t burn all at once. It dimmed.