Notes on Metadata

2021-12-06
#data, #generalNotes

A couple notes from “Metacrap: Putting the torch to seven straw-men of the meta-utopia”.

I think it is a well-articulated critique of metadata and the idea that we can pigeon-hole all the information in the world into a set of machine-readable data.

A world of exhaustive, reliable metadata would be a utopia. It's also a pipe-dream.

The author isn’t saying “don’t leverage metadata”. Leverage it, but acknowledge its limitations.

Gather insights from metadata, then synthesize those insights with additional insights from different methods of analysis.

In broader terms: data can be an effective tool for formulating assumptions, but not for reaching conclusions on its own. Any conclusions should be enhanced, confirmed, or otherwise informed by additional methods of inquiry.

The following are a couple points I liked.

Metadata Exists in a Competitive World

When poisoning the well confers benefits to the poisoners, the meta-waters get awfully toxic in short order.

People are incentivized not to follow any schema agreed upon for the benefit of computers.

Schemas Aren’t Neutral

The way you structure your data has biases. Don’t pretend it doesn’t. Who knows exactly what the downstream effects of this are?

[schemas presume] that there is a "correct" way of categorizing ideas...Nothing could be farther from the truth. Any hierarchy of ideas necessarily implies the importance of some axes over others.

When it comes down to it, this mentality that we can express all forms of information as a sensible set of data seems quite haughty. There are, and will always be, a variety of ways to describe things. Things in the world are multifaceted and any way of expressing them via a schema is inherently biased.

Reasonable people can disagree forever on how to describe something. Arguably, your Self is the collection of associations and descriptors you ascribe to ideas. Requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas.

Implicit Metadata is Often More Useful than Human-Structured Metadata

A good example of how human-structured metadata falls apart is the case of Google:

Google exploits metadata about the structure of the World Wide Web: by examining the number of links pointing at a page (and the number of links pointing at each linker), Google can derive statistics about the number of Web-authors who believe that that page is important enough to link to, and hence make extremely reliable guesses about how reputable the information on that page is.
This sort of observational metadata is far more reliable than the stuff that human beings create for the purposes of having their documents found. It cuts through the marketing bullshit, the self-delusion, and the vocabulary collisions.
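
To make the link-counting idea concrete for myself, here is a minimal sketch in Python. It is not Google's actual algorithm (that would be something closer to PageRank); it only mirrors the two signals the quote mentions: how many pages link to a page, and how many pages link to each of those linkers. The tiny `links` graph and the function names `inbound_counts` and `weighted_scores` are made up for illustration.

```python
from collections import defaultdict


def inbound_counts(links):
    """Count how many distinct other pages link to each page.

    `links` maps a page to the list of pages it links to.
    Self-links and duplicate links from the same page are ignored.
    """
    counts = defaultdict(int)
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return dict(counts)


def weighted_scores(links):
    """Score each page by its inbound links, weighting each link by
    how many inbound links the linking page has itself.

    This is a one-step simplification, assumed here for illustration:
    each inbound link is worth 1 plus the linker's own inbound count.
    """
    counts = inbound_counts(links)
    scores = defaultdict(int)
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                scores[target] += 1 + counts.get(source, 0)
    return dict(scores)


# Hypothetical link graph: "d" and "f" each have exactly one inbound link,
# but "d" is linked from "c", which is itself linked to by two pages.
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "e": ["f"],
}

print(inbound_counts(links))
print(weighted_scores(links))
```

In this toy graph, "d" and "f" have the same raw link count, but "d" ends up with the higher weighted score because its one linker is itself well linked. None of the pages had to describe themselves for that signal to exist, which is the point of the quote.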

That said, even implicit metadata can be gamed because (as was noted earlier) metadata exists in a competitive world.