I was reading Baldur’s recent piece about the transition taking place in open source — which I took notes on — and this excerpt talking about large language models (LLMs) stood out to me:
Why give somebody credit for the lines of code you’ve adapted for your own project when you can get a language model to whitewash it and let you claim it as your own?
I’d never thought of this parallel before, but it made me think of LLMs as a form of “knowledge laundering”.
What do I mean by that?
In my mind, “knowledge laundering” is the process of disguising, concealing, or otherwise hiding the origins of information by running it through a language model, converting it into a seemingly legitimate expression of original synthesis.
It’s a riff on the term “money laundering”:
Money laundering is the illegal process of making large amounts of money generated by criminal activity, such as drug trafficking or terrorist funding, appear to have come from a legitimate source. The money from the criminal activity is considered dirty, and the process “launders” it to make it look clean.
You launder money to obscure its origins, making it appear “clean” and thus acceptable for legitimate use.
In a similar vein, you can use an LLM to “launder” knowledge and obscure its origins, making it appear original and thus clean and acceptable for legitimate use.
It feels like this is precisely what tools like Bard or ChatGPT do, because they are not held to any standard of citing their sources. You prompt the model, it draws on the collective work of humans, “launders” that knowledge, and the information comes out clean; that is, free of any trace back to the original sources that produced it.
Money laundering allows individuals and groups to profit from work that is otherwise considered illegal. Knowledge laundering, at least in some contexts, is not much different. It might not circumvent laws, but it circumvents established standards of academic and scientific integrity (see: plagiarism).
This isn’t universally applicable to every use of LLMs. But if there’s a parallel here, further pondering is an exercise for the reader.