Robots.txt
A few weeks ago, I saw a flurry of conversation about how you can now disallow OpenAI from indexing your personal website using robots.txt:
User-agent: GPTBot
Disallow: /
That felt a bit “ex post facto” as they say. Or, as Jeremy put it, “Now that the horse has bolted—and ransacked the web—you can shut the barn door.”
But folks seemed to be going ahead and doing it anyway, and I thought to myself, “Yeah, I should probably do that too…” (especially given how “fucking rude” AI is about not citing its sources).
But I never got around to it.
Tangentially, Manuel asked: what if you updated your robots.txt and blocked all bots? What would happen? Well, he did it and after a week he followed up. His conclusion?
the vast majority of automated tools out there just don't give a fuck about what you put in your robots.txt
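For reference, blocking every crawler that bothers to check is just a wildcard rule; presumably the file Manuel deployed amounted to something like this:
# Block every path for every crawler that honors robots.txt
User-agent: *
Disallow: /
Which, going by his findings, reads less like a rule and more like a polite request.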
That’s when I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.
Perhaps that lack of faith is not totally based in reality, but this is what I imagine a robots.txt file doing for my website: