An Analysis of Feed URLs
I’ve always been curious about an accepted pattern (if there is any) for feed URLs.
- What should it be named?
- By directory convention?
- By format?
- By content type?
- Should it have an extension?
- Where should it live?
I’m sure the answer is: “it depends”. But is there a common pattern for this, the way HTML URLs tend towards extension-less paths with an
Does any of the above even matter? Given that you can make your feeds auto-discoverable with HTML, maybe the precise feed URL(s) don’t matter much?
That’s possible, but I have a hard time believing it. URL design is one of the most important aspects of any web site—and I don’t think that’s exclusive to URLs for HTML resources.
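Auto-discovery works by advertising feeds in a page's HTML with `<link rel="alternate">` tags. Here is a minimal sketch of the reader side using only Python's standard library. The set of recognized media types (RSS, Atom, and JSON Feed's `application/feed+json`) is an assumption about common usage, and `discover_feeds` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of media types that mark a <link> as pointing at a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}


class FeedLinkFinder(HTMLParser):
    """Collects (type, href) pairs from <link rel="alternate"> tags that point at feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs (e.g. "/feed.xml") against the page URL.
            self.found.append((mime, urljoin(self.base_url, href)))


def discover_feeds(html, base_url):
    """Hypothetical helper: return every feed advertised in an HTML document."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found
```

Because relative hrefs are resolved against the page, the feed's own URL still ends up mattering: it is what readers store and request.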
Anyhow, these thoughts have been swirling in my head. Then one day I came across web-dev-feeds by simevidas, a collection of 1,000 feeds for web developers.
My first reaction was: “I gotta parse and analyze all those feeds! Surely that will surface common patterns for feed URLs!” So that’s what I did. Below are my findings.
Note: what follows likely isn’t 100% precise, but is meant as a rough analysis.
This data represents the name of the feed, meaning the named resource at the end of a path regardless of other names within the path, i.e.
Looks like “feed” is the favorite, outranking the other top four choices combined!
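As a rough illustration of how a tally like this could be computed (a sketch, not necessarily how the linked analysis code does it), take the last non-empty path segment, drop any extension, and count. Ignoring query strings and trailing slashes is an assumption.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse


def resource_name(url):
    """Name of the resource at the end of the path, minus any extension.

    /blog/feed.xml -> "feed", /feed/ -> "feed", /feeds/posts/default -> "default"
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return ""  # the feed is served from the bare root, e.g. https://example.com/
    name, _ext = posixpath.splitext(segments[-1])
    return name


def tally_names(urls):
    """Count how often each resource name appears across a list of feed URLs."""
    return Counter(resource_name(u) for u in urls)
```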
Resource Location: Root or Nested?
This one is interesting because, if you peer a little deeper, how a resource is named depends on its location. It’s not just the resource’s name that matters but its entire path, which undoubtedly influences the name. For example:
- /feed.xml is, presumably, the feed for the entire host
- /blog/feed.xml is, presumably, the feed for the blog, but the host could have other resources, like
Given this interdependent relationship between naming a resource and the location at which it lives, I’m not sure how much weight this data point could bring to bear on any particular conclusion—but I still find it interesting to see.
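One simple way to bucket feeds as root or nested is to look at whatever sits before the resource name in the path. This is a sketch under the assumption that "root" means no parent directories at all; the `location` function is hypothetical.

```python
from urllib.parse import urlparse


def location(url):
    """Split a feed URL's path into (kind, parent), where kind is "root" or "nested".

    /feed.xml      -> ("root", "/")
    /blog/feed.xml -> ("nested", "/blog/")
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    parents = segments[:-1]  # everything before the resource name
    if not parents:
        return "root", "/"
    return "nested", "/" + "/".join(parents) + "/"
```

Keeping the parent path, not just a root/nested flag, preserves the context (like `/blog/`) that shapes how the resource itself gets named.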
As you can see, this one comes in pretty close. What I found most interesting was how these numbers broke down in their respective categories.
By far, XML is the most popular one here. JSON Feed even appears on the radar, which is kind of neat.
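To get both the per-extension totals and the naming breakdown within each extension, you can group names by extension. This is a minimal sketch; treating a missing extension as the empty string `""` is an assumption.

```python
import posixpath
from collections import Counter, defaultdict
from urllib.parse import urlparse


def split_resource(url):
    """Return (name, extension) for the last path segment; extension is "" if absent."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return "", ""
    name, ext = posixpath.splitext(segments[-1])
    return name, ext.lstrip(".").lower()


def breakdown_by_extension(urls):
    """Map each extension ("" meaning none) to a Counter of the names that use it."""
    groups = defaultdict(Counter)
    for url in urls:
        name, ext = split_resource(url)
        groups[ext][name] += 1
    return groups
```

Summing each inner Counter gives the totals per extension, and the inner Counter itself gives breakdowns like "feed.xml vs. atom.xml".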
Here’s how the naming within the *.xml extension broke down:
feed.xml is clearly the most popular. But what I find interesting here is that the XML file extension doesn’t disambiguate between an RSS feed and an Atom feed. Granted, if you peered into the file itself (or possibly the HTTP headers) you’d know. You’d also know if the file is named after the format, e.g. atom.xml. But with a generic feed.xml you can’t ascertain the format solely from the extension.
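Peering into the file amounts to checking the root element: RSS 2.0 uses `<rss>`, Atom uses `<feed>` in the `http://www.w3.org/2005/Atom` namespace, and RSS 1.0 uses an RDF root. The sketch below checks the body and, separately, a `Content-Type` header. The helper names are hypothetical, and servers often send generic types like `text/xml`, which the header check reports as unknown.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"

# Feed-specific media types; generic ones like text/xml say nothing about the format.
FORMAT_BY_MIME = {
    "application/rss+xml": "rss",
    "application/atom+xml": "atom",
}


def format_from_body(xml_text):
    """Identify a feed's format from its root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return "rss"     # RSS 0.9x / 2.0: <rss version="2.0">
    if root.tag == ATOM_FEED:
        return "atom"    # Atom: <feed xmlns="http://www.w3.org/2005/Atom">
    if root.tag == RDF_ROOT:
        return "rss1.0"  # RSS 1.0 is RDF-based: <rdf:RDF ...>
    return "unknown"


def format_from_content_type(header):
    """Identify a feed's format from a Content-Type header, if it is specific enough."""
    mime = header.split(";")[0].strip().lower()
    return FORMAT_BY_MIME.get(mime, "unknown")
```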
“What exactly is the difference between RSS and Atom?” you might ask. Honestly, I’m not knowledgeable enough to explain the difference. That’s a blog post for another day. For now, read the original raison d'être for Atom as a starting point.
As for the second most popular *.atom extension, here’s the breakdown:
Notice anything in those names? They’re all feed URLs provided by GitHub projects. It’s pretty neat when you think about it: these feed URLs are great ways to stay up to date on changes in open source projects. Here are a few examples of where these names came from:
- main.atom: Recent commits to the W3C design principles project
- gh-pages.atom: Recent commits to the public-facing website of the Web Incubator’s background-fetch API proposal
- releases.atom: Release notes from the Babel project
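The names make sense once you see the full paths. The sketch below assumes GitHub's URL shapes `github.com/<owner>/<repo>/releases.atom` and `github.com/<owner>/<repo>/commits/<branch>.atom`, which is why a branch name like `main` or `gh-pages` ends up as the resource name. `describe_github_feed` and the `octo/demo` repo are hypothetical.

```python
from urllib.parse import urlparse


def describe_github_feed(url):
    """Describe a github.com *.atom feed URL, or return None if it isn't one."""
    parsed = urlparse(url)
    if parsed.hostname != "github.com" or not parsed.path.endswith(".atom"):
        return None
    parts = [p for p in parsed.path[: -len(".atom")].split("/") if p]
    if len(parts) == 3 and parts[2] == "releases":
        owner, repo, _ = parts
        return f"releases of {owner}/{repo}"
    if len(parts) >= 4 and parts[2] == "commits":
        owner, repo = parts[:2]
        branch = "/".join(parts[3:])  # branch names may themselves contain slashes
        return f"commits to {branch} in {owner}/{repo}"
    return "other GitHub feed"
```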
No Extension (462)
These names all roughly follow the top occurring names for resources with an extension, with … blog all in the mix. As noted earlier, /feed/ is far and away the favorite name for a resource.
Where did that /default/ name come from? Interestingly, every single occurrence of default has an identical location: /feeds/posts/default/. That makes me think these feeds are all being published by the same underlying technology. Maybe WordPress? Nope. A cursory search indicates this pattern stems from Blogger RSS feeds.
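A check like "every occurrence has an identical location" can be done by collecting the full paths that end in a given name: if only one distinct path comes back, they all share a location. This is a sketch; `locations_of` and the example hosts are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def locations_of(name, urls):
    """Count the distinct paths whose final segment is `name`, ignoring the host."""
    paths = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if segments and segments[-1] == name:
            # Normalize to a trailing slash so /a/default and /a/default/ match.
            paths["/" + "/".join(segments) + "/"] += 1
    return paths
```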
Extra Info: Domains
While not specifically related to how feed URLs are commonly named, it was interesting to see which domains were common in this dataset. The most interesting thing here was that Feedburner is still alive and kicking, at least in terms of usage!
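Tallying domains is a one-liner with the standard library. Here is a minimal sketch (the `top_hosts` name is hypothetical). `urlparse(...).hostname` lowercases the host, so differently cased URLs for the same domain are counted together.

```python
from collections import Counter
from urllib.parse import urlparse


def top_hosts(urls, n=10):
    """Return the n most common hostnames as (host, count) pairs."""
    hosts = Counter(urlparse(u).hostname or "" for u in urls)
    return hosts.most_common(n)
```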
After sifting through this data and writing this post, my new posture towards naming a feed URL is probably this:
- Use the word feed as the resource name
- Use an extension to hint at the format you provide (…)
- Use nouns in the resource’s location to hint at and disambiguate content types (where necessary) (…)
- Use the <link> tag to make all your feeds auto-discoverable
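On the publishing side, the last point comes down to emitting one `<link rel="alternate">` per feed in your page's `<head>`. This is a sketch that renders those tags from a list of feeds. The format-to-media-type mapping is an assumption based on common usage, and the titles and paths in the test (like `/blog/feed.xml`) are hypothetical.

```python
from html import escape

# Assumed media type advertised for each feed format.
MIME_BY_FORMAT = {
    "rss": "application/rss+xml",
    "atom": "application/atom+xml",
    "json": "application/feed+json",
}


def discovery_links(feeds):
    """Render one <link rel="alternate"> tag per (title, href, format) tuple."""
    tags = []
    for title, href, fmt in feeds:
        tags.append(
            f'<link rel="alternate" type="{MIME_BY_FORMAT[fmt]}" '
            f'title="{escape(title)}" href="{escape(href)}">'
        )
    return "\n".join(tags)
```

Advertising the media type in the tag also compensates for a generic .xml extension: a reader can tell RSS from Atom before fetching anything.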
For example, if you’re serving only blog posts and that is qualified by your hostname, this seems appropriate:
Whereas if you are serving a variety of content types that can’t be inferred by your hostname, this seems appropriate:
Of course, this is all caveated by your site’s URL structure. Disregard my non-expert advice as necessary.
If you want to check out how I parsed all these feeds and came up with these stats, check out the code.