An Analysis of Feed URLs
I’ve always been curious about an accepted pattern (if there is any) for feed URLs.
- What should it be named?
- By directory convention?
index.xml
/feed/index.php
- By format?
feed.xml
feed.atom
feed.rss
feed.json
feed.php
- By content type?
posts.rss
articles.xml
blog.atom
- By directory convention?
- Should it have an extension?
/rss/
/rss.xml
- Where should it live?
/index.xml
/path/to/index.xml
I’m sure the answer is: “it depends”. But is there a common pattern for this, the way HTML URLs tend towards extension-less paths with an index.html
file?
Does any of the above even matter? Given that you can make your feeds auto discoverable with HTML, maybe the precise feed URL(s) don’t matter much?
That’s possible, but I have a hard time believing it. URL design is one of the most important aspects of any web site—and I don’t think that’s exclusive to URLs for HTML resources.
Anyhow, these thoughts have been swirling in my head. Then one day I came across web-dev-feeds by simeviads, a collection of 1,000 feeds for web developers.
My first reaction was: “I gotta parse and analyze all those feeds! Surely that will surface common patterns for feed URLs!” So that’s what I did. Below are my findings.
Note: what follows likely isn’t 100% precise, but is meant as a rough analysis.
Resource Name
Name | Occurrence |
---|---|
feed | 512 |
rss | 154 |
atom | 63 |
index | 62 |
main | 20 |
This data represents the name of the feed; meaning, the named resource at the end of a path regardless of other names within the path, i.e.
/feed/
-> “feed”/path/to/feed
-> “feed”/rss/feed.xml
-> “feed”/blog/feed/
-> “feed”
Looks like “feed” is the favorite, outranking the other top four choices combined!
Resource Location: Root or Nested?
Location | Occurrence |
---|---|
Root /* | 675 |
Nested /**/* | 325 |
This one is interesting, because if you peer a little deeper, how resources are named is dependent on their location—it’s not just the resource’s name that’s important but the entire path of the resource, which undoubtedly influences it’s name. For example:
/feed.xml
is, presumably, the feed for the entire host/blog/feed.xml
is, presumably, the feed for the blog, but the host could have other resources, like/podcasts/feed.xml
Given this interdependent relationship between naming a resource and the location at which it lives, I’m not sure how much weight this data point could bring to bear on any particular conclusion—but I still find it interesting to see.
Resource Extensions
Extension | Occurrence |
---|---|
Any /feed.* | 538 |
None /feed/ | 462 |
As you can see, this one comes in pretty close. What I found most interesting was how these numbers broke down in their respective categories.
Extension (538)
Extension | Occurrence |
---|---|
*.xml | 459 |
*.atom | 49 |
*.rss | 20 |
*.php | 5 |
*.json | 4 |
By far, XML is the popular one here—JSON feed even appears on the radar which is kind of neat.
Here’s how the naming within the *.xml
extension broke down:
Name | Occurrence |
---|---|
feed.xml | 213 |
rss.xml | 102 |
atom.xml | 56 |
index.xml | 53 |
blog.xml | 9 |
feed.xml
is clearly the most popular. But what I find interesting here is that the XML file extension doesn’t disambiguate between an RSS feed or an Atom feed. Granted, if you peered into the file itself—or possibly the HTTP headers—you’d know. Or if the file is named after the format, i.e. rss.xml
vs. atom.xml
. But with a generic feed.xml
you can’t ascertain the format solely from the extension.
“What exactly is the difference between RSS and Atom,” you might ask? Honestly, I’m not knowledgeable enough to explain the difference. That’s a blog post for another day—read the original raison d'être for Atom as a starting point.
As for the second most popular *.atom
extension, here’s the breakdown:
Extension | Occurrence |
---|---|
main.atom | 19 |
gh-pages.atom | 8 |
releases.atom | 4 |
master.atom | 3 |
Notice anything in those names? They’re all feed URLs provided by Github projects. It’s pretty neat when you think about it—these feed URLs are great ways to stay up-to-date on changes in open source projects. Here are a few examples of where these names came from:
main.atom
- Recent commits to the w3c design principles projectgh-pages.atom
- Recent commits to the public facing website of the Web Incubator’s background-fetch API proposalreleases.atom
- Release notes from the Babel project
No Extension (462)
Name | Occurrence |
---|---|
/feed/ | 286 |
/rss/ | 51 |
/default/ | 19 |
/atom/ | 7 |
/blog/ | 6 |
These names all roughly follow the top occurring names for resources with an extension, with feed
, rss
, atom
, and blog
all in the mix. As noted earlier, /feed/
is far and away the favorite name for a resource.
Where did that /default/
name come from? Interestingly, every single occurrence of default
has an identical location: /feeds/posts/default/
. That makes me think these feeds are all being published by the same underlying technology. Maybe Wordpress? Nope. A cursory search indicates this pattern stems from Blogger RSS feeds.
Extra Info: Domains
Domain | Occurrence |
---|---|
feedburner.com | 62 |
github.com | 34 |
github.io | 23 |
mozilla.org | 11 |
w3.org | 11 |
While not specifically related to the topic of how common feed URLs are named, it was interesting to see what domains were common in this dataset. The most interesting thing here was that Feedburner is still alive and kicking in usage!
Conclusions
After sifting through this data and writing this post, my new posture towards naming a feed URL is probably this:
- Use the word
feed
as the resource name - Use an extension to hint at format you provide (
*.xml
,*.atom
,*.json
, etc.) - Use nouns in the resource’s location to hint at and disambiguate content types (where necessary) (
/blog/feed.xml
,/podcasts/feed.xml
, etc.) - Use the the
<link>
tag to make all your feeds auto discoverable
For example, if you’re serving only blog posts and that is qualified by your hostname, this seems appropriate:
blog.mysite.com/feed.{rss,atom,json}
Whereas if you are serving a variety of content types that can’t be inferred by your hostname, this seems appropriate:
mysite.com/{links,posts,notes,podcasts}/feed.{rss,atom,json}
Of course, this is all caveated by your site’s URL structure. Disregard my non-expert advice as necessary.
If you want to checkout how I parsed all these feeds and came up with these states, checkout the code.