An Analysis of Feed URLs

2021-08-30
#rss

I’ve always been curious about an accepted pattern (if there is any) for feed URLs.

What should it be named?
- By directory convention?
  - index.xml
  - /feed/index.php
- By format?
  - feed.xml
  - feed.atom
  - feed.rss
  - feed.json
  - feed.php
- By content type?
  - posts.rss
  - articles.xml
  - blog.atom
Should it have an extension?
- /rss/
- /rss.xml
Where should it live?
- /index.xml
- /path/to/index.xml

I’m sure the answer is: “it depends”. But is there a common pattern for this, the way HTML URLs tend towards extension-less paths with an index.html file?

Does any of the above even matter? Given that you can make your feeds auto discoverable with HTML, maybe the precise feed URL(s) don’t matter much?

That’s possible, but I have a hard time believing it. URL design is one of the most important aspects of any web site—and I don’t think that’s exclusive to URLs for HTML resources.

Anyhow, these thoughts have been swirling in my head. Then one day I came across web-dev-feeds by simeviads, a collection of 1,000 feeds for web developers.

My first reaction was: “I gotta parse and analyze all those feeds! Surely that will surface common patterns for feed URLs!” So that’s what I did. Below are my findings.

Note: what follows likely isn’t 100% precise, but is meant as a rough analysis.

Resource Name

Name	Occurrence
`feed`	512
`rss`	154
`atom`	63
`index`	62
`main`	20

This data represents the name of the feed; meaning, the named resource at the end of a path regardless of other names within the path, i.e.

/feed/ -> “feed”
/path/to/feed -> “feed”
/rss/feed.xml -> “feed”
/blog/feed/ -> “feed”

Looks like “feed” is the favorite, outranking the other top four choices combined!

Resource Location: Root or Nested?

Location	Occurrence
Root `/*`	675
Nested `/*/`	325

This one is interesting, because if you peer a little deeper, how resources are named is dependent on their location—it’s not just the resource’s name that’s important but the entire path of the resource, which undoubtedly influences it’s name. For example:

/feed.xml is, presumably, the feed for the entire host
/blog/feed.xml is, presumably, the feed for the blog, but the host could have other resources, like /podcasts/feed.xml

Given this interdependent relationship between naming a resource and the location at which it lives, I’m not sure how much weight this data point could bring to bear on any particular conclusion—but I still find it interesting to see.

Resource Extensions

Extension	Occurrence
Any `/feed.*`	538
None `/feed/`	462

As you can see, this one comes in pretty close. What I found most interesting was how these numbers broke down in their respective categories.

Extension (538)

Extension	Occurrence
`*.xml`	459
`*.atom`	49
`*.rss`	20
`*.php`	5
`*.json`	4

By far, XML is the popular one here—JSON feed even appears on the radar which is kind of neat.

Here’s how the naming within the *.xml extension broke down:

Name	Occurrence
`feed.xml`	213
`rss.xml`	102
`atom.xml`	56
`index.xml`	53
`blog.xml`	9

feed.xml is clearly the most popular. But what I find interesting here is that the XML file extension doesn’t disambiguate between an RSS feed or an Atom feed. Granted, if you peered into the file itself—or possibly the HTTP headers—you’d know. Or if the file is named after the format, i.e. rss.xml vs. atom.xml. But with a generic feed.xml you can’t ascertain the format solely from the extension.

“What exactly is the difference between RSS and Atom,” you might ask? Honestly, I’m not knowledgeable enough to explain the difference. That’s a blog post for another day—read the original raison d'être for Atom as a starting point.

As for the second most popular *.atom extension, here’s the breakdown:

Extension	Occurrence
`main.atom`	19
`gh-pages.atom`	8
`releases.atom`	4
`master.atom`	3

Notice anything in those names? They’re all feed URLs provided by Github projects. It’s pretty neat when you think about it—these feed URLs are great ways to stay up-to-date on changes in open source projects. Here are a few examples of where these names came from:

main.atom - Recent commits to the w3c design principles project
gh-pages.atom - Recent commits to the public facing website of the Web Incubator’s background-fetch API proposal
releases.atom - Release notes from the Babel project

No Extension (462)

Name	Occurrence
`/feed/`	286
`/rss/`	51
`/default/`	19
`/atom/`	7
`/blog/`	6

These names all roughly follow the top occurring names for resources with an extension, with feed, rss, atom, and blog all in the mix. As noted earlier, /feed/ is far and away the favorite name for a resource.

Where did that /default/ name come from? Interestingly, every single occurrence of default has an identical location: /feeds/posts/default/. That makes me think these feeds are all being published by the same underlying technology. Maybe Wordpress? Nope. A cursory search indicates this pattern stems from Blogger RSS feeds.

Extra Info: Domains

Domain	Occurrence
`feedburner.com`	62
`github.com`	34
`github.io`	23
`mozilla.org`	11
`w3.org`	11

While not specifically related to the topic of how common feed URLs are named, it was interesting to see what domains were common in this dataset. The most interesting thing here was that Feedburner is still alive and kicking in usage!

Conclusions

After sifting through this data and writing this post, my new posture towards naming a feed URL is probably this:

Use the word feed as the resource name
Use an extension to hint at format you provide (*.xml, *.atom, *.json, etc.)
Use nouns in the resource’s location to hint at and disambiguate content types (where necessary) (/blog/feed.xml, /podcasts/feed.xml, etc.)
Use the the <link> tag to make all your feeds auto discoverable

For example, if you’re serving only blog posts and that is qualified by your hostname, this seems appropriate:

blog.mysite.com/feed.{rss,atom,json}

Whereas if you are serving a variety of content types that can’t be inferred by your hostname, this seems appropriate:

mysite.com/{links,posts,notes,podcasts}/feed.{rss,atom,json}

Of course, this is all caveated by your site’s URL structure. Disregard my non-expert advice as necessary.

If you want to checkout how I parsed all these feeds and came up with these states, checkout the code.