So this is a quick hack of a post I’ve been spending far too long failing to complete. Time to just get it out the door. If parts are nonsensical, please accept my apologies. You get what you pay for.
A fair bit has been written over the years about designing good URIs. Whilst traditional teaching on the subject must also apply to web applications to some extent, how far does it go? Does the nature of the documents being served (in this case ‘active’ documents as part of a larger application) hold sway over the URI of the page?
First Principles
I tend to be pretty fussy about what appears in the location bar of any sites or apps that I architect. Partly this is down to aesthetics and some idealist goal of elegance, but primarily it rests with the core values of sustainability, perception of stability and also ease of use. Let’s unpack that.
The subject of sustainability in URI design should be familiar to us all. At a base level, /contact is good, but /contact.asp is bad because when you transition your site to PHP next summer the name of that document is going to change. A good URI doesn’t refer to a web page with a document name. Unless the visitor is supposed to grab the file and take it away from the site, leave the file extension off.
Perceived Stability
Slightly more abstract than this is the concept of perceived stability, which I think is best illustrated with an example from last weekend. Dissatisfied with the tools available for discovering what podcasts are available, I was taking a look into writing my own scripts to parse the ipodder.org podcast directory and find stuff I might be interested in. The first job was to find the URI to the directory so that I could take a look at it. After some hunting around, I found this address:
http://www.ipodder.org/discuss/reader$4.opml
Well, ok it looks fairly compact, but I have a few issues with it. The first is that dollar sign. Are those even legal? Well, with the dollar being so weak it’s certainly not a good thing to be throwing into your URIs, that’s for sure. My second issue is the file name as a whole – whilst I’m not sweating the OPML extension as I know that to be XML, what’s with the reader business? And finally, discuss? That suggests that this was posted by a user and is not a permanent resource I should be building an application on. So with this bad taste in my mouth, I posted to a list and just asked if it was the right address. I was relieved to find that I had the wrong page. Phew! There I go getting hot under the collar for no reason. But wait until you see the real URI:
http://homepage.mac.com/dailysourcecode/DSC/ipodderDirectory.opml
Deeeep breath. So I have issues here too. The first is the dot mac account, which is obviously at the mercy of Apple and where they take their dot mac service in the future. The second issue is that the document I want (a directory of podcasts) is filed under the name of a specific podcast. It’s just all messed up. (And don’t even get me started on why the damn thing is in OPML format). See how the chosen URI can have a detrimental effect on the user’s perception of stability of that URI?
Ease of Use
So what would a better address for the ipodder.org directory be? Well, in the first instance, it should be on the ipodder.org domain. That’s where a user would expect to find the feed – it comes down to ease of use. Secondly, the feed isn’t part of the main content of ipodder.org, so I’d expect it to be tucked away in a directory distinct from the rest of the site’s content. How about this:
http://ipodder.org/xml/directory.opml
Short and to the point. Memorable, and most of all, easy.
Where was I?
Oh yes, so that’s how URI design works at a basic level. The challenge that I’m currently faced with is deciding if the principles of the design can or should be fundamentally different for a web application vs a regular site. I’ll tell you what’s prompted this thought – working with Rails. Rails uses a URI model that goes pretty much like this:
/controller/method/options
Well, I guess that’s pretty neat. A controller in use is often mapped to something like an object within your app – say, a user. So we have a controller for users. The address to edit user #1234 would be something like:
/users/edit/1234
That makes a lot of sense. What it’s doing is taking an object-oriented look at the address structure rather than a traditional hierarchical view. The URIs reflect the logical structure of the application, not the hierarchical flow of the user interface. A subtle shift, and one that may have zero effect, depending on how your interface is designed.
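To make the /controller/method/options idea concrete, here is a minimal Python sketch of how such an address might be pulled apart. It is not how Rails itself does it. The names `Route` and `parse_route` are made up for illustration, and falling back to an "index" method when none is given is my own assumption.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    controller: str        # e.g. "users"
    action: str            # e.g. "edit"
    option: Optional[str]  # e.g. "1234", or None if absent


def parse_route(path: str) -> Route:
    """Split a /controller/method/options path into its three parts."""
    # Drop empty segments so leading and trailing slashes do not matter.
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("path has no controller segment")
    controller = segments[0]
    # Assumption: a missing method falls back to "index".
    action = segments[1] if len(segments) > 1 else "index"
    option = segments[2] if len(segments) > 2 else None
    return Route(controller, action, option)
```

With this, `parse_route("/users/edit/1234")` yields the controller `users`, the method `edit` and the option `1234`.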
On that note, I just checked some of mine. Here’s how I edit user #1234 in one of my recent apps:
/admin/users/edit/?id=1234
So that would be pretty much the same then. I’m going to have to think further about whether that means that my interface is well laid out, or whether it means that there’s little fundamental difference between app-logic designed URIs and UI-hierarchy designed URIs. I dunno. Discuss.




Comments
The ”/users/edit/” part is fine, but the “1234” part isn’t so good. Presumably that “1234” would map to the user’s name or username in the database. Well, why not have ”/users/edit/joe-bloggs” then? That way the over-worked, stressed out site admins who should have gone to lunch 10 minutes ago can easily see exactly where they’re going just by hovering over the link. It makes things so much easier.
A great example of crappy URI structure is SpreadFirefox: The URI for the latest post is “http://www.spreadfirefox.com/?q=node/view/11288”. Ok, it’s crappy because it uses a querystring, but that’s easy to change to “http://www.spreadfirefox.com/node/view/11288” with a drop of mod_rewrite. More importantly though, it tells me absolutely nothing about where I’m about to navigate to, making it harder for me to decide whether I really want to click that link or not.
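For what it’s worth, the mapping between the two forms is mechanical. The Python sketch below turns a “?q=path” address into the clean form, which is the sort of thing you might use when generating links or redirecting old addresses. It is not mod_rewrite and not anything the site itself is known to use; the function name `clean_url` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_url(url: str) -> str:
    """Move a 'q' query parameter into the path, keeping other parameters."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    q_values = [value for key, value in params if key == "q"]
    if not q_values:
        # Nothing to rewrite; hand the address back untouched.
        return url
    remaining = [(key, value) for key, value in params if key != "q"]
    path = "/" + q_values[0].lstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(remaining), parts.fragment)
    )
```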
By the way, URIs recently turned into IRIs, making it even more confusing.
I recently implemented URI slugs for Digital Proof. Whilst I was hoping to be lazy and steal the code from WordPress, as far as I could see, it didn’t support properly unique slugs. That’s fine, assuming you’re using the /archives/year/month/day/slug format, and aren’t going to make two posts with the same name on the same day, but as we weren’t using dates in our URIs I wanted something that was truly unique. So when a post is made, the system runs through and checks if that slug is already in the database. If so, it simply appends -X, where X is the duplicate number. That works fine, except it seems to be slightly bugged and applying dupe numbers where they’re not needed ; ). Must sort that some time.
Anyway, the real point is making the user’s life easier. If creating unique URI slugs is the way to do so then that’s most certainly what should be done.
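A rough Python sketch of the slug scheme described above, for anyone who wants to try it. This is not the actual Digital Proof code. The names `slugify` and `unique_slug` are made up, the set `existing` stands in for the database lookup, and starting the duplicate number at 2 is an assumption.

```python
import re


def slugify(title: str) -> str:
    """Lower-case the title and collapse anything non-alphanumeric to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def unique_slug(title: str, existing: set) -> str:
    """Return a slug not already in `existing`, appending -N only if needed."""
    base = slugify(title)
    # Check for an exact match first, so a number is only added on a clash.
    if base not in existing:
        return base
    n = 2  # assumption: the first duplicate becomes "-2"
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"
```

In a real system the check and the insert would need to happen together (for instance under a unique index), otherwise two posts saved at the same moment could still end up with the same slug.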
/users/edit/1234_joe-bloggs
You have probably noticed that most search-engine-friendly news websites already do this.
/section/article_id/whatever-you-like-it-makes-no-odds
One thing that interests me more with URIs is how much information should we put into them? Obviously the more that’s there, the better the idea you can get about where you’re heading to. But when does the length become so great that the new pieces of information make it harder to see what you’re going to?
The other big problem with URIs is what to do if you want two different classification schemes. Do you have the same article up at completely different URIs so you can use both? What then about visited links, and which do you choose as the main link? The best example of this I can think of is a news site. Should URLs be organised by date, or by category posted in? If I’m looking back through visited URIs in my address bar dropdown, I might want to look for all articles on a certain date, which placing them under category first will ruin. But equally, what if I can only remember what category it was in? If it’s organised date first, then I can’t check through by category easily at all.
And of course a url like
newssite.com/2005/02/04/news/abbreviated_story_title
is really getting a bit into the category of too much cruft there making it harder to look at what’s going on. And that’s without any CMS generated junk.
http://www.w3.org/Provider/Style/URI
The part about removing file extensions in the URI is interesting, which I’ve done for some file types (scripts) but not all as it points out.
I had not heard of IRIs, thanks Anne. I found this at the W3C concerning IRIs:
http://www.w3.org/International/O-URL-and-ident.html
Basically, IRIs are an international version of URIs. IRIs allow the use of the Universal Character Set while URIs only allow the use of the US-ASCII character set.
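To illustrate the difference, here is a small Python sketch that maps an IRI down to an ASCII-only URI by percent-encoding the non-ASCII characters as UTF-8. It is a simplification, not a full implementation of the IRI spec: the function name `iri_to_uri` is hypothetical, user-info in the address is ignored, and the host is converted with Python’s built-in "idna" codec.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left alone when encoding; "%" is kept so that text which is
# already percent-encoded is not encoded a second time.
SAFE_PATH = "/:@!$&'()*+,;=%"
SAFE_QUERY = SAFE_PATH + "?"


def iri_to_uri(iri: str) -> str:
    """Return an ASCII-only URI for the given IRI (simplified sketch)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    if host:
        # Non-ASCII host names are converted using the "idna" codec.
        host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE_PATH),
        quote(parts.query, safe=SAFE_QUERY),
        quote(parts.fragment, safe=SAFE_QUERY),
    ))
```

A plain ASCII address such as http://ipodder.org/xml/directory.opml comes out unchanged, while a path like /café becomes /caf%C3%A9.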
Otherwise, I always do this on my personal websites.
.htaccess is my best friend. They’re starting to look almost as big as my style sheets… (almost)
Have you got in touch with Adam Curry to suggest a change to the location of the opml file? He’s a pretty amiable guy – I reckon he’d move it if you explained it calmly and clearly ;)
The URI is the interface to your website, and is important. Granted, you’ve got search engines and links to help people get where they want, but thinking about the URL interface of your site is probably a very good start to thinking about its information architecture, and all aspects of the user experience – aside from letting people play with the URI to find what they want.
And as this post emphasised, if you want other people to make use of your site programmatically, a good URI interface can speed the process up no end.
I’ve got issues with the controller/method/options format (I’ve got even bigger issues with the messy query-string-laden urls that we’re so often dumped with).
My big issue is that the web is not an application in the traditional sense. The pseudo-OO address style is a lot like the RPC style, where URIs point to functions not to resources.
HTTP tells us that the web is a collection of resources, which we can manipulate (by requesting representations of those resources, or modifying or even deleting them…) using HTTP verbs like GET, POST, PUT and DELETE. If I’ve got a web app then I’ve got some kind of information space, so it makes sense to view the URI map as a map of resources and various kinds of representations of them.
So, in the user admin example you could have this:
/users/1234/edit
to get an editable HTML representation of user 1234.
This kind of URI pattern works on a restriction basis, from class of thing, to thing, to special case of thing, so:
/users/1234 would give us a plain old representation of the user,
and /users might give us a list of all the users. Equally, when there are several things I can do with a user then a resource-focussed URI structure keeps things that bit more intelligible:
/users/1234/favourites might return a list of that user’s favourite things. We could POST to it to add new things, and GET it to see HTML. We could even use content negotiation to GET different representations of the same resource by asking for HTML or XML in the request headers.
This is all straight-out-of-Fielding stuff. I’d advise digging out his thesis, and even reading the HTTP spec (it’s actually very readable).
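As a rough illustration of the resource map described above, here is a Python sketch that resolves a verb and a path to a named resource. Everything in it is hypothetical: the resource names, the `resolve` function and the choice of which verbs each resource accepts are assumptions for the example, not taken from any framework.

```python
import re

# Each URI pattern identifies a resource: class of thing, thing, special case.
ROUTES = [
    (re.compile(r"/users"), "user_list"),
    (re.compile(r"/users/(?P<id>\d+)"), "user"),
    (re.compile(r"/users/(?P<id>\d+)/edit"), "user_edit_form"),
    (re.compile(r"/users/(?P<id>\d+)/favourites"), "user_favourites"),
]

# Assumption: which HTTP verbs each resource accepts.
ALLOWED = {
    "user_list": {"GET"},
    "user": {"GET", "PUT", "DELETE"},
    "user_edit_form": {"GET"},
    "user_favourites": {"GET", "POST"},
}


def resolve(method: str, path: str):
    """Return (status, resource name, URI parameters) for a request."""
    for pattern, resource in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            if method not in ALLOWED[resource]:
                return 405, resource, match.groupdict()  # verb not allowed
            return 200, resource, match.groupdict()
    return 404, None, {}  # no such resource
```

The point is that the URI only ever names the thing, and the verb says what to do with it: GET and POST on /users/1234/favourites hit the same resource.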
On the WebLinc site, we’ve dynamically built out aliases for every ‘page’ on the site.
Also how do you all feel about the uri having or not having a trailing slash?
If anyone can help give me some direction I would really appreciate it.
and no, I’m not looking for a solution where I have to manually create directories (nor do I want a script that just creates them on the fly…I know how to do that).
the solution need not be free…but I’m just hoping for something that could possibly be cheaper than what the mainstream module costs.
Thanks to anyone who helps out.
I’ve lost track of how many times I’ve forwarded that Cool URIs Don’t Change article in the interests of enlightening colleagues, some who would otherwise devise some of the most convoluted linkage I have ever laid eyes on.
(... and not necessarily on purpose either – that’s the kicker. Most didn’t realize what kind of rabbit hole they were diving into at the time!)
Reading all this inspired a (somewhat related) change to my site early this morning: enforcing a fully qualified domain name for all HTTP requests.
http://www.smalig.com/url_rewrite/
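The general idea, sketched in Python rather than as rewrite rules: decide on one canonical host name and redirect any request that arrives under a different one. This is not the configuration from the page linked above. The host `www.example.com` and the function name `canonical_redirect` are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # placeholder: the one host name to enforce


def canonical_redirect(url: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the host is already canonical."""
    parts = urlsplit(url)
    if parts.hostname == CANONICAL_HOST:
        return None
    # Keep any explicit port, and everything after the host, unchanged.
    netloc = CANONICAL_HOST
    if parts.port is not None:
        netloc = f"{CANONICAL_HOST}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path or "/", parts.query, parts.fragment)
    )
```

A server would typically answer with a permanent (301) redirect to the returned address, so that bookmarks and search engines settle on the one form.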