URLs are the Uniform Way to Locate Resources

Tue Mar 30 16:06:38 -0700 2010

When you hear the term URL, what do you think of? Probably a web address - e.g., a publicly accessible HTML page such as http://google.com/ or http://news.ycombinator.com/. But URLs have a much wider application.

URL stands for Uniform Resource Locator. Decoding this, a URL is a uniform (standard) way to locate (find) any resource (service) over a network (the internet or a LAN).

Any time you wish to locate a resource on the internet, use a URL.

Example: Git

If you use Git, then you’ve probably already encountered a non-HTTP URL: one that uses the Git protocol. For example, here’s the URL to the public Git repo for the Paperclip file attachment library:

git://github.com/thoughtbot/paperclip.git

A Git repo is not an HTML page, but it is a resource on a network, so using a URL makes perfect sense.

You could potentially encode this repo’s location in another way. For example, you could break it out into pieces and provide it in a JSON file:

{
  "protocol": "git",
  "host": "github.com",
  "username": "thoughtbot",
  "project": "paperclip"
}

Why don’t we use this format for locating Git resources? There are a few potential answers, such as the convenience of being able to copy and paste the location into a command-line tool or a URL bar. But the best answer is that our ad-hoc JSON format is not uniform. Its keys (username, project) mirror the way GitHub lays out its repos, so the JSON above would work for locating Git resources on GitHub, but nowhere else. URLs are standard and uniform.
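To make the comparison concrete, here is a minimal sketch using Ruby's standard URI library. It shows that the pieces we wrote out by hand in the JSON can be recovered from the URL's generic parts. Splitting the path into username and project is my own assumption about GitHub's layout, not something the URL format itself promises.

```ruby
require "uri"

# Parse the Paperclip repo URL with Ruby's standard URI library.
repo = URI.parse("git://github.com/thoughtbot/paperclip.git")

protocol = repo.scheme   # "git"
host     = repo.host     # "github.com"

# Assumption: the path looks like /<username>/<project>.git, as it does on GitHub.
username, project = repo.path.sub(%r{\A/}, "").sub(/\.git\z/, "").split("/")

puts [protocol, host, username, project].inspect
```

Only the last step knows anything about GitHub. The scheme, host, and path come out the same way for any URL.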

Example: Databases

Another great example is the location of a database. One approach is to have a long list of configuration values, probably copied into a file like config/database.yml by hand, one at a time. This format is probably specific to your ORM, i.e., not standard or uniform in any way. It’s the equivalent of the JSON address we used to specify a Git repo in the previous section.

As with Git, the more elegant approach is to put everything needed to locate the database into a URL. This will typically look something like:

mysql://myuser:mypass@db8.myhost.com:3306/mydatabase

Ruby ORMs like Sequel and DataMapper use this very method. This makes configuring your database very simple:

Sequel.connect(the_database_url)

Beautiful.
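If you're curious what is packed into that one string, here is a rough sketch that expands the example URL back into individual settings using Ruby's standard URI library. The database_settings helper and its key names are hypothetical, chosen to resemble a typical config file. This is not how Sequel or DataMapper do it internally.

```ruby
require "uri"

# Hypothetical helper: expand a database URL into the individual settings
# that would otherwise be typed into a config file by hand.
def database_settings(url)
  uri = URI.parse(url)
  {
    adapter:  uri.scheme,
    username: uri.user,
    password: uri.password,
    host:     uri.host,
    port:     uri.port,
    database: uri.path.sub(%r{\A/}, "")   # drop the leading slash
  }
end

settings = database_settings("mysql://myuser:mypass@db8.myhost.com:3306/mydatabase")
settings.each { |key, value| puts "#{key}: #{value}" }
```

Six hand-copied values collapse into one string that any URL parser can take apart. Note that this sketch does not decode percent-encoded characters, so a password containing reserved characters would need extra handling.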

Yet More Examples: RabbitMQ, Email, Memcache

What else can we use URLs for? Anything that needs to be located on a network, be it the internet or a local network. For example, how about your RabbitMQ message queue?

amqp://user:pass@hostname/vhost

Or your SMTP mail server?

smtp://user:pass@hostname/domain

Or your Memcache server?

memcache://hostname/prefix

On this last item, you might point out that a Memcache cluster often has multiple hosts. Typically, these are specified in an array of IP addresses passed to the client object constructor. While this works, it’s not uniform. A better solution here is to use an internal hostname (such as memcache.internal.yourhost.com) which returns multiple A records, one per server in your cluster. The returned IPs may well be private addresses (10.x.x.x or 192.168.x.x) that are not publicly routable. In addition to allowing your Memcache config to conform to the URL specification, this also gives you the benefit of managing your server IPs in a single place: DNS. The alternative is hardcoding IPs into every component of your system that uses your Memcache servers.
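Here is a minimal sketch of that idea in Ruby, using the standard uri and resolv libraries. The memcache_servers helper is hypothetical, and the lookup is stubbed with made-up addresses so the example runs without a real DNS entry. In real use you would leave the resolver at its default, which asks Resolv for every address behind the hostname.

```ruby
require "uri"
require "resolv"

# Hypothetical helper: turn one memcache URL into a list of "ip:port" strings,
# one per address the hostname resolves to. The resolver is injectable so the
# DNS lookup can be stubbed out.
def memcache_servers(url, resolver: Resolv.method(:getaddresses))
  uri  = URI.parse(url)
  port = uri.port || 11211   # fall back to memcached's default port
  resolver.call(uri.host).map { |ip| "#{ip}:#{port}" }
end

# Stand-in for DNS; these addresses are invented for the example.
fake_dns = ->(_host) { ["10.0.0.11", "10.0.0.12"] }

servers = memcache_servers("memcache://memcache.internal.yourhost.com/prefix",
                           resolver: fake_dns)
puts servers.inspect
```

Adding or removing a server then means changing DNS, not redeploying every component that holds a copy of the list.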

What About Extra Config Options?

If the protocol for a given resource requires additional config options, you can pass them as query parameters:

sqlite://development.sqlite3?encoding=utf8

I would urge you to think carefully before using query params. 99% of cases should be representable within the base URL.
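For the cases that do need them, reading the options back out is straightforward. This is a small sketch with Ruby's standard URI library, not a description of how any particular ORM handles its options:

```ruby
require "uri"

uri = URI.parse("sqlite://development.sqlite3?encoding=utf8")

# decode_www_form returns [key, value] pairs; to_h turns them into a hash.
options = URI.decode_www_form(uri.query || "").to_h

puts options.inspect
```

One thing to be aware of: a generic parser reports development.sqlite3 as the host here. Turning that into a file name is up to whichever library consumes the URL.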

Summary

URLs are uniform. Use them to locate your resources.

a tornado of razorblades