How it works: i.sfu.ca

All about i.sfu.ca

A number of individuals and institutions have asked about our URL shortener: why we built our own instead of using an existing service, how it works, and what we have learned since putting it online. This page attempts to answer those questions. If you still have questions after reading it, please email us at webmaster@sfu.ca.

Why build our own URL shortener?

One of the first questions we were asked was: "why build your own URL shortener instead of using one of the existing services?" The answer is largely "branding." 

An instructor told our team in SFU IT Services that on social media platforms, particularly Twitter with its 140-character limit, URL shortening was often required in order to include references to their work, research, events, etc. With a generic shortener, the connection to SFU was not always obvious in the post, and the instructor wanted to maintain that connection semantically through the URL itself. An SFU-branded URL shortener, the instructor suggested, would be handy.

We asked a number of others in positions across campus whether they thought such a service would be valuable, and the answer was consistently yes. The primary concern that was raised was the potential of hiding malware or other content that SFU would not wish to be associated with behind a URL that implied a relationship to SFU. More on this later.

It took no more than a few hours to put a working prototype together, and once we determined that the basic system was functional we set about determining appropriate hostnames and asking for those hostnames to be assigned to one of our web servers. The total time to the first 'beta' release was about 3 days, most of which was simply waiting for resources to be provisioned.

Initially we left the system open, allowing anybody to use the API underlying the shortener to create URLs. Within a month we found that a 3rd-party Firefox URL-shortening plugin had included support for the "i.sfu.ca" service in its repertoire of supported services. We were somewhat surprised because we had done no active promotion of the service, initially telling only a few people that it was available to them.

Costs relative to existing open-source or purchased solutions?

Though this wasn't a primary consideration when we developed the solution, the total cost of the service is incredibly low: between all the people involved (two developers, network provisioning, etc.), the development of the system took no more than a single FTE workday to put online, and very little has been done to tweak the system since then. We spent little time looking at other alternatives in the open-source space since an initial search found very little compelling software available, and the actual mechanism required was so simple that building the service required very little coding. It probably would have taken us more time and effort to install and configure a package with which we were not familiar than it did to put such a simple service in place.

How does it work?

If you're interested in what happens 'behind the scenes' in the URL shortener at a somewhat technical level, this section is for you.

If a site visitor is not following a link to an existing shortened URL and is not already logged into our Single Sign On system (Central Authentication Service, or "CAS"), they will be presented with a login page.

How is authentication triggered?

The web server is configured to look for certain URL patterns, corresponding to special directories, in the requested URL. If there is no match, the path is assumed to be a short URL that needs conversion.
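As a rough illustration of that decision, here is a minimal Python sketch. It is not the actual server configuration (which lives in Apache), and the set of directory names is an assumption: only /create and /myurl are named on this page, and "api" is a guess at the API folder's name.

```python
# Hypothetical list of reserved top-level directories. Only "create" and
# "myurl" are named on this page; "api" is an assumed name.
SPECIAL_DIRECTORIES = {"create", "myurl", "api"}


def classify_request(path):
    """Return 'root', 'special', or 'short_url' for a request path."""
    # First path segment, e.g. "/create/index.html" -> "create"
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment == "":
        return "root"
    if first_segment in SPECIAL_DIRECTORIES:
        return "special"
    # No match against the special directories: assume a short URL.
    return "short_url"


print(classify_request("/jQytsQ"))
```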

If the user asks for a short URL that does not exist, or asks for the root of the host (e.g. http://i.sfu.ca/ with no path information), they are redirected to the /create subdirectory. /create and a handful of other directories are protected by Apache's authentication mechanism. mod_auth_cas, an Apache httpd module for JASIG CAS, is used on our i.sfu.ca server as the authentication mechanism.

On successful authentication of the user, CAS sets the REMOTE_USER server variable to the SFU Computing ID of the person who logged in. The user is now known to our scripts.
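A script running behind that authentication can read the variable from its environment. The following Python sketch shows the idea only; the function name and the "jsmith" ID are made up for illustration, and it is not our actual script.

```python
import os


def current_user(environ=None):
    """Return the authenticated computing ID, or None if nobody is logged in."""
    if environ is None:
        environ = os.environ
    # REMOTE_USER is only present once the authentication layer has run.
    user = environ.get("REMOTE_USER", "").strip()
    return user or None


# Simulated environment; "jsmith" is a made-up computing ID.
print(current_user({"REMOTE_USER": "jsmith"}))
```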

With no existing link to follow, the URL shortener presents the short URL creation page. The site user enters the long URL that they'd like shortened in the page's only field.

In this case, the person has entered the URL:

http://www.sfu.ca/srs/ehs/research-safety/hazardous-material-management/spill-response/biohazard-spills.html

That original URL is 108 characters. Using this original URL in a medium such as Twitter would leave little room for the message accompanying the URL, only about 30 characters.
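The arithmetic can be checked with a few lines of Python. The one character reserved for a separating space is our assumption about how the message and URL would be joined.

```python
TWEET_LIMIT = 140  # the Twitter character limit mentioned above

long_url = (
    "http://www.sfu.ca/srs/ehs/research-safety/"
    "hazardous-material-management/spill-response/biohazard-spills.html"
)

url_length = len(long_url)
# Assume one character is used for a space between the message and the URL.
remaining = TWEET_LIMIT - url_length - 1

print(url_length, remaining)
```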

Hitting the button labelled 'Shorten It!' will request a short URL.

What happens when you hit "Shorten It!"?

The "Shorten It!" button is not a form submission, but a JavaScript call to a Perl-based script in our API folder. The JavaScript sends the URL to the API and expects a JSON response with the original URL requested, the shortened URL, and a date field representing the first time that the long version of that URL was shortened. If a URL has already been shortened, the existing short URL is returned.
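To give a feel for the shape of such a response, here is a small Python sketch (the real script is Perl). The key names and the date value are placeholders chosen for illustration; the actual field names used by the API are not documented on this page.

```python
import json


def build_response(long_url, short_url, created):
    """Serialize the three values the page's JavaScript expects."""
    # Key names are illustrative only, not the real API's keys.
    return json.dumps(
        {"long_url": long_url, "short_url": short_url, "created": created}
    )


payload = build_response(
    "http://www.sfu.ca/srs/ehs/research-safety/"
    "hazardous-material-management/spill-response/biohazard-spills.html",
    "http://i.sfu.ca/jQytsQ",
    "2012-06-13",  # placeholder date, not the real creation date
)
print(payload)
```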

I mean: "what really happens, in code?"

The actual code is incredibly simple: the basis is an associative array (which might be called various things in different use cases or languages: a "hash", key-value store, etc.)

The code looks for the URL in a list of existing long URLs: if a shortened URL already exists, it returns that short URL. If not, it finds a unique, six-character, pseudo-random, mixed-case string that doesn't already exist as a short URL and creates a two-way lookup between that string and the long URL.
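The logic just described can be sketched in Python with two dictionaries. This is an illustration under assumptions, not our Perl code: the class and method names are invented, the alphabet is assumed to be upper- and lower-case letters only, and a real service would persist the lookup rather than hold it in memory.

```python
import random
import string

ALPHABET = string.ascii_letters  # assumed: mixed-case letters only
CODE_LENGTH = 6


class Shortener:
    def __init__(self):
        # Two dictionaries give the two-way lookup.
        self.long_to_short = {}
        self.short_to_long = {}

    def _new_code(self):
        # Keep drawing pseudo-random strings until one is unused.
        while True:
            code = "".join(random.choices(ALPHABET, k=CODE_LENGTH))
            if code not in self.short_to_long:
                return code

    def shorten(self, long_url):
        # A URL that was shortened before gets its existing code back.
        existing = self.long_to_short.get(long_url)
        if existing is not None:
            return existing
        code = self._new_code()
        self.long_to_short[long_url] = code
        self.short_to_long[code] = long_url
        return code


shortener = Shortener()
first = shortener.shorten("http://www.sfu.ca/")
second = shortener.shorten("http://www.sfu.ca/")
print(first == second)  # the same long URL yields the same short code
```

With 52 possible characters in each of six positions there are roughly 19.8 billion possible strings, so the retry loop will almost never run more than once at the volumes described on this page.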

That string becomes the short form URL path that is evaluated when appended after the http://i.sfu.ca, http://at.sfu.ca, or http://get.sfu.ca hostname: all three hosts use the same short URLs, so you can use the short URL with any of the three hostnames you like.

In either case, whether the URL had already been shortened or a new short URL was created, the short version of the URL is sent back to the browser as JSON data along with the date it was created and the number of clicks it has received (if applicable).

JavaScript populates the shortened URL field and highlights it so that you can copy out the shortened URL without having to click around.

The shortened URL created in this example is:

http://i.sfu.ca/jQytsQ

The unique portion of the URL is the part after the last slash: jQytsQ. That portion can be used with the "i.sfu.ca", "get.sfu.ca", or "at.sfu.ca" hostnames and still reach the same destination, e.g.

http://get.sfu.ca/jQytsQ
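Because the hostname plays no part in the lookup, the equivalent URLs can be generated mechanically. A small Python sketch (the function name is ours, for illustration):

```python
HOSTNAMES = ("i.sfu.ca", "at.sfu.ca", "get.sfu.ca")


def equivalent_urls(code):
    """Return the short URL for `code` under each of the three hostnames."""
    return ["http://{}/{}".format(host, code) for host in HOSTNAMES]


for url in equivalent_urls("jQytsQ"):
    print(url)
```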

URL Resolution

Let's see what happens on the client side when you follow a link to a shortened URL. It's very simple. It's also very fast, which makes the process difficult to really "see" in a browser, so I'll demonstrate using a command-line tool called "cURL", which can show in text mode the exact content that the server sends to your browser.

The command curl -IL http://i.sfu.ca/jQytsQ asks the web server for a response in much the same way a browser would request a page. The "-I" option limits what we see to only the "header" information, and "-L" makes sure that we follow all redirects (redirects being important since, at this point, all the URL shortener is doing is redirecting your browser to the long URL associated with the short URL you followed).

$> curl -IL http://i.sfu.ca/jQytsQ

HTTP/1.1 302 Found
Date: Wed, 13 Jun 2012 18:40:06 GMT
Server: Apache/2.2.3 (Red Hat)
Location: http://www.sfu.ca/srs/ehs/research-safety/hazardous-material-management/spill-response/biohazard-spills.html
Connection: close
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Wed, 13 Jun 2012 18:40:07 GMT
Server: Apache
Set-Cookie: renderid=rend02; path=/;
Content-Type: text/html;charset=UTF-8

That first block of text demonstrates the one thing that the URL shortener sends back to the browser: an HTTP response with the header named "Location" set to the long URL associated with the short URL you've just followed. Your browser sees this Location header and follows that link to the original long URL.
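To make the client's side of this concrete, the following Python sketch parses the first response block shown above and picks out where to go next. It is a simplified illustration of what a browser or cURL does, not a full HTTP parser, and the function names are ours.

```python
RAW_RESPONSE = """\
HTTP/1.1 302 Found
Date: Wed, 13 Jun 2012 18:40:06 GMT
Server: Apache/2.2.3 (Red Hat)
Location: http://www.sfu.ca/srs/ehs/research-safety/hazardous-material-management/spill-response/biohazard-spills.html
Connection: close
Content-Type: text/html; charset=iso-8859-1
"""


def parse_response_head(raw):
    """Split a response head into (status code, dict of lower-cased headers)."""
    lines = raw.strip().splitlines()
    status = int(lines[0].split()[1])  # "HTTP/1.1 302 Found" -> 302
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def next_url(raw):
    """Return the redirect target, or None if the response is not a redirect."""
    status, headers = parse_response_head(raw)
    if 300 <= status < 400 and "location" in headers:
        return headers["location"]
    return None


print(next_url(RAW_RESPONSE))
```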

In order, the URL shortening server actually performs these steps:

  1. the web server receives a request for "/jQytsQ"
  2. the web server checks that request against the list of special directories. If there is no match, this is probably a shortened URL
  3. the request is sent to a script that will look up that short URL, first in a list of disabled short URLs, then in the list of long URL equivalents if the short URL has not been disabled.
  4. if the URL has been disabled, a page describing the problem with the link is displayed; otherwise, if the short URL exists, the "Location:" header is returned to the visiting browser.
  5. if the short URL has never been created, the site visitor is sent to the short URL creation page.
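The steps above can be sketched as a single Python function. This is an illustration under assumptions rather than our actual script: the directory names (apart from /create and /myurl), the status codes for the disabled page and for the redirect to /create, and the function signature are all ours.

```python
# "create" and "myurl" are named on this page; "api" is an assumed name.
SPECIAL_DIRECTORIES = {"create", "myurl", "api"}


def resolve(path, disabled, short_to_long):
    """Return (status, headers, note) for a request path."""
    code = path.lstrip("/")
    first_segment = code.split("/", 1)[0]
    # Step 2: special directories are handled elsewhere (behind the login).
    if first_segment in SPECIAL_DIRECTORIES:
        return 200, {}, "special directory"
    # Steps 3 and 4: the disabled list is consulted before the lookup table.
    if code in disabled:
        return 200, {}, "page explaining that this link is disabled"
    if code in short_to_long:
        return 302, {"Location": short_to_long[code]}, "redirect"
    # Step 5: unknown (or empty) short URL, so send the visitor to /create.
    return 302, {"Location": "/create"}, "send to creation page"


# Made-up sample data for the demonstration.
table = {"jQytsQ": "http://www.sfu.ca/srs/ehs/"}
blocked = {"badOne"}
print(resolve("/jQytsQ", blocked, table))
```

Checking the disabled list first means a blocked link stops redirecting immediately, without its entry having to be removed from the lookup table.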

Lessons Learned

The SFU URL shortener has been running since June 2010. On average about 500 URLs have been created per month, and about 12,000 URLs have been created in total as of June 2012. Since August 2010, shortened links have been followed about 730,000 times. The system is primarily used by staff in public affairs, events and related media-facing areas of the university, though there is a low level of use by faculty, instructors and students. The system is not widely advertised, so awareness of it has grown by word-of-mouth, most often by people noticing a shortened URL being used.

Authentication

Initially we launched http://i.sfu.ca without any authentication required. Our intent was to allow short URLs to be created by SFU Alumni as well as current students, staff and faculty, showing their relationship with Simon Fraser University through those links. We have a general policy of not actively policing content on SFU websites, responding instead to complaints about inappropriate use through the proper channels rather than making decisions within IT about what content is "okay."

We built in a mechanism to block reported malicious URLs and have used that mechanism to disable malicious or misleading URLs that were reported to our abuse@sfu.ca or webmaster@sfu.ca service points. Early on, there were very few abuses, but we saw increasing use of our URL shortener to mask target URLs within spam, phishing or other malicious activities in various online media. As this activity increased we decided that it would be best to place the URL shortener behind authentication, allowing only people with active SFU Computing Accounts the ability to create short URLs with the SFU branding. 

Shortly after applying the authentication, reports of abuse dropped off to zero. Required authentication remains in place today.

Statistics

We built only very limited statistics gathering into our service. A URL creator can visit http://i.sfu.ca/myurl to see a list of the short URLs they have created, the full-length equivalents, and the number of clickthroughs each short URL has received. We have maintained a log of all clickthrough activity so that we could produce more detailed statistics if required, but so far there has been no stated demand for anything more than the basic number of clickthroughs.
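Counting clickthroughs from such a log is straightforward. The Python sketch below assumes a log of (timestamp, short code) records; that format, the sample entries, and the function names are invented for illustration and do not describe our actual log.

```python
from collections import Counter

# Hypothetical click log: one (timestamp, short code) record per clickthrough.
CLICK_LOG = [
    ("2012-06-13T18:40:06", "jQytsQ"),
    ("2012-06-13T18:41:10", "jQytsQ"),
    ("2012-06-13T18:45:00", "AbCdEf"),
]


def clicks_per_code(log):
    """Total clickthroughs for every short code in the log."""
    return Counter(code for _timestamp, code in log)


def clicks_for_user(log, codes_owned):
    """Clickthrough counts for one creator's short codes (0 if never clicked)."""
    counts = clicks_per_code(log)
    return {code: counts.get(code, 0) for code in codes_owned}


print(clicks_for_user(CLICK_LOG, ["jQytsQ"]))
```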

External Links

Our short URL redirector deals with all target URLs in the same way, regardless of whether they are internally or externally (outside of *.sfu.ca) directed. Some services and institutions apply a standard mechanism where warnings for externally directed links are provided via either a pop-up dialogue box or an intermediate page warning that the site visitor is about to be directed to a site beyond the control of the domain's webmaster. As we do not have that standard on any other web presence on campus, no such intermediary page or warning is provided with the SFU URL shortener. Since requiring authentication, we've had no further complaints about the target of SFU-created short URLs, so this has not been an issue of discussion.