Friday, 14 December 2012

Building Your Own URL Shortener Free

Most of us are familiar with seeing URLs like bit.ly or t.co on our Twitter or Facebook feeds. These are examples of shortened URLs, which are a short alias or pointer to a longer page link. For example, I can send you the shortened URL http://bit.ly/U1KXUN that will forward you to a very long Google URL with search results on how to iron a shirt. It would be much easier to text the 20-character bit.ly URL to your son who is in college and preparing for his first big job interview.

In this article you’ll learn how to create a fully functional URL shortener for your website that will work whether you use a front controller/framework or not. If you use a front controller, I’ll show you how to easily integrate this URL shortener without having to dig into the controller’s programming.

Answering Some Common Questions


So with bit.ly and many other URL shorteners like it out there and freely available, why should we bother building our own? Most of these shortening services even have an easy-to-use API so that we can programmatically generate a shortened URL, and use it within our PHP scripts.

The best reasons are convenience, aesthetics, and brand recognition. If, for example, your website has an application that creates a large number of reports, a very active blog, or a large photo album, there will be a lot of links. A URL shortener will allow you to programmatically create a clean, simple link that can be emailed to your readers or published on your website. The obvious advantage to having your own is that your readers have instant brand recognition with your website.

You may wonder why you always see letters mixed with numbers in shortened URLs. By having more than ten options (0-9) per digit, we are able to have dramatically more combinations while keeping the code as short as possible.

The characters we’ll be using are the digits 1-9 along with various upper/lowercase letters. I have removed all of the vowels to prevent links from being created that are unintended bad words, and I have removed any characters that could be confused with each other. This gives us a list of 50 characters available for each digit, which means that with two characters we have 2,500 possible combinations, 125,000 possibilities with three characters, and a whopping 6.25 million combinations with just four characters!
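The arithmetic is easy to check. The following is a minimal sketch; the function name shortCodeCapacity() is made up for illustration and is not part of the class built in this article.

```php
<?php
// Alphabet copied from the ShortUrl class shown later in this article.
$chars = "123456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";

// Hypothetical helper: number of distinct codes of exactly $length characters.
function shortCodeCapacity($alphabetSize, $length) {
    return pow($alphabetSize, $length);
}

$size = strlen($chars); // 50 characters in the alphabet
foreach (array(2, 3, 4) as $length) {
    printf("%d characters: %s combinations\n",
        $length, number_format(shortCodeCapacity($size, $length)));
}
// The last line printed is "4 characters: 6,250,000 combinations".
```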

Planning the Database


Let’s set up the short_urls table. It’s a simple table and the create statement is found below:

CREATE TABLE IF NOT EXISTS short_urls (
    id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    long_url VARCHAR(255) NOT NULL,
    -- filled in by a follow-up UPDATE, so the initial INSERT needs a
    -- default here when MySQL runs in strict mode
    short_code VARBINARY(6) NOT NULL DEFAULT '',
    date_created INTEGER UNSIGNED NOT NULL,
    counter INTEGER UNSIGNED NOT NULL DEFAULT '0',
    PRIMARY KEY (id),
    KEY short_code (short_code)
)
ENGINE=InnoDB;


We have our standard auto-incrementing primary key and fields for the full URL, the shortened code for the URL (indexed for faster retrieval), a timestamp when the row was created, and the number of times the short URL has been accessed.

Note that the long_url field has a maximum length of 255 characters, which should be sufficient for most applications. If you need to store longer URLs then you’ll need to change its definition to TEXT.

Now on to the PHP!

Creating a URL Short Code


The code to create and decode short URL codes will be in a class named ShortUrl. First, let’s look at the code responsible for creating the short codes:

<?php

class ShortUrl
{
    protected static $chars = "123456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";
    protected static $table = "short_urls";
    protected static $checkUrlExists = true;

    protected $pdo;
    protected $timestamp;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo;
        $this->timestamp = $_SERVER["REQUEST_TIME"];
    }

    public function urlToShortCode($url) {
        if (empty($url)) {
            throw new \Exception("No URL was supplied.");
        }

        if ($this->validateUrlFormat($url) == false) {
            throw new \Exception(
                "URL does not have a valid format.");
        }

        if (self::$checkUrlExists) {
            if (!$this->verifyUrlExists($url)) {
                throw new \Exception(
                    "URL does not appear to exist.");
            }
        }

        $shortCode = $this->urlExistsInDb($url);
        if ($shortCode == false) {
            $shortCode = $this->createShortCode($url);
        }

        return $shortCode;
    }

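    protected function validateUrlFormat($url) {
        // The host-required check is implied by FILTER_VALIDATE_URL; the
        // separate FILTER_FLAG_HOST_REQUIRED flag was deprecated in
        // PHP 7.3 and removed in PHP 8.0, so it is not passed here.
        return filter_var($url, FILTER_VALIDATE_URL);
    }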

    protected function verifyUrlExists($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch);
        $response = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        return (!empty($response) && $response != 404);
    }

    protected function urlExistsInDb($url) {
        $query = "SELECT short_code FROM " . self::$table .
            " WHERE long_url = :long_url LIMIT 1";
        $stmt = $this->pdo->prepare($query);
        $params = array(
            "long_url" => $url
        );
        $stmt->execute($params);

        $result = $stmt->fetch();
        return (empty($result)) ? false : $result["short_code"];
    }

    protected function createShortCode($url) {
        $id = $this->insertUrlInDb($url);
        $shortCode = $this->convertIntToShortCode($id);
        $this->insertShortCodeInDb($id, $shortCode);
        return $shortCode;
    }

    protected function insertUrlInDb($url) {
        $query = "INSERT INTO " . self::$table .
            " (long_url, date_created) " .
            " VALUES (:long_url, :timestamp)";
        $stmnt = $this->pdo->prepare($query);
        $params = array(
            "long_url" => $url,
            "timestamp" => $this->timestamp
        );
        $stmnt->execute($params);

        return $this->pdo->lastInsertId();
    }

    protected function convertIntToShortCode($id) {
        $id = intval($id);
        if ($id < 1) {
            throw new \Exception(
                "The ID is not a valid integer");
        }

        $length = strlen(self::$chars);
        // make sure length of available characters is at
        // least a reasonable minimum - there should be at
        // least 10 characters
        if ($length < 10) {
            throw new \Exception("Length of chars is too small");
        }

        $code = "";
        while ($id > $length - 1) {
            // determine the next higher character in the short
            // code and prepend it; integer arithmetic keeps the
            // string offset an int rather than a float
            $code = self::$chars[$id % $length] . $code;
            // reset $id to remaining value to be converted
            $id = (int) floor($id / $length);
        }

        // remaining value of $id is less than the length of
        // self::$chars
        $code = self::$chars[$id] . $code;

        return $code;
    }

    protected function insertShortCodeInDb($id, $code) {
        if ($id == null || $code == null) {
            throw new \Exception("Input parameter(s) invalid.");
        }
        $query = "UPDATE " . self::$table .
            " SET short_code = :short_code WHERE id = :id";
        $stmnt = $this->pdo->prepare($query);
        $params = array(
            "short_code" => $code,
            "id" => $id
        );
        $stmnt->execute($params);

        if ($stmnt->rowCount() < 1) {
            throw new \Exception(
                "Row was not updated with short code.");
        }

        return true;
    }

...


When we instantiate our ShortUrl class, we’ll pass it our PDO object instance. The constructor stores this reference and sets the $timestamp member.

We call the urlToShortCode() method, passing it the long URL that we wish to shorten. The method wraps up everything needed to create the short URL code, which we will append to our domain name.

urlToShortCode() calls validateUrlFormat(), which simply uses a PHP filter to make sure that the URL is properly formatted. Then, if the static variable $checkUrlExists is true, verifyUrlExists() is called, which uses cURL to contact the URL and make sure that it doesn’t return a 404 (Not Found) error. You could alternatively check for a 200 (OK) status, but this could cause issues if the page were to unexpectedly return a 301 (Moved) or 401 (Unauthorized) response code.
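To make the status check concrete, here is a small sketch of the same decision as a standalone function. The name statusMeansUrlExists() is hypothetical; the logic mirrors the last line of verifyUrlExists(), and it relies on cURL reporting a status of 0 when no HTTP response was received.

```php
<?php
// Hypothetical helper mirroring the check at the end of verifyUrlExists().
// A status of 0 means cURL never received an HTTP response.
function statusMeansUrlExists($httpCode) {
    return !empty($httpCode) && $httpCode != 404;
}

foreach (array(0, 200, 301, 401, 404) as $status) {
    printf("%d => %s\n", $status,
        statusMeansUrlExists($status) ? "accepted" : "rejected");
}
```

Only 0 and 404 are rejected here; redirects and authorization errors still count as "the URL exists", which is the trade-off described above.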

It doesn’t make sense to have duplicate entries, so the code checks for that with urlExistsInDb(), which queries the database for the long URL. If it finds the URL, it will return the corresponding short code; otherwise it returns false so we know we need to create it. Note that http://www.example.com and http://example.com are different URLs, so if you want to prevent this kind of duplication then you will have to normalize URLs before looking them up, for example with some regular expressions.
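One possible way to collapse such near-duplicates is sketched below. It uses parse_url() rather than regular expressions, the function normalizeUrl() is hypothetical and not part of the ShortUrl class, and it assumes that the www and non-www hosts really do serve the same content, which is not true of every site.

```php
<?php
// Hypothetical helper: lowercases the scheme and host and drops a leading
// "www." so that equivalent URLs compare equal before the database lookup.
function normalizeUrl($url) {
    $parts = parse_url($url);
    // Leave anything unexpected (no scheme/host, embedded credentials) untouched.
    if ($parts === false || !isset($parts["scheme"], $parts["host"])
        || isset($parts["user"])) {
        return $url;
    }

    $host = strtolower($parts["host"]);
    if (strpos($host, "www.") === 0) {
        $host = substr($host, 4);
    }

    $normalized = strtolower($parts["scheme"]) . "://" . $host;
    if (isset($parts["port"])) {
        $normalized .= ":" . $parts["port"];
    }
    // An empty path and "/" refer to the same resource.
    $normalized .= isset($parts["path"]) ? $parts["path"] : "/";
    if (isset($parts["query"])) {
        $normalized .= "?" . $parts["query"];
    }
    if (isset($parts["fragment"])) {
        $normalized .= "#" . $parts["fragment"];
    }
    return $normalized;
}

var_dump(normalizeUrl("http://www.example.com") === normalizeUrl("http://example.com"));
```

If you adopt something like this, call it on the URL before both the lookup and the insert so the stored value and the searched value always match.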

createShortCode() delegates the following tasks to specific methods:
insertUrlInDb() to insert the long URL into the database and return the new row’s ID.
convertIntToShortCode() to convert the new row’s ID to our base-50 number scheme.
insertShortCodeInDb() to update the row with the newly created short code.
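A few worked conversions may help show how the base-50 scheme behaves. This is a standalone sketch: encodeId() repeats the arithmetic of convertIntToShortCode() as a plain function, while decodeCode() is a hypothetical inverse added only to check the round trip (the class itself looks codes up in the database rather than decoding them).

```php
<?php
$chars = "123456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";

// Same arithmetic as convertIntToShortCode(), written as a plain function.
function encodeId($id, $chars) {
    $base = strlen($chars);
    $code = "";
    while ($id > $base - 1) {
        $code = $chars[$id % $base] . $code;
        $id = (int) floor($id / $base);
    }
    return $chars[$id] . $code;
}

// Hypothetical inverse: treats each character's position in $chars
// as a base-50 digit.
function decodeCode($code, $chars) {
    $base = strlen($chars);
    $id = 0;
    for ($i = 0; $i < strlen($code); $i++) {
        $id = $id * $base + strpos($chars, $code[$i]);
    }
    return $id;
}

foreach (array(1, 49, 50, 2500) as $id) {
    $code = encodeId($id, $chars);
    printf("%d => %s => %d\n", $id, $code, decodeCode($code, $chars));
}
// Prints:
// 1 => 2 => 1
// 49 => Z => 49
// 50 => 21 => 50
// 2500 => 211 => 2500
```

Because row IDs start at 1, the first 49 rows get one-character codes and the code grows by one character each time the ID reaches the next power of 50.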

When we want to create a short URL, all we have to do is instantiate the class, passing a PDO instance to the constructor, call the urlToShortCode() method with the long URL we wish to shorten, and append the returned short code to the domain and pass it back to the controller that requested it.

<?php

include "../include/config.php";
include "../include/ShortUrl.php";

try {
    $pdo = new PDO(DB_PDODRIVER . ":host=" . DB_HOST .
        ";dbname=" . DB_DATABASE,
        DB_USERNAME, DB_PASSWORD);
}
catch (\PDOException $e) {
    trigger_error("Error: Failed to establish connection to database.");
    exit;
}

$shortUrl = new ShortUrl($pdo);
try {
    // fall back to an empty string so a missing field raises the
    // "No URL was supplied." exception instead of a PHP notice
    $url = isset($_POST["url"]) ? $_POST["url"] : "";
    $code = $shortUrl->urlToShortCode($url);
    printf('<p><strong>Short URL:</strong> <a href="%s">%1$s</a></p>',
        SHORTURL_PREFIX . $code);
    exit;
}
catch (\Exception $e) {
    // log exception and then redirect to error page.
    header("Location: /error");
    exit;
}


Decoding a Short Code


The code to decode a short code and obtain the long URL is part of the ShortUrl class too. We call the shortCodeToUrl() method and pass it the short code we have extracted from the URI. shortCodeToUrl() also accepts an optional parameter, $increment, which defaults to true. It then delegates the following:
validateShortCode() makes sure that the provided short code only contains characters from our allowed set.
getUrlFromDb() queries the database for the supplied short code and returns the record’s id and long_url fields.
If the $increment parameter is true, incrementCounter() is called to increment the row’s counter field.
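As an illustration of the format check, here is a standalone sketch. The function isValidShortCode() is hypothetical; it assumes the same alphabet as the class and anchors the pattern so that the entire string, not just part of it, must come from that alphabet.

```php
<?php
$chars = "123456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";

// Hypothetical standalone version of the short code format check.
function isValidShortCode($code, $chars) {
    return is_string($code)
        && preg_match('/\A[' . $chars . ']+\z/', $code) === 1;
}

var_dump(isValidShortCode("X4c", $chars));      // bool(true)
var_dump(isValidShortCode("X4c/../x", $chars)); // bool(false): contains "/" and "."
var_dump(isValidShortCode("a0", $chars));       // bool(false): "a" and "0" are not in the alphabet
```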

Here’s the rest of the class:

...

    public function shortCodeToUrl($code, $increment = true) {
        if (empty($code)) {
            throw new \Exception("No short code was supplied.");
        }

        if ($this->validateShortCode($code) == false) {
            throw new \Exception(
                "Short code does not have a valid format.");
        }

        $urlRow = $this->getUrlFromDb($code);
        if (empty($urlRow)) {
            throw new \Exception(
                "Short code does not appear to exist.");
        }

        if ($increment == true) {
            $this->incrementCounter($urlRow["id"]);
        }

        return $urlRow["long_url"];
    }

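    protected function validateShortCode($code) {
        // anchor the pattern so the whole code, not just one
        // character of it, must come from self::$chars
        return is_string($code)
            && preg_match('/\A[' . self::$chars . ']+\z/', $code) === 1;
    }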

    protected function getUrlFromDb($code) {
        $query = "SELECT id, long_url FROM " . self::$table .
            " WHERE short_code = :short_code LIMIT 1";
        $stmt = $this->pdo->prepare($query);
        $params = array(
            "short_code" => $code
        );
        $stmt->execute($params);

        $result = $stmt->fetch();
        return (empty($result)) ? false : $result;
    }

    protected function incrementCounter($id) {
        $query = "UPDATE " . self::$table .
            " SET counter = counter + 1 WHERE id = :id";
        $stmt = $this->pdo->prepare($query);
        $params = array(
            "id" => $id
        );
        $stmt->execute($params);
    }
}


Bringing It All Together


Building/altering a front controller or tailoring this package to an existing framework are outside the scope of this article, and so I’ve opted to contain our decoding logic in a file named r.php (r standing for redirect). We can write our shortened URLs as http://example.com/r/X4c where r.php (or r/index.php depending on your design) will be the controller. This format will be easy to integrate into just about any framework without touching the existing front controller.

On a related note, if you would like to learn how to build your own front controllers, check out the excellent series An Introduction to the Front Controller Pattern.

One advantage of this design is that, if you want to, you can have a separate controller for different parts of your site using different tables to keep the short codes organized and as short as possible. http://example.com/b/ could be for blog posts, and http://example.com/i/ could be for images.
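If you need to pull the short code out of a path such as /r/X4c yourself, something along these lines could work. It is only a sketch: extractShortCode() is a hypothetical helper, and it assumes your web server already routes /r/... requests to the script and that the request URI is available (typically from $_SERVER["REQUEST_URI"]).

```php
<?php
// Hypothetical helper: returns the short code from a request URI such as
// "/r/X4c" for a given prefix ("r", "b", "i", ...), or "" when the URI
// does not have exactly the form /<prefix>/<code>.
function extractShortCode($requestUri, $prefix) {
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return "";
    }
    $segments = explode("/", trim($path, "/"));
    if (count($segments) !== 2 || $segments[0] !== $prefix) {
        return "";
    }
    return $segments[1];
}

echo extractShortCode("/r/X4c", "r"), "\n";       // X4c
echo extractShortCode("/b/X4c?utm=1", "b"), "\n"; // X4c
```

The returned value is still untrusted input, so it should go through shortCodeToUrl() and its validation like any other short code.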

"But what if I don’t use a front controller or framework?" you ask, "Did I just read this whole article for nothing?" Although it’s not as pretty, you can use the format http://example.com/r?c=X4c where r/index.php contains the decoding script.

Here’s what r.php looks like:

<?php

include "../include/config.php";
include "../include/ShortUrl.php";

// How are you getting your short code?

// from framework or front controller using a URL format like
// http://example.com/r/X4c
// $code = $uri_data[1];

// from the query string using a URL format like
// http://example.com/r?c=X4c where this file is index.php in the
// directory http_root/r/index.php
$code = isset($_GET["c"]) ? $_GET["c"] : "";

try {
    $pdo = new PDO(DB_PDODRIVER . ":host=" . DB_HOST .
        ";dbname=" . DB_DATABASE,
        DB_USERNAME, DB_PASSWORD);
}
catch (\PDOException $e) {
    trigger_error("Error: Failed to establish connection to database.");
    exit;
}

$shortUrl = new ShortUrl($pdo);
try {
    $url = $shortUrl->shortCodeToUrl($code);
    header("Location: " . $url);
    exit;
}
catch (\Exception $e) {
    // log exception and then redirect to error page.
    header("Location: /error");
    exit;
}


Depending on how you are getting the short code, the variable $code is set along with your other configuration settings. We establish our PDO connection, create an instance of ShortUrl, and call shortCodeToUrl(), passing it the short code and leaving the counter setting at its default value. If the short code is valid, you’ll have a long URL which you can redirect the user to.

In Closing


So there you have it, your very own URL shortener that is incredibly easy to add to your existing site. Of course, there are plenty of ways that this package could be improved, such as:
Abstract your database interaction to remove redundant code.
Add a way to cache shortened URL requests.
Add some analytics to the requested short URLs beyond the counter field.
Add a way to filter out malicious pages.
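
As a starting point for the caching idea, here is a rough sketch. The CachedResolver class is hypothetical and keeps its cache in a plain array, so it only helps within a single PHP process; caching across requests would need a shared store instead.

```php
<?php
// Hypothetical wrapper: remembers long URLs that were already resolved so
// repeated lookups for the same short code skip the database.
class CachedResolver
{
    protected $lookup;
    protected $cache = array();

    // $lookup is any callable that maps a short code to a long URL,
    // for example a closure around ShortUrl::shortCodeToUrl().
    public function __construct($lookup) {
        $this->lookup = $lookup;
    }

    public function resolve($code) {
        if (!array_key_exists($code, $this->cache)) {
            $this->cache[$code] = call_user_func($this->lookup, $code);
        }
        return $this->cache[$code];
    }
}

// Stand-in lookup that counts how often the "database" is hit.
$hits = 0;
$resolver = new CachedResolver(function ($code) use (&$hits) {
    $hits++;
    return "http://example.com/long/" . $code;
});

$resolver->resolve("X4c");
$resolver->resolve("X4c");
echo $hits, "\n"; // 1
```

Keep in mind that a cache hit bypasses incrementCounter(), so the counter field would undercount unless hits are recorded some other way.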

I’d like to take this opportunity to thank Timothy Boronczyk for his patient advice throughout my writing process. It was an honor to write this article for PHPMaster and to work with him.

Feel free to fork this on the PHPMaster GitHub page and share your contributions and improvements.

Thanks for reading and happy PHPing!

