Ruby URI: A Module to Handle Uniform Resource Identifiers

The Ruby programming language was created by Yukihiro Matsumoto of Japan, who began work on it in 1993. The language has features resembling those of Perl, Python and Smalltalk. Ruby ships with a wide range of built-in functionality that can be used in Ruby programs, and it can connect to databases such as Oracle, Sybase, DB2 and MySQL. It also supports several GUI toolkits, such as GTK and Tcl/Tk, as well as OpenGL. Ruby can be installed in both POSIX and Windows environments. Its syntax is easy to learn and in places resembles that of other programming languages such as C++. Ruby is often used as a server-side scripting language, for example to write Common Gateway Interface (CGI) scripts, and Ruby programs are easy to maintain. Ruby is a true object-oriented programming language used for developing Internet and intranet applications. It is also an interpreted language, and Ruby code can be embedded in Hypertext Markup Language (HTML). The language is open source, but its use is subject to a license.

Today we walk you through this intermediate-level tutorial on the Ruby URI (Uniform Resource Identifier) module. We assume that you know the basics of the Ruby programming language; if not, you can take this beginners course on Ruby, which requires no prior programming experience. If you just need to brush up on your Ruby concepts, this tutorial can help.

URI is a module that provides classes to handle Uniform Resource Identifiers. Features of this module include uniform handling of URIs, support for custom URI schemes, and the flexibility to use an alternate URI::Parser (or just different patterns and regexps).

What is OOP?

Structured programming was widely used before the advent of object-oriented programming (OOP). The problem was that when programs developed using structured programming grew too big, they became difficult to understand and use. In this older approach, data and functions were kept separate. The real world, however, is described in terms of objects. Each object has properties (data) and behavior (functions). In object-oriented programming, the program is defined in terms of different objects. Classes are similar to blueprints: they define the data and the functions operating on that data. Objects are created by instantiating classes. Inheritance, polymorphism, abstraction and encapsulation are the main properties embodied by OOP.

A class inherits the properties of its parent classes, much as children inherit the characteristic features and behavior of their parents. When the same operator or function behaves differently depending on its operands or parameters, the behavior is termed polymorphism. When we know what a function does but its internal logic is hidden, the concept is called abstraction. Encapsulation is the bundling of data and the functions operating on that data into a single unit. Examples of object-oriented programming languages include C++ and Java. Note that Ruby is a true object-oriented programming language. To learn more about the object-oriented nature of Ruby, you can take this course.
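To make these terms concrete, here is a minimal Ruby sketch. The Shape, Circle and Square classes are hypothetical names invented for this illustration; they are not part of the URI module.

```ruby
# Hypothetical classes, used only to illustrate the OOP terms above.
class Shape
  attr_reader :name            # encapsulation: data is reached through methods

  def initialize(name)
    @name = name
  end

  # Subclasses are expected to supply their own area calculation.
  def area
    raise NotImplementedError, "#{self.class} must implement area"
  end

  # Abstraction: callers use describe without knowing how area is computed.
  def describe
    "#{name} with area #{area.round(2)}"
  end
end

class Circle < Shape           # inheritance: Circle reuses Shape's behavior
  def initialize(radius)
    super("circle")
    @radius = radius
  end

  def area
    Math::PI * @radius**2
  end
end

class Square < Shape
  def initialize(side)
    super("square")
    @side = side
  end

  def area
    @side**2
  end
end

# Polymorphism: the same describe call works on every kind of shape.
shapes = [Circle.new(1), Square.new(2)]
descriptions = shapes.map(&:describe)
puts descriptions
```

The URI classes described below follow the same pattern: URI::HTTP and URI::FTP inherit shared behavior from URI::Generic.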

Class tree of URI Module

The following is the hierarchy of classes in the URI module.

URI::Generic
  URI::FTP
  URI::HTTP
    URI::HTTPS
  URI::LDAP
    URI::LDAPS
  URI::MailTo
URI::Parser
URI::REGEXP
  URI::REGEXP::PATTERN
URI::Util
URI::Escape
URI::Error
  URI::InvalidURIError
  URI::InvalidComponentError
  URI::BadURIError

This hierarchy is quite large, and it would be difficult to cover it in full in this tutorial. We'll walk you through some of the most common methods below, along with some examples. To learn more about the rest, you should take this in-depth Ruby course.
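The parent-child relationships in the class tree can be checked from Ruby itself. The following is a small sketch; the variable names are chosen only for this illustration.

```ruby
require 'uri'

# Each scheme-specific class descends from URI::Generic.
https_parent = URI::HTTPS.superclass            # URI::HTTP
http_parent  = URI::HTTP.superclass             # URI::Generic
ldaps_parent = URI::LDAPS.superclass            # URI::LDAP

# The error classes share URI::Error as their parent.
error_parent = URI::InvalidURIError.superclass  # URI::Error

# Because of inheritance, an https URI object is also a URI::HTTP and a URI::Generic.
uri = URI("https://xyz.com/")
p uri.is_a?(URI::HTTP)     # true
p uri.is_a?(URI::Generic)  # true
```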

URI Class Methods

The following are the class methods of the URI module. The most commonly used ones are explained in detail below.

  • ::decode_www_form
  • ::decode_www_form_component
  • ::encode_www_form
  • ::encode_www_form_component
  • ::extract
  • ::join
  • ::parse
  • ::regexp
  • ::scheme_list
  • ::split

EXAMPLE 1: Program to Extract URI Properties

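require 'uri'

uri = URI("http://xyz.com/blogs?id=30&limit=5#time=1305298413")
# => #<URI::HTTP http://xyz.com/blogs?id=30&limit=5#time=1305298413>
uri.scheme    # => "http"
uri.host      # => "xyz.com"
uri.path      # => "/blogs"
uri.query     # => "id=30&limit=5"
uri.fragment  # => "time=1305298413"
uri.to_s      # => "http://xyz.com/blogs?id=30&limit=5#time=1305298413"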

uri.scheme extracts the scheme, which is "http". uri.host extracts the host, which is "xyz.com". uri.path extracts the path, which is "/blogs". uri.query extracts the query, which is "id=30&limit=5". uri.fragment extracts the fragment, which is "time=1305298413". uri.to_s converts the URI back to a string, which is "http://xyz.com/blogs?id=30&limit=5#time=1305298413". To see some more practical examples, go over to this special course full of practical tips for Ruby programming.
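The same accessors have setter counterparts, so a parsed URI can be modified and turned back into a string. The sketch below reuses the example URL; the new path "/posts" is an arbitrary value chosen for illustration.

```ruby
require 'uri'

uri = URI("http://xyz.com/blogs?id=30&limit=5#time=1305298413")

# No port is written in the URL, so the scheme's default port is reported.
default_port = uri.port        # 80 for http

# Components can be reassigned; to_s rebuilds the full string.
uri.path = "/posts"
rebuilt = uri.to_s
puts rebuilt
```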

EXAMPLE 2: Adding Custom URIs

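require 'uri'

module URI
  class RSYNC < Generic
    DEFAULT_PORT = 873
  end
  # Ruby 3.1 and later register a scheme with register_scheme;
  # older versions used @@schemes['RSYNC'] = RSYNC instead.
  register_scheme 'RSYNC', RSYNC
end

URI.scheme_list
uri = URI("rsync://rsync.xyz.com")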

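In this program, URI.scheme_list returns a hash that now includes the new scheme, for example {"FTP"=>URI::FTP, "HTTP"=>URI::HTTP, "HTTPS"=>URI::HTTPS, "LDAP"=>URI::LDAP, "LDAPS"=>URI::LDAPS, "MAILTO"=>URI::MailTo, "RSYNC"=>URI::RSYNC}. The exact set of built-in schemes depends on the Ruby version. The final line then returns an instance of URI::RSYNC.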

Public Class Methods

Here we examine the methods of Ruby URI in detail.

  • decode_www_form(str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)

This method decodes URL-encoded form data (application/x-www-form-urlencoded) from the given str and returns an array of key-value arrays.
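As a short sketch of what this looks like in practice, the query string below is an arbitrary sample. Note that repeated keys are preserved in the array but collapse to the last value when converted to a hash.

```ruby
require 'uri'

# Decode a form-encoded string into an array of [key, value] pairs.
pairs = URI.decode_www_form("a=1&a=2&b=3")
p pairs                  # [["a", "1"], ["a", "2"], ["b", "3"]]

# Look up the first value stored under a key.
first_a = pairs.assoc("a").last

# Converting to a hash keeps only the last value for a repeated key.
as_hash = pairs.to_h
p as_hash                # {"a"=>"2", "b"=>"3"}
```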

  • decode_www_form_component(str, enc=Encoding::UTF_8)

This method decodes the given str, which is a single component (one key or one value) of URL-encoded form data.
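The component methods work on one key or value at a time. The following sketch round-trips an arbitrary sample string through ::encode_www_form_component and ::decode_www_form_component.

```ruby
require 'uri'

# Spaces become "+" and reserved characters are percent-encoded.
encoded = URI.encode_www_form_component("Ruby URI!")
p encoded   # "Ruby+URI%21"

# Decoding reverses the transformation.
decoded = URI.decode_www_form_component(encoded)
p decoded   # "Ruby URI!"
```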

  • encode_www_form(enum, enc = nil)

This method generates URL-encoded form data from the given enum. Internally it uses ::encode_www_form_component. Because this method does not convert the encoding of the items passed to it, convert them beforehand if you want to send the data in an encoding other than the original one, or if the data has mixed encodings. Note that this method doesn't handle files. To send a file, use multipart/form-data instead.

Take a look at the following code.

URI.encode_www_form([["a", "python"], ["lang", "en"]])
URI.encode_www_form("a" => "python", "lang" => "en")
URI.encode_www_form("a" => ["python", "perl"], "lang" => "en")
URI.encode_www_form([["a", "python"], ["a", "perl"], ["lang", "en"]])

The first line of this code returns “a=python&lang=en”. The second line returns “a=python&lang=en”. The third line returns “a=python&a=perl&lang=en”. The final line returns “a=python&a=perl&lang=en”.
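A common use of this method is building the query part of a URI. The sketch below assumes the example URL and parameters from Example 1; assigning to uri.query is one way to attach the encoded string.

```ruby
require 'uri'

uri = URI("http://xyz.com/blogs")

# Non-string values are converted with to_s before encoding.
uri.query = URI.encode_www_form("id" => 30, "limit" => 5)

full_url = uri.to_s
puts full_url   # http://xyz.com/blogs?id=30&limit=5
```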

  • URI::extract(str[, schemes][,&blk])

The argument str is the String to extract URIs from. The optional argument schemes limits URI matching to the given scheme(s). This method extracts URIs from the given string. If a block is given, it iterates through all matched URIs and returns nil; otherwise it returns an array of the matches.

require "uri"
URI.extract("text here http://zoo.example.org/blar and here mailto:newtest@example.com and here also.")

The method returns ["http://zoo.example.org/blar", "mailto:newtest@example.com"]
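The schemes argument and the block form described above can be sketched as follows, using the same sample text. The variable names are chosen only for this illustration.

```ruby
require 'uri'

text = "text here http://zoo.example.org/blar and here mailto:newtest@example.com and here also."

# Restrict matching to http URIs only.
http_only = URI.extract(text, ['http'])
p http_only   # ["http://zoo.example.org/blar"]

# With a block, each match is yielded and the method itself returns nil.
found = []
result = URI.extract(text) { |match| found << match }
p found
p result      # nil
```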

  • URI::join(str[, str, …])

The parameters passed are the strings to work with. This method joins them into a single URI, resolving each string relative to the result so far.

URI.join('http://newexample.com', 'zoo')
URI.join('http://newexample.com', '/zoo', '/car')

The first line returns #<URI::HTTP:0x01ab80a0 URL:http://newexample.com/zoo>

The second line returns #<URI::HTTP:0x01aaf0b0 URL:http://newexample.com/car>
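The second result may look surprising: '/car' is an absolute path, so it replaces '/zoo' instead of being appended to it. The sketch below shows how relative and absolute paths are resolved; the paths used are arbitrary samples.

```ruby
require 'uri'

# A relative path replaces the last segment of the base path.
sibling = URI.join('http://newexample.com/a/b', 'c').to_s

# A trailing slash on the base keeps the whole base path.
child = URI.join('http://newexample.com/a/b/', 'c').to_s

# An absolute path (leading slash) replaces the base path entirely.
absolute = URI.join('http://newexample.com/a/b/', '/c').to_s

puts sibling, child, absolute
```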

  • URI::parse(uri_str)

The parameter passed is a string containing a URI. This method creates an instance of one of URI's subclasses from the string. URI::InvalidURIError is raised if the given URI is invalid. Take a look at the following Ruby program.

require 'uri'
uri = URI.parse("http://www.ruby-lang.org/")
p uri
p uri.scheme
p uri.host

The third line prints the URI object, for example #<URI::HTTP:0x202281be URL:http://www.ruby-lang.org/> (the exact inspect format depends on the Ruby version). The fourth line prints "http". The fifth line prints "www.ruby-lang.org".
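The choice of subclass and the error case can be sketched as follows. The safe_parse helper is a hypothetical name introduced for this illustration, not part of the URI module, and the invalid input is an arbitrary sample containing a space.

```ruby
require 'uri'

# Hypothetical helper: returns a URI object, or nil when the string is not a valid URI.
def safe_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

# The scheme decides which subclass is instantiated.
http_class   = safe_parse("http://www.ruby-lang.org/").class
ftp_class    = safe_parse("ftp://ftp.ruby-lang.org/pub/").class
mailto_class = safe_parse("mailto:newtest@example.com").class

# A space is not allowed in a URI, so parsing fails and the helper returns nil.
invalid = safe_parse("http://bad host/")

p http_class, ftp_class, mailto_class, invalid
```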

  • URI::regexp([match_schemes])

The optional parameter is an array of schemes. If it is present, the resulting regexp matches only URIs whose scheme is one of the match_schemes. This method returns a Regexp object that matches URI-like strings. The Regexp object returned includes an arbitrary number of capture groups (parentheses), so never rely on their number. Take a look at the following code.

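require 'uri'

# Sample input, assumed for this example
html_string = 'See http://www.ruby-lang.org/ or ftp://ftp.ruby-lang.org/pub/ruby/'
# Extract the first URI from html_string
html_string.slice(URI.regexp)
# Remove the first ftp URI
html_string.sub(URI.regexp(['ftp']), '')
# Print every match; $& holds the whole matched URI
html_string.scan(URI.regexp) do |*matches|
  p $&
end

The slice call extracts the first URI from html_string. The sub call removes the first ftp URI (use gsub to remove all of them). Inside the scan block, read the whole match from $& rather than from the capture groups, because you should not rely on the number of parentheses.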

  • scheme_list()

This function accepts no parameters and returns a Hash of the defined schemes.

  • URI::split(uri)

The argument uri is a string containing a URI. This method splits the string into its parts and returns an array with the result. The parts are, in order: scheme, userinfo, host, port, registry, path, opaque, query and fragment. Take a look at the following program.

require 'uri'
p URI.split("http://www.ruby-lang.org/")

It returns [“http”, nil, “www.ruby-lang.org”, nil, nil, “/”, nil, nil, nil]
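Because the result is a plain array, it can help to pair each value with its name. The sketch below uses a longer sample URL, made up for this illustration, so that more of the nine slots are filled.

```ruby
require 'uri'

# The part names, in the order URI.split returns them.
names = %i[scheme userinfo host port registry path opaque query fragment]

parts = URI.split("http://user@www.ruby-lang.org:8080/en/?q=uri#top")

# Pair each name with its value to get a labeled hash.
labeled = names.zip(parts).to_h
p labeled
```

Parts that are absent from the URI, such as registry and opaque here, come back as nil.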

We hope this article was both informative and useful. Try out these examples for yourself, and experiment with the code to see how the results change. This will give you a better feel for how the code works. Once you're ready to move to the next level, this advanced Ruby course can help you achieve true mastery.