Ruby CSV: How You Can Process and Manipulate CSV Files with Ruby

Ruby is a general-purpose, interpreted, object-oriented programming language created by Yukihiro Matsumoto, who began working on it in 1993. It is open source and released under a free software license. Its simple syntax, which borrows ideas from languages such as Perl, lets beginners and programmers coming from other languages learn it quickly. Developers use Ruby to build Internet applications: Ruby code can be embedded in HTML templates, and the language can also be used to write Common Gateway Interface (CGI) scripts. Ruby connects easily to databases such as Oracle, Sybase, DB2 and MySQL. CSV is an important file format for database management, and Ruby makes it easy for programmers to work with CSV files. In this intermediate-level tutorial, we look at different ways to process and manipulate CSV files. If you have no prior programming experience, you may want to take a beginner's Ruby course first.

What is a CSV File?

CSV stands for comma-separated values. In a CSV file, each line is a record, and commas act as the delimiters that separate the data fields. The format is easy to read and understand for both humans and computer programs. Because it is plain text with a simple delimiter, it is a standard way to export and import data in many software applications, and it works across programming languages and operating systems. Many databases support this format, which makes it easy to import and export data.

Imagine you run a toy shop and keep a spreadsheet of all your customers. Each row contains the following, each in a separate cell:

  • The name of the customer
  • The total number of times they visited the shop and bought toys
  • The total amount of money they spent
  • A short phrase they used to describe the customer service

Here is a visual representation of the file.

Cust_name   Total no. of visits   Amount spent ($)   Description of customer service
John        45                    1050               Excellent
Marie       20                    2400               Amazing, loved it
Martin      30                    3600               Could be better, I am pleased
Georgia     10                    1500               The best
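Behind the spreadsheet view, the file itself is plain text. The sketch below creates customers.csv from the four rows above (using CSV.open, which is covered at the end of this tutorial) and prints the raw file. It assumes the file has no header row, which matches the CSV.foreach output shown later.

```ruby
require 'csv'

# Assumption: customers.csv has no header row, only the four customer rows.
rows = [
  ['John', 45, 1050, 'Excellent'],
  ['Marie', 20, 2400, 'Amazing, loved it'],
  ['Martin', 30, 3600, 'Could be better, I am pleased'],
  ['Georgia', 10, 1500, 'The best']
]

# 'w' mode creates (or overwrites) the file; << writes one row per call.
CSV.open('customers.csv', 'w') do |csv|
  rows.each { |row| csv << row }
end

# Print the raw text so the commas and quoting are visible.
puts File.read('customers.csv')
```

In the printed file, the Marie and Martin descriptions are wrapped in double quotes because they contain commas; that quoting is how CSV keeps a comma inside a field from being read as a separator.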

Note that before writing Ruby code that uses CSV, you have to add require 'csv' at the top of the program file: the CSV library ships with Ruby but is not loaded automatically. If you are new to Ruby, a step-by-step beginner's guide can help.

CSV.read Method to Read the Complete File

Ruby stores each table row as an array, with each cell as a string element of that array. The CSV.read method takes the file name as its argument, reads the entire file, and returns its contents, which we store in a variable. See the example below:

require 'csv'
customers = CSV.read('customers.csv')

The customers variable now holds one big array whose elements are the rows. Since Ruby represents each table row as an array, customers is an array containing other arrays.
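Because the result is an ordinary array of arrays, rows and cells are reached with normal array indexing. This sketch assumes customers.csv holds the four rows from the table above, and writes that file first so it runs on its own.

```ruby
require 'csv'

# Assumption: customers.csv holds the four rows from the table (no header row).
File.write('customers.csv', <<~DATA)
  John,45,1050,Excellent
  Marie,20,2400,"Amazing, loved it"
  Martin,30,3600,"Could be better, I am pleased"
  Georgia,10,1500,The best
DATA

customers = CSV.read('customers.csv')

puts customers.length   # number of rows: 4
p customers[1]          # second row, as an array of strings
puts customers[1][3]    # one cell: row 2, column 4
```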


CSV.foreach Method to Read the File Line by Line

We will now read the file line by line. The CSV.foreach method takes the file name as its argument and passes each parsed row, as an array, to the block variable. See the example below:

CSV.foreach('customers.csv') do |row1|
  puts row1.inspect
end

The above code will result in the following output:

["John", "45", "1050", "Excellent"]
["Marie", "20", "2400", "Amazing, loved it"]
["Martin", "30", "3600", "Could be better, I am pleased"]
["Georgia", "10", "1500", "The best"]

Remember that every field read from a CSV file is a string, including the numbers.
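This matters as soon as you do arithmetic, because adding two strings joins them instead of summing them. A minimal sketch, again assuming the four-row customers.csv, that converts the visits column with Integer() before totalling it:

```ruby
require 'csv'

# Assumption: customers.csv holds the four rows from the table (no header row).
File.write('customers.csv', <<~DATA)
  John,45,1050,Excellent
  Marie,20,2400,"Amazing, loved it"
  Martin,30,3600,"Could be better, I am pleased"
  Georgia,10,1500,The best
DATA

p '45' + '20'   # String#+ concatenates, so this prints "4520"

total_visits = 0
CSV.foreach('customers.csv') do |row|
  # row[1] is a String such as "45"; Integer() turns it into a number
  total_visits += Integer(row[1])
end

puts total_visits   # 45 + 20 + 30 + 10 = 105
```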

CSV.parse Method to Convert a String into Arrays

If you have comma-separated data in a Ruby String object, the CSV.parse method converts it into Ruby's CSV representation: an array that contains other arrays, one per row. Take a look at the example below:

require 'csv'
new_string = "John,45\nMarie,20"
CSV.parse(new_string)
#=> [["John", "45"], ["Marie", "20"]]

A variation of the CSV.parse method: you can also pass it a block, which is called once for each parsed row.

CSV.parse(new_string) { |row1| puts row1.inspect }

This prints each row on a separate line:
["John", "45"]
["Marie", "20"]

Without a block, CSV.parse is analogous to CSV.read. The two take comma-separated data from different sources (CSV.parse from a String object, CSV.read from a file), but the output is the same: an array of arrays.
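One way to see this relationship: reading the file into a string with File.read and handing that string to CSV.parse gives the same array of arrays as calling CSV.read on the file. The sketch assumes the same four-row customers.csv.

```ruby
require 'csv'

# Assumption: customers.csv holds the four rows from the table (no header row).
File.write('customers.csv', <<~DATA)
  John,45,1050,Excellent
  Marie,20,2400,"Amazing, loved it"
  Martin,30,3600,"Could be better, I am pleased"
  Georgia,10,1500,The best
DATA

from_file   = CSV.read('customers.csv')                  # input: a file name
from_string = CSV.parse(File.read('customers.csv'))      # input: a String

puts from_file == from_string   # true: both are arrays of arrays of strings
```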


Manipulating CSV Files with Semicolon Separated Values

Assume that you have another file in which the values are separated by semicolons. Name this file new_customers.csv.

The content of the file is as follows:

Richard;1;37;Average
Simon;1;65;Awesome experience
Svetlana;2;46;"incredible; amazing; the best!"

In all the methods we have seen so far, we passed only one argument: the file name (or, for CSV.parse, the string). These methods also accept optional key-value options that tell Ruby how to process the data.

The col_sep: ';' option specifies the separator used in the file. With this option added, the methods above work with semicolon-separated data. Older examples sometimes wrap the options in braces, as in { :col_sep => ';' }; in Ruby 3.0 and later, pass them as keyword arguments without the braces. See the example below:
recent_customers = CSV.read('new_customers.csv', col_sep: ';')
CSV.foreach('new_customers.csv', col_sep: ';') { |row| p row }
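A self-contained version of the semicolon example, assuming new_customers.csv contains exactly the three rows above with no spaces around the semicolons (Ruby's CSV parser would otherwise keep those spaces as part of each field). The semicolons inside the double-quoted description are treated as text, not separators, and the same col_sep option also works with CSV.parse.

```ruby
require 'csv'

# Assumption: new_customers.csv holds these three rows, with no padding spaces.
File.write('new_customers.csv', <<~DATA)
  Richard;1;37;Average
  Simon;1;65;Awesome experience
  Svetlana;2;46;"incredible; amazing; the best!"
DATA

# col_sep is passed as a keyword option
recent_customers = CSV.read('new_customers.csv', col_sep: ';')
p recent_customers.last   # the quoted field keeps its semicolons

# The same option works when parsing a String
p CSV.parse("Richard;1;37;Average", col_sep: ';')
```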

Program to Manipulate CSV Data

Suppose we want to calculate the average amount each customer spent per visit. To do this, we divide the total money a customer spent by their total number of visits. Take a look at the code below:

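require 'csv'
avg_money_spent = Array.new
CSV.foreach('customers.csv') do |new_row|
  # Fields are Strings: convert amount spent (column 3) and visits (column 2) first
  avg_money_spent << new_row[2].to_f / new_row[1].to_i
end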

By default, Ruby treats every field from a CSV file as a string. To change this default behavior, we pass a key-value option:

converters: :numeric

Here the key is the symbol :converters and the value is the symbol :numeric.

Because options are key-value pairs, we can specify more than one option at the same time, as in the following example:

CSV.read('new_customers.csv', col_sep: ';', converters: :numeric)

Here all numeric fields are converted to numbers: integers become Integer objects and decimals become Float objects. (Ruby versions before 2.4 split integers into Fixnum and Bignum, which is why older tutorials mention those class names.)
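Applying the converter to the average-per-visit calculation: with converters: :numeric, the visits and amount columns arrive as Integers, so no manual conversion is needed. One caveat: dividing one Integer by another truncates (1050 / 45 gives 23), so this sketch uses Integer#fdiv to get a Float. It assumes the four-row customers.csv.

```ruby
require 'csv'

# Assumption: customers.csv holds the four rows from the table (no header row).
File.write('customers.csv', <<~DATA)
  John,45,1050,Excellent
  Marie,20,2400,"Amazing, loved it"
  Martin,30,3600,"Could be better, I am pleased"
  Georgia,10,1500,The best
DATA

customers = CSV.read('customers.csv', converters: :numeric)
p customers.first   # numbers are now Integers; text fields stay Strings

# fdiv performs floating-point division, so 1050 / 45 is not truncated to 23
avg_money_spent = customers.map { |row| row[2].fdiv(row[1]) }
p avg_money_spent.map { |avg| avg.round(2) }   # [23.33, 120.0, 120.0, 150.0]
```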


Program to Add a New Column to Each Row

Suppose we have the content of the entire customers.csv file as an array named customers_newarray, with 4 elements, each of which is itself an array. The avg_money_spent array also has 4 elements: the first is the average money spent for row 1, the second for row 2, and so on. The following code appends each average to its row.

customers_newarray = CSV.read('customers.csv')
customers_newarray.each do |customer|
  # shift removes and returns the next average, keeping rows and averages in step
  customer << avg_money_spent.shift
end

Now each row (array) has one extra cell (element). To save the result, we open a file in write ('w') or append ('a') mode and use << or puts to add lines.

The main difference between File.open and CSV.open is what we append: with File.open we append strings, while with CSV.open we append rows represented as arrays, and CSV takes care of the separators and quoting. Take a look at the following example.

CSV.open('new-customers-file.csv', 'w') do |csv_object|
  customers_newarray.each do |row_array|
    csv_object << row_array
  end
end

We now have a new file named new-customers-file.csv that contains the updated rows.
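The steps of this section can be combined into one short script: read customers.csv with numeric conversion, append each customer's average spend per visit as a fifth column, and write the result to new-customers-file.csv. This is a sketch assuming the four-row customers.csv; rounding the average to two decimals is an illustrative choice, not part of the original example.

```ruby
require 'csv'

# Assumption: customers.csv holds the four rows from the table (no header row).
File.write('customers.csv', <<~DATA)
  John,45,1050,Excellent
  Marie,20,2400,"Amazing, loved it"
  Martin,30,3600,"Could be better, I am pleased"
  Georgia,10,1500,The best
DATA

customers_newarray = CSV.read('customers.csv', converters: :numeric)

# Append the average spend per visit (amount / visits) to each row.
customers_newarray.each do |customer|
  customer << customer[2].fdiv(customer[1]).round(2)
end

# Write every row, including the new column, to a new file.
CSV.open('new-customers-file.csv', 'w') do |csv_object|
  customers_newarray.each { |row_array| csv_object << row_array }
end

puts File.read('new-customers-file.csv')
```

Each line of the new file now ends with the rounded average; the first one reads John,45,1050,Excellent,23.33.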

CSV files are very commonly used across many software programs to store and exchange tabular data, and Ruby has powerful features to process and manipulate these kinds of files. Work through the examples above and write your own code to become proficient in handling CSV files. Once you are ready to move to the next level, an advanced course can help you master Ruby.