This class writes a CSV data file and has the following features.
It is implemented in C++ and therefore operates at high speed.
It handles any format, with or without field names.
It loosely follows RFC 4180.
It assumes that the number of items in each row is fixed.
* MCMD::Mcsvout::new(arguments){block}
Creates an Mcsvout object. Specify the following arguments in arguments as a single character string, with the arguments separated by spaces.
| o= | Output file name (String). |
| f= | Comma-separated field names, written as the header (first line) of the CSV output. When size= is specified without f=, the CSV output does not include field names. |
| size= | Number of columns (Fixnum) in the CSV output when field names are not present. |
| precision= | Number of significant digits used when writing floating-point values. The default is 10 digits. This is the value of n in the C language output format "%.ng". For example, 100/3 rounded to 5 significant digits becomes 33.333, and rounded to 2 significant digits becomes 33. |
| bool= | Output values for true and false, separated by a comma. The default is "1,0". |
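The effect of precision= can be previewed with Ruby's own format method, since the table describes the value as the n in the C format "%.ng". The following is a minimal sketch under that assumption; the helper name significant is hypothetical and is not part of MCMD.

```ruby
# Sketch only: uses Ruby's built-in format, not Mcsvout itself.
# `significant` is a hypothetical helper that applies the "%.ng" pattern.
def significant(value, digits)
  format("%.#{digits}g", value)
end

puts significant(100.0 / 3, 5)   # 33.333
puts significant(100.0 / 3, 2)   # 33
puts significant(0.123456, 3)    # 0.123
puts significant(123456.0, 3)    # 1.23e+05
```

Note that "%g" switches to exponent notation when the value has more integer digits than the requested precision, which is why 123456.0 is printed as 1.23e+05.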
* MCMD::Mcsvout::write(values)
Writes the values stored in the array values as one row of CSV data. The array may contain objects of the classes String, Fixnum, Bignum, and Float, as well as nil, true, and false. Objects of any other class are treated as nil. If the array has fewer elements than the number of field names, the missing fields are written as null values. If the array has more elements than the number of field names, the excess elements are not written.
If a character string contains a comma, the value is automatically enclosed in double quotes. A double quote in a character string is replaced by two double quote characters.
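The padding, truncation, and quoting rules above can be sketched in plain Ruby. This is an illustration of the described behavior, not the actual C++ implementation of Mcsvout; the helpers format_row and format_cell are hypothetical. One assumption goes beyond the text: a value is quoted when it contains a comma, a double quote, or a line break, as RFC 4180 prescribes.

```ruby
# Hypothetical helpers that mimic the rules described above.
# They are not part of MCMD and do not write to any file.
def format_cell(value, precision, bools)
  text =
    case value
    when nil     then ""
    when true    then bools[0]
    when false   then bools[1]
    when Float   then format("%.#{precision}g", value)
    when Integer, String then value.to_s
    else ""      # any other class is treated as nil
    end
  # Assumption: quote on comma, double quote, or line break (RFC 4180).
  return text unless text.match?(/[",\r\n]/)
  '"' + text.gsub('"', '""') + '"'
end

def format_row(values, size, precision: 10, bools: %w[1 0])
  # Array.new pads missing items with nil and ignores excess items.
  cells = Array.new(size) { |i| values[i] }
  cells.map { |v| format_cell(v, precision, bools) }.join(",")
end

puts format_row(["1", 2, 3.4], 3)         # 1,2,3.4
puts format_row([1, 2, 3, 4, 5], 3)       # 1,2,3
puts format_row([1, 2], 3)                # 1,2,
puts format_row([true, nil, false], 3)    # 1,,0
puts format_row(["4\"5", "", "6,7"], 3)   # "4""5",,"6,7"
```

Fixing the row width first and formatting each cell afterwards keeps the number of items per row constant, which matches the fixed-column assumption stated at the top of this page.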
csv=MCMD::Mcsvout.new("o=rsl.csv f=a,b,c"){|csv|
  csv.write(["1",2,3.4])
  csv.write([1,2,3,4,5])
  csv.write([1,2])
}
# Output results (rsl.csv)
# a,b,c
# 1,2,3.4
# 1,2,3
# 1,2,
csv=MCMD::Mcsvout.new("o=rsl.csv size=3"){|csv|
  csv.write(["1",2,3.4])
  csv.write([true,nil,false])
  csv.write(["4\"5","","6,7"])
}
# Output results (rsl.csv)
# 1,2,3.4
# 1,,0
# "4""5",,"6,7"
MCMD::Mcsvout.new("o=rsl.csv size=3 precision=3 bools=T,F"){|csv|
  csv.write([0.123456,123456.0]) # Note that digits beyond the specified number of significant digits are not displayed.
  csv.write([123456,0]) # Specifying the number of significant digits does not affect Fixnum.
  csv.write([true,false])
}
# Output results (rsl.csv)
# 0.123,1.23e+05
# 123456,0
# T,F
# dat1.csv
# customer,date
# A,20081201
# B,20081002
MCMD::Mcsvin.new("i=dat1.csv -array"){|csvIn|
  MCMD::Mcsvout.new("o=rsl.csv f=#{csvIn.names.join(",")}"){|csvOut|
    csvIn.each{|val|
      csvOut.write(val)
    }
  }
}
# rsl.csv
# customer,date
# A,20081201
# B,20081002
The speed of writing CSV data was benchmarked against other Ruby libraries. The following two libraries were used for comparison.
http://www.gesource.jp/programming/ruby/database/fastercsv.html
http://www.ruby-lang.org/ja/old-man/html/CSV.html
The results of the benchmark test are shown in Table 2.3. The experiment wrote 10,000 rows, 100,000 rows, and 1,000,000 rows of data. The data was not written to an actual file; instead, it was written to the null device (/dev/null). An excerpt of the benchmark test script is shown in Figure 2.2. Mcsvout is faster than the other two libraries because it is implemented in C++, whereas the other two libraries are implemented in pure Ruby.
| Number of rows | 10K | 100K | 1000K |
| Mcsvout | 0.0158 | 0.150 | 1.50 |
| FasterCSV | 0.232 | 1.90 | 20.0 |
| CSV | 0.279 | 2.80 | 27.9 |
require 'rubygems'
require 'csv'
require 'fastercsv'
require 'mtools'
require 'benchmark'
$data = ["12345678", 10, 1.1, true, nil, false]
puts Benchmark.measure{
  (0...10).each{|i|
    # Case of Mcsvout
    MCMD::Mcsvout.new("o=/dev/null size=6"){|csv|
      (0...10000).each{|j|
        csv.write($data)
      }
    }
    # Case of FasterCSV
    FasterCSV.open("/dev/null", 'w'){|csv|
      (0...10000).each{|j|
        csv << $data
      }
    }
    # Case of CSV
    CSV.open("/dev/null", 'w'){|csv|
      (0...10000).each{|j|
        csv << $data
      }
    }
  }
}
Mcsvin : Read from CSV data.