Introduction to the CsvUtil Module

The CsvUtil module provides functionality for parsing CSV-like files.

To use this module, use "%requires CsvUtil" in your code.

All the public symbols in the module are defined in the CsvUtil namespace.

Currently the module provides the following classes:

CsvAbstractIterator: base abstract iterator class for iterating line-based CSV data
CsvDataIterator: iterator class allowing for CSV string data to be processed line by line on a record basis
CsvFileIterator: iterator class allowing for CSV files to be processed line by line on a record basis
AbstractCsvWriter: a base class for new CSV writer implementations
CsvFileWriter: CSV file writer
CsvStringWriter: CSV in-memory writer

Note that the CsvFileIterator class can be used to parse arbitrary text files; the field separator character can be specified in the constructor, as well as the quote character and end of line sequence. See the constructor documentation for more information.
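
As a minimal sketch of the note above, the following reads a pipe-delimited text file. The option names "separator", "quote", and "eol" and the file name "example-file.txt" are assumptions that this page does not confirm; check the constructor documentation for the exact names.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# assumed option names: "separator", "quote", "eol"
# (verify them against the CsvFileIterator constructor documentation)
CsvFileIterator i("example-file.txt", ("separator": "|", "quote": "'", "eol": "\n"));
while (i.next())
    printf("%d: %y\n", i.index(), i.getValue());
```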

Examples:

#!/usr/bin/env qore
%new-style
%requires CsvUtil
CsvFileIterator i("example-file.csv");
CsvFileWriter writer("example-file-copy.csv");
while (i.next()) {
    printf("%d: %y\n", i.index(), i.getValue());
    writer.writeLine(i.getValue());
}

If "example-file.csv" is:

UK,1234567890,"Sony, Xperia S",31052012
UK,1234567891,"Sony, Xperia S",31052012
UK,1234567892,"Sony, Xperia S",31052012
UK,1234567893,"Sony, Xperia S",31052012

The data is read verbatim: each value is returned as a string and header names are generated numerically. The output is:

1: {0: "UK", 1: "1234567890", 2: "Sony, Xperia S", 3: "31052012"}
2: {0: "UK", 1: "1234567891", 2: "Sony, Xperia S", 3: "31052012"}
3: {0: "UK", 1: "1234567892", 2: "Sony, Xperia S", 3: "31052012"}
4: {0: "UK", 1: "1234567893", 2: "Sony, Xperia S", 3: "31052012"}

In addition, "example-file-copy.csv" will contain the data from the original file formatted as CSV.

If header names are provided and field types are specified, the output looks different:

#!/usr/bin/env qore
%new-style
%requires CsvUtil
CsvFileIterator i("example-file.csv", ("headers": ("cc", "serno", "desc", "received"), "fields": ("serno": "int", "received": ("type": "date", "format": "DDMMYYYY"))));
while (i.next())
    printf("%d: %y\n", i.index(), i.getValue());

Now the hash keys in each record returned are those given in the constructor, and the fields "serno" and "received" are given other data types; this produces:

1: {cc: "UK", serno: 1234567890, desc: "Sony, Xperia S", received: 2012-05-31 00:00:00 Thu +02:00 (CEST)}
2: {cc: "UK", serno: 1234567891, desc: "Sony, Xperia S", received: 2012-05-31 00:00:00 Thu +02:00 (CEST)}
3: {cc: "UK", serno: 1234567892, desc: "Sony, Xperia S", received: 2012-05-31 00:00:00 Thu +02:00 (CEST)}
4: {cc: "UK", serno: 1234567893, desc: "Sony, Xperia S", received: 2012-05-31 00:00:00 Thu +02:00 (CEST)}

Use the "header-lines" and "header-names" options to automatically read the header names from the file if present. Use the "fields" option to describe the fields and perform transformations on the data read. For more information, see the CsvFileIterator class.
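
A minimal sketch of the header options described above, assuming a hypothetical file "example-file-with-header.csv" whose first line is cc,serno,desc,received. The option names come from the text above; the value types used here (a boolean for "header-names" and an integer line count for "header-lines") are assumptions to verify against the CsvFileIterator class documentation.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# "header-names": True is assumed to take the field names from the header line
# "header-lines": 1 is assumed to mark the first line as a header, not data
CsvFileIterator i("example-file-with-header.csv", (
    "header-names": True,
    "header-lines": 1,
    "fields": ("serno": "int", "received": ("type": "date", "format": "DDMMYYYY")),
));
while (i.next())
    printf("%d: %y\n", i.index(), i.getValue());
```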

Release Notes

Version 1.4

fixed the "format" field option when used with "*date" field types
implemented the "tolwr" parser option
changed the default field type when parsing and generating CSV files from "string" to "*string"

Version 1.3

added the "write-headers" option to CsvUtil::AbstractCsvWriter and subclasses to enable headers to be suppressed
added the "optimal-quotes" option to CsvUtil::AbstractCsvWriter and subclasses to enable more efficient CSV output (now the default output option); to revert to the previous behavior (where all fields are quoted regardless of data type or content), set the option to False in the constructor
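
A minimal sketch of restoring the previous quoting behavior when copying a file. It assumes that the writer constructor accepts an option hash as its second argument, which this page does not show; the output file name is hypothetical.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

CsvFileIterator i("example-file.csv");
# assumption: options are passed as a hash after the file name
CsvFileWriter writer("example-file-quoted.csv", ("optimal-quotes": False));
while (i.next())
    writer.writeLine(i.getValue());
```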

Version 1.2

fixed CsvUtil::CsvDataIterator::next() when header_lines > 0 and working with empty input data
implemented support for the "*int", "*float", "*number", and "*date" types
implemented support for allowing subclasses of CsvUtil::CsvFileIterator to implement support for other custom types
fixed "date" type handling with empty input; now returns 1970-01-01 (use "*date" to map empty input to NOTHING)
added the CsvUtil::CsvStringWriter, CsvUtil::AbstractCsvWriter, and CsvUtil::CsvFileWriter classes
if "headers" are not given in CsvUtil::CsvFileIterator::constructor() but "fields" are, the headers are now set from the field descriptions automatically
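
A minimal sketch combining two of the changes above: only "fields" is given, so the header names should be taken from the field descriptions, and "*date" is used so that empty input maps to NOTHING rather than 1970-01-01. It assumes that every column must be listed in "fields", in file order, for the derived headers to line up with the data.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# no "headers" option: names are expected to come from the "fields" keys
CsvFileIterator i("example-file.csv", ("fields": (
    "cc": "string",
    "serno": "int",
    "desc": "string",
    "received": ("type": "*date", "format": "DDMMYYYY"),
)));
while (i.next())
    printf("%d: %y\n", i.index(), i.getValue());
```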

Version 1.1

bug fixes to header and fields option processing
fixed CsvUtil::CsvFileIterator::index() to return the line index
added CsvUtil::CsvFileIterator::lineNumber() to return the current line number in the file

Version 1.0

initial version of module
