How To Wiki
Advertisement

There are many ways to open OpenOffice.org spreadsheets in Perl. The default spreadsheet file format for OpenOffice is OpenDocument spreadsheet (*.ods)

The following should work with any OpenDocument spreadsheets and should be compatible with OpenOffice 2.x-3.x It has been tested with OpenOffice 3.0.0. They are not likely to work with OpenOffice 1.x spreadsheets (*.sxc). For version 1.x OpenOffice files see Spreadsheet::ReadSXC

Methods[]

ODF::lpOD module[]

  • Opening a spreadsheet:
$doc = odf_document->get($filename);
  • Main sheet selection function:
$sheet = $doc->get_body->get_table($tablename);
  • Main function to get read/write access to a cell:
$cell = $table->get_cell(8, 2);
$cell = $table->get_cell("C9");
  • Main function for grabing a cell value (two addressing syntaxes):
$cell = $table->get_cell_value(8, 2);
$cell = $table->get_cell_value("C9");
  • Example for reading a cell:
use ODF::lpOD;
$file = 'chipList-v2.14.ods';
my $doc = odf_document->get($file);
my $content = $doc->get_body;
my $sheet = $content->get_table("My Chips");
my $res = $sheet->get_cell_value("A1");
print $res;


OpenOffice::OODoc module[]

  • Opening a spreadsheet:
    • $doc = odfDocument(file => $filename);
  • Main function for grabing a cell:
    • $doc->getCellValue($sheet, $row, $col)
      note variables start at 0. i.e. the first sheet, first row, first column is (0, 0, 0)
  • Example for reading a cell:
use OpenOffice::OODoc;
my $doc = odfDocument(file => 'chipList-v2.14.ods');
print $doc->getCellValue(0, "A1"); #Reads first sheet, cell A1

OpenOffice::OOSheets module[]

note this example has a problem on my computer

  • Example for reading a cell:
use OpenOffice::OOSheets;
use Archive::Zip;
$file = 'chipList-v2.14.ods';
my $zip = Archive::Zip->new($file);
my $content=$zip->contents('content.xml');
my $res=OpenOffice::OOSheets::GetData (text=>$content,ref=>
                 [
                   {
                     'cells' => [
                                'A1'
                              ],
                     'table' => 'My Chips'
                   },
                 ]); #Reads sheet named 'My Chips', cell A1
print $res;

Spreadsheet::Read module[]

  • This module uses OpenOffice/Excel cell naming format.
    • Example: row 1, column 1 is A1
    • There is a function included in the module so u can input a row and column number instead of the alphanumeric designation. note it starts with row 1 not row 0.
      • $cell=cr2cell ( 1, 1) # $cell is = 'A1'


  • Example for reading a cell:
use Spreadsheet::Read;
my $chipListSpreadsheet = ReadData ("chipList-v2.14.ods");
print $chipListSpreadsheet->[1]{A1}; #Prints sheet 1, cell A1
print $chipListSpreadsheet->[1]{cr2cell ( 1, 1)}; #also prints A1, but in form (column=1, row=1)


Custom Subroutine, parse_ods()[]

This function is much faster than other modules

  • Usage:
    • use Archive::Zip;
    • @spreadsheet_data = &parse_ods($spreadsheet_file_name);
    • $content = $spreadsheet_data[$sheet_number][$row_number][$column_number];
      • sheet, row, column start at 0
sub parse_ods {
	my $file = $_[0];
	my @data;
	
	my $zip = Archive::Zip->new($file);
	my $content=$zip->contents('content.xml');
	
	$content =~ s/<\?xml .*\?>\n//;
	$content =~ s/^.*<office:spreadsheet>//;
	$content =~ s/<\/office:spreadsheet>.*$//;
	$content =~ s/<table:named-expressions>.*<\/table:named-expressions>//;
	$content =~ s/<table:database-ranges>.*<\/table:database-ranges>//; #keep greedy?
	$content =~ s/<\/table:table-cell>//; #maybe not neeeded
	$content =~ s/<\/table:table-row>//; #maybe not neeeded
	
	my @sheets = split(/<table:table .*?>/,$content);
	shift(@sheets); #skips the first _blank_ element
	
	my $sheet_num=0;
	foreach my $sheet (@sheets) {
		my $row_num=-1;
		foreach my $row (split(/<table:table-row .*?>/,$sheet)) {
			if ($row_num != -1) {
				#skips first element
				my $col_num=-1;
				foreach my $cell ( split(/<table:table-cell/,$row )) {
					if ( $col_num == -1 ) {
						#skips first element
						$col_num++;
					} else {
						my $cell_content = $cell;
						if ( $cell_content =~ /<text:p>/ ) {
							$cell_content =~ s/.*<text:p>//;
							$cell_content =~ s/<\/text:p>.*//;
						} else { # no content in cell
							$cell_content = '';
						}
						if ($cell =~ /number-columns-repeated/) {
							# repeat cells contents n-times
							my $col_repeated = $cell;
							# Original: $col_repeated =~ s/table:number-columns-repeated="//;
							$col_repeated =~ s/.*table:number-columns-repeated="//; # this worked better
							$col_repeated =~ s/".*//;
							foreach (1..$col_repeated) {
								$data[$sheet_num][$row_num][$col_num]=$cell_content;
								$col_num++;
							}
						} else {
							# no repeated cols
							$data[$sheet_num][$row_num][$col_num]=$cell_content;
							$col_num++;
						}
					}
				}
			}
			$row_num++
		}
		$sheet_num++;
	}
	return @data;
}

Comparison[]

A comparison of time to grab a single cell from a spreadsheet with 357 rows and 46 columns, with a file size of 42k. Tests were done reading one cell and reading 600 cells

Method Time to read 1 cell Time to read 600 cells
Unzip and read xml
no processing
0.01s NA
OpenOffice::OODoc 1.8s 24.9s
OpenOffice::OOSheets ? ?
Spreadsheet::Read 0.88s 0.88s
parse_ods() 0.08 0.08

As you can see Spreadsheet::Read is faster than OpenOffice::OODoc at grabbing a cell, and much faster at grabbing many cells. The custom function, parse_ods(), is much much faster, but can only be used to read a file, not write or modify.

Advertisement