There are many ways to open OpenOffice.org spreadsheets in Perl. The default spreadsheet file format for OpenOffice is OpenDocument spreadsheet (*.ods)

The following should work with any OpenDocument spreadsheets and should be compatible with OpenOffice 2.x-3.x It has been tested with OpenOffice 3.0.0. They are not likely to work with OpenOffice 1.x spreadsheets (*.sxc). For version 1.x OpenOffice files see Spreadsheet::ReadSXC

Methods[]

ODF::lpOD module[]

Details: ODF::lpOD at CPAN

Opening a spreadsheet:

$doc = odf_document->get($filename);

Main sheet selection function:

$sheet = $doc->get_body->get_table($tablename);

Main function to get read/write access to a cell:

$cell = $table->get_cell(8, 2);

$cell = $table->get_cell("C9");

Main function for grabing a cell value (two addressing syntaxes):

$cell = $table->get_cell_value(8, 2);

$cell = $table->get_cell_value("C9");

Example for reading a cell:

use ODF::lpOD;
$file = 'chipList-v2.14.ods';
my $doc = odf_document->get($file);
my $content = $doc->get_body;
my $sheet = $content->get_table("My Chips");
my $res = $sheet->get_cell_value("A1");
print $res;

OpenOffice::OODoc module[]

Details:OpenOffice::OODoc at CPAN

Opening a spreadsheet:
- $doc = odfDocument(file => $filename);
Main function for grabing a cell:
- $doc->getCellValue($sheet, $row, $col)
  note variables start at 0. i.e. the first sheet, first row, first column is (0, 0, 0)
Example for reading a cell:

use OpenOffice::OODoc;
my $doc = odfDocument(file => 'chipList-v2.14.ods');
print $doc->getCellValue(0, "A1"); #Reads first sheet, cell A1

OpenOffice::OOSheets module[]

Details:OpenOffice::OOSheets at CPAN

note this example has a problem on my computer

Example for reading a cell:

use OpenOffice::OOSheets;
use Archive::Zip;
$file = 'chipList-v2.14.ods';
my $zip = Archive::Zip->new($file);
my $content=$zip->contents('content.xml');
my $res=OpenOffice::OOSheets::GetData (text=>$content,ref=>
                 [
                   {
                     'cells' => [
                                'A1'
                              ],
                     'table' => 'My Chips'
                   },
                 ]); #Reads sheet named 'My Chips', cell A1
print $res;

Spreadsheet::Read module[]

Details: Spreadsheet::Read at CPAN

This module uses OpenOffice/Excel cell naming format.
- Example: row 1, column 1 is A1
- There is a function included in the module so u can input a row and column number instead of the alphanumeric designation. note it starts with row 1 not row 0.
  - $cell=cr2cell ( 1, 1) # $cell is = 'A1'

Example for reading a cell:

use Spreadsheet::Read;
my $chipListSpreadsheet = ReadData ("chipList-v2.14.ods");
print $chipListSpreadsheet->[1]{A1}; #Prints sheet 1, cell A1
print $chipListSpreadsheet->[1]{cr2cell ( 1, 1)}; #also prints A1, but in form (column=1, row=1)

Custom Subroutine, parse_ods()[]

This function is much faster than other modules

Usage:
- use Archive::Zip;
- @spreadsheet_data = &parse_ods($spreadsheet_file_name);
- $content = $spreadsheet_data[$sheet_number][$row_number][$column_number];
  - sheet, row, column start at 0

sub parse_ods {
	my $file = $_[0];
	my @data;
	
	my $zip = Archive::Zip->new($file);
	my $content=$zip->contents('content.xml');
	
	$content =~ s/<\?xml .*\?>\n//;
	$content =~ s/^.*<office:spreadsheet>//;
	$content =~ s/<\/office:spreadsheet>.*$//;
	$content =~ s/<table:named-expressions>.*<\/table:named-expressions>//;
	$content =~ s/<table:database-ranges>.*<\/table:database-ranges>//; #keep greedy?
	$content =~ s/<\/table:table-cell>//; #maybe not neeeded
	$content =~ s/<\/table:table-row>//; #maybe not neeeded
	
	my @sheets = split(/<table:table .*?>/,$content);
	shift(@sheets); #skips the first _blank_ element
	
	my $sheet_num=0;
	foreach my $sheet (@sheets) {
		my $row_num=-1;
		foreach my $row (split(/<table:table-row .*?>/,$sheet)) {
			if ($row_num != -1) {
				#skips first element
				my $col_num=-1;
				foreach my $cell ( split(/<table:table-cell/,$row )) {
					if ( $col_num == -1 ) {
						#skips first element
						$col_num++;
					} else {
						my $cell_content = $cell;
						if ( $cell_content =~ /<text:p>/ ) {
							$cell_content =~ s/.*<text:p>//;
							$cell_content =~ s/<\/text:p>.*//;
						} else { # no content in cell
							$cell_content = '';
						}
						if ($cell =~ /number-columns-repeated/) {
							# repeat cells contents n-times
							my $col_repeated = $cell;
							# Original: $col_repeated =~ s/table:number-columns-repeated="//;
							$col_repeated =~ s/.*table:number-columns-repeated="//; # this worked better
							$col_repeated =~ s/".*//;
							foreach (1..$col_repeated) {
								$data[$sheet_num][$row_num][$col_num]=$cell_content;
								$col_num++;
							}
						} else {
							# no repeated cols
							$data[$sheet_num][$row_num][$col_num]=$cell_content;
							$col_num++;
						}
					}
				}
			}
			$row_num++
		}
		$sheet_num++;
	}
	return @data;
}

Comparison[]

A comparison of time to grab a single cell from a spreadsheet with 357 rows and 46 columns, with a file size of 42k. Tests were done reading one cell and reading 600 cells

Method	Time to read 1 cell	Time to read 600 cells
Unzip and read xml no processing	0.01s	NA
OpenOffice::OODoc	1.8s	24.9s
OpenOffice::OOSheets	?	?
Spreadsheet::Read	0.88s	0.88s
parse_ods()	0.08	0.08

As you can see Spreadsheet::Read is faster than OpenOffice::OODoc at grabbing a cell, and much faster at grabbing many cells. The custom function, parse_ods(), is much much faster, but can only be used to read a file, not write or modify.

How to read OpenOffice OpenDocument spreadsheets in Perl

Contents