Wikia

How To Wiki

How to read OpenOffice OpenDocument spreadsheets in Perl

Talk0
1,791pages on
this wiki

There are many ways to open OpenOffice.org spreadsheets in Perl. The default spreadsheet file format for OpenOffice is OpenDocument spreadsheet (*.ods)

The following should work with any OpenDocument spreadsheets and should be compatible with OpenOffice 2.x-3.x It has been tested with OpenOffice 3.0.0. They are not likely to work with OpenOffice 1.x spreadsheets (*.sxc). For version 1.x OpenOffice files see Spreadsheet::ReadSXC

MethodsEdit

ODF::lpOD moduleEdit

  • Opening a spreadsheet:
$doc = odf_document->get($filename);
  • Main sheet selection function:
$sheet = $doc->get_body->get_table($tablename);
  • Main function to get read/write access to a cell:
$cell = $table->get_cell(8, 2);
$cell = $table->get_cell("C9");
  • Main function for grabing a cell value (two addressing syntaxes):
$cell = $table->get_cell_value(8, 2);
$cell = $table->get_cell_value("C9");
  • Example for reading a cell:
use ODF::lpOD;
$file = 'chipList-v2.14.ods';
my $doc = odf_document->get($file);
my $content = $doc->get_body;
my $sheet = $content->get_table("My Chips");
my $res = $sheet->get_cell_value("A1");
print $res;


OpenOffice::OODoc moduleEdit

  • Opening a spreadsheet:
    • $doc = odfDocument(file => $filename);
  • Main function for grabing a cell:
    • $doc->getCellValue($sheet, $row, $col)
      note variables start at 0. i.e. the first sheet, first row, first column is (0, 0, 0)
  • Example for reading a cell:
use OpenOffice::OODoc;
my $doc = odfDocument(file => 'chipList-v2.14.ods');
print $doc->getCellValue(0, "A1"); #Reads first sheet, cell A1

OpenOffice::OOSheets moduleEdit

note this example has a problem on my computer

  • Example for reading a cell:
use OpenOffice::OOSheets;
use Archive::Zip;
$file = 'chipList-v2.14.ods';
my $zip = Archive::Zip->new($file);
my $content=$zip->contents('content.xml');
my $res=OpenOffice::OOSheets::GetData (text=>$content,ref=>
                 [
                   {
                     'cells' => [
                                'A1'
                              ],
                     'table' => 'My Chips'
                   },
                 ]); #Reads sheet named 'My Chips', cell A1
print $res;

Spreadsheet::Read moduleEdit

  • This module uses OpenOffice/Excel cell naming format.
    • Example: row 1, column 1 is A1
    • There is a function included in the module so u can input a row and column number instead of the alphanumeric designation. note it starts with row 1 not row 0.
      • $cell=cr2cell ( 1, 1) # $cell is = 'A1'


  • Example for reading a cell:
use Spreadsheet::Read;
my $chipListSpreadsheet = ReadData ("chipList-v2.14.ods");
print $chipListSpreadsheet->[1]{A1}; #Prints sheet 1, cell A1
print $chipListSpreadsheet->[1]{cr2cell ( 1, 1)}; #also prints A1, but in form (column=1, row=1)


Custom Subroutine, parse_ods()Edit

This function is much faster than other modules

  • Usage:
    • use Archive::Zip;
    • @spreadsheet_data = &parse_ods($spreadsheet_file_name);
    • $content = $spreadsheet_data[$sheet_number][$row_number][$column_number];
      • sheet, row, column start at 0
sub parse_ods {
	my $file = $_[0];
	my @data;
	
	my $zip = Archive::Zip->new($file);
	my $content=$zip->contents('content.xml');
	
	$content =~ s/<\?xml .*\?>\n//;
	$content =~ s/^.*<office:spreadsheet>//;
	$content =~ s/<\/office:spreadsheet>.*$//;
	$content =~ s/<table:named-expressions>.*<\/table:named-expressions>//;
	$content =~ s/<table:database-ranges>.*<\/table:database-ranges>//; #keep greedy?
	$content =~ s/<\/table:table-cell>//; #maybe not neeeded
	$content =~ s/<\/table:table-row>//; #maybe not neeeded
	
	my @sheets = split(/<table:table .*?>/,$content);
	shift(@sheets); #skips the first _blank_ element
	
	my $sheet_num=0;
	foreach my $sheet (@sheets) {
		my $row_num=-1;
		foreach my $row (split(/<table:table-row .*?>/,$sheet)) {
			if ($row_num != -1) {
				#skips first element
				my $col_num=-1;
				foreach my $cell ( split(/<table:table-cell/,$row )) {
					if ( $col_num == -1 ) {
						#skips first element
						$col_num++;
					} else {
						my $cell_content = $cell;
						if ( $cell_content =~ /<text:p>/ ) {
							$cell_content =~ s/.*<text:p>//;
							$cell_content =~ s/<\/text:p>.*//;
						} else { # no content in cell
							$cell_content = '';
						}
						if ($cell =~ /number-columns-repeated/) {
							# repeat cells contents n-times
							my $col_repeated = $cell;
							# Original: $col_repeated =~ s/table:number-columns-repeated="//;
							$col_repeated =~ s/.*table:number-columns-repeated="//; # this worked better
							$col_repeated =~ s/".*//;
							foreach (1..$col_repeated) {
								$data[$sheet_num][$row_num][$col_num]=$cell_content;
								$col_num++;
							}
						} else {
							# no repeated cols
							$data[$sheet_num][$row_num][$col_num]=$cell_content;
							$col_num++;
						}
					}
				}
			}
			$row_num++
		}
		$sheet_num++;
	}
	return @data;
}

ComparisonEdit

A comparison of time to grab a single cell from a spreadsheet with 357 rows and 46 columns, with a file size of 42k. Tests were done reading one cell and reading 600 cells

Method Time to read 1 cell Time to read 600 cells
Unzip and read xml
no processing
0.01s NA
OpenOffice::OODoc 1.8s 24.9s
OpenOffice::OOSheets ? ?
Spreadsheet::Read 0.88s 0.88s
parse_ods() 0.08 0.08

As you can see Spreadsheet::Read is faster than OpenOffice::OODoc at grabbing a cell, and much faster at grabbing many cells. The custom function, parse_ods(), is much much faster, but can only be used to read a file, not write or modify.

Around Wikia's network

Random Wiki