How to read OpenOffice OpenDocument spreadsheets in Perl
Talk0this wiki
Redirected from Howto read OpenOffice OpenDocument spreadsheets in Perl
There are many ways to open OpenOffice.org spreadsheets in Perl. The default spreadsheet file format for OpenOffice is OpenDocument spreadsheet (*.ods)
The following should work with any OpenDocument spreadsheets and should be compatible with OpenOffice 2.x-3.x It has been tested with OpenOffice 3.0.0. They are not likely to work with OpenOffice 1.x spreadsheets (*.sxc). For version 1.x OpenOffice files see Spreadsheet::ReadSXC
Contents |
Methods
Edit
ODF::lpOD module
Edit
- Details: ODF::lpOD at CPAN
- Openning a spreadsheet:
- $doc = odf_document->get($filename);
- Main sheet selection function:
- $sheet = $doc->get_body->get_table($tablename);
- Main function to get read/write access to a cell:
- $cell = $table->get_cell(8, 2);
- $cell = $table->get_cell("C9");
- Main function for grabing a cell value (two addressing syntaxes):
- $cell = $table->get_cell_value(8, 2);
- $cell = $table->get_cell_value("C9");
- Example for reading a cell:
use ODF::lpOD;
$file = 'chipList-v2.14.ods';
my $doc = odf_document->get($file);
my $content = $doc->get_body;
my $sheet = $content->get_table("My Chips");
my $res = $sheet->get_cell_value("A1");
print $res;
OpenOffice::OODoc module
Edit
- Details:OpenOffice::OODoc at CPAN
- Opening a spreadsheet:
- $doc = odfDocument(file => $filename);
- Main function for grabing a cell:
- $doc->getCellValue($sheet, $row, $col)
- note variables start at 0. ie the first sheet, first row, first column is (0, 0, 0)
- $doc->getCellValue($sheet, $row, $col)
- Example for reading a cell:
use OpenOffice::OODoc; my $doc = odfDocument(file => 'chipList-v2.14.ods'); print $doc->getCellValue(0, "A1"); #Reads first sheet, cell A1
OpenOffice::OOSheets module
Edit
- Details:OpenOffice::OOSheets at CPAN
note this example has a problem on my computer
- Example for reading a cell:
use OpenOffice::OOSheets;
use Archive::Zip;
$file = 'chipList-v2.14.ods';
my $zip = Archive::Zip->new($file);
my $content=$zip->contents('content.xml');
my $res=OpenOffice::OOSheets::GetData (text=>$content,ref=>
[
{
'cells' => [
'A1'
],
'table' => 'My Chips'
},
]); #Reads sheet named 'My Chips', cell A1
print $res;
Spreadsheet::Read module
Edit
- Details: Spreadsheet::Read at CPAN
- This module uses OpenOffice/Excel cell naming format.
- Example: row 1, column 1 is A1
- There is a function included in the module so u can input a row and column number instead of the alphanumeric designation. note it starts with row 1 not row 0.
- $cell=cr2cell ( 1, 1) # $cell is = 'A1'
- Example for reading a cell:
use Spreadsheet::Read;
my $chipListSpreadsheet = ReadData ("chipList-v2.14.ods");
print $chipListSpreadsheet->[1]{A1}; #Prints sheet 1, cell A1
print $chipListSpreadsheet->[1]{cr2cell ( 1, 1)}; #also prints A1, but in form (column=1, row=1)
Custom Subroutine, parse_ods()
Edit
This function is much faster than other modules
- Usage:
- use Archive::Zip;
- @spreadsheet_data = &parse_ods($spreadsheet_file_name);
- $content = $spreadsheet_data[$sheet_number][$row_number][$column_number];
- sheet, row, column start at 0
sub parse_ods {
my $file = $_[0];
my @data;
my $zip = Archive::Zip->new($file);
my $content=$zip->contents('content.xml');
$content =~ s/<\?xml .*\?>\n//;
$content =~ s/^.*<office:spreadsheet>//;
$content =~ s/<\/office:spreadsheet>.*$//;
$content =~ s/<table:named-expressions>.*<\/table:named-expressions>//;
$content =~ s/<table:database-ranges>.*<\/table:database-ranges>//; #keep greedy?
$content =~ s/<\/table:table-cell>//; #maybe not neeeded
$content =~ s/<\/table:table-row>//; #maybe not neeeded
my @sheets = split(/<table:table .*?>/,$content);
shift(@sheets); #skips the first _blank_ element
my $sheet_num=0;
foreach my $sheet (@sheets) {
my $row_num=-1;
foreach my $row (split(/<table:table-row .*?>/,$sheet)) {
if ($row_num != -1) {
#skips first element
my $col_num=-1;
foreach my $cell ( split(/<table:table-cell/,$row )) {
if ( $col_num == -1 ) {
#skips first element
$col_num++;
} else {
my $cell_content = $cell;
if ( $cell_content =~ /<text:p>/ ) {
$cell_content =~ s/.*<text:p>//;
$cell_content =~ s/<\/text:p>.*//;
} else { # no content in cell
$cell_content = '';
}
if ($cell =~ /number-columns-repeated/) {
# repeat cells contents n-times
my $col_repeated = $cell;
# Original: $col_repeated =~ s/table:number-columns-repeated="//;
$col_repeated =~ s/.*table:number-columns-repeated="//; # this worked better
$col_repeated =~ s/".*//;
foreach (1..$col_repeated) {
$data[$sheet_num][$row_num][$col_num]=$cell_content;
$col_num++;
}
} else {
# no repeated cols
$data[$sheet_num][$row_num][$col_num]=$cell_content;
$col_num++;
}
}
}
}
$row_num++
}
$sheet_num++;
}
return @data;
}
Comparison
Edit
A comparison of time to grab a single cell from a spreadsheet with 357 rows and 46 columns, with a file size of 42k. Tests were done reading one cell and reading 600 cells
| Method | Time to read 1 cell | Time to read 600 cells |
|---|---|---|
| Unzip and read xml no processing | 0.01s | NA |
| OpenOffice::OODoc | 1.8s | 24.9s |
| OpenOffice::OOSheets | ? | ? |
| Spreadsheet::Read | 0.88s | 0.88s |
| parse_ods() | 0.08 | 0.08 |
As you can see Spreadsheet::Read is faster than OpenOffice::OODoc at grabbing a cell, and much faster at grabbing many cells. The custom function, parse_ods(), is much much faster, but can only be used to read a file, not write or modify.