Including OpenOffice.org word processor (writer) documents within a website - Linux file permissions reference document

26 August 2009

I've added a new tutorial to the Penguin Tutor Linux Tutorial pages. The latest tutorial is on file permissions within Linux, including commands such as chmod, chown and umask.

View the tutorial below:

New tutorial - Linux file permissions reference document

The tutorials are all created in OpenOffice.org and then exported as XHTML for inclusion in the website. This worked well with older versions of OpenOffice.org, but in version 3 the exported format has changed slightly, which broke my script for incorporating the document into my website. I must admit, however, that the PHP script used to do this is a bit of a hack rather than fully developed software (as is a lot of the PHP content on the web).

I've now updated my script so that documents can be exported as HTML rather than XHTML, which is easier to handle. I'm not sure whether I'll continue to create the tutorials this way in future. It was useful when I needed to work on the documents offline on another computer, but web access is becoming more pervasive, and it may be easier to edit them in a web-based system (perhaps WordPress) in future.

In the meantime, this is the code I use to incorporate the HTML from OpenOffice.org. It strips out the start and end tags, puts the style sheet information into the HTML header and puts the main part of the document into the body. Note that some parts rely on the Writer (word processor) template I have used, so they may not work with an OpenOffice.org file that uses a different layout.

// Load in the file
// Check the file exists; if not, use $defaulterrorfile instead
if (!file_exists($docfilename)) {$docfilename = $docpath.$defaulterrorfile;}
$document = file($docfilename);

// Pre-process the document
// $status records which multi-line section we are in (e.g. <style>)
$status = "";
$indocstyles = array();

// First pass through the document - get styles
foreach ($document as $currentline)
{
    if (preg_match("/<style.*>/i", $currentline)) {$status="style"; continue;}
    if (preg_match("/<\/style>/i", $currentline)) {$status=""; continue;}

    // If we are in the style section
    if (strcmp("style", $status) == 0)
    {
        // Ignore the start / end comment markers
        if (preg_match("/<!--/", $currentline)) {continue;}
        if (preg_match("/-->/", $currentline)) {continue;}

        // Ignore blank lines, otherwise a bare "#document " would merge with the next rule
        if (trim($currentline) == "") {continue;}

        // Some entries cannot be used, e.g. @page
        if (preg_match("/\@page/i", $currentline)) {continue;}

        // Otherwise this is a style entry - prefix it with #document so it does not
        // change the layout of the rest of the page, and add it to the style array
        $indocstyles[] = "#document " . trim($currentline);
    }
}

// Second pass - look for the <p>Keywords</p> marker; the next <p> line holds the keywords
$status = "";
$dockeywords = "";
foreach ($document as $currentline)
{
    // If we have already passed the keywords marker
    if (strcmp("keywords", $status) == 0)
    {
        // $matches must be passed to preg_match so the captured text is available
        if (preg_match("/<p>(.*)<\/p>/i", $currentline, $matches))
        {
            $dockeywords = $matches[1];
            break;
        }
    }
    // Start of keywords (the next <p></p> holds the actual keywords)
    if (preg_match("/<p>Keywords<\/p>/i", $currentline)) {$status="keywords"; continue;}
}
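The two passes above can also be written as standalone functions, which makes them easier to test outside the page. This is a minimal sketch: the function names extract_doc_styles and extract_doc_keywords are hypothetical, and it assumes, as the code above does, that each style rule sits on its own line and the keywords follow a <p>Keywords</p> paragraph.

```php
<?php
// Hypothetical helpers: function-based versions of the two passes above.
// $lines is the array returned by file(), one line of HTML per element.

function extract_doc_styles(array $lines): array
{
    $styles = [];
    $inStyle = false;
    foreach ($lines as $line) {
        if (preg_match('/<style[^>]*>/i', $line)) { $inStyle = true; continue; }
        if (preg_match('/<\/style>/i', $line)) { $inStyle = false; continue; }
        if (!$inStyle) { continue; }
        // Skip comment markers and @page rules
        if (preg_match('/<!--|-->|@page/i', $line)) { continue; }
        $rule = trim($line);
        // Skip blank lines so no empty "#document " entry is produced
        if ($rule === '') { continue; }
        // Scope each rule to the #document container
        $styles[] = '#document ' . $rule;
    }
    return $styles;
}

function extract_doc_keywords(array $lines): ?string
{
    $seenMarker = false;
    foreach ($lines as $line) {
        // The first <p>...</p> after the marker holds the keywords
        if ($seenMarker && preg_match('/<p>(.*)<\/p>/i', $line, $matches)) {
            return $matches[1];
        }
        if (preg_match('/<p>Keywords<\/p>/i', $line)) { $seenMarker = true; }
    }
    return null; // no keywords marker found
}
```

Returning values rather than filling globals means the same helpers could be reused by a containing page or a simple command-line check.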

Within the header:

<style type="text/css">
div#document { margin-top: 20px; }

<?php
foreach ($indocstyles as $thisstyle)
{
    // Remove any fixed widths of 3cm or more from the styles, with or without a decimal part
    // This stops the tables being too wide and makes the document flow better with the rest of the page
    $thisstyle = preg_replace("/width:(\d{2,}|[3-9])(\.\d*)?cm;/", "", $thisstyle);

    // Remove any clear:both, which would otherwise push the content below the ads etc.
    $thisstyle = preg_replace("/clear:both;?/", "", $thisstyle);

    // If we've emptied the style then there is no point in showing it
    if (preg_match("/\{\s*\}/", $thisstyle)) {continue;}

    echo "$thisstyle\n";
}
?>

</style>
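The clean-up applied to each style can likewise be pulled out into a function. A minimal sketch: filter_doc_style is a hypothetical name, and the single width pattern (which removes any width of 3cm or more, whether or not it has a decimal part) is an assumption about how OpenOffice.org writes widths out.

```php
<?php
// Hypothetical helper mirroring the header loop: returns the cleaned rule,
// or null when nothing is left between the braces.
function filter_doc_style(string $style): ?string
{
    // Drop fixed widths of 3cm or more, e.g. "width:3.5cm;", "width:17cm;" or "width:16.51cm;"
    $style = preg_replace('/width:(?:\d{2,}|[3-9])(?:\.\d*)?cm;/', '', $style);
    // Drop clear:both so the content can flow alongside floated page elements
    $style = preg_replace('/clear:both;?/', '', $style);
    // An emptied rule such as "#document .P1 {  }" is not worth emitting
    if (preg_match('/\{\s*\}/', $style)) {
        return null;
    }
    return $style;
}
```

In the header loop this would become: skip the rule when filter_doc_style() returns null, otherwise echo the returned string.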

Finally, in the main body:


<div id="document">

<?php
// Now add the main part of the document
// Note that this works for both html and xhtml
$status = "";
foreach ($document as $currentline)
{
    if ((strcmp($status, "body") == 0) || preg_match("/<body.*?>/i", $currentline))
    {
        $status = "body";
        // Remove everything up to and including the body tag
        $currentline = preg_replace("/.*?<body.*?>/i", "", $currentline);
        // Note whether this line holds the closing body tag
        $lastline = preg_match("/<\/body>/i", $currentline);
        // Remove the closing body tag and anything after it
        $currentline = preg_replace("/<\/body>.*/i", "", $currentline);
        // Remove widths from table columns (html <col ...> or xhtml <col ... />) and from inline styles
        $currentline = preg_replace("/<col width=\"\d*\"\s*\/?>/i", "<col />", $currentline);
        $currentline = preg_replace("/width:\d*\.\d*cm;/i", "", $currentline);

        // Change relative src references to point into $docpath (absolute URLs are left alone)
        $currentline = preg_replace("/src=\"(?![a-z][a-z0-9+.-]*:|\/)/i", "src=\"$docpath", $currentline);

        // Finished replaces - now output
        echo $currentline;

        // Stop after </body> so the closing </html> is not output inside the div
        if ($lastline) {break;}
    }
}
?>
</div>
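For testing, the body extraction can be written as a function that returns the cleaned markup instead of echoing it. A minimal sketch: extract_doc_body is a hypothetical name and $docpath is passed in rather than read from the page. It stops at </body> so the trailing </html> is not copied, accepts both the HTML (<col width="...">) and XHTML (<col width="..." />) forms of column tags, and only prefixes src values that are not absolute URLs or root-relative paths.

```php
<?php
// Hypothetical helper: collect everything between <body ...> and </body>,
// applying per-line clean-ups similar to the loop above.
function extract_doc_body(array $lines, string $docpath): string
{
    $out = '';
    $inBody = false;
    foreach ($lines as $line) {
        if (!$inBody) {
            if (!preg_match('/<body[^>]*>/i', $line)) { continue; }
            $inBody = true;
            // Drop everything up to and including the opening body tag
            $line = preg_replace('/.*?<body[^>]*>/i', '', $line);
        }
        $atEnd = (bool) preg_match('/<\/body>/i', $line);
        // Drop the closing body tag and anything after it on the same line
        $line = preg_replace('/<\/body>.*/i', '', $line);
        // Strip fixed column widths (HTML or XHTML form of <col>)
        $line = preg_replace('/<col width="\d*"\s*\/?>/i', '<col />', $line);
        // Strip centimetre widths from inline styles
        $line = preg_replace('/width:\d*\.\d*cm;/i', '', $line);
        // Prefix relative src values; skip "scheme:" URLs and "/..." paths
        $line = preg_replace('/src="(?![a-z][a-z0-9+.-]*:|\/)/i', 'src="' . $docpath, $line);
        $out .= $line;
        if ($atEnd) { break; }
    }
    return $out;
}
```

Usage in the page would then be a single echo extract_doc_body($document, $docpath); inside the document div.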

You may find this useful, but it would need some customisation and a containing PHP file. I am starting to think the tutorials would be better written directly in (X)HTML and CSS, which would give much more control.
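One alternative, not the approach used on this site, is to let PHP's DOM extension parse the exported file rather than matching it line by line, which copes better with tags split across lines. A rough sketch, assuming the DOM extension is available; split_writer_html is a hypothetical name, and the #document scoping and width clean-up from the earlier code would still need to be applied to its output.

```php
<?php
// Sketch of a DOM-based alternative to the line-by-line regular expressions.
// Returns the raw CSS text and the inner HTML of <body>.
function split_writer_html(string $html): array
{
    $dom = new DOMDocument();
    // Exported HTML is rarely perfectly valid; keep parser warnings quiet
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Concatenate the contents of every <style> element
    $css = '';
    foreach ($dom->getElementsByTagName('style') as $style) {
        $css .= $style->textContent;
    }

    // Serialise each child of <body> without the body tag itself
    $body = '';
    $bodyNode = $dom->getElementsByTagName('body')->item(0);
    if ($bodyNode !== null) {
        foreach ($bodyNode->childNodes as $child) {
            $body .= $dom->saveHTML($child);
        }
    }
    return ['css' => $css, 'body' => $body];
}
```

The trade-off is that the parser may normalise the markup (for example tag case and attribute quoting), so the output will not be byte-for-byte what OpenOffice.org wrote.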
