91.5 Convert MS/Word to PDF
Out Of Date
OpenOffice.org has powerful support for plug-ins (called Macros)
that allow a lot of additional functionality to be added to the
application. One common task is to convert MS/Word documents
(.doc) into PDF. The recipe here uses Basic to program an
OpenOffice.org macro to convert from DOC to PDF. We then illustrate
how to turn this into a command line tool to convert from DOC to
PDF. (This example was developed by
First, start up OpenOffice.org, perhaps as oowriter. Then, from the Tools menu, select Macros, Organize Macros, OpenOffice.org Basic. A window will pop up. In its Macro from area, navigate to My Macros, Standard, Module1. Click on Edit and change the Main module so that it contains just the following code:
REM  *****  BASIC  *****

Sub ConvertWordToPDF(cFile)
   cURL = ConvertToURL(cFile)

   ' Open the document.
   ' Just blindly assume that the document is of a type that OOo will
   ' correctly recognize and open -- without specifying an import filter.
   oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, Array(MakePropertyValue("Hidden", True), ))

   Dim comps
   comps = split (cFile, ".")
   If UBound(comps) > 0 Then
      comps(UBound(comps)) = "pdf"
      cfile = join (comps, ".")
   Else
      cfile = cFile + ".pdf"
   Endif
   cURL = ConvertToURL(cFile)

   ' Save the document using a filter.
   oDoc.storeToURL(cURL, Array(MakePropertyValue("FilterName", "writer_pdf_Export"), ))

   oDoc.close(True)
End Sub

Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue
   Dim oPropertyValue As New com.sun.star.beans.PropertyValue
   If Not IsMissing( cName ) Then
      oPropertyValue.Name = cName
   EndIf
   If Not IsMissing( uValue ) Then
      oPropertyValue.Value = uValue
   EndIf
   MakePropertyValue() = oPropertyValue
End Function
Save and exit from OpenOffice.org.
Now create a shell script, perhaps called doc2pdf:
#!/bin/sh
DIR=$(pwd)
DOC=$DIR/$1
/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToPDF($DOC)"
Then simply run it:
$ doc2pdf my.doc
and you should end up with a my.pdf in the same directory.
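The script above builds DOC by prefixing the current directory, so it only works when the argument is a relative path. One way to accept absolute paths as well might look like the sketch below. The function name resolve_doc is hypothetical and not part of the original recipe.

```shell
#!/bin/sh
# resolve_doc is a hypothetical helper, not part of the original script.
# It prints an absolute path for its argument: absolute paths pass
# through unchanged, relative ones are prefixed with the current directory.
resolve_doc() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$(pwd)/$1" ;;
    esac
}
```

In doc2pdf, the DIR and DOC assignments could then be replaced by DOC=$(resolve_doc "$1").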
The script is far from perfect. In particular, it returns before OpenOffice.org has finished its work. Thus, to convert a whole directory of files, you may want something like:
$ for i in *.doc; do echo $i; doc2pdf "$i"; sleep 5; done
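A fixed 5 second sleep is only a guess: it may be too short for a large document and is wasted time for a small one. One possible alternative, sketched here under stated assumptions, is to poll until the PDF appears. The helper name wait_for_file and its timeout argument are hypothetical, and the sketch assumes the PDF is written next to the source document, as the macro does. A non-empty file may still be in the middle of being written, so treat this as a rough check only.

```shell
#!/bin/sh
# wait_for_file is a hypothetical helper: it polls once a second until
# the named file exists and is non-empty, or until the given number of
# attempts (default 30) has been used up.
# Returns 0 if the file appeared, 1 on timeout.
wait_for_file() {
    file=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        [ -s "$file" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

Inside the loop, the sleep could then become something like wait_for_file "${i%.*}.pdf" 60.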
The code to handle the newer XML-based MS/Office file formats (or more specifically to handle the 4 character filename extensions like .xlsx, .docx, .pptx) was contributed by Victor Danilchenko. It replaced:
cFile = Left(cFile, Len(cFile) - 4) + ".pdf"
with:
Dim comps
comps = split (cFile, ".")
If UBound(comps) > 0 Then
   comps(UBound(comps)) = "pdf"
   cfile = join (comps, ".")
Else
   cfile = cFile + ".pdf"
Endif
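If the calling shell script needs to know which file the macro will produce, much the same renaming rule can be written with shell parameter expansion. This is only a sketch, and the function name pdf_name is hypothetical. It deliberately differs from the macro in one respect: it looks for a dot only in the final path component, so a dot in a directory name is not mistaken for the start of an extension.

```shell
#!/bin/sh
# pdf_name is a hypothetical helper that prints the PDF name for a
# document path: the last extension is replaced by "pdf", or ".pdf" is
# appended when the file name itself contains no dot.
pdf_name() {
    case "${1##*/}" in
        *.*) printf '%s\n' "${1%.*}.pdf" ;;   # strip the last extension
        *)   printf '%s\n' "$1.pdf" ;;        # no extension to replace
    esac
}
```

For example, pdf_name report.docx prints report.pdf, and pdf_name notes prints notes.pdf.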
Markus Dietsch offered an alternative to sleeping for 5 seconds, which is needed only because oowriter detaches itself from the shell. His solution is to start oowriter and find the command line of the running process:
$ oowriter
$ ps ax | grep openoffice
7378 pts/3 Sl 0:00 /usr/lib/openoffice/program/soffice.bin -writer -splash-pipe=5
Use that command line, with our own arguments, in the script:
/usr/lib/openoffice/program/soffice.bin -writer -invisible "macro:///Standard.Module1.ConvertWordToPDF($DOC)"
This process does not detach itself from the shell: it runs for as long as the conversion takes and only then returns control to the shell.
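Once the converter blocks until it has finished, a directory of files can be converted one after another without any sleep, and each result can be checked straight away. The sketch below assumes doc2pdf has been changed to call soffice.bin as above. The function name convert_all is hypothetical, and the converter command is passed in as an argument so that the loop does not depend on how doc2pdf is installed.

```shell
#!/bin/sh
# convert_all is a hypothetical helper. Its first argument is the
# converter command (for example doc2pdf); the remaining arguments are
# documents. Each document is converted in turn, and a message is printed
# for any document whose PDF is missing afterwards.
# Returns 0 only if every document produced a PDF.
convert_all() {
    converter=$1
    shift
    failed=0
    for doc in "$@"; do
        pdf="${doc%.*}.pdf"
        if ! "$converter" "$doc" || [ ! -e "$pdf" ]; then
            echo "no PDF produced for $doc" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

A typical call would be convert_all doc2pdf *.doc *.docx.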
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.