Adobe plunges PDF into XML
Today at the XML 2006 Conference, being held this week in Boston, Adobe Systems Inc. will reveal a radical beta of what could be the next version of its venerable Portable Document Format--one made up entirely of the Extensible Markup Language.
Although the current version of PDF allows a document creator to bundle an XML-encased transcript of the text in that document, documents rendered in this new document layout format--codenamed Mars--will be composed entirely of XML, explained Joel Geraci, Adobe's PDF developer evangelist.
The company's research lab has released the software for public review. Should the feedback prove helpful and the Adobe corporate Powers-That-Be bless the new format, Mars could become the next-generation PDF and be rolled into the company's offerings as early as the next major release of Acrobat.
'PDF is over 15 years old. It predates XML. The technology, it's not at the same level of compatibility as XML, where there are a lot of tools and knowledge about how to work with XML,' admitted Phillip Levy, the PDF and XML architect for Adobe who helped develop Mars. 'So moving the PDF technology onto an XML base gives us a lot better integration with the rest of the world.'
Like the documents rendered in Microsoft Office's new XML formats, documents rendered in Mars will be a zipped collection of individual files. A plain-text Scalable Vector Graphics file will hold not only the document text but also explicit instructions on rendering the precise look and feel of the document. The zipped collection will also include any images that were incorporated into that document.
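To make the packaging idea concrete, here is a minimal Python sketch of a zipped collection holding an SVG page and an image. The entry names (`page/1/pg.svg`, `page/1/img/logo.png`) and the helper functions are hypothetical and chosen only for illustration. Adobe's actual Mars package layout is not described here.

```python
import io
import zipfile

# Hypothetical single-page SVG; real Mars pages would carry far more markup.
PAGE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="72" y="100">Hello World</text>'
    "</svg>"
)


def build_package(entries):
    """Zip a mapping of entry name -> content (str or bytes) in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def list_entries(package_bytes):
    """Return the entry names stored in the zipped package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()


def read_entry(package_bytes, name):
    """Return the raw bytes of one entry in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read(name)


# Entry names below are assumptions, not the real Mars layout.
package = build_package({
    "page/1/pg.svg": PAGE_SVG,
    "page/1/img/logo.png": b"\x89PNG placeholder bytes",
})
```

Because the container is an ordinary ZIP archive, any standard ZIP tool or library could open it, which is the integration benefit Levy describes.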
Adobe's use of SVG could represent a major step forward for that XML-based format, Levy said. Still going through developmental growing pains, this XML-based language describes how to depict visual elements in a presentation, with precise controls on where each element appears on the layout. Adobe will also add its own XML-based extensions to cover visual elements not handled by standardized SVG tags, Levy said.
Performance-wise, this new format should be on par with the current PDF, Levy said. Although XML encoding can be particularly verbose, the ZIP compression should keep the file size manageable. The processing power needed for rendering should also be about the same as for current PDFs, Levy noted. Eventually, the new format should support all the advanced features, such as security, that the current PDF offers.
Levy said that by having PDF all-XML, organizations will be better able to incorporate into their workflow functions like PDF generation and information extraction from PDF documents.
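As a rough illustration of what information extraction could look like against an SVG page, the Python sketch below pulls the text lines out of a small, made-up SVG document using only the standard library. The sample markup and the `extract_text_lines` helper are assumptions for illustration, and real Mars pages may structure their text differently.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Made-up page content; not taken from an actual Mars file.
SAMPLE_PAGE = """<svg xmlns="http://www.w3.org/2000/svg">
  <text x="72" y="100">Quarterly report</text>
  <text x="72" y="120">Revenue rose <tspan font-weight="bold">12 percent</tspan>.</text>
</svg>"""


def extract_text_lines(svg_source):
    """Return the text content of each SVG <text> element, in document order."""
    root = ET.fromstring(svg_source)
    # itertext() also collects text nested in child elements such as <tspan>.
    return ["".join(element.itertext()) for element in root.iter(f"{{{SVG_NS}}}text")]


lines = extract_text_lines(SAMPLE_PAGE)
```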
Geraci demonstrated this potential ease with a proverbial 'Hello World' file. He displayed a Mars file for a document with only one line of text, 'Hello World.' He then opened up that document's SVG file and copied the 'Hello World' line, with its SVG encasements, to another line below the original, changing the offset value so it would appear just below the original line. After saving the SVG file, Geraci reopened the document in a viewer to show that the second line had been added.
Although most PDF SVG files will be too complex to change by hand, the demonstration showed how easily an XML-parsing application could manipulate a PDF file, Geraci stated.
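The same kind of edit can be sketched programmatically. The Python example below copies a 'Hello World' text element and shifts its vertical offset so the copy lands just below the original. The SVG markup, the `y` attribute as the offset, and the 20-unit line spacing are all assumptions for illustration, since the markup Geraci actually edited is not reproduced here.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the default SVG namespace unprefixed when serializing.
ET.register_namespace("", SVG_NS)

# Hypothetical one-line page; a real Mars SVG file would be more elaborate.
HELLO_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="612" height="792">
  <text x="72" y="100">Hello World</text>
</svg>"""


def duplicate_text_line(svg_source, line_spacing=20):
    """Copy the first <text> element and place the copy one line lower."""
    root = ET.fromstring(svg_source)
    original = root.find(f"{{{SVG_NS}}}text")
    duplicate = copy.deepcopy(original)
    # Larger y values sit lower on the page in SVG's coordinate system.
    new_y = float(original.get("y")) + line_spacing
    duplicate.set("y", f"{new_y:g}")
    # Insert the copy directly after the original element.
    root.insert(list(root).index(original) + 1, duplicate)
    return ET.tostring(root, encoding="unicode")


edited = duplicate_text_line(HELLO_SVG)
```

Writing `edited` back into the package and reopening it in a viewer would correspond to the final step of the demonstration.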
Today, external actions on PDFs can be done using Adobe PDF libraries, but developers can find these libraries difficult to work with, Levy said. Working with XML should come far easier, because the syntax is familiar and can be readily incorporated into Java and other programming languages, he said.