Wednesday, February 27, 2013

Convert word document (.docx) to PDF


This post describes how to convert a Word document to PDF using Java.

There are several approaches to converting a document to PDF.
In this post I am using docx4j. It is a good API for converting Word documents to PDF, which it does here by way of XSL-FO.


The conversion can be done with a simple Java program.

Steps to follow:

Step 1: Open Eclipse and create a new Java project; name it as you like.

Step 2: Create a new Java class with any name you like (e.g. ConvertDocToPDF).

Step 3: Add the docx4j JAR and its dependencies to the project's build path, then paste the lines of code below inside the main method of the class you created.


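// Imports needed at the top of the class file (docx4j 2.x package names):
//   java.io.File, java.io.FileInputStream, java.io.FileOutputStream,
//   java.io.InputStream, java.io.OutputStream, java.util.List,
//   org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings,
//   org.docx4j.fonts.IdentityPlusMapper, org.docx4j.fonts.Mapper,
//   org.docx4j.fonts.PhysicalFont, org.docx4j.fonts.PhysicalFonts,
//   org.docx4j.model.structure.SectionWrapper,
//   org.docx4j.openpackaging.packages.WordprocessingMLPackage

long start = System.currentTimeMillis();

// try-with-resources (Java 7+) closes both streams even if the conversion fails
try (InputStream is = new FileInputStream(new File("test.docx"));
     OutputStream out = new FileOutputStream(new File("test.pdf"))) {

    // 1) Load the DOCX into a WordprocessingMLPackage
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);

    // If the header and body overlap in the PDF, enlarge the header extent of every section
    List<SectionWrapper> sections = wordMLPackage.getDocumentModel().getSections();
    System.out.println("Number of sections: " + sections.size());
    for (SectionWrapper section : sections) {
        section.getPageDimensions().setHeaderExtent(3000);
    }

    // Optional: map a font used in the document ("Algerian")
    // to a physical font installed on this machine ("Comic Sans MS")
    Mapper fontMapper = new IdentityPlusMapper();
    PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS");
    fontMapper.getFontMappings().put("Algerian", font);
    wordMLPackage.setFontMapper(fontMapper);

    // 2) Prepare the PDF settings
    PdfSettings pdfSettings = new PdfSettings();

    // 3) Convert the WordprocessingMLPackage to PDF via XSL-FO
    org.docx4j.convert.out.pdf.PdfConversion conversion =
            new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
    conversion.output(out, pdfSettings);

    System.err.println("Time taken to generate PDF: " + (System.currentTimeMillis() - start) + " ms");
} catch (Throwable e) {
    // Catches everything, including missing-class errors caused by absent JARs
    e.printStackTrace();
}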


Step 4: Now run the Java program; the PDF (test.pdf) will be generated from your document file (test.docx).
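
Before running the conversion, it can help to confirm that test.docx really is a DOCX package. A .docx file is a ZIP archive, and a non-empty ZIP file normally begins with the bytes "PK\3\4", so a file that is actually HTML or an old binary .doc can be spotted before docx4j tries to load it. The following is only a minimal sketch using the JDK alone; the class name DocxCheck and the helper looksLikeDocx are made up for this illustration and are not part of docx4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical pre-flight helper; not part of docx4j.
final class DocxCheck {

    // Local file header signature found at the start of a non-empty ZIP file: "PK\3\4"
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Returns true if the file starts with the ZIP signature.
    // This is a cheap sanity check only; it does not validate the DOCX contents.
    static boolean looksLikeDocx(Path file) throws IOException {
        byte[] head = new byte[ZIP_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int off = 0;
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                off += n;
            }
        }
        return Arrays.equals(head, ZIP_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in this post
        Path input = Paths.get(args.length > 0 ? args[0] : "test.docx");
        if (!Files.isRegularFile(input)) {
            System.out.println(input + " does not exist");
        } else if (looksLikeDocx(input)) {
            System.out.println(input + " looks like a ZIP/DOCX package");
        } else {
            System.out.println(input + " is not a ZIP package, so it cannot be a valid .docx");
        }
    }
}
```

If the check fails, the usual cause is a file that merely carries the .docx extension, for example a web page saved under that name; opening it in Word and saving it again as "Word Document (.docx)" should produce a real package.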

28 comments:

  1. Isn't it much easier to use a web-based app for the conversion process? I have been using GroupDocs Conversion for some time now and it is quite simple and provides an embed code to use without your web-page.

  2. I think you forgot to mention the required JAR files.

  3. You should try Aspose.Words for Java API also for converting word docs to pdf and to many other formats.

  6. I tried the provided code for converting Word to PDF, including all the required JARs, but got some exceptions and errors (see my next post for the errors). Please help me with this.

  7. log4j:WARN No appenders could be found for logger (org.docx4j.utils.ResourceUtils).
    log4j:WARN Please initialize the log4j system properly.
    18 [main] INFO org.docx4j.utils.Log4jConfigurator - Since your log4j configuration (if any) was not found, docx4j has configured log4j automatically.
    37 [main] WARN org.docx4j.XmlUtils - Using default SAXParserFactory: null
    294 [main] INFO org.docx4j.jaxb.Context - JAXB: RI not present. Trying Java 6 implementation.
    295 [main] INFO org.docx4j.jaxb.Context - JAXB: Using Java 6 implementation.
    295 [main] INFO org.docx4j.jaxb.Context - loading Context jc
    4160 [main] INFO org.docx4j.jaxb.Context - loaded com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl .. loading others ..
    4294 [main] INFO org.docx4j.jaxb.Context - .. others loaded ..
    4303 [main] WARN org.docx4j.jaxb.JaxbValidationEventHandler - [(non)FATAL_ERROR] : unexpected element (uri:"", local:"html"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}p
    4303 [main] INFO org.docx4j.jaxb.JaxbValidationEventHandler - continuing (with possible element/attribute loss)
    4303 [main] ERROR org.docx4j.openpackaging.packages.OpcPackage - javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"html"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
    org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't load xml from stream
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:238)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:210)
    at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:184)
    at asd.main(asd.java:25)
    Caused by: javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"html"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.SAXConnector.startElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver.scanRootElementHook(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)






  8. I got an exception java.lang.NoClassDefFoundError:

  9. May I know where you are getting the NoClassDefFoundError exception?

  10. I got the following error when I tried to run this on JBoss 6.1. The same code works fine on JBoss 4.0.
    13:04:34,423 ERROR [org.docx4j.utils.ResourceUtils] Couldn't get resource: docx4j.properties
    13:04:34,438 ERROR [org.docx4j.Docx4jProperties] Error reading docx4j.properties: java.lang.NullPointerException
    at org.docx4j.utils.ResourceUtils.getResource(ResourceUtils.java:45) [docx4j-2.7.1.jar:]

    I tried the latest docx4j JARs (i.e. 3.1 and 3.2), but that didn't work for me.

  11. Very good post, bro. Thanks, it is very useful for me.

