PDF FILES TRANSLATION - José Henrique Lamensdorf





There is a new way to translate publications with sophisticated or complex layouts, provided they are available as PDF files. It does not require a Desktop Publishing operator who specializes in the software package originally used to create them, and who doesn’t necessarily understand the target language(s).

Below is an explanation of the concept of PDF files translation.

PDF (Portable Document Format), developed by Adobe in 1993, is a very handy format that indeed became a standard in the marketplace. The major features behind its success were:
    • Files are very compact, making their distribution easier, either on physical media (disks) or over the Internet.
    • The same PDF file may be viewed or printed regardless of the computer or operating system used.
    • Software to view and print PDF files is available for virtually any operating system, and it is all free.
    • Any computer program capable of printing to a PostScript printer (PostScript was also developed by Adobe) can generate a PDF file.
    • If the file has been properly generated, with its fonts embedded, the user’s computer doesn’t need to have the fonts originally used to create the document in order to display it exactly like the original.

As a result, a great many companies began publishing their catalogs, manuals, brochures, and other formerly printed publications in PDF format. Besides saving an enormous amount of paper, which benefits the environment, this made updating much easier and faster.

However, when it comes to translating them, it isn’t that simple. Such publications are usually developed using DTP (DeskTop Publishing) software, typically PageMaker, InDesign, FrameMaker, QuarkXPress, and others. Except for PageMaker and InDesign, which are “father and son”, mastering each of them requires learning from scratch; i.e. each one is new territory for someone used to another DTP package.
As if that were not enough, each one saves files in its own proprietary format, incompatible with the others, and none of the few converters available between them works well. What they all have in common is the ability to generate PDF files.

It is worth noting that, from the translation standpoint (and others as well), there are two types of PDF files. One is the “software-generated”, “distilled” (as Adobe calls it), or “live” type, whose text is editable and therefore translatable. The other is the scanned or “dead” type, where each printed page has been converted into an image.

To make it crystal clear: in a generated PDF, the letter “O” is an actual character “O” with certain attributes (font, size, bold, italic, underline, etc.). In a scanned PDF, the letter “O” is simply a circle or an oval somewhere in a picture that takes up the entire page.

The conventional process for translating such publications is complex. Let’s call the Desktop Publisher a DTPer, for short. I am both a translator and a DTPer; however, in the latter role I only use PageMaker.

Generally, the traditional process comprises the following steps:
    • The DTPer extracts the text from the original file and sends it to the Translator as a table, so as to know which original segment corresponds to which piece of the translation.
    • The Translator types the translation in another column, preserving the original table format as well as the formatting of certain words (i.e. which ones should be in italics, bold, or underlined), and sends the table back to the DTPer.
    • The DTPer carefully copies and pastes each block of text, one by one, into its proper place in the original file. Then, if the text has changed in length, the DTPer makes the necessary adjustments. Next, the DTPer distills a PDF file and sends it to the Translator.
    • The Translator carefully reviews the publication, looking for missing, surplus, or misplaced text, wrong diacritics due to incompatible fonts, hyphenation mistakes, and other problems. The Translator then prepares a list of corrections, either in a separate file or as annotations on the PDF itself.
    • A game of ping-pong begins between the Translator (or Reviewer) and the DTPer, which only ends when the list of corrections is empty.

It is obvious why both translators and translation agencies shy away from PDF files. Moreover, since good DTPers generally prefer creating new publications, they only take such translation jobs to fill idle time. Either way, it’s a nuisance for everyone involved.

Recent technology has brought us a new way to translate PDF files. One day I came across a program named Infix, which allowed someone with DTP experience to edit PDF files with relative ease. At the very least, it was much better than going back to the source files and finding someone able to handle that specific DTP program’s proprietary file format. I don’t know whether I was the only one to do so, but I wrote to the Infix developers suggesting that they adapt the software for the PDF translation market. That’s how Infix became a PDF translation tool.

So what is the workflow using Infix?

In my case, my DTP experience with PageMaker made PDF files created with any other application accessible; I wouldn’t have to buy and learn InDesign, Quark, FrameMaker, or second-tier packages such as Microsoft Publisher, PagePlus, or Scribus. Even the most complex publications created with Microsoft Word (and converted into PDF) would no longer make it so hard to preserve the layout after translation. So yes, a translator like me, with experience in any DTP software (and Microsoft Word is not DTP software), can single-handedly deliver flawlessly laid-out translated PDF publications. After all, the translator will not be creating a new layout, only preserving the existing one.

The workflow with Infix looks like this:
    • Analyze the document for the fonts used, and be prepared to obtain them or find equivalents.
    • Export all text from the PDF into an XML or TXT file. This process leaves tags in both the PDF and the TXT/XML files, so that the text can be imported back while preserving its formatting (i.e. font, size, color, bold, italic, etc.) as well as the position of each text block.
    • Using another program (which may be Microsoft Word), translate the TXT/XML file, keeping all tags intact. Check spelling, grammar, etc.
    • Back in Infix Pro, import the translated TXT/XML file. At this point, the program will ask about fonts that were only partially embedded (as subsets) in the original PDF file and therefore lack some of the characters used in the translation. Be ready to replace them.
    • Carefully fine-tune the entire PDF layout, one text block at a time. Some effects (such as shadowed or glowing text) may have to be eliminated, or re-created as graphics with some other software. Some tables, especially if fonts are replaced with different ones, may need individual adjustments to each cell.
    • Check for pictures (i.e. graphic files) containing embedded text, and translate and edit them as needed.
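The step most prone to silent damage above is translating the exported file while keeping every tag intact. As a minimal sketch, here is a Python check that compares the sequence of tags in the original export with the one in the translation. It assumes the tags look like XML-style markup such as `<b>` or `<p style="...">`; Infix’s actual tag syntax may differ, and the function names are hypothetical, chosen only for illustration.

```python
import re
from itertools import zip_longest
from typing import Optional

# ASSUMPTION: formatting tags in the exported file are XML-style markup
# such as <b>, </b> or <p style="...">. Infix's real tag syntax may differ.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def extract_tags(text: str) -> list[str]:
    """Return every tag in the order in which it appears in the text."""
    return TAG_PATTERN.findall(text)


def tag_mismatches(original: str, translated: str) -> list[tuple[int, Optional[str], Optional[str]]]:
    """List (index, original_tag, translated_tag) wherever the tag sequences differ.

    None on either side means that file ran out of tags before the other.
    An empty list means every tag survived translation, in the same order.
    """
    mismatches = []
    pairs = zip_longest(extract_tags(original), extract_tags(translated))
    for index, (orig_tag, trans_tag) in enumerate(pairs):
        if orig_tag != trans_tag:
            mismatches.append((index, orig_tag, trans_tag))
    return mismatches


if __name__ == "__main__":
    source = "<b>Product catalog</b> <i>Price list</i>"
    target = "<b>Catálogo de produtos</b> <i>Lista de preços"
    for index, orig_tag, trans_tag in tag_mismatches(source, target):
        print(f"tag #{index}: expected {orig_tag!r}, found {trans_tag!r}")
```

Here the translator accidentally dropped the closing `</i>`, so the check reports tag #3 as missing. Comparing tags position by position is deliberately simple: one dropped tag in the middle shifts every later pair, so the first reported mismatch is the one to fix before importing the file back.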

The translated PDF will then be ready, with considerable savings in time and cost. Infix’s learning curve is about as steep as that of any high-end DTP program. Its great advantage is that it works on the output they all have in common: PDF.

If you have PDF files that you need translated between English and Portuguese, you are welcome to e-mail me using the e-mail button on the left.
