Google Translator Toolkit (GTT)

The current edition of the Translation Journal (http://accurapid.com/journal/49google.htm) has carried an interesting review of the Google Translator Toolkit, a tool aimed at translators that is currently all the rage on translation and localization sites. Many players have already recommended gaining a thorough understanding of the tool before applying it, especially to professional translations. We have picked some excerpts from the TJ as an introductory review of this product, with a view to initiating debate in the Tamarind Linguists' Forum online.

Google Translator Toolkit is free, and the reviewer expects it to find a good response in the semi-professional or non-professional translation market. But first things first. Here is what GTT is. It presents you with a rather well-designed front-end that allows you to do one of two things: you can either upload a file (in HTML, Word .doc [not .docx!], OpenOffice .odt, .txt, or RTF), or you can specify a URL and the corresponding HTML page will be uploaded. When you select the file, you also need to select the language pair (the only available source language right now is English (!), but there is a choice of 48 target languages, including double-byte and bidirectional languages). Then you need to choose whether this is a "shared" translation, i.e., whether you are using and contributing to a large anonymous translation memory, or whether you would like to upload your own translation memory in TMX format. Here you can also upload or define a glossary. Should you choose to upload a glossary, it needs to be in CSV format with a strictly defined pattern, but it can have fields for part of speech or definition in addition to source and target. When you have made those choices, the upload happens and your original file is displayed in the left pane of a split window (unless you choose horizontal panels in the View menu). On the right side you can see a pretranslated version of the file. The pretranslated material comes first from the translation memory (or memories).
If nothing is found there (and this is likely, since at this point the memories don't seem to be particularly content-laden), a machine translation with Google Translate is performed. (If you choose to forgo the machine translation step, you can select Pre-fill with source text instead of machine translation under Settings.) Before you start the actual translation, you should click the Show toolkit button at the top of the window. This opens a new pane at the bottom of the window that contains four tabs: Translation Search Results (TM hits), Computer Translation (results from Google Translate), Glossary (hits from your glossary; the number in parentheses indicates whether there are any hits and, if so, how many), and Dictionary (this opens a search box in which you can enter single terms for lookup; no idea where that content comes from). Depending on which translation segment you highlight in the Target pane, the different tabs show the corresponding content, if any is found. The file is more or less displayed in WYSIWYG format, meaning that it appears the way you would see it in a browser or MS Word. Only when you select an individual translation segment (as in normal TEnTs, this typically corresponds to a sentence) is the WYSIWYG view of that segment replaced with a text-only view in which inline codes are displayed as numeric codes in curly brackets ({1}). Déjà vu, déjà vu! Unfortunately, it is not possible to enter the numeric codes with keyboard shortcuts. You will either have to use the ones that were automatically entered during pretranslation and translate around them, or you can highlight the phrase that is surrounded by codes, select Insert HTML tags, select the ones you need, and they will be placed around the phrase (I'm not sure why it says "HTML tags" regardless of the source format).
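The review does not say how GTT numbers its inline codes internally, so the following Python sketch only illustrates the general technique behind a display like {1}: inline tags are masked as numbered placeholders so the translator works around them, and the original tags are restored afterwards. The function names `protect_tags` and `restore_tags` are hypothetical, and giving each opening and closing tag its own number is an assumption, not GTT's documented behaviour.

```python
import re

# Matches any inline HTML/XML-style tag such as <b>, </b>, <a href="...">, <br/>.
TAG_RE = re.compile(r"<[^<>]+>")


def protect_tags(segment):
    """Replace each inline tag with a numbered placeholder {1}, {2}, ...

    Returns the masked text and a mapping {number: original tag}.
    """
    mapping = {}

    def _sub(match):
        n = len(mapping) + 1
        mapping[n] = match.group(0)
        return "{%d}" % n

    return TAG_RE.sub(_sub, segment), mapping


def restore_tags(translated, mapping):
    """Put the original tags back wherever their placeholders survived."""

    def _sub(match):
        n = int(match.group(1))
        # Leave codes we never issued untouched rather than guessing.
        return mapping.get(n, match.group(0))

    return re.sub(r"\{(\d+)\}", _sub, translated)


masked, tags = protect_tags("Click <b>Save</b> to keep your changes.")
print(masked)  # Click {1}Save{2} to keep your changes.
print(restore_tags("Cliquez sur {1}Enregistrer{2}.", tags))  # Cliquez sur <b>Enregistrer</b>.
```

The translator only ever sees and moves the placeholders; because restoration is a simple lookup, a placeholder that is deleted in the target simply loses its tag, which is why tools warn about missing codes.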
Since not all text is displayed in the normal WYSIWYG view of a document (think of keywords in HTML files or footnotes in Word files), there is a list of such text at the bottom of the target pane under Hidden Text, where it can be translated separately. And while you are working on your translation (or before, or after), you can also invite others to participate in your translation/editing efforts by selecting Share > Invite people. All they need is some kind of Google account. When you have finished working on your file, you can download the translated file in the original format (and formatting). And be happy ever after. Or not? Let's look at the tool in more detail, beyond its functional aspects. The first thing I noticed when I started to look at it (with the original Google Translation Center still burned into my memory) is that it is much less oriented toward project management and process than what was originally planned. This new tool is aimed at the translator rather than the translation buyer. (Remember that glimpse of Google Translation Center that we all saw, where translation buyers could upload a document and then choose translators to work on it? None of that anymore.) The official Google blog says this: "At Google, we consider translation a key part of making information universally accessible to everyone around the world. While we think Google Translate, our automatic translation system, is pretty neat, sometimes machine translation could use a human touch. Yesterday, we launched Google Translator Toolkit, a powerful but easy-to-use editor that enables translators to bring that human touch to machine translation." The only argument that most of us would have with the statement that "sometimes machine translation could use a human touch" would be the "sometimes" (and that it's often more than just the "human touch" that's required).
But the statement still tells us quite a bit about the immediate intent: to add human expertise to Google's machine translation efforts. Remember, Google Translate, unlike engines such as FreeTranslation or Babel Fish, uses a statistical machine translation engine that relies on good bilingual data, and lots of it. And I don't think there is anything wrong with that if all I am doing is adding data from the translation I am currently working on. What I did not like was this: as mentioned above, it is possible to upload TMX translation memories, and as you upload you can select whether you want to use them yourself or share them with others, and that only seems fair. What I did not read before I uploaded a fairly large TM for testing purposes was this: "By submitting your content through the Service, you grant Google the permission to use your content permanently to promote, improve or offer the Services. If Google publicly displays any of the content you submitted through the Service, Google will display only portion(s) and not the entirety of the content at one time." This means that even though I can now go ahead and delete my TM (so that I and possibly other users no longer have access to it), Google will continue to use it. That strikes me as odd, to say the least. Also, as you know, TMX stands for Translation Memory eXchange, but it is not possible to get your TM (or your glossary) out of GTT once it's in there. The only thing you can get out is your translated file, though you can, of course, continue to use the TM and glossary content within GTT, but only there. To be fair, this was a problem in the first incarnation of Lingotek as well, and they fixed it rather quickly, but we're talking about Google here, and I'm sure they don't do things without thinking them through. (Though other large companies do, and must also humbly concede defeat, a point the full review returns to later.)
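Because TMX is an open XML format, any tool that can read it can reuse a memory, which is exactly why being unable to export it from GTT matters. The Python sketch below, using only the standard library, loads a tiny made-up TMX file into an exact-match lookup and then applies the pretranslation order the review describes (translation memory first, machine translation only when nothing is found). `load_tmx`, `pretranslate` and the `machine_translate` callback are hypothetical names; exact matching is a simplification, since real tools also score fuzzy matches.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, made-up TMX 1.4 file with one English-French translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_tmx(xml_text, src_lang, tgt_lang):
    """Build an exact-match dictionary {source segment: target segment}."""
    root = ET.fromstring(xml_text)
    memory = {}
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files used a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang", "")
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                segs[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            memory[segs[src_lang]] = segs[tgt_lang]
    return memory


def pretranslate(segment, memory, machine_translate):
    """Use a TM hit if there is one; otherwise fall back to MT."""
    if segment in memory:
        return memory[segment], "tm"
    return machine_translate(segment), "mt"


tm = load_tmx(SAMPLE_TMX, "en", "fr")


def fake_mt(text):
    # Stand-in for a real machine translation engine.
    return "[MT] " + text


print(pretranslate("Save the file.", tm, fake_mt))     # ('Enregistrez le fichier.', 'tm')
print(pretranslate("Close the window.", tm, fake_mt))  # ('[MT] Close the window.', 'mt')
```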
Aside from the thorny intellectual property issue, there are a number of other things that make me think that GTT in its current state is not for the professional translator (unlike what we saw, or imagined seeing, in the Google Translation Center):
• The only source language (and UI language) is English (though I imagine that this will change in no time).
• There is no way to alter the segmentation.
• There is no way to run concordance searches.
• There is no way to manage fuzziness, and it's not very clear how fuzziness is determined.
• There are no QA tools, and the only spell-checking is that of your browser.
• The number of supported file formats is much too limited (no XML, Office 2007, InDesign, etc.).
• You can only work on a single file at a time.
• There are no project management facilities.
• There is no link to professional translation formats aside from TMX (such as TBX, TTX, or bilingual Word documents).

Where I think GTT will have a good response is in the semi-professional or non-professional translation market. The Wikipedia/Knol feature that allows you to take an English page, translate it, and publish it right to the server will certainly play a role in this (though I admit that I could not make this feature work!).

Source: © Copyright Translation Journal and the Author 2009. URL: http://translationjournal.net/journal/48google.htm

Tamarind Linguists' Forum - Nairobi

20090831
