It’s not all head in the clouds!!

When the developer of the Word Cloud plugin for SDL Trados Studio first showed me the application he developed I was pretty impressed… mainly because it just looked so cool, but also because I could think of a couple of useful applications for it.

You could see at a glance what the content of the project was and how interesting it might be for you
It looks cool… or did I say that already?

Actually, if I’m honest, I never got any further than those two things, so this was a kind of “head in the clouds” app: an interesting experiment that seemed like a good idea, even if we weren’t 100% sure why. But there is one more interesting feature in this plugin: you can click on a word and it tells you how many times it occurs. This is interesting because if you had a termbase containing all these terms before you start, you would get consistent translations as well as AutoSuggest capability, which could be quite useful. So it’s a sort of term extraction tool without you really having to do any work at all. Well, it would be a term extraction tool if you could get the words out!

So I looked around to see where the file was that held this information after you created the word cloud and interestingly enough it’s saved in the same folder as the project… and even more interestingly it’s a nice simple XML file! In fact it looks like this:

<?xml version="1.0" encoding="utf-8"?>
<wordcloud>
  <hash>1</hash>
  <words>
    <word text="Advisor" count="43" />
    <word text="Company" count="30" />
    <word text="Construction" count="26" />
    <word text="flowers" count="1" />
  </words>
</wordcloud>

The actual file is much bigger than this, as you’d expect, but the format is repeated all the way through. Each word element holds the word in its text attribute and the number of occurrences in its count attribute. This is perfect… I guess you can see where I’m going with this now? I can create a simple XML filetype for Studio that can do two things:

Extract all the words for translation
Only extract those with a count above a certain value

I added the second point because you might not be interested in all the words that are not repeated… you might be, but you might not. So if I create the possibility to set this value in the filetype you can make your own mind up and the filetype becomes very useful. So, what two rules do I need for this, and do I even need two?
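Picking that value is easier if you can see how many words would survive each cut-off. Here is a minimal sketch in Python, assuming the word cloud file has exactly the format shown above. The function name `words_kept_per_threshold` and the cut-offs tried are just for illustration and have nothing to do with the plugin or Studio itself:

```python
import xml.etree.ElementTree as ET

# Sample data copied from the format shown above; a real file is much longer.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<wordcloud>
  <hash>1</hash>
  <words>
    <word text="Advisor" count="43" />
    <word text="Company" count="30" />
    <word text="Construction" count="26" />
    <word text="flowers" count="1" />
  </words>
</wordcloud>"""


def words_kept_per_threshold(xml_bytes, thresholds):
    """Return {threshold: number of words whose count is greater than it}."""
    root = ET.fromstring(xml_bytes)
    counts = [int(word.get("count", "0")) for word in root.iter("word")]
    return {t: sum(1 for c in counts if c > t) for t in thresholds}


# Hypothetical cut-offs, chosen only to show the effect on the sample.
summary = words_kept_per_threshold(SAMPLE, [0, 5, 25, 40])
for threshold, kept in summary.items():
    print(f"count > {threshold}: {kept} word(s) extracted")
```

With the four sample words, a cut-off of 0 keeps everything and a cut-off of 5 drops only "flowers".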

The first one to extract the words from the text attribute is simple enough:

//word/@text

So this just uses XPath to extract the words from the text attribute. To filter on the count I can add a condition to the same expression like this:

//word[@count>5]/@text

So this just means: only take the word elements that have a count value greater than 5 (you can change this to whatever you like… 0 if you want everything, or omit the count part from the rule), and then just take the contents of the text attribute. Simple, and now I have this as my filetype parser rules. I also added a //* rule out of habit to ensure nothing else is parsed… you don’t really need it at all in this case.
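If you want to check what a rule like //word[@count>5]/@text would pick up before building the filetype, you can mimic it outside Studio. This is only a sketch in Python, not how Studio evaluates the rule: the standard library's ElementTree supports a limited subset of XPath and cannot evaluate the numeric comparison, so the filter is written in plain Python instead. The function name `extract_terms` and its `min_count` parameter are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the word cloud file format shown earlier.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<wordcloud>
  <hash>1</hash>
  <words>
    <word text="Advisor" count="43" />
    <word text="Company" count="30" />
    <word text="Construction" count="26" />
    <word text="flowers" count="1" />
  </words>
</wordcloud>"""


def extract_terms(xml_bytes, min_count=5):
    """Mimic //word[@count>min_count]/@text using plain Python filtering."""
    root = ET.fromstring(xml_bytes)
    return [
        word.get("text")
        for word in root.iter("word")              # every <word> element
        if int(word.get("count", "0")) > min_count  # the [@count>N] part
    ]


terms = extract_terms(SAMPLE, min_count=5)
print(terms)
```

Calling `extract_terms(SAMPLE, min_count=0)` returns all four sample words, which matches the "0 if you want everything" case.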

Now what?

So now I translate the file in Studio. When I’ve done this, keeping in mind the end goal here is a termbase, I need to convert the SDLXLIFF to a TMX (unless the developer of the Glossary Converter adds SDLXLIFF to the convertible file formats ;-)) because from there I can easily create the termbase. Conveniently there is an app on the OpenExchange called SDLXliff2Tmx which will allow me to convert an SDLXLIFF to TMX with a drag and drop.

So the process is OpenExchange all the way… with a little translation along the way.

Wordcloud -> XML -> SDLXLIFF -> TMX -> SDLTB

Now if all that sounds complicated it’s not… here’s a short video to explain the process:

So the Wordcloud plugin has a surprising benefit after all… it’s also a free term extraction tool that takes no effort at all and allows you to create a Project termbase before you start your work. Very cool! One last thing… if you want more information on how to use XPath, or how to create custom XML filetypes, you can find a couple of articles here which might be useful:

More Regex? No, it’s time for something completely different.

Why do we need custom XML filetypes?
