Knowledge people, language people - with Jochen Hummel (updated with transcript!)
Meet Jochen Hummel, one of the founders of Trados. Even if you're not familiar with his name, you have almost certainly heard of the piece of software that he helped create and that many translators rely upon each and every day to get their work done. Jochen and I talk about how Trados came to be, what he's been up to since selling Trados to SDL and what the future holds for language technology. Take a listen!
- Maria Pia Montoro interviews Jochen Hummel
- Long term memories: Trados and TM turn 20
- Iko Knyphausen
- Colloquium on Principles and Practices of Translation and Interpretation in the Multilingual European Union, February 11-12, 2015
- Second Life | Metaversum | Twinity
- What's the difference between taxonomies and ontologies?
- What Europe needs isn’t just a Digital Single Market, but a Multilingual one
- The Imperative of a Multilingual Digital Single Market
- Introduction to Coreon: Knowledge meets language
- Multilingual provision is cheaper than English-only
- EU Commissioner calls multilingualism business hurdle, advocates language tech
- LTI Cloud
- Seth Grimes
- Bonus: Jochen and the Berlin Affordable Art Gallery
- Music: Feeding Pigeons, by Podington Bear
- Photo: Captured from YouTube
Transcript of the episode
You're listening to episode 26 of LangFM, the podcast about people at the intersection of language and technology.
Before I introduce today's guest, I have great news. LangFM was nominated for best podcast about interpreting at the 2016 ProZ Community Choice Awards, for the second time in a row. And, also for the second time in a row, LangFM actually won first prize. This year, I'm sharing the honours with Verónica Gutiérrez, my Mexican colleague and host of TerpWise. Check it out at http://www.terpwise.com. A big thank you to all of you who have nominated or voted in the ProZ awards, thank you to all the guests who agreed to come on the show and thanks to you, yes you, the listener, of course.
Now, on with the show. My guest today is Jochen Hummel. Even if you're not familiar with his name, you have almost certainly heard of the piece of software that he helped create and that many translators rely upon each and every day to get their work done. I'm speaking of Trados, of course. Jochen and I talk about how Trados came to be, what he's been up to since selling Trados to SDL and what the future holds for language technology. Take a listen!
First of all, I would like to talk a little bit about your personal background, so maybe you can tell us about where you're from and how you got into the world of languages and technology.
OK. Well, then I have to go way back. Because I've been around in this industry for a while. I'm originally from Stuttgart, Germany. The first company I founded was actually Trados. At that time, IBM was starting to localise their products - probably one of the first, if not the first IT company going into mainstream localisation. They were looking for people with some computer background and some language background to help them localise their products. I didn't really have a good language background. I spent some time in the US, so I spoke some English. I knew a bit about computers, and so that's how I got into this whole field.
So the language skills were basically, well, not self-taught, but from your time in the US. But the technology skills, did you teach them to yourself? Because this was pretty much early days in terms of computers and technology in general.
Correct. I never had the pleasure of studying informatics or software engineering, so yes, indeed, I'm an autodidact and self-taught, which was possible at that time. With some good books and with a lot of patience and the will to fight your way through, you could kind of teach yourself how to program.
In terms of programming, we're talking about what? BASIC, Pascal? For the nerds among us…
At that time, the very first one was BASIC. Then very quickly Pascal. You're right: that was the time. At IBM, of course, I learned how to program on mainframe machines. Actually, I learned quite a bit at IBM. We're talking about the mid-eighties; they had many technologies at that time which are standard today but back then were very, very advanced. They already had email, for example, and they had tagging languages for tagging documents, a sort of early XML/HTML. A lot of stuff that was very advanced at the time.
How did it work at IBM - were you a consultant or an employee of IBM? What was the relationship?
I started as a contractor, as a freelancer, and then very quickly founded Trados out of that, together with a school buddy, Iko Knyphausen. IBM was one of the very first companies to make use of computers in translation. That's where I learned the concept of translation memories and terminology management systems. They were running on very expensive hardware, and of course it was impossible for normal translators to buy and use this technology. That's where we thought: this could also be done on a PC, at affordable cost, and that's how the whole idea for founding and creating Trados started.
So you saw there was a need or maybe a potential need for translators using this new technology and then that's how you started Trados?
That's where I got to know the concept of these technologies. We thought: Hey, that's a cool technology, that's a great way of doing things. That's how we got into the development. Trados started as a service company, but then we very quickly got into developing software in that field. That's how the whole thing started.
I think the first applications were MultiTerm and Translator's Workbench. Is that correct?
MultiTerm being a tool to manage terminology which, I think, is especially important for big companies because they want to have or need to have consistent terminology. Translator's Workbench, if I'm not mistaken, is the translation memory tool. For those who've never heard of translation memory: basically, the more you translate, the more it helps you keep your work consistent. If you've translated something before, it will recognize that and give you that as a suggestion. Is that a fair summary?
That's a fair summary, yes. To put it in one sentence: Never translate the same thing twice. We didn't invent the concept, but we made it commercially accessible for normal translators.
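The "never translate the same thing twice" idea can be sketched in a few lines of Python. This is a toy illustration only - the example sentences, the `lookup` helper and the 75% threshold are invented here, and real tools like Trados use far more sophisticated segment matching:

```python
from difflib import SequenceMatcher

# A toy translation memory: previously translated source/target pairs.
# (Illustrative data only - not from the episode.)
memory = {
    "Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern.",
    "Open the File menu.": "Öffnen Sie das Menü Datei.",
}

def lookup(segment, threshold=0.75):
    """Return the best fuzzy match and its translation, or None."""
    best_source = max(
        memory,
        key=lambda source: SequenceMatcher(None, segment, source).ratio(),
    )
    score = SequenceMatcher(None, segment, best_source).ratio()
    if score >= threshold:
        return best_source, memory[best_source], round(score, 2)
    return None

# An exact repetition comes back as a 100% match...
print(lookup("Click the Save button."))
# ...and a near-repetition comes back as a "fuzzy" match
# that the translator can adapt instead of retranslating.
print(lookup("Click the Cancel button."))
```

The suggestion quality rises as the memory grows, which is exactly why the tool becomes more useful the more you translate.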
I'm taking this from a Wikipedia article (anybody who's interested can read it themselves) - I think you had a few big clients who helped you very much in the beginning. Microsoft were among them, and the European Commission, which of course has a huge need for consistent terminology and translation.
When we won Microsoft and the European Commission, Trados was already quite an established business. That helped us to get to the next stage. In 1997, we won both customers. But even before that, we were quite established among freelance translators and industry businesses.
I'm just trying to imagine what that was like. You were a company already, maybe not a huge one. Did Microsoft and the Commission approach you, saying “We heard you have this great product, we'd like to use it”. Or did you pitch it to them? How did that work?
It was 1997. By that time, we were already fully established, so we were doing our marketing the way we could. We were already going to CeBIT, for example, the big computer trade show in Germany, which was really cool at the time. We went to conferences, we were exhibiting in different places. We were already well known by then, and a deal with the Commission means a very long sales cycle - we had been pitching and presenting and talking to these people for years already. As for Microsoft, their supply base had been using Trados for years, and they figured out that it was the only third-party tool that was mission-critical for their release process. You're right, at the time we were still a fairly small company, and that kind of scared them. They thought: oops, what happens if somebody buys Trados and then takes the tool off the market or does something different with it? Then we're exposed when releasing our products. That's when they actually bought a share in Trados, to have a foot in the door.
There's one question I've always been curious about as an interpreter. I dealt a little bit with language technology when I was at university and I was always wondering if you ever had any plans to release a product that was more geared towards interpreters, maybe a simplified version of MultiTerm for glossaries?
Yeah, I mean, we talked to interpreters and these requirements did come up. But it was never so clear-cut. There were so many other things to do in our traditional market, it never really materialized. I know that some interpreters were using MultiTerm to prepare for certain gigs, but we never coupled it with voice or more advanced things to support interpreters.
It's interesting to see that it was at least on your radar. In 2005, Trados was acquired by SDL. Did you stay on in the company or did you move on to other adventures after that?
I stayed there for a couple of months to help the integration. But after six months or so, I left the company.
So now we have 2016. Can you give us a high-level overview of what you did in the meantime? And then we can talk a little bit about your current projects.
After Trados was sold, I stayed on with SDL for six months or so to help with the integration. But then it became pretty clear that there was no real future there for me. I did something very, very different, because in the Trados transaction I had a pretty tough non-compete. For quite a while, I couldn't really continue in that space. And having worked at Trados for more than 20 years, I was also ready to do something new. I went into something totally different: online social gaming. I don't know if you remember that product which was pretty hyped at the time, Second Life, a virtual world? We wanted to build a kind of Second Life product, but not set in a fantasy world - rather in a real setting, a virtual world which is actually a copy of the real world, to do all kinds of more practical things. I founded that business in 2006; it was called Metaversum, and the product was called Twinity. Maybe, technically, it was a bit too early; it was quite a challenge to get this done. In 2008 or 2009, we couldn't really close a financing round and passed the business on to an Australian company, which is now running it. And then I came back into the space of language, with Esteam and Coreon, two companies (Esteam I joined and Coreon I co-founded) which I'm running today.
The idea behind Coreon is to fuse knowledge management and language management. (I did my homework there!) Can you tell us about what that means in practice? What do you offer, what's the thing you're working on?
Well, there are two different worlds: the world of knowledge and the world of language. You would think that they should overlap quite a bit, but as a matter of fact they don't. People dealing with knowledge go to different conferences, use different products and talk to different people than the language people. When you process knowledge, typically, you try to relate things; you create a taxonomy or an ontology. Every node represents a certain meaning, and you create links between them. That's the classic way of processing knowledge and, of course, there's software to do that. Now, these nodes are language-independent because they represent a certain meaning, like a profession or a product or a tool or whatever. But as humans, when we talk about things, we use language to do so. In the end, what hangs off these nodes in the knowledge graph are labels - language, terminology. Language people, on the other hand, are mainly concerned with finding the right word - either for content creation or translation; they want to find the right word to express something, the right word in context or as used by your organization. If they use a good terminology database like MultiTerm, they also have a concept-oriented approach. They try to group things by meaning and then collect the different labels in the different languages. But these concepts are not really linked to each other. It would be very helpful for the language people if these concepts were structured in a graph, taxonomy or ontology, because that would allow them to explore language and to understand how these things relate to each other. For the knowledge people, it would be very helpful to use the resources which the language people create, because there's a lot of knowledge in terminology databases and other linguistic resources. Many of these knowledge graphs could be created automatically, or at least semi-automatically, by making use of the resources that language people are sitting on.
We want to bring these two things together and that's exactly what Coreon is doing.
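The concept-oriented model described here - language-independent nodes carrying multilingual labels, linked into a graph - can be sketched minimally in Python. The concept IDs, terms and relations below are invented for illustration and are not Coreon's actual data model:

```python
# A concept node is language-independent; labels attach per language,
# and edges relate concepts to each other (a tiny taxonomy).
# All identifiers and terms below are invented for illustration.
concepts = {
    "c1": {"labels": {"en": "footwear", "de": "Schuhwerk", "fr": "chaussures"}},
    "c2": {"labels": {"en": "sneaker", "de": "Turnschuh", "fr": "basket"}},
}
relations = [("c2", "broader", "c1")]  # a sneaker is a kind of footwear

def find_concept(term):
    """Map a term in any language back to its language-neutral concept."""
    for cid, node in concepts.items():
        if term in node["labels"].values():
            return cid
    return None

def label_for(cid, lang):
    """Return the label of a concept in the requested language."""
    return concepts[cid]["labels"].get(lang)

# A French search term resolves to a concept...
cid = find_concept("basket")
# ...whose German label can then drive search in a German catalogue.
print(cid, label_for(cid, "de"))  # → c2 Turnschuh
```

Because the node, not the word, is the unit of meaning, the same graph structure (the `relations` list) serves every language at once - which is the fusion of knowledge management and language management the episode describes.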
It actually sounds rather complicated. It's clear what you're trying to do but I imagine it to be quite complicated. Do you use technology for that, like machine translation or something like that? How do you approach it?
You're right, it's not simple. That's the challenge for us: first developing the product, then rolling it out and explaining and establishing the concept. Whenever you bring two worlds together, both sides have to learn a bit. They have to learn about the benefits. On the other hand, that's exactly what Coreon is doing: take something which could potentially be complex and make it very easy, visualize things in a very easy way. It allows you to search and browse and explore databases in a way you otherwise could never do. We also support people in creating these resources. For many purposes, you have to create these resources anyhow, in one way or the other. A tool like Coreon makes that much easier.
I think that ties in nicely with the next topic I wanted to talk about: multilingualism in Europe and the Digital Single Market…
Absolutely. That's one of the areas where Coreon could be a tool which helps a lot to enable cross-border interoperability, cross-border e-commerce.
There's a big push going on at the moment for a Digital Single Market, an initiative by the European Commission which covers a lot of things: geo-blocking, electronic payments and just making the Single Market more fit for the digital age. We've all seen those problems: trying to order something online in another country is often more difficult than ordering something from the US, strangely enough. That's also what you said just now, and I think an example you like to use is online shopping. If you're looking for new shoes or a camera, that's very difficult to do across languages and across countries, because there's no easy way to search and get lots of results from different countries. It may be cheaper somewhere else, or you may be able to get a better product somewhere else - it's very complicated, apparently.
Let's say you have an e-commerce company in Germany, and now you want to sell to your neighbouring countries. The first thing which comes to mind for people is: okay, we have to translate our website. Yes, you have to do this; you have to translate what I call static content. Your website needs to be translated. And then, of course, people think: but I also need to translate my product catalogue, the dynamic content. A bit more difficult. An online catalogue is probably more dynamic, changes more often than a printed catalogue. But both - translating static and dynamic content - are, in a way, just a different medium. It has been done already for decades, and there are companies out there who can and will do this for you. It's difficult and expensive, but in the end you will find people who will do this for you.
But before people come to your website, they have to find you. You're probably familiar with concepts like search engine optimization and search engine marketing. You need to make sure that you're found on the internet. If you want to reach your French customers, as a German company, they will use French terms for searching for offers in a certain category. Your search engine optimization, search engine marketing, needs to work in other languages, too. You need to be able to manage the terms and your content not only in one language, but in many languages. Once people have found your website, they will start to search for products, and again: they might not search exactly for the terms your translator has used, but maybe for other terms. People often name things in a colloquial way and not necessarily the way which is correct, so you want them to find the right products in your catalogue. So here, you're very quickly in an area where you have to categorize your knowledge, your company terminology, not only translate it, but somehow make sure that things like search engine optimization and search engine marketing, product search, that all these things also work in other languages.
Where do you see this going forward? Is it going to be almost everything machine translation, done automatically or by machines? Or will human translators and human experts still play a big role? Do you see a sort of synergy between the two? What's your take on where things are heading at the moment?
It will be a mixture. What I often find with people who praise certain solutions - in the academic area, they always try to push one particular concept - but when you have a complex task, you always use many tools and somehow tie them smartly together. It will always be a mixture of everything, depending on the resources you have at hand, the money you have at hand, the size of the problem and so on. What is important is that companies have their knowledge and the knowledge about their markets clear and straight, that they understand it, have access to it, and have it in many languages. That's an effort, but it can be done. In the end, with often only 5,000 or 10,000 concepts, you already have the knowledge of a company covered, and you can have that in 20, 30 or 40 languages. Once you have this, you can decide where to use machine translation, where you need computer-assisted translation, what quality levels you need. For many things, machine translation is absolutely enough. Then you can pick and choose and select the tool which is best for a given task.
In one of your blog posts, you also mention current trends like big data and machine learning and so on. Are you working with that already for the projects you have going on at the moment? Does that play a role?
It's trendy, yes. People have different ideas about what Big Data means. I would rather call it smart data, because the databases we're working with are not that big. It's not billions of records, rather millions. But there's a lot of knowledge in that data. I gave you the example before, with e-commerce, where you have to enable your customers to search for your products; customers have to be able to find you. But on the other hand, in e-commerce, customers also talk back to you. Customers give you feedback, for example. Customers do talk about your products and about your site on social media. They do it in their own language. It's very important for companies to mine this knowledge. If you're a smaller company, there won't be billions of tweets about your products, but maybe thousands or tens of thousands, in different languages, maybe hundreds of thousands. Getting the information out of this data in different languages can be crucial for the success of your company. And again: it requires that you have your knowledge well mapped out; then you can use data analytics, sentiment analysis, different tools to answer the questions and get the insights you want about certain markets, certain products.
There's one thing I've been wondering about; maybe you have an idea. Multilingualism in Europe has a rich tradition, obviously, but many people say it's an obstacle, that we should maybe all switch to English-only, that would make things so much easier. On the other side, we have the US, which is basically unilingual. Maybe Spanish is playing a bigger role, but it's basically English. Interestingly, many language technology innovations come from the US. Google Translate is a very obvious example. I'm wondering, do you have an idea - is it because Europe is so fragmented and there are no big players? Why do you think that is?
I can give you a long answer to that. There are many reasons. First of all, you're right. I would say it is a barrier, definitely; it doesn't make it easier that we have this many languages in Europe. For European businesses, especially when you come from a smaller country, it's hard to grow. Growing abroad, growing into different cultures, different languages, is not easy. The American company, of course, has the advantage that in a fairly homogeneous market with one language and one set of rules, they can address 300 million well-funded customers and can very quickly grow to a certain size. But every problem, every challenge is at the same time also an opportunity. The American business will grow very quickly to 200, 250 million before they go abroad. By that time, monolingualism or the American way of doing things is in the DNA of the management team but also in the DNA of the product. They typically have a harder time on the world market, on the global market, whereas the European company is faced with this much earlier. In Europe, especially within a professional environment, most people have a certain command of English, so you can often get away without translating your product. But the global market is definitely multilingual. Countries like China, Japan, Brazil, Russia will never accept another language. So you can turn the challenge into an asset, because then you're already better prepared. I wish the European politicians would understand this instead of trying to avoid this thorny subject or even suggesting that people should rather learn English. They should take this as an opportunity, because being able to deal with languages and with foreign cultures is a huge asset. The products you mentioned, like Google Translate - many of these technologies were actually originally developed in Europe, or it's Europeans who run these projects.
Many technologies and many ideas and many of the people who are working on this actually originate from here, no surprise. The Americans, of course, are very smart in hiring, in buying the right companies. And there's another reason why there's such a renaissance recently in natural language processing in the US, and that's the whole area of artificial intelligence. I think artificial intelligence is really on the brink of becoming mainstream and changing the world, changing the way we're working. You won't communicate with AI via a keyboard. You will talk to intelligent cars and robots and so on. And the knowledge AI needs to process: I always say half of all knowledge is probably textual, the other half is numerical. Whatever is textual is language, and whatever is language is multilingual - in order to build smart AI systems, you need a lot of natural language processing. Big companies like IBM and Google and others are investing heavily in these fields. If Europe isn't careful, we might, in a couple of years, end up licensing language technology from the US, although most of these language technologies originated here. That's the problem: we should be the world leaders in this, but unfortunately there's not the drive, the push, the awareness of what kind of opportunity this would be for Europe in IT.
I absolutely agree with you, very interesting answers there. Coming towards the end, I would like to talk a little bit about the other initiatives you're working on: LT Innovate, LT Observe and others. Recently, at the end of May, there was a conference here in Brussels, the LTI Summit, which is a very interesting conference format, actually. People who have a need in terms of language or language technology can get together with vendors or companies who have solutions in the field of language technology. Can you tell us how you started that?
Disclosure: I'm chairman of LT Innovate. It's an industry association of about 200 language technology companies. Whatever deals with language technology is quite fragmented, because it's a complex technology. Very often, you have a company which takes care of language technology in one particular language, so it's also fragmented along the language dimension. As a result, many LT companies are rather small and highly specialized in, let's say, sentiment analysis in Italian or speech recognition in Finnish or what have you. Many small, highly specialized companies with cool technology. But they have a hard time selling the technology, especially today with so much noise out there in the market. In order to overcome that fragmentation, we thought it would be good to have an industry association so we can speak with one voice, collaborate and do these kinds of things. Also, when you put yourself into the shoes of a buyer: when you want a product which supports all European languages, you end up licensing very different technologies. You might license the Scandinavian languages from company A, the Baltic languages from company B and German from yet another company; you have to put a lot of technologies together. All these companies have different license agreements, different business models; they run on different operating systems, they have different APIs and so on. It's a nightmare. Very often, the technologies are actually there - it's not that the technology doesn't exist, but it's not really accessible, not accessible technically and sometimes also not business-wise. You end up with 20, 30 different license agreements. What we try to do is create a software-as-a-service platform where companies who want to support multiple languages can discover, test, prototype and then also license different technologies, with one contract, under one platform and under one business model.
That's of course beneficial to the people who use language technology, but also for the people who are offering it. Most of these companies, as I said, are rather small and typically have a hard time selling or offering their products with a software-as-a-service model. That's what the LTI Cloud is about and what LT Innovate is trying to push, so that it becomes easier to make use of language technology.
Just a final question: During the recent conference here in Brussels or during earlier iterations, there are these buyer challenges. Was there ever a project or a proposal, a challenge where you said wow, that's really innovative, I would never have thought of that or I would never have expected something like that?
Yeah. There were very interesting challenges; some of them are about being able to talk to your car, but not only in five languages - in all European languages. That's more about how to scale a rather well-known technology. But we had one buyer challenge from a company that wanted to automate the way people work with law firms or lawyers. You come to the office, you talk to the lawyer, you describe your case, and then they need to understand what's going on. They need to start a search and find similar cases and related laws, find contracts which match. That's a lot of searching, and depending on how well your lawyer understands and knows where to search, it takes more or less time. Imagine you would use, while you're talking, automatic speech recognition, coupled with a multilingual knowledge system like Coreon, which starts to understand where in the knowledge graph we actually are and what kind of legal problem we are addressing. And while you're talking, it already links to reference cases and reference documents, documenting the whole thing as you speak, and in this way automates the interaction between a customer - or somebody looking for legal advice - and the lawyer. Many of these things are cross-border: as long as you stay within one legal context, it's monolingual, but when you go across the border, you have cross-border issues, cross-border contracts, and then you very quickly get into cross-language search, into machine translation, into all these technologies which we intend to offer under the LTI Cloud.
Interesting, that's a very nice project.
That's an example. And there you already see, if you want to build such a thing, these technologies are all there! You don't really have to invent language technology, most of the stuff is there. It's more a question of how do you put it together, what kind of database or knowledge system you have below so that you can process the information, and then how do you plug all this together so that it works smoothly and that it supports many languages if you are in a European context.
I think that the next conference is in November, LT Accelerate. Is that something related?
Yes, the next conference is in November, LT Accelerate. It has a slightly different focus. We do this together with a US expert on sentiment analysis, his name is Seth Grimes. He has an important blog and runs a sentiment analytics conference in the US. We do this together and it's more focused on small business, more solution-focused and focuses more on multilingual text analytics, you could say. A very cool conference, I really like that one, every time I go there, I get so many good ideas and very interesting pitches. It's really worth coming.
If people wanted to know more about these different projects you are working on, what's the best way for them to find out more?
If they want to know more about what I'm doing at Coreon and Esteam, our websites are a good place to go, and they can subscribe to our newsletter and also read our blog. When it comes to LT Innovate, we have a directory of all the companies there, and the LTI Cloud is now up and running. To make it easier to understand how these different technologies play together - because whenever you deal with language, it is complex; processing language is simply not easy - we have created so-called solution templates or solution architectures for the most important e-commerce use cases. We show how to do product search, how to do customer support, where you need voice, where you need machine translation, where you need terminology management. We show how these different components play together, explain it a bit and then link to companies who can take care of a specific technology or cover specific languages. The solutions on LITcloud.eu are a very good place to understand what can be done with language technology and how these different components interact. That would be a good first place to get started. And if you want to know more, well then, of course, you can drop me an email.
This has been episode 26 of the LangFM podcast, a conversation with Jochen Hummel of Trados and now Coreon fame. You can find more information and links in the show notes over at http://www.langfm.audio. I also recommend you listen to episode 21, where I talk to Australian-born and Brussels-based communication strategist Mathew Lowry about language technology and related topics. Again, that's http://www.langfm.audio. If you enjoy listening to my podcast, you should subscribe in iTunes or a podcast app of your choice, so you don't miss future episodes. And boy do I have great episodes coming up. While you're in iTunes, why not leave a rating or a review. That would be much appreciated and really helps other people find out about the podcast. Thank you - and talk to you soon, on LangFM.