Thoughts on Naver Papago with MT Engineer Lucy Park

At the Asia-Pacific Translation and Interpreting Forum—FIT (APTIF9)1 in Seoul, South Korea, in July, I listened to a very dynamic talk by Lucy Park, an engineer for the Korean machine translation (MT) engine Naver Papago. (By the way, Naver is the primary search engine in Korea, and Papago—“parrot” in Esperanto—is its MT engine.)

While Google and Microsoft are often talked about as the dominant players in the market of easily accessible, generic MT engines, the proliferation of generic local engines is intriguing. Some European languages have seen the rise of DeepL2, Russian has Yandex.Translate3, Chinese has Baidu Fanyi4 (among others), and Korean has Naver Papago5. It’s no coincidence that, with the exception of DeepL, all of these MT engines are owned by the leading domestic search engines, which, by the very nature of their business, specialize in locating data and, in the case of MT, reusing that data for their own purposes. (DeepL arguably belongs in the same league, because locating and collecting data is the core of its earlier business, Linguee6.)

We all know that MT is hungry for lots of data, and I find it intriguing that providing reasonably successful MT suggestions requires more than mere access to data. Local expertise does actually play a role. This is particularly true for Korean, which is especially challenging because of its honorific system (i.e., the nuanced forms of addressing others in relation to oneself). Many translators at APTIF9 assured me that this and other difficulties are handled much more seamlessly by Papago than by its global competitors.

I asked Lucy whether she would be willing to be interviewed to explain this more thoroughly, and here’s the result.

Jost: Naver Papago is a neural MT engine that translates between Korean and 14 other languages (English, Simplified and Traditional Chinese, French, German, Hindi, Indonesian, Italian, Japanese, Portuguese, Russian, Spanish, Thai, and Vietnamese). It’s very popular in Korea and, according to my impression from those attending APTIF9, widely used by professional translators. It seems to me that it’s not particularly widely used outside Korea, but correct me if I’m wrong. I had a number of Korean translators tell me that they like Papago because it captures the subtleties of Korean much better than your big competitors, such as Google and Microsoft. Is that your impression also, and can you tell me why that is?

Lucy: The Papago MT team mainly focuses on translations involving Korean, English, and other Asian languages such as Chinese and Japanese. So it’s quite flattering to hear that your fellow professional translators are complimenting Papago on its Korean translations, because that’s actually what we’re trying to do best.

We put in a lot of effort to enhance the quality for Korean translations by acquiring as much bilingual and monolingual data as possible and cleaning the data thoroughly. There are also efforts on the modeling side. For example, last January we launched a new feature where the user can control the honorific level of Korean when it’s the target language.7 We also do various experiments that leverage the characteristics of languages or their corresponding scripts.

J: What about other language combinations without Korean? I (very unscientifically) looked at some translations between English and German, and the quality was not up to the standard of Google or DeepL. The first question is: Are those pivot translations with Korean as the pivot language? The second question is (or maybe it’s more like an assumption): Is your goal really to focus on Korean in combination with other languages and leave the other language combinations up to the “big boys (and girls)”?

L: When we pivot languages, we pivot with the best model available to our team, which doesn’t necessarily have to be Korean.

We focus mainly on Korean because Papago was first created in Korea, so it has many Korean users. That’s why we currently focus on Chinese, Japanese, and Korean characters and English. If our users start telling us they need English>German translation to work better, we’ll try our best to enhance performance for those language pairs as well.

J: I noticed that you’re not offering any ready-made app for using Papago’s MT engine within computer-assisted translation tools. Why? Is the market of Korean translators so small, or is the professional translation market not your focus?

L: We’ve been monitoring the professional translation market, and we think it’s an appealing market. However, we currently have no plans to approach this market because of other priorities, though this situation could easily change in the future.

J: Your big competitors are committing to not using any text that’s submitted for translation if one uses the paid application programming interface (API), which professional translators typically do. Do you have anything like that, or do you process the data that’s being submitted for further training purposes?

L: We take user privacy seriously. We are currently not using API logs for analysis and/or model improvement, but if we do decide to use any data in the future, it will always be with permission granted by the user.

J: According to your experience with Papago—and this brings us back to the first question—do you think there is a market for language-specific, generic MT engines?

L: Yes. According to my experience, the global translation market is full of diverse needs. Some need fast translation with “okay-ish” quality (as opposed to slow and high quality), while others need domain-specific translation (for medicine, shopping, etc.), all in different situations and for different translation requirements. It’s difficult for one vendor to excel in all areas and fulfill all those needs. Likewise, if the demand is large enough for translation between several languages, there are also opportunities if you can attract that market toward yourself.

J: Is there anything else you would like to share about Papago’s plans in the near future?

L: We’re planning to launch offline translation models soon, and many more features to come! Please keep an eye on us.

Remember, if you have any ideas and/or suggestions regarding helpful resources or tools you would like to see featured, please e-mail Jost Zetzsche at jzetzsche@internationalwriters.com.

Notes
  1. Asia-Pacific Translation and Interpreting Forum—FIT, www.aptif9.org.
  2. DeepL, www.deepl.com.
  3. Yandex.Translate, https://translate.yandex.com.
  4. Baidu Fanyi, https://fanyi.baidu.com.
  5. Naver Papago, https://papago.naver.com.
  6. Linguee, https://www.linguee.com.
  7. See youtu.be/S5x9hS1hzXo.

Lucy Park is a research scientist working on machine translation models at Papago, the MT engine of Naver, South Korea’s largest search engine. She received her PhD in data mining from Seoul National University in 2016, where she pursued various studies on text mining in the fields of manufacturing, political science, and multimedia. Her research interests include multilingual text mining, representation learning, and the evaluation of machine learning algorithms. Contact: lucy.park@navercorp.com.

Jost Zetzsche is chair of ATA’s Translation and Interpreting Resources Committee. He is the author of Translation Matters, a collection of 81 essays about translators and translation technology. Contact: jzetzsche@internationalwriters.com.

The ATA Chronicle © 2019 All rights reserved.