TRANSLATION OF TEXTS USING MACHINE TRANSLATION SYSTEMS: LINGUISTIC ANALYSIS OF EXAMPLES FROM LITERARY AND SCIENTIFIC TEXTS


Current machine translation (MT) approaches include rule-based
machine translation (RBMT), statistical or corpus-based
machine translation (SMT or CBMT), hybrid machine translation
(HMT) (Nikolaev I.S., 2017: 158) and neural machine
translation (NMT). The architecture and algorithms of an MT
system determine how it solves Natural Language Processing
(NLP) tasks. To study the results of current state-of-the-art
approaches, we performed a linguistic analysis of two different
texts translated by three Russian MT systems: PROMT
Translator, Yandex.Translator and the ETAP-4 MT system.
Text 1 is an excerpt from the science-fiction novel “The
Hitchhiker’s Guide to the Galaxy” by Douglas Adams
(Adams D., 1979: 12), and Text 2 is an excerpt from the
academic article “Asymptotically Optimal Contextual Bandit
Algorithm Using Hierarchical Structures” by Mohammadreza
Mohaghegh Neyshabouri et al. (Neyshabouri M.M., 2018: 923).
The observed results show how effective the different
approaches employed by the examined MT systems are.
Text 1: “Far out in the uncharted backwaters of the unfashionable
end of the western spiral arm of the Galaxy lies a small
unregarded yellow sun. Orbiting this at a distance of roughly
ninety-two million miles is an utterly insignificant little blue
green planet whose ape-descended life forms are so amazingly
primitive that they still think digital watches are a pretty neat
idea”.
Text 2: “We propose an online algorithm for sequential learning
in the contextual multiarmed bandit setting. Our approach is to
partition the context space and, then, optimally combine all of the
possible mappings between the partition regions and the set of
bandit arms in a data-driven manner. We show that in our
approach, the best mapping is able to approximate the best arm
selection policy to any desired degree under mild Lipschitz
conditions”.
PROMT Translator is an RBMT system. This approach relies on
bilingual dictionaries and on the morphological and syntactic
rules of the source language (SL) and target language (TL). The
translation of Text 1 shows accurate parsing but some incorrect
morphological affixes. In Text 2, by contrast, the academic style
is rendered incorrectly because specific scientific terms are
missing. These errors are therefore explained by the lack of word
forms and expressions in the RBMT system’s dictionaries.
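
The dictionary-coverage explanation can be illustrated with a toy sketch. The dictionary entries, the helper names and the sample input are hypothetical and do not reflect PROMT's actual resources or rules; the sketch only shows that a dictionary lookup has nothing to fall back on when a word form or term is missing.

```python
# Toy rule-based lookup: a bilingual dictionary plus one crude morphology rule.
# All entries and names are illustrative assumptions, not PROMT data.
TOY_DICTIONARY = {
    "planet": "планета",
    "small": "маленький",
    "sun": "солнце",
}


def translate_word(word: str) -> tuple[str, bool]:
    """Return (translation, found); unknown words are passed through unchanged."""
    lemma = word.lower()
    if lemma in TOY_DICTIONARY:
        return TOY_DICTIONARY[lemma], True
    # Crude rule: strip a plural "-s" and retry. The plural marking is lost,
    # which mimics an incorrect morphological form in the output.
    if lemma.endswith("s") and lemma[:-1] in TOY_DICTIONARY:
        return TOY_DICTIONARY[lemma[:-1]], True
    return word, False


def translate(text: str) -> tuple[list[str], list[str]]:
    """Translate word by word and collect the words the dictionary lacks."""
    output, missing = [], []
    for word in text.split():
        translated, found = translate_word(word)
        output.append(translated)
        if not found:
            missing.append(word)
    return output, missing


output, missing = translate("small planets bandit")
```

In this sketch "planets" comes out in the singular because the only available entry is the base form, and "bandit" stays untranslated because it has no entry at all.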
Yandex.Translator is an HMT system that combines SMT and
NMT approaches. First, the system is trained on massive
parallel corpora to build probabilistic language models and
translation models of the SL and TL. During translation, the
system uses the CatBoost algorithm to evaluate the context of
sentences and chooses the most probable translation model for
each part of the text. The linguistic analysis of the two
translations revealed errors such as incorrect syntax and wrong
translations of scientific terms. Consequently, in MT systems
based on a statistical approach, the main cause of mistakes is
the absence of the corresponding translation models in the
parallel training corpora.
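
The effect of missing training data on a statistical system can be sketched with the classical noisy-channel scoring, in which a candidate is ranked by the product of a language-model probability and a translation-model probability. The candidate labels and all probabilities below are invented for illustration; they are not Yandex data, and the sketch does not model CatBoost or the neural component.

```python
import math

# Hypothetical candidates for one scientific term, as (lm, tm) pairs:
#   lm = probability of the candidate under the TL language model,
#   tm = probability of the SL phrase given the candidate (translation model).
# The domain-specific rendering never occurred in the toy parallel corpus,
# so its translation-model probability is zero.
CANDIDATES = {
    "general-domain rendering": (0.020, 0.60),
    "domain-specific term": (0.015, 0.0),
}


def log_score(lm: float, tm: float) -> float:
    """Log of lm * tm; a zero probability yields negative infinity."""
    if lm <= 0.0 or tm <= 0.0:
        return float("-inf")
    return math.log(lm) + math.log(tm)


def best_translation(candidates: dict[str, tuple[float, float]]) -> str:
    """Pick the candidate with the highest noisy-channel score."""
    return max(candidates, key=lambda name: log_score(*candidates[name]))


best = best_translation(CANDIDATES)
```

Under these assumed numbers the general-domain rendering wins, not because it is the better translation, but because the correct term was never observed in training.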
The ETAP-4 MT system uses lexical functions to systematize
semantic relationships between lexical units of the SL and TL. In
addition, this MT system is trained on syntactically annotated
texts from the Russian National Corpus. Several kinds of mistakes
are detected in its translations: extra words, incorrect
morphological affixes, incorrect syntax and untranslated words.
They result from the system’s inability to correctly recognize
several expressions because of the complexity of the original
grammatical structures.
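
How lexical functions can link collocations across languages may be sketched as follows. The tables use textbook examples of the functions Magn (intensifier) and Oper1 (support verb); they are illustrative assumptions, not entries from ETAP-4's actual dictionaries, and the function name is hypothetical.

```python
# Lexical-function tables: (function, keyword) -> collocate.
# Illustrative textbook entries only, not ETAP-4 dictionary data.
LF_EN = {
    ("Magn", "rain"): "heavy",
    ("Oper1", "decision"): "make",
}
LF_RU = {
    ("Magn", "дождь"): "сильный",
    ("Oper1", "решение"): "принимать",
}
KEYWORDS_EN_RU = {"rain": "дождь", "decision": "решение"}


def translate_collocation(collocate: str, keyword: str):
    """Map an English collocation to Russian lemmas via a shared lexical function.

    Returns (collocate, keyword) as Russian lemmas, or None when the
    collocation is not covered by the tables.
    """
    # Identify which lexical function links the collocate to its keyword.
    function = next(
        (f for (f, k), value in LF_EN.items() if k == keyword and value == collocate),
        None,
    )
    if function is None:
        return None
    target_keyword = KEYWORDS_EN_RU.get(keyword)
    target_collocate = LF_RU.get((function, target_keyword))
    if target_collocate is None:
        return None
    return target_collocate, target_keyword


heavy_rain = translate_collocation("heavy", "rain")
unknown = translate_collocation("strong", "rain")
```

The collocate is chosen by applying the same function to the target keyword, so "heavy rain" is not translated word for word; a collocation outside the tables is simply not recognized, which parallels the unrecognized expressions noted above.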
Thus, the linguistic analysis of the MT systems’ translations
shows that the quality of machine translation largely depends on
choosing an appropriate machine translation system, a choice
that should rest on the user’s knowledge of the methods the
various systems use to solve NLP problems.
Keywords: machine translation, NLP, analysis, machine
translation systems

Jane M. Zakovorotnaya
Southern Federal University
Rostov-on-Don, Russia
e-mail: haylin65@yandex.ru

Adams D. 1979. The Hitchhiker’s Guide to the Galaxy, Del Rey
Books, UK, 163 pp.
ETAP-4 MT System 2019. The automatic online translation
service. http://cl.iitp.ru/ru/etap4 [Accessed February 22 2019].
Neyshabouri M. M., Gokcesu K., Gokcesu H., Ozkan H. 2018.
Asymptotically Optimal Contextual Bandit Algorithm Using
Hierarchical Structures. IEEE Transactions on Neural Networks
and Learning Systems 30 (3): 923-937.
Nikolaev I. S., Mitrenina O.V., Lando T.M. 2017. Applied and
computational linguistics, 2 ed. Moscow, 316 pp.
Promt Translator 2019. The automatic online translation service.
https://www.translate.ru [Accessed February 20 2019].
Yandex.Translator 2019. The automatic online translation
service. https://translate.yandex.ru [Accessed February 21 2019].