A Review and evaluation of Machine Translation methods for Lumasaaba

Peter Nabende

Cite: Nabende P. A Review and evaluation of Machine Translation methods for Lumasaaba. J. Digit. Sci. 2(1), 3–17 (2020). https://doi.org/10.33847/2686-8296.2.1_1

Abstract. Natural Language Processing for under-resourced languages is now a mainstream research area. However, few studies address Natural Language Processing applications for the many indigenous East African languages. To help close this knowledge gap, this paper evaluates well-established machine translation methods for Lumasaaba, a heavily under-resourced indigenous East African language. Specifically, we review the most common machine translation methods, both rule-based and data-driven, in the context of Lumasaaba. We then apply a state-of-the-art data-driven machine translation method to learn models that automate translation between Lumasaaba and English from a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation architecture yields consistently better BLEU scores than models based on recurrent neural networks. Moreover, the automatically generated translations can be comprehended to a reasonable extent and usually relate to the source-language input.

Keywords: machine translation, Lumasaaba, data-driven machine translation, phrase-based statistical machine translation, neural machine translation.

Published online 29.05.2020