Peter Nabende
Cite: Nabende P. A Review and evaluation of Machine Translation methods for Lumasaaba. J. Digit. Sci. 2(1), 3–17 (2020). https://doi.org/10.33847/2686-8296.2.1_1
Abstract. Natural Language Processing (NLP) for under-resourced languages is now a mainstream research area. However, studies on NLP applications remain limited for many indigenous East African languages. To help close this knowledge gap, this paper evaluates how well-established machine translation methods apply to Lumasaaba, a heavily under-resourced indigenous East African language. Specifically, we review the most common machine translation methods in the context of Lumasaaba, covering both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models that automate translation between Lumasaaba and English from a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation (NMT) architecture consistently achieves better BLEU scores than recurrent neural network (RNN) based models. Moreover, the automatically generated translations are comprehensible to a reasonable extent and usually relate to the source-language input.
Keywords: machine translation, Lumasaaba, data-driven machine translation, phrase-based statistical machine translation, neural machine translation.
Published online 29.05.2020