MACHINE NATURAL LANGUAGE TRANSLATION USING WIKIPEDIA AS A PARALLEL CORPUS: A FOCUS ON SWAHILI

  • Type: Project
  • Department: Management
  • Project ID: MGT0079
  • Access Fee: ₦5,000 ($14)
  • Pages: 114 Pages
  • Format: Microsoft Word

The government of Kenya has undertaken an ambitious project to equip children with laptops and tablets to facilitate electronic learning. This initiative can only bear fruit if content relevant to the children's studies is available. Many Kenyans learn English as a second language; their mother tongue is Swahili or another African language. Content in Swahili therefore supports a better and deeper understanding of the subject matter. Much of the academic content already exists, albeit in English, so translating it is the most practical way of obtaining content in Swahili. This is especially so since the content is not new and only needs to be translated.

Machine translation engines such as Microsoft Translator and Google Translate already exist and aim to make this task easier. However, African languages are generally under-represented in these engines. The translations they produce into African languages are comparatively inaccurate, and even more so for academic content. This can largely be attributed to the data used to train the engines. Many machine translation engines rely on corpora made up of phrases found in everyday speech, in which academic terms are not adequately represented.

Wikipedia, an online crowd-sourced encyclopedia, offers a good source of data for translation work. This study shows that using Wikipedia as a corpus can provide a viable source of data for academic translations, particularly for African languages.

Therefore, this project modeled an English-to-Swahili translation engine that uses Wikipedia as its source of translation corpus data. To be clear, the study did not set out to create yet another translation engine, but to improve on, and complement, one small aspect of existing engines. The approach was to compare corresponding English and Swahili articles in Wikipedia and build a parallel corpus, which is then used to create a translation database. It is worth noting that Wikipedia on its own cannot provide a comprehensive data set for any machine translation engine. As a proof of concept, the model produces English-to-Swahili translations, and preliminary results are presented here. Further work is required to align the output more accurately and to combine it in a way that ensures fluency and accuracy.
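The abstract does not spell out how the articles are aligned or how the translation database is built, so the following is only a minimal sketch of the general idea, not the study's method. It assumes, purely for illustration, that sentences in paired articles can be matched by position when both articles have the same number of sentences, and that candidate word translations can be scored with the Dice coefficient over sentence co-occurrence. The article texts, the function names, and both heuristics are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy data: each tuple stands for an English article and its
# Swahili counterpart. The texts are invented for illustration only.
ARTICLE_PAIRS = [
    ("Water is a liquid. Plants need water.",
     "Maji ni kiowevu. Mimea inahitaji maji."),
    ("The sun is a star. Plants need light.",
     "Jua ni nyota. Mimea inahitaji nuru."),
]


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a naive splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def build_parallel_corpus(article_pairs):
    """Pair sentences by position; skip articles whose sentence counts differ."""
    corpus = []
    for en_text, sw_text in article_pairs:
        en_sents = split_sentences(en_text)
        sw_sents = split_sentences(sw_text)
        if len(en_sents) == len(sw_sents):
            corpus.extend(zip(en_sents, sw_sents))
    return corpus


def build_translation_table(corpus):
    """Map each English word to its best Swahili candidate by Dice score."""
    en_count, sw_count = Counter(), Counter()
    pair_count = defaultdict(Counter)
    for en_sent, sw_sent in corpus:
        en_words = set(tokenize(en_sent))
        sw_words = set(tokenize(sw_sent))
        en_count.update(en_words)
        sw_count.update(sw_words)
        for en_word in en_words:
            pair_count[en_word].update(sw_words)

    table = {}
    for en_word, candidates in pair_count.items():
        # Dice = 2 * co-occurrences / (English count + Swahili count).
        # Ties are broken alphabetically so the result is deterministic.
        table[en_word] = max(
            candidates,
            key=lambda sw: (
                2 * candidates[sw] / (en_count[en_word] + sw_count[sw]),
                sw,
            ),
        )
    return table
```

Calling `build_translation_table(build_parallel_corpus(ARTICLE_PAIRS))` on the toy data maps `water` to `maji`. With so little data, several words tie and are resolved arbitrarily, which illustrates why a larger corpus and more careful alignment are needed for accurate output.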

This study was further motivated by the directive of the Communications Authority of Kenya that aims to have at least 60% of media content be local. Such content therefore needs to be translated into local languages for presentation. The study proposes a solution that can be scaled to learn and translate other local languages.

Finally, it is worth noting that Kenya, like many other developing countries, imports numerous products from foreign countries. Many of these products have labels and instructions written in foreign languages, especially English. This poses a potential threat to consumers who do not understand these languages, for example in the case of medical drugs.
