Trans. Internet-Zeitschrift für Kulturwissenschaften, 16. Nr., Mai 2006
Presley A. Ifukor (Institute of Cognitive Science, University of Osnabrueck)
"Things should be made as simple as possible but no simpler." - Albert Einstein
The field of Natural Language Processing (NLP) is one of the areas of technological advancement that demonstrates man’s ability to engineer artificially intelligent systems to codify, analyse, transform and process human languages creatively and dynamically. NLP represents the attempts of modern man to universalise the ideals of the western world in creating a ‘global language’, at least from the utility point of view. To what extent are African languages catered for in the global digital village? Are the African socio-cultural ideals incorporated into this globalisation drive? In this paper, we assess several fundamental issues and propose innovative ideas for African languages’ text technology, as they affect the African culture and linguistic consciousness.
It is imperative to commence this discussion by recalling d’Orville’s (1996:484) prediction a decade ago:
Information and knowledge...have strategic consequences for the global power constellation....The one country that can best lead the information revolution will be more powerful than any other. Comparative advantages are henceforth expressed in the ability of countries to collect, process, tie together, act upon, and disseminate information through communications, information processing technologies and complex information systems. (emphasis in italics, this writer’s)
Proper information dissemination serves as the bedrock of holistic knowledge acquisition. There is hardly any aspect of life in the 21st century that is not either positively or negatively affected by information and communication technology (ICT). The means and media of information transfer, therefore, become crucial from the perspective of the ‘source’ as well as that of the ‘recipient’. What is being transmitted constitutes another issue entirely. For the sake of this discourse, we shall examine the roles played by both the ‘source’ and the ‘recipient’ of text (or data) processing technologies. By text technology, we refer to the means of making spoken languages written and thereafter making written languages machine-readable (or computer-readable). It involves encoding and processing the character sets of any human language. Beyond the teething problems of font and hardware, we shall explore the influence of the technologically-empowered West and what Africans can do to benefit from altruistic gestures from the West. The goal is that Africans should come up with initiatives on how to effectively utilise what is available while striving to evolve a truly indigenous African text processing technology. A good starting point is "the transfer of cultural consciousness into a computer system, making the computer a natural extension of the society it serves" (Yacob, 2004: 1), but the ontologies should incorporate policies, principles, parameters and practices that cut across national, regional, cultural, linguistic and socio-economic boundaries.
It is a truism that he who pays the piper calls the tune. As Africans, the impacts of colonialism are still visible in every aspect of our national life. It is only human to guard against the recurrence of that ugly past in our history. While not underrating the educational and technological assistance Africa has received from the West so far, there is the danger of a new type of modern imperialism or colonialism operating through the same means of education and technology. One can call this ‘civilised colonialism’ or ‘digitised dominance’. Call it d(igital)-imperialism, d-colonialism, e(lectronic)-colonialism or e-imperialism: we are referring to one and the same set of ICT-related issues as they affect African languages and African cultural ideals. D-imperialism or e-colonialism includes the imposition of the digitised data processing techniques of one language on another (irrespective of clear language-internal peculiarities). According to Venkatesan & Nambiar (2003: 778, 779), "The history of several Asian and African nations attests to the effects of industrial colonialism and highlights the potential for information to become a tool in spawning a new breed of colonialism. ...The impact of e-colonialism can potentially be just as devastating as that of mercantile colonialism in the nineteenth century." The edge-of-the-wedge technology (Vijay 1999) of the World Wide Web is centred in the US. "Most data flows out of the US than flow in. Small wonder that Internet backbone operators reap the whirlwind by building more bandwidth to the US" (Vijay, 1999). Europe and the US have engaged each other on who should control data flow on the internet. But the fact remains that ultimate authority over what must or must not appear on the internet lies in the West.
Taylor (2000) aptly captures the nightmarish reality of typesetting some African languages. He classifies African languages into five groups according to the relative difficulty of handling the character sets of each language. Taylor (2000: 6-7) ranks the degrees of complexity from 1 to 5 (1 being the least complex and 5 the most complex).
Group 1 Languages: These languages employ the same Roman alphabet as most Indo-European languages. There are no accents marked on the letters. This means that these languages can be processed with existing fonts and software, and can be represented on the web. Examples of African languages in this category are Swahili and Somali. There is little or no difficulty processing these languages.
Group 2 Languages: Languages in this category require diacritics (i.e. accents over vowels) akin to some Indo-European languages such as French, Spanish or Portuguese. This means that they can be represented using existing standard fonts and software but the user has to be familiar with how to access the diacritics. Compared to Group 1 languages, Group 2 poses a slight learning difficulty. Tswana is one example of languages in this group.
Group 3 Languages: These are languages that use diacritics and sub-dots beneath vowels. Some nasal consonants are also marked with tonal accents. The difficulty stems from the unconventional use of conventional fonts. However, the combination of tonal marks and sub-dots cannot be effectively handled without special fonts and software. Examples include Yorùbá and Igbo.
Group 4 Languages: These are languages which clearly require a number of special characters that do not exist in the standard fonts of European languages. A popular example is the Hausa ‘hooked consonants’. Hausa is one African language that has been extensively represented on the web. Other examples of languages in this group are Twi and Krio.
Group 5 Languages: This group is the most difficult to handle because of the non-Latin nature of the characters. Standard European fonts are incompatible with these languages. Indeed, these are ‘special’ languages requiring ‘special’ solutions. Amharic, Arabic, Tigré and Tigrinya are examples of such languages.
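The special-font problem Taylor describes for Group 3 languages is, at bottom, a character-encoding problem, and Unicode's combining characters address it. A minimal Python sketch (the Yorùbá vowel ọ̀ is this writer's own illustration, not Taylor's) shows why Unicode normalisation matters: the same letter can be typed as different code-point sequences.

```python
import unicodedata

# The Yoruba vowel 'o with sub-dot and grave tone mark' can be typed as
# different code-point sequences; normalisation makes them comparable.
decomposed = "o\u0323\u0300"   # o + COMBINING DOT BELOW + COMBINING GRAVE ACCENT
precomposed = "\u1ECD\u0300"   # precomposed o-with-dot-below + COMBINING GRAVE ACCENT

print(decomposed == precomposed)   # the raw strings differ

nfc_a = unicodedata.normalize("NFC", decomposed)
nfc_b = unicodedata.normalize("NFC", precomposed)
print(nfc_a == nfc_b)              # after NFC normalisation they are identical
```

Any tool that searches, sorts or counts Group 3 text without normalising first will treat these visually identical spellings as different words.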
We find Taylor’s (2000) classification plausible. Beyond it, there are other hurdles that compound the problems of text technology in African languages. The major problem is the non-indigenous official languages of most African countries (thanks to the dividends of Euro-centric colonialism). As a result, literacy in African languages is still very low compared to proficiency in the official European languages. The non-standardisation of orthographies for some African languages does not help matters. The plethora of ICT materials in non-African languages, coupled with the high cost of training, discourages ‘willing’ patriotic Africans. We address possible ways of remedying these problems in the last section of this paper.
The efforts of several groups of Africans at home and in the Diaspora are worthy of commendation for the struggle to preserve African languages digitally and the attempts to nativise NLP. Osborn (2006) reports efforts in this direction. A few of these are briefly mentioned:
Improvisation of keyboard characters for some African languages (e.g. Tavultesoft’s Keyman program, African Languages Alternative initiative’s (ALT-I) Ibadan Yorùbá keyboard etc).
The presence of African languages on the internet (predominantly in Latin character sets).
Indigenous web browser in Luganda (cf. Otter, 2004).
Open source software for some South African languages (e.g. Zulu, Sepedi, Afrikaans).
Corpora projects in Swahili on Unix via the internet (cf. the Helsinki Corpus of Swahili (HCS)).
Ontologies provide the formal specifications for knowledge representation, intelligent information integration, and efficient information sharing practices between humans and machines as well as the possibility for re-usability of tools. Essentially, ontologies harmonise eclectic data models for homogenous applicability. According to Gruber (1995), an "ontology is an explicit specification of a conceptualization. ... A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose." For Uschold & Gruninger (1996), "ontologies enable shared understanding and communication between people with different needs and viewpoints arising from their particular contexts. ...An [explicit] ontology may take a variety of forms, but necessarily it will include a vocabulary of terms and some specification of their meaning (i.e. definitions)." Quoting the Shared Re-usable Knowledge Bases (SRKB) email list, Uschold & Gruninger (1996) summarise the nature and function of ontologies:
Ontologies are agreements about shared conceptualizations. Shared conceptualizations include conceptual frameworks for modelling domain knowledge; content-specific protocols for communication among inter-operating agents; and agreements about the representation of particular domain theories.
Adapting ICT to the African context, the ontologies should embed the philosophy of ethnocomputing. Ethnocomputing is an eclectic computing framework that incorporates a cultural perspective in the problem-solving methods, conceptual categories, structures, and models used to represent data or other computing practices (cf. Tedre et al., 2006). Such an approach will make the emerging tools useful to the largest possible number of researchers and complement the tools’ multilingual character. Broadly, the ontologies should weave together the following:
Relevance to the existing infrastructure;
Relevance to indigenous demands;
Relevance to local users; and
Relevance to the African culture and society.
At a more formal level, the design of ontologies should meet the following minimum requirements (cf. Gruber, 1995):
Clarity: In simple, clear, understandable natural language, ontologies should effectively communicate the intended meaning of defined terms.
Coherence: The concepts, axioms and definitions expressed in ontologies should be coherent and consistent with established theories and practices.
Extendibility: Ontologies should be designed to be dynamic and not static. That is, based on the existing vocabulary, ontologies should offer a conceptual foundation for accommodating anticipated tasks.
Minimal encoding bias: The design of ontologies should be as neutral as possible for implementing different representation systems and diverse styles of representation.
Minimal ontological commitment: Ontologies should make as few claims as possible about the world being modelled.
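As an illustration only (the terms and definitions below are this writer's hypothetical examples), these design criteria can be made concrete in a few lines of Python: an ontology is, at minimum, a shared vocabulary with explicit definitions that can be extended without silently contradicting existing entries.

```python
# Illustrative toy ontology: a vocabulary of terms with explicit
# natural-language definitions (clarity), which can grow over time
# (extendibility) but rejects conflicting redefinitions (coherence).
ontology = {
    "tone": "pitch pattern on a syllable that distinguishes word meanings",
    "diacritic": "mark added to a base letter, such as an accent or sub-dot",
}

def define(term, definition):
    """Extend the shared vocabulary; refuse conflicting redefinitions."""
    if term in ontology and ontology[term] != definition:
        raise ValueError(f"conflicting definition for {term!r}")
    ontology[term] = definition

define("orthography", "conventional writing system of a language")
print(sorted(ontology))   # the agreed vocabulary, now three terms
```

Real ontology languages are of course far richer than a dictionary of glosses, but the sketch shows the minimum an agreement between inter-operating tools must contain.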
The place of Unicode and XML in ICT applications for African languages cannot be overemphasised. "With Unicode, the information technology (IT) industry has replaced proliferating character sets with data stability, global interoperability and data exchange, simplified software, and reduced development costs" ( http://www.unicode.org/versions/Unicode4.1.0/ ). XML is a mark-up language which allows for ease of information transfer and exchange across diverse computer hardware, operating systems and applications. While Unicode does not satisfy all the current needs of data processing in African languages, it is still the best available. Developers and ICT researchers should represent their data in XML to facilitate re-usability and extendibility.
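A short Python sketch (the element names and the sample lexicon entry are this writer's hypothetical examples) illustrates the combination: Unicode supplies the characters, XML the portable, self-describing container.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon entry: Unicode supplies the Yoruba characters,
# XML the platform-neutral container for exchange between tools.
entry = ET.Element("entry", lang="yo")   # 'yo' is the ISO 639-1 code for Yoruba
ET.SubElement(entry, "form").text = "\u1ECDm\u1ECD"   # the Yoruba word 'omo' (child), with sub-dots
ET.SubElement(entry, "gloss", lang="en").text = "child"

xml_bytes = ET.tostring(entry, encoding="utf-8")   # bytes, including an XML declaration
print(xml_bytes.decode("utf-8"))
```

Because the serialisation is UTF-8 XML, the same file can be read back unchanged on any operating system or application that follows the two standards.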
Our proposed global digital data ontologies are those which integrate policies, principles, parameters and practices that cut across national, regional, cultural, linguistic and socio-economic boundaries. The practical side of our proposal is premised on Venkatesan & Nambiar’s (2003:779) stance that:
What is required are strategies to confront the issues presented by e-colonialism by bridging the technological divide created by the information revolution through improved technological access to the developing world.
We advocate strategies that have long-term benefits for the technological development of Africa with respect to data processing in African languages. Our approach follows the logic of ‘self-survival’ ethics, and our suggestions are correspondingly down-to-earth. The crux of our suggestions is that ‘charity should begin at home.’ We address issues from the grassroots. A wholesale importation of digital tools from the West leaves us perpetually at the mercy of the West. It is high time African governments and ICT researchers began to ‘look inwards’ and localise available ICT resources to satisfy indigenous yearnings. The dream of a digitally-developed Africa is the task of all. No one will build our house for us if we do not have a burning desire to change the status quo. Desire, of course, is not sufficient without the technical know-how. Interestingly, the traditional African worldview is neither capitalistic nor exploitative. The highly-skilled should educate or train the semi- or less-skilled, no matter how insignificant the effort may seem. After all, it is little drops of water that make an ocean. If the training chain continues, with time Africa will "arrive"!
There has to be genuine commitment and sincere (non-parasitic) support: commitment on the part of Africans and African governments, and support from the West. The following should be seriously considered as they apply to the local needs of individual African communities:
It should be obligatory for all undergraduate students in African universities to take a graded course in an African language as one of the requirements for graduation. We are aware that many universities in Nigeria already require undergraduates to take a course on General African Studies. Language is, perhaps, the most pragmatic vehicle for transmitting African cultural values from one generation to another. It beats one’s imagination to see Africans being groomed to communicate only in ‘borrowed’ languages.
Undergraduate students whose major course of study is Linguistics should spend at least one semester carrying out documentation research on a lesser-investigated African language (preferably not their mother tongue).
Undergraduate students of Indo-European languages and literature should be encouraged to take stipulated courses on African languages and linguistics.
Provisions should be made for all tertiary institutions’ students to have the ‘feel’ of data processing in a major African language.
Governments should provide incentives for instructors of African languages. If teachers of African languages are made to know that they are no less important than their counterparts who teach European languages, the former would be motivated to do more. In our view, dedicated teachers of African languages contribute more to the well-being and future of Africa.
Governments should officially (via legislation and committed enforcement) support and provide funding for research activities on African languages.
African languages should never be projected as playing second fiddle to Indo-European languages. No language is more important than another. However, literacy in indigenous languages alongside European languages should be encouraged and promoted.
There should be concerted efforts at the standardisation of orthographies (that is, in languages where more than one orthographic convention is presently in use).
Excellent performance in African languages by students should be rewarded.
More bilingual (or multilingual) literacy materials should be produced for the African general public.
It is only when these suggestions are accepted and the plans vigorously implemented that any talk about technological assistance from the West can really be meaningful and productive. Then such technical aids will have long-lasting impacts. At the global level, modern data processing tools will be able to cater for as many languages as possible if the following strategies are incorporated into the ontologies of digital resources:
Right from the outset (i.e. at the earliest planning stage), the envisioned digital tools and software should have a multilingual design in scope and application. For instance, it may be optimal to make the tools available in the following languages, for a start: English, French, Spanish, German, Chinese, Japanese, Hindi-Urdu and Swahili. We believe that ICT researchers in any part of the globe would be proficient in at least one of these languages. Thereafter, translation into other languages can be done.
The tools should be simple and easy to learn (ideally with the aid of tutorials and guided-examples).
In the process of developing software and data processing applications, the morphosyntactic peculiarities of African languages should be taken into consideration.
As far as possible and practicable the tools should be free and open source. Whenever payment is required, this should be subsidised for researchers in developing countries.
The design of the tools should allow for flexibility and an extensible mark-up language (XML) should be used.
The applications should be portable and platform-neutral.
The resources should be highly Unicode-compatible.
If possible, the tools can integrate a simple scripting language (either Python or Perl) for ease of data manipulation and transformation.
An active global collaborative research structure should be put in place for harmony of ideas and computing practices.
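As a concrete illustration of the scripting-language point above (the Yorùbá phrase below is this writer's toy example, not drawn from the sources), a few lines of Python suffice for a typical manipulation task: counting character frequencies in Unicode-normalised text.

```python
from collections import Counter
import unicodedata

# Toy Yoruba phrase, for illustration only. Normalising to NFC first
# ensures visually identical letters are counted as the same character.
text = unicodedata.normalize("NFC", "Ọmọ ọba ni mo jẹ́")

# Keep letters and combining marks (tone accents), drop spaces.
letters = [c for c in text.lower()
           if c.isalpha() or unicodedata.combining(c)]
freq = Counter(letters)
print(freq.most_common(3))
```

The same few lines generalise to frequency lists, spelling-variant detection and other corpus chores, which is why a built-in scripting layer lowers the entry barrier for local researchers.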
Perhaps there is no better justification for why attention should be focused on a digitally-equipped Africa other than that which Taylor (2000: preface) echoes: "Today, Black Africa is in urgent need of better means for transmitting vital information about health issues, agricultural techniques and other means of improving life in the African countryside and the rapidly-growing cities." When supported, African ICT researchers are up to the task. Microsoft Corporation is already doing something worthwhile in this regard but we solicit more of such assistance from other private corporations, governments and the international community.
© Presley A. Ifukor (Institute of Cognitive Science, University of Osnabrueck)
African Languages Alternative initiative (ALT-I) . URL: http://www.alt-i.org/home.htm Last accessed on 2006-05-21
d’Orville, Hans. "Tackling Information Poverty". In: Hans d’Orville (ed.) Beyond Freedom: Letters to Olusegun Obasanjo. New York: Irene Connors, Collage Press. 1996: 483 - 494.
Extensible Markup Language (XML) 1.0 (Third Edition). URL: http://www.w3.org/TR/REC-xml/ Last accessed on 2006-05-21
Gibbon, Dafydd, Inge Mertins & Roger Moore (Eds.) Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. Dordrecht: Kluwer Academic Publishers, 2000.
Gruber, Tom. "Towards Principles for the Design of Ontologies Used for Knowledge Sharing". In: International Journal of Human-Computer Studies, 43(5/6). 1995: 907 - 928.
Gust, Helmar, Presley Ifukor, Kai-Uwe Kühnberger, Danish Nadeem, Mario Negrello, Svetlana Polushkina, Udo Wächter & Carsten Wittenberg. Constructing an Analogical Intelligent Tutoring System, Technical Report, Institute of Cognitive Science, University of Osnabrueck, Germany. 2006.
Helsinki Corpus of Swahili (HCS). URL: http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Last accessed on 2006-05-21
Ifukor, Presley. "Enhancing Manpower Development with Computer Literacy and Cyber Skills". In: Ekiti Kopa, A Publication of the National Youth Service Corps (NYSC) Secretariat, Ekiti State, Nigeria. 2002: 31.
Ifukor, Presley. "Modelling the Mapping Mechanism in Metaphors". In: Journal of Cognitive Science 6 (2005): 21-44.
Ifukor, Presley. "Sociolinguistic Variation in Second Language Competence: English Prepositional Usage among Nigerian Bilinguals". In: M. Vliegen (ed.), Variation in Sprachtheorie und Spracherwerb. Frankfurt: Peter Lang Verlag. 2006: 133-144.
Microsoft Corporation . "Microsoft Enables Millions More to Experience Personal Computing Through Local Language Program." Microsoft Corporation March 16, 2004. URL: http://www.microsoft.com/presspass/press/2004/mar04/03-16LLPPR.mspx Last accessed on 2006-05-21 .
Osborn, Donald. "African Languages and Information and Communication Technologies: Literacy, Access, and the Future". In: John Mugane, John Hutchison & Dee Worman (Eds.) Selected Proceedings of the 35th Annual Conference on African Linguistics. Somerville, MA: Cascadilla Proceedings Project. 2006: 86-93.
Otter, Alistair. "Uganda gets indigenous language browser." Tectonic , Sept. 15, 2004.
URL: http://www.tectonic.co.za/view.php?action=view&id=342&topic=Linux Last accessed on 2006-05-21
Taylor, Conrad. Typesetting African Languages: Report of an Investigation. 2000. URL: http://www.ideography.co.uk/library/afrolingua.html Last accessed on 2006-05-21
Tedre, Matti, Erkki Sutinen, Esko Kähkönen & Piet Kommers. "Ethnocomputing: ICT in Social and Cultural Context". In: Communications of the ACM 49 (1), 2006: 126-130.
The Unicode Standard, Version 4.1.0 . URL: http://www.unicode.org/versions/Unicode4.1.0/ Last accessed on 2006-05-21
This Day . "Microsoft Endorses Due Process in IT." This Day (Lagos), Dec. 15, 2004. URL: http://www.mscenter.edu.cn/laputa/article/2004-12/1/16/10241.xml Last accessed on 2006-05-21
Uschold, Mike & Michael Gruninger. "Ontologies: Principles, Methods and Applications". In: Knowledge Engineering Review 11(2), 1996: 93 - 155.
Venkatesan, V. S. & Neetha Nambiar. "E-Colonialism - The New Challenge of the 21st Century". In: Mehdi Khosrow-Pour (ed.) Information Technology and Organizations: Trends, Issues, Challenges and Solutions. Philadelphia, USA: Idea Group Publishing. 2003: 778-779.
Vijay, Srinivas. "Who Serves the Net Game?". Computers Today, July 16-31, 1999.
Yacob, Daniel. "Localize or be Localized: An Assessment of Localization Frameworks". Paper presented at The International Symposium on ICT: Education and Application in Developing Countries, Addis Ababa, 19-21 October. 2004: 1-9.