Trans: Internet-Zeitschrift für Kulturwissenschaften, No. 6, September 1998

Some aspects of linguistic information (to be) extracted from the computerised Lexicographical-Terminological Database at the Foreign Language Centre of the Kossuth Lajos University

Ferenc Rovny (Debrecen)

Part 1.

The Software and the Complex Structure of the Computerised Lexicographical-Terminological Database

http://www.flc.klte.hu/cltd

Introduction:

The Computerised Lexicographical-Terminological Database (henceforth CLTD) was initiated as a subproject within the framework of a World Bank Project, namely "Catching up with (Western) European Higher Education Fund (CEF), Foreign Language Teaching Development Program, Hungary". The initial planning and implementation took place in the second half of 1993. The project itself was planned to be carried out in 1994 and 1995. However, for various reasons, the present version of the software was only developed in 1997.

The main idea behind the CLTD is that the prospective users of our database, mainly the lecturers and students of our university, need complex information, so our entries have a complex structure: the bilingual entries include both terminological and lexicographical data.

 

I. Overview of the Software:

After long deliberations we finally opted for an Intranet solution.

When planning the system we had the following main goals:

With these goals in mind, we decided to develop an application based on today's popular and well-understood Internet technologies. The data is stored on a Microsoft SQL Server. The application is built on the integrated Web server of Windows NT Server, supplemented by the Active Server Pages component of Internet Information Server 4.0, which provides server-side scripting features. Any up-to-date Web browser can serve as a client.

After this short overview of the environment let us focus on the application itself. It is very important (and at the same time most problematic, tiresome and time-consuming) to put the already existing entry forms (in Word for Windows 2.0, 6.0, 7.0 format with different "internal" versions [1.1, 1.12, 1.3, 1.5]) into the database. The new Web-style entry forms pose no such difficulty, as they are directly connected to the SQL Server database.

As users belonging to many different "categories" will access the system, it has been necessary to protect the data against unauthorised use. Our considerations in this regard have been the following:

All users have to identify themselves through the login process.

[Figure: Login screen]

They can then work within the framework of their allotted rights. We have tried to avoid using totally different entry forms, so the input and corrections (by the collector) and the checking/correcting/editing by the ESP lecturer and the specialist lecturer, respectively, are made with more or less similar entry forms. Entries can be made in ANY LANGUAGE PAIR (of course, only for the languages already existing in the database). Drop-down list boxes are provided wherever possible to make the collector's/user's work easier, so it is possible to choose from items that are already present in the database or are given as pre-set lists. The collector's/user's personal data fields have to be filled in only once (a feature not present in the Word-format form). In queries we can search on ALL the ATTRIBUTES of EACH ENTRY WORD, as can be expected of an up-to-date system.

The software is also capable of executing CONCORDANCE and COLLOCATION SEARCHES.

Besides the above-mentioned features, our software has been designed to allow MULTIMEDIA to be incorporated, that is, not only text but audio and visual information as well. (At the moment this part of the software is not yet functioning.)

 

II. Detailed Description of the Components of the CLTD Software and their Functions

The software developed for the Computerised Lexicographical-Terminological Database at the Foreign Language Centre, Kossuth Lajos University, Debrecen has three main parts, developed by three programmers under the guidance of the project's Software Development Team and the project manager.

A., Transferring the former "old" entry forms, filled in by students and stored in Word-style files, into the SQL database

To solve this task, a program called "Trash Collector" was created in the Visual Basic 5.0 development environment.

The program is capable of performing the following tasks:

B., WWW Administration, Input, Modification, "Lektoring" (Checking data for linguistic and scientific correctness), Language Versions, Transfer of Data.

This part of the software deals with the database structure of the system and the Web administration.

1., Administration via Internet.

It enables the system administrators (wherever they may be in the world) to change the contents of the data tables fully, e.g. to introduce new languages, special fields (domains) etc., or to modify or delete data (data.asp).

2., The input, modification and checking ("lektoring") of the entry form.

These functions are essentially performed by an active HTML document (urlap.asp), which makes later changes to the software easier to carry out. The system puts users into different user groups according to their allotted rights. Some user groups can only search the database; others can input and modify their own data; others can act as "lektors", i.e. they can check the entries for linguistic and scientific correctness, and some persons (lecturers) belonging to this group can even delete entries if necessary. During the input and modification of the entry forms the sample sentences are broken into words and some statistical examinations are also made. (In the case of the WWW-format entry forms this applies to the sample sentence and to a certain part of the sentences BEFORE and AFTER the sample sentence, if any, depending on the distance from the "keyword" (= headword), i.e. "the span".)
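The group-based rights described above can be sketched as follows. This is a minimal illustration in Python, not the project's ASP code: the group names (`reader`, `collector`, `lektor`, `lecturer`), the `Right` flags and the `can` helper are all hypothetical, chosen only to mirror the roles named in the text.

```python
from enum import Flag, auto


class Right(Flag):
    """Actions mentioned in the text; the names themselves are hypothetical."""
    SEARCH = auto()
    EDIT_OWN = auto()   # input and modify one's own entries
    CHECK = auto()      # "lektoring": linguistic and scientific checking
    DELETE = auto()     # cancel (delete) entries


# Hypothetical mapping of user groups to their allotted rights.
GROUP_RIGHTS = {
    "reader": Right.SEARCH,
    "collector": Right.SEARCH | Right.EDIT_OWN,
    "lektor": Right.SEARCH | Right.CHECK,
    "lecturer": Right.SEARCH | Right.CHECK | Right.DELETE,
}


def can(group: str, right: Right, *, owner: str = "", user: str = "") -> bool:
    """Return True if a member of `group` may perform `right`.

    EDIT_OWN additionally requires that the entry belongs to the user.
    Unknown groups have no rights at all.
    """
    allowed = GROUP_RIGHTS.get(group, Right(0))
    if right not in allowed:
        return False
    if right == Right.EDIT_OWN:
        return owner == user
    return True
```

With this table, `can("collector", Right.EDIT_OWN, owner="anna", user="anna")` is true, while the same call with a different `user` is false. Keeping the rights in one table means a new group can be added without touching the checking logic.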

3., Creating the different language versions (e.g. English, German, Hungarian etc.) and the so-called "user interfaces" does not require the HTML pages to be rewritten. It is enough to supply the translations. Naturally, the user decides which language interface s/he wishes to work in (koll.asp).
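The idea of keeping the pages fixed and supplying only translations can be illustrated with a small lookup table. This is a sketch under assumptions, not the contents of koll.asp: the table layout, the label keys and the helper names `label` and `render_search_form` are hypothetical.

```python
# Hypothetical translation table: one row per label key, one column per language.
TRANSLATIONS = {
    "search": {"en": "Search", "de": "Suche", "hu": "Keresés"},
    "headword": {"en": "Headword", "de": "Stichwort", "hu": "Címszó"},
}


def label(key: str, lang: str, default_lang: str = "en") -> str:
    """Look up an interface label, falling back to the default language,
    and finally to the key itself if no translation exists."""
    texts = TRANSLATIONS.get(key, {})
    return texts.get(lang) or texts.get(default_lang) or key


def render_search_form(lang: str) -> str:
    """The page template is written once; only the labels vary by language."""
    return (
        f"<label>{label('headword', lang)} <input name='q'></label>"
        f"<button>{label('search', lang)}</button>"
    )
```

Adding a new interface language then means adding one more column of translations; `render_search_form` itself stays unchanged.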

4., The transfer of the old "Word-style" entry forms stored in the temporary SQL data tables cannot be performed fully automatically, so a very important part of the program, accessible only to the administrators, transfers the temporary records into their final place and form (i.e. the right sub-field) by using several conversion tables (atrak.asp).

Software used: Microsoft SQL Server 6.0, 6.5; Internet Information Server 3.0, 4.0 (Active Server Pages, ADO, VBScript); FrontPage 97, 98; Visual Internet Development 1.0; Internet Explorer 3.02, 4.0

C., SEARCHES

1., SEARCH.

A "simple" search can be performed for a given word. Here only those data appear that give the meaning of the word (lemma) and make its interpretation easier: English/Hungarian meaning, pronunciation, part of speech, definition and sample sentence. Naturally, the program can handle multiple hits as well.

2., ADVANCED SEARCH.

With this option the content of ANY FIELD of the entry (form) can be accessed, and FILTERING conditions can be set on the content of ANY FIELD.
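Filtering on arbitrary fields can be sketched as a query builder. The example below uses Python's built-in `sqlite3` module as a stand-in for Microsoft SQL Server; the table name `entry`, the column names in `FIELDS` and the sample rows are hypothetical and cover only a small subset of what the entry form holds.

```python
import sqlite3

# Hypothetical subset of entry fields; the real entry form has many more.
FIELDS = ("headword", "translation", "part_of_speech", "domain", "sample_sentence")


def advanced_search(conn: sqlite3.Connection, filters: dict[str, str]) -> list[tuple]:
    """Return the entries that match every (field, pattern) filter.

    Patterns use SQL LIKE syntax (% and _). Field names are checked against
    FIELDS and values are passed as parameters, never pasted into the SQL text.
    """
    unknown = set(filters) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    where = " AND ".join(f"{name} LIKE ?" for name in filters) or "1"
    sql = f"SELECT {', '.join(FIELDS)} FROM entry WHERE {where} ORDER BY headword"
    return conn.execute(sql, list(filters.values())).fetchall()


# Illustrative in-memory data, invented for this sketch.
conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{name} TEXT" for name in FIELDS)
conn.execute(f"CREATE TABLE entry ({columns})")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?, ?)",
    [
        ("compile", "fordít", "verb", "computing", "The program compiles the source code."),
        ("compiler", "fordítóprogram", "noun", "computing", "The compiler reports an error."),
        ("solution", "oldat", "noun", "chemistry", "Stir the solution gently."),
    ],
)

# All nouns from the computing domain.
hits = advanced_search(conn, {"domain": "computing", "part_of_speech": "noun"})
```

Checking field names against a fixed list keeps the "any field" flexibility while ensuring that user input only ever reaches the database as a bound parameter.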

3., COLLOCATION/CONCORDANCE.

The program contains a collocation-calculating and concordance option, too. The sorting of data is based on the input sample sentences (and partly on the sentence before/after, if any). Besides the necessary statistical items (total occurrences of the word, occurrences of the collocate), the collocation part of the program is supplied with T-score and MI-score calculating procedures as well.
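The text does not say which formulas the program uses. A minimal sketch, assuming the definitions commonly used in corpus linguistics (MI as the base-2 logarithm of observed over expected co-occurrence, and the t-score as the difference between observed and expected co-occurrence divided by the square root of the observed value), might look like this. All function and parameter names are hypothetical, and some implementations also scale the expected frequency by the span size, which is omitted here.

```python
import math


def expected_cooccurrence(f_node: int, f_colloc: int, corpus_size: int) -> float:
    """Co-occurrences expected by chance if the two words were independent."""
    if min(f_node, f_colloc, corpus_size) <= 0:
        raise ValueError("frequencies and corpus size must be positive")
    return f_node * f_colloc / corpus_size


def mi_score(f_pair: int, f_node: int, f_colloc: int, corpus_size: int) -> float:
    """Mutual information: log2(observed / expected co-occurrence)."""
    if f_pair <= 0:
        raise ValueError("the pair must co-occur at least once")
    expected = expected_cooccurrence(f_node, f_colloc, corpus_size)
    return math.log2(f_pair / expected)


def t_score(f_pair: int, f_node: int, f_colloc: int, corpus_size: int) -> float:
    """t-score: (observed - expected) / sqrt(observed)."""
    if f_pair <= 0:
        raise ValueError("the pair must co-occur at least once")
    expected = expected_cooccurrence(f_node, f_colloc, corpus_size)
    return (f_pair - expected) / math.sqrt(f_pair)
```

Under these definitions MI tends to favour rare but exclusive pairings, whereas the t-score rewards pairs that are frequent enough to be reliable, which is why the two are usually read side by side.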

The program can handle some of the so-called wildcard characters:

The above parts of the CLTD software are the work of Karoly Kecskemeti, Endre Csato (with some initial contribution from Laszlo Pusztai) and Tamas Pusztai, respectively.

 

III. The Complex, Multi / Bilingual, Terminological and Lexicographical Structure of our CLTD can best be illustrated by an entry form:

[Figures: five screenshots of the entry form]

IV. Different Search Tables.

[Figure: Simple Search]

[Figure: Collocation Table]

[Figure: Concordance Table]

Part 2.

A.) Linguistic Information present in the CLTD entry form itself

  1. Pronunciation of the headword

  2. Part of speech of the headword

  3. Stylistic classification of the headword

  4. Extended grammatical information

  5. Stylistic classification of the sample sentence

  6. Possible derivatives etc.

B.) Linguistic Information to be extracted from the CLTD

Besides providing intranet/Internet services, it is our aim that, with its gradually increasing data, the CLTD can and should also be used for different research purposes that exploit its complex integrated structure (lexicographical, terminological, bi-/multilingual etc.). More specifically, it can be used for linguistic research into the "sub-language" of different sciences, especially computing and later chemistry, biology etc. The main purpose is to help non-native speakers/writers access and use not only "dictionary-like" information but also correct and relevant linguistic information that can be extracted only by sophisticated software tools. Even native speakers/writers have the problem of "looking for words"; this is all the more true for non-native ones. By "looking for words" we actually mean looking for syntactic patterns and semantic fields too. A paper dictionary can hardly give this kind of help. But by looking at the concordances of a given word or term one can find the missing information, e.g. which verb goes with which noun. These findings can be further verified by looking at the collocation table: is the co-occurrence typical and significant, or just a one-off combination?

(example)
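As a hedged illustration of the kind of lookup described above, the following Python sketch builds a small keyword-in-context concordance. The tokenisation rule, the function names and the sample sentences are assumptions made for this sketch and are not taken from the CLTD software or its data.

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lower-cased word tokens (a deliberately crude rule)."""
    return re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", sentence.lower())


def concordance(
    sentences: list[str], keyword: str, span: int = 3
) -> list[tuple[list[str], str, list[str]]]:
    """Return (left context, keyword, right context) for every hit.

    `span` is the number of words kept on each side of the keyword.
    """
    keyword = keyword.lower()
    lines = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = tokens[max(0, i - span):i]
                right = tokens[i + 1:i + 1 + span]
                lines.append((left, token, right))
    return lines


# Invented sample sentences, for illustration only.
SAMPLES = [
    "The compiler reports an error.",
    "Run the program to see the error message.",
    "The program reports its progress.",
]

for left, kw, right in concordance(SAMPLES, "reports", span=2):
    print(f"{' '.join(left):>20}  [{kw}]  {' '.join(right)}")
```

Reading down the left and right columns of such a listing shows at a glance which nouns precede and follow the verb; the collocation table can then indicate whether a pairing is typical or a one-off.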

AT PRESENT it can be used for

1.1. the detailed examination of the terminology of different sciences and their fields

1.2. making word statistics

1.3. creating concordances and collocations

PLANS FOR THE FUTURE

2.1. Comparative / contrastive terminological examination

2.2. Morphological and syntactic research

2.3. Semantic research

2.4. Creating a conceptual network (actually it has been partly realised by the special field classification scheme of our entry form)

2.5. Translation studies

3. Soundly based lexicographical-terminological work; creating (paper / CD) dictionaries of different sciences and branches of sciences and upgrading them.

4. Creating a substantial and balanced terminological corpus and selecting up-to-date, relevant texts for readers for educational purposes.

5. More firmly based terminological training for specialist translators and average science students.

© Ferenc Rovny (Debrecen)



Webmaster: Angelika Czipin
last change 29.11.1999