Computerization in Finno-Ugric Studies
(Based on a talk presented at the symposium Access to Information on Finno-Ugric Studies at the 10th Congressus Internationalis Fenno-Ugristarum in Yoshkar-Ola, Mari El, 19 August 2005.)
Preamble: At the planning meeting of the international organizing committee in Yoshkar-Ola in 2002, which no representative of Finland or Estonia was able to attend, the themes of symposia and roundtable discussions were evidently decided in a fairly impressionistic manner. To my surprise, I heard afterwards that I had been designated to lead a roundtable discussion on the computerization of Finno-Ugric languages, whatever that was supposed to mean. I have a vague suspicion that this idea was put forward simply because it sounded and looked good, without any thought for the reality behind these words. This points to the main problem of this congress, which seems to have been used in a similar way: to keep up a positive appearance and conceal the existing tensions and problems in the situation of the Mari language and culture. I would like to explicitly distance myself from this use of the Congress as an ethnopolitical fig leaf.
After receiving this task, I started looking for potential participants. Most of those I contacted were evidently not interested in participating. For numerous practical reasons, I was not able to establish any cooperation with a related enterprise organized by Estonian colleagues, www.ugri.info. Some answered that they found the idea interesting but would never come to a congress organized in this way, with so little information available. In 2004, I informed the organizers of the Congress that I had to give up the roundtable project due to lack of interest.
On our way from Finland to Yoshkar-Ola, I learned quite by chance from Johanna Lilja, chair of this symposium, that I was supposed to participate in her symposium with a talk in Russian entitled Computerization in Finno-Ugric libraries (!). (The list of symposium participants had, of course, been compiled by the congress organizers, and Ms Lilja could not have suspected that I was unaware of this assignment or that I had no competence in questions of libraries and informatics.) Since I had not been informed of this plan, I had no alternative but to speak about some general questions that had been on my mind when planning the roundtable discussion. Unfortunately, I had no time to have my paper translated into Russian (nor a sufficient command of Russian to do the translation myself). On-the-spot translation from English into Russian took time and did not allow me to express my critical thoughts on the congress (the criticism I tried to voice at the beginning seemed to cause serious problems for the translators). For this reason, I thought it might be justifiable to publish my talk online, even though much of what I was going to say has already been written in my paper Finno-Ugristics in cyberspace (2001) in the Vienna electronic journal WEB-FU (webfu.univie.ac.at).
In my original call for the computerization roundtable, I asked for comments, ideas and questions for a general discussion. What I explicitly did not want were descriptions of individual projects: My (or Our) Parser, My Font, My Language Teaching Software, My Database. These, of course, are very important; as for databases, I would particularly like to refer to a project for a Database of Uralic Language Typology (www.univie.ac.at/urtypol/) initiated in Yoshkar-Ola. However, individual projects are not only part of the solution; they can also be part of the problem. There are many, perhaps too many, individual solutions, tailored fonts, project-specific transliteration or conversion systems, personal(ized) software etc., and far too few ways of sharing them. This means that all over the (Finno-Ugric) world, people keep reinventing the wheel. In a discipline such as Finno-Ugric studies, with very limited resources, we cannot afford this. But what are the roots and causes of this problem?
Of course, there are personal background factors (it may be that an above-average share of people in the humanities suffer from technophobia) and purely technical problems: not all of us have adequate computers and up-to-date software at our disposal. However, there are also some problems that are particularly typical of our discipline.
- Many traditions of Finno-Ugric studies have been overwhelmingly empirical, putting material before methods. Complaints about butterfly collecting instead of finding rules and regularities, heard time and again in methodological debates in the humanities, have been rare or non-existent in the history of Finno-Ugric studies. Instead, Finno-Ugrists, in many classical cases working on material collected through strenuous fieldwork, found it more important to save every morsel of information and leave theoretical questions for later study. For this reason, there were very few efforts to create compatible methods and techniques or ways of sharing them.
- Finno-Ugric studies, like other low-volume disciplines in the humanities, do not show demonstrable progress comparable to that in the natural sciences. Material from the late 19th century may be fully adequate and usable; in some unfortunate cases, a study written a hundred years ago represents both the first and the last word on a certain phenomenon. This means that, as long as large masses of text are not made electronically accessible, paper rules. Practically everything of importance is available on paper, most of it only on paper. And although electronic journals and article databases already exist in the humanities, too, Finno-Ugric studies are, to put it mildly, underrepresented there. Most relevant journals and periodicals already have their tables of contents, or even keywords, search engines or English-language article abstracts, online, but very few offer a complete digital parallel version. (For a short survey, see my German-language Quellenkunde page, homepage.univie.ac.at/Johanna.Laakso/qk.html.)
- Because of the traditional dominance of paper media, no discipline-specific culture of electronic publishing has evolved. The structure of Finno-Ugristic publishing is based on traditional commercial principles of printing and selling paper, supplemented by state support for non-profit or low-profit scholarly publications. There are no general Internet portals to Finno-Ugric studies; those that exist either depend on private enthusiasm, such as the excellent Hungarian-language website Rokonszenv (fu.nytud.hu) maintained mainly by László Fejes, or are restricted to the non-scholarly goals of general and cultural information (like the highly informative Estonian website www.suri.ee). Dictionaries and handbooks that may exist in digital form are bound by copyright regulations and cannot be freely accessed.
A case in point is the Estonian KeeleWeb portal (ee.www.ee), which served users interested in the Estonian language free of charge from 1998 to 2004, offering dictionaries, databanks of personal names and toponyms, links to text corpora and language software etc. Mere enthusiasm was not enough: KeeleWeb was shut down and replaced by a commercial enterprise, Keelevara (www.keelevara.ee). However, part of KeeleWeb's less profitable (?) material survives in freely accessible form as Keeleveeb (keeleveeb.edu.ee).
Correspondingly, the remedies for this problem could be:
- creating a critical mass of freely accessible material, also by digitizing important paper sources, handbooks, dictionaries and text collections. We have, for instance, the classical Russian dictionary by V. Dal' online at vidahl.agava.ru, and the Finno-Ugric Electronic Library project (library.finugor.ru) offers access to literature in and about most Finno-Ugric languages of Russia, including the great Komi-Russian dictionary in PDF format. When will we have online access to Lönnrot’s Finnish or Wiedemann’s Estonian dictionary? Covering the costs of this enterprise is, of course, a serious problem.
- overcoming the copyright problems and creating structures independent of the traditional mechanisms of buying and selling printed paper, a commercial world that has very little to do with Finno-Ugric studies. In modern scholarly publishing, texts are in most cases already prepared by the author in digital or even camera-ready form, so that the role of the publishing house is restricted to printing and distributing copies. In digital publishing, then, editors are needed (almost) only for their scholarly expertise, and this work is traditionally done not for money but for academic prestige. Beyond that, publishing digital material online is very cheap once the basic costs of maintaining the website have been covered. So why pay?
- agreeing upon standard formats. PDF (Portable Document Format), a kind of virtual paper with both the assets (platform-independent portability) and the drawbacks (restrictions on computer processing of the text, such as automated searching) of paper publishing, is already widely used in Finno-Ugric digital publishing (for instance, the ELEKTRA project [www.lib.helsinki.fi/elektra/english.html], or the above-mentioned Finno-Ugric Electronic Library). The next step is UNICODE (www.unicode.org), a worldwide standard designed to encode every character of every language in the world unambiguously, independently of computer, software or language. The Finno-Ugric phonetic alphabet and the characters needed for individual Finno-Ugric languages are officially included, and fonts containing all or at least most of these characters and diacritics are already becoming available, for instance the Gentium font (www.sil.org > What we provide > Fonts and writing systems), distributed free of charge. The days of individually tailored fonts for producing my personal version of a half-long, slightly labialized a on my personal computer, with no regard for the rest of the world, should be over.
- creating means of, to quote a famous commercial slogan, connecting people. The philosophical problem underlying all the technical difficulties is the tradition of lone heroes, each working on an individual project as the supreme expert on that particular question. In a discipline such as Finno-Ugric studies, this is often inevitable; however, working alone does not preclude constant contact with colleagues and an exchange of methods and techniques, which is absolutely vital. What we need are channels of communication (one attempt being the URA-LIST e-mail list [listserv.linguistlist.org/archives/ura-list.html], together with national and more specialized lists), Internet portals and, above all, a new culture of networking, communicating and cooperating, instead of defending our personal claims in the field of Finno-Ugric scholarly expertise. As long as I am content with having my own research published, no matter for whom or with what kind of technology, there will be no real incentive to develop better accessibility and compatibility.
- creating structures that favour network thinking. As mentioned above, there is no general Internet portal for Finno-Ugric studies, not even one for a single national philology. The Estonians at least did have KeeleWeb for some time, while neither Finland (proud of her position as a high-tech, connecting-people country!) nor Hungary has managed to create anything like that. The information and links lie scattered on individually maintained web pages, some highly expert, some less trustworthy, amateurish or even esoteric, and an outsider in need of information has no way of evaluating them.
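The point made above about UNICODE versus privately tailored fonts can be made concrete with a small sketch. In practice, "unambiguous" means unambiguous up to normalization. The same visible letter can be stored either as one precomposed code point or as a base letter plus combining diacritics, so software compares the two correctly only after normalizing them. The following is a minimal sketch in Python using only the standard `unicodedata` module. The sample characters (a with breve, and a with dot below and macron) are arbitrary illustrations, not taken from any particular Finno-Ugric transcription system, and the helper names `describe` and `same_text` are hypothetical.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List every code point in text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible letter, stored in two different ways.
precomposed = "\u0103"  # LATIN SMALL LETTER A WITH BREVE
combining = "a\u0306"   # "a" followed by COMBINING BREVE

print(precomposed == combining)           # False: different code point sequences
print(same_text(precomposed, combining))  # True: equal after NFC

# Stacked diacritics typed in a different order.
x = "a\u0323\u0304"  # a + COMBINING DOT BELOW + COMBINING MACRON
y = "a\u0304\u0323"  # a + COMBINING MACRON + COMBINING DOT BELOW

print(x == y)           # False
print(same_text(x, y))  # True: canonical ordering puts the dot below first
print(describe(unicodedata.normalize("NFC", x)))
# ['U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW', 'U+0304 COMBINING MACRON']
```

A font only determines how these code points are drawn on screen. Because the text itself consists of standard code points, a file prepared this way remains searchable and comparable on any system, whichever font happens to display it. A privately re-mapped font offers no such guarantee.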
To sum up: the real problem with computerization is not a technical one. The technologies exist and can be put to many kinds of uses. The real problem is the vicious circle of poor accessibility (individually processed and stored, copyright- or licence-protected materials), poor compatibility (no need to make your material accessible to others) and lack of contacts (no others needing access to your material).
Breaking this vicious circle requires a collective effort by the whole scholarly community. Ways and resources for doing so should be discussed in as general a framework as possible. The next Congressus Internationalis Fenno-Ugristarum in Piliscsaba in 2010 could be a good context, provided that a critical mass of representatives of the national philologies is present. Would anybody be willing to convene a symposium?
Updated 29 September 2005
johanna.laakso@univie.ac.at