Knowledge Organisation


Kim H. Veltman, Alexander Churanov, Vasily Churanov, Andrey Kotov

New Combinations of Lectures, Books and Databases

EVA 2004 Moscow, 29 November - 3 December 2004


Lectures are probably as old as language. Through Gutenberg, printed books became important in the 15th century. Databases emerged as a new form in the 20th century. These three forms have traditionally been completely separate. The Internet is bringing a new convergence of forms and methods. This essay outlines developments in lectures, electronic books, and databases whereby these three forms are becoming interconnected.

Lectures are being linked with databases as well as books. The nature of books is changing through the advent of hyper-linked, hyper-illustrated, omni-linked and dynamic books. Databases are shifting from monolithic, static structures, which impose a top-down set of standardized fields and naming conventions, to dynamic frameworks which use authority files to permit local, regional, national and international variants. Such new databases foster cultural and historical diversity. This Internet convergence, which is linking lectures, books and databases, will foster new connections between personal, collaborative and enduring knowledge.

The ideas for these innovations and the text of this paper are the work of the principal author. The programming for these innovations and the lecture will be presented by three Russian students from Smolensk who are co-authors in carrying out this vision.

1. Introduction

Traditionally, lectures, books and databases have been separate realms entailing different media. Lectures have been oral, books have been printed and databases have been electronic. The three have also entailed different kinds of knowledge. Lectures have been about new personal knowledge, books about enduring knowledge, and databases have frequently entailed collaborative knowledge, with contributions from numerous individuals, often in distributed environments.

Oral lectures, printed books and electronic databases will no doubt continue to have a life of their own. Nonetheless, this essay explores how the Internet is bringing about new combinations of these three forms of knowledge organization. Some aspects of how these combinations are also changing the possibilities of lectures, books and databases respectively will be considered. Implicit in these developments is a new integration of personal, collaborative and enduring knowledge.

2. Lectures

Lectures were traditionally oral. The 20th century brought the use first of black-and-white slides, then colour slides and occasionally other multimedia elements such as sound (especially music), video, film clips and animations. The past decade has seen a veritable invasion of PowerPoint as an alternative that threatens to replace traditional slides. Online PowerPoint lectures have become quite common. Meanwhile various efforts are underway to link collections of slide lectures in distributed databanks [1].

Both traditional slides and PowerPoint “slides” typically had captions. Online versions of such images mean that these captions can be consulted. Even so, this information has remained isolated and out of context. This is changing. Minimal information in captions concerning Artist/Author and Title of Painting/Book can be linked with databases on artists, authors, paintings and books. Indeed, such lists of images can serve as much more than illustrations of lectures. They can become starting points for more detailed research into the authors or subjects of the images. Methods of searching through databases can thus extend to lectures.
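The linking of caption fields with database records can be sketched as follows. This is a minimal illustration only: the `Caption` class, the `ARTIST_DB` dictionary and the `link_caption` function are hypothetical names introduced here, and a plain dictionary stands in for a real database of artists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caption:
    """Minimal caption information carried by a lecture slide."""
    artist: str
    title: str


# Hypothetical stand-in for a database of artists, keyed by name.
ARTIST_DB = {
    "Leonardo da Vinci": {"born": 1452, "died": 1519},
}


def link_caption(caption: Caption, artist_db: dict) -> dict:
    """Return the caption fields together with any matching artist record."""
    record: Optional[dict] = artist_db.get(caption.artist)
    return {
        "artist": caption.artist,
        "title": caption.title,
        "artist_record": record,  # None when the database has no entry
    }


linked = link_caption(Caption("Leonardo da Vinci", "Mona Lisa"), ARTIST_DB)
print(linked["artist_record"])
```

A caption whose artist is not in the database still yields a result, with an empty record, so that a slide list remains usable even where the links are incomplete.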

The illustrative materials of such lectures are also very relevant for books but their inclusion was traditionally not possible due to prohibitive costs of printing. In electronic form this changes as we shall note below under hyper-illustration (3.1.i). A net result of such changes has been to link the personal knowledge of lectures with collaborative knowledge through distributed databases and enduring knowledge of books and other materials in memory institutions (libraries, museums and archives).

3. Books

The Greek term for book is biblos (βίβλος), from which comes the word Bible. In China, the Diamond Sutra, the oldest dated printed book (868 A.D.), heralded a new form which became important during the European Renaissance of the 15th century. As McLuhan made us aware, printed books contributed greatly to making knowledge more accessible, but at the same time imposed limitations on knowledge in terms of a linear, static mode of presentation.

Early electronic books were largely electronic copies of printed originals. During the 1970s and 1980s the advent of markup languages, notably SGML (Standard Generalized Markup Language), opened the promise of separating marked-up content from different forms of presentation. This global vision was brilliant but too complex for ordinary users. [2] Technologists promised to create a much simpler version that would address elementary features, eXtensible Markup Language (XML), to which one could then add more specialized markup languages in specific disciplines and fields such as mathematics or chemistry. XML was in turn to become part of a larger vision towards a Resource Description Framework (RDF). Alas, this solution remains too complex for most users. So there has been increasing attention to alternative metadata solutions, particularly in terms of links.

3.1. Links

Initially the notion of hypertext and hyperlinks as envisaged by Doug Engelbart and developed by Ted Nelson remained largely in the tradition of the footnote/reference whereby a given word in a text was linked with another set of words either at the end of the text or to some other site. The hypertext and hypermedia communities have been exploring more complex variants on this theme. More recently there are trends towards at least three novel kinds of links, namely, hyper-illustrations, omni-links, and dynamic links.

3.1.i. Hyper-Illustration

Notwithstanding many advances in the field of printing, coloured images remain extremely expensive and are therefore kept to a minimum in most books. Hence the restricted use of coloured images remains one of the serious limitations of the printed book. An electronic version of the same book can provide a simple replica of the printed book, but it can also go much further. By using hyper-illustration an author can potentially introduce different series of images to their book, whereby an amateur might have a few simple illustrations, while an expert is provided with a series of more complex illustrations. These series can in turn be coupled with further series. Hence, while the printed version of a book discusses virtual reality and offers perhaps one illustration, a hyper-illustrated book might provide five examples and allow an interested reader to access an entire lecture with 100 illustrations of virtual reality. This achieves much more than simply overcoming the limitations of printing. It introduces the possibility of layers of images which become more diverse and complex to meet the needs of more advanced readers. Such series of images linked with books can in turn become lectures which in turn lead back to written discussions in articles and books via new multilingual databases.
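The idea of layered series of images can be sketched in a few lines. The file names, the layer names (`print`, `hyper_illustrated`, `lecture`) and the function `illustrations_for` are assumptions made for illustration; only the counts (one, five and 100 images on virtual reality) follow the example in the text.

```python
# Hypothetical image series for one topic, from the printed minimum
# to a full lecture. File names are placeholders, not real resources.
SERIES = {
    "virtual reality": {
        "print": ["vr_overview.jpg"],
        "hyper_illustrated": [f"vr_example_{i}.jpg" for i in range(1, 6)],
        "lecture": [f"vr_slide_{i:03d}.jpg" for i in range(1, 101)],
    }
}


def illustrations_for(topic: str, layer: str, series: dict = SERIES) -> list:
    """Return the list of images for a topic at the requested layer.

    An unknown topic or layer yields an empty list rather than an error,
    so a text without illustrations at some layer is still readable.
    """
    return list(series.get(topic, {}).get(layer, []))


for layer in ("print", "hyper_illustrated", "lecture"):
    print(layer, len(illustrations_for("virtual reality", layer)))
```

Each layer could itself point to further series, which is how a set of book illustrations might grow into a lecture.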

3.1.ii. Omni-links

Hyper-links establish connections between some words in a text and some knowledge/information either elsewhere in the text or in other sources. Such links are particularly useful because they can serve to highlight a specific set of connections (e.g. all the authors in a text, or a given theme, or significant keywords). However, like a beam of light which illuminates some parts, they leave the rest in darkness.

The SUMS Corporation has developed a prototype for omni-links where every word in a text is hyper-linked without a need to highlight individual words with the customary blue font. This means, for example, that every word in a book on Leonardo can be linked with a database recording Leonardo’s uses of that word in his manuscripts. While this is admittedly of limited consequence in the case of prepositions (e.g. in, out, by) or copulas (is, was), it is extremely useful in the case of significant terms such as his four powers of nature (force, motion, percussion, weight). It is also extremely efficient in that a simple algorithm allows one to make such links automatically rather than needing to make each link manually.
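How every word of a text might be linked automatically can be illustrated with a short sketch. This is not the SUMS prototype, whose implementation is not described here; the lookup address `BASE_URL` and the function `omni_link` are hypothetical, and the sketch simply wraps each word of plain text in an HTML link to a word-lookup page.

```python
import html
import re
from urllib.parse import quote

# Hypothetical lookup endpoint; the paper does not specify one.
BASE_URL = "https://example.org/lookup?word="

# A "word" here is any run of letters (digits and underscores excluded).
WORD = re.compile(r"[^\W\d_]+")


def omni_link(text: str, base_url: str = BASE_URL) -> str:
    """Wrap every word of plain text in a hyperlink to a lookup page."""
    parts = []
    pos = 0
    for match in WORD.finditer(text):
        # Copy punctuation and spaces between words, escaped for HTML.
        parts.append(html.escape(text[pos:match.start()]))
        word = match.group()
        href = base_url + quote(word.lower())
        parts.append(f'<a href="{html.escape(href)}">{html.escape(word)}</a>')
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


print(omni_link("force, motion, percussion, weight"))
```

Whether the links are shown in the customary blue font or left visually unmarked would be a matter of styling, separate from the linking step itself.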

Hyperlinks have typically been one-to-one correspondences between a word in a text and another note or site. Omni-links can function at different levels of knowledge: i.e. the same omni-linked word in a text can be connected with: a) a term in a classification; b) a definition in a dictionary; c) an explanation in an encyclopaedia; d) a title in a catalogue or bibliography; e) partial contents in the form of an abstract or a review; or f) the full contents of an article or book. Hereby omni-links introduce access to meaning at multiple layers.
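The difference between a one-to-one hyperlink and an omni-link with several levels can be sketched as a lookup that returns one target per level. The `Level` enumeration mirrors the six levels listed above; the `SOURCES` dictionary and its entries are hypothetical placeholders for real classifications, dictionaries and encyclopaedias.

```python
from enum import Enum


class Level(Enum):
    """The six levels of knowledge an omni-linked word can point to."""
    CLASSIFICATION = 1
    DEFINITION = 2
    EXPLANATION = 3
    TITLE = 4
    PARTIAL_CONTENTS = 5
    FULL_CONTENTS = 6


# Hypothetical sources: for each level, a mapping from word to target.
SOURCES = {
    Level.DEFINITION: {"percussion": "dictionary entry for 'percussion'"},
    Level.EXPLANATION: {"percussion": "encyclopaedia article on percussion"},
}


def targets_for(word: str, sources: dict = SOURCES) -> dict:
    """Return every level at which the word has a target, with that target."""
    key = word.lower()
    return {
        level: entries[key]
        for level, entries in sources.items()
        if key in entries
    }


for level, target in targets_for("Percussion").items():
    print(level.name, "->", target)
```

A word with no entries at any level simply yields an empty result, which matches the observation that links for prepositions and copulas are of limited consequence.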

Initially such links with definitions will simply be made with standard dictionaries (e.g. Oxford in English, Larousse in French). This can be extended to etymological dictionaries (e.g. Gaudefroy in French and Grimm in German). Eventually this approach points to new kinds of dictionaries that distinguish between ostensive, nominal and real definitions. [3] As such the semantic web will bring access to meanings that vary geographically and historically.

In time this approach in terms of different levels of knowledge can be integrated with different strategies in searching for knowledge: 1) The simplest strategy is a guided tour; 2) A direct strategy searches for only one word; 3) Personal terms opens the search to a cluster of related terms around that word; 4) Database fields expands the search to include topics used by professionals in a given field; 5) Subject headings expands the search to include the official categories of a memory institution; 6) Classification refines that search by showing how such subjects fit into a standard organisation of knowledge; 7) Comparative Classifications expands this process to include alternative systems for organising knowledge; 8) Relations explores the details of a topic in terms of subsumptive, ordinal and determinative relations; 9) Comparative Ontologies extends this analysis diachronically; 10) Creative Emergence applies this principle to new knowledge where precise definitions of fields and domains are still in flux. Using such an approach, knowledge in individual books becomes increasingly “intertwingled”, as Ted Nelson would say, with the collective memory of mankind.

3.1.iii. Dynamic Links: Past and Future

Marshall McLuhan made us aware of the paradoxes of print media. They had the enormous advantage of “fixing” a text in the sense of establishing an authoritative version which did not change with every scribe as had been the case with manuscripts. At the same time this fixed version of text meant that one could not simply erase and rewrite a passage. Any changes meant a new printing and usually an entirely new edition.

The association of computers with dynamic ideas goes back at least to 1968, the year the Internet began in Britain and when Alan Kay conceived the idea of a Dynabook. [4] Since then there has been much hype about dynamic links and dynamic link libraries. Today even Microsoft Word has a “Fields” feature that allows one to trace dates as they change.

Just as the world of industry speaks of self-healing products, the world of scholarship envisages self-updating publications. In this scenario, rapidly changing statistics such as what is the fastest computer or how many persons are on the Internet (200 million in 2000 and over 800 million in 2004) would be updated automatically every time that standard websites devoted to these themes are brought up to date.

While intuitively easy to imagine, serious attempts to create self-updating dynamic books will require a considerable adjustment in writing practice. In the past, authors typically focussed on precise information which, ironically, is the most likely to become quickly dated. In future authors may typically focus on more general claims in their texts which are then substantiated by links to standard sites which update “volatile” statistics and information.
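The writing practice described here, with general claims in the text and volatile figures supplied by links, can be sketched with placeholders that are filled in at reading time. The `registry` dictionary is a hypothetical stand-in for the standard sites that publish current figures, and the placeholder name `internet_users` and the function `render` are assumptions; the two values are the ones quoted above for 2000 and 2004.

```python
from string import Template

# Hypothetical registry standing in for standard sites with current figures.
registry = {"internet_users": "200 million"}

# The author writes the stable claim once; the figure is a placeholder.
passage = Template("Internet users worldwide: $internet_users.")


def render(template: Template, values: dict) -> str:
    """Fill placeholders from the registry, leaving unknown ones untouched."""
    return template.safe_substitute(values)


before = render(passage, registry)

# When the source is brought up to date, the text changes with it.
registry["internet_users"] = "over 800 million"
after = render(passage, registry)

print(before)
print(after)
```

The text itself is never edited: only the registry changes, so a passage rendered at a later date reflects the later figure.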

4. Databases

Databases emerged as a serious new mode of knowledge organisation in the 1970s as people realised that their multiple fields could be combined in many ways without needing a complete reorganisation of the text, as would be the case with a printed list. In the 1980s relational databases, typically based on simple entity relationships, became the fashion. In the 1990s, those at the frontiers (e.g. Mylopoulos) [5] expanded attention to other subsumptive relations such as partition and abstraction. More recently there is attention also to non-relational databases.

Amidst these developments has been a more fundamental shift in the role of databases. Three decades ago they were typically proprietary software whereby data in one system could not readily be shared with data in another system. A generation of efforts in the direction of interoperability, the rise of XML and the rise of open source have largely overcome these limitations. Increasingly there is sharing among databases and also among databases, electronic books and electronic lectures.

5. Conclusions

Discussions of convergence frequently focus on trends whereby the worlds of Internet, telephony and television are coming together; whereby what are still different pipelines today will become interconnected in a single system within the next two decades. This essay has focussed on another dimension of convergence whereby different forms of organising and sharing knowledge, namely lectures, books and databases, are becoming increasingly interdependent. While the paper has discussed these trends as if they were merely a theoretical possibility, the accompanying lecture will provide concrete examples from a System for Universal Media Searching (SUMS), which is a prototype for a future System for Universal Multi-Media Access (SUMMA). It is planned that this system will be used for at least some courses at the new European University of Culture, which will have branches in Berlin, Bologna, Madrid and Paris. [6]

[1] Cf. the Prometheus project. See:
[2] Even if everyone could be an author, everyone cannot be the equivalent of an editor, a typesetter, a layout person, and the many other professions entailed in producing documents such as books.
[3] For a more detailed discussion see the author’s “Towards a Semantic Web for Culture,” JoDI (Journal of Digital Information), Oxford, Volume 4, Issue 4, Article No. 255, 2004-03-15, p. 19 (Special issue on New Applications of Knowledge Organization Systems).
[4] On Alan Kay and the Dynabook see:
[5] John Mylopoulos. See:
[6] Cf. the author’s lecture at EVA Moscow 2004 on International Trends in Cultural Repositories.
