TERMINOLOGY

Literature and Glossary

Literature

Shiri A., Powering Search: The Role of Thesauri in New Information Environments, New Jersey 2012

Aitchison J., Gilchrist A., Bawden D., Thesaurus Construction and Use: A Practical Manual, 4th ed., London 2000

Hodge G., Systems of knowledge organization for digital libraries: Beyond traditional authority files, Washington 2012

ISO 25964-2, Information and documentation – Thesauri and interoperability with other vocabularies – Part 2: Interoperability with other vocabularies, Geneva, March 2013

BS 8723: Structured Vocabularies for Information Retrieval, London 2005

ANSI/NISO Z39: Guidelines for the construction, format, and management of monolingual controlled vocabularies, Bethesda, 2005

SKOS: http://www.w3.org/2004/02/skos/

Glossary

Controlled vocabulary
A controlled vocabulary is a structured list of descriptors. Each descriptor is a preferred term with an unambiguous, non-redundant definition. Descriptors in a controlled vocabulary can have hierarchical, equivalence or associative relations. A controlled vocabulary is managed by an authority, such as a thesaurus manager or a centralized organization responsible for maintaining the vocabulary. Controlled vocabularies provide a standardized way of indexing collections in a local database or online catalogue. They are also powerful tools for web search queries and for sharing data on the web. Thesauri, classification systems, taxonomies and subject headings are types of controlled vocabularies. Controlled vocabularies are also referred to as authority lists.
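The indexing role described above can be sketched in a few lines: every non-preferred entry term leads to exactly one descriptor, and anything outside the list is rejected. The vocabulary contents and the function name `to_descriptor` are hypothetical, chosen only for illustration; real vocabularies are maintained by their authority, not hard-coded.

```python
# Hypothetical fragment of a controlled vocabulary: each preferred term
# (descriptor) lists the non-preferred entry terms that should lead to it.
VOCABULARY = {
    "dwellings": {"house", "houses", "home", "homes"},
    "musical instruments": {"instruments", "instruments (music)"},
}

# Invert the vocabulary once so any entry term, or the descriptor itself,
# can be looked up directly.
ENTRY_TO_DESCRIPTOR = {
    entry: descriptor
    for descriptor, entries in VOCABULARY.items()
    for entry in entries | {descriptor}
}


def to_descriptor(term: str) -> str:
    """Return the descriptor to index with; reject terms outside the vocabulary."""
    key = term.strip().lower()
    if key not in ENTRY_TO_DESCRIPTOR:
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    return ENTRY_TO_DESCRIPTOR[key]


print(to_descriptor("Homes"))  # dwellings
```

Rejecting unknown terms, rather than passing them through, is what keeps the index "controlled": new terms have to be added to the vocabulary by its authority first.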
Information retrieval
Information retrieval is the activity of obtaining information using information retrieval systems, such as collection databases, library indexing files and web search engines. Studies on information retrieval focus mainly on the effectiveness of applications developed specifically for information management and retrieval, such as controlled vocabularies in databases.
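To show why a controlled vocabulary can improve retrieval in a database, the sketch below indexes records by descriptor and maps the searcher's word to that descriptor before the lookup, so a record is found whichever synonym is typed. The records, the `USE` table and the `search` function are hypothetical; a real system would also index free text.

```python
from collections import defaultdict

# Hypothetical equivalence table: entry term -> preferred descriptor.
USE = {"house": "dwellings", "home": "dwellings", "dwelling": "dwellings",
       "guitar": "guitars"}

# Hypothetical catalogue records, each indexed with one or more descriptors.
RECORDS = {
    "rec-1": {"dwellings"},
    "rec-2": {"guitars"},
    "rec-3": {"dwellings", "guitars"},
}

# Inverted index: descriptor -> ids of the records indexed with it.
INDEX = defaultdict(set)
for record_id, descriptors in RECORDS.items():
    for descriptor in descriptors:
        INDEX[descriptor].add(record_id)


def search(term: str) -> list[str]:
    """Map the user's term to its descriptor, then look it up in the index."""
    descriptor = USE.get(term.lower(), term.lower())
    return sorted(INDEX.get(descriptor, set()))


print(search("Home"))  # ['rec-1', 'rec-3']
```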
Interoperability
Interoperability is the ability of systems and software to exchange information. It can be achieved by following standardized procedures, e.g. by applying the standards of the International Organization for Standardization (ISO) for the development of thesauri. When organizations follow the same set of rules for a given activity, they can interoperate, or work together, more efficiently, e.g. when creating shared information systems such as online catalogues.
ISO-norms
ISO-norms are standards published by the International Organization for Standardization (ISO). The organization has published more than nineteen thousand international standards covering all aspects of technology and business. The standards are developed by topic, such as information and documentation. They are drafted and supervised by committees of experts and offer internationally recognized rules and procedures. ISO 25964-1:2011 (Part 1 of ISO 25964, Thesauri and interoperability with other vocabularies), for example, contains valuable guidance on the development of thesauri.
Linked Data
Linked data or linked open data (LOD) is a network of information (or digital objects) on the World Wide Web. Such a network arises when documents, images, thesaurus concepts etc. are identified by URIs. Data identified by URIs can be shared and reused on the web, and computer systems can easily make links between different resources. The goal of linked open data is to make information on the web more accessible. Resources identified by URIs are linked using RDF, a framework developed by the World Wide Web Consortium (W3C). The basic principles of LOD were formulated by Tim Berners-Lee (computer scientist and “inventor” of the World Wide Web):
  1. Use URIs to denote things.
  2. Use HTTP URIs so that these things can be referred to and looked up by people.
  3. Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF and SPARQL.
  4. Include links to other related things (using their URIs) when publishing data on the Web.
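The four principles can be illustrated without any network access. In the sketch below a dictionary stands in for the web: "dereferencing" a URI is a dictionary lookup rather than a real HTTP request, and all `example.org` URIs and function names are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Stand-in for the web (hypothetical URIs): what dereferencing each URI would return.
WEB = {
    "http://example.org/concept/guitar": {
        "label": "guitar",
        "links": ["http://example.org/concept/musical-instruments",
                  "http://example.org/concept/guitar-tabs"],
    },
    "http://example.org/concept/musical-instruments": {
        "label": "musical instruments", "links": [],
    },
    "http://example.org/concept/guitar-tabs": {
        "label": "guitar tabs", "links": ["http://example.org/concept/guitar"],
    },
}


def is_http_uri(uri: str) -> bool:
    """Principles 1 and 2: things are named with HTTP(S) URIs."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def dereference(uri: str) -> dict:
    """Principle 3: looking up a URI returns useful information (simulated lookup)."""
    return WEB[uri]


def discover(start: str) -> list[str]:
    """Principle 4: follow the links in each description to reach related things."""
    seen, queue = [], deque([start])
    while queue:
        uri = queue.popleft()
        if uri in seen or not is_http_uri(uri):
            continue
        seen.append(uri)
        queue.extend(dereference(uri)["links"])
    return seen


for uri in discover("http://example.org/concept/guitar"):
    print(uri, "->", dereference(uri)["label"])
```

Keeping a `seen` list matters because links on the web often point back to where one started, as guitar tabs does here.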
Mapping
A procedure in which elements in a structured dataset (e.g. in a metadata scheme) are linked to corresponding elements in another dataset.
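A common case is a crosswalk between two metadata schemes. The sketch below maps fields of a hypothetical local collection-management scheme to Dublin Core elements (`dc:title`, `dc:creator`, `dc:date` and `dc:subject` are real Dublin Core elements; the local field names and the record are invented) and sets aside anything without a counterpart for manual review.

```python
# Hypothetical crosswalk from local field names to Dublin Core elements.
CROSSWALK = {
    "object_title": "dc:title",
    "maker": "dc:creator",
    "production_date": "dc:date",
    "object_name": "dc:subject",
}


def map_record(local_record: dict) -> tuple[dict, dict]:
    """Rename each mapped field; keep unmapped fields aside for manual review."""
    mapped, unmapped = {}, {}
    for field, value in local_record.items():
        if field in CROSSWALK:
            mapped[CROSSWALK[field]] = value
        else:
            unmapped[field] = value
    return mapped, unmapped


record = {"object_title": "Guitar", "maker": "Unknown", "inventory_no": "1994.12"}
print(map_record(record))
# ({'dc:title': 'Guitar', 'dc:creator': 'Unknown'}, {'inventory_no': '1994.12'})
```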
Thesaurus
A thesaurus is a type of controlled vocabulary. It is considered the most elaborate form of controlled vocabulary, as it contains a large amount of information. Terms in a thesaurus are related to each other by hierarchical, equivalence and/or associative relations. A hierarchical relation means that one term is broader or narrower than another, expressing for example a “sort of” relation: guitar is a narrower term of musical instruments because a guitar is a “sort of” musical instrument. This relation is vertical. An equivalence relation means that several terms are considered equivalent, with one of them chosen as the preferred term. For example, house and dwelling are synonyms, but in a thesaurus one of them will be the preferred term and the other a non-preferred (alternative) term. This relation is horizontal. An associative relation links terms that are related in some other way: one term is neither a broader nor a narrower term of the other, nor a synonym, but the two are still related. Guitar can be a narrower term of musical instruments and guitar tabs a narrower term of sheet music. Even though they do not share a parent-child relationship, guitar can be linked to guitar tabs via an associative relation.
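The three relation types can be sketched as a small data structure in which every relation is stored together with its reciprocal: broader and narrower mirror each other, an associative relation holds in both directions, and a non-preferred term points to its preferred term. The class and method names are hypothetical, and which of house or dwelling is preferred is an arbitrary choice here.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal sketch: each relation is stored together with its reciprocal."""

    def __init__(self):
        self.broader = defaultdict(set)   # BT: term -> broader terms
        self.narrower = defaultdict(set)  # NT: term -> narrower terms
        self.related = defaultdict(set)   # RT: term -> associated terms
        self.use = {}                     # USE: non-preferred -> preferred term

    def add_hierarchy(self, narrower: str, broader: str) -> None:
        # Vertical relation: NT in one direction, BT in the other.
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_equivalence(self, preferred: str, non_preferred: str) -> None:
        # Horizontal relation: the non-preferred term leads to the preferred one.
        self.use[non_preferred] = preferred

    def add_association(self, a: str, b: str) -> None:
        # Associative relation: symmetric, so RT is recorded both ways.
        self.related[a].add(b)
        self.related[b].add(a)

    def preferred(self, term: str) -> str:
        return self.use.get(term, term)


t = Thesaurus()
t.add_hierarchy("guitar", "musical instruments")
t.add_hierarchy("guitar tabs", "sheet music")
t.add_association("guitar", "guitar tabs")
t.add_equivalence("house", "dwelling")  # hypothetical choice of preferred term
print(sorted(t.narrower["musical instruments"]))  # ['guitar']
print(t.preferred("dwelling"))  # house
```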

Terms in a thesaurus are considered unique and can have a unique identification number (reused in a URI). Their meaning and use are described in scope notes.
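Reusing a term's unique number in its URI can be as simple as appending it to a fixed namespace. The base URI, the term records and the function name below are hypothetical.

```python
# Hypothetical namespace under which the thesaurus publishes its terms.
BASE_URI = "http://example.org/thesaurus/"

# Hypothetical terms keyed by their unique identification number.
TERMS = {
    1042: {"label": "guitar",
           "scope_note": "Use for fretted, plucked string instruments."},
    1001: {"label": "musical instruments",
           "scope_note": "Use for objects made to produce music."},
}


def uri_for(term_id: int) -> str:
    """Reuse the term's unique number as the last segment of its URI."""
    if term_id not in TERMS:
        raise KeyError(f"no term with id {term_id}")
    return f"{BASE_URI}{term_id}"


print(uri_for(1042))  # http://example.org/thesaurus/1042
```

One reason to build the URI from the number rather than from the label is that the URI then stays the same if the label is later corrected or translated.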

In the SKOS model, developed by the World Wide Web Consortium (W3C), terms in a thesaurus are treated as concepts. This is because in SKOS what matters is not the term itself but its hierarchical, equivalence and associative relations, together with all the additional information attached to it, expressed by means of URIs. A term is a lexical string, a word or phrase in a particular language, whereas a concept is a unit of thought described in a formal computer language. Because concepts are independent of the words used to label them, language barriers can be overcome when linking and retrieving resources.
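The point about language barriers can be illustrated by keying labels in several languages to one concept URI: a search in any language resolves to the same concept, whose label can then be shown in another language. The URIs are hypothetical; the labels are ordinary English, French, German and Dutch words.

```python
# Hypothetical concept URIs, each with labels in several languages.
CONCEPTS = {
    "http://example.org/thesaurus/1042": {
        "en": "guitar", "fr": "guitare", "de": "Gitarre", "nl": "gitaar"},
    "http://example.org/thesaurus/1001": {
        "en": "musical instruments", "fr": "instruments de musique",
        "de": "Musikinstrumente", "nl": "muziekinstrumenten"},
}

# Index every label, in any language, back to the concept it belongs to.
LABEL_INDEX = {
    label.lower(): uri
    for uri, labels in CONCEPTS.items()
    for label in labels.values()
}


def concept_for(label: str) -> str:
    """Resolve a label in any language to its concept URI."""
    return LABEL_INDEX[label.lower()]


def label_in(uri: str, lang: str) -> str:
    """Return the concept's label in the requested language."""
    return CONCEPTS[uri][lang]


# A French search term resolves to the concept, shown here with its Dutch label.
print(label_in(concept_for("guitare"), "nl"))  # gitaar
```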
URI, URL
URIs (Uniform Resource Identifiers) are references to resources such as digital objects. These objects can be images, texts, movies, but also metadata records in a collection management system. Two kinds of URIs are commonly distinguished: a URL (Uniform Resource Locator) identifies the place where something is located, whereas a URN (Uniform Resource Name) gives a resource a fixed name, independent of its location. URIs should also be persistent identifiers.
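As a rough illustration, the kind of URI can often be read from its scheme, using Python's standard `urllib.parse.urlparse`. Classifying by scheme alone is a simplification assumed here; in linked data practice an HTTP URI usually serves both as a name and as a location. The `urn:oasis:...` example is the URN used in RFC 3986; the other URIs are hypothetical.

```python
from urllib.parse import urlparse


def uri_kind(uri: str) -> str:
    """Roughly classify a URI by its scheme (a simplification)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "URN"  # a name, independent of location
    if scheme in ("http", "https", "ftp"):
        return "URL"  # a location that can be looked up
    return "other URI"


for uri in ("http://example.org/thesaurus/1042",
            "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
            "mailto:info@example.org"):
    print(uri, "->", uri_kind(uri))
```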
Query
A query is a search request submitted to the search engine of a local database, an online catalogue, the web, etc.
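A thesaurus can make such a query more complete by expanding it with narrower terms, so that a search for a broad term also finds records indexed with more specific ones. The hierarchy, the function names and the quoted `OR` syntax of the resulting query string are assumptions for illustration; the recursion also assumes the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: broader term -> its narrower terms.
NARROWER = {
    "musical instruments": ["string instruments", "wind instruments"],
    "string instruments": ["guitar", "violin"],
    "wind instruments": ["flute"],
}


def expand(term: str) -> list[str]:
    """Return the term followed by every term below it in the hierarchy."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(expand(child))
    return terms


def build_query(term: str) -> str:
    """Combine the expanded terms into one OR query (syntax is an assumption)."""
    return " OR ".join(f'"{t}"' for t in expand(term))


print(build_query("string instruments"))
# "string instruments" OR "guitar" OR "violin"
```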
RDF
RDF is short for Resource Description Framework, a framework developed by the World Wide Web Consortium (W3C) for describing and exchanging data on the web. It is based on the principle of a subject, an object and the relation between them. The relation is called the predicate. Together, a subject, a predicate and an object form an RDF triple. SKOS, developed to express knowledge organization systems such as controlled vocabularies in RDF so that they can be exchanged, is built entirely on triples. If guitar is a narrower concept of the concept musical instruments, this is expressed in RDF as: guitar (=subject) → has broader concept (=predicate) → musical instruments (=object). Guitar and musical instruments are concepts identified by URIs. The predicate is expressed by a SKOS property, in this case skos:broader. The same relation can also be stated in the other direction: musical instruments (=subject) → skos:narrower (=predicate) → guitar (=object).
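The guitar example can be written out as triples of URIs. The sketch below uses the real SKOS namespace (`http://www.w3.org/2004/02/skos/core#`) but hypothetical concept URIs, and prints the triples in the line-based N-Triples format, one `<subject> <predicate> <object> .` per line.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/thesaurus/"  # hypothetical namespace for the concepts

# (subject, predicate, object): guitar has the broader concept musical instruments,
# and, stated the other way round, musical instruments has the narrower concept guitar.
TRIPLES = [
    (EX + "guitar", SKOS + "broader", EX + "musical-instruments"),
    (EX + "musical-instruments", SKOS + "narrower", EX + "guitar"),
]


def to_ntriples(triples) -> str:
    """Write each triple as one N-Triples line: <subject> <predicate> <object> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```

Note the direction: `A skos:narrower B` states that B is narrower than A, so the narrower concept guitar appears as the object of `skos:narrower` and as the subject of `skos:broader`.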
SKOS
SKOS or Simple Knowledge Organization System is a formal data model developed by W3C to support linked open data on the (semantic) web. It is a standard for expressing knowledge organization systems such as thesauri, classification systems etc. as RDF triples (SKOS/RDF). Controlled vocabularies structure information via hierarchical, equivalence and associative relations and contain scope notes, translations and other additional information on specific terms. This information can be made accessible on the web when the controlled vocabulary is converted to SKOS. In SKOS, each term becomes a concept identified by a URI, to which its labels, notes and relations are attached. This is why SKOS speaks of concepts. In a controlled vocabulary the term is central, whereas in SKOS the URI is central. URIs form the basis of linked data on the web.
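What a converted term looks like can be sketched by writing one concept in Turtle, a common RDF syntax. The property names (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:scopeNote`) are real SKOS vocabulary; the concept URIs, labels and the helper functions are hypothetical, and the escaping covers only backslashes and double quotes.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def literal(text: str, lang: str) -> str:
    """A Turtle string literal with a language tag (minimal escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri, pref_labels, alt_labels=(), broader=(), scope_note=None):
    """Serialize one concept; pref_labels maps language -> label."""
    statements = ["a skos:Concept"]
    statements += [f"skos:prefLabel {literal(t, lang)}" for lang, t in pref_labels.items()]
    statements += [f"skos:altLabel {literal(t, lang)}" for lang, t in alt_labels]
    statements += [f"skos:broader <{b}>" for b in broader]
    if scope_note:
        statements.append(f"skos:scopeNote {literal(scope_note, 'en')}")
    body = " ;\n    ".join(statements)
    return f"{SKOS_PREFIX}\n\n<{uri}> {body} ."


print(concept_to_turtle(
    "http://example.org/thesaurus/house",
    {"en": "house", "fr": "maison"},
    alt_labels=[("en", "dwelling")],
    broader=["http://example.org/thesaurus/buildings"],
))
```

The preferred and non-preferred terms of the thesaurus become `skos:prefLabel` and `skos:altLabel` literals, while the hierarchical relation points to another concept's URI.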

Conversion to SKOS requires some technical knowledge of RDF and SKOS. This is why the Linked Heritage and AthenaPlus projects developed the Terminology Management Platform (TMP), an open-source tool in which controlled vocabularies can be imported and linked to other resources using SKOS.
XML
XML or Extensible Markup Language is a markup language standard developed by W3C that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. RDF/XML is an application of XML, created to express RDF as an XML document.
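A minimal RDF/XML document for one SKOS concept can be produced with Python's standard `xml.etree.ElementTree`. The RDF, SKOS and XML namespace URIs are the real ones; the concept URIs and the function name are hypothetical. Note that `ET.register_namespace` changes a module-wide prefix table.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

# Use the conventional prefixes rdf: and skos: in the output (global setting).
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def concept_to_rdfxml(uri: str, pref_label: str, lang: str, broader_uri: str) -> str:
    """Build <rdf:RDF><skos:Concept rdf:about=...>...</skos:Concept></rdf:RDF>."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    label = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {XML_LANG: lang})
    label.text = pref_label
    ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": broader_uri})
    return ET.tostring(root, encoding="unicode")


print(concept_to_rdfxml("http://example.org/thesaurus/guitar", "guitar", "en",
                        "http://example.org/thesaurus/musical-instruments"))
```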
Semantic Web
The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C). It is built on the principle of sharing and reusing data on the web to achieve better search results, irrespective of language. This can be done by automatically linking “separate” data on the web. When every concept is interlinked as an equivalent, synonym, broader or narrower concept, or via any other relation, search results on the web can be optimized. This leads to greater visibility of, and easier access to, information.