TERMINOLOGY
2/4 – Introduction to SKOS
As a Semantic Web compliant format, SKOS is concept-oriented. This means that the fundamental element of a terminology designed in SKOS is the concept and not the term that expresses this concept. The SKOS data model consists of a basic structure that can be extended by specific classes for detailing lexical parts or semantic relations between the concepts of the terminology. The SKOS reference publication summarizes the main features of the SKOS model as follows:
“Using SKOS, concepts can be identified using URIs, labeled with lexical strings in one or more natural languages, assigned notations (lexical codes), documented with various types of note, linked to other concepts and organized into informal hierarchies and association networks, aggregated into concept schemes, grouped into labeled and/or ordered collections, and mapped to concepts in other schemes.”
SKOS data are expressed as RDF triples. This means that a concept may be the subject or the object of a triple, related to another resource via a SKOS property acting as the predicate. As RDF resources, SKOS concepts can be identified using URIs. These URIs can be defined according to standard persistent identifier systems. The SKOS data model does not require the use of persistent identifiers, but from a Linked Open Data perspective their use is highly recommended. Persistent identifiers are described more precisely in the following sections.
The SKOS data model consists of three main components: classes, properties and relations. The names of these components always start with the prefix “skos:”. The distinction between a class and a property is made through letter case: when the element following the “skos:” prefix starts with an upper-case character, it is a class, e.g. skos:Concept and skos:ConceptScheme are classes; when it starts with a lower-case character, it is a property and not a class. For example, skos:prefLabel is a property.
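As a rough illustration of the two paragraphs above, the sketch below models RDF triples as plain Python tuples and applies the letter-case convention to tell classes from properties. The `example.org` URIs and the `skos_kind` helper are hypothetical and only serve the example; a real project would normally rely on an RDF library instead.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# The example.org URIs are hypothetical, for illustration only.
triples = [
    ("http://example.org/c/001", "rdf:type", "skos:Concept"),
    ("http://example.org/c/001", "skos:prefLabel", ("architecture", "en")),
    ("http://example.org/scheme", "rdf:type", "skos:ConceptScheme"),
]


def skos_kind(term: str) -> str:
    """Classify a skos: term as 'class' or 'property' from its letter case."""
    prefix, _, local = term.partition(":")
    if prefix != "skos" or not local:
        raise ValueError(f"not a skos: term: {term!r}")
    return "class" if local[0].isupper() else "property"


print(skos_kind("skos:Concept"))    # class
print(skos_kind("skos:prefLabel"))  # property
```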
SKOS main features
SKOS: concept
SKOS is a concept-oriented model; the concept is therefore the central element of the terminology. From a terminology point of view, a concept can be defined as an idea, notion or unit of thought. A concept in SKOS is introduced by the class skos:Concept.
SKOS concepts can be brought together using two kinds of class:
- SKOS concept scheme
- SKOS collections
SKOS: concept scheme
A concept scheme is a way to bring together several concepts. A concept scheme is introduced by the class skos:ConceptScheme. An individual concept scheme roughly corresponds to the notion of an individual thesaurus, classification scheme or any other knowledge organization system. It is important to mention that the same concept can be part of more than one concept scheme.
SKOS: collections
A collection is a group of SKOS concepts. A collection is introduced by the main class skos:Collection. Another class, skos:OrderedCollection, can be used when the order of the concepts within the collection matters. The notion of collection is different from that of concept scheme. When migrating a thesaurus, for example, the whole thesaurus could be considered as a concept scheme, within which several thematic groups of concepts could be modelled as collections.
Labels
The SKOS model focuses on concepts; there is therefore a distinction between the concept itself and the terms that may be used to express this concept. According to the SKOS data model, terms referring to a concept are expressed via lexical labels. A lexical label is a string of Unicode characters, which allows you to have a term in any language, with or without Latin characters. The SKOS data model defines 3 types of lexical label, complemented by notations:
- Preferred label, introduced in the SKOS data model as the skos:prefLabel property, corresponds to the notion of descriptor from the standards for the elaboration of thesauri. The SKOS data model does not allow there to be more than one preferred label in the same language.
- Alternative labels, introduced by the skos:altLabel property, are mainly used to give synonyms of the preferred label or other ways to refer to it, e.g. different spellings or acronyms. The SKOS model does not forbid the exclusive use of alternative labels instead of one preferred label and many alternative labels.
- Hidden labels, introduced by the skos:hiddenLabel property, may be used to record misspellings of preferred or alternative labels, as well as obsolete forms of a term. Alternative and hidden labels correspond roughly to the USE and UF (Used For) indicators defined in the ISO standards for thesauri. By definition, hidden labels are not visible, but they are very useful for retrieval. The SKOS data model does not allow the same string of characters to be used as a preferred, alternative or hidden label of a concept in the same language. An extension to the SKOS model, SKOS-XL, is proposed for modelling labels more precisely and including morphological or syntactic information on labels.
- Notation: symbols or codes that are not recognizable or understandable in any natural language. Notations are different from labels which usually are words or expressions understandable in any natural language. The skos:notation can then be used for example in the case of classifications where a code refers to a term referring itself to a concept. The notation can be more convenient than using an alternative label since it is considered as unambiguous and language independent.
The use of these different types of label helps the understanding of the concept and is useful for human-readable knowledge representation. The use of labels is not mandatory in the SKOS data model but is highly recommended, especially for maintenance purposes.
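The two label rules mentioned above (at most one preferred label per language, and no string shared between label types in one language) can be checked mechanically. The following is a minimal sketch for a single concept, assuming labels are held as simple tuples; the data and the `label_problems` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical labels for one concept: (property, text, language tag).
labels = [
    ("skos:prefLabel", "house", "en"),
    ("skos:altLabel", "dwelling", "en"),
    ("skos:hiddenLabel", "huose", "en"),
    ("skos:prefLabel", "maison", "fr"),
]


def label_problems(labels):
    """Return messages for violations of the two label rules above."""
    problems = []
    pref_count = defaultdict(int)
    seen = defaultdict(set)  # (text, lang) -> label properties using it
    for prop, text, lang in labels:
        if prop == "skos:prefLabel":
            pref_count[lang] += 1
        seen[(text, lang)].add(prop)
    for lang, count in sorted(pref_count.items()):
        if count > 1:
            problems.append(f"{count} preferred labels in language {lang!r}")
    for (text, lang), props in sorted(seen.items()):
        if len(props) > 1:
            problems.append(f"{text!r}@{lang} used as {sorted(props)}")
    return problems


print(label_problems(labels))  # []
```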
Documentation properties
The SKOS model offers a variety of possibilities to provide information related to concepts. Different types of notes can be used to give the most accurate information. These notes can be of different natures (plain text, image, quotes …) and be used without any restriction.
The different types of notes that can be used to document a concept are:
- Note (skos:note)
- Change note (skos:changeNote)
- Definition (skos:definition)
- Editorial note (skos:editorialNote)
- Example (skos:example)
- History note (skos:historyNote)
- Scope note (skos:scopeNote)
The skos:note property can be used to provide general documentation on a concept. All the other types are specializations of this general property. The skos:changeNote and skos:editorialNote properties are mainly useful for administration and maintenance. The skos:definition, skos:example and skos:historyNote properties are useful for providing information that gives a better understanding of the meaning of the concept. As for labels, documentation properties can be provided in different languages by using language tags with the xml:lang attribute.
Semantic relations
The power of the SKOS model lies in the semantic relations that can be used to connect different concepts. These semantic relations play a crucial role in defining concepts. There are two different categories of semantic relation:
- Hierarchical relations:
Hierarchical relations are introduced via two properties, skos:broader and skos:narrower. The skos:broader property is used to assert that another concept has a more general meaning: A skos:broader B states that B is a broader concept than A. skos:narrower is the inverse property, used to assert that another concept has a more specific meaning. One concept can have more than one broader concept or more than one narrower concept.
It is important to note that these two properties only assert a direct (immediate) hierarchical link between two concepts. In order to express a non-immediate link between two concepts, the SKOS model provides two other properties that are transitive: skos:broaderTransitive and skos:narrowerTransitive.
Like skos:broader and skos:narrower, the properties skos:broaderTransitive and skos:narrowerTransitive are the inverse of each other.
- Associative relations:
The property skos:related is used to assert an associative link between two concepts. This property may be useful to make a link between a concept and another one which is neither an equivalent nor a broader/narrower concept. It is important to note that the skos:related property is symmetric.
skos:related is not a transitive property.
It is very important to keep in mind that, according to the guidelines provided in ISO 2788 and BS 8723, mixing associative relations and hierarchical relations is not consistent with the SKOS data model. Special attention must therefore be paid to the semantic relationships between concepts.
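To make the difference between immediate and transitive hierarchical links concrete, here is a small sketch that derives the skos:narrower and skos:broaderTransitive statements implied by a set of skos:broader statements. The concept names and the `infer_hierarchy` helper are hypothetical, and the naive closure loop is only suitable for small illustrative data sets.

```python
def infer_hierarchy(broader):
    """Derive narrower and broaderTransitive pairs from skos:broader pairs.

    `broader` is a set of (concept, broader_concept) pairs.
    """
    narrower = {(b, a) for a, b in broader}  # inverse property
    transitive = set(broader)
    changed = True
    while changed:  # naive closure, fine for small illustrative sets
        changed = False
        for a, b in list(transitive):
            for c, d in list(transitive):
                if b == c and (a, d) not in transitive:
                    transitive.add((a, d))
                    changed = True
    return narrower, transitive


# Hypothetical three-level hierarchy: cottage < house < building.
broader = {("cottage", "house"), ("house", "building")}
narrower, broader_transitive = infer_hierarchy(broader)

print(("cottage", "building") in broader)             # False: not an immediate link
print(("cottage", "building") in broader_transitive)  # True: inferred link
```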
Mapping
The power of the SKOS data model also relies on the mapping features it offers. The SKOS data model provides several mapping properties for aligning concepts from different concept schemes. These properties are:
- skos:closeMatch
- skos:exactMatch
- skos:broadMatch
- skos:narrowMatch
- skos:relatedMatch
As for semantic relations between concepts, the mapping properties can be associative or hierarchical. The skos:broadMatch and skos:narrowMatch properties are used for a hierarchical mapping link between concepts whereas the skos:relatedMatch property is used for an associative one. Exactly as for semantic relations, skos:broadMatch is the inverse property of skos:narrowMatch.
The properties skos:closeMatch and skos:exactMatch are used to make a mapping link between concepts that are very similar or equal, so that they can be used interchangeably. The skos:exactMatch property is transitive and symmetric. Mapping properties are used rather than semantic relations in order to link concepts from different concept schemes. Within the same concept scheme, semantic relations are used instead of mapping properties.
As for semantic relations, there may be some conflicts in mixing hierarchical mapping properties with associative ones.
Guidelines for SKOSification
By SKOSification, we mean the process of converting or transforming a terminology into SKOS. We list below some guidelines for carrying out this conversion from a technical and an organisational point of view. From the technical point of view, many of the guidelines provided here are inherent to the SKOS model, but special attention must be paid to these points in order to ensure general consistency with the network of terminologies.
Evaluate the main features of the terminology to be migrated
Before starting any procedure for converting a terminology into SKOS, the institution must have defined the purpose of its terminology (e.g. indexing and retrieval, only indexing, or only retrieval). As a second step, and a consequence of the definition of the purpose, the institution must evaluate if SKOS is the appropriate format considering the content of its terminology. In the case of authority files for instance, SKOS may not be the most appropriate format. Here are some features that can help for this evaluation:
- Concepts: Is the terminology dealing with objects and abstract things that could be assimilated to concepts? Is the terminology dealing with persons? => if the terminology is dealing with persons and not objects or abstract things, a standard like FOAF (Friend Of A Friend, http://www.foaf-project.org) would be more appropriate.
- Semantic relations: Can the descriptors (then concepts) of the terminology be linked together via semantic relations? => if the terminology only contains independent descriptors without any semantic relations, a SKOS model is not strictly necessary; a plain RDF representation may be more convenient.
- Interoperability: Can the terminology be linked to another resource dealing with the same subject/domain or scope? => if the terminology can be linked to other resources, all the potential links should be considered before the transformation process in order to implement these links in a most efficient way.
Identify your concepts
- Use of a Persistent Identifying System for the definition of the URIs
As described above, we recommend the use of standards for the identification of concepts. Since concepts are identified by HTTP URIs, these URIs must be registered with a standardised persistent identification system such as PURL. This is also of great benefit because the identifiers become location-independent: if the terminology is moved from one location (hosting server) to another, the URIs identifying its concepts do not have to be modified.
- Use of non-explicit URIs
It is highly recommended to use non-explicit URIs, that is, URIs that do not embed a label of the concept, in order to avoid reusing the same URI for two different concepts. Because natural languages are ambiguous and polysemous, two different concepts may have similar or identical labels. Explicit URIs also presuppose that one specific natural language was chosen when the terminology was defined or migrated, which is not convenient in a multilingual context.
See the booklet on Persistent identifiers: recommendations (PDF).
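One possible way to obtain non-explicit URIs is to mint opaque, sequential identifiers under a fixed namespace. The sketch below assumes a hypothetical `example.org` namespace and a made-up `c000001` numbering pattern; neither comes from the guidelines, and registration with a persistent identifier service is outside its scope.

```python
import itertools

# Hypothetical namespace; a real one would be registered with a
# persistent identifier service such as PURL.
BASE = "http://example.org/thesaurus/"


def make_minter(base=BASE, start=1, width=6):
    """Return a function that mints opaque, sequential concept URIs."""
    counter = itertools.count(start)

    def mint():
        return f"{base}c{next(counter):0{width}d}"

    return mint


mint = make_minter()
# Two concepts whose English labels could collide (homographs)
# still receive distinct, language-neutral identifiers.
uri_for = {
    "bank (financial institution)": mint(),
    "bank (river side)": mint(),
}
print(uri_for["bank (river side)"])  # http://example.org/thesaurus/c000002
```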
Define with precision the labels expressing concepts
- Preferred labels must be unique within a concept scheme
As required by the SKOS data model, no two concepts from the same concept scheme should have the same preferred label in a given language. However, as natural languages are highly polysemous and full of homographs, the SKOS data model does not forbid one concept from having identical preferred labels in two different languages.
- Each concept must be expressed with one preferred label per language (mandatory)
As we saw above, the SKOS data model does not forbid the absence of a preferred label, but labels are meant to help in understanding and refining the meaning of a concept. This is especially true in a multilingual context, and it is helpful for administration and maintenance. We therefore recommend using one preferred label per language. It is important to note that this also means that it is not possible to have several preferred labels in the same language.
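The uniqueness rule for preferred labels within a concept scheme can be verified with a simple index. In this sketch the concept identifiers, the labels and the `duplicate_pref_labels` helper are hypothetical, and comparing labels case-insensitively is an assumption of the example rather than a SKOS rule.

```python
from collections import defaultdict


def duplicate_pref_labels(pref_labels):
    """Find (label, lang) pairs used as preferred label by several concepts.

    `pref_labels` maps a concept id to a {language: preferred label} dict,
    so one preferred label per language is enforced by construction.
    """
    users = defaultdict(list)
    for concept, by_lang in pref_labels.items():
        for lang, text in by_lang.items():
            # Case-insensitive comparison is a choice made for this sketch.
            users[(text.lower(), lang)].append(concept)
    return {key: sorted(cs) for key, cs in users.items() if len(cs) > 1}


scheme = {
    "ex:c1": {"en": "bank", "fr": "banque"},
    "ex:c2": {"en": "Bank", "fr": "rive"},
    # The same string in two languages for one concept is allowed.
    "ex:c3": {"en": "architecture", "fr": "architecture"},
}
print(duplicate_pref_labels(scheme))  # {('bank', 'en'): ['ex:c1', 'ex:c2']}
```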
- Avoid the concatenation of several words for a same label
In order to get the most accurate description, we recommend avoiding several values in one preferred label. For example, double concepts such as “dwelling/houses” must be considered as two different concepts that are linked by a semantic relation. The use of scope notes can help to reinforce the closeness of these two concepts. The link between the two terms must be defined in order to provide the best description. If we can state that “dwelling” and “houses” are synonyms, the double concept can instead be modelled as one concept with “dwelling” as the preferred label and “houses” as an alternative label.
Another possibility in the case of double concepts is to model the two concepts as related concepts.
- Privilege the use of the lemma for the preferred label and possibly the other labels
The preferred label should consist of a single-word term or a compound term in natural language. This means that no artificial word or code must be used to label a concept. Such a code must be defined using the skos:notation property. The lemma of a word represents its canonical form. We strongly recommend using this form of a term as the preferred label. For instance, in English or in French, the usual form of a lemma in the case of nouns is the singular for the number and the masculine for the gender.
- Privilege the typography in use by convention in the languages involved
The labels should respect the typographical rules that are usually in use in the languages of the labels. For instance, in English all words referring to a language or nationality start with an upper-case character, whereas in French these words are written in lower-case characters. We therefore recommend respecting the conventions that are in use for each language involved. Any exception to this guideline must be documented via the documentation properties of the model.
For verbal forms, infinitive forms are preferred. The forms of terms should thus be based on the conventions of the languages involved. If the concept is only expressed with labels in specific forms that do not correspond to the lemma, this must be documented via the documentation properties (skos:note, skos:changeNote, skos:editorialNote or skos:historyNote). In the case of compound terms, the addition of adjectives or verbs to a noun phrase should be limited where possible. In the same spirit, the use of articles and prepositions should be avoided in order not to extend the length of the label. From the computing point of view, these guidelines can improve the efficiency of a retrieval system.
- Avoid the duplication of information
The SKOS data model consists of classes and properties, as we saw above. Meanings are to be deduced from an efficient use of these properties. As some of the properties available in the SKOS model come in pairs (inverse or symmetric), the use of one property implies the opposite or reverse statement. It is therefore better to avoid duplication and not to repeat the same information in different ways. SKOS terminologies are processed by machines, so the less redundant information there is, the faster the results of a query can be retrieved. The main properties to pay attention to in order to avoid duplication of information are:
- Inverse properties
The use of the skos:broader or skos:narrower property implies the inverse statement. Asserting that A has a broader concept B implies that B has a narrower concept A. This is also true for the skos:broaderTransitive and skos:narrowerTransitive properties.
- Symmetric properties
The skos:related property is symmetric, so if an assertion that A is related to B is made, there is no need to also assert that B is related to A.
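The guideline on inverse and symmetric properties can be applied automatically when cleaning a data set. The following sketch keeps a single statement for each inverse or symmetric pair; the `drop_redundant` helper is hypothetical, and storing skos:broader rather than skos:narrower is an arbitrary convention chosen for the example.

```python
INVERSE = {
    "skos:narrower": "skos:broader",
    "skos:narrowerTransitive": "skos:broaderTransitive",
}
SYMMETRIC = {"skos:related"}


def drop_redundant(triples):
    """Keep one statement per inverse pair or symmetric pair.

    Choosing skos:broader over skos:narrower as the stored direction is
    an arbitrary convention of this sketch.
    """
    kept = set()
    for s, p, o in triples:
        if p in INVERSE:
            s, p, o = o, INVERSE[p], s  # rewrite to the stored direction
        elif p in SYMMETRIC and o < s:
            s, o = o, s                 # order the pair deterministically
        kept.add((s, p, o))
    return sorted(kept)


statements = [
    ("A", "skos:broader", "B"),
    ("B", "skos:narrower", "A"),   # same information as the line above
    ("A", "skos:related", "C"),
    ("C", "skos:related", "A"),    # same information as the line above
]
print(drop_redundant(statements))
# [('A', 'skos:broader', 'B'), ('A', 'skos:related', 'C')]
```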
Provide precision to the semantic relations of your concepts
- Non-immediate hierarchical relations
In some cases, semantic relations between concepts have to be described precisely in order to avoid losing meaning or information, and to avoid modelling information that does not make sense. For example, the skos:broaderTransitive/skos:narrowerTransitive pair of properties makes it possible to describe relations between concepts that are more than one hierarchical level apart. These transitive properties are therefore preferred for asserting a non-immediate hierarchical relationship between two concepts. It is also possible to use an extension to the SKOS data model in order to remove the symmetry of a property if this symmetry creates confusion in the meaning of the concepts.
- Consistency of the semantic relations
In order to ensure consistency, mixing hierarchical relationships with associative ones should be avoided. For example, a concept A cannot be related to another concept B if A is also a narrower concept of B, directly or through intermediate levels. Special attention must therefore be paid when designing the semantic relations between concepts.
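The consistency rule can be tested by comparing associative links with the transitive closure of the hierarchy. This is a minimal sketch with hypothetical concept names and helper functions, assuming relations are stored as sets of pairs.

```python
def transitive_closure(pairs):
    """Return all pairs reachable by chaining the given pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def hierarchy_related_clashes(broader, related):
    """Return related pairs that are also linked hierarchically."""
    up = transitive_closure(broader)
    return sorted(
        (a, b) for a, b in related if (a, b) in up or (b, a) in up
    )


broader = {("cottage", "house"), ("house", "building")}
related = {("cottage", "building"), ("house", "garden")}
print(hierarchy_related_clashes(broader, related))
# [('cottage', 'building')]
```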
Ensure the documentation of concepts and the terminology
- Provide documentation for each change that may occur to a concept and its labels
The SKOS data model provides a number of documentation properties for refining the meaning of a concept or keeping track of changes to the label(s) of a concept and/or its meaning. For the purposes of administration and maintenance of the terminology, each change must be reported in the SKOSified terminology using change notes (skos:changeNote) or editorial notes (skos:editorialNote).
- Provide as much documentation as possible for concepts with scope notes
As mentioned above, documentation on concepts helps to refine the meaning of a concept. The use of scope notes (skos:scopeNote) can be very helpful in enabling a better understanding of the concepts with contextual information. Examples may also be provided via skos:example property. Documentation of concepts is especially needed in the case of homographs/homonyms in the same language or different languages for the labels expressing the concept. Then scope notes and examples can provide the user with a semantic disambiguation.
Guidelines for mapping
Mapping is an inherent part of the SKOSification of a terminology. The following guidelines emphasize some aspects of the mapping process that may be crucial for general consistency of the terminology and the meanings of concepts.
Pay attention to the identification of your concepts during the mapping process
- Use only absolute URIs
This guideline follows on from the one referring to the identification of concepts in the SKOSification part above. The terminology is made available in a machine-readable format by the SKOSification process. In order to make the identification of concepts and the linking between concepts easy to compute, it is recommended to use absolute URIs rather than relative ones.
For example:
<rdf:Description rdf:about="http://www.athenaeurope.org/athenawiki/AthenaThesaurus/RMCA_Keywords#architecture"> is an absolute HTTP URI
<rdf:Description rdf:about="RMCA_Keywords#architecture"> is a relative HTTP URI.
- Respect the URIs of the original sources
As URIs are defined in order to identify the concepts uniquely, during the mapping process from a concept scheme to another, the URI defined within each concept scheme must be respected in order to enable the interoperability between the different resources involved.
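Where a source file still contains relative references, they can be resolved against a base URI before publication. The sketch below uses Python's standard `urllib.parse` module; the `is_absolute` and `to_absolute` helpers are hypothetical, and the base URI is assumed from the absolute example given earlier.

```python
from urllib.parse import urljoin, urlparse

# Base URI assumed from the absolute example in the text
# (with the stray space removed).
BASE = "http://www.athenaeurope.org/athenawiki/AthenaThesaurus/"


def is_absolute(uri: str) -> bool:
    """True when the reference carries its own scheme and host."""
    parts = urlparse(uri)
    return bool(parts.scheme and parts.netloc)


def to_absolute(uri: str, base: str = BASE) -> str:
    """Resolve a relative reference against `base`; leave absolute URIs alone."""
    return uri if is_absolute(uri) else urljoin(base, uri)


print(is_absolute("RMCA_Keywords#architecture"))  # False
print(to_absolute("RMCA_Keywords#architecture"))
# http://www.athenaeurope.org/athenawiki/AthenaThesaurus/RMCA_Keywords#architecture
```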
Avoid the duplication of information
We saw that the structural properties for defining the semantic relations between concepts are either inverse or symmetric. This is also true for the mapping properties.
- Inverse properties
The mapping properties skos:broadMatch and skos:narrowMatch are each other’s inverse; there is therefore no need to state the same mapping link twice using both properties for the same pair of concepts.
- Symmetric properties
The mapping properties skos:exactMatch and skos:closeMatch are symmetric, so repeating the mapping link in the other direction can be avoided. The property skos:exactMatch is also transitive, so there is no need to repeat the mapping link across several levels.
For instance:
A skos:exactMatch B
B skos:exactMatch C
The assertion A skos:exactMatch C can be inferred from the two preceding statements.
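Because skos:exactMatch is both symmetric and transitive, a set of exact-match statements partitions the concepts into groups of mutually equivalent concepts. The sketch below computes these groups with a small union-find structure; the `exact_match_groups` helper is hypothetical and not part of any SKOS tool.

```python
def exact_match_groups(pairs):
    """Group concepts connected by skos:exactMatch statements.

    Because the property is symmetric and transitive, every concept in a
    group can be inferred to match every other one.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


print(exact_match_groups([("A", "B"), ("B", "C"), ("X", "Y")]))
# [['A', 'B', 'C'], ['X', 'Y']]
```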
Provide precision to the semantic relations of your concepts
- Use the appropriate properties to make links between concepts
The SKOS data model provides semantic relations and mapping properties, and does not restrict the use of these properties. However, we strongly recommend modelling the relations between concepts in a homogeneous way in order to ensure the semantic consistency of the terminology. We recommend that you:
- Use mapping properties to make a link between concepts from different concept schemes
- Use semantic relation properties to make a link between concepts within the same concept scheme
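The two recommendations above can be turned into a simple check, provided the concept scheme of each concept is known. In this sketch the `scheme_of` lookup, the `a:` and `b:` prefixes and the `misplaced_links` helper are all hypothetical.

```python
MAPPING = {"skos:closeMatch", "skos:exactMatch", "skos:broadMatch",
           "skos:narrowMatch", "skos:relatedMatch"}
SEMANTIC = {"skos:broader", "skos:narrower", "skos:related",
            "skos:broaderTransitive", "skos:narrowerTransitive"}


def misplaced_links(triples, scheme_of):
    """Return links whose property does not fit the recommendation above."""
    problems = []
    for s, p, o in triples:
        same_scheme = scheme_of[s] == scheme_of[o]
        if p in MAPPING and same_scheme:
            problems.append((s, p, o, "mapping property inside one scheme"))
        elif p in SEMANTIC and not same_scheme:
            problems.append((s, p, o, "semantic relation across schemes"))
    return problems


# Hypothetical concepts from two schemes, "a" and "b".
scheme_of = {"a:house": "a", "a:building": "a", "b:maison": "b"}
links = [
    ("a:house", "skos:broader", "a:building"),     # fine
    ("a:house", "skos:exactMatch", "b:maison"),    # fine
    ("a:house", "skos:closeMatch", "a:building"),  # flagged
    ("a:building", "skos:related", "b:maison"),    # flagged
]
for problem in misplaced_links(links, scheme_of):
    print(problem)
```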
Enable the multilingualism
- Manage multilingualism of the terminology through mapping of concepts and terms
The mapping process can be useful in a monolingual context but is especially relevant in a multilingual context. Equivalences can be stated from the mapping links made between several terminologies in different languages. Equivalences in a multilingual context can be of three kinds: semantic, cultural or structural. The semantic aspect refers to the meaning of the concept; the cultural aspect refers to the use of a term in a given language or culture; and the structural aspect refers to the semantic relations between concepts. This last aspect deals with the mapping and makes it possible to define complete equivalence (synonymy), partial equivalence (quasi-synonymy) or non-equivalence. As was the case for the first version of the ATHENA Thesaurus, equivalences between concepts in languages that were not initially involved in the source terminology can be deduced from correct mapping links without translating the concepts.
Ensure the documentation of concepts and the terminology
- Make explicit with notes the purpose of a relation
For the purposes of maintenance and administration, it is important to explain the modelling choices that were made when linking concepts. Scope notes can help make these choices explicit. Documentation properties can also keep track of the history of mapping links.
Validation is an important part of both the SKOSification and the mapping process, so special attention must be paid to this final step. From a technical point of view, in order to check the consistency of your converted terminology with the SKOS model, we recommend using the online web service Pool Party. Pool Party offers a free online tool for validating SKOS files that are already online or stored in your local repositories. This tool checks the consistency of the SKOSified terminology on the following points, which refer to our guidelines:
- Valid URIs: the tool checks that the URIs contain no unauthorised characters. However, if a URI is used twice to identify two different concepts, no alert or warning is raised.
- Missing language tags: the tool checks if all the labels and notes have a language tag
- Missing labels: the tool checks that each concept has at least one preferred label.
- Loose concepts: all the concepts that are isolated and not linked to other concepts are pointed out as loose concepts
- Disjoint OWL classes: the tool checks the consistency with any OWL elements that may be present in the SKOSified terminology
- Consistent use of labels: the rules for the use of labels are checked by the tool in order to avoid the use of a same label as a preferred label and alternative or hidden label, and to avoid the use of two preferred labels in a same language, ...
- Consistent usage of mapping properties: the tool checks the consistency in the mapping relations.
- Consistent usage of semantic relations: the tool checks that there is no mix between hierarchical and associative semantic relationships.
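To give an idea of what some of these checks involve, the sketch below reimplements three of them (missing language tags, missing preferred labels, loose concepts) on a toy data structure. It is not how the online tool works internally; the `quick_checks` helper and the sample data are hypothetical.

```python
def quick_checks(concepts, links):
    """Run three of the checks listed above on a toy data structure.

    `concepts` maps a concept id to a list of (property, text, lang) labels;
    `links` is a set of (subject, property, object) relations.
    """
    linked = {s for s, _, _ in links} | {o for _, _, o in links}
    report = {"missing_lang": [], "missing_pref": [], "loose": []}
    for cid, labels in sorted(concepts.items()):
        if any(not lang for _, _, lang in labels):
            report["missing_lang"].append(cid)
        if not any(prop == "skos:prefLabel" for prop, _, _ in labels):
            report["missing_pref"].append(cid)
        if cid not in linked:
            report["loose"].append(cid)
    return report


concepts = {
    "c1": [("skos:prefLabel", "house", "en")],
    "c2": [("skos:altLabel", "dwelling", "")],   # no language tag, no prefLabel
    "c3": [("skos:prefLabel", "garden", "en")],  # not linked to anything
}
links = {("c1", "skos:related", "c2")}
print(quick_checks(concepts, links))
# {'missing_lang': ['c2'], 'missing_pref': ['c2'], 'loose': ['c3']}
```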