PERSISTENT IDENTIFIERS:
COMMERCIAL AND HERITAGE VIEWS

4/9 – CASE STUDY 2: The ISBN-A – making ISBNs part of the Web

Summary

This case study explores how the ISBN system remains useful in the world of ebooks and the Web, and how ISBN-A is opening up new possibilities for finding and selling books online.

The ISBN system was developed in the 1960s and 1970s to allow reliable, efficient exchange of information for bibliography, stock control, ordering and sales reporting.

This was a triumph of standardisation in the early days of computerised commerce; however, it was never designed for use in a global network of computers like the Internet.

Yet the numbering system can still work in the context of the World Wide Web. In fact, recently:

  • New ebook guidelines for Registration Agencies made clearer when electronic publications should get new ISBNs, so that both sellers and end-purchasers can distinguish between products that have a significant difference for someone in the supply chain;
  • Fitting ISBNs into the larger GTIN-13 namespace means that more numbers will be available; these are needed because
    • More publications will be produced with digital technology, and
    • More different versions and formats of each publication will be made, especially different file formats and “packages” of allowed uses for ebooks.

ISBNs are still mainly used through a network of agencies whose organisation and equipment pre-date the Internet. In many cases, services for looking up product information and for ordering (as the ISBN’s ISO Standard recommends) have not been fully developed.

ISBN-A, or “Actionable ISBN”, aims to give publishers and others a central, standardised service platform to offer these additional services based on ISBN. It takes advantage of the Internet’s capabilities by integrating ISBN into the DOI system.

Background to the DOI system

To explain what the DOI system is, first a little Internet anatomy:

  • The Internet itself is an infrastructure of computers connected together to exchange information. It is made up of
    • Computer hardware belonging to public and private organisations, individuals, Governments etc.
    • a standard set of rules (called protocols, such as HTTP, FTP…) for how the computers share digital information (“ones and zeroes”). It is neutral in terms of what the information means, and how it is encoded.


      One simple analogy could be the system of rails and railway stations connecting cities and countries together. All types of rail transport use it, but it does not tell you what type of trains it carries, when they are scheduled, and what is in the trains (e.g. it could be passengers, freight, or both). Connecting the systems depends on e.g. the gauge of the rails, signalling rules, avoidance of collisions in scheduling…
  • The World Wide Web (WWW) is made up of documents (and other digital files) available through HTTP requests on the Internet, and the links (URLs) between them.


    In the railway train analogy, this is the “web” of all the passenger journeys through the railway network you and others could make, and all the meetings you might have at each stop or change – but the passenger analogy is not perfect, because railway trains transport unique persons, but the Internet and the WWW only transmit copies of digital files between computers.
  • Just as there are both passenger trains and freight trains on the rails, the WWW is not the only thing that uses the internet. Other services use the internet at the same time (for example, e-mail services).
  • A URL (“Uniform Resource Locator”) in the WWW is an identifier pointing to the location (only) of a digital file (i.e. a piece of digital storage on a computer on the network). It can also contain a command to software running at that location, but we will not consider this complex case.

    This leads to the two most basic problems with retrieving information through the Web:
    • How do I know this specific file (or any file) claimed to be at this URL will actually be at this location?
    • What information about the file (e.g. its meaning and intellectual content, its encoding format, history of changes) can I reliably get before I download the actual file?
    Clearly the answer to these questions depends on some level of organised management of files and information about them (and their contents), and only in the second place, on technical solutions for implementing it.

It is important to start with a clear understanding of the limitations of HTTP. There are two basic types of “things” that can be identified on the Web:

Things identified on the Web

Things you can get through HTTP:
  • Copies of digital files made available on the Internet. That’s all!
  • N.B. some of these digital files could be about other digital files, or about things that are not digital files – but you don’t know that until you get them and inspect them.

Things you cannot get through HTTP:
  • Concepts (e.g. theories, histories, names, subjects of documents, instructions for use… this includes concepts describing digital files!)
  • Unique physical things (e.g. you, me, the Taj Mahal, my laptop, Einstein’s brain, the Mona Lisa…)
  • Also: digital files not made available on the Internet…
  • N.B. most identifiable things in the world are not available through HTTP!


In the early days of the Internet and the WWW, there was a plan to use three distinct types of structured identifiers for digital files (“objects”) and related information:

  • URN – Uniform Resource Name: A persistent, location-independent identifier for an object
  • URL – Uniform Resource Locator: The address of an object, containing enough information to identify a protocol and retrieve the object
  • URC – Uniform Resource Characteristics: Any combination of one or more URNs or URLs with meta information 1

So: a URN is the identifier for what I am trying to find, the URL should be the location I need to get it from, and URCs make the link between the two, telling me about the file and where to get it.

The network location (URL) of a digital file (identified by a URN) is itself a “characteristic” (URC) of the file.

In the end, URCs didn't work out, and we now use the term URI (Uniform Resource Identifier) to cover both URNs and URLs. The important point to grasp is that URNs are about the identity of a document or resource, independent of its location, and the URL is about the location of a document or resource (somewhat independent of its identity).

The aim is to link the network locations (URLs) of copies of a file to a single, unique name (URN) for that file, and to keep the location information (which would have been the URC) up to date.

  • The first two, URN for names and URL for file locations, have been linked by national libraries for their digital collections and national bibliographies, as in the URN case study from the Hungarian National Library.
  • This “resolution” is like a catalogue, recording a book’s ISBN and the fact that a copy is held at a particular library.
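The relationship between names and locations described above can be sketched as a small in-memory catalogue. This is only an illustration: the URN and URLs are made up, and the class and method names are hypothetical and do not correspond to any real resolver software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogueRecord:
    """What is known about one named resource (roughly the role planned for a URC)."""
    urn: str                                        # persistent, location-independent name
    urls: List[str] = field(default_factory=list)   # current locations of copies


class Catalogue:
    """Toy resolver: maps each persistent name to its current locations."""

    def __init__(self) -> None:
        self._records: Dict[str, CatalogueRecord] = {}

    def register(self, urn: str, url: str) -> None:
        """Record that a copy of the named resource is held at this location."""
        record = self._records.setdefault(urn, CatalogueRecord(urn))
        if url not in record.urls:
            record.urls.append(url)

    def relocate(self, urn: str, old_url: str, new_url: str) -> None:
        """Update one location; the name itself never changes."""
        urls = self._records[urn].urls
        urls[urls.index(old_url)] = new_url

    def resolve(self, urn: str) -> List[str]:
        """Return a copy of the list of current locations for a name."""
        return list(self._records[urn].urls)


catalogue = Catalogue()
name = "urn:example:report-42"  # hypothetical name, for illustration only
catalogue.register(name, "http://library-a.example/files/report42.pdf")
catalogue.register(name, "http://library-b.example/copies/r42.pdf")

# Library A reorganises its server: the location changes, the name does not.
catalogue.relocate(
    name,
    "http://library-a.example/files/report42.pdf",
    "http://library-a.example/archive/report42.pdf",
)
print(catalogue.resolve(name))
```

Anyone who cited the name rather than the first location still finds both copies after the move, but only because someone updated the catalogue.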

More details about the file, beyond just “where to find it”, are hard to standardise because they need cooperation across a whole sector of activity, and between sectors.

The DOI system

To offer a solution to this problem, the DOI system was launched, offering:

  • A service for maintaining unique, persistent identifiers, featuring
    • Persistent names – like ISBNs, DOIs cannot be changed or re-assigned once assigned
    • Resolution to one or more URLs, with optional additional metadata based on cross-sector cooperation

DOIs are unique numbers created and managed by registration agencies in a similar way to the ISBN. Their service platform makes them automatically resolvable on the Web in a much more immediate way:

[Figure: diagram illustrating the resolution process explained below. Source: DOI Handbook, http://www.doi.org/doi_handbook/3_Resolution.html]

A DOI user sends a request to the DOI system at http://dx.doi.org. The DOI system responds with some of the metadata stored for that DOI, or with a copy of the file identified by the DOI, depending on the type of object the DOI identifies and the kind of access available (for example, if a payment is necessary, the user might see a page requesting subscription). A Web browser can automate some of this process.

To keep a DOI resolving to the latest information about the thing it refers to, even when the actual location of the thing is changed, the owner of that DOI has to actively manage these links.
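As a rough sketch of what such a request looks like from a program rather than a browser, the Python below builds the resolver link for a DOI and, if network access is available, asks where it currently leads. The function names and the example DOI are assumptions made for illustration; only the resolver address comes from the text above, and what comes back depends entirely on how the DOI's owner has set up its resolution.

```python
import urllib.request
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver address given in the text


def resolver_url(doi: str) -> str:
    """Return the Web link for a DOI by prefixing the resolver address."""
    # "/" separates the DOI prefix from its suffix, so it is left unescaped;
    # other characters that are unsafe in a URL are percent-encoded.
    return RESOLVER + quote(doi, safe="/")


def resolve(doi: str, timeout: float = 10.0) -> str:
    """Ask the resolver where the DOI currently points (needs network access)."""
    # urlopen follows HTTP redirects, so geturl() reports the final location.
    with urllib.request.urlopen(resolver_url(doi), timeout=timeout) as response:
        return response.geturl()


example_doi = "10.1000/182"  # example DOI, used here only for illustration
print(resolver_url(example_doi))
```

Calling `resolve(example_doi)` would perform the actual lookup; it is left out of the example run because the result can change whenever the owner updates the link.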

Domain-specific implementations include:

  • ISBN-A for books (see below)
  • DataCite for scientific data sets (see case study #3)
  • CrossRef for academic publications (normally journal articles)
    • Identifies academic publications at the journal article level: extremely convenient for precise citations
    • Can identify academic books and chapters of books, and parts of content (e.g. tables, diagrams)
    • Citations can be Web-linked through their metadata to become part of the research process
  • EIDR for film and TV assets
    • A linking ID for existing audiovisual data and identifiers, with links;
    • Identifies films and TV programmes (and series) at any level of abstraction:
      • defines the relevant characteristics of each level
      • different versions (e.g. the “director’s cut”) and encodings (e.g. a DVD or digital download) can be linked together

Current DOI implementations are used for published (usually commercial) creative works. But other types of object could be identified and looked up using DOI.

DataCite is a case in point since research data sets do not neatly fall into the category of commercial publication.

  • The technical, organisational and financial model of DOI is flexible enough to be used by non-profit organisations:
    • Software platform based on CNRI’s open-source Handle system;
    • Some functions of a Registration Agency can be delegated or shared;
    • The business model of a DOI Agency can incorporate for-profit or cost-recovery pricing
  • The core data model for DOI metadata is demonstrably compatible with those for heritage data (using e.g. CIDOC-CRM).

DOI could be used to identify museum objects (see the “Heritage Sector Web Identifiers” section).

ISBN-A and ISBN

ISBN-A (“actionable ISBN”) is an innovative marketing tool for using ISBNs on the Web. ISBN-A initiatives in Europe so far are led by the ISBN agencies of Italy and Germany; the DOI Registration Agency that supports them is the Linked Heritage partner mEDRA (multilingual European DOI Registration Agency).

ISBN-A provides a platform for managing and exploiting the metadata for marketing books in the Web environment. Each ISBN Agency provides the service according to its own business model and marketing strategy. Creating ISBN-As for books enables services to be built on the metadata and links that publishers provide for them.

To make ISBNs into identifiers in the DOI system, the ISBN is simply incorporated into the DOI syntax.

Let’s take an example number: 978-88-07-70168-9. In the DOI system as an ISBN-A it appears as:

10.978.8807/701689

[Figure: the structure of an ISBN-A. Source: http://www.medra.org/]

A Web link is created from the number simply by adding the HTTP address of the central DOI resolver service, to form http://dx.doi.org/10.978.8807/701689 and this link can then be resolved by any Web browser or Web-based application. Of course, what this resolves to - whether to a web page describing the book, or a page where you can buy the book, or the author's home page - depends on how the publisher of the book manages the DOI data.
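The mapping can be sketched in a few lines of Python. This is a minimal illustration, assuming the pattern visible in the two ISBN-A examples in this case study: the DOI prefix is "10." plus the three-digit ISBN prefix, a dot, and the group and registrant elements; the suffix is the publication element followed by the check digit. The function names are hypothetical. A hyphenated ISBN is required as input because the lengths of the group and registrant elements vary and cannot be guessed from the bare digits.

```python
RESOLVER = "http://dx.doi.org/"  # resolver address given in the text


def isbn_to_isbn_a(hyphenated_isbn: str) -> str:
    """Turn a hyphenated ISBN-13 into ISBN-A (DOI) syntax."""
    parts = hyphenated_isbn.strip().split("-")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError("expected five hyphen-separated groups of digits")
    if sum(len(p) for p in parts) != 13:
        raise ValueError("an ISBN-13 has exactly 13 digits")
    prefix, group, registrant, publication, check = parts
    return f"10.{prefix}.{group}{registrant}/{publication}{check}"


def isbn_a_to_isbn13(isbn_a: str) -> str:
    """Recover the 13 digits of the ISBN from an ISBN-A."""
    doi_prefix, suffix = isbn_a.split("/", 1)
    directory, isbn_prefix, group_and_registrant = doi_prefix.split(".")
    if directory != "10":
        raise ValueError("a DOI prefix starts with '10.'")
    digits = isbn_prefix + group_and_registrant + suffix
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("not an ISBN-A")
    return digits


def isbn_a_link(isbn_a: str) -> str:
    """Prefix the central resolver address to make a Web link."""
    return RESOLVER + isbn_a


isbn_a = isbn_to_isbn_a("978-88-07-70168-9")
print(isbn_a)               # 10.978.8807/701689
print(isbn_a_link(isbn_a))  # http://dx.doi.org/10.978.8807/701689
```

The reverse direction only recovers the bare 13 digits; the original hyphenation between group and registrant is not preserved in the ISBN-A.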

ISBN-A in Italy

A DOI Registration Agency is domain-based; an ISBN Agency is usually national. So the ISBN-A system (at least in Europe) is based in Italy (where the DOIs are registered) and, if the ISBNs are registered outside Italy, in that other country as well.

  • mEDRA is the technology provider of the Italian ISBN Agency, so in Italy the ISBN-A has been fully integrated with ISBN
  • ISBN-A registration tools developed by mEDRA for the Agency:
    • Web service to enable publishers to submit ISBN-A registration messages in ONIX format;
    • Online web platform integrated with the Agency bibliographic database to enable publishers to add resolution metadata to ISBN bibliographic records.
  • An indirection service is managed by mEDRA, allowing links for multiple retailers to be found on resolving the book’s ISBN-A:
    • mEDRA designs the user interface (or “landing page”)
    • a “view metadata” service is built on bibliographic data maintained in the local ISBN Agency database
    • book covers can be displayed based on images uploaded to the ISBN Agency platform
  • Prices for registration and maintenance depend on how many products are registered: from around €100 for 10, to €1500 for 500.

The ISBN-A platform has potential primarily as a business-to-business (B2B) service. Its value depends on:

  • Investment of resources by publishers to maintain and develop the marketing data and links
  • Added-value development of the range of options offered by the basic “title page” provided by mEDRA – perhaps partnerships with:
    • social media channels,
    • Web analytics
    • or even content distributors (ebook wholesalers)?

Because an ISBN-A is a single Web link per book that can carry many optional features, it is well suited to being embedded in Web pages, search engine results, social media and mobile applications.

The existing presentation of data to end users is very simple, as in the example below, and primarily text-based:

In principle, the ONIX metadata format supported by mEDRA’s resolver could offer links to almost any type of media, depending on the context of resolution, as long as these were provided and managed by the publishers.

ISBN-A in Germany

The first full adoption of ISBN-A outside Italy is in Germany. The German publishing association has a services subsidiary, MVB, which runs the German ISBN agency and a variety of online platforms, including a retail Web site and a books-in-print database.

Because MVB also runs the Web bookshop “Libreka!” it can easily add retail links to ISBN-A book data.

The marketing materials it can assemble as part of running Libreka! and the books in print service “VLB” put MVB in a very strong position to develop bespoke book marketing channels which add value to the data.

Resolve this ISBN-A maintained by MVB to see the system in action:

10.978.37639/35147 (as a Web link: http://dx.doi.org/10.978.37639/35147)

Lessons from the case study

  • The successes of standard identifier systems like ISBN can carry over into the Web era
  • All the same costs that come with “traditional” identifiers are implied by managing identifiers on the Web
  • But – it is possible to offer cost-effective, persistent identification through central service platforms like DOI
  • DOIs can be used to Web-enable existing systems like ISBN
  • Web services based on actionable identifiers like DOI really need extra management effort and investment to provide value
  • This may carry over to new areas like cultural heritage if there is demand from that sector

Explore further

DOI Handbook and topical Factsheets
The Handbook gives technical overviews of DOI, and the fact sheets introductory outlines on specific topics

DOI Factsheet on ISBN-A
A summary of the ISBN-A system as a DOI implementation that incorporates an existing standard