DIGITISATION LIFE CYCLE

2/5 – Workflow

Selection

The selection of documents will normally take into consideration the growth of the collection, added value, protection, technical feasibility, and the ability to sustain the long-term costs of digitisation. In practice, the best selection will be based on a combination of criteria.

Selection criteria will generally take account of:

  • historic and cultural value
  • uniqueness and rarity
  • high demand
  • documents free of legal constraints, or with permission to digitise already secured
  • limited access due to state of preservation, value and location
  • value added by providing access online; creation of virtual collections
  • increased level of interest generated in little-known or unknown material

In certain cases it may be worth conducting a survey of the material so as to determine the quantity, type and format of the documents, and their state of preservation. This information may come in useful during subsequent preservation, cataloguing and digitising operations.

Legal aspects

When digitising documents, serious attention must be given to issues concerning copyright, in respect both of original material and of digital resources.

Points to examine are: characteristics of the work to be processed, rights ownership (who owns the rights – is the work protected – what type of protection?), the actions to be performed on the work (what are they – what rights are involved – has authorization been obtained?), likely critical areas and possible solutions.

Two categories of work must be excluded: those subject to copyright, and those already digitised in other collections and accessible to the public on the web. The latter are excluded to avoid duplication and minimize costs.

Preservation of items

Digitisation is no substitute for commitment to care and preservation of original documents.
It is important to assess the state of preservation of original documents before proceeding with digitisation, and to ensure that any treatment of original specimens is carried out only after they have been inspected by experts.

Digitisation

To guarantee the safety of originals and ensure good digitising quality, particular care must be taken over the choice of acquisition methods and equipment (capture system, lighting, software).

The nature and the dimensions of the originals will determine the selection of the capture system and the lighting system.

The expected image quality determines the hardware and software requirements of the capture system, the time needed to acquire and process the images, and the amount of storage space the images will occupy.

As a general rule, the key to quality lies not in scanning at the maximum resolution obtainable, but in scanning at a level commensurate with the information contained in the original.
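The link between resolution and storage space can be made concrete with a rough calculation. The sketch below is illustrative only: the function name, the default of 8 bits per channel and the three colour channels are assumptions rather than values prescribed by these guidelines, and it estimates the size of an uncompressed image only.

```python
MM_PER_INCH = 25.4


def uncompressed_size_bytes(width_mm, height_mm, ppi, bits_per_channel=8, channels=3):
    """Estimate the uncompressed size of a scan of a flat original.

    width_mm, height_mm: physical dimensions of the original.
    ppi: scanning resolution in pixels per inch.
    bits_per_channel, channels: assumed colour depth (8-bit RGB by default).
    """
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    total_bits = width_px * height_px * channels * bits_per_channel
    return (total_bits + 7) // 8  # round up to whole bytes


# Example: an A3 sheet (420 x 297 mm) scanned in 8-bit RGB at two resolutions.
for ppi in (300, 600):
    size_mb = uncompressed_size_bytes(420, 297, ppi) / 1_000_000
    print(f"{ppi} ppi: about {size_mb:.0f} MB per image")
```

Doubling the resolution roughly quadruples the storage needed per image, which is why the resolution should follow the information content of the original and not the limits of the equipment.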

In general, the aim of digitisation is to produce master files suitable for long-term preservation. Files for viewing on the web are derived from the master files.

In-house or outsourced digitisation

The decision as to whether documents should be digitised by the institution (in-house), or entrusted to an outside service provider (outsourcing) will depend on the advantages and drawbacks of the two methods.

Advantages of in-house digitisation:

  • keeping direct control over the entire process
  • learning by doing
  • improving standards as work proceeds, rather than setting targets beforehand
  • ensuring safety, proper handling and accessibility of materials

Advantages of outsourcing:

  • institution pays for the end product, generally on the basis of an agreed price per image
  • costs are kept down, and risks limited
  • service provider can handle large quantities of material
  • provider shoulders the costs of specialization, training and technological obsolescence
  • wide range of options and services available

Drawbacks of in-house digitisation:

  • rather than paying for the product, institution shoulders the costs, including training, technological obsolescence and down time
  • outlay on purchase and maintenance of equipment
  • need for skilled human resources
  • cost per image not defined

Drawbacks of outsourcing:

  • by eliminating a step of the process, institution does not develop thorough knowledge of digitisation
  • problems with safety, transportation and handling of original specimens

Recommendations

In-house is best if:

  • the collection cannot be moved outside of the institution
  • the digitising process is a very simple one
  • reliance can be placed on specialist human resources and equipment already on site

Outsourcing is best if:

  • original specimens cannot be digitised in-house for whatever reason
  • schedule involves processing large quantities of material in the short term
  • there are constraints in terms of space, infrastructure and human resources

If the decision is made to entrust the service to a company, the institution must:

  • determine the digitisation parameters
  • draw up a detailed invitation to bid
  • evaluate the products and services offered
  • define the contractual responsibilities of the institution and of the company
  • carry out a final quality control on the product

The cost of digitisation depends on a number of variables, namely the size, type and nature of the documents to be digitised and the envisaged use of the digital objects. An estimate of the costs can therefore be requested from the digitisation service provider or, alternatively, based on previous digitisation projects. It may also help to consult existing literature on the topic.

Selection of equipment

General indications on the capture system:

  • Flatbed scanners are used for single sheet documents, or bound documents that can be opened out flat without difficulty, of dimensions up to paper size A3 (420 x 297 mm).
    These documents include: printed matter (e.g. flyers, posters, brochures), manuscripts (e.g. letters), maps in good condition, sheet music, prints (e.g. engravings, etchings, lithographs), pen-and-ink drawings with no added water colour or tempera (e.g. cartoons), photographic material (e.g. black-and-white and colour gelatine prints, albumen prints).
  • Scanners for films and transparencies are used for films, negatives and transparencies.
  • Planetary scanners or digital cameras are used for bound documents, documents of a special nature, and documents larger than size A3.
    These documents include: bound volumes (e.g. books, albums, sheet music, atlases), fragile documents, oil paintings, most works of art on paper (e.g. watercolours, drawings), graphic material and works of art created with flaky and friable substances (e.g. crayons, charcoals, soft pencil), watercolours applied thickly, with tempera or varnishes, large or fragile maps, manuscripts (e.g. bound diaries, folded documents), parchments, photographic material (e.g. large size prints; historic photographic processes such as daguerreotypes or ambrotypes), three-dimensional material (e.g. fabrics, sculptures, objects).

In the case of antique and fine art originals, the lighting system must be fitted with lamps emitting cold light and ultra-low levels of IR and UV radiation.

Digital acquisition

Bearing in mind the resources available, the decision on image quality should be based on the needs of users, on the method of delivery and use of images, and on the nature of the materials being digitised (size, format, type of material, colour, etc.).
There are various reasons for creating a high quality master: preservation, access and cost, and the need to ensure that the digitisation process will not need to be repeated in future. The master can be used to prepare files in smaller sizes or alternative formats for the different uses envisaged. Standard formats should always be used.

Indications on the master file:

  • this is the file in which the single digital object is created and preserved, and from which derivatives can be generated (JPEG, PDF etc.); enables high quality printing
  • the master file represents the informative content of the original, as closely as possible
  • the original must be captured in its entirety. A border must be left around the document, so that the outline of the image can be identified
  • if the original is mounted on a backing that carries information, the digitisation should also include the backing
  • the master file is archived exactly as reproduced by the acquisition tool
  • the file should be in a standard format, such as TIFF
  • the file should incorporate (embed) a colour profile
  • if the original is digitised and accompanied by colour scale, grey scale and gauge, these shall be located outside the borders of the reproduced image and within the overall perimeter of the surround

Indications on derivative files:

  • these are used in place of the master for the purposes of LAN or WAN access, and accordingly, the dimensions will depend on the envisaged uses
  • derivative files should be of a suitable size for fast download without requiring a high speed connection, of acceptable quality for general research purposes, and presented in a compressed format for speed of access
  • the usual formats are JPEG or PDF
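As an illustration of how the dimensions of a derivative might be worked out from the master, the sketch below scales an image to fit a maximum long side while keeping its proportions. The function name and the 1600-pixel limit in the example are hypothetical; the appropriate limit depends on the envisaged uses.

```python
def derivative_size(master_width_px, master_height_px, max_long_side_px):
    """Return (width, height) for a derivative that fits max_long_side_px.

    The aspect ratio of the master is preserved and the image is never
    enlarged: a master already within the limit keeps its dimensions.
    """
    long_side = max(master_width_px, master_height_px)
    if long_side <= max_long_side_px:
        return master_width_px, master_height_px
    scale = max_long_side_px / long_side
    return (
        max(1, round(master_width_px * scale)),
        max(1, round(master_height_px * scale)),
    )


# Example: a web derivative of a 4961 x 3508 pixel master,
# limited to 1600 pixels on the long side (an assumed value).
print(derivative_size(4961, 3508, 1600))
```

The actual resampling and compression would then be carried out with the institution's imaging software, always starting from the master and never from another derivative.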

File naming

Before commencing any acquisition procedures, file-naming criteria must be established. In general terms, the name of each file will consist of a string of characters that must contain the information needed to identify, uniquely and unambiguously, the element of the collection to which the image belongs. Filenames will be completed with the appropriate extension, such as ".tif", ".jpg", etc.
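One possible naming scheme is sketched below. The pattern (collection code, item identifier and zero-padded image sequence joined by underscores) and the function name are assumptions made for illustration; each institution will define its own criteria.

```python
import re


def build_filename(collection_id, item_id, sequence, extension="tif", seq_digits=4):
    """Build a file name such as MAPS_00042_0007.tif (hypothetical pattern).

    collection_id and item_id may contain only letters and digits, so that
    the underscore separator stays unambiguous. sequence is the position of
    the image within the item, starting at 1.
    """
    for label, part in (("collection_id", collection_id), ("item_id", item_id)):
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"{label} must contain only letters and digits: {part!r}")
    if sequence < 1:
        raise ValueError("sequence must be 1 or greater")
    suffix = extension.lstrip(".").lower()
    return f"{collection_id}_{item_id}_{sequence:0{seq_digits}d}.{suffix}"


# Master and derivative of the same image share everything but the extension.
print(build_filename("MAPS", "00042", 7))          # master file
print(build_filename("MAPS", "00042", 7, "jpg"))   # derivative file
```

Zero-padding the sequence number keeps files in page order when they are sorted by name.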

Storage of data

The collection of images, consisting of directories and files, will be stored on optical or magnetic storage media such as CDs, DVDs, and external hard disks.
It is important that data should be saved to at least two separate storage media, preserved at two distinct locations, and that the data should be checked and refreshed periodically. The life of the storage media is in any event influenced by a variety of factors (ISO 18923:2000 and 18925:2002 standards indicate the parameters for proper preservation of storage media).
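One common way to carry out the periodic check is to compare checksums of the files held on the two copies. The following is a minimal sketch under that assumption; the guidelines do not prescribe a method, and the function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(manifest_a, manifest_b):
    """List files that are missing from one copy or differ between copies."""
    problems = []
    for name in sorted(set(manifest_a) | set(manifest_b)):
        if name not in manifest_a or name not in manifest_b:
            problems.append((name, "missing from one copy"))
        elif manifest_a[name] != manifest_b[name]:
            problems.append((name, "checksum mismatch"))
    return problems
```

A manifest produced at the time of acquisition can also be kept with each copy, so that a later check shows which of the two copies has changed.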

Quality control

Quality control should be documented and conducted throughout the entire digitising process on all material captured, and in particular on master files.

Planning of the quality control system should include:

  • appropriate preparation of the environment (hardware configuration, viewing software, viewing conditions, etc.)
  • a priori definition of “acceptable” and “not acceptable” characteristics
  • verification procedure (entire collection or sample, all files or master files only, visual quality on screen, in print, etc.)
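Where verification is carried out on a sample and not on the entire collection, the sample can be drawn at random in a reproducible way so that the check itself can be documented. The sketch below assumes a simple random sample; the function name, the default fraction and the seed are hypothetical, and the appropriate sample size is for the institution to define.

```python
import math
import random


def select_qc_sample(filenames, fraction=0.1, seed=0):
    """Pick a reproducible random sample of files for quality control.

    fraction: share of the files to inspect (0 < fraction <= 1).
    seed: recorded with the results so the same sample can be redrawn.
    At least one file is selected whenever the list is not empty.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    ordered = sorted(filenames)
    if not ordered:
        return []
    count = max(1, math.ceil(len(ordered) * fraction))
    return sorted(random.Random(seed).sample(ordered, count))


# Example: inspect a quarter of a batch of 40 master files.
batch = [f"MAPS_00042_{n:04d}.tif" for n in range(1, 41)]
for name in select_qc_sample(batch, fraction=0.25, seed=2024):
    print(name)
```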

Metadata

Metadata is structured information relating to any type of resource, used to identify, describe, manage or allow access to the resource in question.

There is no metadata standard that meets all the needs of all types of collections and repositories.
In general, metadata models include the following types of information:

  • Descriptive metadata: data describing the content of a resource and allowing its retrieval
  • Administrative metadata: data containing information on the management and administration of a resource (e.g. rights management, preservation metadata, technical metadata)
  • Structural metadata: data describing the relations between digital objects (e.g. page order in a digitised book)

From Good practices handbook (edited by the Minerva Working Group 6)

“Appropriate Meta-data Standards

Issue Definition

Certain important standards already exist for meta-data. In the bibliographic domain (and increasingly in non-library cultural domains), the Dublin Core standard is of great importance.

Pragmatic Suggestions

  • Review existing meta-data models and standards before creating your own.
  • Creating a totally new meta-data model for cultural collections should be avoided.
  • The meta-data work carried out by similar projects in the past is likely to be relevant to your project – meta data models travel well between projects in the cultural area.
  • Unless your project has good reason not to do so, the Dublin Core fields should be included in the meta-data model. While museums may find the CIMI model better fits their holdings, a common core set of attributes should be aimed for, which will enable cross-collection searching.
  • If a proprietary meta-data model is to be used, a mapping from this model to the Dublin Core should also be developed.
  • While a naming scheme or national naming convention may be very useful, a full meta-data model is better, both in terms of the amount of data that can be stored about an item, and also to enable more powerful searching and interoperation with other projects and other countries.”
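The mapping from a local metadata model to the Dublin Core suggested above can be sketched as follows. The local field names and the wrapping "record" element are invented for illustration; only the Dublin Core element names and namespace come from the standard itself.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical local field names mapped to Dublin Core elements.
LOCAL_TO_DC = {
    "object_name": "title",
    "maker": "creator",
    "production_date": "date",
    "inventory_no": "identifier",
    "licence": "rights",
}


def to_dublin_core(local_record):
    """Convert a local record (a dict) into an XML element with dc: children.

    Local fields that are absent or empty are skipped; fields with no
    mapping are ignored.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for local_field, dc_element in LOCAL_TO_DC.items():
        value = local_record.get(local_field)
        if value:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_element}")
            child.text = str(value)
    return root


# Example with invented values.
record = {
    "object_name": "Map of the harbour",
    "production_date": "1850",
    "inventory_no": "MAPS_00042",
}
print(ET.tostring(to_dublin_core(record), encoding="unicode"))
```

Keeping the mapping in a single table makes it easy to review and to extend as further local fields are matched to Dublin Core elements.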

Digital preservation

In any digitisation project, it is essential to maintain the digital resources created so that they remain usable over time and burdensome repeats of digitising operations are avoided. Accordingly, the institution must adopt procedures to ensure that digital objects will remain usable and accessible, irrespective of technological changes in the future.

The usability and accessibility of digital objects over time is guaranteed by the file format (standard for formats, file sizes, web transmission rate, methods of viewing images…), and by the archiving media and digital repository (digital objects with associated metadata will be archived and managed in a digital repository). It is fundamentally important to use open standards, thereby facilitating interoperability with other systems, and allowing access to metadata through other service providers (e.g. Europeana).