NA 3 – Information Networks (SYNTHESYS 2)

Consolidating the Information Network of European Natural History Collections

In the first SYNTHESYS project, NA D established the core mechanisms for sharing and working with existing datasets held within infrastructures. Standards and protocols were developed (e.g. the ABCDEFG schema for specimen-based earth science collection data) to ensure standardised access to collection data via specifically designed interfaces (e.g. the SYNTHESYS BioCASE Portal).

NA3 will remove barriers to the electronic access and sharing of collections information by providing state-of-the-art tools for specialist Users, and will improve the quality of content supplied by the global scientific community through the implementation of best practice and the setting of global data standards. NA3 will ensure the integration of the SYNTHESYS IA Beneficiaries into the emerging virtual biodiversity information facilities at the European level.

To significantly improve the degree of mobilisation of specimen information and to create focussed interfaces to meet the increasing demands from both researchers and the public
To assist institutions in implementing the developed “best practice” data standards for electronic data throughout the virtual research communities and thus achieve the highest degree of interoperability possible for the European Research Area
To facilitate technical access to specimen data for specialised User groups in research and society that are not adequately covered by other initiatives such as GBIF

The development of information technologies has provided the means to mobilise data (textual or image) of the objects in natural history collections in radically new ways. It provides the tools needed to work with existing data and the great volume of new data; e.g. that generated by the exponential growth in DNA research. SYNTHESYS has been a driving force in this process, continuing the advances made by previous EU projects. These projects have provided key technology, global standards and content to the emerging Global Biodiversity Information Facility (GBIF), the internationally established infrastructure for serving specimen data to the world.

The focus of effort will now shift from building the basic connections, language and standards of the infrastructure to achieving full and wide acceptance and usage of the ‘information network’. On the one hand we need to offer significantly increased, more focused, and better content, i.e. mobilising more data in focus areas, and also improve standardisation of the data. On the other hand we need to adapt User access interfaces to the needs of specific User groups. A specimen information portal used to gather information for a taxonomic revision needs to be different from one offering distribution information to ecologists and other researchers to assess the effects of climate change.

The first objective will be achieved by investigating and rationalising the technical mechanism for data capture and for standardised data storage, by means of helpdesk functions, and by realising the possibilities for synergies e.g. in the area of duplicate data capture. The second objective will be pursued by means of focussed training (e.g. for IT managers) and direct help for technology updates and standard implementation. The third objective will be achieved mainly by relating the specimen information to specialised thesauri already available, but not yet used in the context of specimen access portals and by configuring the portals accordingly.

Rationalisation of data capture

Evaluate, document, and (where possible) instigate rationalisation possibilities (i) through mechanical automation, (ii) by means of automated data capture mechanisms (OCR, feature identification etc.), (iii) by exploiting networked data resources (e.g. existing duplicates), and (iv) voluntary data capture (Web 2.0 approach involving data capture from label or ledger images through volunteers including an online reward system and data quality assurance through multiple entries).
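The "data quality assurance through multiple entries" idea in (iv) can be illustrated with a minimal sketch: several volunteers transcribe the same label, and a value is accepted only when enough of them agree. The function names, the normalisation rule, and the agreement threshold below are illustrative assumptions, not part of any SYNTHESYS deliverable.

```python
from collections import Counter
from typing import List, Optional


def normalise(text: str) -> str:
    """Collapse whitespace and case so trivial differences do not block agreement."""
    return " ".join(text.split()).casefold()


def consensus(entries: List[str], min_agreement: int = 2) -> Optional[str]:
    """Return the agreed transcription, or None if volunteers do not agree.

    Hypothetical rule: the most frequent normalised value wins if it has at
    least `min_agreement` votes and is not tied with the runner-up.
    """
    if not entries:
        return None
    ranked = Counter(normalise(e) for e in entries).most_common(2)
    value, votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None  # tie between the two leading readings
    if votes < min_agreement:
        return None  # not enough independent entries agree
    # Return the first original spelling that matches the winning reading.
    return next(e for e in entries if normalise(e) == value)


if __name__ == "__main__":
    readings = ["Berlin, Dahlem", "berlin,  dahlem", "Berlin Dahlen"]
    print(consensus(readings))
```

Records for which `consensus` returns `None` would be routed to further volunteers or to a curator; how many matching entries are required is a policy choice the source text leaves open.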

Evaluation and training: collection data capture and management software

Focus on existing collection software solutions, document their usability using the EDIT BD-Tracker site, provide information about training opportunities, and catalyse necessary improvements in the software (e.g. standardised globally unique identifiers like LSIDs, built-in ABCD support).

Implementation of improved Annotation mechanisms

Based on the technical annotation mechanisms prototyped under SYNTHESYS NA D (i.e. mechanisms to manually or automatically correct or add structured data to published specimen records on the Web): (i) specify a storage mechanism for such annotations referring to specimens held by European institutions, (ii) investigate the implementation of an “ABCD reverse wrapper” (i.e. import software that can be used to incorporate accepted annotation data into the institutional databases), (iii) investigate the sociological aspects of moving towards virtual annotations of specimens, and (iv) develop recommendations for their usage in the collection work process.
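To make the annotation store and the "reverse wrapper" idea more concrete, the following is a minimal sketch under stated assumptions: annotations are kept separately from the institutional records, and only those marked as accepted are imported. The `Annotation` class, its fields, the status values, and the flat record layout are hypothetical; they are not ABCD element names and not the design specified in the project reports.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One proposed correction or addition to a published specimen record."""
    specimen_id: str       # identifier of the annotated specimen (hypothetical)
    field: str             # name of the data field being annotated
    proposed_value: str    # value suggested by the annotator
    annotator: str         # who made the suggestion
    status: str = "pending"  # assumed states: "pending", "accepted", "rejected"


Record = Dict[str, str]


def apply_accepted(
    records: Dict[str, Record], annotations: List[Annotation]
) -> Tuple[Dict[str, Record], List[Annotation]]:
    """Import accepted annotations into a copy of the institutional records.

    Returns the updated copy plus the accepted annotations that could not be
    applied because the specimen is unknown to the institution.
    """
    updated = {sid: dict(rec) for sid, rec in records.items()}  # leave input untouched
    unmatched: List[Annotation] = []
    for ann in annotations:
        if ann.status != "accepted":
            continue  # pending and rejected annotations stay in the store only
        if ann.specimen_id not in updated:
            unmatched.append(ann)
            continue
        updated[ann.specimen_id][ann.field] = ann.proposed_value
    return updated, unmatched


if __name__ == "__main__":
    db = {"SPEC-001": {"scientific_name": "Bellis perenis"}}
    store = [
        Annotation("SPEC-001", "scientific_name", "Bellis perennis", "volunteer-1", "accepted"),
        Annotation("SPEC-001", "collector", "Unknown", "volunteer-2"),
    ]
    print(apply_accepted(db, store))
```

Keeping the store separate from the collection database means a curator decides what is accepted before anything is written back, which fits the workflow recommendations in point (iv).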

Implementing the TAPIR standard

Most European data providers to the GBIF infrastructure use the BioCASE or DiGIR protocols for distributed information retrieval. TAPIR (TDWG Access Protocol for Information Retrieval) is the successor to these protocols and further increases the adaptability of the network to different communication demands, e.g. in the context of Web 2.0 applications. The NA3 Helpdesk will assist provider institutions in installing the TAPIR software on their servers and in moving existing BioCASE installations to take full advantage of the new standard.

Extending content cover with ABCD

The Helpdesk will also assist in moving remaining Users of ABCD 1.2 to the internationally agreed version 2 and at the same time improve the richness of the data by advising providers how to better map their digitised content to the standard. This will include e.g. the provision of organismic interactions documented by specimens, such as host-parasite/pathogen relationships and information on pollinators. The Helpdesk will also seek to involve further institutions in the network, focussing on those European countries that are not members of GBIF and thus do not have a national node promoting data provision.

Metadata on European collections

Documentation of European natural history collections was pioneered by the BioCASE network, which developed a first documentation standard and provided information on several thousand European collections. As a result of further work under SYNTHESYS NA D, the NCD (Natural Collections Descriptions) standard was developed. GBIF now collaborates with TDWG and RBGE to implement it as the Biodiversity Collections Index (BCI). The aim of this Task is to (i) verify the correctness of the data on members of the network within that Index and (ii) instigate the update of the data on European collections so that they are adequately covered by the new global information service.

Specialised Access

To develop and promote the use of components that allow the construction of highly specialised User interfaces, an interface and distributed query mechanism specifically for taxonomists will be developed building on the EDIT platform. This Task includes the identification of specialised needs (also of differences between taxonomic sub-disciplines) by means of the analysis of existing highly specialised Web interfaces and the implementation of the User interface based on the SYNTHESYS/BioCASE User interface including the thesaurus functionality.

Implementation of improved Annotation mechanisms - a report has been produced on storage mechanisms: Annotation systems report

The ABCDEFG schema is complete

Approaches for Involving Volunteers into the Process of Metadata Capture from Specimens Report

NA3 deliverables include:

Report: technical possibilities for the rationalisation of data capture (Deliverable 3.1 of the SYNTHESYS2 project).

Collection software on BD Tracker delivered (Deliverable 3.2).

NA 3 outcomes course to be integrated into NA2 (Deliverable 3.3).

Storage system specification report (Deliverable 3.4).

Reverse Wrapper software delivered (Deliverable 3.5).

Sociological implications of virtual annotations report (Deliverable 3.6).

Annotation workflow in collections report (Deliverable 3.7).

Rich data progress report (Deliverable 3.8).