FAIR Data: The New Default


Wouter Beek (wouter@triply.cc)
January 22nd, 2020

We always show the cost…

…but never the benefit.

“The Semantic Web has lacked an essential element which the WWW had from the start: the immediate gratification for information providers to see the results of their efforts on a screen.”
Tim Berners-Lee “Tabulator Redux” (2007)

FAIR data

“The FAIR Guiding Principles for scientific data management and stewardship”

www.go-fair.org

Findable

  1. [F1] (Meta)data are assigned a globally unique and persistent identifier.
  2. [F2] Data are described with rich metadata.
  3. Metadata specify the data identifier.
  4. (Meta)data are registered or indexed in a searchable resource.

Accessible

  1. (Meta)data are retrievable by their identifier using a standardized communications protocol.
  2. [A2] The protocol is open, free, and universally implementable.
  3. The protocol allows for an authentication and authorization procedure, where necessary.
  4. Metadata are accessible, even when the data are no longer available.

Interoperable

  1. [I1] (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
  2. (Meta)data use vocabularies that follow FAIR principles.
  3. (Meta)data include qualified references to other (meta)data.

Re-usable

  1. (Meta)data have a plurality of accurate and relevant attributes.
  2. (Meta)data are released with a clear and accessible data usage license.
  3. (Meta)data are associated with their provenance.
  4. [R4] (Meta)data meet domain-relevant community standards.

[F1] Linked Data cost/benefit

Cost

(Meta)data are assigned a globally unique and eternally persistent identifier.

Benefit

Zero-cost data integration.

[F1] Zero-cost data integration

Combining energy labels and buildings.
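A minimal sketch of why shared identifiers make integration cheap, assuming both datasets are already expressed as RDF-style triples. All IRIs, predicates, and values below (everything under `https://example.org/`) are hypothetical placeholders, not the actual Kadaster vocabulary or data. Because both sources identify the building with the same IRI, combining them is a plain set union and no key-mapping step is needed.

```python
# Hypothetical namespace and identifiers, for illustration only.
EX = "https://example.org/"
BUILDING = EX + "building/0001"

# Triples from a (hypothetical) building registry; placeholder values.
buildings = {
    (BUILDING, EX + "constructionYear", "1932"),
    (BUILDING, EX + "address", "Example Street 1"),
}

# Triples from a (hypothetical) energy-label registry that reuses the same building IRI.
energy_labels = {
    (BUILDING, EX + "energyLabel", "C"),
}

# Integration is a set union: both sources already agree on the subject
# identifier, so no join key has to be constructed or aligned.
merged = buildings | energy_labels


def describe(graph, subject):
    """Collect all predicate/object pairs for one subject."""
    return {p: o for s, p, o in graph if s == subject}


print(describe(merged, BUILDING))
```

The same union would silently fail to connect anything if the two registries minted different identifiers for the same building, which is the cost side of F1.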

[F2] Linked Data cost/benefit

Cost

Data are described with rich metadata.

Benefit

  • [F2a] Findable in popular search engines.
  • [F2b] Sharable on social media platforms.

[F2a] Findable in popular search engines

Search for “gemeentegeschiedenis”.
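The slides do not say which markup the demo uses; this sketch assumes schema.org `Dataset` metadata serialized as JSON-LD, which Google's Dataset Search reads from web pages. The helper names and all field values are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, url, license_url, keywords):
    """Build schema.org Dataset metadata as a JSON-LD dictionary (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }


def as_script_tag(metadata):
    """Wrap JSON-LD in the HTML <script> element that crawlers read."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(metadata, indent=2, ensure_ascii=False)
            + "\n</script>")


# Placeholder values; a real deployment would fill these from the dataset's own metadata.
meta = dataset_jsonld(
    name="Example dataset",
    description="A placeholder description of the dataset.",
    url="https://example.org/dataset/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["example", "linked data"],
)
print(as_script_tag(meta))
```

Embedding this element in the HTML page for each dataset is what lets a general-purpose crawler pick up the rich metadata required by F2.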

[F2b] Sharable in social media platforms

Sharing identifiers on Twitter.

[A2] Linked Data cost/benefit

Cost

The protocol is open, free, and universally implementable.

Benefit

Connect many external applications.
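For Linked Data the open protocol is typically HTTP, and for queries the W3C SPARQL 1.1 Protocol. A minimal sketch of a query-via-GET request using only the Python standard library; the endpoint URL and helper name are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request (hypothetical helper)."""
    # The query string goes in the 'query' parameter, URL-encoded.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(
    "https://example.org/sparql",  # hypothetical endpoint
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
)
print(req.get_method(), req.full_url)
# To execute against a live endpoint: urllib.request.urlopen(req), then json.load the body.
```

Because any HTTP client can construct this request, tools such as the ones on the following slides can connect without a vendor-specific driver.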

[A2] Data linking with Silk

silkframework.org
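Silk discovers links, such as `owl:sameAs`, between entities in different datasets by comparing their properties. The sketch below is not Silk's linkage specification or API; it swaps in a plain-Python label comparison using `difflib.SequenceMatcher` to illustrate the idea. The input shape (IRI to label dictionaries), the 0.9 threshold, and all IRIs are hypothetical.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lowercase and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())


def discover_links(source, target, threshold=0.9):
    """Emit owl:sameAs triples for pairs whose labels are similar enough.

    source, target: dicts mapping IRI -> label (hypothetical input shape).
    Compares all pairs; real link-discovery tools avoid this with blocking/indexing.
    """
    links = []
    for s_iri, s_label in source.items():
        for t_iri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                links.append((s_iri, OWL_SAME_AS, t_iri))
    return links


source = {"https://example.org/a/amsterdam": "Amsterdam"}
target = {"https://example.org/b/1": "amsterdam ", "https://example.org/b/2": "Rotterdam"}
print(discover_links(source, target))
```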

[A2] Vocabulary visualization & editing with WebVOWL

vowl.visualdataweb.org

[R4] Linked Data cost/benefit

Cost

(Meta)data meet domain-relevant community standards.

Benefit

Data can immediately be visualized based on semantics.
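A minimal sketch of how a viewer could pick a visualization purely from the standard terms present in the data. The three IRIs are the published terms from GeoSPARQL, the RDF Data Cube vocabulary, and OWL-Time; the term-to-widget mapping and the function name are hypothetical and do not describe any particular product.

```python
# Published IRIs from the community standards named on the following slides.
GEO_WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"
TIME_INSTANT = "http://www.w3.org/2006/time#Instant"

# Hypothetical mapping from standard terms to widget kinds.
VIEWS = {
    GEO_WKT_LITERAL: "map",
    QB_OBSERVATION: "chart",
    TIME_INSTANT: "timeline",
}


def choose_view(term_iris):
    """Return the widget for the first recognized standard term, else a plain table."""
    for iri in term_iris:
        if iri in VIEWS:
            return VIEWS[iri]
    return "table"


print(choose_view([QB_OBSERVATION]))
print(choose_view(["https://example.org/unknownClass"]))
```

The point of the design is that the mapping is keyed on shared vocabulary terms rather than on any one dataset's schema, so every dataset that follows the standard gets the visualization without extra configuration.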

[R4] Geospatial information (community standard: GeoSPARQL)

Kadaster Knowledge Graph: key registries, company registry, monument registry, energy labels, etc.

[R4] Statistical information (community standard: Data Cube)

Life expectancy for the countries of the world over time (dataset: Gapminder)

[R4] Temporal information (community standard: OWL Time)

Timeline of cars produced by Ford, Chevrolet, Porsche, and Toyota between 1903 and 2016 (dataset: DBpedia).

A complete FAIR data use case

CoDa: Cooperation Databank

Current deployments

Netwerk Digitaal Erfgoed
Muziekweb
IISG
Kadaster Data Science
AdamNet
CLARIAH
PLDN
KR&R
