How to Set Up a Linked Data Project?


Triply Customer Team (info@triply.cc)
July 6th, 2020

Triply B.V.

Generic Project Structure: Variant 1

Generic Project Structure: Variant 2

Generic Project Steps

  1. Naming Strategy (Location Strategy)
  2. Metadata Creation
  3. Instance Data Modeling of Example Records
  4. Vocabulary Creation
  5. Feedback Cycle 1: Terminology
  6. Data Transformation (ETL)
  7. Data Linking
  8. Data Quality Analysis
  9. Feedback Cycle 2: Instance Data
  10. Set Up RFC Procedure (Request For Change)
  11. Data Publication
  12. Maintenance

(1) Naming Strategy

  • Name = location (IRI)
  • Online lookup (IRI dereferencing)
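A minimal sketch of a naming strategy in which each name doubles as a location. The base IRI `https://example.org/id/` and the `kind/local-name` path layout are assumptions for illustration; a real project would use a domain it controls, so that the minted IRIs can actually be dereferenced.

```python
from urllib.parse import quote

# Hypothetical base IRI; replace with a domain the project controls.
BASE = "https://example.org/id/"


def mint_iri(kind: str, local_name: str) -> str:
    """Build an IRI from a resource kind and a local name."""
    # Percent-encode each segment so spaces and slashes cannot break the path.
    return BASE + quote(kind, safe="") + "/" + quote(local_name, safe="")


print(mint_iri("pokemon", "pikachu"))
```

Keeping the minting logic in one function makes it easier to apply the same pattern to every record in the project.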

(2) Metadata Creation

  • Specifies the conditions under which data can be used.
  • Determines whether and how data is findable.
  • Uses formal standards (DCAT 2, VoID) and de facto standards (Schema.org, OGP).
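As an illustration of the points above, the following sketch builds a small DCAT dataset description as JSON-LD. The dataset IRI, title, and keywords are hypothetical; the license is one possible open license, not a recommendation from the slides.

```python
import json


def dataset_metadata(iri, title, license_iri, keywords):
    """Return a JSON-LD description of a dataset using DCAT and Dublin Core terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # The license states the conditions under which the data can be used.
        "dct:license": {"@id": license_iri},
        # Keywords help catalogs and search engines find the dataset.
        "dcat:keyword": list(keywords),
    }


metadata = dataset_metadata(
    "https://example.org/dataset/pokemon",  # hypothetical dataset IRI
    "Pokémon",
    "https://creativecommons.org/licenses/by/4.0/",
    ["pokémon", "example"],
)
print(json.dumps(metadata, ensure_ascii=False, indent=2))
```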

(3) Instance Data Modeling


  pokémon:pikachu a def:Pokémon;
    def:knows pokémon:mew;
    def:name "ピカチュウ"@ja;
    def:weight 60.

  pokémon:mew a def:Pokémon;
    def:name "ミュウ"@ja;
    def:weight 40.

(4) Vocabulary Creation

Open/closed data model: blue=open, red=closed

(4) UML → RDFS+OWL

  • UML composition → qualified part-of relation (functional, physical, geospatial, etc.)
  • UML association → cross-class links
  • UML generalization → subclass hierarchy
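The three mappings above can be sketched as small functions that emit Turtle fragments. This is one possible encoding, not the only one: the `def:` prefix is taken from the instance data example, while `def:partOf`, the `…PartOf` naming pattern, the class `ElectricPokémon`, and the use of `rdfs:domain` and `rdfs:range` for associations are assumptions made for illustration.

```python
def generalization(child: str, parent: str) -> str:
    """UML generalization -> subclass axiom."""
    return f"def:{child} rdfs:subClassOf def:{parent} ."


def association(name: str, source: str, target: str) -> str:
    """UML association -> object property linking two classes."""
    return (
        f"def:{name} a owl:ObjectProperty ;\n"
        f"  rdfs:domain def:{source} ;\n"
        f"  rdfs:range def:{target} ."
    )


def composition(kind: str, whole: str, part: str) -> str:
    """UML composition -> qualified part-of property (e.g. physical, functional)."""
    return (
        f"def:{kind}PartOf a owl:ObjectProperty ;\n"
        f"  rdfs:subPropertyOf def:partOf ;\n"  # def:partOf is a hypothetical generic relation
        f"  rdfs:domain def:{part} ;\n"
        f"  rdfs:range def:{whole} ."
    )


print(generalization("ElectricPokémon", "Pokémon"))
print(association("knows", "Pokémon", "Pokémon"))
```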

(4) Vocabulary hierarchy

SPARQL examples

(6) Data Transformation

Traditional Extract-Transform-Load (ETL) Pipeline.
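A minimal sketch of such a pipeline, using only the Python standard library. It assumes a CSV source with the hypothetical columns `id`, `name`, and `weight`, and a hypothetical base IRI; the output format is N-Triples. A production pipeline would also validate and percent-encode identifiers.

```python
import csv
import io

BASE = "https://example.org/"  # hypothetical base IRI
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def extract(text: str) -> list:
    """Extract: read CSV records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def escape(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def transform(rows: list) -> list:
    """Transform: turn each record into N-Triples statements."""
    triples = []
    for row in rows:
        subject = f"<{BASE}id/pokemon/{row['id']}>"
        name = escape(row["name"])
        weight = int(row["weight"])  # fails early on non-numeric input
        triples.append(f'{subject} <{BASE}def/name> "{name}"@ja .')
        triples.append(f'{subject} <{BASE}def/weight> "{weight}"^^<{XSD_INTEGER}> .')
    return triples


def load(triples: list) -> str:
    """Load: serialize the statements; a real pipeline would upload them to a triple store."""
    return "\n".join(triples) + "\n"


source = "id,name,weight\npikachu,ピカチュウ,60\nmew,ミュウ,40\n"
triples = transform(extract(source))
print(load(triples))
```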

(6) Data Transformation

LD Wizard Pipeline

(7) Data Linking

  • Full identity (owl:sameAs, owl:equivalentClass)
  • Approximate linking (skos:related)
  • Geospatial linking (geo:sfWithin, geo:sfOverlaps)
  • Temporal linking (time:before)
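As a sketch of full-identity linking, the function below proposes `owl:sameAs` links between two datasets whose resources carry the same label. The two input dictionaries and their IRIs are hypothetical, and matching on labels alone is an assumption made for brevity: identical labels are weak evidence of identity, so real projects usually combine several properties or review candidates by hand.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def link_by_label(left: dict, right: dict) -> list:
    """Return (subject, predicate, object) links for unambiguous label matches.

    Both arguments map a resource IRI to its label.
    """
    # Index the right-hand dataset by normalized label.
    index = {}
    for iri, label in right.items():
        index.setdefault(label.strip().casefold(), []).append(iri)

    links = []
    for iri, label in left.items():
        matches = index.get(label.strip().casefold(), [])
        if len(matches) == 1:  # skip ambiguous or missing matches
            links.append((iri, OWL_SAME_AS, matches[0]))
    return links


ours = {"https://example.org/id/pokemon/pikachu": "Pikachu"}
theirs = {
    "https://example.com/species/25": "pikachu ",
    "https://example.com/species/151": "Mew",
}
links = link_by_label(ours, theirs)
print(links)
```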

(8) Data Quality Analysis

  • Quality Dashboard (FacetCheck)
  • Quality Overview (Data Story)
  • Quality Analysis (Semantic Notebook)

(10) Set Up RFC Procedure (Request For Change)

(11) Data Publication

  • Closed Data: access rights, closed license
  • Open Data: indexing, open license
  • LOD Cloud, Linked Open Vocabularies (LOV)

(12) Maintenance

  • Stay up-to-date w.r.t. source systems (delta)
  • Automatic validation (SHACL)
  • Monitoring/notifications
  • Continuous Integration (CI)
  • Environments: Acceptance → Production
  • Versioning (OWL Annotations)
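The first bullet, staying up to date through deltas, can be sketched as a set difference between two snapshots of the data. This assumes each snapshot is available as a collection of triples in a comparable form (here plain tuples with hypothetical short names) and that no blank nodes are involved, since those cannot be compared by value.

```python
def delta(old, new) -> dict:
    """Compute which triples were added and removed between two snapshots."""
    old_set, new_set = set(old), set(new)
    return {"added": new_set - old_set, "removed": old_set - new_set}


yesterday = [("pikachu", "weight", 60), ("mew", "weight", 40)]
today = [("pikachu", "weight", 65), ("mew", "weight", 40)]
changes = delta(yesterday, today)
print(changes)
```

Only the changed triples then need to be validated and published, rather than the full dataset.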
