A Travel Guide to the LOD Cloud

Wouter Beek, Joe Raad, Erman Acar, Frank van Harmelen

Triply B.V.
VU University Amsterdam

Linked Data Reuse

“Include links to other URIs, so that [data clients] can discover more things.”
4th Linked Data Rule
“Link your data to other data to provide context.”
5th Linked Data star

Linked Data is not possible without owl:sameAs


Linked Data is not possible without formal logic.

Let's traverse the LOD Cloud…

  1. Start at node dbr:President_Barack_Obama.
  2. Follow an incoming owl:sameAs link to go to node fb:m.05b6w1g.
  3. Follow an outgoing owl:sameAs link to go to node dbr:Barack_Obama_Cabinet.

Oops! President Obama ≠ The Obama Cabinet

We need a Travel Guide that informs us where we can go (and what the risks are).

The percentage of incorrect owl:sameAs statements is…

Detecting incorrect identity statements

Similarity of textual descriptions
Violations of UNA heuristic
Detection of logical inconsistencies
Network metrics
Overview paper

MetaLink Requirements

    Applicable to the whole LOD Cloud (billions of triples).
    Distinguish between subject/from and object/to nodes.
    Plug it into any Linked Dataset.
    LD clients must be able to use it.
    Broadly applicable
    Cover a broad range of research goals and applications.
    Every researcher should be able to afford it.

MetaLink Concepts

  • Equivalence Sets
  • Communities (Louvain algorithm; J. Raad, W. Beek, F. van Harmelen, N. Pernelle, F. Saïs. 2018. “Detecting Erroneous Identity Links on the Web using Network Metrics”. ISWC)
Identity Statements
  • Explicit: appear in the LOD Cloud.
  • Implicit: derived from the explicit identity statements.
Community links
  • Intra-community links (within)
  • Inter-community links (from/to)
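
The difference between explicit and implicit identity statements can be sketched in Python. This is a minimal illustration, not MetaLink's implementation. It assumes that equivalence sets are the connected components of the explicit owl:sameAs graph, and that implicit statements are the non-reflexive pairs in the symmetric and transitive closure that were not stated explicitly. The function names `equivalence_sets` and `implicit_statements` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def equivalence_sets(explicit):
    """Group nodes into equivalence sets with a simple union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, o in explicit:
        root_s, root_o = find(s), find(o)
        if root_s != root_o:
            parent[root_s] = root_o

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def implicit_statements(explicit):
    """Closure pairs that were not stated explicitly.

    Assumption: reflexive pairs (x, x) are not counted.
    """
    stated = set(explicit)
    derived = set()
    for members in equivalence_sets(explicit):
        for pair in permutations(sorted(members), 2):
            if pair not in stated:
                derived.add(pair)
    return derived


# The two explicit links from the traversal example.
explicit = [
    ("fb:m.05b6w1g", "dbr:President_Barack_Obama"),
    ("fb:m.05b6w1g", "dbr:Barack_Obama_Cabinet"),
]
sets = equivalence_sets(explicit)
implicit = implicit_statements(explicit)
```

In this sketch the pair (dbr:President_Barack_Obama, dbr:Barack_Obama_Cabinet) shows up among the implicit statements, which is exactly the erroneous conclusion reached by following the two owl:sameAs links.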

Explicit identity statements for ‘Barack Obama’

‘Barack Obama’ after community detection

Communities correspond to roles:

  • person
  • senator
  • president
  • government

MetaLink vocabulary

MetaLink Quantified

Class                             № instances
№ implicit identity statements    35,201,120,188
Property                          № triples

Online publication

MetaLink (example)
LOD-a-lot (Fernandez 2017)
LOD-a-lot + MetaLink

Use case 1: Follow-Your-Nose


fb:m.05b6w1g owl:sameAs dbr:President_Barack_Obama. # ←
fb:m.05b6w1g owl:sameAs dbr:Barack_Obama_Cabinet.   # →
Oops, a lightweight client bumps into inconsistency after 2 hops…


select ?error {
  [ rdf:subject fb:m.05b6w1g;
    rdf:object dbr:President_Barack_Obama;
    meta:error ?error ].
}
With MetaLink, a lightweight client can probe how safe an owl:sameAs link is.

Use case 2: Question Answering

Through which countries does the Yenisei river flow? (Lopez et al. 2013)
select distinct ?uri ?string {
  dbr:Yenisei_River owl:sameAs*/dbp:country/owl:sameAs* ?uri.
  optional {
    ?uri rdfs:label ?string.
    filter(lang(?string) = "en")
  }
}

Returns over 30K results, including hundreds of unrelated geographic places, the concept of creative writing, and the mythical creature Gorgon.

Setting error value < 0.3 only returns identifiers for Russia and Mongolia (the correct answers).
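
The effect of the error threshold can also be sketched outside SPARQL. The Python below is an illustration under assumptions: the links and their error values are made up for the example (they are not taken from MetaLink), and `safe_closure` is a hypothetical helper that follows owl:sameAs links, in both directions, only when their error value is below the threshold.

```python
from collections import deque


def safe_closure(start, links, max_error):
    """Nodes reachable from `start` over sameAs links with error < max_error."""
    neighbours = {}
    for s, o, error in links:
        if error < max_error:
            # owl:sameAs is symmetric, so index both directions.
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical links and error values, for illustration only.
links = [
    ("dbr:Yenisei_River", "ex:Yenisei", 0.1),
    ("ex:Yenisei", "ex:Gorgon", 0.9),
]
strict = safe_closure("dbr:Yenisei_River", links, max_error=0.3)
loose = safe_closure("dbr:Yenisei_River", links, max_error=1.0)
```

With the strict threshold the traversal stops before the high-error link, so the unrelated node never enters the result set; with the loose threshold it does.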

Use case 2: Question Answering

What are the band members of ABBA? (Buikstra et al. 2011)
Result                                            ≤ 1.0  ≤ 0.8  ≤ 0.6  ≤ 0.4  ≤ 0.2  ≤ 0.0
Björn Ulvaeus (band member)                          28      8      8      3      2      2
Agnetha Fältskog (band member)                       26      4      4      2      1      1
Anni-Frid Lyngstad (band member)                      9      3      3      2      1      1
Benny Andersson (band member)                         6      2      2      1      1      1
Ola Brunkert (drummer)                                3      2      2      1      1      1
Agnetha Ulvaeus (Agnetha F. married name)             2      0      0      0      0      0
Stig Andersson (band manager)                         9      4      4      1      1      1
Gert van der Graaf (stalker of Agnetha Fältskog)      2      0      0      0      0      0
Benny Anderssons Orkester (new band)                  5      3      3      0      0      0
Stig Andersson (sportsman)                            2      2      2      0      0      0
Results of the ABBA band member query using different error degrees in MetaLink.

Use case 3: Fuzzy Reasoning

Fuzzy identity function (t-conorm): commutative, monotonic, associative (compositionality).
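
The slide does not say which t-conorm is used, so the sketch below takes two standard ones, the maximum and the probabilistic sum, purely as examples. It shows how error values along a chain of identity links could be combined. The function names and the input values are hypothetical.

```python
from functools import reduce


def tconorm_max(a, b):
    """Maximum t-conorm."""
    return max(a, b)


def tconorm_prob_sum(a, b):
    """Probabilistic-sum t-conorm: a + b - a*b."""
    return a + b - a * b


def combine(errors, tconorm):
    """Fold error values with a t-conorm; 0.0 is the neutral element."""
    return reduce(tconorm, errors, 0.0)


errors = [0.25, 0.5]  # hypothetical error values of two chained links
by_max = combine(errors, tconorm_max)
by_prob = combine(errors, tconorm_prob_sum)
```

Because a t-conorm is commutative and associative, the combined value does not depend on the order in which the links are visited, which is what makes the function compositional.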

This still allows for Linked Data reuse: see i1

This even allows for better Linked Data reuse: see i3

Use case 4: Link Error Detection for Subsets

Apply other, more computationally intensive approaches to links with error values in the range [0.3, 0.7].

Use case 5: Link Error Benchmarking

Current benchmark sets are small or dataset-specific: DBpedia-based, OAEI.


