MetaLink

A Travel Guide to the LOD Cloud


Wouter Beek (wouter@triply.cc), Joe Raad (j.raad@vu.nl), Erman Acar (erman.acar@vu.nl), Frank van Harmelen (frank.van.harmelen@vu.nl)

Triply B.V.
VU University Amsterdam

Linked Data Reuse

“Include links to other URIs, so that [data clients] can discover more things.”
4th Linked Data Rule
“Link your data to other data to provide context.”
5th Linked Data star

Linked Data is not possible without owl:sameAs


Corollary

Linked Data is not possible without formal logic.

Let's traverse the LOD Cloud…

  1. Start at node dbr:President_Barack_Obama.
  2. Follow an incoming owl:sameAs link to go to node fb:m.05b6w1g.
  3. Follow an outgoing owl:sameAs link to go to node dbr:Barack_Obama_Cabinet.


Oops! President Obama ≠ The Obama Cabinet
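
The three steps above can be sketched as a breadth-first walk over owl:sameAs links, followed in both directions. This is only an illustration: the two statements are the ones from the example, while the function name `hops_from` is an assumption made for this sketch and is not part of MetaLink.

```python
from collections import deque

# The two explicit owl:sameAs statements from the example, as (subject, object).
SAME_AS = [
    ("fb:m.05b6w1g", "dbr:President_Barack_Obama"),
    ("fb:m.05b6w1g", "dbr:Barack_Obama_Cabinet"),
]


def hops_from(start, statements):
    """Return {node: hop count} for every node reachable over owl:sameAs,
    following each link in both directions (incoming and outgoing)."""
    neighbours = {}
    for s, o in statements:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


reached = hops_from("dbr:President_Barack_Obama", SAME_AS)
```

A client that treats every owl:sameAs link as safe ends up equating the president with the cabinet after two hops.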

We need a Travel Guide that informs us where we can go (and what the risks are).

The percentage of incorrect owl:sameAs statements is…

2.8%
Hogan, A., Zimmermann, A., Umbrich, J., Polleres, A., Decker, S. 2012. “Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation Over Linked Data Corpora”. Web Semantics: Science, Services and Agents on the World Wide Web Vol. 10, pp. 76–110.
4%
Raad, J. 2018. Identity Management in Knowledge Graphs. Ph.D. thesis, University of Paris-Saclay.
20%
Halpin, H., Hayes, P.J., McCusker, J.P., McGuinness, D.L., Thompson, H.S. 2010. “When owl:sameAs isn't the Same: An Analysis of Identity in Linked Data”. International Semantic Web Conference, pp. 305–320.

Detecting incorrect identity statements

Similarity of textual descriptions
  • J. Cuzzola, E. Bagheri, J. Jovanovic. 2015. “Filtering Inaccurate Entity Co-references on the Linked Open Data” International DEXA Conference, pp. 128–143.
Violations of UNA heuristic
  • G. de Melo. 2013. “Not Quite the Same: Identity Constraints for the Web of Linked Data” AAAI.
  • A. Valdestilhas, T. Soru, A. Ngomo. 2017. “Cedal: Time-efficient Detection of Erroneous Links in Large-scale Link Repositories” International Conference on Web Intelligence, pp. 106–113.
Detection of logical inconsistencies
  • A. Hogan, A. Zimmermann, J. Umbrich, A. Polleres, S. Decker. 2012. “Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation over Linked Data Corpora” Web Semantics, Vol. 10, pp. 76–110.
  • L. Papaleo, N. Pernelle, F. Saïs, C. Dumont. 2014. “Logical Detection of Invalid sameAs Statements in RDF Data” EKAW, pp. 373–384.
Network metrics
  • C. Guéret, P. Groth, C. Stadler, J. Lehmann. 2012. “Assessing Linked Data Mappings Using Network Measures” Extended Semantic Web Conference, pp. 87–102.
Crowd-sourcing
  • M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, J. Lehmann. 2013. “Crowd-sourcing Linked Data Quality Assessment” International Semantic Web Conference, pp. 260–276.
Overview paper
  • J. Raad, N. Pernelle, F. Saïs, W. Beek, F. van Harmelen. 2019. “The sameAs Problem: A Survey on Identity Management in the Web of Data” (under review).

MetaLink Requirements

    Scalable
    Applicable to the whole LOD Cloud (billions of triples).
    Ordered
    Distinguish between subject/from and object/to nodes.
    Modular
    Plug it into any Linked Dataset.
    Standards-compliant
    LD clients must be able to use it.
    Broadly applicable
    Cover a broad range of research goals and applications.
    Low-cost
    Every researcher should be able to afford it.

MetaLink Concepts

Partitions
  • Equivalence Sets
  • Communities (Louvain algorithm; J. Raad, W. Beek, F. van Harmelen, N. Pernelle, F. Saïs. 2018. “Detecting Erroneous Identity Links on the Web using Network Metrics” ISWC)
Links
Identity Statements
  • Explicit: appear in the LOD Cloud.
  • Implicit: derived from the explicit identity statements.
Community links
  • Intra-community links (within)
  • Inter-community links (from/to)
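
The concepts above can be illustrated with a small sketch. It assumes that an equivalence set is a connected component of the explicit identity graph (closure under symmetry and transitivity) and that community assignments are already given; the Louvain step itself is not implemented here. All `ex:` identifiers, the community numbers, and the function names are hypothetical, and counting n·(n−1) ordered pairs per set is only one possible way to count implied statements, not necessarily the one used for MetaLink.

```python
from collections import defaultdict


def equivalence_sets(statements):
    """Group nodes into equivalence sets: the connected components of the
    explicit identity graph, computed with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, o in statements:
        root_s, root_o = find(s), find(o)
        if root_s != root_o:
            parent[root_s] = root_o

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def implied_pairs(size):
    """Ordered pairs of distinct nodes in an equivalence set of this size."""
    return size * (size - 1)


def link_kind(subject, obj, community):
    """Classify an identity link as intra- or inter-community."""
    return "intra" if community[subject] == community[obj] else "inter"


# Hypothetical explicit identity statements and community assignment.
explicit = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
community = {"ex:a": 1, "ex:b": 1, "ex:c": 2}

sets = equivalence_sets(explicit)
```

In this toy input, `ex:a`, `ex:b`, and `ex:c` share one equivalence set even though no explicit statement links `ex:a` to `ex:c`; that pair is an implicit identity statement.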

Explicit identity statements for ‘Barack Obama’

‘Barack Obama’ after community detection

Communities correspond to roles:

  • person
  • senator
  • president
  • government

MetaLink vocabulary

MetaLink Quantified

Class                               № instances
meta:IdentityStatement              556,152,454
meta:Community                       55,697,160
meta:EquivalenceSet                  48,999,148
№ implicit identity statements   35,201,120,188

Property                            № triples
meta:error                          556,152,454
meta:community                      410,706,139
meta:fromCommunity                  145,446,315
meta:toCommunity                    145,446,315
meta:cardinality                     48,999,148
meta:equivalenceSet                  55,697,160
meta:cardinality                     48,999,148
Total                             4,352,602,480

Online publication

MetaLink
https://krr.triply.cc/krr/metalink
LOD-a-lot (Fernandez 2017)
https://krr.triply.cc/krr/lod-a-lot
LOD-a-lot + MetaLink
https://krr.triply.cc/krr/lod-a-lot-metalink

Use case 1: Follow-Your-Nose

Problem

fb:m.05b6w1g owl:sameAs dbr:President_Barack_Obama. # ←
fb:m.05b6w1g owl:sameAs dbr:Barack_Obama_Cabinet.   # →
Oops, a lightweight client runs into an inconsistency after two hops…

Solution

select ?error {
  [ rdf:subject fb:m.05b6w1g;
    rdf:object dbr:President_Barack_Obama;
    meta:error ?error ].
}
With MetaLink, a lightweight client can probe how safe an owl:sameAs link is.

Use case 2: Question Answering

Through which countries does the Yenisei river flow? (Lopez et al. 2013)
select distinct ?uri ?string {
  dbr:Yenisei_River owl:sameAs*/dbp:country/owl:sameAs* ?uri.
  optional {
    ?uri rdfs:label ?string.
    filter(lang(?string) = "en")
  }
}

Returns over 30K results, including hundreds of unrelated geographic places, the concept of creative writing, and the mythical creature Gorgon.

Restricting the query to identity links with an error value < 0.3 returns only identifiers for Russia and Mongolia (the correct answers).
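
The effect of such a threshold can be sketched outside SPARQL as follows. The statements, their error degrees, and all `ex:` identifiers are invented for illustration, and `trusted_aliases` is a hypothetical helper, not a MetaLink API; the sketch only shows how discarding high-error links keeps the identity closure small.

```python
# Hypothetical identity statements: (subject, object, error degree).
STATEMENTS = [
    ("ex:Yenisei", "ex:Yenisei_River", 0.05),
    ("ex:Yenisei_River", "ex:Gorgon", 0.95),
    ("ex:Russia", "ex:Russian_Federation", 0.10),
]


def trusted_aliases(start, statements, max_error):
    """Nodes reachable from `start` using only identity statements whose
    error degree is strictly below `max_error`, in either direction."""
    trusted = [(s, o) for s, o, err in statements if err < max_error]
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, o in trusted:
            for a, b in ((s, o), (o, s)):
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


strict = trusted_aliases("ex:Yenisei", STATEMENTS, 0.3)
loose = trusted_aliases("ex:Yenisei", STATEMENTS, 1.0)
```

With the strict threshold the spurious alias drops out; with the loose threshold it is pulled in through a single bad link.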

Use case 2: Question Answering

What are the band members of ABBA? (Buikstra et al. 2011)
Result                                             ≤ 1.0  ≤ 0.8  ≤ 0.6  ≤ 0.4  ≤ 0.2  ≤ 0.0
Björn Ulvaeus (band member)                           28      8      8      3      2      2
Agnetha Fältskog (band member)                        26      4      4      2      1      1
Anni-Frid Lyngstad (band member)                       9      3      3      2      1      1
Benny Andersson (band member)                          6      2      2      1      1      1
Ola Brunkert (drummer)                                 3      2      2      1      1      1
Agnetha Ulvaeus (Agnetha F. married name)              2      0      0      0      0      0
Stig Andersson (band manager)                          9      4      4      1      1      1
Gert van der Graaf (stalker of Agnetha Fältskog)       2      0      0      0      0      0
Benny Anderssons Orkester (new band)                   5      3      3      0      0      0
Stig Andersson (sportsman)                             2      2      2      0      0      0
Results of the ABBA band member query using different error degrees in MetaLink.

Use case 3: Fuzzy Reasoning

Fuzzy identity function (t-conorm): commutative, monotonic, associative (compositionality).
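
As a minimal sketch of what such a function could look like, the block below defines two standard t-conorms (maximum and probabilistic sum) and uses one to combine the error degrees of the links along a path. Which t-conorm is intended on this slide is not stated, so the choice here, the function names, and the idea of folding over a path are assumptions made for illustration.

```python
from functools import reduce


def s_max(a, b):
    """Maximum t-conorm."""
    return max(a, b)


def s_prob(a, b):
    """Probabilistic-sum t-conorm."""
    return a + b - a * b


def path_error(errors, t_conorm=s_max):
    """Combine the error degrees of the links along a path.
    0.0 is the neutral element of every t-conorm."""
    return reduce(t_conorm, errors, 0.0)


combined_max = path_error([0.1, 0.4])
combined_prob = path_error([0.5, 0.5], t_conorm=s_prob)
```

Because both operators are commutative and associative, the combined error of a path does not depend on the order or grouping in which its links are processed, which is what makes the composition usable by a client that discovers links incrementally.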

This still allows for Linked Data reuse: see i1

This even allows for better Linked Data reuse: see i3

Use case 4: Link Error Detection for Subsets

Apply other, more computationally intensive approaches to links with error values in [0.3, 0.7].
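
A possible way to select that subset is sketched below. The bucket names, the sample statements, and the function `triage` are hypothetical; only the interval [0.3, 0.7] comes from the slide, and what happens to links outside the interval is left open here.

```python
def triage(statements, low=0.3, high=0.7):
    """Split (subject, object, error) statements into three buckets by error
    degree: below `low`, inside [low, high], and above `high`."""
    buckets = {"low": [], "middle": [], "high": []}
    for s, o, err in statements:
        if err < low:
            buckets["low"].append((s, o))
        elif err <= high:
            buckets["middle"].append((s, o))
        else:
            buckets["high"].append((s, o))
    return buckets


# Hypothetical identity statements with invented error degrees.
sample = [
    ("ex:a", "ex:b", 0.1),
    ("ex:b", "ex:c", 0.3),
    ("ex:c", "ex:d", 0.7),
    ("ex:d", "ex:e", 0.9),
]
buckets = triage(sample)
```

Only the "middle" bucket would then be handed to a more expensive detection method.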

Use case 5: Link Error Benchmarking

Current sets are small / dataset-specific: DBpedia-based, OAEI.

MetaLink

A Travel Guide to the LOD Cloud

https://krr.triply.cc/krr/metalink

