Digging Into the Knowledge Graph

Triply B.V.
VU University Amsterdam

A global Knowledge Graph emerges

A global Knowledge Graph emerges (1/2)

  1. Digitization of artifacts.
  2. Online publication of artifacts.
  3. Unique naming of artifacts.
  4. Relating/linking of artifacts.

Web of Documents

A global Knowledge Graph emerges (2/2)

  1. Digitization of artifacts.
  2. Online publication of artifacts.
  3. Unique naming of artifacts.
  4. Relating/linking of artifacts.

Web of Data

If everybody starts doing this,
a global Knowledge Graph emerges…

Digging Into the Knowledge Graph

Indexing the LOD Cloud

  • A self-organising process of knowledge creation
  • Current indexing approaches are manual.
  • Difficult to grasp what is there, what it means, and what to reuse.
  • Use knowledge classification schemes (BCC, UDC) as reference systems to develop generic principles of indexing.


  • Findability of vocabularies.
  • Recommendations for vocabulary reuse.
  • Guidelines for archiving vocabularies.

Research questions (1/3): KOS

  • Which vocabularies are used in the LOD Cloud?
  • What are the features of vocabularies in the LOD Cloud?
  • How can high-level features of the LOD Cloud be described using general schemes of classification?

Research questions (2/3): SW

  • How to index & search for existing vocabularies?
  • How to automatically find candidates for reuse?
  • How to efficiently store and query LOD vocabularies?

Research questions (3/3): Data Curation

  • What are best practices for curating LOD?
  • How to identify ‘endangered’ vocabularies?
  • How to organize provenance and version control for LOD?

The global Knowledge Graph is diverse

  • Folksonomies
    • SHOE, Microformats/Microdata/RSS
    • SKOS
  • Frame languages
    • KL-ONE, LOOM
    • KIF, SUO-KIF
  • Formal languages
    • CycL, DAML+OIL
    • Common Logic, RDF/OWL/RIF

This diversity is needed

The Knowledge Graph is (even more) diverse

  • Authority files
  • Prototypes
  • Thesauri
  • Taxonomies
  • Topic maps
  • Typology

Tap Into the Knowledge Graph

The cost of studying/hosting/archiving the global Knowledge Graph


Time data scientists spend on finding & cleaning data.

The global Knowledge Graph is not FAIR

Most data is:

  • not Findable
  • not Accessible
  • not Interoperable
  • not Reusable

LOD Laundromat

Published at DANS

>65K datasets, >38B facts

Academic use cases

Reproducible research
  • L. Rietveld, W. Beek & S. Schlobach, “LOD Lab: Experiments at LOD Scale”, ISWC 2015. Best Paper Award.
Large-scale data cleaning
  • W. Beek, F. Ilievski, J. Debattista, S. Schlobach & J. Wielemaker, “Literally better: Analyzing and Improving the Quality of Literals”, Semantic Web Journal 2017.
Semantic search engines
  • F. Ilievski, W. Beek, M. Van Erp, L. Rietveld & S. Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, ESWC 2016. Best LOD Application Award.
Large-scale querying
  • J. Fernández, W. Beek, M. Martínez-Prieto & M. Arias, “LOD-a-lot: A Queryable Dump of the LOD Cloud”, ISWC 2017.
  • W. Beek, J. Fernández & R. Verborgh, “LOD-a-lot: A Single-file Enabler for Data Science”, 13th Int. Conf. on Semantic Systems 2017.
  • W. Beek, L. Rietveld, S. Schlobach & F. van Harmelen, “LOD Laundromat: Why the Semantic Web Needs Centralization (Even If We Don't Like It)”, IEEE Internet Computing 2016.
  • L. Rietveld, R. Verborgh, W. Beek, M. Vander Sande & S. Schlobach, “Linked Data-as-a-Service: The Semantic Web Redeployed”, ESWC 2015.
Erroneous link detection
  • W. Beek, J. Raad, J. Wielemaker & F. van Harmelen, “The Closure of 500M owl:sameAs Statements”, ESWC 2018. Best Resource Paper Award.
  • J. Raad, W. Beek, F. van Harmelen, N. Pernelle & F. Saïs, “Detecting Erroneous Identity Links on the Web using Network Metrics”, ISWC 2018.

Make it affordable to publish/host/archive KOS

We need to significantly lower the cost of storing and disseminating structured data.



J. Fernández, W. Beek, M. Martínez-Prieto & M. Arias, “LOD-a-lot: A Queryable Dump of the LOD Cloud”, ISWC 2017.

Indexing the global Knowledge Graph

(Data Observatory)

Current indexing approaches

Example: boiling water in LOV

  • No quantification of (re)use.
  • No compositionality: boiling ⊕ water
  • No understanding of the encoded knowledge (e.g., subsumption relation)
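One way to quantify (re)use, sketched under simplifying assumptions: count in how many datasets each vocabulary prefix occurs in predicate position. The dataset names, the triples, and the helpers `namespace` and `vocabulary_reuse` are hypothetical illustrations, not part of LOV or LOD Laundromat.

```python
from collections import Counter


def namespace(term):
    """Return the prefix part of a prefixed name such as 'foaf:name'."""
    return term.split(":", 1)[0]


def vocabulary_reuse(datasets):
    """Count, per vocabulary prefix, the number of datasets whose predicates use it."""
    counts = Counter()
    for triples in datasets.values():
        # A set ensures that each dataset counts at most once per vocabulary.
        counts.update({namespace(p) for _, p, _ in triples})
    return counts


# Hypothetical toy datasets (assumption: prefixed names instead of full IRIs).
datasets = {
    "dataset-a": [
        ("a:x", "foaf:name", '"X"'),
        ("a:x", "dct:title", '"T"'),
    ],
    "dataset-b": [
        ("b:y", "foaf:name", '"Y"'),
        ("b:y", "foaf:knows", "b:z"),
    ],
}

reuse = vocabulary_reuse(datasets)
```

A real index would work on full IRIs rather than prefixes and would also look at class and instance usage; this sketch only covers predicates.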

Indexing dimensions

  • versions
  • size
  • (re)use
  • distribution
  • naming
  • structure
  • expressiveness

Studying the global Knowledge Graph

(Empirical Semantics)

2 notions of meaning

What formal semantics prescribes

What people do with it in practice

Jim Hendler, ESWC 2016

What is Empirical Semantics?

The empirical (i.e., non-analytic) analysis of meaning.

(We still use model theory and other formalisms to describe the outcomes of our analyses, but we do not use formalisms to prescribe what a given expression ought to mean.)

Why is Empirical Semantics needed? (1/2)

Some aspects of meaning cannot be captured by formal meaning, but we still want to study them.

(We must observe these non-formal aspects of meaning empirically.)

Formal semantics cannot capture all aspects of meaning

Graph G₁

id:store def:sells id:tent.
id:tent  def:costs "¥150,000".
id:tent  rdf:type  id:Product.

Graph G₂

fy:aHup   pe:ko9sap_ fy:jufn12.
fy:jufn12 pe:oao9_   "Ufou".
fy:jufn12 rdf:type   fyufnt:tmffqt.

Graphs G₁ and G₂ are true in the same models.
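The claim can be made concrete with a small sketch: G₁ and G₂ differ only by a one-to-one renaming of terms, so nothing a formal reasoner inspects tells them apart. The code below searches for such a renaming while keeping `rdf:type` fixed. Treating literals as renameable names is a simplifying assumption (in RDF semantics a literal denotes a fixed value), and `find_renaming` is a hypothetical helper written for this illustration.

```python
from itertools import permutations

# Terms whose meaning is fixed by the RDF specification and must not be renamed.
FIXED = {"rdf:type"}

G1 = [
    ("id:store", "def:sells", "id:tent"),
    ("id:tent", "def:costs", '"¥150,000"'),
    ("id:tent", "rdf:type", "id:Product"),
]
G2 = [
    ("fy:aHup", "pe:ko9sap_", "fy:jufn12"),
    ("fy:jufn12", "pe:oao9_", '"Ufou"'),
    ("fy:jufn12", "rdf:type", "fyufnt:tmffqt"),
]


def _bind(a, b, forward, backward):
    """Record a -> b if that keeps the renaming one-to-one; report success."""
    if a in FIXED or b in FIXED:
        return a == b
    return forward.setdefault(a, b) == b and backward.setdefault(b, a) == a


def find_renaming(source, target):
    """Return a one-to-one renaming that turns source into target, or None.

    Brute force over triple orderings; only suitable for tiny graphs.
    """
    if len(source) != len(target):
        return None
    for candidate in permutations(target):
        forward, backward = {}, {}
        if all(
            _bind(a, b, forward, backward)
            for s_triple, t_triple in zip(source, candidate)
            for a, b in zip(s_triple, t_triple)
        ):
            return forward
    return None


mapping = find_renaming(G1, G2)
```

Here `mapping` sends `id:tent` to `fy:jufn12`, and so on for every other term. What separates the two graphs for a human reader is therefore the social meaning carried by the names, not the formal structure.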

Social Meaning

In the early days of the Semantic Web (2003) the non-formal aspects of meaning were actively discussed.

“An RDF graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information.”
“Human publishers of RDF content commit themselves to the mechanically-inferred social obligations.”
“The meaning of an RDF document includes the social meaning, the formal meaning, and the social meaning of the formal entailments.”

Why is Empirical Semantics needed? (2/2)

Some aspects of meaning could (theoretically) have been captured by formal meaning, but are observed to not be captured as such in common practice.

(We must observe what ‘common practice’ is empirically.)

Formally incorrect, but not meaningless

bpo:has_event rdfs:domain bpo:person.
bpo:has_event rdfs:domain bpo:event.
bpo:has_event rdfs:domain bpo:disease.
Examples from BioPortal.
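Why these axioms are formally problematic: under RDFS semantics, multiple `rdfs:domain` statements are read conjunctively, so every subject of `bpo:has_event` is inferred to be a person, an event, and a disease at once. A minimal sketch of RDFS entailment rule rdfs2, assuming one hypothetical data triple (`ex:patient1` and `ex:checkup` are invented for illustration):

```python
AXIOMS = [
    ("bpo:has_event", "rdfs:domain", "bpo:person"),
    ("bpo:has_event", "rdfs:domain", "bpo:event"),
    ("bpo:has_event", "rdfs:domain", "bpo:disease"),
]

# Hypothetical instance data; not taken from BioPortal.
DATA = [("ex:patient1", "bpo:has_event", "ex:checkup")]


def apply_rdfs2(axioms, data):
    """Rule rdfs2: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {}
    for prop, relation, cls in axioms:
        if relation == "rdfs:domain":
            domains.setdefault(prop, set()).add(cls)
    return {
        (s, "rdf:type", cls)
        for s, p, _ in data
        for cls in domains.get(p, ())
    }


inferred = apply_rdfs2(AXIOMS, DATA)
```

The result types `ex:patient1` as `bpo:person`, `bpo:event`, and `bpo:disease` simultaneously. The publisher presumably meant "one of these classes", and that intended reading is what an empirical analysis has to recover.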

Empirical research fields require infra

Like other empirical research fields, Empirical Semantics requires a serious investment in infrastructure.

LOD Observatories are needed to observe and analyse the large-scale use of Knowledge Graphs in practice.

Network Structure as a Proxy for Meaning

Network structure visually corresponds to aspects of meaning.



Theory 1: Descriptivism (

<div vocab="" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <div property="director" typeof="Person">
    Director: <span property="name">James Cameron</span>
    (born <time property="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span property="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" property="trailer">Trailer</a>
</div>

Theory 2: Rigid designation (Linked Data)

Thank you for your attention!
