Digging Into the Knowledge Graph


Wouter Beek (wouter@triply.cc)
October 21st, 2019

A global Knowledge Graph emerges

A global Knowledge Graph emerges (1/2)

  1. Digitization of artifacts.
  2. Online publication of artifacts.
  3. Unique naming of artifacts.
  4. Relating/linking of artifacts.

Web of Documents

A global Knowledge Graph emerges (2/2)

  1. Digitization of artifacts.
  2. Online publication of artifacts.
  3. Unique naming of artifacts.
  4. Relating/linking of artifacts.

Web of Data

If everybody starts doing this,
a global Knowledge Graph emerges…

Digging Into the Knowledge Graph

Indexing the LOD Cloud

  • The LOD Cloud grows through a self-organising process of knowledge creation.
  • Current indexing approaches are manual.
  • Difficult to grasp what is there, what it means, and what to reuse.
  • Use knowledge classification schemes (BCC, UDC) as reference systems to develop generic principles of indexing.

Objectives

  • Findability of vocabularies.
  • Recommendations for vocabulary reuse.
  • Guidelines for archiving vocabularies.

Research questions (1/3): KOS

  • Which vocabularies are used in the LOD Cloud?
  • What are the features of vocabularies in the LOD Cloud?
  • How can high-level features of the LOD Cloud be described using general schemes of classification?

Research questions (2/3): SW

  • How to index & search for existing vocabularies?
  • How to automatically find candidates for reuse?
  • How to efficiently store and query LOD vocabularies?

Research questions (3/3): Data Curation

  • What are best practices for curating LOD?
  • How to identify ‘endangered’ vocabularies?
  • How to organize provenance and version control for LOD?

The global Knowledge Graph is diverse

LOD Cloud 2020-05-20 (lod-cloud.net)

The global Knowledge Graph is diverse

  • Folksonomies
    • SHOE, Microformats/Microdata/RSS
    • Schema.org
    • SKOS
  • Frame languages
    • KL-ONE, LOOM
    • KIF, SUO-KIF
  • Formal languages
    • CycL, DAML+OIL
    • Common Logic, RDF/OWL/RIF

This diversity is needed

The Knowledge Graph is (even more) diverse

  • Authority files
  • Prototypes
  • Thesauri
  • Taxonomies
  • Topic maps
  • Typologies

Tap Into the Knowledge Graph

The cost of studying/hosting/archiving the global Knowledge Graph

80%

Time data scientists spend on finding & cleaning data.

The global Knowledge Graph is not FAIR

Most data is:

  • not Findable
  • not Accessible
  • not Interoperable
  • not Reusable

LOD Laundromat


  • 650K datasets; 38B triples
  • Best Linked Data Application Award 2015
  • Best Paper Award International Semantic Web Conference 2015
  • WDS Data Stewardship Award 2018

Published at DANS

https://doi.org/10.17026/dans-znh-bcg3

>650K datasets, >38B facts

Academic use cases

Reproducible research
  • L. Rietveld, W. Beek & S. Schlobach, “LOD Lab: Experiments at LOD Scale”, ISWC 2015. Best Paper Award.
Large-scale data cleaning
  • W. Beek, F. Ilievski, J. Debattista, S. Schlobach & J. Wielemaker, “Literally better: Analyzing and Improving the Quality of Literals”, Semantic Web Journal 2017.
Semantic search engines
  • F. Ilievski, W. Beek, M. van Erp, L. Rietveld & S. Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, ESWC 2016. Best LOD Application Award.
Large-scale querying
  • J. Fernández, W. Beek, M. Martínez-Prieto & M. Arias, “LOD-a-lot: A Queryable Dump of the LOD Cloud”, ISWC 2017.
  • W. Beek, J. Fernández & R. Verborgh, “LOD-a-lot: A Single-file Enabler for Data Science”, 13th Int. Conf. on Semantic Systems 2017.
  • W. Beek, L. Rietveld, S. Schlobach & F. van Harmelen, “LOD Laundromat: Why the Semantic Web Needs Centralization (Even If We Don't Like It)”, IEEE Internet Computing 2016.
  • L. Rietveld, R. Verborgh, W. Beek, M. Vander Sande & S. Schlobach, “Linked Data-as-a-Service: The Semantic Web Redeployed”, ESWC 2015.
Erroneous link detection
  • W. Beek, J. Raad, J. Wielemaker & F. van Harmelen, “sameAs.cc: The Closure of 500M owl:sameAs Statements”, ESWC 2018. Best Resource Paper Award.
  • J. Raad, W. Beek, F. van Harmelen, N. Pernelle & F. Saïs, “Detecting Erroneous Identity Links on the Web using Network Metrics”, ISWC 2018.

Make it affordable to publish/host/archive KOS

We need to significantly lower the cost of storing and disseminating structured data.

LOD-a-lot

J. Fernández, W. Beek, M. Martínez-Prieto & M. Arias, “LOD-a-lot: A Queryable Dump of the LOD Cloud”, ISWC 2017.

Indexing the global Knowledge Graph

(Data Observatory)

Current indexing approaches

Example: searching for “boiling water” in LOV (Linked Open Vocabularies)

  • No quantification of (re)use.
  • No compositionality: boiling ⊕ water.
  • No understanding of the encoded knowledge (e.g., the subsumption relation).

Indexing dimensions

  • versions
  • size
  • (re)use
  • distribution
  • naming
  • structure
  • expressiveness
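A minimal sketch of how four of these dimensions (size, (re)use, naming, expressiveness) might be computed over plain triples. Versions and distribution are not covered. The function names, the namespace heuristic (cut after the last '#' or '/') and the example.org IRIs are hypothetical illustrations, not the project's actual indexing code.

```python
# Hypothetical sketch: a few indexing dimensions for one vocabulary,
# computed over plain (subject, predicate, object) string triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def namespace(term):
    """Approximate a term's namespace by cutting after its last '#' or '/'."""
    cut = max(term.rfind("#"), term.rfind("/"))
    return term[: cut + 1] if cut >= 0 else term


def index_vocabulary(vocab_ns, vocab_triples, datasets):
    """Compute size, (re)use, naming and expressiveness for one vocabulary."""
    # size: number of statements in the vocabulary itself
    size = len(vocab_triples)
    # (re)use: number of datasets with a predicate or object from the vocabulary
    reuse = sum(
        1
        for ds in datasets
        if any(namespace(t) == vocab_ns for _, p, o in ds for t in (p, o))
    )
    # naming: the terms the vocabulary describes in its own namespace
    terms = sorted({s for s, _, _ in vocab_triples if namespace(s) == vocab_ns})
    # expressiveness: which RDFS/OWL constructs the vocabulary relies on
    expressiveness = sorted(
        {t for _, p, o in vocab_triples for t in (p, o) if t.startswith((RDFS, OWL))}
    )
    return {"size": size, "reuse": reuse, "terms": terms, "expressiveness": expressiveness}


# Toy data; the example.org IRIs are hypothetical.
EX = "http://example.org/boil#"
vocab = [
    (EX + "Boiling", RDF_TYPE, OWL + "Class"),
    (EX + "Water", RDF_TYPE, OWL + "Class"),
    (EX + "BoilingWater", RDFS + "subClassOf", EX + "Water"),
]
datasets = [
    [("http://example.org/data/kettle1", RDF_TYPE, EX + "BoilingWater")],
    [("http://example.org/data/bob", "http://xmlns.com/foaf/0.1/name", "Bob")],
]
result = index_vocabulary(EX, vocab, datasets)
print(result)
```

Counting (re)use across datasets, rather than only inside the vocabulary, is what a vocabulary catalogue without usage data cannot report.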

Studying the global Knowledge Graph

(Empirical Semantics)

Two forms of meaning

What formal semantics prescribes
What people do with it in practice

Borrowed from Jim Hendler's ESWC 2016 keynote.

What is Empirical Semantics?


The empirical (i.e., non-analytic) analysis of meaning.


(We still use model theory and other formalisms to describe the outcomes of our analyses, but we do not use formalisms to prescribe what a given expression ought to mean.)

Why is Empirical Semantics needed?

ES-1

Some aspects of meaning cannot be captured by formal semantics, but we still want to study them.

ES-2

Some aspects of meaning could be captured by formal semantics, but in common practice they are observed not to be.

ES-1: Non-formal aspects of meaning

Graph A


id:store def:sells id:tent.
id:tent  def:costs "¥150,000".
id:tent  rdf:type  id:Product.

Graph B


fy:aHup   pe:ko9sap_ fy:jufn12.
fy:jufn12 pe:oao9_   "Ufou".
fy:jufn12 rdf:type   fyufnt:tmffqt.

Graphs A and B are true in the same models.
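A minimal sketch of why a purely formal reading cannot tell Graph A from Graph B: a one-to-one renaming of terms maps one graph exactly onto the other. The sketch treats every IRI and literal as an opaque symbol, which is a simplification; it checks that the graphs have the same shape, not model-theoretic equivalence itself. The function names are hypothetical, prefixed names are kept as plain strings, and rdf:type is assumed to stay fixed.

```python
from itertools import permutations


def terms(graph):
    """All IRIs and literals occurring in a graph, treated as opaque symbols."""
    return sorted({t for triple in graph for t in triple})


def find_renaming(a, b, fixed=frozenset()):
    """Brute-force search for a one-to-one renaming of A's terms that turns
    graph A into graph B; terms in `fixed` must be mapped to themselves."""
    ta, tb = terms(a), terms(b)
    free_a = [t for t in ta if t not in fixed]
    free_b = [t for t in tb if t not in fixed]
    if len(a) != len(b) or len(ta) != len(tb) or len(free_a) != len(free_b):
        return None
    for image in permutations(free_b):
        mapping = dict(zip(free_a, image))
        mapping.update({t: t for t in ta if t in fixed})
        if {tuple(mapping[t] for t in triple) for triple in a} == b:
            return mapping
    return None


graph_a = {
    ("id:store", "def:sells", "id:tent"),
    ("id:tent", "def:costs", '"¥150,000"'),
    ("id:tent", "rdf:type", "id:Product"),
}
graph_b = {
    ("fy:aHup", "pe:ko9sap_", "fy:jufn12"),
    ("fy:jufn12", "pe:oao9_", '"Ufou"'),
    ("fy:jufn12", "rdf:type", "fyufnt:tmffqt"),
}
renaming = find_renaming(graph_a, graph_b, fixed=frozenset({"rdf:type"}))
print(renaming)
```

The brute-force search is only viable for toy graphs; its point is that nothing in the structure distinguishes "tent" from "jufn12", while a human reader sees the difference immediately.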

ES-1: Social Meaning

Non-formal meaning was already discussed in the early days of the Semantic Web (2003):


“An RDF graph may contain information that is opaque to logical reasoners.”
“Human publishers of RDF content commit themselves to the mechanically-inferred social obligations.”
“The meaning of an RDF document includes the social meaning, the formal meaning, and the social meaning of the formal entailments.”

ES-2: Formally incorrect, but not meaningless



bpo:has_event rdfs:domain bpo:person.
bpo:has_event rdfs:domain bpo:event.
bpo:has_event rdfs:domain bpo:disease.
Examples from BioPortal.
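Under RDFS semantics each rdfs:domain statement applies on its own (entailment rule rdfs2), so any subject of bpo:has_event is entailed to be a person, an event and a disease at the same time. A plausible reading is that the modeller meant "one of these", which OWL would express with owl:unionOf. The sketch below applies rdfs2 once to the BioPortal statements; the instance triple with ex:patient1 is hypothetical.

```python
RDFS_DOMAIN = "rdfs:domain"
RDF_TYPE = "rdf:type"


def apply_domain_rule(graph):
    """One pass of RDFS entailment rule rdfs2:
    (p rdfs:domain C) and (x p y) entail (x rdf:type C)."""
    domains = {}
    for s, p, o in graph:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = {(s, RDF_TYPE, c) for s, p, _ in graph for c in domains.get(p, ())}
    return inferred - set(graph)


graph = {
    ("bpo:has_event", "rdfs:domain", "bpo:person"),
    ("bpo:has_event", "rdfs:domain", "bpo:event"),
    ("bpo:has_event", "rdfs:domain", "bpo:disease"),
    ("ex:patient1", "bpo:has_event", "ex:event1"),  # hypothetical instance data
}
inferred_types = sorted(c for _, _, c in apply_domain_rule(graph))
print(inferred_types)  # ['bpo:disease', 'bpo:event', 'bpo:person']
```

The formal result (the intersection of three classes) is well defined, but it is unlikely to be what the publisher meant, which is exactly the gap ES-2 studies.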

Empirical research fields require infrastructure

Like other empirical research fields, Empirical Semantics requires a serious investment in infrastructure.

LOD Observatories are needed to observe and analyse the large-scale use of Knowledge Graphs in practice.

Network Structure as a Proxy for Meaning

Network structure visually corresponds to aspects of meaning.

(Figure panels: network visualisations of skos:exactMatch, foaf:knows, osspr:contains and geopolitics:hasBorderWidth)
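Which network metrics underlie these visualisations is not stated here; the sketch below assumes two simple ones as illustrative proxies. Reciprocity should be high for a symmetric property such as skos:exactMatch, and a containment relation should form an acyclic graph. The function names and the ex: terms are hypothetical.

```python
from collections import defaultdict


def edges_by_predicate(triples):
    """Group (subject, object) edges by predicate."""
    groups = defaultdict(set)
    for s, p, o in triples:
        groups[p].add((s, o))
    return groups


def reciprocity(edges):
    """Fraction of edges (a, b) whose reverse (b, a) is also present."""
    if not edges:
        return 0.0
    return sum((b, a) in edges for a, b in edges) / len(edges)


def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return seen == len(nodes)


def profile(triples):
    """Per-predicate structural profile used as a rough proxy for meaning."""
    return {
        p: {"edges": len(es), "reciprocity": reciprocity(es), "acyclic": is_acyclic(es)}
        for p, es in edges_by_predicate(triples).items()
    }


triples = [
    ("ex:a1", "skos:exactMatch", "ex:b1"),
    ("ex:b1", "skos:exactMatch", "ex:a1"),
    ("ex:region", "osspr:contains", "ex:city"),
    ("ex:city", "osspr:contains", "ex:street"),
]
stats = profile(triples)
print(stats)
```

On real data the profile would be computed over all usages of a predicate across many datasets, so that deviations from the expected shape become visible.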

Naming

Theory 1: Descriptivism (Schema.org)

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <div property="director" typeof="Person">
    Director: <span property="name">James Cameron</span>
    (born <time property="birthDate" datetime="1954-08-16">August 16, 1954</time>)
  </div>
  <span property="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" property="trailer">Trailer</a>
</div>

Theory 2: Rigid designation (Linked Data)

http://www.imdb.com/title/tt0499549/
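A toy illustration of the contrast, under simplifying assumptions: descriptivism resolves a reference to whatever satisfies a description, which can be ambiguous, while a rigid designator (an IRI) picks out its referent directly. The mini knowledge base, the second "Avatar" entity at example.org and the function names are hypothetical; the director is flattened to a string rather than a schema.org Person.

```python
# Hypothetical mini knowledge base: IRI -> property/value pairs.
KB = {
    "http://www.imdb.com/title/tt0499549/": {
        "type": "Movie",
        "name": "Avatar",
        "director": "James Cameron",
        "genre": "Science fiction",
    },
    "http://example.org/another-avatar": {  # hypothetical second entity
        "type": "Movie",
        "name": "Avatar",
        "genre": "Drama",
    },
}


def refer_by_description(kb, description):
    """Descriptivism: the referent is whatever satisfies every described property."""
    return sorted(
        iri
        for iri, props in kb.items()
        if all(props.get(k) == v for k, v in description.items())
    )


def refer_by_name(kb, iri):
    """Rigid designation: the name picks out its referent directly."""
    return iri if iri in kb else None


print(refer_by_description(KB, {"type": "Movie", "name": "Avatar"}))  # two candidates
print(refer_by_description(KB, {"name": "Avatar", "director": "James Cameron"}))
print(refer_by_name(KB, "http://www.imdb.com/title/tt0499549/"))
```

A description only narrows down candidates as more properties are added, whereas the IRI keeps referring to the same film even if some of its described properties turn out to be wrong.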
