Querying Knowledge Graphs


Wouter Beek (wouter@triply.cc)
Thomas de Groot (thomas.de.groot@triply.cc)


https://triply.cc

Planning

  • 13:45-14:30
  • 14:45-15:30
  • 15:45-16:30
  • 16:45-17:30

Links

SPARQL Forms

4 SPARQL forms

ask
Graph → Y/N
construct
Graph → Graph
describe
IRIs → Graph
select
Graph → Table

select

Graph → Table

select query

  • RDF data is stored in a graph.
  • A select query creates a tabular view by matching a pattern against the graph data.

select query concepts


Table     SPARQL
column    variable
cell      binding
row       result

select query components


Prologue
Declarations used throughout the query.
Projection
Specifies the variables that make up the columns of the table.
Pattern
Specifies the bindings that make up the cells of the table.
Modifier
Performs operations on the rows of the table.

select

Triple Patterns

Our first select query

Projection (columns)
select ?s ?p ?o
Pattern (cells)
{ ?s ?p ?o. }
Modifier (rows)
limit 10
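
Put together, the three components above form one complete query. A minimal sketch that should run on any SPARQL endpoint, since it makes no assumptions about the data:

```sparql
# Projection: three columns, one per variable.
select ?s ?p ?o
# Pattern: matches every triple in the graph.
{ ?s ?p ?o. }
# Modifier: keep at most 10 rows.
limit 10
```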

Table of Pokémon

<https://triplydb.com/academy/pokemon/vocab/colour>
Match specific arcs in the graph.
?pokemon and ?color
Use descriptive names for variables.

Table of yellow Pokémon

"yellow"
Only return results that have this specific value.
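
As a sketch of how a fixed value restricts the table: the query below assumes that colours are stored as plain string literals such as "yellow", as the slide suggests. Only the colour IRI is taken from the slides.

```sparql
select ?pokemon {
  # The object position holds a fixed literal instead of a variable,
  # so only Pokémon whose colour is "yellow" are matched.
  ?pokemon <https://triplydb.com/academy/pokemon/vocab/colour> "yellow".
}
limit 10
```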

Bind: introduce a new variable

bind(VALUE as VARIABLE)
Add a column with values that are not matched in the graph.

Bind: introduce a geo literal

"LEXICAL-FORM"^^<DATATYPE-IRI>
Notation for literals.
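
A minimal sketch of binding a geo literal with this notation. The datatype IRI is the GeoSPARQL WKT datatype; the coordinate itself is an arbitrary example value and not taken from the slides.

```sparql
select ?shape {
  # Lexical form "Point(...)", datatype IRI geo:wktLiteral written in full.
  bind("Point(5.97 52.21)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> as ?shape)
}
```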

Bind + calculations

bind(?x * ?y as ?z)
Bind the result of multiplying ?x and ?y to ?z.
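
A self-contained sketch of a calculation with bind. The numbers are illustrative; no graph data is needed, so the query returns a single row in which ?z is 42.

```sparql
select ?x ?y ?z {
  bind(6 as ?x)
  bind(7 as ?y)
  # ?z is computed from ?x and ?y; it is not matched in the graph.
  bind(?x * ?y as ?z)
}
```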

Abbreviation: IRI prefix notation

Abbreviated query notation
vocab:colour
Abbreviated result set notation
Example: pokémon:flareon
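
A sketch of the prefix notation in a full query. The prefix IRI is assumed here to be the colour IRI shown earlier without its last segment.

```sparql
prefix vocab: <https://triplydb.com/academy/pokemon/vocab/>
select ?pokemon ?color {
  # vocab:colour expands to
  # <https://triplydb.com/academy/pokemon/vocab/colour>.
  ?pokemon vocab:colour ?color.
}
limit 10
```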

Projection: column order

?color
The first column contains colors.
?pokemon
The second column contains Pokémon IDs.

Projection: add/remove columns

?color
Only return the column for colors.
?pokemon
A hidden variable: one whose bindings are not returned.

Projection: generic

select *
Return columns for all variables. Columns appear in unspecified order.

HTML templating

{{VARIABLE}}
Use ?VARIABLE in a template string.

Limit the number of rows

limit 250
Return at most 250 rows.

Skip a number of rows

offset 250
Skip the first 250 rows. Combined with limit 25, this returns the 251st through the 275th row.
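
A sketch of how limit and offset combine to page through results. The page size of 25 is an illustrative choice.

```sparql
select ?s ?p ?o {
  ?s ?p ?o.
}
# Skip rows 1-250, then return at most 25 rows (rows 251-275).
offset 250
limit 25
```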

Exercise

Write a SPARQL select query over your own data.


Suggested endpoints

Summary

Construct    Purpose                        Examples
Prefix       Abbreviate syntax              prefix ex: <https://example.com/>
Projection   Select columns                 select ?x ?y
                                            select *
Pattern      Match cell values              { ?s ?p ?o. }
                                            { ?s ex:p ?o. }
Binding      Introduce new variables        bind('Hi!' as ?widget)
Template     Return HTML widgets            bind('<img src="{{image}}">' as ?widget)
Limit        Set a maximum number of rows   limit 10
Offset       Skip a number of rows          offset 10

select

Graph Patterns

Graph Pattern: One Triple Pattern

Graph patterns contain zero or more Triple Patterns.

Graph Pattern: Two Triple Patterns

?pokemon
A shared variable connects two or more triple patterns.
. (dot)
Marks the end of a Triple Pattern.

Graph Pattern: Four Triple Patterns


Graph Pattern: Abbreviated notation

; (semicolon)
Repeat previous subject term.
, (comma)
Repeat previous subject and predicate terms.
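
A sketch of the abbreviated notation. It assumes that Pokémon names are stored with rdfs:label, which the slides do not confirm; ?otherName is an illustrative variable.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix vocab: <https://triplydb.com/academy/pokemon/vocab/>
select ?pokemon ?color ?name ?otherName {
  # Long form of the same pattern:
  #   ?pokemon vocab:colour ?color.
  #   ?pokemon rdfs:label ?name.
  #   ?pokemon rdfs:label ?otherName.
  # ';' repeats the subject, ',' repeats subject and predicate.
  ?pokemon vocab:colour ?color;
           rdfs:label ?name, ?otherName.
}
limit 10
```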

Multiple values

filter( … )
A non-graph restriction that is added to the pattern.
X != Y, X < Y, …
X and Y must not be the same, X must be smaller than Y, etc.

Filter by language

lang(…)
Returns the language of a language-tagged string.
filter( A && B )
Apply filter A and filter B.
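
A sketch that combines two filters. It assumes that names are stored with rdfs:label and carry language tags such as "en-us"; lang() returns the tag as it is stored in the data.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix vocab: <https://triplydb.com/academy/pokemon/vocab/>
select ?pokemon ?name {
  ?pokemon vocab:colour ?color;
           rdfs:label ?name.
  # Keep only US English names of Pokémon that are not yellow.
  filter(lang(?name) = "en-us" && ?color != "yellow")
}
limit 10
```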

Graph pattern: Five Triple Patterns

IRI abbreviations

Notation       Example
Absolute IRI   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
Relative IRI   <type> (requires base)
Prefixed IRI   rdf:type (requires prefix)
Type IRI       a (only in predicate position)

Group Pattern abbreviation

Symbol   Notation type
.        Single triple
;        Predicate list
,        Object list

Datatype IRI abbreviations

Example   Datatype IRI
false     xsd:boolean
11        xsd:integer
1.1       xsd:decimal
1.1e0     xsd:double
"abc"     xsd:string

Property Paths

P/Q
Sequence: first follow P, then follow Q.
P|Q
Choice: follow P or follow Q.
P+
Follow P one or more times.
P*
Follow P zero or more times.
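
A sketch that combines a sequence with repetition, using only standard RDF and RDFS terms. It assumes the data contains rdf:type and rdfs:subClassOf statements.

```sparql
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?instance ?class {
  # Follow rdf:type once, then rdfs:subClassOf zero or more times,
  # so both direct and inherited classes are returned.
  ?instance rdf:type/rdfs:subClassOf* ?class.
}
limit 10
```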

Making the query more specific

Instantiating a variable makes the query more specific.

Sort rows

order by ?x
Sorts rows in ascending order of ?x (in this example, from the least happy to the most happy Pokémon).
order by ?x ?y ?z
It is possible to sort by multiple criteria.

Inversely sort rows

order by desc(?x)
Inversely sort rows (descending).
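
A sketch of sorting on multiple criteria, with the first one descending. The property vocab:happiness is a hypothetical name used for illustration; the actual property in the dataset may differ.

```sparql
prefix vocab: <https://triplydb.com/academy/pokemon/vocab/>
select ?pokemon ?happiness {
  # vocab:happiness is a hypothetical property name.
  ?pokemon vocab:happiness ?happiness.
}
# Happiest first; ties are broken by the Pokémon IRI (ascending).
order by desc(?happiness) ?pokemon
limit 10
```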

SPARQL Gallery


              
'''…'''^^rdf:HTML
An HTML string with unescaped newlines and quotes.
?widget
Widget cards displayed in the gallery.

Exercise

What is the heaviest dragon, and how does it sound?

Pokémon Endpoint

GeoSPARQL

Geospatial data model (GeoSPARQL)

A feature can have both 2D and 3D shapes, and each shape can be serialized in both GML and WKT.

GeoSPARQL: Geometry

geo:hasGeometry and geo:asWKT
Properties from GeoSPARQL, a vocabulary standardized by the Open Geospatial Consortium (OGC).
?shapeLabel
Popup for the shape bound to ?shape.

GeoSPARQL: Anonymous node syntax

a
Abbreviation for the predicate term rdf:type.
[ P O ]
Anonymous node notation (square brackets, […]) can be used to abbreviate nodes that are not referenced anywhere else in the query.
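
A sketch of the anonymous node notation with the two GeoSPARQL properties named above. The variable ?feature is an illustrative name.

```sparql
prefix geo: <http://www.opengis.net/ont/geosparql#>
select ?feature ?shape {
  # Long form, with a variable for the intermediate geometry node:
  #   ?feature geo:hasGeometry ?geometry.
  #   ?geometry geo:asWKT ?shape.
  # Short form: [ ... ] stands in for the geometry node.
  ?feature geo:hasGeometry [ geo:asWKT ?shape ].
}
limit 10
```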

Find a Dutch building

Exercise

Find your house or street in the Netherlands.

Use the API elements in this query.

GeoSPARQL: 3D geometries

?shapeColor
Color of the shape bound to ?shape.
?shapeHeight
Height of the shape bound to ?shape.
?shapeLabel
Label for the shape bound to ?shape.
values (?var … ?var) { (?term … ?term) … (?term … ?term) }
Specify multiple bindings.
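
A self-contained sketch of values with three variables. The colours, heights and labels are made-up example data; the query returns one row per parenthesized tuple.

```sparql
select ?shapeColor ?shapeHeight ?shapeLabel {
  values (?shapeColor ?shapeHeight ?shapeLabel) {
    ("red"  10 "Low building")
    ("blue" 50 "Tall building")
  }
}
```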

Exercise

Find your house or street in the Netherlands.

Edit the query string directly in this query.

Color Schemes


              
Color names
CSS Color Values, HSL Color Codes, RGB Color Codes
Color gradients
colormap & Color Brewer

GeoSPARQL + modifiers: order by

Show the 25 oldest buildings in Apeldoorn.

Federation

Federation

service <URL> { Q }
Evaluate graph pattern Q on the SPARQL endpoint located at URL.
https://dbpedia.org/sparql
SPARQL endpoint over the Linked Data version of Wikipedia.
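
A sketch of a federated query against DBpedia. It can be run from any endpoint that supports the service keyword; the resource IRI for the Netherlands is chosen as an example.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?label {
  # The pattern inside service { ... } is evaluated remotely by DBpedia.
  service <https://dbpedia.org/sparql> {
    <http://dbpedia.org/resource/Netherlands> rdfs:label ?label.
    filter(lang(?label) = "en")
  }
}
```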

Exercise

Write a federated query.

Start at your own endpoint, and federate to an endpoint of one of your colleagues, or federate to DBpedia (https://dbpedia.org/sparql).

The endpoints must share at least one term (IRI or literal). If they do not, consider adding a link with owl:sameAs.

Hierarchies

Transitive predicates

Hierarchy: Org Chart

Uses the Historical International Standard Classification of Occupations (HISCO) dataset.

Hierarchy: TreeMap

Uses the Historical International Standard Classification of Occupations (HISCO) dataset.

Aggregation

What is aggregation?

One or more functions that are applied to groups of values.

The groups are generated for each unique combination of values for a specified set of variables.

An example of groups

The set of variables is {?pokemon}.

The groups are the sets of names per Pokémon.

?pokemon       ?name
id:abomasnow   "ABOMASNOW"@it-it
id:abomasnow   "ABOMASNOW"@es-es
id:abomasnow   "ABOMASNOW"@en-us
id:abomasnow   "BLIZZAROI"@fr-fr
id:abomasnow   "REXBLISAR"@de-de
id:abomasnow   "ユキノオー"@ja-ja
id:abra        "ABRA"@it-it
id:abra        "ABRA"@fr-fr
id:abra        "ABRA"@es-es
id:abra        "ABRA"@de-de
id:abra        "ABRA"@en-us
id:abra        "ケーシィ"@ja-ja

Count function

count(…)
Applies the count function to each group of names.
group by ?pokemon
Explicit grouping criterion.
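
A sketch of counting with an explicit grouping criterion. It assumes that names are stored with rdfs:label; ?numberOfNames is an illustrative variable name. Applied to the twelve example rows shown earlier, each of the two groups would count 6 names.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?pokemon (count(?name) as ?numberOfNames) {
  ?pokemon rdfs:label ?name.
}
# One group, and therefore one result row, per Pokémon.
group by ?pokemon
```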

Count function + implicit grouping

Implicit grouping
When there is at least one aggregation function (e.g., count) and there is no group by clause.

Implicit grouping gone wrong

Implicit grouping
The implicitly grouped-by variables are the ones that are (1) visible, and that (2) are not input for an aggregation function.

Grouping variable hiding

The grouping variables must occur in the projection. This means that a sub-select is required in order to exclude grouping variables from the outer projection.

Concatenation function

concat(…)
Concatenate all arguments into one new string.
group_concat(…;separator=…)
Concatenate all bindings, interspersed with separators, into one new string.
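
A sketch of group_concat, again assuming that names are stored with rdfs:label. The separator ", " is an arbitrary choice.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?pokemon (group_concat(?name; separator=", ") as ?names) {
  ?pokemon rdfs:label ?name.
}
# All names of one Pokémon end up in a single string.
group by ?pokemon
```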

Nested aggregate

First count, then calculate the maximum count.
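
A sketch of a nested aggregate using a sub-select. The rdfs:label property and the variable names are assumptions made for illustration.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select (max(?n) as ?maxNames) {
  # Inner query: count the names per Pokémon.
  { select ?pokemon (count(?name) as ?n) {
      ?pokemon rdfs:label ?name.
    }
    group by ?pokemon
  }
}
# Outer query: take the maximum over those counts.
```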

Aggregate evidence

Compute a minimum and/or maximum value and return a resource for which that value occurs (i.e., the ‘evidence’).
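
One possible way to return the evidence is to compute the extreme value in a sub-select and then match it again in the outer pattern. The property vocab:weight is a hypothetical name; the actual property in the dataset may differ.

```sparql
prefix vocab: <https://triplydb.com/academy/pokemon/vocab/>
select ?pokemon ?weight {
  # Step 1: the sub-select computes the maximum value.
  { select (max(?w) as ?weight) {
      ?p vocab:weight ?w.   # vocab:weight is a hypothetical property
    }
  }
  # Step 2: the outer pattern finds the resource(s) with that value.
  ?pokemon vocab:weight ?weight.
}
```

If several resources share the maximum value, all of them are returned.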

DataCube

DataCube: Observation

observation:0007ddade4 a qb:Observation;
  qb:dataSet dataset:countries;
  dimension:location country:Netherlands;
  dimension:year "2002"^^xsd:gYear;
  measure:lifeExpectancy 7.9696e1.

DataCube: Dataset

dataset:countries a qb:DataSet;
  qb:structure dsd:countries;
  sdmx-attribute:unitMeasure dbr:Year.

DataCube: Data Structure Definition

dsd:countries a qb:DataStructureDefinition;
  qb:component
    [ qb:dimension dimension:location ],
    [ qb:dimension dimension:year ],
    [ qb:measure measure:lifeExpectancy ],
    [ qb:attribute sdmx-attribute:unitMeasure;
      qb:componentAttachment qb:DataSet ].

dimension:location a qb:DimensionProperty;
  qb:concept sdmx-concept:refArea;
  rdfs:range vocab:Country;
  rdfs:subPropertyOf sdmx-dimension:refArea.

dimension:year a qb:DimensionProperty;
  qb:concept sdmx-concept:refPeriod;
  rdfs:range xsd:gYear;
  rdfs:subPropertyOf sdmx-dimension:refPeriod.

measure:lifeExpectancy a qb:MeasureProperty;
  rdfs:range xsd:double;
  rdfs:subPropertyOf sdmx-measure:obsValue.

Plot a measure for one dimension


              
Fixed dimension
dimension:year "2007"^^xsd:gYear
Plotted dimension
dimension:location/rdfs:label ?country
Plotted measure
measure:lifeExpectancy ?value
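
The three pattern fragments above can be combined into one query, sketched below. The namespaces behind the dimension: and measure: prefixes are not given in the slides, so the example.com IRIs are placeholders that must be replaced with the real ones.

```sparql
prefix dimension: <https://example.com/dimension/>   # placeholder namespace
prefix measure: <https://example.com/measure/>       # placeholder namespace
prefix qb: <http://purl.org/linked-data/cube#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?country ?value {
  ?observation a qb:Observation;
    # Fixed dimension: only observations for 2007.
    dimension:year "2007"^^xsd:gYear;
    # Plotted dimension: the label of the country.
    dimension:location/rdfs:label ?country;
    # Plotted measure.
    measure:lifeExpectancy ?value.
}
order by ?country
```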

Plot a measure for one dimension


              
  • Column 1: plotted dimension
  • Column 2: plotted measure
  • Column 3: coordinate label
  • Column 4: tooltip

Post-processing


              
  • Caption, axes, legend
  • Linear/polynomial trend
  • Error bars
  • Log scale

Thank you for your attention!

