TreeHugger Introduction


1. XPath Over RDF/XML

Here are two serialisations of (roughly) the same thing:

(1)

<foaf:Person>
   <foaf:name>Damian Steer</foaf:name>
   <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
   <foaf:knows>
     <foaf:Person>
       <foaf:name>Libby Miller</foaf:name>
     </foaf:Person>
   </foaf:knows>
</foaf:Person>

(2)

<rdf:Description foaf:name="Damian Steer">
   <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
   <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
   <foaf:knows rdf:nodeID="libby"/>
</rdf:Description>

<foaf:Person rdf:nodeID="libby">
   <foaf:name>Libby Miller</foaf:name>
</foaf:Person>

The second is pretty perverse, but still valid. XML tools see a great difference between the two, whereas RDF tools see none. But RDF tools take some getting used to, and TreeHugger is an attempt to ease the transition by making XPaths navigate RDF graphs.

Suppose I want to get the name of the person known by the person with mailbox pldms@mac.com. In case (1) I might try:

/foaf:Person/
foaf:mbox[@rdf:resource='mailto:pldms@mac.com']/
../
foaf:knows/
foaf:Person/
foaf:name/text()

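In case (2) it's much harder: foaf:knows refers to the other person only by rdf:nodeID, so the path would have to join on that ID to reach the second element.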

Here's another one: get the mailbox of the thing of type 'http://xmlns.com/foaf/0.1/Person' named 'Damian Steer'. For case (2) we could try:

/rdf:Description[@foaf:name='Damian Steer']/
rdf:type[@rdf:resource='http://xmlns.com/foaf/0.1/Person']/
../
foaf:mbox/
@rdf:resource

This won't work for case (1) of course, since there the type is the element name and the name is a child element rather than an attribute.

Both paths, however, work for either serialisation in TreeHugger.
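
To make the problem concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (not TreeHugger, and not Saxon). It runs the first path against both serialisations with a plain XML tool. The rdf:RDF wrapper, the names CASE_1, CASE_2 and known_names, and the relative form of the path are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Wrapper supplying the namespace declarations that the fragments omit.
WRAP = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:foaf="http://xmlns.com/foaf/0.1/">{}</rdf:RDF>'
)

CASE_1 = WRAP.format("""
<foaf:Person>
   <foaf:name>Damian Steer</foaf:name>
   <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
   <foaf:knows>
     <foaf:Person>
       <foaf:name>Libby Miller</foaf:name>
     </foaf:Person>
   </foaf:knows>
</foaf:Person>
""")

CASE_2 = WRAP.format("""
<rdf:Description foaf:name="Damian Steer">
   <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
   <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
   <foaf:knows rdf:nodeID="libby"/>
</rdf:Description>
<foaf:Person rdf:nodeID="libby">
   <foaf:name>Libby Miller</foaf:name>
</foaf:Person>
""")

# The first path from the text, made relative to the rdf:RDF wrapper
# (ElementTree does not accept absolute paths or text()).
PATH = (
    "foaf:Person/foaf:mbox[@rdf:resource='mailto:pldms@mac.com']/../"
    "foaf:knows/foaf:Person/foaf:name"
)

def known_names(xml_text):
    """Return the text of every foaf:name element the path selects."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(PATH, NS)]

print(known_names(CASE_1))  # the path matches serialisation (1)
print(known_names(CASE_2))  # and finds nothing in serialisation (2)
```

An XML-level tool only sees element nesting, so the same graph gives different answers depending on how it happened to be written down.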

2. TreeHugger

TreeHugger is implemented as a Saxon extension function. The function returns a document root (about which more below). All of the following assumes that root.

TreeHugger uses the RDF striped syntax. Paths are always of the form:

/rdf_class/rdf_property/rdf_class/rdf_property/....

As a result, the paths won't mirror the XML where the serialisation breaks striping, for example with rdf:parseType="Resource" or property attributes such as <property otherprop="foo">.

Each XPath form is listed below, followed by its meaning in TreeHugger:

/
A little complicated in the implementation, but consider it as representing the RDF graph.
/<rdf_class>
All the subjects in the graph of type <rdf_class>. Example: /rdf:Seq returns every sequence node in the graph.
./<rdf_class>
Matches nodes of type <rdf_class>. The special cases rdf:Description and rdfs:Resource are equivalent and match all nodes (thus /rdf:Description returns all nodes in the graph). The children of an RDF class node are all the RDF properties of that node.
<rdf_class>/@rdf_property
This form uses properties as attributes of a node. For example foaf:Person[@foaf:mbox='mailto:pldms@mac.com'] matches a person with email address pldms@mac.com. There are also two pseudo properties, rdf:about and rdf:nodeID, whose values are the URI of the node (for labelled nodes) and the ID (for bnodes) respectively. You can thus use /rdfs:Resource[@rdf:nodeID] to select all bnodes in the graph. Caution is advised when using node[@attribute='val']: if the node has more than one value for a given attribute, it is uncertain which value will be used for the match. For example, if a resource has more than one dc:title you will get unpredictable results finding it using rdf:Description[@dc:title='title']. This is just a limitation of my implementation.
./<rdf_property>
In contrast to the above, here a property is an element. This matches properties <rdf_property> of the parent, i.e. it is an instance of the property, an RDF statement. It has one child: the object of the property. You can use rdfs:Property to match any property. Example: /rdfs:Resource[@rdf:about='http://example.com']/dc:contributor/ gives the contributors to example.com.
<rdf_property>/@attribute
Properties can have the following attributes: rdf:resource, the URI of the object of the property (if it is a labelled resource); rdf:nodeID, the ID of the object (if it is a bnode); xml:lang, the language of the object of the property (if it is a literal); and finally rdf:datatype, the datatype of the object of the property (once again, if it is a literal). So foaf:Person/foaf:name[@xml:lang='ja']/* gives the Japanese name of the person.
./text()
A literal. This fits the XML usage nicely, but in RDF literals may have attributes, which I've pushed up to the property (see above). Needs work, I think.
./..
Parent. This just backs up one level in the path, and is thus dependent on the path taken. RDF people may want to look at inv:property, described below.
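
The alternation of class steps and property steps can be sketched over a plain list of triples. This is only an illustration of the striped reading of a path, not how TreeHugger is implemented; the bnode labels "_:damian" and "_:libby" and the function names are made up for the example, and predicates, text() and .. are left out.

```python
RDF_TYPE = "rdf:type"
# rdf:Description and rdfs:Resource match every node.
WILDCARDS = {"rdf:Description", "rdfs:Resource"}

# The graph from section 1 as (subject, property, object) triples.
# "_:damian" and "_:libby" are hypothetical bnode labels.
TRIPLES = [
    ("_:damian", "rdf:type", "foaf:Person"),
    ("_:damian", "foaf:name", "Damian Steer"),
    ("_:damian", "foaf:mbox", "mailto:pldms@mac.com"),
    ("_:damian", "foaf:knows", "_:libby"),
    ("_:libby", "rdf:type", "foaf:Person"),
    ("_:libby", "foaf:name", "Libby Miller"),
]

def has_type(node, cls):
    """Class step test: wildcard classes match anything."""
    return cls in WILDCARDS or (node, RDF_TYPE, cls) in TRIPLES

def subjects():
    """All distinct subjects, in order of first appearance."""
    seen = []
    for s, _, _ in TRIPLES:
        if s not in seen:
            seen.append(s)
    return seen

def walk(path):
    """Evaluate /class/property/class/property/... over TRIPLES."""
    nodes = subjects()
    for i, step in enumerate(path.strip("/").split("/")):
        if i % 2 == 0:
            # Class step: keep only nodes of that type.
            nodes = [n for n in nodes if has_type(n, step)]
        else:
            # Property step: move to the objects of that property.
            nodes = [o for n in nodes
                     for (s, p, o) in TRIPLES if s == n and p == step]
    return nodes

print(walk("/foaf:Person/foaf:knows/foaf:Person/foaf:name"))
print(walk("/rdf:Description/foaf:mbox"))
```

Because the walk is over triples rather than elements, it does not matter which of the two serialisations the triples were parsed from.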

3. Functions

In the following I'll assume the namespace declarations:

xmlns:th="http://rootdev.net/net.rootdev.treehugger.TreeHugger"
xmlns:inv="http://rootdev.net/treehugger/inverse#"

TreeHugger currently adds three extension functions:

th:document(rdf document)
Loads an RDF document and returns a document 'root'. The RDF document may be loaded from a file or URL.
th:documentRDFS(rdf document, rdf schema)
As above, but the model will now be RDFS 'aware', i.e. subproperty and subclass closures, domain and range etc.
th:documentOWL(rdf document, owl ontology)
As above, but the model will perform OWL reasoning (in addition to RDFS reasoning?), e.g. inverse functional properties and restrictions.

TreeHugger also adds a 'pseudo' function, or (better) 'pseudo' axis:

.../inv:property/...
This allows you to create the inverse of a property. For example foaf:Person/inv:dc:creator takes you to things which this person created. It's a hack, but a very useful one.

A brief example:

th:documentRDFS('foo.rdf', 'http://xmlns.com/foaf/0.1/')/
foaf:Person/
inv:foaf:knows/
foaf:Person/
rdfs:label/text()

This path takes all the people in foo.rdf, goes to people who know them, and returns the label for those people. Since foaf:name is a subproperty of rdfs:label, and this is an RDFS document, this will return the names of people who know people.
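
The two ideas in that path, following a property backwards and matching through a subproperty, can be sketched over a list of triples. This is an illustration under assumptions, not TreeHugger's code: the triples, the bnode labels and the helper names are invented, and the only schema fact used is the one stated above, that foaf:name is a subproperty of rdfs:label.

```python
# Schema fact taken from the text: foaf:name is a subproperty of rdfs:label.
SUBPROPERTY_OF = {"foaf:name": "rdfs:label"}

# Hypothetical contents of foo.rdf as (subject, property, object) triples.
TRIPLES = [
    ("_:damian", "rdf:type", "foaf:Person"),
    ("_:damian", "foaf:name", "Damian Steer"),
    ("_:damian", "foaf:knows", "_:libby"),
    ("_:libby", "rdf:type", "foaf:Person"),
    ("_:libby", "foaf:name", "Libby Miller"),
]

def matches(prop, wanted):
    """True if prop is wanted, or a (transitive) subproperty of it."""
    while prop is not None:
        if prop == wanted:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False

def follow(nodes, step):
    """Property step; an 'inv:' prefix walks from object to subject."""
    if step.startswith("inv:"):
        wanted = step[len("inv:"):]
        return [s for n in nodes
                for (s, p, o) in TRIPLES if o == n and matches(p, wanted)]
    return [o for n in nodes
            for (s, p, o) in TRIPLES if s == n and matches(p, step)]

def of_type(nodes, cls):
    """Class step: keep nodes that have the given rdf:type."""
    return [n for n in nodes if (n, "rdf:type", cls) in TRIPLES]

# foaf:Person / inv:foaf:knows / foaf:Person / rdfs:label
people = [s for (s, p, o) in TRIPLES
          if p == "rdf:type" and o == "foaf:Person"]
knowers = of_type(follow(people, "inv:foaf:knows"), "foaf:Person")
labels = follow(knowers, "rdfs:label")
print(labels)
```

Only Damian knows anyone in this toy graph, so his foaf:name is the single rdfs:label returned.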

4. Lists

Alt, Bag, and Seq look nasty in the rdf model. TreeHugger follows the XML serialisation to protect you:

<xsl:for-each 
  select="./rss:items/rdf:Seq/rdf:li/rdfs:Resource">
     ...
     do something with each member of the sequence
     ...
</xsl:for-each>

The members of the sequence will be in the correct order, of course. rdf:parseType="Collection" support will be added when I find out how to do it.
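
For comparison, this is roughly what "nasty in the RDF model" means: a container's members hang off numbered properties rdf:_1, rdf:_2 and so on, and have to be sorted by that number to recover the order. A minimal sketch, assuming a triple list with invented example.com items and a hypothetical helper seq_members:

```python
# A sequence as it appears in the graph; the item URIs are made up.
TRIPLES = [
    ("_:items", "rdf:type", "rdf:Seq"),
    ("_:items", "rdf:_10", "http://example.com/item10"),
    ("_:items", "rdf:_2", "http://example.com/item2"),
    ("_:items", "rdf:_1", "http://example.com/item1"),
]

PREFIX = "rdf:_"

def seq_members(triples, seq_node):
    """Return the members of a container in membership-property order."""
    members = []
    for s, p, o in triples:
        index = p[len(PREFIX):]
        if s == seq_node and p.startswith(PREFIX) and index.isdigit():
            # Sort on the integer, so rdf:_10 comes after rdf:_2.
            members.append((int(index), o))
    return [o for _, o in sorted(members)]

print(seq_members(TRIPLES, "_:items"))
```

The rdf:li form used in the path hides this bookkeeping.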

5. An Example

Take an example document like Dan's foaf file. Let's try to make an html file which lists all the people who know people, plus the people they know and a link to their mailboxes.

The TreeHugger style sheet looks like this:

<?xml version="1.0"?>

<html xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:th="http://rootdev.net/net.rootdev.treehugger.TreeHugger"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xsl:version="1.0">

  <!-- Load an rdf document -->
  
 <xsl:variable name="doc" 
   select="th:document('http://rdfweb.org/people/danbri/rdfweb/danbri-foaf.rdf')"
   />

<head>

<title>Some people</title></head>

<body>

<h2>People who know people</h2>
<h5>(the happiest people)</h5>

<!-- find people who know people -->

<xsl:for-each select="$doc/foaf:Person[@foaf:knows]">

  <!-- show their name -->

  <h4><xsl:value-of select="./foaf:name/text()"/></h4>
  
  <!-- get people they know who have a name and mailbox -->

  <xsl:for-each select="./foaf:knows/foaf:Person[@foaf:name and @foaf:mbox]">
    
    <p>knows:

    <!-- show the name in a link to their mailbox -->

    <a href="{./foaf:mbox/rdfs:Resource[last()]}"> 
    
    <xsl:value-of select="./foaf:name/text()"/></a>
    
    </p>
  </xsl:for-each>
 </xsl:for-each>

<p>

  <b><i><u>
        
    <!-- show name of person with mbox 'libby...' -->
    <!-- here properties are used as attributes and elements -->

    <xsl:value-of 
      select="$doc/foaf:Person[@foaf:mbox='mailto:libby.miller@bristol.ac.uk']/foaf:name/text()"
     />
  </u></i></b>
   
</p>
</body>
</html>
    

The result is here. (Martin Poulter knows no one because the person he knows has no mbox, btw).

6. XPath To RDF Query

Although the TreeHugger implementation doesn't work like this, a more efficient implementation could take complete path expressions and translate them to RDF queries. This would be faster over RDB-backed models. (At some point I'll write a little Perl or Ruby script to do this).

The following are paths translated to Squish (minus namespace declarations). The node set becomes a result set with each row containing one variable binding: 'node'.

/foaf:Person/foaf:name/text()

becomes:

SELECT ?node
WHERE
(rdf:type ?a foaf:Person)
(foaf:name ?a ?node)

/foaf:Person[@foaf:mbox="mailto:pldms@mac.com"]/foaf:name/text()

becomes:

SELECT ?node
WHERE
(rdf:type ?a foaf:Person)
(foaf:mbox ?a mailto:pldms@mac.com)
(foaf:name ?a ?node)

/foaf:Person/inv:foaf:knows/foaf:Person/foaf:mbox/rdfs:Resource

becomes:

SELECT ?node
WHERE
(rdf:type ?a foaf:Person)
(foaf:knows ?b ?a) [note: inverted]
(rdf:type ?b foaf:Person)
(foaf:mbox ?b ?node)

/rdfs:Resource/foaf:knows/foaf:Person/../../foaf:mbox/rdfs:Resource

becomes:

SELECT ?node
WHERE
(foaf:knows ?a ?b) [note: 'a' untyped in path]
(rdf:type ?b foaf:Person)
(foaf:mbox ?a ?node) [note: ../../ takes us back to 'a']
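
The translations above are regular enough to mechanise. What follows is a rough Python sketch of such a translator, written for illustration only: it is not part of TreeHugger, the function name to_squish is invented, and it handles just the forms used in these four examples (class and property steps, inv:, .., text(), and [@prop="value"] predicates on class steps).

```python
import re
from string import ascii_lowercase

WILDCARDS = {"rdf:Description", "rdfs:Resource"}
# One path step, keeping any [...] predicate attached to it.
STEP = re.compile(r"(?:[^/\[]|\[[^\]]*\])+")
# A predicate of the form [@prop="value"] or [@prop='value'].
PRED = re.compile(r"""\[@([^=\]]+)=["']([^"']*)["']\]""")

def to_squish(path):
    clauses = []   # (property, subject, object); ints stand for variables
    stack = []     # ("node", var) or ("prop", object_var), one per path level
    counter = 0
    for step in STEP.findall(path):
        if step == "..":
            stack.pop()                      # back up one level in the path
        elif step == "text()":
            continue                         # literal is the property's object
        elif stack and stack[-1][0] == "node":
            # Property step: a statement about the current node.
            subj, obj = stack[-1][1], counter
            counter += 1
            if step.startswith("inv:"):
                clauses.append((step[4:], obj, subj))   # inverted
            else:
                clauses.append((step, subj, obj))
            stack.append(("prop", obj))
        else:
            # Class step: constrains the object of the enclosing property,
            # or a fresh variable at the start of the path.
            if stack:
                var = stack[-1][1]
            else:
                var = counter
                counter += 1
            name = step.split("[")[0]
            if name not in WILDCARDS:
                clauses.append(("rdf:type", var, name))
            for prop, value in PRED.findall(step):
                clauses.append((prop, var, value))
            stack.append(("node", var))

    # The last variable reached is the result; the rest become ?a, ?b, ...
    names = {stack[-1][1]: "?node"}
    letters = iter(ascii_lowercase)

    def term(t):
        if isinstance(t, int):
            if t not in names:
                names[t] = "?" + next(letters)
            return names[t]
        return t

    lines = ["SELECT ?node", "WHERE"]
    lines += ["({} {} {})".format(p, term(s), term(o)) for p, s, o in clauses]
    return "\n".join(lines)

print(to_squish("/rdfs:Resource/foaf:knows/foaf:Person/../../foaf:mbox/rdfs:Resource"))
```

Keeping a stack that mirrors the path depth is what lets ../../ return to 'a': each .. simply pops one level, so the next property step attaches to whichever node is then on top.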

Damian Steer
Last modified: Wed Sep 10 13:23:25 BST 2003