TreeHugger

Introduction

Here are two serialisations of (roughly) the same thing:

(1)

<foaf:Person>
   <foaf:name>Damian</foaf:name>
   <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
   <foaf:knows>
     <foaf:Person>
       <foaf:name>Libby</foaf:name>
     </foaf:Person>
   </foaf:knows>
</foaf:Person>

(2)

<rdf:Description foaf:name="Damian">
   <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
   <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
   <foaf:knows rdf:resource="http://example.com/#libby"/>
</rdf:Description>

<foaf:Person rdf:about="http://example.com/#libby">
   <foaf:name>Libby</foaf:name>
</foaf:Person>

The second is pretty perverse, but still valid. XML tools see a great difference between the two, whereas RDF tools see very little (namely that one node has gained a label). RDF tools take some getting used to, and TreeHugger is an attempt to ease the transition by letting XPath expressions navigate RDF graphs.

Suppose I want to get the name of the person known by the person with mailbox pldms@mac.com. In case (1) I might try:

/foaf:Person
/foaf:mbox[@rdf:resource='mailto:pldms@mac.com']
/..
/foaf:knows
/foaf:Person
/foaf:name/text()

In case (2) it's much harder.

Here's another one: get the mailbox of the thing of type 'http://xmlns.com/foaf/0.1/Person' named 'Damian'. For case (2) we could try:

/rdf:Description[@foaf:name='Damian']
/rdf:type[@rdf:resource='http://xmlns.com/foaf/0.1/Person']
/..
/foaf:mbox
/@rdf:resource

This won't work for case (1) of course.

Both paths, however, work for either serialisation in TreeHugger.
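
To make the "RDF tools see very little" point concrete, here is a rough Python sketch. It is not part of TreeHugger (which is a Saxon extension) and is only an illustration. It assumes both fragments are wrapped in an rdf:RDF element with the usual namespace declarations, and its toy reader only understands the striped subset of RDF/XML used above (it does not even distinguish literals from resources). The helper names (triples, known_names, mbox_of_damian) are made up for this example.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
HEADER = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">'

# Serialisation (1), wrapped in rdf:RDF (an assumption; the wrapper is not shown above).
DOC1 = HEADER + """
<foaf:Person>
  <foaf:name>Damian</foaf:name>
  <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
  <foaf:knows>
    <foaf:Person><foaf:name>Libby</foaf:name></foaf:Person>
  </foaf:knows>
</foaf:Person>
</rdf:RDF>"""

# Serialisation (2), wrapped the same way.
DOC2 = HEADER + """
<rdf:Description foaf:name="Damian">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  <foaf:mbox rdf:resource="mailto:pldms@mac.com"/>
  <foaf:knows rdf:resource="http://example.com/#libby"/>
</rdf:Description>
<foaf:Person rdf:about="http://example.com/#libby">
  <foaf:name>Libby</foaf:name>
</foaf:Person>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag[1:].replace("}", "", 1)


def triples(xml_text):
    """Toy reader: returns a set of (subject, property, object) tuples."""
    blank = itertools.count()
    out = set()

    def node(elem):
        # A node element is named by rdf:about, or gets a blank node label.
        subject = elem.get(f"{{{RDF}}}about") or f"_:b{next(blank)}"
        if elem.tag != f"{{{RDF}}}Description":
            out.add((subject, RDF + "type", uri(elem.tag)))
        # Property attributes, e.g. foaf:name="Damian".
        for name, value in elem.attrib.items():
            if name != f"{{{RDF}}}about":
                out.add((subject, uri(name), value))
        # Property elements: rdf:resource, a nested node, or a literal.
        for prop in elem:
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                out.add((subject, uri(prop.tag), resource))
            elif len(prop):
                out.add((subject, uri(prop.tag), node(prop[0])))
            else:
                out.add((subject, uri(prop.tag), prop.text or ""))
        return subject

    for top in ET.fromstring(xml_text):
        node(top)
    return out


def objects(graph, subjects, prop):
    return {o for s, p, o in graph if s in subjects and p == prop}


def subjects_with(graph, prop, obj):
    return {s for s, p, o in graph if p == prop and o == obj}


def known_names(graph):
    """Name of whoever the owner of the pldms mailbox knows."""
    damian = subjects_with(graph, FOAF + "mbox", "mailto:pldms@mac.com")
    return objects(graph, objects(graph, damian, FOAF + "knows"), FOAF + "name")


def mbox_of_damian(graph):
    """Mailbox of the thing of type foaf:Person named 'Damian'."""
    people = subjects_with(graph, RDF + "type", FOAF + "Person")
    named = subjects_with(graph, FOAF + "name", "Damian")
    return objects(graph, people & named, FOAF + "mbox")


for doc in (DOC1, DOC2):
    graph = triples(doc)
    print(sorted(known_names(graph)), sorted(mbox_of_damian(graph)))
```

Both documents print the same line, ['Libby'] ['mailto:pldms@mac.com'], because once the XML is reduced to triples the only remaining difference is whether Libby's node carries a URI or a blank node label.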

TreeHugger

TreeHugger is implemented as a Saxon extension function. The function returns a document root (about which more below). All of the following assumes that root.

XPath expressions and how TreeHugger interprets them:

/
A little complicated in the implementation, but consider it as representing the RDF graph.

/<rdf_class>
All the subjects in the graph of type <rdf_class>. Special cases: /rdf:Description and /rdfs:Resource, which return all subjects in the graph. Example: /rdf:Seq returns every sequence node in the graph.

./<rdf_class>/
The children of an RDF class node are all the RDF properties of that node.

./<rdf_property>/
The children of an RDF property node are all the objects such that <parent, property, child node> holds. Because of the RDF/XML striped syntax the full path will always be of the form /class/property/class/property...

./<node>/@<attribute>
For an RDF class node this returns the objects of property <attribute>, e.g. ./foaf:Person/@foaf:name gives all the foaf:name values of the person. One special case is rdf:about, which gives the URI (if any) of the node, so ./rdfs:Resource[@rdf:about] selects non-blank nodes. For property nodes there is only one attribute, rdf:resource, which gives the resource objects of the property, e.g. ./rdf:type[@rdf:resource='http://xmlns.com/foaf/0.1/Person'] (used above), a verbose way of saying that a node has type foaf:Person.
Warning: the implementation has a limitation when the node[@attribute='val'] form is used. If the node has more than one value for a given attribute then it is uncertain which value will be used for the match. For example, if a resource has more than one dc:title you will get unpredictable results finding it using rdf:Description[@dc:title='title'].

./text()
A literal. This fits the XML usage nicely, but in RDF literals may also carry a language and a datatype. Needs work.

./..
Parent. This just backs up one level in the path, and is thus dependent on the path taken. RDF people may want to look at inv:property, described below.
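
The alternating /class/property/class/property... shape can be sketched over a plain set of triples. The following Python is only an illustration of the mapping described above, not how TreeHugger is implemented. The sample graph, the prefixed names used as identifiers, and the evaluate function are all made up for the example, and it ignores predicates, text() and '..'.

```python
RDF_TYPE = "rdf:type"
# Special cases from the text: these two class names match every node.
ANY = {"rdf:Description", "rdfs:Resource"}

# Hypothetical sample graph, written with prefixed names for brevity.
GRAPH = {
    ("_:damian", RDF_TYPE, "foaf:Person"),
    ("_:damian", "foaf:name", "Damian"),
    ("_:damian", "foaf:knows", "ex:libby"),
    ("ex:libby", RDF_TYPE, "foaf:Person"),
    ("ex:libby", "foaf:name", "Libby"),
}


def has_type(node, cls):
    return cls in ANY or (node, RDF_TYPE, cls) in GRAPH


def evaluate(path):
    """Evaluate '/class/property/class/...' and return the final node set."""
    steps = path.strip("/").split("/")
    # The first step is a class: start from every subject of that type.
    nodes = {s for s, _, _ in GRAPH if has_type(s, steps[0])}
    for position, step in enumerate(steps[1:], start=1):
        if position % 2 == 1:
            # Odd positions are properties: move to the objects.
            nodes = {o for s, p, o in GRAPH if s in nodes and p == step}
        else:
            # Even positions are classes: keep only nodes of that type.
            nodes = {n for n in nodes if has_type(n, step)}
    return nodes


print(evaluate("/foaf:Person/foaf:knows/foaf:Person/foaf:name"))
print(sorted(evaluate("/rdfs:Resource/foaf:name")))
```

The first path yields {'Libby'}; the second, which starts from every subject, yields both names.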

Functions

In the following I'll assume the namespace declarations:

xmlns:th="http://rootdev.net/net.rootdev.treehugger.TreeHugger"
xmlns:inv="http://rootdev.net/treehugger/inverse#"

TreeHugger currently adds three extension functions:

th:document(rdf document)
Loads an RDF document and returns a document 'root'. The RDF document may be loaded from a file or URL.
th:documentRDFS(rdf document, rdf schema)
As above, but the model will now be RDFS 'aware', i.e. subproperty and subclass closures, domain and range etc.
th:documentOWL(rdf document, owl ontology)
As above, but the model will perform OWL reasoning (in addition to RDFS reasoning?), e.g. inverse functional properties, restrictions.

TreeHugger also adds a 'pseudo' function, or (better) 'pseudo' axis:

.../inv:property/...
This allows you to create the inverse of a property. For example foaf:Person/inv:dc:creator takes you to things which this person created. It's a hack, but a very useful one.

A brief example:

th:documentRDFS('foo.rdf', 'http://xmlns.com/foaf/0.1/')/
foaf:Person/
inv:foaf:knows/
foaf:Person/
rdfs:label/text()

This path takes all the people in foo.rdf, goes to people who know them, and returns the label for those people. Since foaf:name is a subproperty of rdfs:label, and this is an RDFS document, this will return the names of people who know people.
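
Here is a rough Python sketch of what that path asks for, over a hand-written set of triples. It is an illustration only: the graph, the ex: names and the helper functions are invented, and the 'RDFS awareness' is reduced to the single subproperty rule mentioned above (real RDFS reasoning also covers subclasses, domain and range).

```python
# Hypothetical graph using prefixed names as plain strings.
GRAPH = {
    ("ex:damian", "rdf:type", "foaf:Person"),
    ("ex:damian", "foaf:name", "Damian"),
    ("ex:damian", "foaf:knows", "ex:libby"),
    ("ex:libby", "rdf:type", "foaf:Person"),
    ("ex:libby", "foaf:name", "Libby"),
    ("ex:bot", "foaf:knows", "ex:libby"),  # knows Libby, but is not a Person
}

# From the text: foaf:name is a subproperty of rdfs:label.
SUBPROPERTY_OF = {"foaf:name": "rdfs:label"}


def rdfs_closure(graph, subproperty_of):
    """Add (s, super, o) for every (s, sub, o), repeating until nothing changes."""
    closed = set(graph)
    changed = True
    while changed:
        extra = {(s, subproperty_of[p], o)
                 for s, p, o in closed if p in subproperty_of}
        changed = not extra <= closed
        closed |= extra
    return closed


def typed(graph, cls):
    return {s for s, p, o in graph if p == "rdf:type" and o == cls}


def follow(graph, nodes, prop):
    """Ordinary step: from subjects to objects."""
    return {o for s, p, o in graph if s in nodes and p == prop}


def follow_inverse(graph, nodes, prop):
    """inv: step: from objects back to subjects."""
    return {s for s, p, o in graph if o in nodes and p == prop}


def labels_of_knowers(graph):
    # foaf:Person / inv:foaf:knows / foaf:Person / rdfs:label
    people = typed(graph, "foaf:Person")
    knowers = follow_inverse(graph, people, "foaf:knows") & people
    return follow(graph, knowers, "rdfs:label")


print(labels_of_knowers(GRAPH))
print(labels_of_knowers(rdfs_closure(GRAPH, SUBPROPERTY_OF)))
```

Without the closure the result is empty, since nothing in the data uses rdfs:label directly; with it the result is {'Damian'}. The untyped ex:bot is dropped by the second foaf:Person step.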

Lists

Alt, Bag, and Seq look nasty in the rdf model. TreeHugger follows the XML serialisation to protect you:

<xsl:for-each 
  select="./rss:items/rdf:Seq/rdf:li/rdfs:Resource">
     ...
     do something with each member of the sequence
     ...
</xsl:for-each>

The members of the sequence will be in the correct order, of course. rdf:parseType="Collection" support will be added when I find out how to do it.
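
For the curious, the nastiness is that in the RDF model the members of a container hang off numbered properties (rdf:_1, rdf:_2, ...) rather than rdf:li. A minimal Python sketch of putting them back in order might look like this. The graph and the seq_members function are invented for illustration, and this is not a description of TreeHugger's own code.

```python
import re

# Container membership properties have the form rdf:_1, rdf:_2, ...
MEMBER = re.compile(r"rdf:_(\d+)")

# Hypothetical graph; a set, so the triples arrive in no particular order.
GRAPH = {
    ("ex:items", "rdf:type", "rdf:Seq"),
    ("ex:items", "rdf:_10", "ex:item-j"),
    ("ex:items", "rdf:_2", "ex:item-b"),
    ("ex:items", "rdf:_1", "ex:item-a"),
}


def seq_members(graph, seq):
    """Return the members of a container ordered by their numeric index."""
    numbered = []
    for s, p, o in graph:
        match = MEMBER.fullmatch(p)
        if s == seq and match:
            # Compare as integers so that rdf:_10 sorts after rdf:_2.
            numbered.append((int(match.group(1)), o))
    return [member for _, member in sorted(numbered)]


print(seq_members(GRAPH, "ex:items"))
# ['ex:item-a', 'ex:item-b', 'ex:item-j']
```

Sorting on the integer rather than the property name matters: a plain string sort would put rdf:_10 before rdf:_2.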

An example

Take an example document like Dan's foaf file. Let's try to make an html file which lists all the people who know people, plus the people they know and a link to their mailboxes.

The TreeHugger style sheet looks like this:

<?xml version="1.0"?>

<html xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:th="http://foo.com/blah/net.rootdev.treehugger.TreeHugger"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xsl:version="1.0">

  <!-- Load an rdf document -->
  
 <xsl:variable name="doc" 
   select="th:document('http://rdfweb.org/people/danbri/rdfweb/danbri-foaf.rdf')"
   />

<head>

<title>Some people</title></head>

<body>

<h2>People who know people</h2>
<h5>(the happiest people)</h5>

<!-- find people who know people -->

<xsl:for-each select="$doc/foaf:Person[@foaf:knows]">

  <!-- show their name -->

  <h4><xsl:value-of select="./foaf:name/text()"/></h4>
  
  <!-- get people they know who have a name and mailbox -->

  <xsl:for-each select="./foaf:knows/foaf:Person[@foaf:name and @foaf:mbox]">
    
    <p>knows:

    <!-- show the name in a link to their mailbox -->

    <a href="{./foaf:mbox/rdfs:Resource[last()]}"> 
    
    <xsl:value-of select="./foaf:name/text()"/></a>
    
    </p>
  </xsl:for-each>
 </xsl:for-each>

<p>

  <b><i><u>
        
    <!-- show name of person with mbox 'libby...' -->
    <!-- here properties are used as attributes and elements -->

    <xsl:value-of 
      select="$doc/foaf:Person[@foaf:mbox='mailto:libby.miller@bristol.ac.uk']/foaf:name/text()"
     />
  </u></i></b>
   
</p>
</body>
</html>
    

The result is here. (Martin Poulter knows no one because the person he knows has no mbox, by the way.)

XPath To RDF Query

Although the TreeHugger implementation doesn't work like this, a more efficient implementation could take complete path expressions and translate them to RDF queries. This would be faster over RDB-backed models. (At some point I'll write a little Perl or Ruby script to do this.)

The following are paths translated to Squish (minus namespace declarations). The node set becomes a result set with each row containing one variable binding: 'node'.

/foaf:Person/foaf:name/text()

becomes:

SELECT ?node
WHERE
(rdf:type ?a foaf:Person)
(foaf:name ?a ?node)

/foaf:Person[@foaf:mbox="mailto:pldms@mac.com"]/foaf:name/text()

becomes:

SELECT ?node
WHERE
(rdf:type ?a foaf:Person)
(foaf:mbox ?a mailto:pldms@mac.com)
(foaf:name ?a ?node)

/foaf:Person/inv:foaf:knows/foaf:Person/foaf:mbox/rdfs:Resource

becomes:

SELECT ?node
WHERE
(rdf:type ?a foaf:Person)
(foaf:knows ?b ?a) [note: inverted]
(rdf:type ?b foaf:Person)
(foaf:mbox ?b ?node)

/rdfs:Resource/foaf:knows/foaf:Person/../../foaf:mbox/rdfs:Resource

becomes:

SELECT ?node
WHERE
(foaf:knows ?a ?b) [note: 'a' untyped in path]
(rdf:type ?b foaf:Person)
(foaf:mbox ?a ?node) [note: ../../ takes us back to 'a']
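
The translations above are mechanical enough to sketch in a few lines. The following Python is a rough, hypothetical translator (the to_squish function is not part of TreeHugger). It assumes paths built only from the kinds of step used in these examples: class names, properties, inv: properties, text(), '..', and at most one [@attribute="value"] predicate per class step. Variables are named ?a, ?b, ... in order of first use, and the last node reached is renamed ?node.

```python
import re
from string import ascii_lowercase

# Class names that match any node, so they produce no rdf:type clause.
ANY = {"rdf:Description", "rdfs:Resource"}


def to_squish(path):
    names = iter(ascii_lowercase)
    clauses = []  # each clause is [predicate, subject, object]
    stack = []    # ("node" | "prop", variable) for each step still on the path
    # Split into steps, keeping any [...] predicate attached to its step.
    for step in re.findall(r"[^/\[]+(?:\[[^\]]*\])?", path):
        name, attr, value = re.fullmatch(
            r"([^\[]+)(?:\[@([^=]+)=[\"'](.*)[\"']\])?", step).groups()
        if name == "..":
            stack.pop()  # back up one level in the path
        elif stack and stack[-1][0] == "node":
            # Property step: a new variable stands for the objects.
            subject, obj = stack[-1][1], "?" + next(names)
            if name.startswith("inv:"):
                clauses.append([name[4:], obj, subject])  # inverted
            else:
                clauses.append([name, subject, obj])
            stack.append(("prop", obj))
        else:
            # Class step (or text()): reuse the property's object variable,
            # or mint a fresh one at the start of the path.
            var = stack[-1][1] if stack else "?" + next(names)
            if name != "text()" and name not in ANY:
                clauses.append(["rdf:type", var, name])
            if attr:
                clauses.append([attr, var, value])
            stack.append(("node", var))
    result = stack[-1][1]  # the variable the path ends on
    lines = ["SELECT ?node", "WHERE"]
    for clause in clauses:
        terms = ["?node" if term == result else term for term in clause]
        lines.append("(" + " ".join(terms) + ")")
    return "\n".join(lines)


print(to_squish(
    "/rdfs:Resource/foaf:knows/foaf:Person/../../foaf:mbox/rdfs:Resource"))
```

Run on the last path above, this prints the same three clauses (without the bracketed notes). Keeping a stack of steps is what makes '..' cheap: popping twice lands back on ?a, so the following foaf:mbox step attaches to the untyped starting node.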