Developers who come to the Semantic Web effort via XML technology often make an understandable mistake. They assume that missing is broken when it comes to the contents of RDF/XML documents, that if you omit some piece of information from an RDF file, you have in some formal, technical sense 'done something wrong' and should be punished.
RDF doesn't work like that. Missing isn't broken. In the general case, you are free to say as much, or as little, in your RDF document as you like. RDF vocabularies such as FOAF, Dublin Core, MusicBrainz and RDF-Wordnet don't get to tell you what to do, what to write, what to say. Instead, they serve as an interconnected dictionary documenting the meaning of the terms you're using in your RDF documents.
This article walks through some example FOAF, showing (hopefully!) how the ability to omit and add data is an essential part of the freedom RDF provides. It follows from the recent articles I wrote on contradictions in FOAF and on identification strategies. Like those articles, it is written mostly for developers who are coming to RDF and Semantic Web from a different background, and tries to make explicit some of the design assumptions behind FOAF which haven't yet been made clear.
To take an example from FOAF, the foaf:workplaceHomepage property relates a person to a document that is the homepage of their workplace. The FOAF vocabulary contains markup that explains the basics of this to machines.
"The foaf:workplaceHomepage property has an rdfs:domain of foaf:Person and an rdfs:range of foaf:Document."
What does this mean?! Just that whenever you see an RDF description saying that something has a foaf:workplaceHomepage of something else, you know, by virtue of the meaning of that property, that the first 'something' is expected to be a Person, and the second expected to be a Document. You know that because foaf:workplaceHomepage is a relationship between people and documents. Note that these are expectations about the world and not about XML documents. For it to be true that something is the foaf:workplaceHomepage of someone, it will have to be something that is a document. That's a constraint on the world, not on XML tag structures.
What doesn't it mean? It doesn't mean that all RDF documents which use this property have to spell out explicitly the type of the things the property is relating. RDF leaves a lot of freedom, and doesn't punish document authors for the sin of omission. Often enough, the stuff you leave out could be inferred from the things you wrote anyway, so why force document authors to needlessly pad out their RDF?
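To make that inference concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular RDF toolkit works: triples are plain tuples of strings, and the `SCHEMA` table, the `infer_types` function and the example data are all names made up for this sketch. Only the domain and range of foaf:workplaceHomepage come from the FOAF vocabulary as quoted above.

```python
# Hypothetical, hand-rolled sketch of rdfs:domain / rdfs:range inference.
# Triples are (subject, property, object) tuples of plain strings.

# Domain and range as stated by the FOAF vocabulary for this property.
SCHEMA = {
    "foaf:workplaceHomepage": {
        "domain": "foaf:Person",
        "range": "foaf:Document",
    },
}


def infer_types(triples, schema=SCHEMA):
    """Return the set of rdf:type triples implied by domain and range."""
    inferred = set()
    for subject, prop, obj in triples:
        rule = schema.get(prop)
        if rule is None:
            continue  # unknown property: nothing to infer, and nothing broken
        inferred.add((subject, "rdf:type", rule["domain"]))
        inferred.add((obj, "rdf:type", rule["range"]))
    return inferred


# Made-up data: the author never said what kind of thing either end is.
data = [("_:someone", "foaf:workplaceHomepage", "http://example.org/")]
types = infer_types(data)
```

Here `types` ends up holding two rdf:type statements that the document author never wrote down: one saying `_:someone` is a foaf:Person, one saying the URL names a foaf:Document.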
Another example. The foaf:knows relation is defined as one that relates a foaf:Person to another foaf:Person.
A typical usage would be:
<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
    <foaf:Person foaf:name="Edd Dumbill" />
  </foaf:knows>
</foaf:Person>
This is basically RDF's way of saying 'there is something that is a foaf:Person and that has a foaf:name of "Dan Brickley" and that stands in a foaf:knows relationship to something that is a foaf:Person and that has a foaf:name of "Edd Dumbill"'.
It is important to understand that we can omit pieces of this information, without this RDF/XML being 'broken' (invalid etc.) in any formal sense. Or, perhaps more interestingly, we could add information. From RDF's perspective, you are free to choose. It is not up to the creators of FOAF (or MusicBrainz, or Dublin Core) to dictate to you which things you should or should not mention in your RDF documents.
So, we could write this:
<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
    <foaf:Person/>
  </foaf:knows>
</foaf:Person>
...i.e. 'Dan knows someone'. Hardly very informative. Probably somewhat annoying, but perfectly correct RDF. Nothing in that markup violates any rules associated with the FOAF vocabulary. Similarly, we could write:
<rdf:Description foaf:name="Dan Brickley">
  <foaf:knows>
    <rdf:Description foaf:name="Edd Dumbill" />
  </foaf:knows>
</rdf:Description>
...and it is still just fine. The markup 'rdf:Description' is what RDF uses when you mention something but don't happen to mention its type. So here, we are just saying 'there is something with a foaf:name "Dan Brickley" that foaf:knows something else with a foaf:name "Edd Dumbill"'. Again, true, but slightly less informative. We didn't mention that these things were people. Although that information could be deduced because we know that foaf:knows relates a foaf:Person to a foaf:Person, it is sometimes helpful to be explicit.
The critical thing to remember: don't assume it is broken because you don't see 'foaf:Person' in the markup. That's a feature not a bug, as it lies at the heart of RDF's free-form extensibility. Since we want FOAF to be easily extended by independent parties, without breaking the core interop provided by RDF and the basic FOAF vocabulary, this is a freedom to be valued.
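If you want to see the difference between the typed and untyped versions for yourself, the following Python sketch reads both and compares the resulting statements. It is deliberately tiny and is not a real RDF/XML parser: it only understands the nesting pattern used in the examples above, ignores things like rdf:about and rdf:resource, and wraps each fragment in an rdf:RDF element with namespace declarations (which the snippets in this article leave out). The function names `wrap`, `read_node` and `read_fragment` are invented for the sketch.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"


def wrap(fragment):
    """Add the rdf:RDF wrapper and namespace declarations the snippets omit."""
    return f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">{fragment}</rdf:RDF>'


def read_node(element, triples, ids):
    """Turn one node element into triples; return its blank-node label."""
    node = f"_:b{next(ids)}"
    if element.tag != f"{{{RDF}}}Description":
        # A typed node element is shorthand for an explicit rdf:type triple.
        triples.append((node, f"{{{RDF}}}type", element.tag))
    for name, value in element.attrib.items():
        triples.append((node, name, value))  # property attributes
    for prop in element:  # property elements such as foaf:knows
        for child in prop:
            triples.append((node, prop.tag, read_node(child, triples, ids)))
    return node


def read_fragment(fragment):
    """Read a fragment in the simplified syntax; return a list of triples."""
    triples, ids = [], itertools.count()
    for element in ET.fromstring(wrap(fragment)):
        read_node(element, triples, ids)
    return triples


typed = read_fragment(
    '<foaf:Person foaf:name="Dan Brickley"><foaf:knows>'
    '<foaf:Person foaf:name="Edd Dumbill"/></foaf:knows></foaf:Person>'
)
untyped = read_fragment(
    '<rdf:Description foaf:name="Dan Brickley"><foaf:knows>'
    '<rdf:Description foaf:name="Edd Dumbill"/></foaf:knows></rdf:Description>'
)
# The only statements the untyped version lacks are the two rdf:type ones.
missing = set(typed) - set(untyped)
```

Both fragments parse without complaint. The rdf:Description version simply yields fewer statements, and everything it does say is also said by the typed version.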
Here is another example:
<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
    <wordnet:Programmer foaf:name="Edd Dumbill" />
  </foaf:knows>
</foaf:Person>
Here we give a more detailed type than foaf:Person. You can look up wordnet:Programmer on the Web for its RDF definition, which tells us amongst other things that a programmer is "a person who designs and writes and tests computer programs".
So in addition to being able to deduce from the foaf:knows relationship that Edd is a foaf:Person, the markup tells us that his skills include programming. Unlike in a strictly typed OO programming environment, however, Edd can have lots of independently defined 'types', and FOAF files can mention any or none of these according to need and circumstance.
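A rough way to picture this in code is to keep a set of types per thing, rather than a single class slot. The Python below is only a sketch with made-up names (`KNOWS_RULE`, `types_of`, the `_:dan` and `_:edd` labels); the one fact it relies on is that foaf:knows relates a foaf:Person to a foaf:Person.

```python
from collections import defaultdict

# foaf:knows relates a foaf:Person to a foaf:Person.
KNOWS_RULE = {"domain": "foaf:Person", "range": "foaf:Person"}

# The statements made by the markup above, written as plain tuples.
stated = [
    ("_:dan", "rdf:type", "foaf:Person"),
    ("_:dan", "foaf:name", "Dan Brickley"),
    ("_:edd", "rdf:type", "wordnet:Programmer"),
    ("_:edd", "foaf:name", "Edd Dumbill"),
    ("_:dan", "foaf:knows", "_:edd"),
]


def types_of(triples):
    """Map each resource to the set of types stated or implied for it."""
    types = defaultdict(set)
    for subject, prop, obj in triples:
        if prop == "rdf:type":
            types[subject].add(obj)  # a type someone wrote down
        elif prop == "foaf:knows":
            types[subject].add(KNOWS_RULE["domain"])  # implied by the property
            types[obj].add(KNOWS_RULE["range"])
    return types


edd_types = types_of(stated)["_:edd"]
```

`edd_types` holds both wordnet:Programmer and foaf:Person. Neither vocabulary had to know about the other for that to work, and a third party could add further types later without disturbing either.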
Just as missing out information isn't wrong, nor is adding more information. From an XML perspective, it is both tempting and natural to see this as too liberal and free-form. XML encourages us to think about this problem in terms of tags: "what tags can appear inside a foaf:knows? is foaf:Person allowed? what about wordnet:Programmer?". Unfortunately that doesn't scale well, since it requires a painful amount of coordination amongst the parties defining these vocabularies.
RDF was designed to expect the unexpected. You don't need anyone's permission to invent new tags, or to have your tags 'go inside' their tags or vice versa. This is hugely liberating, particularly for FOAF because so many problem domains overlap in this space, and life is too short to spend in standardisation committees arguing about XML schemas for dictating whose XML tags enclose whose.
The wordnet vocab wasn't designed to be used inside foaf:knows, and the foaf:knows property wasn't designed to have wordnet:Programmer inside it. They were both designed to work within the Resource Description Framework (RDF), an approach to XML data mixing which allows such vocabularies to be freely mixed and combined without having everybody agree 1:1 how their vocabularies may or may not be combined.
So... missing isn't broken.
But it isn't always polite, either. With freedom comes responsibility. RDF and the Semantic Web provide a platform for exchanging XML documents that encode somewhat freeform claims about the world. FOAF was created to help explore practical deployment issues for 'RDF in the wild', and one issue we're currently working on is this balance between freedom and expectations. It is all well and good having a super-flexible way of saying anything-about-anything in FOAF RDF files, but where does that leave us poor developers?
If the stark notion of 'valid -vs- invalid' document checking doesn't make sense in the decentralised Semantic Web environment, how can we make things easier for developers who are trying to work with this free-flowing mix of RDF markup? If nothing is mandatory, then how can they write code that knows what to expect?
There are several answers here. The first is that, if we want this to scale to the planet, we have to accept that one size won't fit all, that different parties will want to say quite varying things in their FOAF documents, and that our ability to impose our views on their documents is limited.
What can developers take for granted when reading a FOAF file? This is the key question...
They should be able to assume it is well-formed XML+Namespaces, and that it is structured according to the RDF syntax specification. And that it probably makes use of the FOAF vocabulary, typically alongside others such as DC, RSS1, Wordnet...
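In code, that short list of safe assumptions might look something like the Python sketch below. It checks only the first of them (well-formedness, via the standard library's XML parser) and then reports which FOAF terms turn up, without insisting on any of them. It does not check conformance to the RDF syntax specification, and the function name `foaf_terms_used` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"


def foaf_terms_used(xml_text):
    """Return the FOAF terms appearing as element or attribute names.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed
    XML+Namespaces. That is the only hard failure; nothing is mandatory.
    """
    prefix = "{" + FOAF + "}"
    names = set()
    for element in ET.fromstring(xml_text).iter():
        names.add(element.tag)
        names.update(element.attrib)  # namespaced attribute names
    return {name[len(prefix):] for name in names if name.startswith(prefix)}


doc = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:foaf="http://xmlns.com/foaf/0.1/">'
    '<foaf:Person foaf:name="Dan Brickley"/></rdf:RDF>'
)
terms = foaf_terms_used(doc)
```

A consumer written this way treats every property as optional: it looks at what is there and gets on with it, rather than rejecting a file for what it leaves out.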
Beyond that, what should we strive for? Here we move into the world of best practice, etiquette, user guidelines and other forms of 'soft' documentation. In the FOAF world, these are only now beginning to take shape (in the wiki, on rdfweb-dev and in this weblog). For FOAF, this sort of documentation is more important than schema-based validation. In FOAF, you can't get it technically wrong by missing out information, but you can make a nuisance of yourself by writing unnecessarily obscure FOAF.
We are discussing (on the FOAF list, rdfweb-dev) possible 'common subset' properties which it might be reasonable to assume people will use in their FOAF files. These can't be mandatory, but may (alongside tools such as foaf-a-matic) help guide people into using some common core properties.
But even choosing those properties is tricky! How do we name people? (It turned out that in Japan, many users prefer to use an informal foaf:nick, and omit their foaf:name property.) How do we identify people? By foaf:mbox, foaf:mbox_sha1sum, foaf:homepage, or an Instant Messenger chatID? The answer seems to be "one or more of the above...".
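For developers, "one or more of the above" translates into code that tries several properties and copes when any of them is absent. The Python below is a minimal sketch of that habit, not a recommended algorithm. It assumes each description has already been boiled down to a dictionary with at most one value per property (real RDF allows several), and that a shared mailbox or homepage means the same person. The function names and the example.org data are invented. The one FOAF detail it leans on is that foaf:mbox_sha1sum is the SHA1 of the full mailto: URI.

```python
import hashlib


def display_name(props):
    """Prefer foaf:name, fall back to foaf:nick, and cope with neither."""
    return props.get("foaf:name") or props.get("foaf:nick") or "(unnamed)"


def identity_keys(props):
    """Collect (property, value) pairs that could identify this person."""
    keys = {
        (prop, props[prop])
        for prop in ("foaf:mbox_sha1sum", "foaf:homepage")
        if prop in props
    }
    mbox = props.get("foaf:mbox")
    if mbox:
        keys.add(("foaf:mbox", mbox))
        # foaf:mbox_sha1sum is the SHA1 hex digest of the whole mailto: URI,
        # so a plain mbox can be matched against a hashed one.
        digest = hashlib.sha1(mbox.encode("utf-8")).hexdigest()
        keys.add(("foaf:mbox_sha1sum", digest))
    return keys


def same_person(a, b):
    """True if the two descriptions share at least one identifying value."""
    return bool(identity_keys(a) & identity_keys(b))


# Made-up example data: one file gives a nick and a mailbox, another gives
# a name and only the hashed mailbox.
first = {"foaf:nick": "someone", "foaf:mbox": "mailto:someone@example.org"}
second = {
    "foaf:name": "Some One",
    "foaf:mbox_sha1sum": hashlib.sha1(b"mailto:someone@example.org").hexdigest(),
}
```

With this data, `same_person(first, second)` is true even though the two files share no property in common as written, and `display_name(first)` falls back to the nick because no foaf:name was given.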
So the conclusion here is not to look to the evolving FOAF specification for black/white answers about what a FOAF file 'should' contain. The FOAF spec is like a dictionary, specifying the meaning of the things you use in your FOAF, but leaving it up to you and to emerging best practice as to what exactly you write.
In other words, RDF gives us the freedom to say whatever we like in our FOAF files, and we need to complement the formality of the RDF Schema and Web Ontology Language definitions for FOAF (see schema) with better documentation for users and developers helping them make their way in this strange new world...
Posted by danbri at July 24, 2003 12:22 PM
Practically, the issue is whether you can get all the information that you need from the data you have. Often if you are using some form of RDF query over this data, properties will be more important than classes. In the case of foaf Person and knows, we can ask (in Squish): find me the names of people who know me:
select ?person, ?name where
(foaf:knows ?person ?me)
(foaf:mbox ?me mailto:libby.miller@bristol.ac.uk)
(foaf:name ?person ?name)
using foaf for http://xmlns.com/foaf/0.1/
Adding a clause (rdf:type ?person foaf:Person) won't add anything to this particular query, because we know from the schema that we will get a foaf:Person at each end of a foaf:knows property.
Some of this conversation is continuing on the FOAF list, see my reply in this thread with Julian Bond
http://rdfweb.org/pipermail/rdfweb-dev/2003-July/011457.html
I've formed a view that poses the question "what is useful" rather than "what is valid". So I guess I'd say "useful" where you say "polite". The thing about claiming "useful" is that it immediately begs the question "for what purpose?". Very sparse data may be useful for a smaller range of purposes than more comprehensive data: it's the creator's call how useful they want their data to be. And of course, other folks may add to sparse data in ways that make it useful for a wider range of purposes.
Monotonicity helps us here, too: adding extra information never detracts from utility that may already be present.
Posted by: Graham Klyne on July 25, 2003 09:50 AM
Thanks Dan for this clarification. My ranty dig at Libby's FOAF was based on the false assumption that RDF schemas were explicit rather than implicit definitions. I stand corrected and will amend the post with a link to this informative article.
Posted by: Victor Lindesay on July 26, 2003 10:27 AM