February 06, 2004

new FOAF intro article

Leigh Dodds has written an excellent Introduction to FOAF for XML.com. It gives a good technical overview of the whole project, working through some examples on data-merging and relationship typing. Thanks Leigh!

Posted by danbri at 01:07 AM | Comments (3)

January 01, 2004

FOAF 2004

foafgirl

Welcome to 2004...

Posted by danbri at 03:23 PM | Comments (9)

August 19, 2003

W3C OWL language now a Candidate Recommendation

W3C's OWL language has just advanced to 'Candidate Recommendation' stage. OWL is used to publish and share sets of terms called ontologies; FOAF is one such ontology, as it uses OWL to define its terms in a machine-processable way.

The FOAF project relies on a number of W3C standards, including XML, RDF and the Web Ontology language, OWL. As OWL stabilises, we should benefit from an increasing number of tools that exploit the ability to reason about OWL-based vocabularies such as FOAF.

FOAF makes use of OWL in particular for describing when a certain kind of property (eg. 'homepage', 'mbox', 'dnaChecksum', 'mbox_sha1sum', 'jabberID' etc.) is uniquely identifying. This mechanism underpins the working of many FOAF aggregators and toolkits. FOAF benefits from W3C's work on OWL, since it allows us to adopt standard conventions shared across the Semantic Web community, rather than invent conventions that would only be understood by FOAF tools. Consequently FOAF implementations can make more use of generic RDF-based software, allowing FOAF developers to concentrate on FOAF-related functionality rather than on basic data-merging and storage techniques. This is particularly important to us as FOAF is designed to be freely mixed with other Semantic Web data (eg. geographic, calendar, taxonomy, bibliographic etc.).
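To give a feel for what an aggregator can do with uniquely identifying properties, here is a minimal Ruby sketch. The triple layout (arrays of subject, property, value), the node labels, the sample data and the hard-coded `IFPS` list are all assumptions made for illustration; a real toolkit would more likely read the list of identifying properties from the OWL definitions.

```ruby
# Sketch only: the triple layout, node labels and IFPS list are assumptions
# made for this illustration, not part of any FOAF toolkit.
IFPS = ['foaf:homepage', 'foaf:mbox', 'foaf:mbox_sha1sum'].freeze

# Rewrite triples so that nodes sharing a value for a uniquely identifying
# property end up with the same label.
def smush(triples)
  parent = Hash.new { |hash, node| hash[node] = node }
  find = lambda do |node|
    node = parent[node] while parent[node] != node
    node
  end

  first_with = {} # [property, value] => first node seen with that pair
  triples.each do |subject, property, value|
    next unless IFPS.include?(property)
    key = [property, value]
    if first_with.key?(key)
      parent[find.call(subject)] = find.call(first_with[key])
    else
      first_with[key] = subject
    end
  end

  triples.map { |s, p, o| [find.call(s), p, o] }.uniq
end

# Invented descriptions, as if harvested from three separate files.
SCATTERED = [
  ['_:a', 'foaf:name', 'Alice'],
  ['_:a', 'foaf:mbox_sha1sum', 'abc123'],
  ['_:b', 'foaf:mbox_sha1sum', 'abc123'],
  ['_:b', 'foaf:homepage', 'http://alice.example.com/'],
  ['_:c', 'foaf:homepage', 'http://alice.example.com/'],
  ['_:c', 'foaf:nick', 'ali']
].freeze

smush(SCATTERED).each { |triple| puts triple.join(' ') }
```

In this toy data the three nodes collapse into one, because `_:a` and `_:b` share a mailbox checksum and `_:b` and `_:c` share a homepage.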

See the W3C news item, press release and FAQ for more information on OWL and its relationship to other W3C technologies.

Posted by danbri at 11:52 PM | Comments (51)

Updated FOAF specification

We've just shipped a major update to the FOAF Specification. The spec now provides a lot more detail on the terms ("classes" and "properties") which make up the FOAF vocabulary. There's plenty more work to do, of course, but this is a healthy step in the creeping professionalisation of our documentation. Uniquely identifying properties in FOAF are indicated using W3C's OWL language, making it possible for generic Semantic Web tools to easily merge together scattered, decentralised collections of FOAF descriptions.

The spec includes detailed and cross-referenced documentation for each term, as well as information about its status ('unstable','testing','stable'). The FOAF specification is presented to humans in XHTML form, but is also available in RDF form for machines, drawing upon W3C's RDF Schema and Web Ontology (OWL) languages.

Posted by danbri at 11:37 PM | Comments (7)

August 04, 2003

FOAF 'vs' Introvertster

Things are heating up in the battle for the hearts, minds and contact lists of introverted Web geeks. While an Introvertster craze sweeps through workplaces and homes, FOAF's under-used foaf:myersBriggs support seems to be slowly catching on, revealing a disturbing trend. According to members of the Myers Briggs personality classification cult, individuals branded as IN** (ie. INTJ, INFJ, INTP, INFP) should number no more than about 4 or 5% of the population. Yet a recent sampling by RDFWeb's FOAF scutter shows an alarming outburst of introversion amongst FOAF early adopters...

(I'm mostly serious...;-)

While I generally find MBTI a bit contentious, including foaf:myersBriggs has been an interesting exercise, and even from the rather small dataset we have currently, there appears to be a distinct leaning towards the IN** side of things.

Dusting off my sociologist's hat, there are various hypotheses we could investigate. Maybe FOAF's current audience is mainly geeky early adopters, people who show up as IN** when they go through the annoying-but-probing online questionnaires which try to approximate a full (tm)-compatible MBTI analysis. Maybe the character summaries for INTP are more flattering and ego-stroking than for other 'types' (or at least, more flattering to the kinds of people who show up as INTP in the surveys). Or at least, more likely to evoke a 'heheh, yeah that is sorta like me' reaction. Maybe INTPs like FOAF as a non-extraverted way of describing what they're like, and perhaps this extends to attaching a 'beware I'm an INTP / introverted freak, don't expect me to talk to you' health warning to their machine-readable homepage. Maybe INTP-ish people are the only ones obsessive and nit-picky enough to deal with RDF and XML syntax, while other types have malformed FOAF files which don't get past the RDF parser? Maybe the survey sample is too small to tell us anything statistically significant.

Enough armchair sociologising about how 4% turned into 50%+, where's the data?

The results mentioned above come from an incomplete (but 24hrs+) run of my FOAF harvester ('scutter') from a couple of days ago, ie. they are based on a bunch of source-attributed RDF statements stored in a PostgreSQL database. I'll give the code example first, then the results, summarising all the foaf:myersBriggs properties that the database knows about. The script uses my rough-n-ready RubyRdf library, and might some day be nicely packaged for others to use. In the meantime, here's the rough idea...

The mbti.rb script:

#!/usr/bin/env ruby
#
# Quick script to query RDF store for Myers Briggs data
$LOAD_PATH.unshift '../lib/'
$LOAD_PATH.unshift '../../lib/'

require 'basicrdf'
require 'squish'
require 'dbi'

query = SquishQuery.new.parseFromText <<EOQ;
    SELECT ?x, ?n, ?mb,
    WHERE
        (foaf::name ?x ?n)
        (foaf::myersBriggs ?x ?mb)
    USING
      foaf for http://xmlns.com/foaf/0.1/
EOQ

# Track which people (nodes) have already been counted, and tally each type.
seen = Hash.new(false)
stats = Hash.new(0)
log = ''
DBI.connect('DBI:Pg:scutter1', 'danbri', '') do |connection|
  connection.select_all(query.toSQLQuery) do |row|
    # Count each person once, even if several sources repeat the same claim.
    next if seen[row['x']]
    log += "foaf:name='#{row['n']}' foaf:myersBriggs='#{row['mb']}'\n"
    seen[row['x']] = true
    stats[row['mb']] += 1
  end
end
puts "Stats:"
stats.each_key { |type| puts "mbti type: #{type} count:#{stats[type]}" }
puts "Details:\n" + log

And here's the results of running it just now:

./mbti.rb
Stats:
mbti type: .... count:1
mbti type: ENFP count:1
mbti type: INTP count:11
mbti type: ENTJ count:1
mbti type: ISFJ count:1
mbti type: INFP count:2
mbti type: INTJ count:3
mbti type: ISTP count:1
mbti type: ISFP/ISFJ count:1
mbti type: ENTP count:1
mbti type: ISFP count:1
Details:
foaf:name='Michael Hanscom' foaf:myersBriggs='ISFP'
foaf:name='Bill Bradford' foaf:myersBriggs='ISFJ'
foaf:name='Jay Fienberg' foaf:myersBriggs='INTP'
foaf:name='Hiroki Ono' foaf:myersBriggs='INTP'
foaf:name='Eric Wahlforss' foaf:myersBriggs='ENTP'
foaf:name='Margaret Hart' foaf:myersBriggs='....'
foaf:name='Adam Gessaman' foaf:myersBriggs='INTP'
foaf:name='Simon' foaf:myersBriggs='INTP'
foaf:name='Jack Letourneau' foaf:myersBriggs='INTJ'
foaf:name='Earle Martin' foaf:myersBriggs='INFP'
foaf:name='Atsushi Suzuki' foaf:myersBriggs='INTP'
foaf:name='Evan Williams' foaf:myersBriggs='INTP'
foaf:name='Amy van der Hiel' foaf:myersBriggs='INTJ'
foaf:name='Dave Seidel' foaf:myersBriggs='ISFP/ISFJ'
foaf:name='\343\201\221\343\202\223\343\201\237\343\202\215' foaf:myersBriggs='INTP'
foaf:name='Nicole Sullivan' foaf:myersBriggs='INFP'
foaf:name='Bounty Hunters Guild' foaf:myersBriggs='INTP'
foaf:name='Chun-shek Chan' foaf:myersBriggs='ENFP'
foaf:name='Amy Alford' foaf:myersBriggs='INTJ'
foaf:name='Mark Pilgrim' foaf:myersBriggs='ISTP'
foaf:name='Jim Ley' foaf:myersBriggs='INTP'
foaf:name='Manuel Gonz\303\241lez Mart\303\255nez' foaf:myersBriggs='INTP'
foaf:name='Justin Knol' foaf:myersBriggs='INTP'
foaf:name='Masahide Kanzaki' foaf:myersBriggs='ENTJ'

Not much more to add really, thought this might be a fun mini-tutorial on RDF query. But I would be interested in pointers to other statistical analyses of MBTI data. In particular I reckon it would be worthwhile to feed the raw answers to the online quizzes to statisticians, to see if the 16 types proposed by MBTI-advocates actually reflect clusterings within the data...

Posted by danbri at 02:58 PM | Comments (10)

Spiders, scutters, harvesters, indexers, aggregators...

...whatever we call them, FOAF and RDF/XML developers who write Web robots that traverse the Web of inter-linked FOAF files should stick to the rules. Julian Bond has written a very handy overview of the things that need to be taken into account. These include general HTTP and Web indexing stuff like robots.txt, ETags, Last-Modified headers, HTTP response codes etc. In addition, there are many lessons from the RSS experience which folk writing scutters (FOAF harvesters) should be aware of.
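
To illustrate one of those rules, here is a minimal Ruby sketch of a conditional GET: the scutter sends back the ETag and Last-Modified values it saw last time, so an unchanged file can be answered with a short 304 response. The function names, the cache layout, the User-Agent string and the example.org URLs are all hypothetical, the request is only built and never sent, and robots.txt handling is left out entirely.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that identifies the robot and lets the server reply
# "304 Not Modified" if the file is unchanged since the last visit.
# `cached` is a hypothetical hash holding validators from the previous fetch.
def build_polite_request(url, cached = {})
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = 'example-scutter/0.1 (+http://example.org/scutter-info)'
  request['If-None-Match'] = cached[:etag] if cached[:etag]
  request['If-Modified-Since'] = cached[:last_modified] if cached[:last_modified]
  request
end

# A 304 status means the stored copy is still good and need not be re-parsed.
def use_cached_copy?(status_code)
  status_code.to_i == 304
end

request = build_polite_request(
  'http://example.org/foaf.rdf',
  { etag: '"abc"', last_modified: 'Mon, 04 Aug 2003 00:00:00 GMT' }
)
request.each_header { |name, value| puts "#{name}: #{value}" }
```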

For related reading, see the (not yet a) ScutterSpec in the FOAF wiki, as well as the Scutter entry.

And this is all without (yet) going into the various responsibilities associated with the fact that these systems aren't just aggregating random Web page content or weblog entries, but data about people...

Posted by danbri at 01:44 AM | Comments (18)

Why FOAF has foaf:knows rather than foaf:friend...

xbox screenshot - 'you do not have any friends. continue...'

In brief, foaf:knows was designed to be relatively harmless. Its use (and non-use) should be inoffensive. We dropped foaf:friend and foaf:knowsWell a year or two ago as they proved undeployably awkward, both when used and, more interestingly, when not used.

Posted by danbri at 01:23 AM | Comments (12)

July 28, 2003

Pages about people

A few people have asked how we can use FOAF to express that a page is about a person, or whether there is a way within FOAF to write testimonials about other people (presumably along the lines of various popular dating sites).

The former is easy; the latter isn't something the FOAF vocabulary currently supports. Not because it would have been hard to add, but because it would have opened a can of worms that is best left to ripen a little while longer.

But FOAF does have a way of describing the topics of pages, which is a way of providing similar functionality.

So, let's imagine you've written a whole HTML page all about one of your bestest friends, and you want to share that fact with your world by adding something to your FOAF file. How to do that?

The basic idea is one that recurs throughout FOAF. We use the foaf:topic property, which is a relationship between a Document and something that the document is about. In your FOAF file, you can mention that you know someone, and at the same time mention that there's a page you made that's about them. Here's some example markup:


<!-- imagine you're Alice and this is the markup in your FOAF about you -->
<foaf:Person>
  <foaf:nick>Alice</foaf:nick>  
  <foaf:knows rdf:nodeID="bob"/> <!-- a pointer to the 'bob' entry below -->
  <foaf:made rdf:resource=""/>       <!-- this says: I made this FOAF file -->
  <foaf:made rdf:resource="http://alice.example.com/why-bob-is-great.html"/>       
</foaf:Person>

<!-- some markup describing a document, and its topic -->
<foaf:Document rdf:about="http://alice.example.com/why-bob-is-great.html">
  <dc:title>A page all about Bob, by Alice</dc:title>
  <foaf:topic>  
    <!-- here is a chunk of data about bob -->
    <foaf:Person rdf:nodeID="bob">
      <foaf:nick>Bob</foaf:nick> 
      <!-- other stuff about bob could go here -->
    </foaf:Person>
  </foaf:topic>
</foaf:Document>

What this basically says is "There is a Person whose nick is 'Alice' and who made this current FOAF document and who knows a Person whose nick is 'Bob'; there also is a Document whose title is 'A page all about Bob, by Alice' that Alice made and that is about Bob."

So one question facing us in the FOAF design is finding a trade-off between simplicity and expressiveness. It is possible to use FOAF to say some fairly complex things, but we have our work cut out to improve the tools and tutorials that explain how to create FOAF files that say what we want them to say.

Here's a diagram of the above example markup, generated by the W3C RDF validator. It may be useful for illustrating the underlying pattern of relationships we're trying to describe in the FOAF markup...

alice-bob.png
Posted by danbri at 11:59 PM | Comments (68)

Getting started with FOAF

A new FOAF homepage and user site is on its way... (and a new spec!).

The FOAF Project homepage is now at http://www.foaf-project.org/.

There is plenty of work to do on the new site, but now seems as good a time as any to cut across to the new version. The old FOAF homepage had a (somewhat random) collection of links and information on FOAF. A version of that is included in this 'getting started' article; for more comprehensive links to FOAF-related materials, see the collaboratively maintained FoafProject wiki site.

The current www.foaf-project.org site is pretty minimalistic. Nicole Sullivan has been working on some ideas for a more complete FOAF site. One of the things we need to figure out before doing that is a sensible balance between fairly accessible end-user materials (www.foaf-project.org) and the more obscure technical and developer content familiar from rdfweb.org.

The basic recipe for getting started with FOAF is as follows. First, you'll need a basic idea of what FOAF is, and is for. Short version: FOAF is all about creating and using machine-readable homepages that describe people, the links between them and the things they create and do.

For a more in-depth version, the evolving FOAF Specification should be the authoritative reference, accompanied by less technical documents such as Edd Dumbill's FOAF intro article. The FOAF FAQ has links to some other media articles, including 'Metadata Mark II' in Web Monkey Magazine which describes FOAF alongside some related efforts.

If you're still interested, the best thing to do next is to make yourself a FOAF file. For this, most people use Leigh Dodds's handy foaf-a-matic tool, which is now available in eight (8!) languages. So make yourself a FOAF file, save it online somewhere, and add an auto-discovery link to it from your homepage so that machines have an easy way to find it. If you mess about with the FOAF markup, eg. to add your own extensions, the W3C RDF Validator is a useful tool for checking the file is still in legal RDF/XML format.
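
For the auto-discovery step, a small Ruby sketch can generate the HTML link element to paste into the head of a homepage. The rel, type and title values follow the convention as I understand it from the FOAF community documentation, so check the FoafProject wiki before relying on them; the function name and the alice.example.com URL are made up for the example.

```ruby
require 'cgi'

# Build an auto-discovery <link> element pointing at a FOAF file.
# The attribute values are an assumption based on common FOAF practice.
def foaf_autodiscovery_link(foaf_url)
  href = CGI.escapeHTML(foaf_url)
  %(<link rel="meta" type="application/rdf+xml" title="FOAF" href="#{href}" />)
end

puts foaf_autodiscovery_link('http://alice.example.com/foaf.rdf')
```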

Then what? Well hopefully your data will start to show up in the various FOAF navigators and aggregators, some links to which are included in this article...

Here are a few useful links relating to FOAF: FOAFNaut (an SVG-based navigator), FOAF Explorer (server-based navigator using XSLT bookmarklet), eikeon.com FOAF Web View (another server-based navigator, written in Python), JabFOAF (utilities for integrating FOAF and Jabber).

Some more Links and background... this list is not complete, but repeated here so I can reference it from the new more minimalist FOAF site.

Posted by danbri at 12:46 PM | Comments (156)

July 24, 2003

Missing isn't broken: data validation and freedom on the Semantic Web

Developers who come to the Semantic Web effort via XML technology often make an understandable mistake. They assume that missing is broken when it comes to the contents of RDF/XML documents, that if you omit some piece of information from an RDF file, you have in some formal, technical sense 'done something wrong' and should be punished.

RDF doesn't work like that. Missing isn't broken. In the general case, you are free to say as much, or as little, in your RDF document as you like. RDF vocabularies such as FOAF, Dublin Core, MusicBrainz, RDF-Wordnet don't get to tell you what to do, what to write, what to say. Instead, they serve as an interconnected dictionary documenting the meaning of the terms you're using in your RDF documents.

This article walks through some example FOAF, showing (hopefully!) how the ability to omit and add data is an essential part of the freedom RDF provides. It follows from the recent articles I wrote on contradictions in FOAF and on identification strategies. Like those articles, it is written mostly for developers who are coming to RDF and Semantic Web from a different background, and tries to make explicit some of the design assumptions behind FOAF which haven't yet been made clear.

To take an example from FOAF, the foaf:workplaceHomepage property relates a person to a document that is the homepage of their workplace. The FOAF vocabulary contains markup that explains the basics of this to machines.

"The foaf:workplaceHomepage property has an rdfs:domain of foaf:Person and an rdfs:range of foaf:Document."

What does this mean?! Just that whenever you see an RDF description saying that something has a foaf:workplaceHomepage of something else, you know, by virtue of the meaning of that property, that the first 'something' is expected to be a Person, and the second expected to be a Document. You know that because foaf:workplaceHomepage is a relationship between people and documents. Note that these are expectations about the world and not about XML documents. For it to be true that something is the foaf:workplaceHomepage of someone, it will have to be something that is a document. That's a constraint on the world, not on XML tag structures.

What doesn't it mean? It doesn't mean that all RDF documents which use this property have to spell out explicitly the type of the things the property is relating. RDF leaves a lot of freedom, and doesn't punish document authors for the sin of omission. Often enough, the stuff you miss out could be inferred from the things you wrote anyway, so why force document authors to needlessly pad out their RDF?

Another example. The foaf:knows relation is defined as one that relates a foaf:Person to another foaf:Person.

A typical usage would be:

<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
     <foaf:Person foaf:name="Edd Dumbill" />
  </foaf:knows>
</foaf:Person>

This basically is RDF's way of saying 'there is something that is a foaf:Person and that has a foaf:name of 'Dan Brickley' and that stands in a foaf:knows relationship to something that is a foaf:Person and that has a foaf:name of 'Edd Dumbill''.

It is important to understand that we can omit pieces of this information, without this RDF/XML being 'broken' (invalid etc.) in any formal sense. Or, perhaps more interestingly, we could add information. From RDF's perspective, you are free to choose. It is not up to the creators of FOAF (or MusicBrainz, or Dublin Core) to dictate to you which things you should or should not mention in your RDF documents.

So, we could write this:

<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
     <foaf:Person/>
  </foaf:knows>
</foaf:Person>

...ie. 'Dan knows someone'. Hardly very informative. Probably somewhat annoying, but entirely perfectly correct RDF. Nothing in that markup violates any rules associated with the FOAF vocabulary. Similarly, we could write:

<rdf:Description foaf:name="Dan Brickley">
  <foaf:knows>
     <rdf:Description  foaf:name="Edd Dumbill" />
  </foaf:knows>
</rdf:Description>

...and it is still just fine. The markup 'rdf:Description' is what RDF uses when you mention something but don't happen to mention its type. So here, we are just saying 'there is something with a foaf:name 'Dan Brickley' that foaf:knows something else with a foaf:name 'Edd Dumbill''. Again, true, but slightly less informative. We didn't mention that these things were people. Although that information could be deduced because we know that foaf:knows relates a foaf:Person to a foaf:Person, it is sometimes helpful to be explicit.
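
The deduction step can be sketched in a few lines of Ruby. The `SCHEMA` table restates the domain and range information quoted earlier for foaf:workplaceHomepage and foaf:knows; the triple layout and the `_:dan` / `_:edd` node labels are assumptions for illustration, and a real RDF toolkit would read the schema from the FOAF vocabulary itself.

```ruby
# Domain and range per property, as stated in the text above.
SCHEMA = {
  'foaf:workplaceHomepage' => { domain: 'foaf:Person', range: 'foaf:Document' },
  'foaf:knows' => { domain: 'foaf:Person', range: 'foaf:Person' }
}.freeze

# Derive rdf:type statements that the author left unsaid.
def infer_types(triples)
  triples.flat_map do |subject, property, object|
    rule = SCHEMA[property]
    next [] unless rule
    [[subject, 'rdf:type', rule[:domain]], [object, 'rdf:type', rule[:range]]]
  end.uniq
end

# The untyped rdf:Description example, written as triples.
UNTYPED = [
  ['_:dan', 'foaf:name', 'Dan Brickley'],
  ['_:dan', 'foaf:knows', '_:edd'],
  ['_:edd', 'foaf:name', 'Edd Dumbill']
].freeze

infer_types(UNTYPED).each { |triple| puts triple.join(' ') }
```

Nothing in the input says 'foaf:Person', yet both nodes come out typed as people. That is why leaving the type out costs a consumer very little.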

The critical thing to remember: don't assume it is broken because you don't see 'foaf:Person' in the markup. That's a feature not a bug, as it lies at the heart of RDF's free-form extensibility. Since we want FOAF to be easily extended by independent parties, without breaking the core interop provided by RDF and the basic FOAF vocabulary, this is a freedom to be valued.

Here is another example:

<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
     <wordnet:Programmer foaf:name="Edd Dumbill" />
  </foaf:knows>
</foaf:Person>

Here we give a more detailed type than foaf:Person. You can look up wordnet:Programmer in the Web for its RDF definition, which tells us amongst other things that a programmer is "a person who designs and writes and tests computer programs".

So in addition to being able to deduce from the foaf:knows relationship that Edd is a foaf:Person, the markup tells us that his skills include programming. Unlike in a strictly typed OO programming environment, however, Edd can have lots of independently defined 'types', and FOAF files can mention any or none of these according to need and circumstance.

Just as missing out information isn't wrong, nor is adding more information. From an XML perspective, it is both tempting and natural to see this as too liberal and free-form. XML encourages us to think about this problem in terms of tags: "what tags can appear inside a foaf:knows? is foaf:Person allowed? what about wordnet:Programmer?". Unfortunately that doesn't scale well, since it requires a painful amount of coordination amongst the parties defining these vocabularies.

RDF was designed to expect the unexpected. You don't need anyone's permission to invent new tags, or to have your tags 'go inside' their tags or vice versa. This is hugely liberating, particularly for FOAF because so many problem domains overlap in this space, and life is too short to spend in standardisation committees arguing about XML schemas for dictating whose XML tags enclose whose.

The wordnet vocab wasn't designed to be used inside foaf:knows, and the foaf:knows property wasn't designed to have wordnet:Programmer inside it. They were both designed to work within the Resource Description Framework (RDF), an approach to XML data mixing which allows such vocabularies to be freely mixed and combined without having everybody agree 1:1 how their vocabularies may or may not be combined.

So... missing isn't broken.

But it isn't always polite, either. With freedom comes responsibility. RDF and the Semantic Web provide a platform for exchanging XML documents that encode somewhat freeform claims about the world. FOAF was created to help explore practical deployment issues for 'RDF in the wild', and one issue we're currently working on is this balance between freedom and expectations. It is all well and good having a super-flexible way of saying anything-about-anything in FOAF RDF files, but where does that leave us poor developers?

If the stark notion of 'valid -vs- invalid' document checking doesn't make sense in the decentralised Semantic Web environment, how can we make things easier for developers who are trying to work with this free-flowing mix of RDF markup? If nothing is mandatory, then how can they write code that knows what to expect?

There are several answers here. The first is that, if we want this to scale to the planet, we have to accept that one size won't fit all, that different parties will want to say quite varying things in their FOAF documents, and that our ability to impose our views on their documents is limited.

What can developers take for granted when reading a FOAF file? This is the key question...

They should be able to assume it is well-formed XML+Namespaces, and that it is structured according to the RDF syntax specification. And that it probably makes use of the FOAF vocabulary, typically alongside others such as DC, RSS1, Wordnet...

Beyond that, what should we strive for? Here we move into the world of best practice, etiquette, user guidelines and other forms of 'soft' documentation. In the FOAF world, these are only now beginning to take shape (in the wiki, on rdfweb-dev and in this weblog). For FOAF, this sort of documentation is more important than schema-based validation. In FOAF, you can't get it technically wrong by missing out information, but you can make a nuisance of yourself by writing unnecessarily obscure FOAF.

We are discussing (on the FOAF list, rdfweb-dev) possible 'common subset' properties which it might be reasonable to assume people will use in their FOAF files. These can't be mandatory, but may (alongside tools such as foaf-a-matic) help guide people into using some common core properties.

But even choosing those properties is tricky! How do we name people? (it turned out that in Japan, many users prefer to use an informal foaf:nick, and omit their foaf:name property). How do we identify people? By foaf:mbox, foaf:mbox_sha1sum, foaf:homepage, or an Instant Messenger chatID? The answer seems to be "one or more of the above...".

So the conclusion here is not to look to the evolving FOAF specification for black/white answers about what a FOAF file 'should' contain. The FOAF spec is like a dictionary, specifying the meaning of the things you use in your FOAF, but leaving it up to you and to emerging best practice as to what exactly you write.

In other words, RDF gives us the freedom to say whatever we like in our FOAF files, and we need to complement the formality of the RDF Schema and Web Ontology language definitions for FOAF (see schema) with better documentation for users and developers helping them make their way in this strange new world...

Posted by danbri at 12:22 PM | Comments (10)

July 13, 2003

nearestAirport documentation in Japanese

From kota's weblog, details on using 'nearestAirport' in FOAF files. Thanks kota!

The 'contact:nearestAirport' property is a way of indicating very broadly which part(s) of the world you're from, without needing to know exact coordinates or giving away too much detail. It isn't part of the FOAF vocabulary, but can be included as an extension in any FOAF file. A few 'nearestAirport' links: pixel's writeup, swad-e developer map, FOAF people map

See also FOAF overview in Japanese, Japanese FOAF wiki

Posted by danbri at 11:54 AM | Comments (0)

SemaView, Social Networking and FOAF

SemaView have a nice writeup of their work with RDF, Semantic Web and FOAF: Social Networking utilizing the Intelligent Internet

Actually it was published back in March, but I missed the chance to write about it then. Better late than never. The article introduces the basic concepts of the Semantic Web and RDF using their FOAF browser, built using Java and PHP. The article goes on to talk about the potential business value of such work, giving a brief case study of the Ecademy networking site.

The other publications on SemaView's site are worth a read too. They've made an effort to provide a friendly overview of Semantic Web technologies, and to provide a business-oriented perspective as well as a technical one.

Posted by danbri at 10:56 AM | Comments (0)

FOAF Contradictions

Q: If I can say what I like in a FOAF file, even say nothing, and if I can use any semantic web vocabularies at all, all mixed together, how can we ever know if a FOAF file is 'wrong' (broken, in error)?

A: Which answer do you want...? ;)

One part of the answer relates to the detection of inconsistencies in FOAF data.
In particular, checking for documents that contradict themselves is becoming possible, thanks to our use of W3C's Web Ontology language (OWL).

So I wrote a bit about this in the FOAF wiki, see the FoafContradictions article there. I hope to expand on it with more examples and detail about how OWL works, so am writing in wiki rather than weblog mode this time. It should be readable and hopefully useful now.

A natural topic for further attention would be the discovery of disagreements between documents. That's a rich area to explore, as it combines a variety of techniques, eg. logical (people only have one foaf:dateOfBirth) and statistical (20% of FOAF files think my surname is 'Brinkley', maybe they're right...). This is an important topic as it relates to trust strategies, to dealing with stale / dated information, and to the practical problems inherent in any 'semantic web search engine' efforts. But I didn't write about it yet. Take a look at the FoafContradictions piece and let me know if that's a useful level of detail to attempt...
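
As a very rough sketch of the two kinds of check mentioned above, the following Ruby groups source-attributed statements by person and property, flags a logical conflict when a single-valued property has more than one distinct value, and keeps a per-value tally for the statistical side. The statement layout, the source URLs, the `person:x` label, the dates and the `FUNCTIONAL` list are all invented for illustration.

```ruby
# Properties assumed (for this sketch) to allow only one value per person.
FUNCTIONAL = ['foaf:dateOfBirth'].freeze

# Each statement is [source, subject, property, value].
# Returns { [subject, property] => { value => number of sources } } for
# every single-valued property that has more than one distinct value.
def conflicts(statements)
  tally = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  statements.each do |_source, subject, property, value|
    tally[[subject, property]][value] += 1 if FUNCTIONAL.include?(property)
  end
  tally.select { |_key, values| values.size > 1 }
end

CLAIMS = [
  ['http://a.example.org/foaf.rdf', 'person:x', 'foaf:dateOfBirth', '1970-01-01'],
  ['http://b.example.org/foaf.rdf', 'person:x', 'foaf:dateOfBirth', '1970-01-01'],
  ['http://c.example.org/foaf.rdf', 'person:x', 'foaf:dateOfBirth', '1971-01-01'],
  ['http://c.example.org/foaf.rdf', 'person:x', 'foaf:nick', 'x']
].freeze

conflicts(CLAIMS).each do |(subject, property), values|
  puts "#{subject} #{property}: #{values.inspect}"
end
```

The tally only reports the disagreement; deciding which value to trust is a separate question.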

Posted by danbri at 03:16 AM | Comments (0)

July 10, 2003

Identifying things in FOAF

There is growing interest in FOAF and its relationship to various approaches to "identity management" on the Internet. The FOAF approach to all this is distinctly pluralistic, to the extent that you might not even notice that there is a FOAF way of dealing with identity. There aren't, for example, 'FOAF identifiers' as such, although there is certainly a FOAF approach to identifying things. So this is a first cut at writing up some of the as-yet-unarticulated design assumptions behind FOAF. A more user-friendly version would have examples; those will have to come later.

So here's the basic story. FOAF is built on top of W3C's Resource Description Framework (RDF), which itself uses XML and Unicode as file format standards. All FOAF documents are RDF documents, and any RDF application vocabularies (such as Dublin Core, RSS 1.0 core + extensions, MusicBrainz, Wordnet etc.) can be used within FOAF documents. FOAF shares with RDF a concern to use standard Web identifiers (URIs) wherever possible. The URI specification (RFC 2396) provides a common syntax for naming things on the Web, providing an umbrella concept which covers both 'URLs' and 'URNs'.

To the extent that everything we want to talk about has a well known URI, this solves all our problems. Lots and lots of things that we want to talk about do have URIs. There are URIs for Web pages, for mailboxes, for Java classes, for telephones, for ISBN-registered publications, and so on. This is great - when you want to talk about one of these things in a FOAF file, you just mention its URI. Simple, decentralised, standard.

However, our story doesn't end here. FOAF needs to play in a world where we don't all have total knowledge of every relevant fact. Sometimes a thing might 'have' a URI (in some pedantic sense) yet 99% of parties on the Web might not know what that URI is. Or, closer to my main theme, we might want to talk in our FOAF files about things that it has proved peculiarly difficult to get agreement about identifying. People, for example.

Just try setting up a planet-wide system for identifying people and you'll see my point. There is significant resistance to the idea of creating a single set of identifiers used to 'tag' everyone. To put it mildly. So... where does this leave FOAF? FOAF documents are scattered around the Web, and each document makes a unique contribution to a bigger picture which can only be seen when those documents are merged together. In FOAF, we need to identify people, without there being agreement on person-identifiers. Tricky!

So here is the good news. RDF was designed for generic, cross-domain data merging. Imagine taking two arbitrary SQL databases and merging them, so that your new database could answer questions which required knowledge of things which were previously described partially in one dataset, and partially in another. That sort of operation is hard to do, because SQL wasn't designed in a way that makes this easy. Neither was XML. But RDF was, and FOAF is built as an RDF application. In RDF, there are off the shelf software tools which can take RDF documents, 'parse' them into a set of simple 3-part statements (triples) which make claims about the world, and store those statements alongside others in a merged RDF database. To the extent that both datasets use the exact same identifiers when mentioning things they describe, you get a rather handy data-merge effect.
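
The data-merge effect can be shown with a toy example in Ruby. Each 'document' here is just an array of triples written as [subject, property, value]; the URIs, titles and function names are invented, and real RDF tools would parse RDF/XML rather than use hand-written arrays.

```ruby
# Merging is just pooling the statements and dropping exact duplicates.
def merge(*documents)
  documents.flatten(1).uniq
end

# Answer a question that needs facts from more than one source:
# what are the titles of pages about a given topic?
def titles_about(triples, topic)
  pages = triples.select { |_, p, o| p == 'foaf:topic' && o == topic }.map(&:first)
  triples.select { |s, p, _| p == 'dc:title' && pages.include?(s) }.map(&:last)
end

# One source knows the page's title, the other knows its topic.
DOC_A = [['http://example.org/page', 'dc:title', 'A page about something']].freeze
DOC_B = [['http://example.org/page', 'foaf:topic', 'http://example.org/thing']].freeze

puts titles_about(merge(DOC_A, DOC_B), 'http://example.org/thing').inspect
```

Neither document can answer the question alone. Once the statements are pooled the answer falls out, because both sources used the same URI for the page.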

So here is the (not very) bad news. If two different RDF files (eg. FOAF documents) are talking about the same thing but don't use exactly the same URI when mentioning that thing, how are our poor stupid computers supposed to be able to understand? In the real world, we want to write RDF documents (eg. for FOAF) about things that we've not yet agreed on common identifiers for. This is one of the core problems we've had to address in FOAF.

Basically, off the shelf RDF tools can still do a lot to help us, but we have to help them. FOAF, as an application that focusses on the distributed, decentralised, almost out of control use of RDF 'in the wild', ran into this problem after we had about half a dozen FOAF files. There are now hundreds, soon thousands, of FOAF documents. Most of them talk about people, quite successfully, despite the absence of a global person-id registry. This sounds like a recipe for chaos, yet somehow many of our FOAF aggregation tools are quite happy with this situation. They can often figure out when two files are about the self-same thing, without much help from the authors of those documents. We do this using what might be called "reference by description". Instead of saying, "this page was created by urn:global-person-registry:person-n22314151", we say "this page was created by the person whose (some-property...) is (some-value...)", taking care to use an unambiguous property such as foaf:homepage or foaf:mbox_sha1sum.

Here's how it works. Recall that FOAF is built on top of RDF, and so every FOAF document boils down to nothing more than a set of 3-part statements which relate two things together via terms such as 'workplaceHomepage', 'homepage', 'mbox'.

I am related to those things that are my homepages; FOAF's name for that relationship is 'foaf:homepage'.

I am related to those things that are my personal mailboxes by a relationship FOAF calls 'foaf:mbox'.

I am related to the strings that you get from feeding my mailbox identifiers to the SHA1 mathematical function by a relationship FOAF calls 'foaf:mbox_sha1sum'.

I am related to a Myers-Briggs personality classification; FOAF calls that relationship 'foaf:myersBriggs'.

I am related to my workplace homepage (http://www.w3.org/) by a relationship called -- you guessed it -- 'foaf:workplaceHomepage'.

I am related to my name, 'Dan Brickley' by the 'foaf:name' relationship.

I am related to my AIM chat identifier by a relationship FOAF calls 'foaf:aimChatID'.

And so on. Other RDF vocabularies can define additional relationships (see the FoafVocab entry in our wiki for pointers). They all relate things to other things in named ways. A FOAF document, like any RDF document, is simply a collection of these simple claims about how things in the world relate.
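As a rough illustration, the statements above can be written down as Python tuples. This is only a sketch: the "_:me" label and the example.org mailbox are placeholders, and mbox_sha1sum is a helper named here for convenience. It hashes the full mailbox URI, including the mailto: prefix.

```python
import hashlib

def mbox_sha1sum(mailbox_uri):
    """SHA1 hex digest of a mailbox URI such as 'mailto:someone@example.org'."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()

me = "_:me"                          # placeholder label for the person described
mbox = "mailto:danbri@example.org"   # hypothetical mailbox, for illustration only

# Each statement is a (subject, predicate, object) triple.
statements = [
    (me, "foaf:name", "Dan Brickley"),
    (me, "foaf:workplaceHomepage", "http://www.w3.org/"),
    (me, "foaf:mbox", mbox),
    (me, "foaf:mbox_sha1sum", mbox_sha1sum(mbox)),
]

for subject, predicate, obj in statements:
    print(subject, predicate, obj)
```

Publishing only the foaf:mbox_sha1sum line lets tools match on the mailbox without the address itself appearing in the file.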

But look again. There is a hidden pattern here. Some of these relationships are special.

foaf:homepage, foaf:mbox, foaf:mbox_sha1sum and foaf:aimChatID fall in one category.

foaf:workplaceHomepage, foaf:myersBriggs, foaf:name fall in another.

Here's the difference. The former kinds of relationship (or 'property' in RDF-talk) have a special characteristic. They have been defined such that there is at most one thing in the world that has any particular value for that property.

There is... at most one thing in the world with any given foaf:homepage. Or foaf:mbox, or foaf:mbox_sha1sum, or foaf:aimChatID. By contrast, there may well be multiple things in the world with the same foaf:workplaceHomepage, or foaf:myersBriggs, or even (it's a big world) foaf:name. Apparently there's another Dan Brickley out there. And lots of my colleagues share my workplace homepage. And there are a lot of people whom Myers-Briggs surveys classify as 'INTP'. But there is nobody else at all who has the same foaf:homepage as me, or the same foaf:mbox. Or foaf:aimChatID.
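The distinction can be sketched in a few lines of Python. The function below (shared_values is a name invented here, and the data is made up) finds property values that more than one node carries. For a property like foaf:name a shared value proves nothing; for a uniquely identifying property it would mean the two nodes are really one thing.

```python
from collections import defaultdict

def shared_values(triples, prop):
    """Map each value of `prop` to the subjects that carry it,
    keeping only values carried by more than one subject."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_value[o].add(s)
    return {v: subs for v, subs in subjects_by_value.items() if len(subs) > 1}

# Two different people who happen to share a name (hypothetical data).
triples = [
    ("_:a", "foaf:name", "Dan Brickley"),
    ("_:b", "foaf:name", "Dan Brickley"),
    ("_:a", "foaf:homepage", "http://example.org/~dan-a/"),
    ("_:b", "foaf:homepage", "http://example.org/~dan-b/"),
]

print(shared_values(triples, "foaf:name"))      # the name is shared
print(shared_values(triples, "foaf:homepage"))  # no homepage is shared
```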

This is one of the design principles underlying FOAF (and for that matter the entire Semantic Web effort): a pragmatic, pluralistic approach to resource description and identification. Rather than building big, centralised registries of people (or companies, or physical things) we look for cheaper, more lightweight shared strategies for identification. In FOAF, we do this by making sure there are multiple ways we can identify things.

So one FOAF file might mention 'here is a photo; it depicts the person whose mailbox is danbri@rdfweb.org'. Another FOAF file might say 'here is a weblog entry written by the person whose homepage is http://rdfweb.org/people/danbri/'. A 3rd FOAF file might say, 'here is a chat transcript by the person whose foaf:aimChatID is danbri_2002'. To the extent that there is publicly readable RDF in the Web that makes all these claims, and that there is, perhaps scattered around, enough information to deduce that these all describe the same person, RDF/FOAF tools can 'smush' it all together. They could 'realise' that the photo and the weblog and the chat log were all associated with the self-same thing, ie me.

To do that, we need certain pieces of information. We need to know which, of all the kinds of relationship there are, are the uniquely identifying ones. In RDF terminology we call these unambiguous (or more technically, inverse-functional) properties. When RDF software reads the FOAF spec it can determine this from markup embedded in the document itself. So machines can find out quite easily which properties are ones which uniquely identify people. They can do this for the FOAF spec, and for any other RDF vocabulary that is used alongside FOAF.
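Here is a small sketch of that lookup, assuming the schema has already been parsed into triples. The schema list below is a hand-typed stand-in, not a copy of the FOAF spec, and identifying_properties is a name chosen for this example. The rdf:type and owl:InverseFunctionalProperty URIs are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
OWL_IFP = "http://www.w3.org/2002/07/owl#InverseFunctionalProperty"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hand-typed stand-in for triples a parser might extract from a vocabulary.
schema = [
    (FOAF + "homepage", RDF_TYPE, OWL_IFP),
    (FOAF + "mbox", RDF_TYPE, OWL_IFP),
    (FOAF + "name", RDF_TYPE, RDF_PROPERTY),
]

def identifying_properties(schema_triples):
    """Collect every property declared to be inverse-functional."""
    return {s for s, p, o in schema_triples if p == RDF_TYPE and o == OWL_IFP}

print(sorted(identifying_properties(schema)))
```

The same function works unchanged on triples from any other vocabulary, which is the point of using a shared OWL convention rather than a FOAF-specific one.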

The other bit of information needed is that somewhere in the Web, it would need to be claimed that there is a person who has a mailbox of ... and a homepage of ... and an aimChatID of ...

If that information is available, then FOAF tools are all set to do the data merge, even though there is no planet-wide unified identification system for people. We don't use anything except off the shelf standards: URIs plus W3C RDF and OWL technology.
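The merge itself can be sketched as follows. This is one simple way to 'smush', using a union-find structure over node labels; it is not a description of how any particular FOAF aggregator works. The example.org URIs, the "_:a" to "_:d" labels and the use of dc:creator for 'written by' are assumptions made for illustration.

```python
IFPS = {"foaf:mbox", "foaf:homepage", "foaf:aimChatID"}

def smush(triples, ifps=IFPS):
    """Merge nodes that share a value for any uniquely identifying property."""
    parent = {}

    def find(x):
        # Follow parent links to the representative node, halving paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}
    for s, p, o in triples:
        if p in ifps:
            if (p, o) in first_seen:
                union(first_seen[(p, o)], s)
            else:
                first_seen[(p, o)] = s

    # Rewrite every subject and object to its representative node.
    return {(find(s), p, find(o)) for s, p, o in triples}

triples = [
    # File 1: a photo of the person with this mailbox.
    ("http://example.org/photo.jpg", "foaf:depicts", "_:a"),
    ("_:a", "foaf:mbox", "mailto:danbri@rdfweb.org"),
    # File 2: a weblog entry by the person with this homepage.
    ("http://example.org/entry", "dc:creator", "_:b"),
    ("_:b", "foaf:homepage", "http://rdfweb.org/people/danbri/"),
    # File 3: a chat transcript by the person with this chat ID.
    ("http://example.org/chat.txt", "dc:creator", "_:c"),
    ("_:c", "foaf:aimChatID", "danbri_2002"),
    # File 4: one person has all three.
    ("_:d", "foaf:mbox", "mailto:danbri@rdfweb.org"),
    ("_:d", "foaf:homepage", "http://rdfweb.org/people/danbri/"),
    ("_:d", "foaf:aimChatID", "danbri_2002"),
]

smushed = smush(triples)
for triple in sorted(smushed):
    print(triple)
```

Without the fourth file, nothing links the first three and the nodes stay separate; with it, the photo, the weblog entry and the chat transcript all end up attached to a single node.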

If you find the data merging potential creepy, you are not alone. This kind of technology is not going away, but there are steps you can take. A full discussion of the privacy aspect isn't possible here, but the basic idea is (i) be aware -- scattered information can easily be merged, and (ii) keep things as secret as they need to be. Don't tell the world (in your FOAF file or elsewhere) all the chat IDs and homepages and mailboxes that you use, then act surprised when people and machines piece together your scattered contributions to the Web. Reading up on PGP might be a good idea.

We don't need to wait for a global identity management system before privacy and data merging become an issue. FOAF is intended to explore these issues, and to provide some advance warning for the way certain aspects of semantic web technology may affect our lives. Just as the world has had to adapt to the notion of 'being Googled' and having things that once seemed obscure now all too easily found, the rise of semantic web technology needs to be accompanied by an understanding of the risks and opportunities that 'being identified' presents.

Finally... a couple of points of further reading on the technical rather than social side of this problem. A couple of years ago I wrote a brief note on aggregation strategies which describes the 'smushing' problem. A more recent writeup by Matt Biddulph describing his Java implementation is worth a read too, as are many of the documents from the TAP project, which share FOAF's concern for reference-by-description. Guha and Rob's overview paper sets out the issues very clearly.

Posted by danbri at 12:05 PM | Comments (4)

June 25, 2003

Using foaf:weblog in your FOAF file

Here's how to add a foaf:weblog property to a FOAF document.

Anywhere there is an element describing a Person (or for that matter a company, group etc), you can include a sub-element that mentions their weblog(s):

So, if you start out with markup like this:

(Nicole just asked me how to do this, so she gets to be the example...)

<foaf:Person>
  <foaf:name>Nicole Sullivan</foaf:name>
  <foaf:homepage rdf:resource="http://www.apocalypse.org/~nicole/"/>
</foaf:Person>

...and add a weblog property of the Person described, it should look like this:

<foaf:Person>
  <foaf:name>Nicole Sullivan</foaf:name>
  <foaf:homepage rdf:resource="http://www.apocalypse.org/~nicole/"/>
  <foaf:weblog rdf:resource="http://www.stubbornella.org/"/>
</foaf:Person>

It doesn't matter exactly where you put the foaf:weblog entry, so long as it is immediately 'inside' the foaf:Person element. You can have the foaf:homepage bit first, or foaf:img, foaf:workplaceHomepage etc. there too, all alongside each other in any order.

This same technique works if you are describing your friends and collaborators in a FOAF file; just add in a foaf:weblog property inside the foaf:Person section that describes them.

That's all there is to it. Your FOAF file now describes the address of your weblog, making it easier to find for FOAF and other Semantic Web tools.

Note that if you have multiple weblogs, list them each separately, one after another. Note also that each foaf:weblog described in FOAF has to be the weblog of one 'thing' (whether Person, Company or whatever). So describing collaboratively edited weblogs is a topic I'll come back to in a future article.
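If you would rather script the change than edit by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree. It treats the file as plain XML, so it only suits documents shaped like the example above, and it finds the Person by foaf:name purely for convenience. The add_weblogs helper and the wrapping rdf:RDF element are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("foaf", FOAF)
ET.register_namespace("rdf", RDF)

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Nicole Sullivan</foaf:name>
    <foaf:homepage rdf:resource="http://www.apocalypse.org/~nicole/"/>
  </foaf:Person>
</rdf:RDF>"""

def add_weblogs(root, name, urls):
    """Append one foaf:weblog child per URL to each Person with this foaf:name."""
    for person in root.findall("{%s}Person" % FOAF):
        if person.findtext("{%s}name" % FOAF) == name:
            for url in urls:
                ET.SubElement(person, "{%s}weblog" % FOAF,
                              {"{%s}resource" % RDF: url})

root = ET.fromstring(doc)
add_weblogs(root, "Nicole Sullivan", ["http://www.stubbornella.org/"])
print(ET.tostring(root, encoding="unicode"))
```

Passing several URLs adds several foaf:weblog elements, one after another, as described above.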

Posted by danbri at 11:48 AM | Comments (0)

June 22, 2003

A purpose of FOAF

...is to engineer more coincidences in the world. There are several reasons for FOAF, and this is one that hasn't yet been documented. After my mostly unexpected involvement in the W3C Semantic Web Tour as it hit London, I was chatting to Matt Biddulph about FOAF and the 'vapourware for the masses' thing, and it occurred to me that I'd never written this up. FOAF was designed as technology to encourage coincidence. You're walking past a pub... you go to a conference... you're standing at the barricades... or sitting in an interview... and the last thing you'd expect... a friend of a friend. Everything's connected. Who'd have thought it?

To pull this off, we need to ground FOAF in the real world. Foaffinger (a wireless / Rendezvous FOAF detector), RDF GeoInfo, Bluetooth and more are all part of the picture. Every new gadget, every new dataset, makes the unexpected more expected.

Posted by danbri at 12:41 AM | Comments (4)

June 19, 2003

FOAF introduction in Japanese

Masahide Kanzaki has announced a new introduction to FOAF, written in Japanese. It is quite detailed, illustrated with examples and figures, and covers recent additions to the FOAF vocabulary. And, like the rest of his Semantic Web and RDF site, it looks glorious. I only wish I could read Japanese; the babelfish translation is a poor substitute. FOAF may now be better documented in Japanese than it is in English!

Masahide Kanzaki's FOAF document is also of technical interest, since it references an XSLT transformation which, in modern Web browsers, generates an XHTML page. (IE5/mac viewers should be warned that it crashed my browser; Mozilla seems happy, by contrast).

Japanese readers may also find these other RDF-related links useful.

Posted by danbri at 02:42 PM | Comments (0)

April 27, 2003

FOAF and weblogs

Three things about FOAF and weblogs! Firstly, Ben Hammersley has written a piece for the Guardian on the latest project of the Six Apart folks behind the Movable Type weblog publishing system. They're launching a new service, TypePad, providing what looks to be a very full-featured hosting service. While MT is pretty easy to install, TypePad looks pretty cool. As well as rumoured FOAF support, it has a built-in photo album facility. The combination of those two could be quite interesting...

Oh, the other two FOAF/weblog things: I wanted to announce the existence of the foaf:weblog property a bit more widely. FOAF now has the ability to represent the address of your weblog, clearing the way for FOAF aggregators to support queries like "Find me weblogs of people who... (work for / live in / etc...)", matching against any of the other FOAF properties listed.
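As a rough sketch of such a query, assuming an aggregator has already collected its data as triples: the function name weblogs_of_people_with and all of the example.org data below are hypothetical.

```python
def weblogs_of_people_with(triples, prop, value):
    """Find weblogs of every person whose `prop` has the given value."""
    people = {s for s, p, o in triples if p == prop and o == value}
    return sorted(o for s, p, o in triples
                  if p == "foaf:weblog" and s in people)

# Hypothetical aggregated data.
triples = [
    ("_:alice", "foaf:workplaceHomepage", "http://www.w3.org/"),
    ("_:alice", "foaf:weblog", "http://example.org/alice/blog/"),
    ("_:bob", "foaf:workplaceHomepage", "http://example.com/"),
    ("_:bob", "foaf:weblog", "http://example.org/bob/blog/"),
]

print(weblogs_of_people_with(triples, "foaf:workplaceHomepage",
                             "http://www.w3.org/"))
```

Any other FOAF property can be passed as prop, which is what makes a single foaf:weblog statement useful across many different kinds of query.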

Finally, it's time to finish setting up the weblog we're running at http://rdfweb.org/ for FOAF and related projects. This mostly involves tweaking our MT installation and adding back the navigation and suchlike...

Posted by danbri at 11:22 PM | Comments (2)

March 15, 2003

A few new foaf things

From a chat on #foaf yesterday: photo RDF, codepiction changes, foaffinger and meeting Eikeon; foaf and web view stats and for CVS; a new design for foafnaut homepage by ephi :)

Posted by libby at 04:48 PM | Comments (0)

February 13, 2003

Wobbly

The front page of rdfweb.org is somewhat wobbly this week. I'm in the middle of moving it across to use the Movable Type weblog, instead of hand-coded HTML. Sorry things are a bit rough. We're getting there slowly!

Posted by danbri at 10:41 PM | Comments (0)

January 30, 2003

rdfweb-dev mailing list moved!

The rdfweb-dev / FOAF mailing list has a new home on rdfweb.org. We have moved across the old archives from YahooGroups, but you will need to resubscribe by hand.

Posted by danbri at 04:28 PM | Comments (0)

January 26, 2003

New RDFWeb server

If all goes well, this post will show up on the new Web server.

Posted by danbri at 10:39 PM | Comments (0)