July 28, 2003

Pages about people

A few people have asked how we can use FOAF to express that a page is about a person, or whether there is a way within FOAF to write testimonials about other people (presumably along the lines of various popular dating sites).

The former is easy; the latter isn't something the FOAF vocabulary currently supports. Not because it would have been hard to add, but because it would have opened a can of worms that is best left to ripen a little while longer.

But FOAF does have a way of describing the topics of pages, which is a way of providing similar functionality.

So, let's imagine you've written a whole HTML page all about one of your bestest friends, and you want to share that fact with your world by adding something to your FOAF file. How to do that?

The basic idea is one that recurs throughout FOAF. We use the foaf:topic property, which is a relationship between a Document and something that the document is about. In your FOAF file, you can mention that you know someone, and at the same time mention that there's a page you made that's about them. Here's some example markup:


<!-- imagine you're Alice and this is the markup in your FOAF about you -->
<foaf:Person>
  <foaf:nick>Alice</foaf:nick>  
  <foaf:knows rdf:nodeID="bob"/> <!-- a pointer to the 'bob' entry below -->
  <foaf:made rdf:resource=""/>       <!-- this says: I made this FOAF file -->
  <foaf:made rdf:resource="http://alice.example.com/why-bob-is-great.html"/>       
</foaf:Person>

<!-- some markup describing a document, and its topic -->
<foaf:Document rdf:about="http://alice.example.com/why-bob-is-great.html">
  <dc:title>A page all about Bob, by Alice</dc:title>
  <foaf:topic>  
    <!-- here is a chunk of data about bob -->
    <foaf:Person rdf:nodeID="bob">
      <foaf:nick>Bob</foaf:nick> 
      <!-- other stuff about bob could go here -->
    </foaf:Person>
  </foaf:topic>
</foaf:Document>

What this basically says is "There is a Person whose nick is 'Alice' and who made this current FOAF document and who knows a Person whose nick is 'Bob'; there also is a Document whose title is 'A page all about Bob, by Alice' that Alice made and that is about Bob."
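To make that reading concrete, here is a rough Python sketch that pulls the "page about a person" pattern back out of the markup. It assumes the fragment above is wrapped in an rdf:RDF element with the usual namespace declarations; the wrapper and the function name pages_about_people are illustrative additions, not part of the example. It also matches on XML element names, which only works for this particular way of writing the data down. A real tool would use an RDF parser, since the same statements can be serialised in many different ways.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, FOAF and Dublin Core.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# The example markup, wrapped in rdf:RDF (the wrapper is an assumption).
EXAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:dc="{DC}">
  <foaf:Person>
    <foaf:nick>Alice</foaf:nick>
    <foaf:knows rdf:nodeID="bob"/>
    <foaf:made rdf:resource=""/>
    <foaf:made rdf:resource="http://alice.example.com/why-bob-is-great.html"/>
  </foaf:Person>
  <foaf:Document rdf:about="http://alice.example.com/why-bob-is-great.html">
    <dc:title>A page all about Bob, by Alice</dc:title>
    <foaf:topic>
      <foaf:Person rdf:nodeID="bob">
        <foaf:nick>Bob</foaf:nick>
      </foaf:Person>
    </foaf:topic>
  </foaf:Document>
</rdf:RDF>"""


def pages_about_people(rdf_xml):
    """Yield (page URI, title, topic's nick) for each Document whose topic is a Person."""
    root = ET.fromstring(rdf_xml)
    for doc in root.findall(f"{{{FOAF}}}Document"):
        uri = doc.get(f"{{{RDF}}}about")
        title = doc.findtext(f"{{{DC}}}title")
        # Only handles a foaf:Person nested directly inside foaf:topic.
        for person in doc.findall(f"{{{FOAF}}}topic/{{{FOAF}}}Person"):
            yield uri, title, person.findtext(f"{{{FOAF}}}nick")


for uri, title, nick in pages_about_people(EXAMPLE):
    print(f"{uri} ({title}) is about {nick}")
```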

So one question facing us in the FOAF design is finding a trade-off between simplicity and expressiveness. It is possible to use FOAF to say some fairly complex things, but we have our work cut out to improve the tools and tutorials that explain how to create FOAF files that say what we want them to say.

Here's a diagram of the above example markup, generated by the W3C RDF validator. It may be useful for illustrating the underlying pattern of relationships we're trying to describe in the FOAF markup...

[Figure: alice-bob.png, the RDF validator's graph of the Alice and Bob example]
Posted by danbri at 11:59 PM | Comments (68)

Getting started with FOAF

A new FOAF homepage and user site is on its way... (and a new spec!).

The FOAF Project homepage is now at http://www.foaf-project.org/.

There is plenty of work to do on the new site, but now seems as good a time as any to cut across to the new version. The old FOAF homepage had a (somewhat random) collection of links and information on FOAF. A version of that is included in this 'getting started' article; for more comprehensive links to FOAF-related materials, see the collaboratively maintained FoafProject wiki site.

The current www.foaf-project.org site is pretty minimalistic. Nicole Sullivan has been working on some ideas for a more complete FOAF site. One of the things we need to figure out before doing that is a sensible balance between fairly accessible end-user materials (www.foaf-project.org) and the more obscure technical and developer content familiar from rdfweb.org.

The basic recipe for getting started with FOAF is as follows. First, you'll need a basic idea of what FOAF is, and is for. Short version: FOAF is all about creating and using machine-readable homepages that describe people, the links between them and the things they create and do.

For a more in-depth version, the evolving FOAF Specification should be the authoritative reference, accompanied by less technical documents such as Edd Dumbill's FOAF intro article. The FOAF FAQ has links to some other media articles, including 'Metadata Mark II' in Web Monkey Magazine, which describes FOAF alongside some related efforts.

If you're still interested, the best thing to do next is to make yourself a FOAF file. For this, most people use Leigh Dodds' handy foaf-a-matic tool, which is now available in eight (8!) languages. So make yourself a FOAF file, save it online somewhere, and add an auto-discovery link to it from your homepage so that machines have an easy way to find it. If you mess about with the FOAF markup, e.g. to add your own extensions, the W3C RDF Validator is a useful tool for checking that the file is still in legal RDF/XML format.

Then what? Well hopefully your data will start to show up in the various FOAF navigators and aggregators, some links to which are included in this article...

Here are a few useful links relating to FOAF: FOAFNaut (an SVG-based navigator), FOAF Explorer (server-based navigator using XSLT bookmarklet), eikeon.com FOAF Web View (another server-based navigator, written in Python), JabFOAF (utilities for integrating FOAF and Jabber).

Some more Links and background... this list is not complete, but repeated here so I can reference it from the new more minimalist FOAF site.

Posted by danbri at 12:46 PM | Comments (156)

July 24, 2003

Missing isn't broken: data validation and freedom on the Semantic Web

Developers who come to the Semantic Web effort via XML technology often make an understandable mistake. They assume that missing is broken when it comes to the contents of RDF/XML documents, that if you omit some piece of information from an RDF file, you have in some formal, technical sense 'done something wrong' and should be punished.

RDF doesn't work like that. Missing isn't broken. In the general case, you are free to say as much, or as little, in your RDF document as you like. RDF vocabularies such as FOAF, Dublin Core, MusicBrainz, RDF-Wordnet don't get to tell you what to do, what to write, what to say. Instead, they serve as an interconnected dictionary documenting the meaning of the terms you're using in your RDF documents.

This article walks through some example FOAF, showing (hopefully!) how the ability to omit and add data is an essential part of the freedom RDF provides. It follows from the recent articles I wrote on contradictions in FOAF and on identification strategies. Like those articles, it is written mostly for developers who are coming to RDF and Semantic Web from a different background, and tries to make explicit some of the design assumptions behind FOAF which haven't yet been made clear.

To take an example from FOAF, the foaf:workplaceHomepage property relates a person to a document that is the homepage of their workplace. The FOAF vocabulary contains markup that explains the basics of this to machines.

"The foaf:workplaceHomepage property has an rdfs:domain of foaf:Person and an rdfs:range of foaf:Document."

What does this mean?! Just that whenever you see an RDF description saying that something has a foaf:workplaceHomepage of something else, you know, by virtue of the meaning of that property, that the first 'something' is expected to be a Person, and the second expected to be a Document. You know that because foaf:workplaceHomepage is a relationship between people and documents. Note that these are expectations about the world and not about XML documents. For it to be true that something is the foaf:workplaceHomepage of someone, it will have to be something that is a document. That's a constraint on the world, not on XML tag structures.

What doesn't it mean? It doesn't mean that all RDF documents which use this property have to spell out explicitly the type of the things the property is relating. RDF leaves a lot of freedom, and doesn't punish document authors for the sin of omission. Often enough, the stuff you miss out could be inferred from the things you wrote anyway, so why force document authors to needlessly pad out their RDF?
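Here is a small Python sketch of that kind of inference, under some loud assumptions: statements are modelled as plain (subject, property, object) tuples, the schema is a pair of hand-written dictionaries holding only the foaf:workplaceHomepage facts quoted above, and "_:dan" is a made-up label for an unnamed thing. It is not how any particular RDF toolkit does it, just an illustration of why the omitted types are recoverable.

```python
# Schema facts quoted above: the domain and range of foaf:workplaceHomepage.
DOMAIN = {"foaf:workplaceHomepage": "foaf:Person"}
RANGE = {"foaf:workplaceHomepage": "foaf:Document"}


def infer_types(triples):
    """Return the rdf:type statements implied by the domain and range of each property."""
    inferred = set()
    for subject, prop, obj in triples:
        if prop in DOMAIN:
            inferred.add((subject, "rdf:type", DOMAIN[prop]))
        if prop in RANGE:
            inferred.add((obj, "rdf:type", RANGE[prop]))
    return inferred


# One statement, with no types mentioned anywhere.
data = {("_:dan", "foaf:workplaceHomepage", "http://www.w3.org/")}

for statement in sorted(infer_types(data)):
    print(statement)
```

Nothing in the input says what kind of thing "_:dan" is, yet the output includes a statement typing it as a foaf:Person. The document author didn't have to write that down.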

Another example. The foaf:knows relation is defined as one that relates a foaf:Person to another foaf:Person.

A typical usage would be:

<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
     <foaf:Person foaf:name="Edd Dumbill" />
  </foaf:knows>
</foaf:Person>

This is basically RDF's way of saying "there is something that is a foaf:Person and that has a foaf:name of 'Dan Brickley' and that stands in a foaf:knows relationship to something that is a foaf:Person and that has a foaf:name of 'Edd Dumbill'".

It is important to understand that we can omit pieces of this information, without this RDF/XML being 'broken' (invalid etc.) in any formal sense. Or, perhaps more interestingly, we could add information. From RDF's perspective, you are free to choose. It is not up to the creators of FOAF (or MusicBrainz, or Dublin Core) to dictate to you which things you should or should not mention in your RDF documents.

So, we could write this:

<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
     <foaf:Person/>
  </foaf:knows>
</foaf:Person>

...i.e. 'Dan knows someone'. Hardly very informative. Probably somewhat annoying, but perfectly correct RDF. Nothing in that markup violates any rules associated with the FOAF vocabulary. Similarly, we could write:

<rdf:Description foaf:name="Dan Brickley">
  <foaf:knows>
     <rdf:Description  foaf:name="Edd Dumbill" />
  </foaf:knows>
</rdf:Description>

...and it is still just fine. The markup 'rdf:Description' is what RDF uses when you mention something but don't happen to mention its type. So here, we are just saying "there is something with a foaf:name 'Dan Brickley' that foaf:knows something else with a foaf:name 'Edd Dumbill'". Again, true, but slightly less informative. We didn't mention that these things were people. Although that information could be deduced because we know that foaf:knows relates a foaf:Person to a foaf:Person, it is sometimes helpful to be explicit.

The critical thing to remember: don't assume it is broken because you don't see 'foaf:Person' in the markup. That's a feature not a bug, as it lies at the heart of RDF's free-form extensibility. Since we want FOAF to be easily extended by independent parties, without breaking the core interop provided by RDF and the basic FOAF vocabulary, this is a freedom to be valued.

Here is another example:

<foaf:Person foaf:name="Dan Brickley">
  <foaf:knows>
     <wordnet:Programmer foaf:name="Edd Dumbill" />
  </foaf:knows>
</foaf:Person>

Here we give a more detailed type than foaf:Person. You can look up wordnet:Programmer in the Web for its RDF definition, which tells us amongst other things that a programmer is "a person who designs and writes and tests computer programs".

So in addition to being able to deduce from the foaf:knows relationship that Edd is a foaf:Person, the markup tells us that his skills include programming. Unlike in a strictly typed OO programming environment, however, Edd can have lots of independently defined 'types', and FOAF files can mention any or none of these according to need and circumstance.

Just as missing out information isn't wrong, nor is adding more information. From an XML perspective, it is both tempting and natural to see this as too liberal and free-form. XML encourages us to think about this problem in terms of tags: "what tags can appear inside a foaf:knows? is foaf:Person allowed? what about wordnet:Programmer?". Unfortunately that doesn't scale well, since it requires a painful amount of coordination amongst the parties defining these vocabularies.

RDF was designed to expect the unexpected. You don't need anyone's permission to invent new tags, or to have your tags 'go inside' their tags or vice versa. This is hugely liberating, particularly for FOAF because so many problem domains overlap in this space, and life is too short to spend in standardisation committees arguing about XML schemas for dictating whose XML tags enclose whose.

The wordnet vocab wasn't designed to be used inside foaf:knows, and the foaf:knows property wasn't designed to have wordnet:Programmer inside it. They were both designed to work within the Resource Description Framework (RDF), an approach to XML data mixing which allows such vocabularies to be freely mixed and combined without having everybody agree 1:1 how their vocabularies may or may not be combined.

So... missing isn't broken.

But it isn't always polite, either. With freedom comes responsibility. RDF and the Semantic Web provide a platform for exchanging XML documents that encode somewhat freeform claims about the world. FOAF was created to help explore practical deployment issues for 'RDF in the wild', and one issue we're currently working on is this balance between freedom and expectations. It is all well and good having a super-flexible way of saying anything-about-anything in FOAF RDF files, but where does that leave us poor developers?

If the stark notion of 'valid -vs- invalid' document checking doesn't make sense in the decentralised Semantic Web environment, how can we make things easier for developers who are trying to work with this free-flowing mix of RDF markup? If nothing is mandatory, then how can they write code that knows what to expect?

There are several answers here. The first is that, if we want this to scale to the planet, we have to accept that one size won't fit all, that different parties will want to say quite varying things in their FOAF documents, and that our ability to impose our views on their documents is limited.

What can developers take for granted when reading a FOAF file? This is the key question...

They should be able to assume it is well-formed XML+Namespaces, and that it is structured according to the RDF syntax specification. And that it probably makes use of the FOAF vocabulary, typically alongside others such as DC, RSS1, Wordnet...

Beyond that, what should we strive for? Here we move into the world of best practice, etiquette, user guidelines and other forms of 'soft' documentation. In the FOAF world, these are only now beginning to take shape (in the wiki, on rdfweb-dev and in this weblog). For FOAF, this sort of documentation is more important than schema-based validation. In FOAF, you can't get it technically wrong by missing out information, but you can make a nuisance of yourself by writing unnecessarily obscure FOAF.

We are discussing (on the FOAF list, rdfweb-dev) possible 'common subset' properties which it might be reasonable to assume people will use in their FOAF files. These can't be mandatory, but may (alongside tools such as foaf-a-matic) help guide people into using some common core properties.

But even choosing those properties is tricky! How do we name people? (it turned out that in Japan, many users prefer to use an informal foaf:nick, and omit their foaf:name property). How do we identify people? By foaf:mbox, foaf:mbox_sha1sum, foaf:homepage, or an Instant Messenger chatID? The answer seems to be "one or more of the above...".
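For developers, "one or more of the above" translates into code that asks for what it would like and copes with what it gets. The Python sketch below shows one way that might look. It assumes a person's description has already been flattened into a dictionary keyed by property name, which is a simplification of mine; the helper names and the "(unnamed)" fallback are also invented for illustration.

```python
# Properties mentioned in this weblog as ways of identifying a person.
IDENTIFYING = ("foaf:mbox", "foaf:mbox_sha1sum", "foaf:homepage", "foaf:aimChatID")


def display_label(person):
    """Pick a label from whatever is present: foaf:name first, then foaf:nick."""
    return person.get("foaf:name") or person.get("foaf:nick") or "(unnamed)"


def identifiers(person):
    """Return the (property, value) pairs usable for identification; may be empty."""
    return [(prop, person[prop]) for prop in IDENTIFYING if prop in person]


# Three descriptions of varying completeness; none of them is 'broken'.
people = [
    {"foaf:name": "Dan Brickley", "foaf:homepage": "http://rdfweb.org/people/danbri/"},
    {"foaf:nick": "kota"},
    {},
]

for person in people:
    print(display_label(person), identifiers(person))
```

The point of the design is that a missing property changes what the code can do with a description, not whether the description is accepted.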

So the conclusion here is not to look to the evolving FOAF specification for black/white answers about what a FOAF file 'should' contain. The FOAF spec is like a dictionary, specifying the meaning of the things you use in your FOAF, but leaving it up to you and to emerging best practice as to what exactly you write.

In other words, RDF gives us the freedom to say whatever we like in our FOAF files, and we need to complement the formality of the RDF Schema and Web Ontology Language definitions for FOAF (see schema) with better documentation for users and developers helping them make their way in this strange new world...

Posted by danbri at 12:22 PM | Comments (10)

July 13, 2003

Handshake for OSX design notes

Dan Hon is working on a Java/Cocoa based MacOS X desktop tool using FOAF, called 'Handshake'. He's posted some draft user interface screens, all of which looks very promising. I look forward to trying it...

Posted by danbri at 11:59 PM | Comments (0)

FOAFbot and Gnome Dashboard

Edd Dumbill writes with news of his latest work on FOAFbot, the chat-based FOAF aggregator.

FOAFbot is now based on the 'twisted' framework, opening up possibilities for various new interfaces to the data beyond the current IRC one. This is interesting as it makes it easier for people to adapt FOAFbot, e.g. for instant messenger or HTTP interfaces, or to hook it up to alternate front-ends such as foafnaut. Edd's weblog provides more details on all this, including links to source code and a teaser screenshot of FOAFbot data showing up in Gnome Dashboard. Nice work! :)

Posted by danbri at 02:38 PM | Comments (0)

nearestAirport documentation in Japanese

From kota's weblog, details on using 'nearestAirport' in FOAF files. Thanks kota!

The 'contact:nearestAirport' property is a way of indicating very broadly which part(s) of the world you're from, without needing to know exact coordinates or giving away too much detail. It isn't part of the FOAF vocabulary, but can be included as an extension in any FOAF file. A few 'nearestAirport' links: pixel's writeup, swad-e developer map, FOAF people map

See also FOAF overview in Japanese, Japanese FOAF wiki

Posted by danbri at 11:54 AM | Comments (0)

SemaView, Social Networking and FOAF

SemaView have a nice writeup of their work with RDF, Semantic Web and FOAF: Social Networking utilizing the Intelligent Internet

Actually it was published back in March, but I missed the chance to write about it then. Better late than never. The article introduces the basic concepts of the Semantic Web and RDF using their FOAF browser, built using Java and PHP. The article goes on to talk about the potential business value of such work, giving a brief case study of the Ecademy networking site.

The other publications on SemaView's site are worth a read too. They've made an effort to provide a friendly overview of Semantic Web technologies, and to provide a business-oriented perspective as well as a technical one.

Posted by danbri at 10:56 AM | Comments (0)

Updated Autocreation tool from digiboy

Marcus Campbell has updated his FOAF Autocreation tool.

The FOAF autocreation script takes two links - one to your current FOAF file and one to your OPML blogroll - and produces a brand new FOAF file for you by attempting FOAF autodiscovery on every site your OPML file mentions. See the autofoaf page for more details on this tool.

So this is yet more reason to take a few seconds to make an auto-discovery link to your FOAF file from the <head> of your Web page, to give software tools a hint about how to find it. This auto-discovery syntax is detailed in the FOAF spec, but here it is once more as a reminder:

 <link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf" />

...where "foaf.rdf" is a relative link to your FOAF file, with the name changed appropriately.
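On the consuming side, following that hint takes only a few lines. This Python sketch uses just the standard library to find such links in a page and resolve them against the page's own address. The class and function names are invented for illustration, and it assumes the page has already been fetched into a string; it doesn't claim to match what any particular FOAF tool does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FoafLinkFinder(HTMLParser):
    """Collect href values from <link rel="meta" type="application/rdf+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        is_rdf = attributes.get("type", "").lower() == "application/rdf+xml"
        if "meta" in rels and is_rdf and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def discover_foaf(page_url, html):
    """Return absolute URLs of the RDF files that the page's <link> tags point to."""
    finder = FoafLinkFinder()
    finder.feed(html)
    return [urljoin(page_url, href) for href in finder.hrefs]


page = """<html><head>
 <link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf" />
</head><body>Hello</body></html>"""

print(discover_foaf("http://alice.example.com/blog/index.html", page))
```

Resolving the href against the page URL is what lets the link be relative, as in the reminder above.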

The upcoming TypePad weblog hosting service from Six Apart also makes use of this auto-discovery markup, as does the XML::FOAF Perl software library, so it is increasingly worthwhile adding in the FOAF autodiscovery link.

The basic idea for FOAF auto-discovery is to remove the need to remember (and type in) a separate URL for your FOAF file. Instead of remembering both your homepage and FOAF address, just know the former and make sure it references your FOAF file in a way that software tools can follow. Less work for humans, more work for machines, just as it should be.

Further reading: Autodiscovery page in the FoafProject wiki.

Posted by danbri at 10:16 AM | Comments (1)

FOAF Contradictions

Q: If I can say what I like in a FOAF file, even say nothing, and if I can use any semantic web vocabularies at all, all mixed together, how can we ever know if a FOAF file is 'wrong' (broken, in error)?

A: Which answer do you want...? ;)

One part of the answer relates to the detection of inconsistencies in FOAF data. In particular, checking for documents that contradict themselves is becoming possible, thanks to our use of W3C's Web Ontology Language (OWL).

So I wrote a bit about this in the FOAF wiki, see the FoafContradictions article there. I hope to expand on it with more examples and detail about how OWL works, so am writing in wiki rather than weblog mode this time. It should be readable and hopefully useful now.

A natural topic for further attention would be the discovery of disagreements between documents. That's a rich area to explore, as it combines a variety of techniques, eg. logical (people only have one foaf:dateOfBirth) and statistical (20% of FOAF files think my surname is 'Brinkley', maybe they're right...). This is an important topic as it relates to trust strategies, to dealing with stale / dated information, and to the practical problems inherent in any 'semantic web search engine' efforts. But I didn't write about it yet. Take a look at the FoafContradictions piece and let me know if that's a useful level of detail to attempt...

Posted by danbri at 03:16 AM | Comments (0)

July 10, 2003

Identifying things in FOAF

There is growing interest in FOAF and its relationship to various approaches to "identity management" on the Internet. The FOAF approach to all this is distinctly pluralistic, to the extent that you might not even notice that there is a FOAF way of dealing with identity. There aren't, for example, 'FOAF identifiers' as such, although there is certainly a FOAF approach to identifying things. So this is a first cut at writing up some of the as-yet-unarticulated design assumptions behind FOAF. A more user-friendly version would have examples; those will have to come later.

So here's the basic story. FOAF is built on top of W3C's Resource Description Framework (RDF), which itself uses XML and Unicode as file format standards. All FOAF documents are RDF documents, and any RDF application vocabularies (such as Dublin Core, RSS 1.0 core + extensions, MusicBrainz, Wordnet etc.) can be used within FOAF documents. FOAF shares with RDF a concern to use standard Web identifiers (URIs) wherever possible. The URI specification (RFC 2396) provides a common syntax for naming things on the Web, providing an umbrella concept which covers both 'URLs' and 'URNs'.

To the extent that everything we want to talk about has a well known URI, this solves all our problems. Lots and lots of things that we want to talk about do have URIs. There are URIs for Web pages, for mailboxes, for Java classes, for telephones, for ISBN-registered publications, and so on. This is great - when you want to talk about one of these things in a FOAF file, you just mention its URI. Simple, decentralised, standard.

However, our story doesn't end here. FOAF needs to play in a world where we don't all have total knowledge of every relevant fact. Sometimes a thing might 'have' a URI (in some pedantic sense) yet 99% of parties on the Web might not know what that URI is. Or, closer to my main theme, we might want to talk in our FOAF files about things that it has proved peculiarly difficult to get agreement about identifying. People, for example.

Just try setting up a planet-wide system for identifying people and you'll see my point. There is significant resistance to the idea of creating a single set of identifiers used to 'tag' everyone. To put it mildly. So... where does this leave FOAF? FOAF documents are scattered around the Web, and each document makes a unique contribution to a bigger picture which can only be seen when those documents are merged together. In FOAF, we need to identify people, without there being agreement on person-identifiers. Tricky!

So here is the good news. RDF was designed for generic, cross-domain data merging. Imagine taking two arbitrary SQL databases and merging them, so that your new database could answer questions which required knowledge of things which were previously described partially in one dataset, and partially in another. That sort of operation is hard to do, because SQL wasn't designed in a way that makes this easy. Neither was XML. But RDF was, and FOAF is built as an RDF application. In RDF, there are off the shelf software tools which can take RDF documents, 'parse' them into a set of simple 3-part statements (triples) which make claims about the world, and store those statements alongside others in a merged RDF database. To the extent that both datasets use the exact same identifiers when mentioning things they describe, you get a rather handy data-merge effect.
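The data-merge effect is easy to see in miniature. In this Python sketch each RDF document is modelled as a plain set of (subject, property, object) tuples, which is an assumption of mine and glosses over details such as unnamed nodes. The two example files and the helper names are made up. When both files use the same URI for the page, merging is nothing more than set union.

```python
def merge(*graphs):
    """Merge graphs, each modelled as a set of (subject, property, object) triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, prop):
    """Return every value the graph gives for this subject and property."""
    return {o for s, p, o in graph if s == subject and p == prop}


PAGE = "http://alice.example.com/why-bob-is-great.html"

# Two independent files that happen to mention the same page by the same URI.
file_a = {(PAGE, "dc:title", "A page all about Bob, by Alice")}
file_b = {(PAGE, "dc:creator", "Alice")}

merged = merge(file_a, file_b)

# The merged data can answer a question neither file could answer alone.
print(objects(merged, PAGE, "dc:title"), objects(merged, PAGE, "dc:creator"))
```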

So here is the (not very) bad news. If two different RDF files (eg. FOAF documents) are talking about the same thing but don't use exactly the same URI when mentioning that thing, how are our poor stupid computers supposed to be able to understand? In the real world, we want to write RDF documents (eg. for FOAF) about things that we've not yet agreed on common identifiers for. This is one of the core problems we've had to address in FOAF.

Basically, off the shelf RDF tools can still do a lot to help us, but we have to help them. FOAF, as an application that focusses on the distributed, decentralised, almost out of control use of RDF 'in the wild', ran into this problem after we had about half a dozen FOAF files. There are now hundreds, soon thousands, of FOAF documents. Most of them talk about people, quite successfully, despite the absence of a global person-id registry. This sounds like a recipe for chaos, yet somehow many of our FOAF aggregation tools are quite happy with this situation. They can often figure out when two files are about the self-same thing, without much help from the authors of those documents. We do this using what might be called "reference by description". Instead of saying, "this page was created by urn:global-person-registry:person-n22314151", we say "this page was created by the person whose (some-property...) is (some-value...)", taking care to use an unambiguous property such as foaf:homepage or foaf:mbox_sha1sum.

Here's how it works. Recall that FOAF is built on top of RDF, and so every FOAF document boils down to nothing more than a set of 3-part statements which relate two things together via terms such as 'workplaceHomepage', 'homepage', 'mbox'.

I am related to those things that are my homepages; FOAF's name for that relationship is 'foaf:homepage'.

I am related to those things that are my personal mailboxes by a relationship FOAF calls 'foaf:mbox'.

I am related to the strings that you get from feeding my mailbox identifiers to the SHA1 mathematical function by a relationship FOAF calls 'foaf:mbox_sha1sum'.

I am related to a Myers-Briggs personality classification; FOAF calls that relationship 'foaf:myersBriggs'.

I am related to my workplace homepage (http://www.w3.org/) by a relationship called -- you guessed it -- 'foaf:workplaceHomepage'.

I am related to my name, 'Dan Brickley' by the 'foaf:name' relationship.

I am related to my AIM chat identifier by a relationship FOAF calls 'foaf:aimChatID'.

And so on. Other RDF vocabularies can define additional relationships (see the FoafVocab entry in our wiki for pointers). They all relate things to other things in named ways. A FOAF document, like any RDF document, is simply a collection of these simple claims about how things in the world relate.
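Of the relationships listed above, foaf:mbox_sha1sum is the one that involves a computation, so here is a minimal Python sketch of it. It assumes, following my reading of the FOAF spec, that the hash is taken over the full mailbox URI including the 'mailto:' prefix; the function name and the example address are made up.

```python
import hashlib


def mbox_sha1sum(mbox):
    """Return the hex SHA1 digest of a mailbox URI such as 'mailto:alice@example.com'."""
    # Assumption: the 'mailto:' scheme is part of the hashed string.
    if not mbox.startswith("mailto:"):
        mbox = "mailto:" + mbox
    return hashlib.sha1(mbox.encode("utf-8")).hexdigest()


print(mbox_sha1sum("mailto:alice@example.com"))
```

The result is a 40-character hexadecimal string that can be published in place of the address itself, while still letting tools match it against other files that carry the same hash.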

But look again. There is a hidden pattern here. Some of these relationships are special.

foaf:homepage, foaf:mbox, foaf:mbox_sha1sum, foaf:aimChatID fall in one category.

foaf:workplaceHomepage, foaf:myersBriggs, foaf:name fall in another.

Here's the difference. The former kinds of relationship (or 'property' in RDF-talk) have a special characteristic. They have been defined such that there is at most one thing in the world that has any particular value for that property.

There is... at most one thing in the world with any given foaf:homepage. Or foaf:mbox, or foaf:mbox_sha1sum, or foaf:aimChatID. By contrast, there may well be multiple things in the world with the same foaf:workplaceHomepage, or foaf:myersBriggs, or even (it's a big world) foaf:name. Apparently there's another Dan Brickley out there. And lots of my colleagues share my workplace homepage. And there are a lot of people whom Myers-Briggs surveys classify as 'INTP'. But there is nobody else at all who has the same foaf:homepage as me, or the same foaf:mbox. Or foaf:aimChatID.

This is one of the design principles underlying FOAF (and for that matter the entire Semantic Web effort): a pragmatic, pluralistic approach to resource description and identification. Rather than building big, centralised registries of people (or companies, or physical things) we look for cheaper, more lightweight shared strategies for identification. In FOAF, we do this by making sure there are multiple ways we can identify things.

So one FOAF file might mention 'here is a photo; it depicts the person whose mailbox is danbri@rdfweb.org'. Another FOAF file might say 'here is a weblog entry written by the person whose homepage is http://rdfweb.org/people/danbri/'. A third FOAF file might say, 'here is a chat transcript by the person whose foaf:aimChatID is danbri_2002'. To the extent that there is publicly readable RDF in the Web that makes all these claims, and that there is, perhaps scattered around, enough information to deduce that these all describe the same person, RDF/FOAF tools can 'smush' it all together. They could 'realise' that the photo and the weblog and the chat log were all associated with the self-same thing, i.e. me.

To do that, we need certain pieces of information. We need to know which, of all the kinds of relationship there are, are the uniquely identifying ones. In RDF terminology we call these unambiguous (or more technically, inverse-functional) properties. When RDF software reads the FOAF spec it can determine this from markup embedded in the document itself. So machines can find out quite easily which properties are ones which uniquely identify people. They can do this for the FOAF spec, and for any other RDF vocabulary that is used alongside FOAF.

The other bit of information needed is that somewhere in the Web, it would need to be claimed that there is a person who has a mailbox of ... and a homepage of ... and an aimChatID of ...

If that information is available, then FOAF tools are all set to do the data merge, even though there is no planet-wide unified identification system for people. We don't use anything else except off the shelf standards: URIs plus W3C RDF and OWL technology.
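Here is a rough Python sketch of that merge, using the photo, weblog and chat transcript scenario from earlier. It is an illustration under stated assumptions rather than a description of how any existing FOAF aggregator works. Statements are (subject, property, object) tuples, the set of identifying properties is hard-coded rather than read from the FOAF schema, and labels like "_:a" are made-up names for the unnamed person in each file, already made distinct per file. The dc:creator property stands in for 'written by'.

```python
# Properties treated as uniquely identifying (inverse-functional) in this sketch.
IFPS = {"foaf:homepage", "foaf:mbox", "foaf:mbox_sha1sum", "foaf:aimChatID"}


def smush(triples):
    """Rewrite a set of triples so nodes sharing an identifying value become one node."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller label so the outcome doesn't depend on input order.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    first_seen = {}  # (property, value) -> first node seen with that value
    for s, p, o in triples:
        if p in IFPS:
            if (p, o) in first_seen:
                union(first_seen[(p, o)], s)
            else:
                first_seen[(p, o)] = s

    return {(find(s), p, find(o) if o in parent else o) for s, p, o in triples}


MBOX, HOME, AIM = "mailto:danbri@rdfweb.org", "http://rdfweb.org/people/danbri/", "danbri_2002"

data = {
    ("photo.jpg", "foaf:depicts", "_:a"), ("_:a", "foaf:mbox", MBOX),       # file 1
    ("entry.html", "dc:creator", "_:b"), ("_:b", "foaf:homepage", HOME),    # file 2
    ("chat.txt", "dc:creator", "_:c"), ("_:c", "foaf:aimChatID", AIM),      # file 3
    # The linking claim: one person has all three identifying values.
    ("_:d", "foaf:mbox", MBOX), ("_:d", "foaf:homepage", HOME), ("_:d", "foaf:aimChatID", AIM),
}

for triple in sorted(smush(data)):
    print(triple)
```

Without the last three statements nothing would merge, which is exactly the point made above: the tools can only connect the photo, the weblog entry and the chat log if somewhere it is claimed that one person has all three identifying values.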

If you find the data merging potential creepy, you are not alone. This kind of technology is not going away, but there are steps you can take. A full discussion of the privacy aspect isn't possible here, but the basic idea is (i) be aware -- scattered information can easily be merged (ii) keep things as secret as they need to be. Don't tell the world (in your FOAF file or elsewhere) all the chat IDs and homepages and mailboxes that you use, then act surprised when people and machines piece together your scattered contributions to the Web. Reading up on PGP might be a good idea.

We don't need to wait for a global identity management system before privacy and data merging become an issue. FOAF is intended to explore these issues, and to provide some advance warning for the way certain aspects of semantic web technology may affect our lives. Just as the world has had to adapt to the notion of 'being Googled' and having things that once seemed obscure now all too easily found, the rise of semantic web technology needs to be accompanied by an understanding of the risks and opportunities that 'being identified' presents.

Finally... a couple of points of further reading on the technical rather than social side of this problem. A couple of years ago I wrote a brief note on aggregation strategies which describes the 'smushing' problem. A more recent writeup by Matt Biddulph describing his Java implementation is worth a read too, as are many of the documents from the TAP project, which share FOAF's concern for reference-by-description. Guha and Rob's overview paper sets out the issues very clearly.

Posted by danbri at 12:05 PM | Comments (4)