# ETech 2004: Foaf session
#
# Dan Brickley - fear of a foaf planet
#
# abstract: http://conferences.oreillynet.com/cs/et2004/view/e_sess/4757
#
# [a pretty verbatim transcript, dropped some of the 'kindas' and suchlike]
# thanks to dav for the movie.
#
# danbri@rdfweb.org
#
# todo: transcribe Q+A, Edd's foafbot talk, link to slides etc.
#
#######################################################################

Ok so I'm talking about this FOAF project -- friend of a friend project -- It's been going behind the scenes since around 2000. I'm here with 2 hats on really ... here as a W3C representative, where I spend my dayjob working on formal web standards... but I'm talking about a side project I've been working on with friends: "friend of a friend".

[audience: try talking louder! etc.]

friend of a friend project

I just want a show of hands thing at the start. How many of you have heard of xml? [lots] ...of rdf? [respectable] of foaf? interesting. How many of you have got foaf files? ...so, I'm surprised, that's a lot of files.

So, what I really want to do is too many things at once here. I want to talk a bit about FOAF as a technology... and most of its technical characteristics are inherited from its use of RDF, and its use of XML. I want to talk a bit about its social impact, and probably not enough about either to really get into the interesting details.

So... what is this foaf project? what can we do with FOAF files? and why do so many people seem to find it interesting? how do the technical and social aspects interrelate...? and where are we up to?

The basic idea is pretty simple. It's machine readable homepages. Initially homepages for people, but also companies, organisations, anything that uses the Web. It's an exploration of the idea of a Semantic Web, a Web of machine-readable pages. It's an experiment in 'just doing it' really. A few years ago... the first FOAF file was my homepage...
the second FOAF file was my friend Libby's home page, and it's just gone from there. Last week I heard that LiveJournal were about to switch on FOAF export, another 2 million FOAF pages about to be on the Web. So... it's interesting times...

The two core concepts I think with FOAF are this notion of FOAF files or FOAF profiles, RDF files in the Web. And the notion of FOAF as a vocabulary, as a kind of dictionary of terms you can use to say things in FOAF files. So FOAF gives you markup for saying things like -- if you're talking about People -- their mailboxes, their homepages... For saying of a person what their workplaceHomepage is, which is a kind of nerdy way of saying where they work. A way of saying where they work that is quite easy for computers to deal with, so you can run queries against it to say, 'find me the weblogs of people who work for the place whose homepage is w3.org'.

The really interesting thing with FOAF is the really interesting thing with the Web: connectivity. That my FOAF file has a reference to Libby's FOAF file, which has a reference to Edd's, and so on and so on. So you can feed these things to a harvester, just in the same way that traditional harvesters traverse from page to page, but these harvesters are indexing machine-readable assertions about the world, rather than just a list of words from human language.

So just to give a quick example... It's a bunch of angle brackets. It's some XML. I took this from a very nice article on XML.com that Leigh Dodds wrote last week... it says, this person, Peter Parker, [...] has a foaf:gender property of 'male', a foaf:title of 'Mr', it has given name and family name. It gives a hash of Peter Parker's mailbox, which is a kind of sneaky way of identifying this person in the absence of a planet-wide identifier scheme, so, this person, with these details... they have a homepage, weblog, and they know another person... this markup at the bottom here.
Peter Parker knows Harry Osborn; Harry Osborn has such'n'so homepage, and with respect to Harry, see-also this RDF file over here. And it's this little bit in bold that's really made this an interesting project to me. Once you've got that, that capacity for linking amongst machine-readable files, you've got the basis for scooping it all up and pushing it into a database.

And what do you get when you do that? This [foafnaut slide] this is one of several visualisations and user interfaces that people have built with FOAF. Edd's going to talk about a textmode interface later. Just to give you a sense of the kind of data we've got. People, connected to people, by a variety of named relationships. People described in a variety of machine-readable files, and then arbitrary other characteristics of those people. And this is the really fun thing in terms of the structure of FOAF... we didn't in the FOAF specification nail down once and for all, what it is you can say about people. So, each of the people described can say as much or as little about themselves, about their colleagues and friends, as they choose to.

So what we were really trying to do here was to take a thought experiment and roll it out and see what happens. What would it be like if machines could read our homepages? And, not wanting to wait for Artificial Intelligence or Natural Language Processing and so on, we took the approach of "dumbing down" self-descriptions to a level that is well suited to machine processing.

So the next obvious question there is: well, what do we lose when we dumb down the subtleties of interpersonal relationships, the subtleties of self-description, into a machine-friendly form? This debate has come up again in the "social networking" discussions, where people present themselves through Friendster, through Orkut, for example. And from a more geeky perspective, what might search engines evolve into, as we move from indexing the words in a page to indexing claims about the world?
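[A minimal sketch, not from the talk, of the harvesting idea described above: start at one FOAF file, follow its see-also links to other files, and pool every statement into one store. The `FAKE_WEB` dictionary, the example.org URLs and the `harvest` function are all hypothetical stand-ins; a real harvester would fetch each URL and parse the RDF/XML instead of looking it up in a dictionary.]

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical stand-in for the Web: URL -> (subject, predicate, object)
# statements that a real harvester would obtain by fetching and parsing.
FAKE_WEB = {
    "http://example.org/peter.rdf": [
        ("_:peter", FOAF + "name", "Peter Parker"),
        ("_:peter", FOAF + "knows", "_:harry"),
        ("_:harry", FOAF + "name", "Harry Osborn"),
        ("_:harry", RDFS + "seeAlso", "http://example.org/harry.rdf"),
    ],
    "http://example.org/harry.rdf": [
        ("_:me", FOAF + "name", "Harry Osborn"),
        ("_:me", FOAF + "homepage", "http://example.org/harry/"),
        ("_:me", RDFS + "seeAlso", "http://example.org/peter.rdf"),
    ],
}


def harvest(start_url, fetch=FAKE_WEB.get, limit=100):
    """Breadth-first crawl over see-also links, pooling all statements."""
    store = []   # the "database": (source_url, subject, predicate, object)
    seen = set()
    queue = deque([start_url])
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue  # already visited, so link cycles terminate
        seen.add(url)
        for s, p, o in fetch(url) or []:   # unknown URL: nothing to add
            # Keep the source URL with each statement, so that whoever
            # reads the store can still ask who said what.
            store.append((url, s, p, o))
            if p == RDFS + "seeAlso" and o not in seen:
                queue.append(o)
    return store, seen


store, seen = harvest("http://example.org/peter.rdf")
```

Recording the source URL next to every statement is a design choice made for this sketch: the node labels such as `_:me` only mean something within one file, and the question of whether to believe a statement depends on where it came from.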
So we're trying to sneak up on the big hard problems, through a very simple technology. It is at heart just a file you put on a Web site, or a file that someone exposes from a service on your behalf.

I've got some kinda wordy examples here, of the kind of things you can say in FOAF. I can say... simplistic atomic statements about myself and other people. I can say, "I'm Dan, and I work at W3C, I know Libby who works at ILRT, and her FOAF profile is blah-blah-blah over here. And here are the titles and descriptions of some documents we wrote together". Or I can say, "here's a photo I use on my homepage. Here's a bunch of other photos. And here are the people in those photos."

I can use any FOAF extension at all in these descriptions. And "FOAF extension" in this sense is any other RDF vocabulary that people have created for use within the RDF framework. So there are... maybe some of you here have heard of MusicBrainz, or Dublin Core, or Creative Commons. Each of those RDF applications gives you terms that you can plug into these descriptions. So if I want to talk about rights over these documents, use Creative Commons. If I want to talk about music, e.g. to say that "I really like Massive Attack", I use MusicBrainz. We try not to do too much in FOAF, and to come up with an architecture where we can plug in other people's work.

So... my dayjob is acronyms. This is the acronym view. FOAF uses XML as a data format. It uses RDF as a data model, a set of conventions over the top of XML. It also uses this thing called OWL, the Web Ontology Language. I'm very pleased to say that both OWL and RDF became W3C recommendations yesterday, which is [clapping!] ... it's so nice to be able to say that, it's been years... So, this is a practical application of RDF and a practical application of OWL.

The current FOAF spec defines 50 or so terms for talking about the world, for making very simple claims. And we use OWL [...] for the following reasons.
RDF, in a sense, guarantees the freedom of independent extension. Because we're describing people, because people are such interesting, complex, political beasts, there is no way that a single spec, written by a bunch of primarily technically oriented people, is really going to ever do a complete job of capturing the things you'd want to say of people. So what we tried to do instead was find a way of using RDF, so that other people's descriptive concerns could be plugged in there. So RDF guarantees that. It guarantees that a FOAF file can have arbitrary other ways of talking about people in the file.

OWL provides us also with something pretty important. Algorithms for data merging. If you think about the problem of independent parties scattered around the Web, trying to describe each other. There is no planet-wide identification system for people. There is no planet-wide identification system for companies, ... Despite that, what we need to do is to be able to fold data together from multiple sources, and figure out when they're talking about the same things. Without getting into the detail, that's really what OWL gives us a lot of off-the-shelf tools for.

A quick recap. It's a new kind of Web page; it's a Web page for machines. We try to echo the freedom and flexibility you have in your own homepage, in machine-readable form... you might intuitively think that because it's machine-readable it's going to be kind of stilted, static, rigid, and ugly. It's a very low-tech approach, it's like RSS in the sense that, in its simplest form, it's just a page you put on a Web site. And we've got quite a long way with just putting pages on Web sites. But we run into issues that allow us to explore the harder things. So we've been PGP-signing these pages, for example, or encrypting them. It's a Web of files describing people, and webs of people.
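[A minimal sketch, not from the talk, of the data-merging idea: folding together descriptions from different sources when they turn out to be about the same person. It assumes the mailbox hash mentioned in the Peter Parker example is used as the identifying property, and that the hash is the SHA1 of the full mailto: URI, as with foaf:mbox_sha1sum. The function names, the dictionary layout and the example.org addresses are hypothetical, and a real system would work on RDF statements with an OWL-aware toolkit rather than on plain dictionaries.]

```python
import hashlib
from collections import defaultdict


def mbox_sha1sum(mailbox):
    """SHA1 hex digest of the full mailto: URI of a mailbox."""
    uri = mailbox if mailbox.startswith("mailto:") else "mailto:" + mailbox
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def smush(descriptions):
    """Merge descriptions that share the same mailbox hash.

    Each description is a dict with an 'mbox_sha1sum' key plus any other
    properties. Two descriptions with the same hash are treated as being
    about the same person, and their property values are pooled.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for desc in descriptions:
        key = desc["mbox_sha1sum"]
        for prop, value in desc.items():
            if prop != "mbox_sha1sum":
                merged[key][prop].add(value)
    # Sort the pooled values so the result is stable and easy to compare.
    return {
        key: {prop: sorted(values) for prop, values in props.items()}
        for key, props in merged.items()
    }


# Hypothetical descriptions harvested from three different files.
peter = mbox_sha1sum("peter@example.org")
harry = mbox_sha1sum("harry@example.org")
people = smush([
    {"mbox_sha1sum": peter, "name": "Peter Parker"},
    {"mbox_sha1sum": harry, "name": "Harry Osborn"},
    {"mbox_sha1sum": peter, "homepage": "http://example.org/peter/"},
])
```

Here `people` ends up with two entries, and the entry for the first hash carries both the name and the homepage even though no single source stated both. Note that nothing in this sketch checks whether the sources are telling the truth.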
I don't have time to get into the nitty-gritty of these issues, but the kind of things we've run into include the need to plan for lying, to plan for people being mischievous... If you remember the fuss at Friendster about "Fakesters". I don't know how many of you followed this, but on the Friendster site, people were creating playful cartoon characters, and they were being deleted because they weren't true descriptions of the world. In the FOAF universe, we just don't have that control. If you create a FOAF file, out there on the Web, and it describes Peter Parker, or Mr Benn, I can't delete that file, 'cos it's on your Web server. So we need an architecture that allows us to survive with lies, survive with half-truths.

Anyone can say anything about anyone, using any RDF vocabulary. So there are etiquette issues there, there are privacy and politics issues there. We primarily use FOAF for self-description, but some of the lightning talks later will address its use in activism, where for example we might be talking about politicians.

FOAF increasingly gets lumped with this Social Networking thing. When we first started the FOAF project, a few years back, the big social networking site was Six Degrees. Last year the big social networking site was Friendster. Last week it was Orkut. Who knows what it'll be in 6 months' time. The driving ethic behind FOAF and a lot of this Semantic Web work is this sense that people want their data back, that they want control of their data, they want to be able to migrate it between hosting sites, to be able to host it themselves... A good friend of mine was copying and pasting her profile from Friendster into Orkut last week. There's gotta be a better way, there really has.

These sites also partially describe the world, and present that as a full description of the world... so, "he's on Friendster, she's on Orkut...", they don't show up in each other's friends lists. To my mind, that's just wrong.
There's one world, and we kind of stumble towards describing it. We sorely need import and export between these sites. And the way FOAF was designed, it gives you the basics for doing that. What we don't have in FOAF is a representation yet of all the nitty-gritty ratings, profiles, and "fan" characteristics that these sites use. We don't have a "dating" vocabulary. I'm kind of scared of adding one, but I'm sure it'll happen eventually.

Where are we up to today? I should have come here with stats. There is a growing number of self-hosted foaf-a-matic generated files. Foaf-a-matic is a simple Javascript interface for describing yourself and linking to your friends. If you see these funny little FOAF faces on a Web site that link to a FOAF file... there are a surprising number of them. People have gone to the foaf-a-matic site, created a self description, uploaded that to their site.

On top of that there are an increasing number of services generating FOAF. So Ecademy, a business networking site in the UK. TypePad and Cocolog, the hosted version of Movable Type, they both produce and consume FOAF descriptions. Some interesting characteristics of what they've done there, which maybe we can talk about later. LiveJournal, I heard last week, are about to switch on 2 million FOAF files. And we'll hear later about Tribe. So... getting the data... it's _almost_ easy, a few lines of Perl code and suddenly there's 2 million more FOAF descriptions in the world.

Consuming the data is an interesting problem, and again this is something that distinguishes the approach we've taken from the approach of the monolithic aggregator sites. So you see FOAF user interfaces that use HTML, that use SVG for zooming around loads of people... Edd's going to talk about text-mode interfaces to it. It's a very interesting data set. There's a lot of it out there, a lot of it is public, and it's there for anyone to play with. It's under nobody's control.
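[A minimal sketch, not from the talk, of what "a few lines of code" on the exporting side might look like. The talk mentions Perl; this uses Python, to match the other sketches. The user record layout, the `export_foaf` function and the example.org URLs are hypothetical; only the three namespace URIs and the FOAF terms used (Person, name, homepage, knows) plus rdfs:seeAlso come from the published vocabularies.]

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to write the conventional prefixes instead of ns0, ns1...
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def q(ns, local):
    """Build the '{namespace}local' form that ElementTree expects."""
    return "{%s}%s" % (ns, local)


def person_element(parent, record):
    """Append a foaf:Person describing one (hypothetical) user record."""
    person = ET.SubElement(parent, q(FOAF, "Person"))
    ET.SubElement(person, q(FOAF, "name")).text = record["name"]
    if "homepage" in record:
        ET.SubElement(person, q(FOAF, "homepage"),
                      {q(RDF, "resource"): record["homepage"]})
    if "see_also" in record:
        # Link to the other person's own FOAF file, so harvesters can follow.
        ET.SubElement(person, q(RDFS, "seeAlso"),
                      {q(RDF, "resource"): record["see_also"]})
    return person


def export_foaf(user):
    """Serialise a user and their friends as a small RDF/XML document."""
    root = ET.Element(q(RDF, "RDF"))
    me = person_element(root, user)
    for friend in user.get("friends", []):
        knows = ET.SubElement(me, q(FOAF, "knows"))
        person_element(knows, friend)
    return ET.tostring(root, encoding="unicode")


user = {
    "name": "Peter Parker",
    "homepage": "http://example.org/peter/",
    "friends": [
        {"name": "Harry Osborn", "see_also": "http://example.org/harry.rdf"},
    ],
}
xml_text = export_foaf(user)
```

A hosting site with millions of user records would run something of this shape once per user, which is why the number of FOAF files can jump so suddenly.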
You don't have to wait for the hosting site to hire some user interface designers. You can do a good job (or a bad job) yourself.

The other thing is... FOAF terminology... the little bits of language that we defined in the FOAF spec, are being used in other RDF data formats. They describe people adequately enough, and the RDF design... tries to have a division of labour such that we don't duplicate things. If you're creating an RDF format, say for calendar exchange, and you want to talk about people, you can just plug in FOAF, you don't need to duplicate it.

There's a couple of styles of using FOAF. There's a couple of styles of social networking sites. And we tried to architect FOAF to be neutral between them, although I think there are some cultural biases in the FOAF crowd towards one of them. So, you can be very explicit in a FOAF file. I could say: "Edd's my friend", or I could say "Edd's my _best_ friend". Or I could say "Edd's my arch-nemesis". I could plug in any set of interpersonal relationships that someone else out there decides to make available. That's a very... articulated, social networking site style of talking about sociality.

There's also, and my biases lean this way, a more kind of implicit, evidence-based approach. So we talk about: Liam and I work for the same organisation. Or... Libby and I went to the same school. The two of us co-authored a document, and so on. So you describe facts about the world which have associated with them implicit information about your relationships to someone.

There's quite a lot of work in the FOAF crowd on image metadata, I think because of this bias towards trying to humanise the machine-representation of interpersonal relationships. So we talk about 'co-depiction', y'know, Edd and I are in the same photo. A lot of this stuff, it's processable by machines, you have to look at the picture to get the point.

Applications...
we're going to hear about in more detail in the lightning sessions, but there are a couple of favourites I have. The connection of this stuff to locality... so... having the ability to scan the room to see who's there, with Bluetooth, or there's another application called FoafFinger that uses Rendezvous to get the profiles, to get the weblogs, to get the most recent weblog articles written by people who are with you in the room. If you've ever sat at the station, or been wandering around town wondering who all these people are, but not really wanting to ask them, there's quite an interesting application there.

The activist side of things is my pet hobbyhorse, I think there's a whole separate talk there. One of our cuter apps is a dataset taken from theyrule.net, which describes webs of connections between boards of directors in the USA. I also wanted to mention DeanLink recently, the database of activists on the Howard Dean campaign. There's an RDF view of that dataset, so you can scoop that up and see connections amongst people and collaborators there.

People ask what's FOAF's _for_... I can give use cases. The standard cheesy business traveller use case is: I'm a vegetarian, I always forget to tell the airport, tell the airlines that I'm vegetarian. You should be able to check in at the airport and have their machine read your homepage and go "Hey! You appear to be a vegetarian, but we don't have a special meal listed for you - is there something up here?" That's not rocket science.

Basically, what is FOAF for? What is data for? Any FOAF file is just some RDF, it just contains statements about the world, whatever you might want to use that data for. That's why it's up there. So the vegetarian thing, is an arbitrarily picked use case. Whenever you want data, that's what FOAF's for. So anything you can say in FOAF, you can ask of a crawl of harvested data. Whether you believe that data is a whole other topic...

Just to recap.
FOAF is designed really to be a freeform platform. If you think back to the homepage, your homepage, it's a blank slate, page... you can write what you like there. No-one tells you what you can write, no-one tells you which words you can use. We're trying to reproduce that for machines. It's out of control by design. We don't want to say what words you can use. We don't want to say where it has to be hosted. We don't want to own your data.

I think the technical machinery, after a few years' work in the RDF community, is reasonably there. There are RDF crawlers, there are RDF data stores, there are query systems. There is no single way to deploy FOAF. We're really feeling our way around the options here. Whether it's hosted. Whether it's exported from something like LiveJournal. We're on the borders of going mainstream. A few lines of code separate us from there being millions of these files. That seems kinda scary.

Although I think we've got the technological aspects sorted, what we don't really have is a sense of the legal, privacy, etiquette issues. Within the closed world of say Orkut, you get this awkwardness of someone saying they're a fan of you, someone giving you 4 stars for sexiness... it's kinda unsettling for a lot of people. It's been unsettling for people to see the marital styles and sexual biases of their business colleagues. But it's been within the scope of a particular site. Now what happens when we take that and we scale it to the Web? We make it possible for you to say that about anyone, you don't have to log in on Orkut....

All the stuff we didn't have time to talk about...

[end]

Questions? [transcribe later]