SWIPE is a simple RDF vocabulary that provides some basic facilities to support the extraction of structured RDF data from arbitrary HTML, XHTML and pseudo-HTML textual content. SWIPE can be used to support simple screen-scraping and meta-search applications, or extended (like RSS) to describe interfaces to Web data services more richly.
Latest Version: http://rdfweb.org/2001/01/swipe/
This is a working discussion note and should not be considered a stable target for implementation.
Comments should be directed to the RDFWeb-dev mailing list (email@example.com), copying RSS-Dev (firstname.lastname@example.org) for RSS-related issues.
Copyright © 2000,2001 by the Authors.
(this paragraph copied without permission from the RSS 1.0 Specification :-)
Permission to use, copy, modify and distribute the SWIPE Specification and its accompanying documentation for any purpose and without fee is hereby granted in perpetuity, provided that the above copyright notice and this paragraph appear in all copies. The copyright holders make no representation about the suitability of the specification for any purpose. It is provided "as is" without expressed or implied warranty.
This copyright applies to the SWIPE Specification and accompanying documentation and does not extend to the data format itself.
SWIPE is a simple RDF vocabulary that provides some basic facilities to support the extraction of structured RDF data from arbitrary HTML, XHTML and pseudo-HTML textual content. The SWIPE vocabulary is in particular intended for use with ill-formed markup (typically online search results) that may not be parseable using more formal SGML and XML based tools. SWIPE descriptions are associated with one or more online searchable services that are typically accessed using the HTTP protocol, and that typically return HTML or pseudo-HTML in response to a search request consisting of a number of attribute/value pairs. The combination of a SWIPE service description and some textual data returned from a query to that service provides the basis for data extraction tools to generate an RDF data graph representing (some aspects of) the returned data.
SWIPE is designed to provide a relatively simple, practical tool that can be used to encapsulate ad-hoc, human-oriented Web services behind a more machine-oriented interface. As such it might be used alongside specifications such as XML-RPC, SOAP etc. that offer API or message-based Web data interfaces, or with tools in the WWW::Search, WIDL, and Sherlock tradition that are more concerned with "screen scraping" data from arbitrary HTML-formatted search results. XSLT, the XML transformation language, is another related technology. Where appropriate (e.g. search result pages that are in XHTML format), a SWIPE service description can reference an XSLT sheet instead of employing the more basic regular-expression extraction language described below.
SWIPE descriptions are intended for general use by RDF tools, but are in particular intended for use as an extension module with the RDF Site Summary (RSS) vocabulary. The base RSS 1.0 specification allows for the description of a Web content feed as a channel consisting of a list of items (such as news, updated pages, announcements etc.). RSS 1.0 also allows for a very simple characterisation of a search facility associated with a Web site or channel. SWIPE can be used to augment that description with additional meta-information to allow RSS 1.0 tools to better process search results from the search services mentioned in RSS site descriptions. This might be used, for example, to aggregate search results from a distributed search of several RSS-described data sources, or to provide a common user interface for managing and navigating search results.
This vocabulary is not intended to replace the richer facilities offered by fully-featured search protocols such as Z39.50, DASL (WebDAV), LDAP etc. It is also not intended to serve as a general purpose machine interface (API, query language etc.) to XML or RDF networked data sources. Future extensions to SWIPE may provide for better interoperability with more sophisticated (and heavyweight) specifications.
The SWIPE vocabulary is divided into "Basic" and "Util" sections, reflecting a pragmatic, tool-oriented approach. Additional utility constructs may be added in future revisions to this specification, or provided by third-party extensions. The SWIPE-Basic core is a very simple set of properties and types that should allow simple, practical tools to be easily constructed using generic RDF APIs.
The following properties and types are defined.
swiper relation connects our SWIPE information to some identified Web service. Rather than use the search resource (CGI-script, servlet etc.) as the identifier for the service, we use the 'home page', e.g. http://oreillynet.com/meerkat/. Consequently, we can use the swiper properties of a Web service to get to a bundle of RDF properties that describe how to interact with that service. The range of the
in relation is used on a SwipeSpec, and points to an RDF container listing one or more SWIPE
out relation is used on a SwipeSpec, and points to SWIPE ParseRules. The interpretation of the "parse rules" info depends on the format(s) used; we indicate this using a parseformat property on the
action property, like the HTML forms attribute of the same name, indicates a Web service that can respond to parameters passed via HTTP GET or POST methods.
method property, like the HTML forms attribute of the same name, indicates the (@@TODO: or 'a'?) HTTP method(s) through which a Web service offers an interface. (@@TODO: extensions? SOAP/XP/XML-RPC?)
TODO: more properties are needed for richer extraction. Write Schema.
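As a rough illustration of how a tool might combine the action, method and in properties, the following Python sketch builds a GET request URL. The function name build_query_url and the (name, content) tuple representation of inputs are hypothetical and not part of SWIPE; the parameter values are taken from the Meerkat example later in this note. Only the GET case is covered; a POST service would presumably carry the same pairs in the request body.

```python
from urllib.parse import urlencode


def build_query_url(action, inputs, user_values):
    """Build a GET URL from an action URL and an ordered list of inputs.

    `inputs` is a list of (name, content) pairs. A content of None marks
    a user-supplied input whose value is looked up in `user_values`.
    (This representation is an assumption made for the sketch.)
    """
    params = []
    for name, content in inputs:
        value = user_values[name] if content is None else content
        params.append((name, value))
    # urlencode keeps the order of the pairs and escapes the values.
    return action + "?" + urlencode(params)


# Values mirror the in/action properties of the Meerkat example.
inputs = [("t", "7DAY"), ("_fl", "sherlock"), ("s", None)]
url = build_query_url(
    "http://oreillynet.com/meerkat/sherlock", inputs, {"s": "rdf swipe"}
)
print(url)
```

Keeping the inputs as an ordered list rather than a dictionary mirrors the use of an ordered RDF container for the in property.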
SWIPE-Basic defines the following RDF classes.
DataItem is a super-class for
BasicSpec is a sub-class of SwipeSpec. A BasicSpec will often be described using properties from other namespaces such as the RSS Syndication (@@TODO: refs, status) and Taxonomy modules.
[to be specified]
Here we show the use of the Swipe vocabulary in a stand-alone RDF description. It can also be used as an extension module for use with RSS and Dublin Core applications.
<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:rss="http://purl.org/rss/1.0/"
    xmlns="http://rdfweb.org/2000/01/swipe-ns#">

  <rdf:Description rdf:about="http://oreillynet.com/meerkat/">
    <dc:title>Meerkat: An Open Wire Service</dc:title>
    <dc:description>
      Meerkat is a Web-based syndicated content reader providing
      a simple interface to RSS stories.
    </dc:description>
    <dc:creator>Rael Dornfest</dc:creator>
    <dc:publisher>The O'Reilly Network, O'Reilly &amp; Associates, Inc.</dc:publisher>

    <swiper>
      <BasicSpec rdf:about="" method="GET">
        <action rdf:resource="http://oreillynet.com/meerkat/sherlock"/>
        <macfile rdf:resource="http://oreillynet.com/meerkat/etc/sherlock/meerkat.sit"/>

        <!-- the RSS syndication vocabulary tells us how often to refresh the data -->
        <sy:updatePeriod>daily</sy:updatePeriod>
        <sy:updateFrequency>7</sy:updateFrequency>
        <sy:updateBase>2001-01-01T12:00+00:00</sy:updateBase>

        <!-- todo: banner image / text /link, use rss and util vocabs -->

        <!-- incoming data needed by web service -->
        <in>
          <rdf:Seq>
            <rdf:li><Input name="t" content="7DAY"/></rdf:li>
            <rdf:li><Input name="_fl" content="sherlock"/></rdf:li>
            <rdf:li><UserInput name="s"/></rdf:li>
          </rdf:Seq>
        </in>

        <!-- interpretation rules for output from web service -->
        <out>
          <ParseRules resultListStart="&lt;meerkat&gt;"
                      resultListEnd="&lt;/meerkat&gt;"
                      resultItemStart="&lt;story&gt;"
                      resultItemEnd="&lt;/story&gt;">
            <!-- here we use a simple text-match approach -->
            <parseformat rdf:resource="http://www.apple.com/sherlock/"/>
          </ParseRules>
        </out>

        <!-- XSLT and other output format handlers would be listed here -->
      </BasicSpec>
    </swiper>
  </rdf:Description>
</rdf:RDF>
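The "simple text-match approach" named in the ParseRules above could be applied to a service response along the following lines. This is only a sketch under assumptions: the function extract_items, the dictionary form of the rules, and the sample response text are all hypothetical, and the matching behaviour of the Sherlock parse format may differ in detail. The delimiters are treated as literal strings, so the input does not need to be well-formed markup.

```python
def extract_items(text, rules):
    """Return the raw text of each result item found in `text`.

    `rules` holds the four ParseRules delimiters as plain strings.
    Matching is by literal substring search, not by parsing markup.
    """
    start = text.find(rules["resultListStart"])
    if start == -1:
        return []
    start += len(rules["resultListStart"])
    end = text.find(rules["resultListEnd"], start)
    if end == -1:
        end = len(text)  # tolerate a missing list terminator
    body = text[start:end]

    items = []
    pos = 0
    while True:
        i = body.find(rules["resultItemStart"], pos)
        if i == -1:
            break
        i += len(rules["resultItemStart"])
        j = body.find(rules["resultItemEnd"], i)
        if j == -1:
            break  # unterminated item: stop rather than guess
        items.append(body[i:j].strip())
        pos = j + len(rules["resultItemEnd"])
    return items


rules = {
    "resultListStart": "<meerkat>",
    "resultListEnd": "</meerkat>",
    "resultItemStart": "<story>",
    "resultItemEnd": "</story>",
}
# Invented response text, for illustration only.
sample = "<html><meerkat><story>First</story> junk <story> Second </story></meerkat></html>"
items = extract_items(sample, rules)
print(items)
```

Each extracted string would still need further processing (for example, pulling out a link and title) before it could be turned into RDF statements about a result item.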
example goes here...
SWIPE descriptions can be used to create Sherlock channels compatible with the Apple 'Sherlock' plugin format [MacSherlock]. Conversely, SWIPE can provide an open, modular and extensible representation for the metadata encoded within Sherlock plugins. RDF-capable browsers such as Mozilla (and Netscape 6.0) that implement a Sherlock-like search system can use RDF datasources to interchange search service descriptions. Similarly, online services such as Sherch which understand the Sherlock plugin format will be able to exploit SWIPE descriptions supplied via RSS syndication.
Mozilla, the open source browser and Web application toolkit, makes heavy use of RDF and XML. The Mozilla RDF documentation ([MozillaRDF]) provides more information on the Mozilla RDF APIs and RDF-based services than can be presented here. In particular, see the Mozilla search documentation [MozillaSearch] for details of the Sherlock-compatible search tool built into Mozilla.
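To give a feel for the first direction, the Python sketch below writes out a Sherlock-style plugin from the pieces of a SWIPE BasicSpec. It assumes the commonly documented Sherlock layout of a SEARCH element containing INPUT and INTERPRET tags, and assumes that a SWIPE Input content attribute corresponds to a Sherlock VALUE; neither mapping is defined by this note and both should be checked against [MacSherlock]. The function to_sherlock and its arguments are hypothetical.

```python
def to_sherlock(name, action, method, inputs, rules):
    """Render a minimal Sherlock-style plugin as text.

    `inputs` is a list of (name, content) pairs, with None marking a
    user-supplied input. `rules` maps ParseRules attribute names to
    their delimiter strings. Values are written unescaped, since
    Sherlock plugin files are not XML.
    """
    lines = ['<SEARCH name="%s" action="%s" method="%s">' % (name, action, method)]
    for input_name, content in inputs:
        if content is None:
            lines.append('<INPUT NAME="%s" USER>' % input_name)
        else:
            lines.append('<INPUT NAME="%s" VALUE="%s">' % (input_name, content))
    attrs = " ".join('%s="%s"' % (key, value) for key, value in rules.items())
    lines.append("<INTERPRET %s>" % attrs)
    lines.append("</SEARCH>")
    return "\n".join(lines)


plugin = to_sherlock(
    "Meerkat",
    "http://oreillynet.com/meerkat/sherlock",
    "GET",
    [("t", "7DAY"), ("_fl", "sherlock"), ("s", None)],
    {"resultListStart": "<meerkat>", "resultListEnd": "</meerkat>",
     "resultItemStart": "<story>", "resultItemEnd": "</story>"},
)
print(plugin)
```

A real converter would read these values from the RDF graph rather than from literal arguments, and would need to decide how to handle SWIPE properties that have no Sherlock equivalent.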