RDFWeb Scutter

'Scutter' is an RDF harvesting and indexing utility. It consists of a collection of libraries and utilities that support the traversal, storage and query of a distributed web of RDF/XML documents.

Scutter is written in Ruby and uses RubyRDF for most of its functionality. It also requires an RDF parser and an RDBMS/SQL database for storing the harvested data. If GPG is installed, Scutter can make use of PGP/GPG cryptographic signatures on the RDF content. Web interfaces (for humans and machines) to the aggregated RDF data can be created using HTTP (via the WebRick library) and/or SOAP (using SOAP4R).


A rather rough distribution of Scutter has been packaged. Grab the tarball from the downloads directory. The .zip version probably lags behind (sorry!) but otherwise contains the same files. The unpacked directory listing is also browsable online.


Scutter has the following dependencies...


A Ruby interpreter (1.6+) is needed.

Debian: apt-get install ruby

An RDF parser

Scutter requires an RDF parser that can generate N-Triples. It currently uses Redland's Raptor parser, via the 'rdfdump' command line utility.
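To make the N-Triples requirement concrete, here is a minimal sketch of reading N-Triples text in Ruby, one triple per line. It is not Scutter's or RubyRDF's own code: the names `parse_ntriples`, `URI_REF`, `BNODE`, `LITERAL` and `TRIPLE` are made up for illustration, the sample data is invented, and the pattern is a simplification of the full grammar.

```ruby
# Illustrative N-Triples reader (hypothetical names, not part of Scutter).
# Each non-comment line holds one triple: subject, predicate, object, then '.'.
URI_REF = /<[^>]*>/
BNODE   = /_:[A-Za-z][A-Za-z0-9]*/
# Simplified literal: quoted string with optional language tag or datatype.
LITERAL = /"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?/
TRIPLE  = /\A(#{URI_REF}|#{BNODE})\s+(#{URI_REF})\s+(#{URI_REF}|#{BNODE}|#{LITERAL})\s*\.\z/

# Returns an array of [subject, predicate, object] string triples.
def parse_ntriples(text)
  triples = []
  text.each_line do |line|
    stripped = line.strip
    # Skip blank lines and comments.
    next if stripped.empty? || stripped.start_with?("#")
    match = TRIPLE.match(stripped)
    raise ArgumentError, "not a valid N-Triples line: #{line.inspect}" unless match
    triples << match.captures
  end
  triples
end

# Invented sample data, standing in for a parser's N-Triples output.
sample = <<~NT
  # two example triples
  <http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
  _:b1 <http://example.org/related> <http://example.org/b.rdf> .
NT

parse_ntriples(sample).each do |subject, predicate, object|
  puts [subject, predicate, object].join(" | ")
end
```

Because each line is self-contained, the output of any parser that emits N-Triples can be consumed this way without knowing anything about the RDF/XML it came from.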


An SQL/RDBMS database

Scutter stores RDF in a relational database. While not the fastest means for querying RDF, an RDBMS makes various data management and merging tasks easy. We can always dump data from the SQL store for faster query in other systems. Scutter currently uses PostgreSQL, but the basic approach (using RubyRDF's Squish-to-SQL rewriter) should generalise to other SQL implementations that allow self-joins on a table.
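The self-join idea can be sketched roughly as follows. This is only an illustration under assumptions: it is not RubyRDF's actual Squish-to-SQL rewriter, and the table `triples(subject, predicate, object)` and the functions `quote_sql` and `patterns_to_sql` are hypothetical. Each triple pattern in a query becomes one alias of the same table, and a variable shared between patterns becomes an equality condition joining those aliases.

```ruby
# Illustrative rewrite of triple patterns into SQL with self-joins.
# Assumes a hypothetical table: triples(subject, predicate, object).

# Quote a constant as an SQL string literal, doubling embedded quotes.
def quote_sql(value)
  "'" + value.gsub("'", "''") + "'"
end

# select_vars: variables to return, e.g. ["?name", "?mbox"]
# patterns:    array of [subject, predicate, object]; terms starting
#              with "?" are variables, everything else is a constant.
def patterns_to_sql(select_vars, patterns)
  bindings = {}    # variable => first column it was bound to
  conditions = []
  tables = []

  patterns.each_with_index do |pattern, i|
    table_alias = "t#{i}"
    tables << "triples #{table_alias}"
    %w[subject predicate object].zip(pattern).each do |column, term|
      column_ref = "#{table_alias}.#{column}"
      if term.start_with?("?")
        if bindings.key?(term)
          # Variable seen before: join this column to its first occurrence.
          conditions << "#{column_ref} = #{bindings[term]}"
        else
          bindings[term] = column_ref
        end
      else
        conditions << "#{column_ref} = #{quote_sql(term)}"
      end
    end
  end

  columns = select_vars.map do |var|
    raise ArgumentError, "unbound variable #{var}" unless bindings.key?(var)
    "#{bindings[var]} AS #{var.sub(/\A\?/, '')}"
  end

  sql = "SELECT #{columns.join(', ')} FROM #{tables.join(', ')}"
  sql += " WHERE #{conditions.join(' AND ')}" unless conditions.empty?
  sql
end

# Example: find name and mailbox for the same person.
puts patterns_to_sql(
  ["?name", "?mbox"],
  [["?person", "http://xmlns.com/foaf/0.1/name", "?name"],
   ["?person", "http://xmlns.com/foaf/0.1/mbox", "?mbox"]]
)
```

A query with N patterns joins the table to itself N times, which is why this approach is easy to manage but not the fastest way to query RDF.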

Web interfaces

The optional web interfaces need the WebRick library (for HTTP) and/or SOAP4R (for SOAP).

Various Unix/Linux-isms probably also pollute the code, though the basics are fairly cross-platform. This is an alpha release, and your chances of success depend on how similar your machine's setup is to mine. Let me know if you have any problems with the documentation or code.

Usage Instructions

Scutter is run from the command line.

You should be able to get some results by typing 'bin/scutter --scutter' from the directory in which you unpacked these files.

Use 'bin/scutter --scutter some-url' to choose a different starting point.
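For a rough picture of what harvesting from a starting point involves, here is a minimal breadth-first traversal sketch. It is not the code in bin/scutter: the `crawl` function, its `limit` parameter and the `links_for` block are hypothetical, and how links are actually discovered in a fetched document is left to the caller.

```ruby
require "set"

# Illustrative breadth-first traversal from a starting URL.
# `links_for` is a hypothetical block that returns the URLs linked
# from a given document; fetching and parsing are not shown here.
def crawl(start_url, limit: 100, &links_for)
  seen = Set.new([start_url])   # every URL ever queued, to avoid repeats
  queue = [start_url]
  visited = []

  until queue.empty? || visited.length >= limit
    url = queue.shift
    visited << url
    links_for.call(url).each do |link|
      # Set#add? returns nil when the link was already seen.
      queue << link if seen.add?(link)
    end
  end

  visited
end

# Usage with an in-memory stand-in for the web (invented URLs).
web = {
  "http://example.org/a.rdf" => ["http://example.org/b.rdf", "http://example.org/c.rdf"],
  "http://example.org/b.rdf" => ["http://example.org/a.rdf"],
  "http://example.org/c.rdf" => []
}
crawl("http://example.org/a.rdf") { |url| web.fetch(url, []) }.each { |url| puts url }
```

The `seen` set is what keeps the traversal from looping when documents link back to each other, and `limit` bounds a run against an open-ended web.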

'bin/rdfweb_server' runs a sample servlet under the WebRick framework. Connect to http://localhost:2000/rss/ to see a list of all the RSS files the harvester found. The servlet can also download and display the current contents of each channel.

More documentation is needed here (e.g. GPG/PGP assurances).