Codepiction Paths - Explanation

  1. Building the Graph

    The first stage is to get the data to work on. This is done using a simple query on the database:
    SELECT ?mbox1, ?mbox2, ?uri
    WHERE
           (foaf::depiction ?x ?img)
           (foaf::depiction ?z ?img)
           (foaf::thumbnail ?img ?uri)
           (foaf::mbox ?x ?mbox1)
           (foaf::mbox ?z ?mbox2)
    USING foaf for http://xmlns.com/foaf/0.1/
    	
    This (essentially) returns a table where each row contains two mailboxes and the uri of a (thumbnail of) an image depicting each of the people with the mailboxes. For example:

    mbox1 mbox2 uri
    mailto:alice@example.com mailto:bob@example.com http://example.com/images/aliceandbob_small.png
    mailto:alice@example.com mailto:bob@example.com http://example.com/images/aliceandbobagain_small.png
    mailto:bob@example.com mailto:bob@example.com http://example.com/images/bob_small.png
    mailto:claire@example.com mailto:alice@example.com http://example.com/images/claireandalice_small.png
    ... ... ...

    There are two things to note here: people can be codepicted in more then one image; and people are codepicted with themselves in any image in which they appear.

    The latter is something to watch out for: since such entries are irrelevant for paths we don't use them. The former is quite important for our model - we store all images depicting the same two people.

  2. The Model

    The model we use is a simple graph. It amounts to three classes: TGraph.java, TNode.java and TEdge.java.
    TNode.java
    This holds the mailbox, plus an array of TEdges which connect it to other nodes.

    Note: TNode also holds knows when it has been visited in the paths search (see below).

    TEdge.java
    This holds the two TNodes it connects, plus an array of the images in which the people represented by the nodes are depicted.

    Currently sending a getLabel() message to a TEdge returns images at random.

    TGraph.java
    This holds and arrays of all the nodes and all the edges. It also hold a mailbox-to-node lookup table and a mailbox-mailbox-to-edge lookup table so that we don't duplicate nodes or edges.

    TGraph provides an easy way to add mbox1-mbox2-image triples to the graph. This does the necessary lookups and also ignores the troublesome mbox-mbox(again)-image triples.

  3. Finally - Shortest Paths

    Finding shortest paths is pretty easy. The method works as follows:
    1. Mark all the nodes as 'not visited'.
    2. Pick a start point. Make a (singleton) set of this point called 'parents'. Mark this point as 'visited'.
    3. For each node in parents get each edge, and the node at the other end of the edge (the 'child'). Create a new array called 'children'.
    4. If the child hasn't been visited then the path taken to this node is the shortest path from the initial node. Mark the child as 'visited' and add the path to the 'shortest paths' list. Add this child to an array 'children'.
    5. Once each node in 'parents' has been dealt with set 'parents' to 'children'. If children is empty we have finished. Otherwise go back to step 3.
    This prose version isn't very clear. It is also a little tricky to implement - getting the path is difficult. So what we do is use a tree to store the paths. The initial node is the root (it has no parent). Each child that is added is tacked onto the bottom of the tree below its parent. It is added to a 'leaves' array. The paths can be derived by going through each 'leaf' and stepping up parents until the root is hit.

    Graphically the procedure looks like this:

    At the end parents is {E} and is linked to no unvisited nodes, so children is empty and we finish. leaves = {C, D, B, F, H, G, E} at the end.

    The code is pretty simple. We use a class TElement.java to represent an element of the tree (TElements are actually used in the parent/children arrays, rather than nodes). A TElement contains a parent TElement (null for the root), a TNode, and a TEdge (which connects the TElement to its parent, null for the root).

    public Vector findShortestPathsFromNode(TNode root)
        {
    	// Clear graph for search
    
    	for (Iterator nodeIterator = nodes.iterator();
    	     nodeIterator.hasNext();)
    	    {
    		TNode node = (TNode) nodeIterator.next();
    		node.setVisited(false);
    	    }
    
    	// But (of course) we start with a node, so let's say it's
    	// been visited
    
    	root.setVisited(true);
    
    	// Set up the root of the tree
    
    	TElement treeRoot = new TElement(root, null);
    
    	// Set up parents vector (just treeRoot) and leaves (empty)
    
    	Vector parents = new Vector();
    	parents.add(treeRoot);
    
    	Vector leaves = new Vector();
    
    	// indicates whether to keep going
    	boolean continuing = true;
    
    	while (continuing)
    	    {
    		Vector children = new Vector();
    		continuing = false;
    
                    // Iterate through parents
    		for (Iterator parentIterator = parents.iterator();
    		     parentIterator.hasNext();)
    		    {
    			TElement parent = (TElement)
    			    parentIterator.next();
    			TNode parentNode = parent.node();
    
                            // Iterate through edges from parent
    			for (Iterator edgeIterator =
    				 parentNode.getEdges().iterator();
    			     edgeIterator.hasNext();)
    			    {
    				TEdge edge = (TEdge)
    				    edgeIterator.next();
    				TNode childNode =
    				    edge.getOtherNode(parentNode);
    				if (childNode.notVisited())
    				    {
    					childNode.setVisited(true);
    					TElement child = new
    					    TElement(childNode,
    						     parent, edge);
    					leaves.add(child);
    					children.add(child);
    					continuing = true;
    				    }
    			    }
    		    }
    		parents = children;
    	    }
    
    	return leaves;
        }
    	
    Simple enough I hope.
  4. Scalability - Pah

    Currently the graph has fifty nodes and about seventy-eight edges. It takes under two seconds to create the graph from a database and calculate every path. With our (currently stupid) system this becomes two minutes since we recreate the paths database each time. Doh.

    Improvements:


Damian Steer
Last modified: Tue Feb 19 18:41:27 GMT 2002