Many operations on graphs are easily expressible as a graph traversal. An example is a transitive closure. Expressing complicated traversals can be quite challenging when using the object-oriented APIs like Jena or Blueprints. For that reason, a framework Pipes together with a graph-oriented DSL Gremlin were created. Pipes implements the basic building blocks of a traversal/graph query in a form of a data-flow pipe, i.e. an object transforming its input (e.g. a vertex or an edge) to some outputs (usually an edge or a vertex or their collections). The pipes can then be tied together into a pipeline. As the individual pipes are implemented as Java classes, the code for expressing a single pipeline can be quite verbose. For instance, consider the following code traversing from a document to its author:
/**
* @return Authors of the document.
*/
def authorPipe(doc: Vertex) = {
val authorE = new InEdgesPipe("author_of")
val authorV = new OutVertexPipe
val docAuthorP = new Pipeline[Vertex, Vertex](authorE, authorV)
docAuthorP.setStarts(Arrays.asList(doc))
asScalaIterable(docAuthorP)
}
Clearly even in the very simple case like this one we needed quite a lot of lines to implement the traversal. Fortunately Gremlin can do this in a much more efficient manner: it allows the developer to express the query in an XPath-like syntax and then it compiles the query into the pipe(line)s. However Gremlin is implemented in Groovy and therefore it (1) does not support static typing, and (2) it may be problematic to integrate the Gremlin code into a mixed Scala/Java project. At least I have encountered with troubles when I tried to configure a Maven build for a project mixing all the three languages.
As Scala has excellent capabilities for designing DSLs, I wrote couple of classes implementing part of the Gremlin functionalities in Scala. The implementation consists of essentially three components:
- Wrappers for Blueprints’ Vertex, Edge, and Element classes exposing them as a Scala mutable Map
- Few pipes for running Scala closures and a pipeline helping class that composes the pipes and finally generate the com.tinkerpop.pipes.util.Pipeline object
- An abstraction of a vertex/edge property allowing a handy type inference in many cases
The wrapper classes RichVertex, RichEdge, and RichElement implement the mutable Map trait mapping the element’s properties to its values. Further, the wrappers contain a trav[T] method that returns the helping pipeline object which enables the build of a pipeline in the Gremlin style. Probably a simple example will clarify how these things work together:
import ie.deri.uimr.crosscomanalysis.graphdb.gremlinska.GremlinSka._ import ie.deri.uimr.crosscomanalysis.graphdb.GraphProperties.AUTHOR_OF def authorPipe(doc: Vertex) = doc.trav[Vertex].outE(AUTHOR_OF).inV
So we were able to squeeze the original code to essentially a single line, which is, I’d argue, also more readable: given the node doc we want to construct a traversal ending in another vertex (trav[Vertex]) representing the author of the document. In order to get it, we need to follow the edge of type AUTHOR_OF and finally return the nodes which this edge leads to (inV). The first line imports the implicit conversions wrapping the vertices and edges to provide the map-like behaviour an the trav method. We can further make use of the fact the enriched elements behave like a map and e.g. filter out edges with weights below a certain threshold t:
import ie.deri.uimr.crosscomanalysis.graphdb.gremlinska.GremlinSka._
import ie.deri.uimr.crosscomanalysis.graphdb.GraphProperties.{AUTHOR_OF,WEIGHT}
def authorPipe(doc: Vertex, t: Long) = doc.trav[Vertex].outE(AUTHOR_OF).strain{e => e(WEIGHT) >= t}.inV
The strain step is similar to the Gremlin’s filter step, but as a method of the same name already exists in Scala’s Iterable trait, we chose a different name. As a consequence, the traversed pipe (its results) can be filtere ex-post by Scala’s filter method as well.
Let’s now look more closer how the graph properties abstraction works. Each property is an instance of
case class GraphProperty[T](id: Int, name: String)
Hence it is defined by unique id and name, and further by the type T of the values of the property. For easy definition of properties there exists a trait GraphPropertiesLike that automatically manages the property ids and checks whether all the property names are unique. Its behaviour resembles Scala’s Enumeration trait. The extra type information allows the Scala compiler to infer the type without an explicit recast which is needed in the case of the plain Java API. In the example above the compiler thus inferred the edge’s weight is of type Long and not AnyRef which would be necessary to first recast before any numerical comparison.
Although this implementation is rather experimental, it proved to be useful in many cases in which I needed to work with Blueprints-enabled graphs (especially Neo4J). In case you would like to play with it, feel free to download and modify the sources. I would appreciate any questions or comments regarding its usage or implementation.