X-Pack Graph explore API

Initial request

Graph queries are executed using the explore() method:

GraphExploreRequest request = new GraphExploreRequest();
request.indices("index1", "index2");
request.useSignificance(false);
TermQueryBuilder startingQuery = new TermQueryBuilder("text", "projectx");

Hop hop1 = request.createNextHop(startingQuery); 
VertexRequest people = hop1.addVertexRequest("participants"); 
people.minDocCount(1);
VertexRequest files = hop1.addVertexRequest("attachment_md5");
files.minDocCount(1);

Hop hop2 = request.createNextHop(null); 
VertexRequest vr2 = hop2.addVertexRequest("participants");
vr2.minDocCount(5);

GraphExploreResponse exploreResponse = client.graph().explore(request, RequestOptions.DEFAULT); 

In this example, the first hop seeds the exploration with a query (startingQuery) that finds messages mentioning the mysterious projectx.

What we want to discover in these messages are the IDs of participants in the communications and the MD5 hashes of any attached files. In each case, we want to find people or files that have at least one document connecting them to projectx.

The next "hop" in the graph exploration finds the people who have shared several messages with the people or files discovered in the previous hop (the projectx conspirators). The minDocCount control is used here to ensure the people discovered have had at least 5 communications with projectx entities. Note that we could also supply a "guiding query" here, for example a date range to consider only recent communications, but we pass null to consider all connections.

Finally, we call the graph explore API with the GraphExploreRequest object.
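To make the effect of minDocCount concrete, here is a minimal conceptual sketch in plain Java. It is not how Elasticsearch evaluates the setting (the real API also involves details such as sampling and significance scoring); the class and method names are hypothetical, and each document is modelled simply as the set of terms it holds in one field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Conceptual model only: not the Elasticsearch implementation.
class MinDocCountSketch {

    // Returns, in sorted order, the terms that occur in at least
    // minDocCount of the given documents.
    static List<String> termsWithMinDocCount(List<Set<String>> docs, int minDocCount) {
        // Count how many documents mention each term.
        Map<String, Integer> docCounts = new TreeMap<>();
        for (Set<String> termsInDoc : docs) {
            for (String term : termsInDoc) {
                docCounts.merge(term, 1, Integer::sum);
            }
        }
        // Keep only the terms that meet the threshold.
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docCounts.entrySet()) {
            if (entry.getValue() >= minDocCount) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

With a threshold of 1, every term seen in any matching document qualifies, which mirrors the first hop above; raising the threshold, as the second hop does with 5, keeps only terms backed by more documents.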

Response

Graph responses consist of Vertex and Connection objects (aka "nodes" and "edges" respectively):

Collection<Vertex> v = exploreResponse.getVertices();
Collection<Connection> c = exploreResponse.getConnections();
for (Vertex vertex : v) {
    System.out.println(vertex.getField() + ":" + vertex.getTerm() + 
            " discovered at hop depth " + vertex.getHopDepth());
}
for (Connection link : c) {
    System.out.println(link.getFrom() + " -> " + link.getTo() 
            + " evidenced by " + link.getDocCount() + " docs");
}

Each Vertex is a unique term (a combination of field name and term value). The "hopDepth" property tells us at which point in the requested exploration this term was first discovered.

Each Connection is a pair of Vertex objects and includes a docCount property telling us how many times these two Vertex terms have been sighted together.
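Visualization toolkits often want the connections as an adjacency structure rather than a flat list. The following is a rough sketch of that conversion, using a hypothetical Edge class as a stand-in for Connection so that the example runs without the Elasticsearch client on the classpath. Vertex ids are assumed to be "field:term" strings, the same form printed in the loop above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one Connection: two vertex ids plus a doc count.
class Edge {
    final String from;
    final String to;
    final long docCount;

    Edge(String from, String to, long docCount) {
        this.from = from;
        this.to = to;
        this.docCount = docCount;
    }
}

class AdjacencySketch {

    // Builds an undirected adjacency map: vertex id -> (neighbour id -> doc count).
    static Map<String, Map<String, Long>> adjacency(List<Edge> edges) {
        Map<String, Map<String, Long>> result = new TreeMap<>();
        for (Edge edge : edges) {
            // Register the edge in both directions so either end can be looked up.
            result.computeIfAbsent(edge.from, k -> new TreeMap<>()).put(edge.to, edge.docCount);
            result.computeIfAbsent(edge.to, k -> new TreeMap<>()).put(edge.from, edge.docCount);
        }
        return result;
    }
}
```

In a real application the Edge instances would be populated from each Connection's getFrom(), getTo() and getDocCount() values.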

Expanding a client-side Graph

Typically, once an application has rendered an initial GraphExploreResponse as a collection of vertices and connecting lines (graph visualization toolkits such as D3, sigma.js or KeyLines help here), the next step a user may want to take is to "expand" the graph. This involves finding new vertices that might be connected to the ones currently shown.

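To do this, we use the same explore method, but our request contains details about which vertices to expand from and which vertices to avoid re-discovering. In the code that follows, initialVertices is not defined in the snippet; it is assumed to hold the vertices from the initial response (for example, the result of exploreResponse.getVertices()).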

GraphExploreRequest expandRequest = new GraphExploreRequest();
expandRequest.indices("index1", "index2");


Hop expandHop1 = expandRequest.createNextHop(null); 
VertexRequest fromPeople = expandHop1.addVertexRequest("participants"); 
for (Vertex vertex : initialVertices) {
    if (vertex.getField().equals("participants")) {
        fromPeople.addInclude(vertex.getTerm(), 1f);
    }
}

Hop expandHop2 = expandRequest.createNextHop(null);
VertexRequest newPeople = expandHop2.addVertexRequest("participants"); 
for (Vertex vertex : initialVertices) {
    if (vertex.getField().equals("participants")) {
        newPeople.addExclude(vertex.getTerm());
    }
}

GraphExploreResponse expandResponse = client.graph().explore(expandRequest, RequestOptions.DEFAULT);

Unlike the initial request, we do not need to pass a starting query.

In the first hop, which represents our "from" vertices, we explicitly list the terms that we already have on-screen and want to expand, using the addInclude filter. We can supply a boost for terms that are considered more important to follow than others, but here we select a common value of 1 for all.

When defining the second hop, which represents the "to" vertices we hope to discover, we explicitly list the terms that we already know about using the addExclude filter.
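An application that expands repeatedly needs to remember which vertices are already on screen, both to build the include and exclude lists and to avoid drawing the same vertex twice when a new response arrives. Below is a minimal sketch of that bookkeeping in plain Java. The ClientGraphSketch class and its methods are hypothetical and not part of the Elasticsearch client; vertices are keyed by "field:term", which assumes field names contain no colon.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side bookkeeping; not part of the Elasticsearch client.
class ClientGraphSketch {

    // Known vertices keyed by "field:term", each mapped to {field, term}.
    private final Map<String, String[]> known = new LinkedHashMap<>();

    // Records a vertex; returns true only if it was not already known.
    boolean add(String field, String term) {
        return known.putIfAbsent(field + ":" + term, new String[] {field, term}) == null;
    }

    // Terms already known for one field, in insertion order. These are the
    // candidates for addInclude on the "from" hop and addExclude on the "to" hop.
    List<String> termsForField(String field) {
        List<String> terms = new ArrayList<>();
        for (String[] vertex : known.values()) {
            if (vertex[0].equals(field)) {
                terms.add(vertex[1]);
            }
        }
        return terms;
    }
}
```

After each explore call, the application would pass every returned vertex's getField() and getTerm() to add and render only those for which it returns true.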