Neo4j Backing Store


This wiki page was created in December 2014, initially by Ben Goertzel, and presents a specific proposal regarding creating an alternative backing store for OpenCog, using the Neo4j graph database.

In short, the proposal is to

  • Specialize the current BackingStore API into a GraphBackingStore API, capable of taking special queries that map naturally into simple manipulations of Neo4j graph traversals.
  • Create custom indices within Neo4j, appropriate for OpenCog.
  • Use Neo4j as a backing store for OpenCog (accessible via the GraphBackingStore API).

The motivation for suggesting Neo4j is a combination of the following factors

  • It's a graph DB so its internals match those of the AtomSpace reasonably well.
  • It has reasonable OSS license terms (GPLv3 and AGPLv3).
  • It has a robust software ecosystem around it (e.g. connection to web-services frameworks, plugins for spatial and temporal indexing, etc.) and a fairly large, active user community.
  • We've used it before (well, Rodas and Eskender have) so we have some validation that there aren't weird, obvious gotchas in its usage.
  • As a side point: a couple potential customers for OpenCog work are already using Neo4j, so using it will help with these particular business relationships.
  • A specific analysis of the types of queries we probably want to execute against a backing store in the near future, and a comparison of this to Neo4j's querying and indexing methods, suggests that we should be able to execute these queries reasonably efficiently against Neo4j, via appropriate machinations as will be suggested below.
  • Neo4j can be run distributed across multiple machines; currently this uses a master-slave architecture, but there is momentum behind scaling it further. Scalability requirements and issues are discussed in Scaling OpenCog. See also Distributed AtomSpace Architecture.

OpenCog, in the medium-to-long term, is not going to commit to any particular backing store technology; the BackingStore API should remain storage-technology-independent. However, in the short-to-medium term, the choice of backing store technology may have some meaningful impact on development and utilization of the system; so the choice of which backing stores to utilize isn't a totally trivial choice even though it's not a "permanent" one.

A warning to the reader is that, to really grok what I'm getting at here, you'll need to at least lightly familiarize yourself with the nature of Neo4j traversals and paths.

While this page mainly references Neo4j, most of the discussion would actually apply to using any reasonably functional graph DB as a Backing Store; the final section discusses this point in the context of comparing Neo4j to HGDB (HypergraphDB) as a Backing Store.

Hendy will propose working on this for GSoC 2015. --Hendy (talk) 07:24, 9 March 2015 (CDT)

Linas would like to add the following remarks: Perhaps notable in the above justifications is that performance and scalability are not mentioned. The example queries are easy: the first example is so easy that it would not normally make sense to run it in the pattern matcher, unless one was feeling extremely lazy about writing code one afternoon. Both queries suppose datasets small enough that the query could be performed entirely in RAM, using the currently available technology. Thus, the limits of what is possible are not being stretched here.

Several remarks below seem to imply that using Neo4j would be faster than doing the query in-RAM. This seems unlikely. The atomspace and the pattern matcher do have some serious bottlenecks and limitations. The pattern matcher has a hefty setup overhead, and makes a number of worst-case, non-optimal assumptions about how to perform the query. In essence, it's designed to work well for complex queries, not simple ones. Moving to Neo4j does not seem to address any of the current design issues. Parts of the proposal seem sufficiently brain-stormy that the resulting system seems likely to be both more complex and significantly slower than what is possible today. Extreme care and cleverness would be needed to not build a performance de-accelerator.

Current Backing Store API

The current BackingStore API is defined in opencog/atomspace/BackingStore.h. It supports a small but very useful set of query functions, including:

  • getLink
  • getAtom
  • getNode
  • getIncoming

Currently it is used to access a Postgres backing store. There has been intermittent discussion about augmenting Postgres with a different sort of backing store, accessible via the same API, and about extending the API.

For comparison, the current API for accessing the (in-RAM) Atomspace can be seen in opencog/atomspace/AtomSpace.h.

Searching multiple AtomSpaces via the pattern matcher

Currently the pattern matcher does not search multiple AtomSpaces by default. This can be easily implemented by overloading the perform_query() callback so that the AtomSpace::fetchIncomingSet() method is called for each constant in the query. Recall that the fetchIncomingSet() method is defined so that it will fetch atoms from remote atomspaces/atom repositories. Implementing this is "almost" trivial. Because almost all queries have only a small number of constants in them, performance should be excellent when all of the Atoms can fit in RAM. There is no obvious or apparent choke-point in the current design.
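
A minimal sketch of this idea in C++ follows; the fetchIncomingSet() call follows the API referenced above, but its exact signature, and the way the query constants are collected, should be treated as assumptions rather than as the actual callback code.

#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// Sketch only: prefetch the incoming sets of all constant (non-variable)
// atoms appearing in a query from the remote store, so that the subsequent
// in-RAM pattern match sees them. The exact fetchIncomingSet() signature
// is assumed from the circa-2015 AtomSpace API.
static void prefetch_query_constants(AtomSpace& as, const HandleSeq& constants)
{
    for (const Handle& h : constants)
        as.fetchIncomingSet(h, false);   // non-recursive fetch for each constant
}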

However, actual performance on real-world data is not known. If the relevant portion of a database is too large to fit into available RAM, the query would need to be split into parts; designing for this is considerably more complex, possibly as complicated as the proposal below.

Note also: the pattern matcher was designed to carry out queries far more complex than the ones analyzed below. It has a large setup overhead, and makes a number of worst-case assumptions for various search types. By contrast, the queries below are easy, bordering on trivial. In the first case, performing the query directly, using ad-hoc code, will almost surely be faster than using the pattern matcher. With some cleverness about how the join is performed, significant improvements over the "dumb" pattern matcher should be possible.

Comments on Indexing/Searching by Space and Time

This section is a bit digressive, but is relevant to the general future requirements for backing stores...

Specific APIs for accessing temporal information from the Atomspace are defined in opencog/spacetime/TimeServer.h.

So far as I can tell (I may be misunderstanding something) this API doesn't have a method of the form "do getXXX but restricted to Atoms with timestamps or timelags intersecting time interval L". Such a method would seem very useful both for the AtomSpace and for the backing store. I mention this here because we'd like any technology or approach chosen for the backing store to support this in the future.

Handling of spatial information is even more preliminary and heavily oriented toward the Unity3D game world: opencog/spacetime/SpaceServer.h

Ultimately we will want to be able to handle queries of the form "do getXXX but restricted to Atoms with spatial locations intersecting region R in space S".
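
Purely as an illustration of the kind of method being asked for here (nothing of this sort exists in TimeServer.h, SpaceServer.h, or BackingStore.h today, and every name below is hypothetical), such an API might look like:

#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// Hypothetical time- and space-restricted retrieval methods; all names
// and types here are invented purely for illustration.
struct TimeInterval { double start; double end; };
struct SpatialRegion { /* e.g. a bounding box within some space S */ };

class SpaceTimeQueryStore /* sketch of a possible BackingStore extension */
{
public:
    // "do getXXX, but restricted to Atoms whose timestamps or time lags
    //  intersect time interval L"
    virtual HandleSeq getAtomsInTimeInterval(Type t, const TimeInterval& l) = 0;

    // "do getXXX, but restricted to Atoms whose spatial locations
    //  intersect region R in space S"
    virtual HandleSeq getAtomsInRegion(Type t, const SpatialRegion& r) = 0;
};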

(An even more digressive note is that we ultimately want the time and space stamps to be able to be relative to a particular "world" rather than absolute in nature. For example, time-stamps in regular life on Earth are one thing; but for a game AI there may be time-stamps within that game-world's time axis. Similarly, space on Earth is different than space in a game world. So, ultimately search methods regarding time and space would optionally take an argument regarding what world they refer to, with our everyday physical world on Earth being the default. (I won't get into measuring time and space quantum-relativistically near a physics-style singularity here ;D) )

Expanding the BackingStore API

A couple methods in the Atomspace API that would be nice to have for the BackingStore are:

  • getHandlesByOutgoing (?? huh? You mean, 'anything that has this outgoing set, we don't care about its type'? ??)
  • getNeighbors

That's a small point though.

We would also like certain trivial types of pattern matcher queries to be executable below the BackingStore, at the database level. There is only one reason for this: the database itself is far larger than what can fit into RAM, and thus the database query optimizer is needed to determine which disk sectors have to be accessed. Thus, a true test of the usefulness of these queries happens only when the required subset that would normally be loaded into the atomspace is too large to fit into the local machine's RAM.

Below are two example queries from real application areas. These are fairly representative of useful queries that are "easy, almost trivially easy" -- the first example is so easy that it does not really make sense to bother with the pattern matcher; a half-dozen lines of code can do it better and faster. However, it is illustrative. These two queries can be performed very rapidly using the current AtomSpace and BackingStore design.

Most typical pattern matcher queries are far more complex than the queries analyzed below. The pattern matcher is optimized for performing complex queries, not for simple ones.

Example OpenCog biology query

Suppose we have the query "find drugs whose active ingredients dock with the protein FKH1". This would be formulated as

(AND
  (Inheritance 
    $X 
    (ConceptNode "drug"))
  (Evaluation
    (PredicateNode "dock")
    (List
      $X
      (ProteinNode "FKH1"))))

This is an easy and straightforward pattern matcher query, as it has only two clauses and one variable joining them. In fact, it is almost too easy: a dozen lines of ad-hoc code could return the result easily enough, with slightly less setup overhead than the pattern matcher incurs.

In the current implementation, to run this across a database, assuming that the current AtomSpace is empty and that all of the atoms are in the database, there would be three calls to AtomSpace::fetchIncomingSet(): once to get the incoming set of (ProteinNode "FKH1"), once to get the incoming set of (PredicateNode "dock"), and once to get the incoming set of (ConceptNode "drug"). After this, the pattern matcher operates as normal, performing its usual traversal. This is an example of a query that can be run with (reasonably) high performance in the current system, with no design changes required. In essence, this is pretty much the simplest possible query that one could perform, and the current atomspace design handles it easily, and handles it well.

It is important to understand why this query is "easy": the number of items $X that could even remotely be considered to be drugs is probably less than 1 million (the FDA has approved 7K drugs). Assuming 500 bytes per Atom (i.e. including all the assorted overheads coming from AtomTable indexes, truth values, attention values, etc.), this would require less than a single GByte of RAM for 1 million drugs. The number of things known about (ProteinNode "FKH1") is surely even less: there may be only 100K ListLinks that contain (ProteinNode "FKH1"). Thus, assuming that we start with an empty atomspace, it is straightforward to pull *all* of these into RAM, burn up a few GBytes of RAM, and perform the search directly. For this reason, the above query should be considered "easy" and almost even "trivial".

There is also a second reason why the above query is "easy": if the pattern matcher did not exist (or if one chose not to use it), one could perform the query with about half-a-dozen lines of scheme code: the scheme utilities class is filled with all kinds of utilities that automate queries such as this, without having to use the pattern matcher.
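
For concreteness, the ad-hoc route looks roughly like the following in C++. The AtomSpace accessors follow the circa-2015 API referenced above, but the exact signatures, and the PROTEIN_NODE type constant, should be treated as assumptions; this is a sketch, not tested code.

#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// Rough sketch of the "dozen lines of ad-hoc code" for the bio query:
// find every $X with (Inheritance $X (Concept "drug")) and
// (Evaluation (Predicate "dock") (List $X (Protein "FKH1"))).
HandleSeq find_docking_drugs(AtomSpace& as)
{
    HandleSeq results;
    Handle drug = as.getHandle(CONCEPT_NODE, "drug");
    Handle fkh1 = as.getHandle(PROTEIN_NODE, "FKH1");  // PROTEIN_NODE assumed
    if (drug == Handle::UNDEFINED || fkh1 == Handle::UNDEFINED)
        return results;

    // Walk the ListLinks mentioning FKH1, climb to their EvaluationLinks,
    // and keep the first List member if it also inherits from "drug".
    for (const Handle& lst : as.getIncoming(fkh1)) {
        if (as.getType(lst) != LIST_LINK || as.getArity(lst) != 2) continue;
        if (as.getOutgoing(lst, 1) != fkh1) continue;  // FKH1 in second position
        Handle x = as.getOutgoing(lst, 0);
        for (const Handle& eval : as.getIncoming(lst)) {
            if (as.getType(eval) != EVALUATION_LINK) continue;
            if (as.getName(as.getOutgoing(eval, 0)) != "dock") continue;
            for (const Handle& inh : as.getIncoming(x)) {
                if (as.getType(inh) == INHERITANCE_LINK &&
                    as.getOutgoing(inh, 1) == drug)
                    results.push_back(x);
            }
        }
    }
    return results;
}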

Thus, let us assume that there is another, similar query that is not easy: for example, it might involve 100 million drugs, and 100 million known facts about "FKH1". In this case, a naive fetch from the database would overflow RAM. A proper query optimizer would be needed to determine which disk sectors contain the needed data, and to fetch only those sectors, as needed, to perform the join.

So, to improve on the current implementation, several changes could/should be made:

  • Use a pattern matcher callback so that all constants in an expression are fetched in advance. This simply requires identifying all leaf nodes that are not variables, and calling AtomSpace::fetchIncomingSet() for each. This risks overflowing RAM, so...
  • Extend the fetchIncomingSet() API so that only those atoms that are of a given type are fetched. For example: for (ConceptNode "drug"), we are only interested in the incoming atoms that are InheritanceLinks.

If, after the above changes, the amount of data that would be fetched is still much too large to fit into available RAM, then there is another "obvious" extension: for every clause that contains only a single variable, give that clause, whole, to the backing store. The idea here is that the number of things $X that satisfy
(Evaluation (PredicateNode "dock") (List $X (ProteinNode "FKH1")))
is probably much smaller, and might now fit into RAM. Thus, the final join of the two clauses could now be performed in-RAM.

Doing this is no longer "easy", but it is "doable", and could represent a significant time savings, by not having to instantiate unwanted Atoms in the Atomspace.
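
To make the two extensions above concrete, here is one possible shape for them. This is a sketch only: the method names fetchIncomingByType() and fetchGroundings() are invented for illustration and do not exist in the current API.

#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// Hypothetical API extensions; every name here is invented for illustration.
class ExtendedBackingStore /* sketch; not the real BackingStore class */
{
public:
    // Like fetchIncomingSet(), but pull only incoming links of type t;
    // e.g. for (ConceptNode "drug"), fetch only the InheritanceLinks.
    virtual HandleSeq fetchIncomingByType(const Handle& atom, Type t) = 0;

    // Hand a whole single-variable clause to the store, e.g.
    //   (Evaluation (PredicateNode "dock") (List $X (ProteinNode "FKH1")))
    // and get back only the groundings of $X, so that the final join of
    // the clauses can then be performed in RAM.
    virtual HandleSeq fetchGroundings(const Handle& clause) = 0;
};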

Example biology query in Neo4j

The above example is almost trivial to solve with the current Atomspace and pattern matcher design, and can run with high performance, without bottlenecks, when the database fits in RAM. It is worth comparing this to what it takes to run a similar query in Neo4j. The difference in complexity and effort is discernible. It is perhaps not surprising that Neo4j is already quite slow for such a simple query.

Create the graph:

CREATE
  (inheritance1:Inheritance) <-[:OPERAND]- (and1:And) -[:OPERAND]-> (evaluation1:Evaluation),
  (inheritance1) -[:SUPER]-> (drug:Concept {id: "drug", name: "drug"}),
  (inheritance1) -[:SUB]-> (acme:Concept {id: "acme_drug", name: "Acme drug"}),
  (evaluation1) -[:PREDICATE]-> (dock:Predicate {id: "dock", name: "dock"}),
  (evaluation1) -[:PARAMETER {position: 0}]-> (acme),
  (evaluation1) -[:PARAMETER {position: 1}]-> (fkh1:ProteinNode {id: "protein_fkh1", name: "FKH1"});

[Image: Opencog-neo4j1.png]

Having created this graph, you can now query it (try this query online at http://console.neo4j.org/r/wmrc6v):

MATCH
  (i:Inheritance) <-[:OPERAND]- (:And) -[:OPERAND]-> (e:Evaluation),
  (i) -[:SUPER]-> (:Concept {id: "drug"}),
  (i) -[:SUB]-> (x:Concept),
  (e) -[:PREDICATE]-> (:Predicate {id: "dock"}),
  (e) -[:PARAMETER {position: 0}]-> (x),
  (e) -[:PARAMETER {position: 1}]-> (:ProteinNode {id: "protein_fkh1"})
RETURN x;

[Image: Opencog-neo4j2.png]

Query Results

+-------------------------------------------+
| x                                         |
+-------------------------------------------+
| Node[17]{name:"Acme drug",id:"acme_drug"} |
+-------------------------------------------+
1 row
32 ms

Execution Plan

Compiler CYPHER 2.2

Planner RULE

ColumnFilter
  |
  +Filter(0)
    |
    +SimplePatternMatcher
      |
      +Filter(1)
        |
        +TraversalMatcher

+----------------------+------+--------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|             Operator | Rows | DbHits | Identifiers |                                                                                                                                                                                                   Other |
+----------------------+------+--------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|         ColumnFilter |    1 |      0 |           x |                                                                                                                                                                                          keep columns x |
|            Filter(0) |    1 |      3 |     e, i, x |                                                                                                                            (Property(anon[83],id(1)) == {  AUTOSTRING0} AND hasLabel(anon[83]:Concept)) |
| SimplePatternMatcher |    1 |      3 |     e, i, x |                                                                                                                                                                                                         |
|            Filter(1) |    1 |      5 |     e, i, x | ((((Property(anon[151],id(1)) == {  AUTOSTRING1} AND Property(anon[251],id(1)) == {  AUTOSTRING4}) AND hasLabel(anon[251]:ProteinNode)) AND NOT(anon[22] == anon[41])) AND NOT(anon[182] == anon[220])) |
|     TraversalMatcher |    1 |     38 |     e, i, x |                                                                                                                                                                                , , , , , , , , , , , ,  |
+----------------------+------+--------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Total database accesses: 49

Example OpenCog NLP Query

For an NLP application, suppose one wants to find "all words that are in category C, and occur immediately before the word W in at least one sentence" (say, all adjectives that occur right before "pig" in at least one sentence).

Then we'd have a query like

(VariableTypeLink
   $X1
   (VariableTypeNode 'WordNode))
(VariableTypeLink
   $W1
   (VariableTypeNode 'WordInstanceNode))
(AND
   (Inheritance
       $X1
       (ConceptNode "adjective"))
   (Inheritance 
       $W1
       (WordNode "Pig"))
   (Evaluation
        (PredicateNode "before")
        (List
           $X1
           $W1)))

Compared to the bio example given above, this query has a more complex structure, but is not any more difficult to actually perform. This is because, once again, only three incoming sets are needed: one for (PredicateNode "before"), one for (WordNode "Pig"), and one for (ConceptNode "adjective"). Thus, this is not actually any more expensive, complex, or slow than the previous example.

Again: it is important to note that the total number of words in the English language is less than 180K, of which only 80K are in typical use. Thus, the entire English-language lexis, fully decorated with parts of speech, a complete set of LG disjuncts, and some decent portion of the WordNet word-sense database -- even taking into account that Atoms are really quite large creatures -- should still fit comfortably into 4-16 GB of RAM. Thus, queries such as the above are easily run fully in RAM, and would not typically require disk access.

Example binary and n-ary relationships

Taken from Cosmo Harrigan's opencog-neo4j project. It is in line with Hendy's July 2014 proposal for a Neo4j implementation of the AtomSpace.

Example #1: A simple binary relation

Robot_1 eats batteries.

AtomSpace syntax:

(EvaluationLink
  (PredicateNode "eats")
  (ListLink
    (ConceptNode "Robot_1")
    (ConceptNode "Batteries")))

Create the nodes and relationships:

CREATE
  (n1:Concept {name: "Robot_1"}),
  (n2:Concept {name: "Batteries"}),
  (n1)-[:EATS]->(n2);

Ask: What eats batteries?

MATCH
  (n:Concept)-[:EATS]->(b:Concept {name: "Batteries"})
RETURN
  (n);

Example #2: An n-ary relation

Zeno gave Einstein a battery.

gave(Zeno, Battery, Einstein)

AtomSpace syntax:

(EvaluationLink
  (PredicateNode "gave")
  (ListLink
    (ConceptNode "Zeno")
    (ConceptNode "Battery")
    (ConceptNode "Einstein")))

Create the nodes and relationships:

CREATE
  (n1:Relation {type: "GivingEvent"}),
  (n2:Concept {name: "Zeno"}),
  (n3:Concept {name: "Battery"}),
  (n4:Concept {name: "Einstein"}),
  (n2)-[:GIVER]->(n1),
  (n3)-[:OBJECT]->(n1),
  (n4)-[:RECEIVER]->(n1);

Ask: What has been given?

MATCH
  (n1:Relation {type: "GivingEvent"})<-[r]-(n2)
RETURN
  n1, r, n2;

Ask: Did anyone give a battery to anyone else?

MATCH
  (n1:Relation {type: "GivingEvent"}),
  (n2:Concept)-[:GIVER]->(n1),
  (n3:Concept {name: "Battery"})-[:OBJECT]->(n1),
  (n4:Concept)-[:RECEIVER]->(n1)
RETURN
  n2, n4;

Defining an Appropriate Subset of Pattern Matching Queries

I note that the above two example queries could both be easily composed from query operators of the form:

FOLLOW-CHAINS-FROM: Follow a specified type of chain from a source Atom: "Find Atoms that are reachable from Atom X by following a particular type of chain" (e.g. "Find X1 that is reachable from "Adjective" by finding X so that (Inheritance X Adjective), and then X1 so that (Inheritance X X1)" or "Find W1 that is reachable from "Pig" by (Inheritance W1 Pig)" )

Of course, this is exactly what the pattern matcher does. This is called an "upward" movement in the pattern matcher source code. The term "FOLLOW-CHAINS-FROM" does not appear in the source code.

FOLLOW-CHAINS-BETWEEN: Given two Atom lists L1 and L2, find X1 in L1 and X2 in L2, for which there is a particular type of chain between X1 and X2 (e.g. for which (EvaluationLink "before" X1 X2) holds).

This is the other very common, central step in the pattern matcher: after grounding an atom, it pivots to examine other atoms attached to that grounding. The term "FOLLOW-CHAINS-BETWEEN" does not appear in the source code.

I note also that, in both of the examples I gave above, the chains are quite short.

Using Neo4j as a Backing Store

Neo4j has a query language called Cypher, and one can execute Cypher queries e.g. from Java code. One approach to interacting with a Neo4j instance from a C++ program (like OpenCog) would be to use the Neo4j REST API. Another approach would be to use LightSocket or some more recent analogue.
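
As a rough illustration of the REST route, the sketch below POSTs a Cypher statement to Neo4j's transactional HTTP endpoint (/db/data/transaction/commit, as documented for Neo4j 2.x) using libcurl. The choice of libcurl, the omission of error handling and JSON escaping/parsing, and the hard-coded localhost URL are all simplifying assumptions.

#include <curl/curl.h>
#include <string>

// Collect the HTTP response body into a std::string.
static size_t collect(char* data, size_t size, size_t nmemb, void* out)
{
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

// Submit one Cypher statement to a local Neo4j 2.x server and return the
// raw JSON response. Assumes the statement needs no JSON escaping.
std::string run_cypher(const std::string& cypher)
{
    std::string response;
    CURL* curl = curl_easy_init();
    if (!curl) return response;

    std::string payload =
        "{\"statements\":[{\"statement\":\"" + cypher + "\"}]}";

    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    headers = curl_slist_append(headers, "Accept: application/json");

    curl_easy_setopt(curl, CURLOPT_URL,
                     "http://localhost:7474/db/data/transaction/commit");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, payload.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return response;   // raw JSON; a real proxy would parse this
}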

A key aspect of Neo4j querying is what are called "traversals" (see Neo4j Java API for Traversals). What I have crudely called FOLLOW-CHAINS-FROM above, could straightforwardly be implemented in Neo4j as a Traversal...

For optimizing queries, Neo4j allows for custom indexing (see also creating indexes). Obvious initial candidates for custom indexes are:

  • Atom name
  • Atom type
  • Link target type

For handling time and space queries, a little further down the road, note that Neo4j has extensions for spatial indexing already. There is some code that seems to handle efficient temporal indexing as well.

Implementation Strategy

This section contains some first brainstormy ideas about how to effectively create a Neo4j BackingStore for OpenCog.

Emphatically, the above general ideas are not tied to the specific, first-draft implementation suggestions made in this section.

One approach would be to:

  • Write code translating Atoms into Neo4j nodes/links (which is relatively straightforward; we're just mapping hypergraphs into graphs here, and the typical arity of the hyperlinks is not so large)
  • Write code that initializes a Neo4j instance for use as an OpenCog backing store (basically, this just means creating some custom indexes)
  • Extend the BackingStore interface to allow single-clause, single-variable PatternMatcher-type queries. Because of their simplicity, these might perhaps be able to run fairly quickly.
  • Extend the BackingStore interface to allow multi-clause, multi-variable PatternMatch requests. These would need to be somehow translated into equivalent FOLLOW-CHAINS-FROM and FOLLOW-CHAINS-BETWEEN as mentioned above [more details on this below].
  • Create a C++ Neo4jProxy within the OpenCog codebase, connected to the GraphBackingStore interface and containing custom C++ code that submits appropriate Cypher queries to Neo4j (a rough sketch of such an interface appears below).
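
As a very rough sketch of what such a GraphBackingStore specialization and Neo4jProxy might look like (every name below is a placeholder, not an existing OpenCog class or method):

#include <opencog/atomspace/AtomSpace.h>
#include <string>

using namespace opencog;

// Brainstorm-level sketch only.
class GraphBackingStore /* : public BackingStore */
{
public:
    virtual ~GraphBackingStore() {}

    // Single-clause, single-variable query pushed down to the database.
    virtual HandleSeq getGroundings(const Handle& clause) = 0;

    // Multi-clause pattern, to be decomposed into FOLLOW-CHAINS-FROM /
    // FOLLOW-CHAINS-BETWEEN traversals as described earlier on this page.
    virtual HandleSeq runChainQuery(const Handle& pattern) = 0;
};

// A Neo4jProxy would implement these by translating each request into
// Cypher (or Traversal API calls) and submitting it over the REST API.
class Neo4jProxy : public GraphBackingStore
{
public:
    HandleSeq getGroundings(const Handle& clause) override;
    HandleSeq runChainQuery(const Handle& pattern) override;

private:
    std::string neo4j_url = "http://localhost:7474";  // assumed default port
};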

Specifying chain-type queries

How would inputs to queries of type FOLLOW-CHAINS-FROM and FOLLOW-CHAINS-BETWEEN be specified?

I think one can specify them similarly to Pattern Matcher queries -- but with restrictions on what patterns can be submitted for matching. The pattern matcher was designed to support a very general form of query; the above primitives do not allow the full generality, and cannot do everything that the pattern matcher can do. Thus, providing this as a layer under the general API introduces a number of difficulties and design issues. To explain this will take a bit of work though.

Specifying queries as lists of Atoms

First I'll clarify how to express the suggested query types as lists of Atoms.

For FOLLOW-CHAINS-FROM, the arguments would be

  • an Atom comprising the source
  • a series of Atom-expressions containing one or two variables each. Each expression in the series must share some variable with the expression coming before it in the series, except the first expression in the series, which must contain the source Atom.

For example:

SOURCE: "adjective"

SERIES
entry 1: 
Inheritance $X (ConceptNode "adjective")

entry 2:
Inheritance $X1 (WordNode $X)


For FOLLOW-CHAINS-BETWEEN, arguments would include

  • Source1
  • Series1
  • Source2
  • Series2

for example


SOURCE1: "adjective"

SERIES1:
entry 1: 
Inheritance $X (ConceptNode "adjective")

entry 2:
Inheritance $X1 (WordNode $X)

SOURCE2: "pig"

SERIES2:
entry 1: 
Inheritance 
		WordInstanceNode $W1
		WordNode "Pig"

entry 2:
Evaluation
     PredicateNode "before"
     List
        $X1
        $W1


This corresponds to the NLP example given above.
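
For illustration only, the arguments of these two query types might be packaged in C++ along the following lines (all names are invented; nothing like this exists in the codebase):

#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// FOLLOW-CHAINS-FROM: a source Atom plus an ordered series of clause
// templates, each containing one or two variables and sharing a variable
// with its predecessor; the first clause must mention the source.
struct ChainFromQuery
{
    Handle source;
    HandleSeq series;
};

// FOLLOW-CHAINS-BETWEEN: two such chains, joined on a shared variable in
// their final clauses (the $X1/$W1 join in the NLP example above).
struct ChainBetweenQuery
{
    ChainFromQuery chain1;
    ChainFromQuery chain2;
};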

Packaging Queries as Single Atoms

The above proposed query formats are transparent, and align closely with the Neo4j traversal framework. However, from an OpenCog view, it would be nicer to submit queries as single Atoms, similar to how the Pattern Matcher does it.

For instance, what if we submitted the above example NLP query as the following pattern to be matched:

(ANDLink
   (ANDLink
      (Inheritance
        $X
        (ConceptNode "adjective"))
      (Inheritance 
        $X1
        (WordNode $X)))
   (ANDLink
       (Inheritance 
		 (WordInstanceNode $W1)
		 (WordNode "Pig"))
       (Evaluation
         (PredicateNode "before")
         (List
           $X1
           $W1))))


Note, this form could be supplied directly to OpenCog's Pattern Matcher for matching. It could also be supplied directly to the PLN backward chainer for inference.

However, if decoded properly, it can also be interpreted as a FOLLOW-CHAINS-BETWEEN query:

  • The first embedded ANDLink is the first chain, and the second embedded ANDLink is the second chain.
  • The first named Atom in each chain is the source of that chain.

The biology example given above is even simpler; the query

(ANDLink
   (Inheritance 
       $X 
       (ConceptNode "drug"))
   (Evaluation
      (PredicateNode "dock")
      (List
          $X
          (ProteinNode "FKH1"))))

could be interpreted straightforwardly as a FOLLOW-CHAINS-FROM query defined by a series of two links with source "drug".

Of course, one could write sequences of ANDLinks that did not neatly decode into chains (in terms of the number of variables, the ways variables are shared between Atoms in the sequence, etc.). So if the BackingStore API were allowed to accept queries of this form, it would need to check whether the queries could in fact be neatly turned into FOLLOW-CHAINS-FROM or FOLLOW-CHAINS-BETWEEN queries. If not, then

  • A "slow backing store query" warning can be issued.
  • The query can be resolved the fast way: Just grab the needed incoming sets, stick them in RAM, and run the query using the regular pattern matcher.

To make this more reasonable, it might be good if a BackingStore pattern-match query came with an optional "effort tolerance" parameter. This would then halt a query-processing initiative if the amount of work exceeded the specified tolerance, and return the best result obtained so far (no matter how bad).

Exploring HGDB as a Backing Store

Neo4j is the leading graph database at present and thus seems a natural place to start. However, it is not the only graph database out there. HGDB (HypergraphDB) also seems potentially appropriate as an OpenCog backing store, and is worth exploring.

Fortunately, most of the work described above would be applicable to HGDB as well as to Neo4j.

Essentially, any graph DB capable of efficiently performing traversals, and with support for custom indexes, should be usable in basically the same way as Neo4j as a backing store. I.e. if we make a GraphBackingStore interface, then it should be usable with a Neo4jProxy or an HGDBProxy without a lot of extra work. Of course, the work to build custom indexes within HGDB would need to be done, but that's not likely a lot of effort.

(OTOH, a backing store that is not a graph DB (e.g. a triplestore) would require a different sort of interface, say a TripleBackingStore that extends BackingStore in a different way than GraphBackingStore.)

It seems it would be hard to test Neo4j vs. HGDB in a really meaningful way without doing most of the work for integrating both of the tools with OpenCog. I.e. to compare the two effectively, we would need to put complex AtomSpaces into both of them, create custom indices in both of them, submit complex queries to both of them and see how they perform.