SatisfactionLink and QueryLink

From OpenCog
Jump to: navigation, search

SatisfactionLinks, and their special cases QueryLink, BindLink, MeetLink and GetLink are used to specify a search pattern that can be grounded, solved or satisfied. Patterns consist of a set of clauses, with each clause containing one or more variables. When executed, the pattern matcher attempts to find groundings for the variables; that is, it attempts to find values for the variables such that the resulting graph exists in the AtomSpace. Roughly speaking, the pattern matcher performs a "fill in the blanks" type query for graphs. It implements a full-fledged, complete query language for graphs.

This page documents the workings of several related link types. The key differences between these link types are as follows:

  • SatisfactionLink evaluates to a TruthValue (true or false) indicating if the pattern was found (or not). That is, it answers the question "is this pattern satisfiable?"
  • MeetLink returns a set of groundings that were found for each variable.
  • GetLink is identical to MeetLink, except it returns the results wrapped in a SetLink.
  • QueryLink performs graph rewriting: after finding a match for a pattern, it uses the results to create a new pattern.
  • BindLink is identical to QueryLink, except it returns the results wrapped in a SetLink.

Closely related are two other link types:


Introduction

The concept of pattern matching is most easily understood as a means of solving "fill-in-the-blank" problems. For example, if the atomspace contains the sentence "Sally threw a ball.", then the question "Sally threw a ____." is solved by "ball", with the blank being the variable, and "ball" its grounding.

The concept of variable binding is most easily understood as a means of specifying "fill-in-the-blank" problems. For the above example, a slightly more formal specification is "$var is a variable and Sally threw a $var". Even more formally, "lambda $var . (Sally threw a $var)". The goal of binding is simply to identify the variables. So, for example, in the expression "lambda $var . ($name threw a $var)", the node $name is not bound: its free, and the pattern matcher will not treat it as a variable (a blank to be filled in) despite it's suggestive form $name. In this example, unbound variables behave like constants.

Satisfiability

SatisfactionLinks evaluate to a truth value, indicating if the pattern is satisfiable or not. MeetLinks, when executed, return the satisfying set. Thus, if a pattern is satisfiable, the satisfying set will be non-empty; the satisfying set is empty if and only if the pattern is not satisfiable. In terms of performance, checking satisfiability is faster than trying to find all elements of the satisfying set.

The basic structure of a SatisfactionLink and a MeetLink is

SatisfactionLink
   <variable-decls>  (optional)
   <pattern-to-be-matched>

where <pattern-to-be-matched> has the form

 <yoke-link>
     <clause 1>
     <clause 2>
     <clause 3>
     ...

The <yoke-link> must be one of either AndLink, SequentialAndLink or ChoiceLink. In the first two cases, all of the clauses must be satsified, in order for the whole pattern to be satisfied. In the ChoiceLink case, at least one clause must be satisfied.

Individual clauses can be satisfied in several ways; these can be combined and mixed together. Most commonly, one searches for clauses that are present in the AtomSpace; this can be made explicit with the PresentLink, although the matcher implicitly assumes that presence is the criterion. Clauses can be forced to be absent, using the AbsentLink. A third way of obtaining satisfaction is when a clause can be evaluated to be true or false. An example of this would be the GreaterThanLink, which is satisfied only when any variables appearing in any formulas are grounded in such a way that the expression is validly "greater than".

By default, any clause that is not 'evaluatable' is treated as a 'pattern' which must be present. Pattern clauses will have variables in them, and the pattern matcher will search the atomspace for a matching graph which grounds the variables.

An 'evaluatable' clause is a clause that consists of a combination of zero or more AndLinks, SequentialAndLinks, OrLinks, NotLinks, which are used to join together one or more 'evaluatable' links. Examples of 'evaluatable' links include GreaterThanLink, EqualLink, and many forms of EvaluationLinks. Evaluatable clauses may contain variables, but they don't have to; they will be evaluated anyway, if prior clauses were satisfied.

When an 'evaluatable' clause is encountered, it is evaluated, and a crisp true or false value is obtained. A clause that evaluates to 'true' is considered to be satisfied.

EvaluationLinks with GroundedPredicateNodes (GPN's) in them are "black boxes"; it is not easy to know what will happen when the GPN is evaluated. Thus, several "clear-box" forms are provided: the EqualLink, the IdenticalLink, the GreaterThanLink and a few others (AlphaEqualLink ...). The EqualLink evaluates to true when it's two arguments are in fact the same atom (after grounding). The GreaterThanLink evaluates to true if it's two arguments evaluate to NumberNodes or FloatValues after grounding, and the first is greater than the second.

The default pattern match callback treats the results of evaluating a GPN as a crisp boolean yes/no true/false value. Thus, it also allows black-box and clear-box GPN's to be combined with NotLink, OrLink and AndLink to define an arbitrarily complicated clause.

SatisfactionLinks can be evaluated using the cog-evaluate! function. MeetLinks can be executed with the cog-execute! function. The difference between the two is that the first returns a truth value, indicating whether satisfaction succeeded or not. The second returns the list of atoms that form the satisfying set.

Observe that MeetLink is conceptually very similar to SatisfyingSetLink. The difference is that the former is imperative, while the later is declarative. That is, when the cog-execute! function encounters a MeetLink, it takes it as an imperative command: 'you must perform a search, now', yielding a set of atoms that satisfies the given pattern (the given query). By contrast, SatisfyingSetLink is a declaration: 'here is a thing, and the set of things that satisfy it'. Attached to the SatisfyingSetLink is a TruthValue that indicates the truthiness of the link. The SatisfyingSetlink is neither evaluated or executed; it simply exists, just a static declaration.

See link comparison for other related links.

Term re-writing

The QueryLink resembles an ImplicationLink in its form. That is, it expresses the idea "if P then Q". The difference between a QueryLink and an ImplicationLink is that the QueryLink is imperative, whereas the ImplicationLink is declarative. In essence, the QueryLink says "you must search for all P now, and then create the consequent Q", whereas the ImplicationLink is a declaration "when P holds, then Q does too".

The QueryLink is a 2-ary or a 3-ary link, containing an optional variable declaration, followed by a pattern, followed by an implicand (consequent). If it is 3-ary, then only the explicitly-declared variables become bound; otherwise, all variables in the body become implicitly bound.

These QueryLink is treated as a graph re-write rule by the pattern matcher. That is, if the graph P (the antecedent) is found, then the graph Q (the consequent or implicand) is instantiated in the AtomSpace. If the graph Q is an ExecutionOutputLink, then the specified scheme/python/C++/etc. routine is executed, with the rest of the ExecutionOutputLink contents passed as arguments. Conceptually, the pattern matcher acts to apply rules, specified as QueryLinks, to the contents of the AtomSpace. It implements a single step of a forward chainer.

The general form of a graph rewrite rule, specified using a QueryLink is:

 QueryLink
    $variable_declarations  (optional)
    P
    Q1
    Q2
    ...
    Qn

The expression or predicate P is the pattern to be matched. The variables appearing in P are listed in the $variable_declarations. If a match to P is found, the variables are grounded with their matching atoms. The graphs Q1, Q2, ..., Qn are then created, plugging in the groundings into the variables in the Q 's. One set of graphs Q1..Qn are created for each match to P. By default, the matcher runs until all possible matches have been found. A list of all of the created Q's is then returned by the cog-execute! function.

One useful form is having Q be an ExecutionOutputLink, that is, having the form

 QueryLink
    $variable_declarations    (optional)
    P
    ExecutionOutputLink
       GroundedSchemaNode "some-function"
       ListLink
          A_1
          A_2
          ...

In this case, the various A_1, A_2, etc. are created for each match, and are then passed as the first, second, ... arguments to a function "some-function". This function, written in scheme or python (or, potentially, C++ or another language), can then perform any action whatsoever. Such forms are typically used to drive external systems, such as motor controls or other devices.

Passing the above structure to the cog-execute! function will cause the entire AtomSpace to be searched for the predicate P, and, if found, cause the implication to occur. (The actual implementation uses tricks to quickly narrow the search; although conceptually "the whole atomspace" is searched, in practice, its much faster than that.) A generic graph re-writing system can be implemented simply by repeatedly calling cog-execute! with different rules and formulas, until some desired result is achieved.

A detailed presentation of these steps, developed through examples, is given below. A master set of examples can be found in examples.

Formal definition

The basic structure of a QueryLink is

QueryLink
   <variable-decls>  (optional)
   <pattern-to-be-matched>
   <created-pattern-1>
   ...
   <created-pattern-n>

The <variable-decls> may be a single VariableNode, or a VariableList of VariableNodes. The allowed type range of a of a VariableNode may be restricted by using a TypedVariableLink together with a TypeNode; see examples below.

The <pattern-to-be-matched> atom may be any atom or link that can be interpreted as being true or false. It is usually (but not always) an AndLink (possibly with nested ChoiceLinks, NotLinks, AbsentLinks, and so on) specifying a set of pattern clauses that might be grounded in the atomspace. More generally, it can be any predicate bearing a truth value that can be interpreted as "accept" or "reject"; for example, it could be an EvaluationLink with a GroundedPredicateNode in it.

The <pattern-to-be-matched> must contain the VariableNodes appearing in the <variable-decls> section. If the <pattern-to-be-matched> evaluates to true (typically, because the pattern is found in the atomspace), then the <created-pattern> will be instantiated in the atomspace.

Variable declarations

Below follow a series of examples showing how variable declarations are to be made. Variables may freely range over all types, or may be restricted to have only certain types.

The following examples can be run in the scheme shell, with the cog-execute! command, as follows:

guile> (cog-execute! h)

where h is an QueryLink that was previously defined. This will return results, provided that there is matching data to be found in the AtomSpace; otherwise, an empty SetLink is returned.

Single Variable

The following example shows the use of a single variable declaration to find all InheritanceLinks of a given structure. Resulting matches are then rewritten, using all groundings of the VariableNode that were found. The returned values are wrapped in a SetLink. Note that the returned values are placed in the AtomSpace, where they stay 'forever', until deleted.

This example can be run from the scheme shell:

; Define a hypergraph called "human-implies-animal"
(define human-implies-animal
  (Query
     (Variable "$H")
     ; This is the pattern that will be matched ...
     (Inheritance
        (Variable "$H")
        (Concept "human")
     )
     ; This is the hypergraph that will be created if the 
     ; above pattern is found.
     (Inheritance
        (Variable "$H")
        (Concept "animal"))))

; Some data to populate the atomspace:
(Inheritance
  (Concept "Linas")
  (Concept "human"))

; Run the actual implication
(cog-execute! human-implies-animal)

The result of running the above will produce the following output:

(Set (Inheritance (Concept "Linas") (Concept "animal")))

Multiple Variables

Multiple variables may be specified using a VariableList. So, for example, if a father has a child, then the child has that person as a father:

(Bind
   (VariableList
      (Variable "$F")
      (Variable "$C"))
   (Evaluation
      (Predicate "father of")
      (List
         (Variable "$F")
         (Variable $C")))
   (Evaluation
      (Predicate "child of")
      (List
         (Variable "$C")
         (Variable "$F"))))

If there is a Variable that is not listed in the VariableList, then that Variable is taken to be free, and it will NOT be grounded. If the VariableList is completely absent, then all Variables found in the body are bound (and so, grunded during the search.)

Variable Types

It can be very convenient to restrict variable types, that is, to define a signature. For example, the set of reported groundings can be restricted to so that the results are of a given type. So, for example, one can limit $F and $C to be ConceptNodes:

(Bind
   (VariableList
      (TypedVariable
         (VariableNode "$F")
         (Type 'ConceptNode))
      (TypedVariable
         (Variable "$C")
         (Type 'ConceptNode)))
   (Evaluation
      (Predicate "father of")
      (List
         (Variable "$F")
         (Variable "$C")))
   (Evaluation
      (Predicate "child of")
      (List
         (Variable "$C")
         (Variable "$F"))))

Multiple types

The TypedVariableLink can enumerate several allowed types:

(TypedVariable
   (Variable "$G")
   (TypeChoice
      (Type 'ConceptNode)
      (Type 'PredicateNode)))

The TypeChoice link is interpreted as menu choice: during the search, the variable $G can be grounded by a ConceptNode or a PredicateNode. Ranges of types can be specified with the TypeInhNode and the TypeCoInhNode, and further combined with the TypeSet to specify more complex types. Atomese has a fairly rich type system to draw upon.

Globbing variables

A GlobNode can be used to match zero or more atoms in a list. For example:

(Get
   (VariableList
      (Glob "start")
      (Glob "finish"))
   (List
       (Glob "start")
       (Word "loves")
       (Glob "finish")))

will find groundings for any of the following :

(List (Word "she") (Word "loves") (Word "me"))
(List (Word "she") (Word "loves") (Word "me") (Word "not"))
(List (Word "I") (Word "think") (Word "she") (Word "loves") (Word "me"))

A Variable is just a glob which is forced to be of length exactly one.

Anchors

By default, executing a MeetLink or QueryLink causes the search results to be wrapped in a SetLink, which is returned to the caller. This is not always a desirable behavior: the SetLinks pollute the AtomSpace, and have to be manually removed if they are no longer needed. A different issue is that one must wait for the complete set of results; this can be problematic for long-running searches over large atomspaces. Both of these problems can be avoided by using the AnchorNode in the variable specification section. Results are then chained to the Anchor, using MemberLinks. For example:

  (Get
     (VariableList
        (Variable "$x")
        (Anchor "search results"))
     (Present
        (Evaluation (Predicate "foo")
           (List (Variable "$x") (Concept "bar"))))

will create a MemberLink to the (Anchor "search results") for each grounding that is found. No SetLink is created. See examples/pattern-matcher/anchor.scm for a working example.

Complex Searches

Below are a variety of more complex examples.

Optional and rejected subpatterns

The pattern matcher can detect optional clauses, and perform alternative handling when these are discovered. This can be done with the AbsentLink. (There is also a corresponding PresentLink; unless otherwise specified, clauses are assumed to be wrapped in a PresentLink.)

Optional clauses are used to reject a pattern whenever a certain sub-pattern appears with it. For example: "Find all swans that are not white". In this case, the color attribute stating "whiteness" must be absent, in order for the pattern to be found. For example:

(Get
   (Variable "$swan-instance")
   (And
      (Present (Inheritance (Variable "$swan-instance") (Concept "swan")))
      (Absent (Evaluation (Predicate "color")
           (List (Variable "$swan-instance") (Concept "white"))))))

will find the following:

 (Inheritance (Concept "Bobby") (Concept "swan"))
 (Evaluation (Predicate "color") (List (Concept "Bobby") (Concept "black")))

but not the following:

 (Inheritance (Concept "Sue") (Concept "swan"))
 (Evaluation (Predicate "color") (List (Concept "Sue") (Concept "white")))

Mandatory (for-all) patterns

One can search for patterns that always have a certain invariant sub-pattern in them, using the AlwaysLink. This is the pattern-matcher-centric version of the "ForAll" concept from first-order logic, requiring that all groundings satisfy a subpattern. The AlwaysLink page shows an example of finding all baskets that have only red-colored balls in them.

Quoting

One can search for expressions that contain VariableNodes, or GroundedPredicateNodes, without binding or evaluating them, by using the QuoteLink to escape the subgraph.

MeetLink, PutLink

QueryLink is completely equivalent to a certain combination of two other links: MeetLink and PutLink.

Satisfiability is about taking an expression that contains variables, and discovering values for those variables, that renders the expression "true". The MeetLink is able to find all of the atoms that satisfy a pattern.

Graph re-writing is about taking an existing graph, and creating a new graph derived from it. This is what PutLink does. The PutLink is a beta redex: when it is executed, a new graph is created, with values substituted for variables. That is, executing a PutLink causes beta reduction to occur. Thus, the MeetLink finds possible groundings, while the PutLink then uses those groundings as values to replace variables with the values that MeetLink found.

Relation to other systems

CoRelational Model

The paper A co-Relational Model of Data for Large Shared Data Banks by Erik Meijer and Gavin Bierman demonstrates how SQL and noSQL are category-theoretic duals. The notation used there corresponds to the following:

  • QueryLink corresponds to "SelectMany", and it is a monad.
  • The AtomSpace is the functor of the monad.
  • The MeetLink corresponds to the SQL function (with P the pattern predicate)
  • The PutLink corresponds to the SQL function (with F the re-write function).

The thing that makes QueryLink different and unique is that it, too, is an element of the atomspace, rather than an abstract theoretical construction outside of the database.

Apache Gremlin Language/Tinkerpop

The Apache TinkerPop system has the Gremlin Language for defining a graph traversal. A Gremlin traversal is very much like a single-clause MeetLink. There is no concept of a multi-clause traversal in Gremlin. There are many obvious and important differences:

  • Tinkerpop edges always have two vertexes, one at each end of the edge. Atomspace edges (aka Links) can have an arbitrary number of vertexes in them, and can also contain other edges. Tinkerpop edges are untyped; AtomSpace edges are typed.
  • Gremlin queries are compiled down to a bytecode, and can be distributed to OLAP or OLTP systems, where the actual query is performed. There is (currently) no such offloading ability in the atomspace.

See atomspace issue #1874 for status on tinkerpop compatibility ideas. See atomspace issue #1893 for some "syntactic sugar" to make tinkerpop-like traversals less ugly.