SatisfactionLink and BindLink
SatisfactionLinks, BindLinks and GetLinks are used to specify a search pattern that can be grounded (solved) with the pattern matcher. Patterns consist of a set of clauses, with each clause containing one or more variables. The pattern matcher attempts to find groundings for the variables; that is, it attempts to find values for the variables such that the resulting graph exists in the atomspace.
Search patterns are somewhat like programs written in Prolog or in answer set programming, in that the goal of pattern matching is to find Atoms that simultaneously satisfy all of the clauses in the pattern.
SatisfactionLinks and GetLinks are used to specify the pattern to be searched for. SatisfactionLinks evaluate to a truth value, indicating if the pattern is satisfiable or not. GetLinks return the actual satisfying set. A pattern is satisfiable if and only if the satisfying set is non-empty. BindLinks are used to perform graph-re-writing; BindLinks take each element of the satisfying set, and stuff it into a new, different graph.
By default, all of these link types search the entire atomspace for matching patterns (because the atomspace is indexed, this search can be very efficient). The MapLink is similar, except that it limits the search to its input arguments.
Note that pattern matching is dual to pattern recognition: so, while pattern matching searches for all data fitting a pattern, the dual search, of finding all patterns that fit a lump of data, is implemented with the DualLink.
- 1 Introduction
- 2 Variable Binding
- 3 Satisfiability
- 4 Rule evaluation
- 5 Formal definition
- 6 Variable declarations
- 7 Evaluation
- 8 Execution
- 9 GetLink, PutLink
- 10 Relation to other systems
The concept of pattern matching is most easily understood as a means of solving "fill-in-the-blank" problems. For example, if the atomspace contains the sentence "Sally threw a ball.", then the question "Sally threw a ____." is solved by "ball", with the blank being the variable, and "ball" its grounding.
The concept of variable binding is most easily understood as a means of specifying "fill-in-the-blank" problems. For the above example, a slightly more formal specification is "$var is a variable and Sally threw a $var". Even more formally, "lambda $var . (Sally threw a $var)". The goal of binding is simply to identify the variables. So, for example, in the expression "lambda $var . ($name threw a $var)", the node $name is not bound: its free, and the pattern matcher will not treat it as a variable (a blank to be filled in) despite it's suggestive form $name. In this example, unbound variables behave like constants.
The SatisfactionLink is a 1-ary or a 2-ary link. If it is 1-ary, then the body of the link specifies the pattern that is supposed to be searched for. If it is 2-ary, then the first member must be a variable declaration. When variables are explicitly declared, these become the "bound variables" that will be explicitly grounded. Otherwise, a search is peformed for the free variables in the body, and all of them become implicitly bound for the pattern search.
In other words, the variable declaration is optional; if it is present, then it is used to confine the search to only those variables.
The SatisfactionLink is similar to both the PresentLink and the ChoiceLink, except that these last two do NOT bind their variables. Thus, the latter two can be used to build complex expressions inside of a SatisfactionLink.
SatisfactionLinks evaluate to a truth value, indicating if the pattern is satisfiable or not. GetLinks, when executed, return the satisfying set. Thus, if a pattern is satisfiable, the satisfying set will be non-empty; the satisfying set is empty if and only if the pattern is not satisfiable. In terms of performance, checking satisfiability is faster than trying to find all elements of the satisfying set.
The basic structure of a SatisfactionLink and a GetLink is
SatisfactionLink <variable-decls> (optional) <pattern-to-be-matched>
where <pattern-to-be-matched> has the form
<yoke-link> <clause 1> <clause 2> <clause 3> ...
The <yoke-link> must be one of either AndLink, SequentialAndLink or ChoiceLink. In the first two cases, all of the clauses must be satsified, in order for the whole pattern to be satisfied. In the ChoiceLink case, at least one clause must be satisfied.
Individual clauses can be satisfied in one of two ways; both ways can be combined and mixed together. One way is for a clause to specify a search-pattern; the other way is if a clause can be interpreted as being true/false-valued. We'll call the former 'pattern' clauses, and call the latter 'evaluatable' clauses, for now (details may change).
A 'pattern' clause is any clause that is not 'evaluatable'. Pattern clauses will have variables in them, and the pattern matcher will search the atomspace to be searched for a matching graph which grounds the variables.
An 'evaluatable' clause is a clause that consists of a combination of zero or more AndLinks, SequentialAndLinks, OrLinks, NotLinks, which are used to join together one or more 'evaluatable' links. Examples of 'evaluatable' links include GreaterThanLink, EqualLink, AbsentLink, and any EvaluationLink that contains a GroundedPredicateNode (GPN). Note that evaluatable clauses may contain variables, but they don't have to; they will be evaluated anyway, if prior clauses were satisfied.
When an 'evaluatable' clause is encountered, it is evaluated, and a crisp true or false value is obtained. A clause that evaluates to 'true' is considered to be satsified.
EvaluationLinks with GPN's in them are "black boxes"; it is not easy to know what will happen when the GPN is evaluated. Thus, several "clear-box" forms are provided: the EqualLink and the GreaterThanLink. The EqualLink evaluates to true when it's two arguments are in fact the same atom (after grounding). The GreaterThanLink evaluates to true if it's two arguments evaluate to NumberNodes after grounding and evaluation, the first is great than the second. The AbsentLink evaluates to true only if the term that it wraps is absent from the atomspace during grounding.
The default pattern match callback treats the results of evaluating a GPN as a crisp boolean yes/no true/false value. Thus, it also allows black-box and clear-box GPN's to be combined with NotLink, OrLink and AndLink to define an arbitrarily complicated clause. Other non-default callbacks may combine GPN's in other ways.
The variable groundings that lead to satisfaction can be found under a special key on the SatisfactionLink. See /examples/pattern-matcher/satisfaction.scm for an example.
SatisfactionLinks can be evaluated using the cog-evaluate! function. GetLinks can be executed with the cog-execute! function. The difference between the two is that the first returns a truth value, indicating whether satisfaction succeeded or not. The second returns the list of atoms that form the satisfying set.
Observe that GetLink is conceptually very similar to SatisfyingSetLink. The difference is that the former is imperative, while the later is declarative. That is, when the cog-execute! function encounters a GetLink, it takes it as an imperative command: 'you must perform a search, now', yielding a set of atoms that satisfies the given pattern (the given query). By contrast, SatisfyingSetLink is a declaration: 'here is a thing, and the set of things that satisfy it'. Attached to the SatisfyingSetLink is a TruthValue that indicates the truthiness of the link. The SatisfyingSetlink is neither evaluated or executed; it simply exists, just a static declaration.
See link comparison for other related links.
The BindLink resembles an ImplicationLink in its form. That is, it expresses the idea "if P then Q". The difference between a BindLink and an ImplicationLink is that the BindLink is imperative, whereas the ImplicationLink is declarative. In essence, the BindLink says "you must search for all P now, and then create the consequent Q", whereas the ImplicationLink is a declaration "when P holds, then Q does too".
The BindLink is a 2-ary or a 3-ary link, containing an optional variable declaration, followed by a pattern, followed by an implicand (consequent). If it is 3-ary, then only the explicitly-declared variables become bound; otherwise, all variables in the body become implicitly bound.
These BindLink is treated as a graph re-write rule by the pattern matcher. That is, if the graph P (the antecedent) is found, then the graph Q (the consequent or implicand) is instantiated in the AtomSpace. If the graph Q is an ExecutionOutputLink, then the specified scheme/python/C++/etc. routine is executed, with the rest of the ExecutionOutputLink contents passed as arguments. Conceptually, the pattern matcher acts to apply rules, specified as BindLinks, to the contents of the AtomSpace. It implements a single step of a forward chainer.
The general form of a graph rewrite rule, specified using a BindLink is:
BindLink $variable_declarations (optional) P Q
The expression or predicate P is the pattern to be matched. The variables appearing in P are listed in the $variable_declarations. If a match to P is found, the variables are grounded with their matching values. The graph Q, is then created (with the grounded values). One graph Q is created for each match to P. By default, the matcher runs until all possible matches have been found. A list of all of the created Q's is then returned by the cog-bind function.
A particularly useful form is having Q be an ExecutionOutputLink, that is, having the form
BindLink $variable_declarations (optional) P ExecutionOutputLink GroundedSchemaNode "some-function" ListLink Q_1 Q_2 ...
In this case, the various Q_1, Q_2, etc. are created for each match, and are then passed as the first, second, ... arguments to a function "some-function". This function, written in scheme or python (or, potentially, C++ or another language), can then perform any action whatsoever. Typically, it will adjust and alter truth values and attention values as desired, although it can perform any action at all.
Passing the above structure to the cog-bind function will cause the entire AtomSpace to be searched for the predicate P, and, if found, cause the implication to occur. A generic graph re-writing system can be implemented simply by repeatedly calling cog-bind with different rules and formulas, until some desired result is achieved.
A detailed presentation of these steps, developed through examples, is given below.
The basic structure of a BindLink is
BindLink <variable-decls> (optional) <pattern-to-be-matched> <created-pattern>
The <variable-decls> may be a single VariableNode, or a VariableList of VariableNodes. The allowed type range of a of a VariableNode may be restricted by using a TypedVariableLink together with a TypeNode; see examples below.
The <pattern-to-be-matched> atom may be any atom or link that can be interpreted as being true or false. It is usually (but not always) an AndLink (possibly with nested ChoiceLinks, NotLinks, AbsentLinks, and so on) specifying a set of pattern clauses that might be grounded in the atomspace. More generally, it can be any predicate bearing a truth value that can be interpreted as "accept" or "reject"; for example, it could be an EvaluationLink with a GroundedPredicateNode in it.
The <pattern-to-be-matched> must contain the VariableNodes appearing in the <variable-decls> section. If the <pattern-to-be-matched> evaluates to true (typically, because the pattern is found in the atomspace), then the <created-pattern> will be instantiated in the atomspace.
Below follow a series of examples showing how variable declarations are to be made. Variables may freely range over all types, or may be restricted to have only certain types.
The following shows the quantifier ForAllLink with a single variable, expressing "All humans are animals":
ForAllLink VariableNode $H ImplicationLink InheritanceLink $H ConceptNode human InheritanceLink $H ConceptNode animal
Caution: Currently, the pattern matcher does not recognize or treat ForAllLink's in any special way. Thus, while the above shows how, in principle, one could write quantified logical expression in OpenCog, there is currently no mechanism or function within OpenCog that does anything with ForAllLinks. NOTE: The previous C++ version of PLN, written by Ari Heljakka, Joel Pitt etc., did contain special logic for handling ForAllLinks. I (Ben G) am unclear how they are handled in the current (April 2013) python PLN code, but obviously their logic should be added to the current PLN implementation.
Multiple variables may be specified using a VariableList. So, for example, if a father has a child, then the child has that person as a father:
BindLink VariableList VariableNode $F VariableNode $C EvaluationLink PredicateNode fatherOf ListLink $F $C EvaluationLink PredicateNode childOf ListLink $C $F
Note that, if a VariableList is empty and their is a VaraibleNode in any of the clauses then the VariableNode will be quoted, that is, it won't be grounded.
It can be very convenient to restrict variable types, that is, to define a signature. For example, the above may be too general, but would hold if $F and $C were ConceptNodes:
BindLink VariableList TypedVariableLink VariableNode $F TypeNode "ConceptNode" TypedVariableLink VariableNode $C TypeNode "ConceptNode" EvaluationLink PredicateNode fatherOf ListLink $F $C EvaluationLink PredicateNode childOf List $C $F
The TypedVariableLink can enumerate several allowed types:
TypedVariableLink VariableNode $G TypeChoice TypeNode "ConceptNode" TypeNode "PredicateNode"
In principle, BindLinks may be nested, although, in practice, this is not currently supported. See github issue #1458 for current implementation status.
For example, in the following, $F and $C are visible within the ExistsLink, but $G is not visible in the outer BindLink:
So, "A father usually plays some game with his child." can be expressed as
BindLink VariableList TypedVariableLink VariableNode $F TypeNode "ConceptNode" TypedVariableLink VariableNode $C TypeNode "ConceptNode" EvaluationLink PredicateNode fatherOf List $F $C ExistsLink TypedVariableLink VariableNode $G TypeNode "ConceptNode" ANDLink InheritanceLink VariableNode $G ConceptNode game EvaluationLink PredicateNode playTogether ListLink $F $C $G
Caution: One can certainly write a hypergraph as above, but it is not treated as a nested expression by the pattern matcher. Upon finding an appropriate 'fatherOf' relation, the pattern matcher will simply instantiate an ExistsLink with the appropriate substitutions for the variables. It will not apply or evaluate the second, nested ExistsLink. This behavior could be changed, but would require additional discussion and deliberation before such a change could be made.
In principle, the expression specifying a variable type could be an equation, such as:
TypedVariableLink VariableNode $F TypeNode $T TypedVariableLink VariableNode $C TypeNode $T TypedVariableLink VariableNode $T TypeNode "Atom"
which says that $F and $C must take the same type, but doesn't specify what type that is.
At this time, type equations are not supported by the query code in opencog/query, and thus are not available to the pattern matcher.
OpenCog has multiple evaluators. There are several variants/implementations of PLN, none of which are fully functional at this time. It also has the pattern matcher. The pattern matcher is a completely general, user-definable rule evaluation system. It can be used to find any given subgraph occurring in the AtomSpace, and to perform any given action as a result. This can be done by using scheme, C++ or python, or any mixture of these. The remainder of this article is focused on the pattern matcher.
The pattern matcher consists of several components that have been hooked together to implement a single-step forward chainer (graph re-write rule applier). At the core is a component that is able to compare one hypergraph to another, and find a match, if one exists. If one (or more) matches are found, then the corresponding variables in the match are grounded. This component is coupled to another component, the 'initializer' or 'traverser', which traverses the entire AtomSpace. Used together, these two can locate all matching patterns in the AtomSpace. A third component, the 'instantiator', "does something" with the resulting match. The 'something that it does' is to instantiate a graph, specifically, to instantiate the second half of an ImplicationLink (using the grounded values for the variables).
If desired, a capable C++ programmer can use the core matcher without having to use the initializer/traverser. For example, one could limit the sub-hypergraph search to some subset of the AtomSpace, instead of all of it. A capable C++ programmer could also avoid using the instantiator, and instead create some completely different callback, to do something completely different. All three of these components are implemented in the libquery module, with source code in /opencog/query. However, in essentially all situations, this kind of modification is not needed. The triumverate of these three components combined is sufficient to handle essentially all cases. No C++ code is required. If you think that this is not enough for your purposes, please contact User:Linas to discuss.
This triumverate pattern-matcher is able to accept BindLinks that specify "rules" (in the sense of a graph rewrite rule), and apply those rules to the AtomSpace. Each "rule" is an ImplicationLink, given the interpretation "if P then Q", where P and Q are hypergraphs. If the hypergraph P is matched, then the graph Q is instantiated. The graph Q may be any hypergraph; but if Q is an ExecutionOutputLink, then a match results in some program/function being executed. This is a very powerful feature, because an general, generic computation can be performed by the ExecutionOutputLink. This is documented in greater detail in a later section, below.
The rule-applier can be invoked from the scheme shell, with the cog-bind command, as follows:
guile> (cog-bind h)
where h is an BindLink that was previously defined.
In the example below, the pattern matcher will find all InheritanceLinks of the given structure, will perform the implication, and return a list of all possible groundings of the VariableNode. The returned values can be ignored: the implicand is instantiated in the AtomSpace, and stays there 'forever' (until deleted).
This example can be run from the scheme shell:
; Define a hypergraph called "human-implies-animal" (define human-implies-animal (BindLink (VariableNode "$H") ; This is the pattern that will be matched ... (InheritanceLink (VariableNode "$H") (ConceptNode "human") ) ; This is the hypergraph that will be created if the ; above pattern is found. (InheritanceLink (VariableNode "$H") (ConceptNode "animal")))) ; Some data to populate the atomspace: (InheritanceLink (stv 1 1) ; a non-zero truth value is needed! (ConceptNode "Linas") (ConceptNode "human")) ; Run the actual implication (cog-bind human-implies-animal)
The result of running the above will produce the following output:
(SetLink (InheritanceLink (ConceptNode "Linas") (ConceptNode "animal")))
The truth and attention values of the newly created hypergraph (the implicand) can be set by using the ExecutionOutputLink, as shown below.
Optional and rejected subpatterns
The pattern matcher can detect optional clauses, and perform alternative handling when these are discovered. This can be done either with the NotLink or the AbsentLink. When optional clauses are discovered, the PatternMatchCallback::optional_clause_match() method is called, instead of the usual PatternMatchCallback::clause_match() method. This callback can provide alternative handling of optional clauses.
One of the most useful applications of optional clauses is to reject a pattern whenever a certain sub-pattern appears with it. For example: "Find all swans that do not have an associated color attribute". In this case, the collor attribute must be absent, in order for the pattern to be found.
In other words: if you want to match some pattern P, but only if it does not contain some other sub-pattern R, then build the predicate as:
AndLink P AbsentLink R
The above will match P only if R is not present.
In order to reject a pattern with an inverted truth value, use the NotLink instead. This, when used with the (cog-bind-crisp) matcher, will reject patterns that have the negated clauses present, unless those clauses have a truth value strength less than 0.5.
In other words: if you want to match some pattern P, but only if the sub-pattern R is false or is absent, then build the predicate as:
AndLink P NotLink R
This will match P only if R is not present, or, if R is present, then R is false.
After finding a match, the second phase of the operation is an 'instantiation' or 'grounding' phase, which creates an instance of the target pattern, by substituting the grounded values for the variables. If an ExecutionOutputLink is encountered during this phase, it will be executed. Typically, ExecutionOutput links are used to hold formulas for computing truth values for the newly created instance.
This example shows how to perform a forward-chained inference, using a custom formula to set the truth value on the result.
; Define a formula that sets the truth value to a constant. (define (my-formula h) ; Some debug printing showing the input argument. (display "Hola! Tengo esta:\n") (display h) ; print the atom that we were given. (display "Eso es lo que tengo\n") (cog-set-tv! h (cog-new-stv 0.75 0.42)) ; change the truth value on it. h) ; return the handle that we were given. (define human-implies-animal (BindLink (VariableNode "$H") ; This is the pattern to match ... (InheritanceLink (VariableNode "$H") (ConceptNode "human")) ; This will be called if the above pattern is found. (ExecutionOutputLink (GroundedSchemaNode "scm: my-formula") ; This is the list of arguments to pass to the formula. ; Its just one argument, in this example. (ListLink (InheritanceLink (VariableNode "$H") (ConceptNode "animal")))))) ; Some data to populate the atomspace: (InheritanceLink (stv 1 1) ; a non-zero truth value is needed! (ConceptNode "Linas") (ConceptNode "human")) ; Perform the inference (cog-bind human-implies-animal)
Example: Multiple arguments
ExecutionOutput links can pass two or more atoms to the schema node. The next example is almost the same as the above, except that the truth value is copied from the first atom, and scaled by 0.3. To make this clearer, some handy utility functions are defined first:
; Return the strength of a simple truth value (define (get-tv-strength tv) (cdr (assoc 'mean (cog-tv->alist tv)))) ; Return the confidence of a simple truth value (define (get-tv-confidence tv) (cdr (assoc 'confidence (cog-tv->alist tv)))) ; Return a truth value where the strength was multiplied by 'val' (define (scale-tv-strength val tv) (cog-new-stv (* val (get-tv-strength tv)) (get-tv-confidence tv))) ; Define a formula that computes a truth value (define (my-dual-formula ha hb) (display "Hola! Tengo esta:\n") (display ha) ; print the atom that we were given. (display hb) ; print the atom that we were given. (display "Eso es lo que tengo\n") ; Set the strength of the truth value on atom hb ; to be just 0.3 of the strength of atom ha. (cog-set-tv! hb (scale-tv-strength 0.3 (cog-tv ha))) hb ; return atom hb ) (define human-implies-animal (BindLink (VariableNode "$H") ; This is the pattern to match ... (InheritanceLink (VariableNode "$H") (ConceptNode "human") ) ;; This will be executed if the above pattern is found. (ExecutionOutputLink (GroundedSchemaNode "scm: my-dual-formula") ; This is the list of arguments to pass to the formula. ; Notice that *two* arguments are passed. The first ; argument is a repeat of the pattern that was matched, ; and the second argument is a hypergraph that will be ; created. (ListLink ; The schema will take the truth value from this link ... (InheritanceLink (VariableNode "$H") (ConceptNode "human") ) ; and it will set the truth value here, after scaling by 0.3. (InheritanceLink (VariableNode "$H") (ConceptNode "animal")))))) ; Some data to populate the atomspace: (InheritanceLink (stv 0.95 0.6) ; a non-zero truth value is needed! (ConceptNode "Linas") (ConceptNode "human")) (cog-bind human-implies-animal)
The above examples show new hypergraphs being created in the implicand. One does not need to do this; one can pass whatever is desired to the schema, including just the groundings, if desired. So for example:
(ExecutionOutputLink (GroundedSchemaNode "scm:some-schema") (ListLink (VariableNode "$varA") (VariableNode "$varB") (VariableNode "$varC")))
would not create any new hypergraphs, and instead would just call some-schema with, as arguments, the three atoms that the pattern matcher found.
Satisfiability is about taking an expression that contains variables, and discovering values for those variables, that renders the expression "true". The GetLink is able to find all of the atoms that satisfy a pattern.
Graph re-writing is about taking an existing graph, and creating a new graph derived from it. This is what PutLink does. The PutLink is a beta redex: when it is executed, a new graph is created, with values substituted for variables. That is, executing a PutLink causes beta reduction to occur. Thus, the GetLink finds possible groundings, while the PutLink then uses those groundings as values to replace variables with the values that GetLink found.
Relation to other systems
The paper A co-Relational Model of Data for Large Shared Data Banks by Erik Meijer and Gavin Bierman demonstrates how SQL and noSQL are category-theoretic duals. The notation used there corresponds to the following:
- BindLink corresponds to "SelectMany", and it is a monad.
- The AtomSpace is the functor of the monad.
- The GetLink corresponds to the SQL function (with P the pattern predicate)
- The PutLink corresponds to the SQL function (with F the re-write function).
The thing that makes BindLink different and unique is that it, too, is an element of the atomspace, rather than an abstract theoretical construction outside of the database.
Apache Gremlin Language/Tinkerpop
The Apache TinkerPop system has the Gremlin Language for defining a graph traversal. A Gremlin traversal is very much like a single-clause GetLink. There is no concept of a multi-clause traversal in Gremlin. There are many obvious and important differences:
- Tinkerpop edges always have two vertexes, one at each end of the edge. Atomspace edges (aka Links) can have an arbitrary number of vertexes in them, and can also contain other edges. Tinkerpop edges are untyped; AtomSpace edges are typed.
- Gremlin queries are compiled down to a bytecode, and can be distributed to OLAP or OLTP systems, where the actual query is performed. There is (currently) no such offloading ability in the atomspace.