Pattern engine

From OpenCog
(Redirected from Pattern Matcher)
Jump to: navigation, search

OpenCog has a pattern matcher or query engine or variable unifier that can be used to search the atomspace for specific patterns or arrangements or 'templates' of atoms. After specifying some (arbitrarily complex) arrangement of atoms, that is, a hypergraph, the pattern matcher can find all instances of that hypergraph in the atomspace. The pattern or template can have "holes" in it -- locations that are variable -- and so the pattern matcher can act to "fill in the blanks" when presented with a pattern that has blanks in it. For example, a pattern such as "John threw a _____." can be solved or grounded if the atomspace contains the sentence "John threw a ball." In this example, the solution or grounding is "ball". Patterns may have arbitrary numbers of blanks (variables) arranged in arbitrary ways, and may have multiple solutions. Examples of patterns can be found on the SatisfactionLink and QueryLink page. The QueryLink and related link types provide an easy-to-use, convenient API to the pattern matcher. Below follows a more technical description; if unsure, consult the QueryLink and MeetLink pages first, or look at the examples.

What should we call this?

Calling this a "pattern matcher" is misleading, because when people normally say "pattern matching", they are thinking of something much simpler. So this is a really bad name. What else could we call it?

It is a unifier, in the theoretical computer-science sense of Unification, because that is what it does. It is a query engine, because the act of performing unification over a database of relations is the same thing as querying that relational database, such as with SQL. See relational algebra in wikipedia, for example. The difference here is that Atomese relations are expressed as hypergraphs, so that the pattern matcher actually implements a hypergraph query language (HQL). These are three different names for the same ball of ideas. Since the results of a search can be used to create a new graph, the query engine implements graph rewriting.

The book Term Rewriting and All That, Franz Baader, Tobias Nipkow, 1998, Cambridge University Press (see Google or Amazon) provides a good intro (see page 80 of paperback edition). What is implemented here is considerably more sophisticated: it unifies multiple terms at once, and automatically handles unordered terms (terms that can be in any order), provides support for quotation, execution and evaluation. A "term" is any "link", in the sense of "term algebra". It is useful to think of "terms" because, in a term algebra, the notion of variables and arity &c. are well-developed. In the pattern matcher, a "clause" is a top-level term, that is, a pattern is a single AndLink that wraps N clauses.

Usual and unusual features

All of the "normal" features you might hope for are supported:

  • Perform graph re-writing. That is, when a match is found, the match results are used to create a new graph. Graph re-writing is done with the QueryLink. QueryLinks specify "rules" (in the sense of a graph rewrite rule), and apply those rules to the AtomSpace. Each "rule" has the form "if P then Q", where P and Q are hypergraphs. If the hypergraph P is matched, then the graph Q is instantiated.

The pattern engine provides support for more than a few unusual features (some very unusual) not found in any other systems:

  • Atomese is typed programming language. That means it has a good collection of type constructors, allowing arbitrary, complicated types to be specified. The search variables can be constrained by these types. (As of 2020, no other graph query system can do this ... as far as I know ...).
  • An ordinary query can be thought of as demanding "find all the answers to this question". The DualLink provides an inverted search: "find all the questions for which this is an answer". This is pattern recognition, and is similar to the kind of search that chatbots do: given some utterance, a chatbot finds all rules that can be applied to that utterance. Most systems and programmers do not think of chatbots as a kind of database to be searched, we do. Roughly speaking, the DualLink provides a kind of "dynamic" Rete algorithm. (As of 2020, no other graph query system can do this ... as far as I know ...).
  • Partially ordered sets come equipped with the notion of join and meet. Roughly speaking, a "join" is a graph that contains all of the given subgraphs. A "meet" is the intersection of a given set of graphs. Ordinary satisfiability (ordinary pattern matching or querying) is a "meet" operation, because it asks for each of the clauses to be satisfied. In boolean satisfiability, a "jin" is kind of silly: its like asking for everything that fails - surely a huge set! For graphs, it makes great sense: one is asking for the containing graph. (As of 2020, no other graph query system can do this ... as far as I know ...).
  • UnorderedLinks are used for sets that do not have any preferred order. Thus, when searching for a pattern with unordered links in it, one must consider all possible permutations. This is not that hard if you have only one unordered link to deal with. If you have two or three or more, and they are nested, and they are constrained to have common sub-elements (via shared variables in them), this is not so easy. In fact. its just plain hard. The pattern engine handles them correctly. (As of 2020, no other graph query system can do this ... as far as I know ...).
  • One can specify a menu of different subgraphs to be matched, with the ChoiceLink. These can be arbitrarily nested.
  • Once can match zero, one, two or more subgraphs with GlobNodes. This is similar to the idea of globbing in a regex, except that the globbing is performed on a graph. (As of 2020, no other graph query system can do this ... as far as I know ...).
  • Executable graphs can be quoted, using the QuoteLink. Quotation means that the executable bit will not be run during the search. If the quoted thing is a variable, then you can search for that variable.... and so on. In particular, this means you can query for queries. (As of 2020, no other graph query system can do this ... as far as I know ...).
  • The PresentLink can be thought of as a form of "there exists": it requests tha the indicated subgraph must exist in the AtomSpace. The AlwaysLink provides a kind of "for all" search: the indicated subgraph must appear in every match. For example, find all baskets that contain only red balls in them. (As of 2020, no other graph query system can do this ... as far as I know ...).

Related concepts

In many ways, the pattern matcher resembles a hygenic macro language; one might even say that it is a macro language. See:

The difference, however, is this: a macro expansion can be applied only once. By contrast, patterns are more like production rules, and can be applied repeatedly, whenever the data fits.

The pattern matcher can be thought of as a building block for a:

The pattern matcher is meant to be used as a building block for more sophisticated functions. It is meant to be the foundation on which to build forward and backward chainers, as well as the (user-defined) indexes needed to operate these (e.g. to implement the Rete algorithm or variants).

The pattern matcher has NOT been influenced by miniKanren, but probably should be.

See also:

The Atomese API

The primary interface into the pattern matching engine is via the Atomese API, which allows graph queries to be written as graphs themselves.

General, generic patterns consist of two parts: optional variable declarations, and a body. The body specifies the pattern to be grounded (match-pattern). Thus, the general structure is:

 PatternLink
   <var-decls>       (optional)
   <match-pattern>

The QueryLink is similar, except that it performs graph re-writing, and so the body must consist of a predicate, interpreted as the pattern to search for, and an implicand specifies how the search results are to be instantiated.

 QueryLink
   <var-decls>       (optional)
   <match-pattern>
   <instantiated-pattern>

All pattern links can be understood as database query expressions; the atomspace is searched for all graphs that match the specified query.

The MeetLink is equivalent to a QueryLink with no rewriting. So, the following two examples are equivalent:

(define find-animals-query
  (QueryLink
    ;; The variable to be bound
    (VariableNode "$var")
    ;; The pattern to be searched for
    (InheritanceLink
       (VariableNode "$var")
       (ConceptNode "animal")
    )
    ;; The value to be returned.
    (VariableNode "$var")
  )
)
 
(define find-animals-meet
  (MeetLink
    ;; The variable to be bound
    (VariableNode "$var")
    ;; The pattern to be searched for
    (InheritanceLink
       (VariableNode "$var")
       (ConceptNode "animal")
    )
  )
)

;; Run the above patterns
(cog-execute! find-animals-query)
(cog-execute! find-animals-meet)

The result of running the pattern matcher will be a list of all possible groundings for the variable $var. So, for example, if the atomspace contains:

(InheritanceLink (ConceptNode "fox") (ConceptNode "animal"))
(InheritanceLink (ConceptNode "skunk") (ConceptNode "animal"))

then cog-execute! will return a SetLink with the two animals:

(ListLink (ConceptNode "fox") (ConceptNode "skunk"))

Because the implicand was simply a single, naked VariableNode, the instantiator had no work to do; the atomspace was not altered, and the list of results was returned to the scheme caller. The variable declaration was optional: the following would provide the same results:

(define find-animals
  (QueryLink
    ;; The pattern to be searched for
    (InheritanceLink
       (VariableNode "$var")
       (ConceptNode "animal")
    )

    ;; The value to be returned.
    (VariableNode "$var")
  )
)
 
;; Run the above pattern
(cog-execute! find-animals)

Design and Implementation

The pattern matcher is written in C++. It consists of two major components: the "analysis" (or "compilation") step, and the runtime step. The "analysis" step pulls apart to specified pattern, and extracts bound variables, free variables, quotes, executable links, evaluatable links, definitions, boolean expressions, virtual links and other assorted bits and pieces (AbsentLink, AlwaysLink, etc.) The aim of the analysis step is to have the runtime step run as quickly as possible. Analysis only needs to be done once per pattern; once analyzed, it can be run an arbitrary number of times. The analysis code is in the /opencog/atoms/pattern directory

The runtime code consists of seven major pieces. The search-initiation phase finds a good place to begin the graph-walk. The core search is a graph-walk, implemented as a stack machine. State is pushed and popped to/from a stack as partial matches are found. A dozen different callbacks interpret what should be done as matching proceeds; this allows the engine to be used for other tasks, provided one codes in C++. (This is generally discouraged; the engine has no stable, guaranteed API.) The "virtual link" subsystem handles graphs that are connected only by "virtual" links (such as GreaterThanLink) but are otherwise disconnected. Such disconnected graphs must be discovered independently of one-another, and then reassembled by running the virtual links, to see if they can connect (e.g. to perform an actual numerical compare, when using a GreaterThanLink). The "instantiator" takes the discovered variable groundings, and performs the actual graph rewrite. The "executor" executes any of the rewrites that are executable. Finally, there are a set of thin shims that adapt it to DualLink, QueryLink, MeetLink, etc. The implementation of the runtime part may be found in the /opencog/query directory; see opencog/query/README.md for details.

Performance vs. flexibility

The pattern engine is a beast. Its large and complex. Because it's written in C++, it has good performance (its as fast as we've been able to make it, so far). However, it has a significant setup cost (the analysis step, and even the first few parts of the runtime), so, for itsy-bitsy small, simple problems, you can outperform it with hand-written snippets of code. But once a problem starts getting complex, e.g. has more than one variable in it, or requires traversing a graph with loops, the pattern engine will beat ad hoc hand-written code.

The Engine

The core, underlying matching is implemented in class PatternMatchEngine. The main entry point is class PatternMatchEngine::match(). It takes four inputs: a list of variables, a list of clauses, a list of optional clauses (negations), and a user-defined callback class.

The callbacks govern the "semantics" of matching, and are completely user-definable. By providing an appropriate callback, the user can impose any semantics whatsoever on the concept of "what constitutes a match". The engine merely presents candidates to the callbacks, but it is up to the callbacks to accept or reject candidates. Thus, for example, the default callback implements "negation semantics" for the optional clauses: it rejects a match if any of the optional clauses are present. These are discussed in greater detail, below.

The list of variables is the set of bound variables that will appear in the clauses and the negations. Variables can "match anything"; more specifically, if some atom is discovered where the variable would be, then the user-defined callback is called to report the potential match. The callback can accept or reject the match; if it accepted, then that match is a "grounding" of that variable.

The list of clauses determine the hypergraph to be matched. Each clause should be thought of as a directed tree, the edges of the tree corresponding to the outgoing sets of the atoms; the terminal leaves of the tree are Nodes, and the internal vertexes of the tree are Links. The list of clauses should be "connected" to one-another: that is, each clause should have at least one node that also appears in another clause. The engine walks these trees, attempting to match them up to the contents of the atomspace; a match is declared only if each of the clause trees match some tree in the atomspace, connected the same way at the nodes, and having had a successful, consistent grounding of each of the variables. If a match has been found, then a user-defined callback is called, reporting this match. The callback can "accept" or "reject" the match. If the callback "accepts" the match, then all further searching is halted; if it is rejected, then a search for other matches continues. The user-defined callback is free to do as it wishes: for example, if it wants to find all matches, it could simply record the match that was found, "reject" it, and then wait for more matches.

The optional clauses are lists of clauses that may or may not be present to satisfy a match. The default callback implements negation semantics for the optional clauses: that is, it explicitly rejects a match if any of the optional clauses are present. Other, user-defined callbacks are free to implement a different semantics for the optional clauses.

The callbacks

The callbacks are defined by a virtual base class class PatternMatchCallback. It has a concrete implementation in class DefaultPatternMatchCB, which does "something reasonable" for most users. It is possible to write custom callbacks to do something different, but this is discouraged. There are two reasons for this: we don't want to guarantee the stability of the callback API; it's changed in the past, and may need to change in the future. The other reason if that, whatever problem you are trying to solve, there probably is a better solution already available in generic Atomese. Take the time to full digest the examples before trying to hack the pattern engine callbacks. That said, the main callbacks are:

  • node_match() -- this is called whenever two nodes are compared. The callback can accept or reject the match. The default is to accept a match if the two atom handles are identical. A non-default behavior might be more general, for example, allowing matches based on type, or based on truth values, or any other criteria.
  • variable_match() -- this is called when a variable appearing in a clause is to be matched to a candidate grounding. The default callback accepts a match if the candidate node is one of a given set of types (as described in the BindLink page -- that is, the default callback implements the BindLink semantics).
  • link_match() -- this is called to compare links. The default callback accepts a match if the link types and the link arities agree. The default callback does not compare the actual outgoing sets: this is the engine's job. A user-defined callback can take any action here, if desired, including modifying or computing truth values, etc.
  • clause_match() -- called when a single clause has been matched. This callback can accept or reject the match, or take the opportunity to compute truth values. The default callback always accepts matches.
  • optional_clause_match() -- called when an optional clause is matched. The default callback implements "negation semantics": it always rejects optional clauses. The central idea of negation is that a match is possible only if the negated clause is absent. Other callbacks can be used to define different kinds of optional-clause semantics. In particular, the very idea of "negation" depends on the interpretation of the strength and confidence of a truth value: "negation" might not make sense if the confidence is zero. Similarly, perhaps an optional clause should be rejected only if it is mostly "true", and accepted if it is sufficiently "false". The default callback implements crisp-logic semantics for the negation.
  • solution() -- called when all clauses have been matched (and all optional clauses have been appropriately handled). This provides an opportunity for the callback to record the solution, and to either terminate or continue the search.

In addition to the above, there are three informational callbacks:

  • push() -- called after a clauses has been fully grounded. This allows the callback class to save partial state onto a stack, if desired. The pattern matcher is a stack machine; as it advances, it maintains a state stack; when it retreats (back-tracks), it pops its own internal stack. This callback provides an opportunity for the user to deal with back-tracking as well; for example, it may want to maintain a cumulative truth value, or attention value; clause by clause. When forced to backtrack, it would then need to be able to discard partial results. This would be done with the pop() callback, below.
  • pop() -- called when back-tracking.
  • set_type_restrictions() -- called very early, before any pattern matching has commenced. It conveys the variable typemap that was specified by the BindLink.

In addition to the above, there is one very important callback that allows the user to limit or guide the search; that is, to determine the domain of the search.

  • perform_search() -- called when the engine is invoked. This callback must then turn around and invoke the main entry point to the engine: the PatternMatchEngine::do_candidate() method, as many times as it needs to, to perform the search over the desired domain. That is, the engine itself does not know where to start the search, and even when told where to start, it will not necessarily perform an exhaustive search. This callback allows the user to specify the initial start point, as well as to take the needed actions to make sure that the desired domain has been searched. The default callback attempts to search the entire atomspace, assuming "reasonable" semantics. Under certain unusual semantics, the default may fail to find all matches. In particular, it may fail if the link_match() and node_match() callbacks implement a broader matching semantics than the default.

Example Walk-through

Perhaps the easiest way to explain how the PM works is with a walk-through of what actually happens.  We start with the set of terms to be unified:

(AndLink
    (EvaluationLink
        (PredicateNode "whichmarker")
        (ListLink
            (VariableNode "$C")
            (VariableNode "$D")))
    (EvaluationLink
        (VariableNode "$D")
        (ListLink
            (VariableNode "$C")
            (VariableNode "$E")))
    (EvaluationLink
        (VariableNode "$A")
        (ListLink
            (VariableNode "$B")
            (VariableNode "$C")))

The first thing that happens is that the perform_search() callback is called.  This callback can be used to limit the domain of the search.  

The default callback does things so that it seems like the whole atomspace is "searched".  Note that they are all variables, except for one constant: PredicateNode "whichmarker".  This is an excellent place to start the "search": we've already narrowed the "search" down to just ONE atom!  Bravo!  

Next, the default callback loops over the entire incoming set of PredicateNode "whichmarker", passing control back to the pattern matcher.   The incoming set is not sorted or truncated in any way, all of it is explored.  But you could change this if desired. For every atom in the incoming set, the PM compares it to "EvaluationLink". It calls a callback called link_match() which is given two links: the EvaluationLink, and the atom in the incoming set.  The default callback declares a match if and only if the atom has type EvaluationLink.    But you could change this if desired; for example, to reject the match if the AV of the atom is too low.   Or you could reject the match if the atom is not in some set. 

If you reject the match, the PM backtracks to the next atom in the incoming set.

If you accept the match, then the PM tries to compare the outgoing sets of the two EvaluationLinks.  Since we already did the PredicateNode, it doesn't have to be re-done.  Next is the ListLink.  Again the link_match() callback is called ... with two arguments: the ListLink from the pattern, and the atom in position 2 of the outgoing set of the EvaluationLink, whatever it may be.  Again, you must accept or reject the comparison.  The default is to accept if and only if the types match.

If you reject the match, the PM backtracks upwards, twice.

If you accept the match,  then the PM tries to compare the outgoing sets of the ListLink's. This time, the variable_match() callback is called, with two arguments. The first one is VariableNode "$C"  and the second one is (for example) ConceptNode "tree".   You can choose to accept or reject this candidate grounding.

The default callback simply makes sure that ConceptNode is one of the allowed types that VariableNode "$C" is allowed to have.

You can write a callback that will reject the match if the tv or av of ConceptNode "tree" is too high or low, or if ConceptNode "tree"  is not a recently generated relex2atom atom. Perhaps you want to check some HebbianLink's to see if ConceptNode "tree" is hot stuff. 

Perhaps one of the best solutions would be to write a generic callback that, whenever its given a VariableNode "$whatever" and a ConceptNode "tree", it rejects the match unless the STI is over 1000.   That way, as long as relex2logic always sets the STI above 1000, and then decays it lower values over time, the search will always be limited to the newest relex2logic atoms.   Simple, easy, straightforward.  This would require only a small amount of new code.  You just have to decide what the selection criterion is.

Anyway, that's how the PM proceeds: it matches incoming and outgoing sets, atom by atom, calling a callback for each possible match.   This sounds real easy, but actually, its not, because of backtracking.  Backtracking requires a stack.   But this is an advanced topic that you don't need to worry about, unless you accept/reject criteria is accumulating data.

Backtracking

For example, if the first three matches had a mediocre TV, maybe you accept them, but, for the 4th one you decide that too much crappy TV has happened, and you reject. Backtracking might backtrack over the first three matches, and so, now you want to start with a clean slate.  There are push() and pop() callbacks that allow you to "clean the slate".

Some Q&A on Callbacks

These questions and answers are pasted here from an email exchange, with some reformatting; the goal being to avoid the same questions being posted and needing to be answered again....

Question:

We want to start with a certain set S of Atoms, and have the PatternMatcher look only at Atoms in the set S, during its whole search process.

Answer:

The PM never actually searches "the atomspace", it has absolutely no idea of the space its exploring. The space is entirely controlled by the user-defined callbacks. The callbacks determine the space S that is searched.

Let me take a step back and quickly review what the structure of an atom is, and what the PM does with it. Atoms(Links) have an outgoing set: this is explicitly hard-coded into a Link, and is immutable. Atoms also have an incoming set, which is only implicitly defined, as the set of all Links that point to it.

Given a single Link, the transitive closure of it's outgoing set can be pictured as a tree. Likewise, given a single Atom, the transitive closure of the incoming set can be pictured as an upside-down tree (viz, the relationship is going in the opposite direction).

The PM itself never actually touches, uses, or accesses the atomspace! (with some strange misc exceptions) Understanding this is key to answering your question. Since PM doesn't know about the atomspace, it doesn't know that its searching it... or any subset of it. It is up to the user-defined callbacks to control the search.

To start the search, the PM needs to be given at least one atom that is in common between the pattern, and the space S. (If its given none, then game over, and no match). To obtain this single atom, it calls the perform_search() callback. The user must define this callback; there's a reasonable default. in place.

The default arbitrarily picks some node in the pattern, any node that is not a variable, and then tells the PM to start searching there.

PM then proceeds by comparing the incoming trees, starting at that common node, looking for a match. To obtain the incoming set of an atom, the PM performs a callback. It is the job of the callback to provide the incoming set. There is a default callback for this: it does a reasonable thing. It returns the ENTIRE incoming set for that atom!

This is the key step to answer your question. What is the "entire incoming set"? How is it defined? Well, it is the set J of all Links X that have the current atom A appearing somewhere in the outgoing set.

To limit the search to your set S, you need to write a callback, called get_incoming_set(), that returns only those Links X that are in your set S (and are also actually in J) -- return the intersection of S and J. That's all.


Note however that to use the subset S, it has to be "computable": you have to run some algorithm to determine what S actually is. In particular, you need to have a set membership function that answers yes/no to indicate membership. If that's "very inefficient", don't blame the PM. You have to define S so that working with S is fast.

If computing set membership in S is excruciatingly slow, you can use a cache. Here's an example cache that is O(1) in the pattern matcher:

class myMiniSubspaceS : PatternMatchCB
{
    std::map<Atom,std::vector<Link>>  my_incoming_set_map;

    std::vector<Link> get_incoming_set(Atom a) {
            return my_incoming_set_map(a);
    }

   void add_atom_to_S(Atom a) {
       if (not a->is_a_link()) return;  // ignore nodes
       foreach(Atom ox in a->outgoing_set()) {
                iset = my_incoming_set_map(ox);
                iset.insert(a);  // because a is in the incoming set of ox
       }
}; // the end.

That's all there's to it, nothing more. The class just holds a cache of incoming sets for the atoms in S. The PM will then only search over stuff in S. It can be very efficient: so if you use hash_map instead of map, the lookup is in O(1) time, even if S is large.

The only "gotcha" is that you have to somehow define what belongs in the set S, and spend the cpu cycles to actually put atoms into S. I don't know how efficient that might be.

Again: the class above is just a cache, which is attempting to cover up the fact that S is very inefficient to compute ...

However, if you can compute the set S efficiently and quickly, then you don't need the above cache. Just compute S on the fly, every time the get_incoming_set() callback is called.

Question:

We want to have the Pattern Matcher prefer matches that have a higher STI (ShortTermImportance)

The get_incoming_set() callback returns a VECTOR of atoms. It searches the first atom first, the second atom second, etc. If you place the vector in sorted STI order, then it will "prefer" higher STI matches first.

Also note that, of the callbacks, one of the last ones to get called is named solution(). It presents the user with one single grounding that the PM thinks it has found, and asks for approval.

If the user-callback replies "I like it", then the search terminates. Otherwise the PM tries again.

The default CB always replies "I hate it", and thus the PM explores *all* possible matches, until it can find no more, Then it dies exhausted, a heart-broken death, having searched the entire kingdom far and wide, and having utterly failed in its quest to find a match.

Now hears the clever bit: the default CB secretly recorded ever match that was returned, and put it in its treasure chest, and then sent out the PM to find yet more. Thus, as the PM scoured the kingdom, the CB built up an exhaustive collection of groundings, because the PM was so through.

However, if the CB was not so greedy, it could save a lot of effort by the PM by simply terminating the search early.

Follow-up question:

Yes, but is there a way to randomize where the PM starts its search, via specifying a random seed for the starting point? Would this also be done in a custom callback?

Answer:

Yes, as already detailed above. However, perhaps a better randomization could be done with get_incoming_set(). Maybe this would work:

When asked for the incoming set of some atom A, perform a random draw to obtain some subset of the true set. Maybe even order this subset by STI. The return that. Since PM always searches the incoming set in order, this random draw will randomize the search. I can't say how efficient it would be ...

Formal definition

Below is a formal definition of the pattern matcher, of the kind you might see in math books. Let be a signature, or rather, a collection of both function symbols and predicate symbols. In OpenCog, these correspond to Atoms; that is a given atom type should be thought of as a signature. Let denote a set of variables. In OpenCog, these correspond to a collection of VariableNodes with different names.

Given this definition, is the term algebra of all possible arrangements of Atoms. Likewise, is the subset of terms that have no variables in them: these are the "grounded terms" or "closed forms".

At any fixed point in time, the AtomSpace stores some subset of . Let denote the current contents of the atomspace. Let denote the subset of the atomspace consisting only of grounded terms. The pattern matcher is then comes in three forms. The first form is the function , implemented by SatisfactionLink

that, given some term , it locates some (any) grounded term in the AtomSpace, and returns 1 if such a grounded term can be found; else it returns 0.

The above is slightly oversimplified: by employing the correct term definitions, one can search all of and not just , if desired; the same is true for the link types below. However, the current simplification makes things slightly easier to discuss.

The GetLink form is the function

that, given some term , locates all grounded terms in the AtomSpace, and returns a set containing all such lists Note that the operation of the GetLink causes the size of the AtomSpace to expand: it is almost certainly the case that the ListLink was not previously in the AtomSpace, and even if it was, it probably was not in a SetLink with other similar lists.

The BindLink form is the function

that, given a pair of terms and , locates all grounded terms in the AtomSpace, creates a corresponding , and returns a set of all such terms.

The DualLink is the category oppsite to the GetLink:

that, given a specific , finds all terms that have one or more variables in place of the grounded atoms. The DualLink becomes interesting and non-trivial, when the variables are globby: are GlobNodes that can match multiple sequential ground terms.

Hmmm. Please realize that the above actually over-simplifies what is really happening: it glosses over a number of very interesting details of the system: most importantly, that the logical connectives AndLink, OrLink are atoms themselves, so that the AtomSpace is not just a term algebra, but a language. Next, the types of the Atoms are atoms themselves... and so terms can have type specifications inside of them. There are even type constructors. The AbsentLink can be used to specify a form of negation... the ChoiceLink can be used to specify choices... and so on.

Possibly similar projects

These projects might be similar:

  • Burlap on github. Quote from IW: Burlap lets you define the problem as a network of nodes with vectors of features or attributes attached to it. The algorithms can search through the network using a combination of brute-force searching and statistically guided exploration. The higher level of the algorithm plans the search and deploys the best algorithms. The toolkit includes dozens of the most useful algorithms for agent-based search.
The tool is useful for data-driven worlds where the data can be mapped into a large collection of nodes or objects. The code is written in Java and includes a large assortment of debugging and profiling tools that are useful for keeping the code moving toward the optimal goal.
  • Guile-Log -- the DataLog subset of ProLog for Guile.
  • GraphQL for javascript

See also