JoinLink

From OpenCog
Jump to: navigation, search

The JoinLink is a Link type that can match (and rewrite) the contents of Links in the AtomSpace, when only some of the subgraphs of the Link have been specified. The goal is to be able to search for unknown containing graphs that contain known subgraphs. This is in contrast to, and "opposite" of MeetLink (aka GetLink), which is used to specify a search for a known container with unknown subgraphs. It implements the concept of a "join" of partially ordered sets.

Motivating Example

Suppose one wishes to find all graphs containing (Concept "A"). So, for example, the AtomSpace contains

(Member (Concept "A") (Concept "S"))
(Evaluation (Predicate "P") (List (Concept "A")))

but one does not know, a priori, the general form above: one does not know the general form of things containing (Concept "A"). Suppose one wished to rewrite the above into

(Member (Concept "B") (Concept "S"))
(Evaluation (Predicate "P") (List (Concept "B")))

That is, to substitute all occurrences of (Concept "A") with (Concept "B"). This can be done with a two-step process: first, find all of the Atoms containing (Concept "A"), and second, to perform the replacement. The search pattern to find the containers is then:

(MaximalJoinLink
   (TypedVariable (Variable "X") (Signature (Concept "A")))
   (PresentLink (Variable "X")))

Executing this will return

(LinkValue
   (Member (Variable "X") (Concept "S"))
   (Evaluation (Predicate "P") (List (Variable "X"))))

Several things are noteworthy, here. First, that (Concept "A") was replaced by (Variable "X") instead of (Concept "B"). The replacement step is covered below. Second is that the variable type declaration uses SignatureLink for the variable declaration. This may feel unusual, if you are used to the pattern matcher, but its needed to look like a proper type constructor.

Replacement

The motivating example considered the replacement of (Concept "A") by (Concept "B"). This can be done with the ReplacementLink. as follows:

(MaximalJoin
   (TypedVariable (Variable "X") (Concept "A"))
   (PresentLink (Variable "X"))
   (ReplacementLink (Variable "X") (Concept "B")))

This states that not only must the named component "X" be matched, but that, once matched, it is to be substituted by the specific replacement. The above demonstration was rather round-about; one does not have to give a name to the component being replaced. The same effect can be more easily achieved by writing

(MaximalJoin
   (PresentLink (Concept "A"))
   (ReplacementLink (Concept "A") (Concept "B")))

This directly states that one should find everything containing (Concept "A") and replace it with (Concept "B"). For such a simple example, there is no need to name the components being replaced (i.e. there is no need to tag them with VariableNodes). Tagged components become useful in more complex searches.

TODO: It may also be useful to provide support for the ReplacementLink inside of GetLink and BindLink!

Formal structure

The general form is:

  JoinLink
       <(optional) variable specification>
       <predicates to satisfy>

The <variable specification> must be either a single TypedVariable or a VariableList of typed variables. The variables provide names for the sub-patterns to be joined together. The result of the join will be a set of all of those links that contain each and every sub-pattern, in some way, with each sub-pattern replaced by the corresponding named variable. Further constraints between the sub-patterns can be specified in the <predicates to satisfy> section.

The <predicates to satisfy> is a list of constraints, all of which must be satisfied. In general, it will consist of one or more PresentLinks, AbsentLinks, AlwaysLinks or evaluatable link types. The PresentLink is used to insist that the indicated sub-pattern must be present in the Atomspace. (Every variable in the <variable specification> is implicitly present; that is, there is an implicit PresentLink that holds each variable). The AbsentLink forces the indicated pattern to not be present. It is possible to include other evaluatable predicates to further constrain the search; this includes grounded predicates, which can accept or reject a match, as well as built-in Atomese predicates, such as GreaterThanLink or EqualLink, combined with AndLink, OrLink, NotLink, as needed.

The goal of the <variable specifications> is to provide handles for the sub-patterns to be matched. Arbitrarily complex patterns can be specified with the SignatureLink.

Executing the JoinLink will return all atoms that contain all of the indicated sub-patterns, and also satisfy any other embedded conditions in the clauses. In practice, there are three forms of containment: minimal, upper-set and maximal, and thus, the concrete types are MinimalJoinLink, UpperSetLink and MaximalJoinLink. The minimal container returns the "supremum": the "smallest" or "lowest" links which satisfy all of the clauses, but no higher. No link in the minimal join contains any other link in the minimal join; all points are distinct. The maximal container returns the top-most links satisfying all the clauses: those links which have no incoming set. No link in the maximal join contains any other link in the maximal join; all points are distinct. Every atom that satisfies the constraints occurs as a sub-atom (i.e. somewhere in the outgoing set) for some link in the maximal set. Depending on the situation, the minimal and maximal containers may return the same result. However, in general, a link returned by the minimal container may have a non-empty incoming set. Obviously, everything in that incoming set will also satisfy the conditions. By contrast, the links returned by the maximal container will never have an incoming set. The UpperSetLink returns everything in between: it simply enumerates everything in the minimal set, or above it.

The minimal container corresponds to the concept of a "join" of partially-ordered sets (as used in topology and lattice theory). The maximal container corresponds to the concept of a "filter".

In the following, "JoinLink" is used as a stand-in for either the MinimalContainer or the MaximalContainer; there is no unqualified JoinLink, per se (in the implementation, it is a private link type).

Examples

This section consists of a sequence of progressively more complicated examples that illustrate typical things that one might want to accomplish.

Minimal vs. Upper vs. Maximal

The differences between these are best understood by example. Suppose the AtomSpace contained

(Evaluation (Predicate "ontology")
   (List
      (Concept "class")
      (Member (Concept "crow") (Concept "bird"))))

and one wished to obtain links that contain both "crow" and "bird". The the query

(MinimalJoin
   (Present (Concept "crow"))
   (Present (Concept "bird")))

will return

(Member (Concept "crow") (Concept "bird"))

as that is the smallest link in the atomspace that contains both. The maximal join

(MaximalJoin
   (Present (Concept "crow"))
   (Present (Concept "bird")))

returns the EvaluationLink, as it also contains these two, and is the largest such link to do so. The upper set:

(UpperSet
   (Present (Concept "crow"))
   (Present (Concept "bird")))

returns the two limits, and everything in between: the MemberLink, the ListLink and the EvaluationLink. In total, it returns these three atoms; it does NOT return either "crow" or "bird" as singletons, as these both fail to contain the other. They are below the lower bound.

PresentLinks are implicit

The motivating example provided the following:

(MaximalJoinLink
   (TypedVariable (Variable "X") (Signature (Concept "A")))
   (PresentLink (Variable "X")))

This helped establish the idea that the goal is to find everything that contains (Concept "A"). The PresentLink is superflouous in this situation; simply by asking for (Concept "A"), one is implicitly implying that it must be in the AtmSpace (of course, trivially!) Thus, the PresentLink isn't really needed; its just acting as an extra predicate to confirm something that is already known/assumed. Thus, one can also write

(MaximalJoinLink
   (TypedVariable (Variable "X") (Signature (Concept "A"))))

and get the same results.

Shallow types

One can find eveything holding some given type, with a very shallow type specification. For example

(MaximalJoin
    (TypedVariable (Variable "X") (TypeNode 'ConceptNode)))

will get everything that contains a ConceptNode. For the motivating example, this returns:

(Evaluation (Predicate "P") (List (Variable "X")))
(Member (Variable "X") (Variable "X"))

and so broadly clobbers everything.

--- TODO: Not implemented. It would be nice to be able to not use variables for tagging, and just do an anoymous replacement:

(MaximalJoin
    (Replacement (TypeNode 'ConceptNode) (Concept "B")))

which would find any ConceptNode, and replace it by "B", directly.

Naming subcomponents

Suppose one wishes to find all links containing

(Evaluation (Predicate "is blue") (List (Concept "sky")))

and that the AtomSpace contained

(Context 
   (Concept "moon")
   (Similarity 
       (Evaluation (Predicate "is blue") (List (Concept "sky")))
       (Concept "unlikely")))

which states that the EvaluationLink is similar to "unlikely", when one is on the moon. The actual meaning here is irrelevant; what's important for this example is that the EvaluationLink appears within some other "unknown" link. To find this, write

(MaximalJoin
   (TypedVariable (Variable "X")
       (Signature (Evaluation (Predicate "is blue") (List (Concept "sky"))))))

Executing this will then return

(Context 
   (Concept "moon")
   (Similarity
       (Variable "X")
       (Concept "unlikely")))

That is, a structure where all occurrences of (Evaluation (Predicate "is blue") (List (Concept "sky"))) have been replaced by (Variable "X").

--- TODO: Not implemented. As above, it would be nice to be able to perform anonymous replacements, without explicitly naming the thing to be replaced. In other words:

(MaximalJoin
   (Replacement (Evaluation (Predicate "is blue") (List (Concept "sky"))) (Concept "Xbar")))

This is slightly less verbose, and slightly more readable.

Variable regions in subcomponents

This example is similar to the above, but broadens the search. Assuming the same atomspace contents as above, one might want to find any kind of blue things. One writes

(JoinLink   ; Minimal or Maximal...
   (VariableList
      (TypedVariable (Variable "X")
          (Signature (Evaluation (Predicate "is blue") (List (Type 'ConceptNode)))))))

This search will return the same structure as before, including other things that are blue. The (TypeNode 'ConceptNode) specifies a wild-card match for anything that is a Conceptnode in teh indicated position. So, for example, if he atomspace contains

(Evaluation (Predicate "is blue") (List (Concept "sea")))
(Evaluation (Predicate "is blue") (List (Concept "sky")))

then the signature will match both of these. However ... both of these are then replaced by (Variable "X") so the multiplicity disappears.

--- TODO: Not implemented. As above, it would be nice to support anonymous replacement, without having to give a name to the variable.

Embedded names

Continuing as above, but one wishes to obtain a rewrite only for the "thing". In this case, an implicit PresentLink won't do: one must explcitly ask for what must be present.

(MaximalJoin
   (TypedVariable (Variable "$thing") (Type 'ConceptNode))
   (Present
      (Evaluation (Predicate "is blue") (List (Variable "$thing")))))

This will return

(Context 
   (Concept "moon")
   (Similarity 
       (Evaluation (Predicate "is blue") (List (Variable "$thing")))
       (Concept "unlikely")))

So, unlike the previous examples, the substitution is made at a different location.

The corresponding minimal container is perhaps surprisingly different from the earlier examples.

(MinimalJoin
   (TypedVariable (Variable "$thing") (Type 'ConceptNode))
   (Present
      (Evaluation (Predicate "is blue") (List (Variable "$thing")))))

The returns just

(Variable "$thing")

but only if there is something, anything in the AtomSpace that is actually blue. Otherwise, it returns the empty set!

Compound structures

Suppose one wanted to find all Links having two elements. For example, given the Atomspace contents

(Evaluation (Predicate "above") (List (Concept "sky")))
(Evaluation (Predicate "part of") (List (Concept "beach") (Concept "sand")))
(Evaluation (Predicate "part of") (List (Concept "beach") (Concept "sea")))
(Evaluation (Predicate "planetary")
    (List (Concept "sky") (Concept "ground") (Concept "ocean")))

Clearly, only the second and third elements have lists of arity two. Obtain these with

(MaximalJoin
   (VariableList
      (TypedVariable (Variable "X") (Type 'ConceptNode))
      (TypedVariable (Variable "Y") (Type 'ConceptNode)))
   (Present (List (Variable "X") (Variable "Y"))))

This will return

 (Evaluation (Predicate "part of") (List (Variable "X") (Variable "Y")))

Note that the sea/sand/beach elements have been erased and collapsed down. Specific values can be retained by modifying the type constraints on the variable declarations. See also the examples further below, for other variations on this theme.

Container type

Returning to the motivating example, suppose the AtomSpace contains

(Member (Concept "A") (Concept "S"))
(Evaluation (Predicate "P") (List (Concept "A")))

As in the motivating example, one wishes to find everything that contains (Concept "A") but only when the container is an EvaluationLink. One then writes

(MaximalJoin
   (TypedVariable (Variable "X") (Signature (Concept "A")))
   (TypeNode 'EvaluationLink))

In this example, the TypeNode acts as a predicate, stating that the container type must be as indicated. Beside just TypeNode, one can also use any of the type specifiers, including TypeChoice, TypeInh, TypeCoInh and of course SignatureLink. So, for example, to find all graphs where the top link is either the EvaluationLink, or a SimilarityLink, one writes:

(MaximalJoin
   (TypedVariable (Variable "X") (Signature (Concept "A")))
   (TypeChoice
       (Type 'EvaluationLink)
       (Type 'SimilarityLink)))

Multiple Appearances

Proposed but not yet implemented. The TypeSetLink can be used to limit how often a sub-graph appears in a containing graph. Consider, for example, an AtomSpace that contains

(Evaluation (Predicate "P") (List (Concept "A")))
(Evaluation (Predicate "Q") (List (Concept "A") (Concept "A")))
(Evaluation (Predicate "R") 
   (List (Concept "A") (Concept "A") (Concept "A")))
(Evaluation (Predicate "S") 
   (List (Concept "A") (Concept "A") (Concept "A") (Concept "A")))

and one wished to find only those graphs that contained (Concept "A") only two or three times -- the is, to find only the Q or R predicates. Then write:

(MaximalJoin
   (TypedVariable (Variable "X") 
      (TypeSet
         (Signature (Concept "A"))
         (Interval (Number 2) (Number 3)))))

Performing this search will then return

(Evaluation (Predicate "Q") (List (Variable "X") (Variable "X")))
(Evaluation (Predicate "R") (List (Variable "X") (Variable "X") (Variable "X")))

Discussion about why it's not implemented: The pattern engine cannot count these appearances for us; its too early in the game. So we would have to do the counting in the joiner. But this is (mildly) painful in two ways. First, we would have to crawl through the variable list, and find the intervals, and then we'd have to walk through the supremum, and discard those with with wrong count. This is .. doable, but currently seems uninteresting, unless actual users show up and ask for this. If you need this, open an issue in github!

Filter intersection

Suppose the AtomSpace contained the following:

(Evaluation (Predicate "Pa") (List (Concept "A")))
(Evaluation (Predicate "Pab") (List (Concept "A") (Concept "B")))
(Evaluation (Predicate "Pabc") 
   (List (Concept "A") (Concept "B") (Concept "C")))

If the goal is to find those that contain both (Concept "A") and (Concept "B") then write:

(MaximalJoin
   (VariableList
      (TypedVariable (Variable "X") (Signature (Concept "A")))
      (TypedVariable (Variable "Y") (Signature (Concept "B")))))

This will then return only Pab and Pabc, but not Pa. That is because Pa does not contain (Concept "B") and so it cannot be a part of the join. In other words, the return is

(Evaluation (Predicate "Pab") (List (Variable "X") (Variable "Y")))
(Evaluation (Predicate "Pabc") 
   (List (Variable "X") (Variable "Y") (Concept "C")))

This can be thought of as a "filter intersection", in the sense of "filter" as a concept in topology: The maximal join of (Concept "A") is a filter, and so is (Concept "B"). The intersection of these two is also a filter, and is strictly the intersection of the two.

Filter intersection, again

Here's another example of multiple subcomponents, with a similar, but looser match. It assumes the same AtomSpace contents as above.

(MaximalJoin
   (VariableList
      (TypedVariable (Variable "Y") (Signature (Concept "B")))
      (TypedVariable (Variable "P") (Type 'PredicateNode))))

The above requires that not only must the (Concept "B") be present, but also some node, any node that is a PredicateNode. The return value is thus:

(Evaluation (Variable "P") (List (Concept "A") (Variable "Y")))
(Evaluation (Variable "P") 
   (List (Concept "A") (Variable "Y") (Concept "C")))

User callbacks via GPN's

One can mix in GroundedPredicateNodes to obtain custom filtering of match results. Thus, continuing the example above:

(MaximalJoin
   (TypedVariable (Variable "P") (Type 'PredicateNode))
   (Evaluation (GroundedPredicate "foo:bar") (List (Variable "P")))
   (Replacement (Variable "P") (Predicate "yar har har")))

would result in the GPN "foo:bar" being called with the matching arguments. As always, "foo" can only be "py", "scm", "c++" or "lib" for python, scheme, c++ or haskell callbacks. If the EvaluationLink returns (SimpleTruthValue 1 1) then the match is accepted; otherwise it is rejected.

If, for example, "foo:bar" is replaced by "scm:want-only-R" and one writes

(define (want-only-R ATOM)
  (format #t "I was told about ~A" ATOM)
  (if (string-contains (cog-name ATOM) "R") (stv 1 1) (stv 0 1)))

running the above will print:

I was told about (PredicateNode "Q")
I was told about (PredicateNode "P")
I was told about (PredicateNode "R")
I was told about (PredicateNode "S")

but only the following will be generated:

(Evaluation (Predicate "yar har har")
    (List (Concept "A") (Concept "A") (Concept "A")))

because the want-only-R function rejected those matches whose string name did not contain the letter "R".

Naming the container

In order to pass the current candidate for the container to a GPN, it must be given a name. This can be done by providing a variable, and declaring it the special type "ContainerNode". For example

(MaximalJoin
   (VariableList
      (TypedVariable (Variable "P") (Type 'PredicateNode))
      (TypedVariable (Variable "$top") (Type 'JoinNode)))
   (Evaluation (GroundedPredicateNode "foo:bar") (List (Variable "$top")))))

So, in this example, the GPN will be called with $top set to those links that contain (Concept "B")

Replacement Examples

Some examples showing more intersting uses of replacement.

Example: Swapping

Proposed but not yet implemented An example of the power of replacement, consider the need to swap arguments. That is, suppose the AtomSpace contains

(Evaluation (Predicate "P") (List (Concept "S") (Concept "T")))

Suppose one wanted to find all ListLinks having two ConceptNodes in them, and swap the order of the two. This can be done with the following:

(MaximalJoin
   (VariableList
      (TypedVariable (Variable "X") (Type 'ConceptNode))
      (TypedVariable (Variable "Y") (Type 'ConceptNode))
      (TypedVariable (Variable "L") (Signature (List (Variable "X") (Variable "Y")))))
   (Replacement (Variable "L") (List (Variable "Y") (Variable "X"))))

The above would have the effect of finding everything that had a List of two Concepts, and rewriting that list with swapped positions. This would generate the result

(Evaluation (Predicate "P") (List (Concept "T") (Concept "S")))

That is, not only are S and T swapped, but everything that contained S and T gets recreated, with the swapped variant.

Not implemented. The current replacement scheme runs via a static lookup table. The above requires a dynamic replacement, pulling out values and plugging them in. (.. well, and now dynamic replacement is in the codebase, as part of the named-top-clause work. It's ugly and RAM-wasteful, but it works. See the c++ JoinLink::_need_top_map boolean, and all the code that it wraps.)

Height examples

Examples for looking at different heights.

HeightLink

Not implemented. Propose creating a HeightLink which constrains the heights of trees. For example, suppose the AtomSpace contiained

(Member (Concept "A") (Concept 'stuff"))  ;; "A" is one level below Member
(Evaluation (Predicate "foo") (List (Concept "A")))  ;; "A" is two levels below Evaluation
(Evaluation (Predicate "foo") (List (Set (Concept "A")))) ;; "A" is three levels below Evaluation

Suppose one wanted only the middle one. One would then

(UpperSet
   (VariableList
      (TypedVariable (Variable "X") (Signature (Concept "A")))
      (TypedVariable (Variable "$top") (Type 'JoinLink)))
   (Equal (Number 2) (HeightLink (Variable "$top") (Concept "A")))

Comments: Not implemented. It seems easy enough to do. If you want this, open an issue in github!