GlobNode

The GlobNode is a type of VariableNode that can match multiple successive atoms during pattern matching, whereas an ordinary VariableNode matches exactly one atom. See glob on Wikipedia for a definition of globbing.

Example

The pattern immediately below rewrites "I * you" to "I * you too".

(BindLink
  (ListLink
    (ConceptNode "I")
    (GlobNode "$star")
    (ConceptNode "you"))
  (ListLink
    (ConceptNode "I")
    (GlobNode "$star")
    (ConceptNode "you")
    (ConceptNode "too")))

When applied to this:

(ListLink
  (ConceptNode "I")
  (ConceptNode "really")
  (ConceptNode "totally")
  (ConceptNode "need")
  (ConceptNode "you"))

it will produce the output

(ListLink
  (ConceptNode "I")
  (ConceptNode "really")
  (ConceptNode "totally")
  (ConceptNode "need")
  (ConceptNode "you")
  (ConceptNode "too"))
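
To make the rewrite above concrete, here is a minimal Python sketch of the same idea over plain lists of strings. It is not the OpenCog pattern matcher: the names Glob, match and rewrite are hypothetical, atoms are reduced to strings, and the sketch returns only the first grounding it finds.

```python
class Glob:
    """Hypothetical stand-in for a GlobNode: matches one or more successive items."""
    def __init__(self, name):
        self.name = name

def match(pattern, items, bindings=None):
    """Return the first grounding found (glob name -> list of items), or None."""
    bindings = dict(bindings or {})
    if not pattern:
        # The whole pattern is consumed; succeed only if the input is too.
        return bindings if not items else None
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Glob):
        # Try every run length from 1 up to the number of remaining items.
        for n in range(1, len(items) + 1):
            trial = dict(bindings)
            trial[head.name] = items[:n]
            found = match(rest, items[n:], trial)
            if found is not None:
                return found
        return None
    # A constant term must equal the next input item.
    if items and items[0] == head:
        return match(rest, items[1:], bindings)
    return None

def rewrite(template, bindings):
    """Build the output list, splicing in whatever each glob was grounded to."""
    out = []
    for term in template:
        if isinstance(term, Glob):
            out.extend(bindings[term.name])
        else:
            out.append(term)
    return out

pattern = ["I", Glob("$star"), "you"]
template = ["I", Glob("$star"), "you", "too"]
sentence = ["I", "really", "totally", "need", "you"]

bindings = match(pattern, sentence)
result = rewrite(template, bindings)
print(result)
```

In this sketch "$star" is grounded to the three words between "I" and "you", and the template splices them back in before appending "too".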

Typed globs

By default, GlobNode matches one or more sequential atoms in a list. One may want to match zero or more times, or match no more than N times. This can be accomplished with the TypedVariableLink, together with IntervalLink.

The example below specifies a Glob that must be matched at least twice, but no more than three times:

TypedVariableLink
    GlobNode   "$foo"
    IntervalLink
        NumberNode  2
        NumberNode  3

It makes use of the IntervalLink to specify a numeric interval. This can be used with the usual type specification mechanism. Thus,

TypedVariableLink
    GlobNode   "$foo"
    IntervalLink
        NumberNode  2
        NumberNode  3
    TypeNode "ConceptNode"

indicates that either two or three matches must be made, and the matching type must be ConceptNode. In place of TypeNode here, it should also be possible to use TypeChoiceLink, SignatureLink, and so on.

Matching an unbounded number of items can be specified by using a negative upper bound, like so:

    IntervalLink
        NumberNode  2
        NumberNode  -1

This specifies a match of 2 or more times, with no upper bound on the number of matches.
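
The interval and type constraints described in this section can be modeled with a small Python sketch. This is only an illustration of the rules stated above, not OpenCog code: GlobSpec and its accepts method are hypothetical names, and atoms are represented as (type name, atom name) tuples.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Atom = Tuple[str, str]  # (type name, atom name); a stand-in for real atoms

@dataclass(frozen=True)
class GlobSpec:
    lower: int = 1                   # default: at least one match
    upper: int = -1                  # a negative upper bound means "no upper bound"
    type_name: Optional[str] = None  # None means any type is accepted

    def accepts(self, run: Sequence[Atom]) -> bool:
        """Check whether a candidate run of atoms satisfies this specification."""
        if len(run) < self.lower:
            return False
        if self.upper >= 0 and len(run) > self.upper:
            return False
        return self.type_name is None or all(t == self.type_name for t, _ in run)

concepts = [("ConceptNode", name) for name in ["a", "b", "c", "d"]]
mixed = [("ConceptNode", "a"), ("PredicateNode", "p")]

two_or_three = GlobSpec(lower=2, upper=3, type_name="ConceptNode")
two_or_more = GlobSpec(lower=2, upper=-1)

print(two_or_three.accepts(concepts[:2]))  # within the interval, right type
print(two_or_three.accepts(concepts))      # four atoms exceed the upper bound
print(two_or_more.accepts(concepts))       # unbounded above
print(two_or_three.accepts(mixed))         # wrong type in the run
```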

Specifying multiple constraints

By using the TypeSetLink, one can specify a general set of typing constraints. So for example:

TypedVariableLink
    GlobNode   "$foo"
    TypeChoiceLink
        TypeSetLink
            IntervalLink
                NumberNode  2
                NumberNode  3
            TypeNode "ConceptNode"
            
        TypeSetLink
            IntervalLink
                NumberNode  7
                NumberNode  8
            TypeNode "PredicateNode"
            

This states that the glob may be grounded either by 2 or 3 ConceptNodes, or by 7 or 8 PredicateNodes. The TypeSetLink is used instead of SetLink to make it clear that it is not just any set, but a set of type specifications; hopefully, that makes such expressions easier to read and understand. (This might not be implemented yet. If it is, it might not be tested, and it might be broken!)
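
The intended reading of the declaration above can be sketched in Python as "any one alternative must hold, and within an alternative every constraint must hold". This is a model of the described semantics only, under the assumption that it works as the text says; satisfies and satisfies_any are hypothetical names, and each alternative is flattened to a (lower, upper, type name) tuple.

```python
from typing import Sequence, Tuple

Atom = Tuple[str, str]             # (type name, atom name)
Constraint = Tuple[int, int, str]  # (lower, upper, type name); models one TypeSetLink

def satisfies(run: Sequence[Atom], constraint: Constraint) -> bool:
    """All parts of one alternative must hold: the interval and the type."""
    lower, upper, type_name = constraint
    in_range = lower <= len(run) and (upper < 0 or len(run) <= upper)
    return in_range and all(t == type_name for t, _ in run)

def satisfies_any(run: Sequence[Atom], alternatives: Sequence[Constraint]) -> bool:
    """Models the TypeChoiceLink: at least one alternative must hold."""
    return any(satisfies(run, c) for c in alternatives)

alternatives = [(2, 3, "ConceptNode"), (7, 8, "PredicateNode")]
concepts = [("ConceptNode", f"c{i}") for i in range(3)]
predicates = [("PredicateNode", f"p{i}") for i in range(7)]

print(satisfies_any(concepts, alternatives))        # 3 ConceptNodes
print(satisfies_any(predicates, alternatives))      # 7 PredicateNodes
print(satisfies_any(predicates[:3], alternatives))  # 3 PredicateNodes fit neither alternative
```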

Variables as GlobNodes

Proposal: the use of intervals should apply to variables, as well as globs. Thus, one should be allowed to write:

TypedVariableLink
    VariableNode   "$foo"
    IntervalLink
        NumberNode 2
        NumberNode 42

which would behave exactly the same way as if the variable were declared as GlobNode "$foo". This is not implemented yet.

If the above were implemented, then the only distinction between variables and globs would be that, by default, variables always match once and only once, while globs, by default, match one or more times.
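
Under this proposal, the difference between the two node types would reduce to a default interval. A small Python sketch of that idea follows; since the proposal is not implemented, everything here (the DEFAULT_INTERVAL table and the effective_interval helper) is hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (lower, upper); a negative upper bound means unbounded

# Defaults as described in the text: variables match exactly once,
# globs match one or more times.
DEFAULT_INTERVAL = {
    "VariableNode": (1, 1),
    "GlobNode": (1, -1),
}

def effective_interval(node_type: str, declared: Optional[Interval] = None) -> Interval:
    """Return the declared interval if one was given, else the node type's default."""
    return declared if declared is not None else DEFAULT_INTERVAL[node_type]

plain_variable = effective_interval("VariableNode")
plain_glob = effective_interval("GlobNode")
ranged_variable = effective_interval("VariableNode", (2, 42))
ranged_glob = effective_interval("GlobNode", (2, 42))

print(plain_variable, plain_glob)
print(ranged_variable == ranged_glob)
```

With an explicit interval, the two declarations become indistinguishable in this model, which is the point of the proposal.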

Greedy vs. lazy matching

Unlike regex globbing, the pattern matcher does not distinguish between greedy and lazy matching; instead, it explores all possible groundings of globs. This is consistent with how other parts of the pattern matcher work: all possible permutations of unordered links are explored; all possible choices of choice links are explored, etc.
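
The difference from greedy or lazy matching shows up when a pattern has more than one glob. The Python sketch below enumerates every grounding instead of committing to one; it is a toy model over lists of strings, not the pattern matcher itself, and the names Glob and groundings are hypothetical.

```python
class Glob:
    """Hypothetical stand-in for a GlobNode: matches one or more successive items."""
    def __init__(self, name):
        self.name = name

def groundings(pattern, items, bindings=None):
    """Yield every way of grounding the globs in pattern against items."""
    bindings = bindings or {}
    if not pattern:
        if not items:
            yield dict(bindings)
        return
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Glob):
        # Neither the shortest nor the longest run is preferred: all are explored.
        for n in range(1, len(items) + 1):
            yield from groundings(rest, items[n:], {**bindings, head.name: items[:n]})
    elif items and items[0] == head:
        yield from groundings(rest, items[1:], bindings)

all_groundings = list(groundings([Glob("$x"), Glob("$y")], ["a", "b", "c"]))
for g in all_groundings:
    print(g)
```

Here two adjacent globs split the three items in two different ways, and both splits are reported. A greedy or lazy regex-style matcher would return only one of them.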