GlobNode

The GlobNode is a type of VariableNode that can match multiple successive atoms during pattern matching. An ordinary VariableNode can only match a single atom. See the Wikipedia article on glob for a definition of globbing.

Example

The pattern below rewrites "I * you" to "I * you too".

(BindLink
  (ListLink
    (ConceptNode "I")
    (GlobNode "$star")
    (ConceptNode "you"))
  (ListLink
    (ConceptNode "I")
    (GlobNode "$star")
    (ConceptNode "you")
    (ConceptNode "too")))

When applied to this:

(ListLink
  (ConceptNode "I")
  (ConceptNode "really")
  (ConceptNode "totally")
  (ConceptNode "need")
  (ConceptNode "you"))

it will produce the output

(ListLink
  (ConceptNode "I")
  (ConceptNode "really")
  (ConceptNode "totally")
  (ConceptNode "need")
  (ConceptNode "you")
  (ConceptNode "too"))
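
The matching behaviour in this example can be illustrated with a small stand-alone model. The sketch below is plain Python, not the OpenCog API: the `Glob` class and the `match` and `rewrite` functions are hypothetical names invented for illustration, atoms are reduced to plain strings, and the glob is assumed to take one or more items, preferring the longest run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glob:
    """Hypothetical stand-in for a GlobNode: matches one or more items."""
    name: str


def match(pattern, items, bindings=None):
    """Return a dict of glob bindings if pattern matches items, else None."""
    bindings = dict(bindings or {})
    if not pattern:
        return bindings if not items else None
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Glob):
        # Try the longest run first, but always take at least one item.
        for n in range(len(items), 0, -1):
            trial = dict(bindings)
            trial[head.name] = list(items[:n])
            result = match(rest, items[n:], trial)
            if result is not None:
                return result
        return None
    # A literal element must equal the next item exactly.
    if items and items[0] == head:
        return match(rest, items[1:], bindings)
    return None


def rewrite(template, bindings):
    """Substitute the bound runs for the globs in the template."""
    out = []
    for elem in template:
        if isinstance(elem, Glob):
            out.extend(bindings[elem.name])
        else:
            out.append(elem)
    return out


pattern = ["I", Glob("$star"), "you"]
template = ["I", Glob("$star"), "you", "too"]
sentence = ["I", "really", "totally", "need", "you"]

bindings = match(pattern, sentence)
result = rewrite(template, bindings) if bindings is not None else None
print(result)
```

In this model "$star" binds to the three words between "I" and "you". A two-word input such as "I you" does not match, because the glob must take at least one item.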

Typed globs

The current, default implementation of the GlobNode matches one or more sequential atoms in a list. However, there are plausible use cases where one may want to match zero or more times, or match no more than N times. This section describes an unimplemented proposal for how this could be done.

The core insight of the proposal is to use the TypedVariableLink, and all of its accompanying features, to specify how the GlobNode should work, and what it should match.

The example below specifies a Glob that must be matched at least twice, but no more than three times:

TypedVariableLink
    GlobNode   "$foo"
    IntervalLink
        NumberNode  2
        NumberNode  3

It makes use of the IntervalLink to specify a numeric interval. This can be used with the usual type specification mechanism. Thus,

TypedVariableLink
    GlobNode   "$foo"
    IntervalLink
        NumberNode  2
        NumberNode  3
    TypeNode "ConceptNode"

indicates that either two or three matches must be made, and the matching type must be ConceptNode. In place of TypeNode here, it should also be possible to use TypeChoiceLink, SignatureLink, and so on.
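
To make the intended semantics concrete, here is a toy Python model of an interval-and-type constrained glob. It is only a sketch of the proposal, not OpenCog code: `Atom`, `TypedGlob`, `match` and the `concepts` helper are hypothetical names, and the interval is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an atom: a type name plus a node name."""
    type: str
    name: str


@dataclass(frozen=True)
class TypedGlob:
    """Hypothetical model of a GlobNode wrapped in a TypedVariableLink."""
    name: str
    lo: int = 1                      # minimum number of atoms (inclusive)
    hi: Optional[int] = None         # maximum number of atoms; None = unbounded
    atom_type: Optional[str] = None  # required type; None = any type


def match(pattern, items):
    """Return a dict of glob bindings if pattern matches items, else None."""
    if not pattern:
        return {} if not items else None
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, TypedGlob):
        hi = len(items) if head.hi is None else min(head.hi, len(items))
        # Longest allowed run first, never fewer than `lo` atoms.
        for n in range(hi, head.lo - 1, -1):
            run = items[:n]
            if head.atom_type is not None and any(
                a.type != head.atom_type for a in run
            ):
                continue
            tail = match(rest, items[n:])
            if tail is not None:
                return {head.name: list(run), **tail}
        return None
    if items and items[0] == head:
        return match(rest, items[1:])
    return None


def concepts(*names):
    return [Atom("ConceptNode", n) for n in names]


foo = TypedGlob("$foo", lo=2, hi=3, atom_type="ConceptNode")
pattern = [Atom("ConceptNode", "I"), foo, Atom("ConceptNode", "you")]

two = match(pattern, concepts("I", "really", "need", "you"))
one = match(pattern, concepts("I", "need", "you"))
four = match(pattern, concepts("I", "a", "b", "c", "d", "you"))
wrong_type = match(
    pattern,
    [Atom("ConceptNode", "I"), Atom("PredicateNode", "p"),
     Atom("ConceptNode", "q"), Atom("ConceptNode", "you")],
)
print(two, one, four, wrong_type)
```

Only the first input matches in this model. The others fail because the run between "I" and "you" is too short, too long, or contains a non-ConceptNode atom.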

Greedy Matching

By default, matching is greedy. Non-greedy (lazy) matching could be requested with a (not yet implemented) LazyGlobNode, and greedy matching could be demanded explicitly with a GreedyGlobNode.

Thus, for example,

    LazyGlobNode   "$foo"

would be a GlobNode that did lazy matching, while GreedyGlobNode would be the same thing as a GlobNode.

But perhaps a better API would be to invent a TypeStyleNode or a TypePredicateNode and then express greediness by stating:


TypedVariableLink
    GlobNode   "$foo"
    TypeStyleNode "lazy"

I think I like TypeStyleNode more than LazyGlobNode because it's more "future-proof", that is, more flexible for future enhancements.
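
The difference between greedy and lazy matching only shows up when more than one split of the input is possible. The toy Python model below illustrates this with two adjacent globs. It is a sketch and not the OpenCog API: the `Glob` class, its `style` field and the `match` function are hypothetical names, and atoms are reduced to plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glob:
    """Hypothetical glob that matches one or more items."""
    name: str
    style: str = "greedy"  # "greedy" or "lazy"


def match(pattern, items):
    """Return a dict of glob bindings if pattern matches items, else None."""
    if not pattern:
        return {} if not items else None
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Glob):
        lengths = range(1, len(items) + 1)
        if head.style == "greedy":
            # Greedy: try the longest run first. Lazy: shortest first.
            lengths = reversed(lengths)
        for n in lengths:
            tail = match(rest, items[n:])
            if tail is not None:
                return {head.name: list(items[:n]), **tail}
        return None
    if items and items[0] == head:
        return match(rest, items[1:])
    return None


words = ["a", "b", "c"]
greedy = match([Glob("$x", "greedy"), Glob("$y")], words)
lazy = match([Glob("$x", "lazy"), Glob("$y")], words)
print(greedy)
print(lazy)
```

With a greedy "$x", the first glob takes "a" and "b" and leaves only "c" for "$y". With a lazy "$x", it takes just "a" and "$y" receives the rest.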

Eliminating GlobNodes

We could use the above to eliminate GlobNodes entirely, and instead write:


TypedVariableLink
    VariableNode   "$foo"
    TypeSetLink
        TypeStyleNode "greedy"
        TypeStyleNode "glob"

which would mean exactly the same thing as GlobNode "$foo". Just an idea ...

Specifying multiple constraints

By using the TypeSetLink, one can specify a general set of typing constraints. So for example:

TypedVariableLink
    GlobNode   "$foo"
    TypeChoiceLink
        TypeSetLink
            IntervalLink
                NumberNode  2
                NumberNode  3
            TypeNode "ConceptNode"
            TypeStyleNode "lazy"
        TypeSetLink
            IntervalLink
                NumberNode  7
                NumberNode  8
            TypeNode "PredicateNode"
            TypeStyleNode "greedy"

This states that there could be either 2 or 3 matches to ConceptNode, matched lazily (so taking 2 matches, if possible), or 7 or 8 matches to PredicateNode, matched greedily (so taking 8 matches, if possible). The TypeSetLink is used instead of SetLink to make it clear that it's not just any set, but a set of type specifications. Hopefully, that makes such expressions easier to read and understand.
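
One way to read the combined specification is as a list of alternative constraint sets. The toy Python model below is a sketch of that reading, not OpenCog code: `Atom`, `TypeSet`, `Glob`, `candidate_lengths` and `match` are hypothetical names, and it assumes that the alternatives are tried in the order they are written. The proposal itself does not say which alternative takes precedence.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Atom:
    type: str
    name: str


@dataclass(frozen=True)
class TypeSet:
    """Hypothetical model of one TypeSetLink: interval, type and style."""
    lo: int
    hi: int
    atom_type: Optional[str]  # None = any type
    style: str                # "lazy" or "greedy"


@dataclass(frozen=True)
class Glob:
    name: str
    choices: Tuple[TypeSet, ...] = ()  # empty = plain greedy one-or-more glob


def candidate_lengths(spec, available):
    """Run lengths allowed by spec, in the order its style prefers."""
    lengths = range(spec.lo, min(spec.hi, available) + 1)
    return lengths if spec.style == "lazy" else reversed(lengths)


def match(pattern, items):
    """Return a dict of glob bindings if pattern matches items, else None."""
    if not pattern:
        return {} if not items else None
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Glob):
        default = (TypeSet(1, len(items), None, "greedy"),)
        # Assumption: alternatives are tried in the order written.
        for spec in head.choices or default:
            for n in candidate_lengths(spec, len(items)):
                run = items[:n]
                if spec.atom_type is not None and any(
                    a.type != spec.atom_type for a in run
                ):
                    continue
                tail = match(rest, items[n:])
                if tail is not None:
                    return {head.name: list(run), **tail}
        return None
    if items and items[0] == head:
        return match(rest, items[1:])
    return None


foo = Glob("$foo", (
    TypeSet(2, 3, "ConceptNode", "lazy"),
    TypeSet(7, 8, "PredicateNode", "greedy"),
))
pattern = [foo, Glob("$rest")]

concept_atoms = [Atom("ConceptNode", f"c{i}") for i in range(4)]
predicate_atoms = [Atom("PredicateNode", f"p{i}") for i in range(9)]

by_concept = match(pattern, concept_atoms)
by_predicate = match(pattern, predicate_atoms)
print(len(by_concept["$foo"]), len(by_predicate["$foo"]))
```

In this model the lazy ConceptNode alternative settles for two atoms even though three are available, while the greedy PredicateNode alternative takes eight of the nine atoms and leaves one for "$rest".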