Reimplementing MOSES in the Atomspace Atop the Pattern Miner
Currently, OpenCog’s MOSES “probabilistic evolutionary learning” component is implemented separately from the Atomspace (OpenCog’s core knowledge store), although one can export programs learned via MOSES into the Atomspace. This is acceptable but is an imperfect design that persists for historical reasons.
It seems to me that one could use the Pattern Miner infrastructure, with only fairly modest modification, to implement a variety of MOSES that operates in the Atomspace. This would have various advantages, including enabling the hybridization of pattern mining and MOSES in various ways.
Observe the basic parallel:
- the Pattern Miner works by taking an ensemble of patterns, and growing them into larger patterns by adding more Atoms to them
- MOSES works by taking an ensemble of program trees (the deme exemplars) and growing them into larger program trees by adding more nodes to them
The key difference is that in the PM case, the new Atoms being added are drawn from the Atomspace (with a goal of making the pattern one that is highly frequent, or surprising, across the Atomspace). Whereas in the MOSES case, the new nodes being added are “made up” with a goal of making the program maximize a certain externally given fitness function. (However, in an EDA-based version of MOSES, the new nodes added are drawn from a certain probability distribution, which is one that was mined by analysis of the program tree ensemble, with a view toward finding patterns of nodes that lead to high fitness).
If we view frequency/surprisingness as one fitness function among many; and “forming patterns already existing in the Atomspace” as one heuristic for pattern-expansion among many … then we can see the PM and MOSES as aspects of the same basic dynamic.
A representational difference is that in the PM, all the patterns in the ensemble are represented as parts of the same data structure, the H-Graph (a collection of Atoms in a special Atomspace). In MOSES the programs in the ensemble are stored as distinct objects, with no attempt to factor out the redundant subprograms in the representation (though redundant subprograms are identified in the EDA step).
In MOSES, Reduct is used to do program normalization. This concept does not exist in the current PM. However, it would enhance the PM considerably, via enabling semantically identical but syntactically different instances of a pattern to be recognized as identical.
EXPANDED PATTERN MINER ALGORITHM
Incorporating these ideas, the unified/expanded pattern miner algorithm would look like:
- Start with an ensemble of simple hypergraphs, constructed according to template T (which is itself represented as a hypergraph)
- Iterate through the hypergraphs H in the ensemble, expanding each hypergraph H into a set of larger hypergraphs, by applying an expansion template T1 to some Atoms within H. T1 is a pattern with variables, and some of the variables in T1 are bound to Atoms in H whereas others are bound to other Atoms not in H.
- The non-H Atoms filling in the variable slots of T1 are chosen from a certain probability distribution D, the nature of which is pre-specified as part of the configuration of the algorithm. For instance, D could draw from the Atomspace, meaning that only expansions of H that already exist somewhere in the Atomspace would be chosen. Or D could be proportional to the estimated fitness of the expansion of H, according to some fitness estimation method (which could be based on pattern-analysis of the growing pattern-ensemble itself; more on this below). Crossover is another possible expansion distribution.
(Note that the previous two steps could be implemented using SampleLink, i.e.
SampleLink application of constraint T1 in the context of H distribution D
- Once a particular expansion of H, call it H*, is formed, then its quality is evaluated by a pre-specified quality measure. One possible quality measure is the frequency or surprisingness of H* in the Atomspace (this is the PM approach). Another possible quality measure is the evaluation of H* according to some externally specified fitness function (this is the MOSES case).
- Expansions leading to relatively poor-quality H* are pruned.
This process encompasses both PM and MOSES and other possibilities. To run the process one specifies a template pattern for the initial population, a template pattern and distribution for pattern-expansion, and a quality measure for expanded-pattern assessment.
In some uses of MOSES, different demes involve different “feature sets.” This logic would go into the distribution for pattern-expansion: each feature-set is a different pattern-expansion distribution. The “feature selection inside MOSES” logic comprises a scheme wherein: the feature-sets leading to successful expansion operations on H, are tweaked and then the tweaked versions are used for a percentage of the new expansion operations on H.
Finally, an elegant aspect of this implementation of MOSES would be that: the EDA distribution-mining aspect of MOSES would be implemented using pattern mining on the evolving H-Graph (the program tree ensemble). Pattern mining would be applied to the ensemble to find patterns (expressed as Atoms) that occur surprisingly frequently in highly “fit” (high quality) patterns in the H-Graph. These patterns would then be used to constitute the pattern-expansion distribution of the H-Graph.
Reduction could be implemented as an operation occurring on a newly-expanded pattern H, immediately after H’s quality is evaluated (unless the choice is made to prune H immediately). As in MOSES, reduction would make pattern mining on the H-Graph work better. In a PM application, reduction could either be done in this way or done on the Atomspace as a whole, in advance.
The reason I mention this now is that we are shortly going to embark on some improvements to the Pattern Miner implementation. One possible way to do this would be to
- implement SampleLink
- modify the Pattern Miner to use SampleLink, and to use the abstract workflow implied in the above “expanded Pattern Miner” description
- then, in parallel with using the refactored Pattern Miner for pattern mining, build a new MOSES on top of the expanded Pattern Miner infrastructure
The Backward Chainer can probably be framed that way as well.
As currently implemented, the BC evolves atomese programs, where each program encodes a specific forward chaining strategies (FCS for short). There is a difficulty though, there are no easy way to evaluate the fitness of a FCS, either it proves the target or it doesn't. Or let's say that the fitness landscape is extremely chaotic, some FCS may prove nothing at all, while a tiny variation of it may prove our target completely.
BUT this can be overcome by meta-learning, i.e. learning a measure of success for FCS that are half-way there. So in this framework meta-learning would be used to reshape the fitness function, from a crisp chaotic one to a smooth regular one.