Pattern Mining via Evolutionary Learning
There are two major applications of evolutionary learning in OCP:
- procedure learning (learning procedures to control behaviors in the world, or control schemata for processes like inference or attention allocation)
- pattern mining — finding predicates recognizing patterns that are observable in the AtomSpace.
Within the scope of pattern mining, there are two approaches:
- supervised learning: given a predicate, finding a pattern among the entities that satisfy that predicate.
- unsupervised learning: undirected search for "interesting patterns".
The supervised learning case is easier and we have done a number of experiments using MOSES for supervised learning, on biological (microarray gene expression and SNP) and textual data. In the OCP application, the "positive examples" are the elements of the SatisfyingSet of the predicate P, and the "negative examples" are everything else. This can be a relatively straightforward problem if there are enough positive examples and they actually share common aspects ... the trickiness comes, of course, when the common aspects are, in each example, complexly intertwined with other aspects.
The unsupervised learning case is even trickier. The main problem issue here regards the definition of an appropriate fitness function. We are searching for "interesting patterns." So the question is, what constitutes an interesting pattern?
Evolving Interesting Patterns
Clearly, "interestingness" is a multidimensional concept. One approach to defining it is empirical, based on observation of which predicates have and have not proved interesting to the system in the past (based on their long-term importance values, i.e. LTI).
In this approach, one has a supervised categorization problem: learn a rule predicting whether a predicate will fall into the interesting category or the uninteresting category. Once one has learned this rule, which has expressed this rule as a predicate itself, one can then use this rule as the fitness function for evolutionary learning evolution.
There is also a simpler approach, which defines an objective notion of interestingness. This objective notion is a weighted sum of two factors:
- Surprisingness of truth value.
Compactness is easy to understand: all else equal, a predicate embodied in a small Combo tree is better than a predicate embodied in a big one. There is some work hidden here in Combo tree reduction; ideally, one would like to find the smallest representation of a given Combo tree, but this is a computationally formidable problem, so one necessarily approaches it via heuristic algorithms.
Surprisingness of truth value is a slightly subtler concept. Given a predicate, one can envision two extreme ways of evaluating its truth value (represented by two different types of ProcedureEvaluator). One can use an IndependenceAssumingProcedureEvaluator, which deals with all AND and OR operators by assuming probabilistic independence. Or, one can use an ordinary EffortBasedProcedureEvaluator, which uses dependency information wherever feasible to evaluate the truth values of AND and OR operators. These two approaches will normally give different truth values — but, how different? The more different, the more surprising is the truth value of the predicate, and the more interesting may the predicate be.
In order to explore the power of this kind of approach, we have tested pattern mining using BOA on Boolean predicates as a data mining algorithm on a number of different datasets, including some interesting and successful work in the analysis of gene expression data, and some more experimental work analyzing sociological data from the National Longitudinal Survey of Youth (NLSY).
A very simple illustrative result from the analysis of the NLSY data is the pattern:
OR (NOT(MothersAge(X)) AND NOT(FirstSexAge(X))) (Wealth(X) AND PIAT(X))
- being the child of a young mother correlates with having sex at a younger age;
- being in a wealthier family correlates with better Math (PIAT) scores;
- the two sets previously described tend to be disjoint.
Of course, many data patterns are several times more complex than the simple illustrative pattern shown above. However, one of the strengths of the evolutionary learning approach to pattern mining is its ability to find simple patterns when they do exist, yet without (like some other mining methods) imposing any specific restrictions on the pattern format.