# OpenCogPrime:PredicateMining

## PredicateNode Mining

We have seen how the natural dynamics of the OCP system, with a little help from special heuristics, can lead to the evolution of Predicates that embody patterns in the system's perceived or inferred world. But it is also valuable to more aggressively and directly create pattern-embodying Predicates. This does not contradict the implicit process, but rather complements it. The explicit process we use is called *PredicateNode Mining* and is carried out by a PredicateNodeMiner CIMDynamic.

Define an Atom structure template as a schema expression corresponding to an OCP Link in which some of the arguments are replaced with variables. For instance,

Inheritance X cat

and

EvaluationLink (eats X) fish

are Atom structure templates. (Note, as an aside, that Atom structure templates are important in PLN inference control.)

The PredicateNodeMiner looks for Atom structure templates, and logical combinations thereof, which

- Minimize PredicateNode size
- Maximize surprisingness of truth value

This is accomplished by a combination of heuristics.
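The text does not give a formula for either criterion, so the following is only a minimal sketch of how the two might be traded off. It assumes (hypothetically) that surprisingness is the gap between the observed strength of a combination and the strength expected if its component templates were independent, and that size is simply the number of templates combined. The names `surprisingness`, `fitness` and `size_penalty` are illustrative and not part of OCP.

```python
from math import prod
from typing import Sequence


def surprisingness(observed: float, component_strengths: Sequence[float]) -> float:
    """Gap between the observed strength of a combination and the strength
    expected under an (assumed) independence model of its components."""
    expected = prod(component_strengths)
    return abs(observed - expected)


def fitness(observed: float,
            component_strengths: Sequence[float],
            size_penalty: float = 0.05) -> float:
    """Reward surprisingness, penalize size (number of templates combined)."""
    size = len(component_strengths)
    return surprisingness(observed, component_strengths) - size_penalty * size


# Two templates, each true for half the cases, that co-occur 90% of the time.
score = fitness(0.9, [0.5, 0.5])
```

A larger `size_penalty` pushes the search toward smaller PredicateNodes; other definitions of surprisingness would slot into the same shape.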

The first step in PredicateNode mining is to find Atom structure templates with high truth values. This can be done by a fairly simple heuristic search process.

First, note that if one specifies an (Atom, Link type), one is specifying a set of Atom structure templates. For instance, if one specifies

(cat, InheritanceLink)

then one is specifying the templates

InheritanceLink $X cat

and

InheritanceLink cat $X
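As a small illustration, the expansion of an (Atom, Link type) pair into templates can be sketched as follows. This assumes binary Links and represents a template as a plain tuple; `templates_for` is a hypothetical helper, not an OCP function.

```python
from typing import List, Tuple

Template = Tuple[str, str, str]  # (link type, first argument, second argument)


def templates_for(atom: str, link_type: str, var: str = "$X") -> List[Template]:
    """Return the single-variable templates specified by an (Atom, Link type)
    pair, assuming the Link type is binary."""
    return [
        (link_type, var, atom),   # e.g. InheritanceLink $X cat
        (link_type, atom, var),   # e.g. InheritanceLink cat $X
    ]


for template in templates_for("cat", "InheritanceLink"):
    print(" ".join(template))
```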

One can thus find Atom structure templates as follows. Choose an Atom with high truth value, and then, for each Link type, tabulate the total truth value of the Links of this type involving this Atom. When one finds a promising (Atom, Link type) pair, one can then do inference to test the truth value of the Atom structure template one has found.
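The tabulation step might look roughly like the sketch below. It assumes Links are available as simple `(link type, arguments, strength)` records, which is a simplification of real truth values. The data and the names `tabulate` and `most_promising` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Link = Tuple[str, Tuple[str, ...], float]  # (link type, arguments, strength)


def tabulate(atom: str, links: Iterable[Link]) -> Dict[str, float]:
    """Sum the strengths of all Links involving `atom`, grouped by Link type."""
    totals: Dict[str, float] = defaultdict(float)
    for link_type, args, strength in links:
        if atom in args:
            totals[link_type] += strength
    return dict(totals)


def most_promising(atom: str, links: Iterable[Link]) -> Tuple[str, str]:
    """Pick the (Atom, Link type) pair with the highest total strength."""
    totals = tabulate(atom, links)
    best_type = max(totals, key=totals.get)
    return (atom, best_type)


# Hypothetical AtomSpace contents.
links = [
    ("InheritanceLink", ("cat", "animal"), 0.5),
    ("InheritanceLink", ("kitten", "cat"), 0.25),
    ("SimilarityLink", ("cat", "dog"), 0.5),
    ("InheritanceLink", ("dog", "animal"), 0.5),
]
pair = most_promising("cat", links)
```

The pair selected here would then be handed to inference to evaluate the truth value of the corresponding templates; that step is not modeled in the sketch.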

Next, given high-truth-value Atom structure templates, the PredicateNodeMiner experiments with joining them together using logical connectives. For each potential combination it assesses the fitness in terms of size and surprisingness. This may be carried out in two ways:

- By incrementally building up larger combinations from smaller ones, at each incremental stage keeping only those combinations found to be valuable
- For large combinations, by evolution of combinations

The first option is essentially greedy data mining (which may be carried out via various standard algorithms). It has the advantage of being much faster than evolutionary programming, but the disadvantage that it misses large combinations whose subsets are less surprising than the combinations themselves. There appears to be room for both approaches in OCP (and potentially many others as well). The PredicateNodeMiner CIMDynamic has a parameter specifying how much time to spend on stochastic pattern mining vs. evolution, as well as parameters guiding the processes it invokes.
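The incremental approach can be sketched as a level-wise search over conjunctions. Several simplifying assumptions are made here: templates are crisp (an entity either satisfies a template or not), the only connective is AND, and fitness is the gap between observed and independence-expected frequency. `greedy_mine` and its parameters are hypothetical names, not part of the PredicateNodeMiner.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_mine(satisfiers: Dict[str, Set[int]],
                universe_size: int,
                min_fitness: float,
                max_size: int = 3) -> List[FrozenSet[str]]:
    """Grow conjunctions one template at a time, keeping at each stage only
    the combinations whose fitness reaches `min_fitness`."""

    def fit(combo: FrozenSet[str]) -> float:
        # Observed frequency of the conjunction.
        common = set.intersection(*(satisfiers[t] for t in combo))
        observed = len(common) / universe_size
        # Frequency expected if the templates were independent.
        expected = 1.0
        for t in combo:
            expected *= len(satisfiers[t]) / universe_size
        return abs(observed - expected)

    frontier = [frozenset([t]) for t in satisfiers]
    kept: List[FrozenSet[str]] = []
    for _ in range(2, max_size + 1):
        # Extend every surviving combination by one more template.
        candidates = {c | {t} for c in frontier for t in satisfiers if t not in c}
        frontier = [c for c in candidates if fit(c) >= min_fitness]
        kept.extend(frontier)
    return kept


# Hypothetical data: which of 8 entities satisfy templates A, B and C.
satisfiers = {"A": {0, 1, 2, 3}, "B": {0, 1, 2, 3}, "C": {0, 2, 4, 6}}
found = greedy_mine(satisfiers, universe_size=8, min_fitness=0.1)
```

Because a larger combination is only ever built from a smaller one that survived the previous stage, a combination none of whose sub-combinations passes the threshold is never examined, which is the weakness of the greedy approach noted above.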

So far we have discussed the process of finding single-variable Atom structure templates. But multivariable Atom structure templates may be obtained by combining single-variable ones. For instance, given

eats $X fish

and

lives_in $X Antarctica

one may choose to investigate various combinations such as

(eats $X $Y) AND (lives_in $X $Y)

(This particular example will predictably have a low truth value.) Thus, multiple variables may be introduced in the same process that creates single-variable combinations of Atom structure templates.
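One simple way to picture the introduction of a second variable is as replacing a constant in each single-variable template with a fresh variable and then conjoining the results. The tuple representation and the helper names `generalize` and `conjoin` below are assumptions made for illustration only.

```python
from typing import Tuple

Template = Tuple[str, ...]


def generalize(template: Template, constant: str, new_var: str = "$Y") -> Template:
    """Replace every occurrence of `constant` in a template with a variable."""
    return tuple(new_var if arg == constant else arg for arg in template)


def conjoin(*templates: Template) -> tuple:
    """Join templates with a logical AND."""
    return ("AND",) + templates


eats = ("eats", "$X", "fish")
lives_in = ("lives_in", "$X", "Antarctica")

combined = conjoin(generalize(eats, "fish"), generalize(lives_in, "Antarctica"))
# combined == ("AND", ("eats", "$X", "$Y"), ("lives_in", "$X", "$Y"))
```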

When a suitably fit Atom structure template, or logical combination thereof, is found, a PredicateNode embodying it is created and placed into the AtomSpace.