Inference Pattern Mining
Among the data used to guide the solution of the Evaluator choice problem, the most important component is explicit information regarding which Evaluators have been useful in which contexts during past inferences.
This information is stored in OCP in a data repository called the InferencePatternRepository, which is, quite simply, a special "data table" containing inference trees and the patterns recognized in them. An "inference tree" is a tree whose nodes are Atoms (generally, Atom-versions) and whose links are inference steps, so that each link is labeled with a certain Evaluator.
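To make the definition concrete, the following is a minimal sketch of an inference tree as a data structure. It is not OCP code: the class name InferenceNode, its methods, the string encoding of Atoms, and the Evaluator names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceNode:
    """One node of an inference tree: an Atom (or Atom-version).

    Hypothetical structure; Atoms are plain strings in this sketch.
    """
    atom: str
    # Label of the link leading to this node from its parent, i.e. the
    # Evaluator applied in that inference step. The root has no such link.
    evaluator: Optional[str] = None
    children: List["InferenceNode"] = field(default_factory=list)

    def add_step(self, evaluator: str, atom: str) -> "InferenceNode":
        """Attach a child node reached by applying `evaluator`."""
        child = InferenceNode(atom=atom, evaluator=evaluator)
        self.children.append(child)
        return child

    def evaluators_used(self) -> List[str]:
        """Collect the link labels below this node, depth-first, pre-order."""
        labels: List[str] = []
        for child in self.children:
            labels.append(child.evaluator)
            labels.extend(child.evaluators_used())
        return labels


# Illustrative tree; the Evaluator names are made up for the example.
root = InferenceNode("Inheritance cat animal")
mid = root.add_step("DeductionEvaluator", "Inheritance cat mammal")
mid.add_step("InversionEvaluator", "Inheritance mammal cat")
root.add_step("DeductionEvaluator", "Inheritance mammal animal")
```

Storing the Evaluator label on the child node keeps each link together with the node it produced, which is convenient when a subtree is later compared against other trees.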
Note that, in many cases, PLN creates a variety of exploratory inference trees internally in the course of doing a single inference. Most of these inference trees will never be stored in the AtomTable, because their conclusions carry low confidence and may not have produced especially useful results. However, they should still be stored in the InferencePatternRepository. Ideally one would store all inference trees there. In a large OCP system this may not be feasible, but a wide variety of trees should still be retained: mainly successful ones, but also a sampling of unsuccessful ones for purposes of comparison.
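One simple way to realize such a retention policy, when storing everything is infeasible, is to keep every successful tree and a random fraction of the unsuccessful ones. The sketch below assumes that each tree has already been marked as successful or not; the function name, the tuple layout, and the default sampling rate are hypothetical.

```python
import random
from typing import List, Tuple


def select_trees_to_retain(
    trees: List[Tuple[str, bool]],
    unsuccessful_rate: float = 0.1,
    seed: int = 0,
) -> List[str]:
    """Return the ids of the trees to keep in the repository.

    `trees` holds (tree_id, was_successful) pairs. All successful trees
    are kept; each unsuccessful tree is kept with probability
    `unsuccessful_rate`. The rate of 0.1 is an arbitrary illustration.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    kept: List[str] = []
    for tree_id, was_successful in trees:
        if was_successful or rng.random() < unsuccessful_rate:
            kept.append(tree_id)
    return kept


candidates = [("t1", True), ("t2", False), ("t3", True), ("t4", False)]
kept_ids = select_trees_to_retain(candidates, unsuccessful_rate=0.5)
```

A real system would likely weight the sampling, for example by how close an unsuccessful tree came to success, but uniform sampling is enough to show the idea.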
The InferencePatternRepository may then be used in two ways:
- An inference tree being actively expanded (i.e. utilized within the PLN inference system) may be compared to inference trees in the repository, in real time, for guidance. That is, if a node N in an inference tree is being expanded, then the repository can be searched for nodes similar to N, whose contexts (within their inference trees) are similar to the context of N within its inference tree. A study can then be made regarding which Evaluators and Atoms were most useful in these prior, similar inferences, and the results of this can be used to guide ongoing inference.
- Patterns can be extracted from the store of inference trees in the InferencePatternRepository, and stored separately from the actual inference trees (in essence, these patterns are inference subtrees with variables in place of some of their concrete nodes or links). An inference tree being expanded can then be compared to these patterns instead of, or in addition to, the actual trees in the repository. This provides greater efficiency in the case of common patterns among inference trees.
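The first usage, real-time comparison against stored trees, can be sketched as a similarity-weighted vote over past inference steps. The text does not specify how similarity between nodes or contexts is measured; the sketch assumes that both are summarized as sets of features and compared by Jaccard similarity. StoredStep, rank_evaluators, the equal weighting of node and context, the threshold, and the Evaluator names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class StoredStep:
    """One past inference step as it might be recorded (hypothetical)."""
    node_features: FrozenSet[str]     # summary of the expanded node
    context_features: FrozenSet[str]  # summary of its place in the tree
    evaluator: str                    # Evaluator that was applied
    usefulness: float                 # how useful the step turned out to be


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_evaluators(
    node_features: FrozenSet[str],
    context_features: FrozenSet[str],
    repository: List[StoredStep],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank Evaluators by similarity-weighted usefulness in similar steps."""
    scores: Dict[str, float] = defaultdict(float)
    for step in repository:
        # Assumption: node and context similarity contribute equally.
        sim = 0.5 * jaccard(node_features, step.node_features) \
            + 0.5 * jaccard(context_features, step.context_features)
        if sim >= threshold:
            scores[step.evaluator] += sim * step.usefulness
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


repo = [
    StoredStep(frozenset({"Inheritance", "cat"}), frozenset({"depth1"}),
               "DeductionEvaluator", 0.9),
    StoredStep(frozenset({"Inheritance", "cat"}), frozenset({"depth1"}),
               "InductionEvaluator", 0.2),
    StoredStep(frozenset({"Similarity", "dog"}), frozenset({"depth3"}),
               "AbductionEvaluator", 1.0),
]
ranking = rank_evaluators(frozenset({"Inheritance", "cat"}),
                          frozenset({"depth1"}), repo)
```

Here the third stored step shares no features with the query and falls below the threshold, so its Evaluator does not appear in the ranking even though it was highly useful in its own context.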
A reasonable approach may be to first check for inference patterns and see if there are any close matches; and if there are not, to then search for individual inference trees that are close matches.
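This two-stage lookup can be sketched as follows. For brevity the sketch reduces an inference tree to a single path, a tuple of Evaluator labels, and a pattern to such a tuple with a variable marker in some positions; it also treats a pattern match as exact up to variables rather than merely "close". These simplifications, and the names VAR, pattern_matches, overlap, and find_guidance, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Path = Tuple[str, ...]
VAR = "?"  # hypothetical marker for a variable position in a pattern


def pattern_matches(pattern: Path, path: Path) -> bool:
    """True if the path agrees with the pattern wherever it is concrete."""
    return len(pattern) == len(path) and all(
        p == VAR or p == s for p, s in zip(pattern, path)
    )


def overlap(a: Path, b: Path) -> float:
    """Fraction of aligned positions at which two paths agree."""
    if not a and not b:
        return 1.0
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))


def find_guidance(
    path: Path,
    patterns: List[Path],
    trees: List[Path],
    min_overlap: float = 0.6,
) -> Tuple[str, Optional[Path]]:
    """Check mined patterns first, then fall back to individual trees."""
    for pattern in patterns:
        if pattern_matches(pattern, path):
            return ("pattern", pattern)
    # No pattern applies: look for the closest stored tree instead.
    best = max(trees, key=lambda t: overlap(t, path), default=None)
    if best is not None and overlap(best, path) >= min_overlap:
        return ("tree", best)
    return ("none", None)


patterns = [("DeductionEvaluator", VAR, "InversionEvaluator")]
trees = [
    ("InductionEvaluator", "AbductionEvaluator", "DeductionEvaluator"),
    ("InductionEvaluator", "AbductionEvaluator", "InversionEvaluator"),
]
hit = find_guidance(
    ("DeductionEvaluator", "InductionEvaluator", "InversionEvaluator"),
    patterns, trees)
fallback = find_guidance(
    ("InductionEvaluator", "AbductionEvaluator", "ModusPonensEvaluator"),
    patterns, trees)
```

Checking patterns first is cheaper because one pattern stands in for many stored trees; the tree search runs only when no pattern applies.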
Mining patterns from the repository of inference trees is potentially very expensive computationally, but this matters little, since it can be run periodically in the background while inference proceeds at its own pace in the foreground, using the patterns mined so far. Algorithmically, it may be done either by exhaustive frequent itemset mining (as in the Apriori or Relim algorithms) or by stochastic greedy mining. These operations should be carried out by an InferencePatternMiner.
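As an illustration of the exhaustive option, the following is a compact Apriori-style sketch. It assumes that each inference tree has been flattened into the set of Evaluators it used, which discards tree structure and is therefore only a rough stand-in for mining true subtree patterns. The function name and the shortened Evaluator names are hypothetical; this is not the InferencePatternMiner itself.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(
    transactions: List[FrozenSet[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return each itemset found in at least `min_support` transactions,
    mapped to the number of transactions that contain it."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while current:
        # Count the support of every candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: unions of two survivors that have exactly k items.
        candidates = {
            a | b for a, b in combinations(survivors, 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]
    return frequent


# Each transaction: the Evaluators used in one stored inference tree.
tree_evaluators = [
    frozenset({"Deduction", "Inversion"}),
    frozenset({"Deduction", "Inversion", "Induction"}),
    frozenset({"Deduction", "Induction"}),
    frozenset({"Abduction"}),
]
frequent_sets = apriori(tree_evaluators, min_support=2)
```

With a minimum support of 2, the pair of Deduction and Inversion is reported as frequent, while the pair of Inversion and Induction, which co-occurs in only one tree, is not. The prune step is what keeps the search tractable: a candidate is counted only if all of its smaller subsets were already frequent.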