# OpenCogPrime:AdaptiveInferenceControl

## Adaptive Inference Control

In the previous subsection we saw an example of what the PLN inference rules can do when applied to a small test database: they can draw plausible conclusions based on the knowledge at hand. But of course, inference on a small test database is quite different from real-world inference. There is little doubt that in principle PLN is capable of complex real-world inference, in the sense that there exist meaningful chains of PLN inferences leading from the data at an embodied AGI system's disposal to the conclusions the system needs to draw to achieve its complex goals. The real question is how these meaningful inference chains are found amidst the vast mass of meaningless ones. This is the essence of the question of *inference control.*

It is clear that in humans, inference control is all about context. We use different inference strategies in different contexts, and learning these strategies is most of what *learning to think* is all about. One might approach this aspect of cognition, in the OCP design, by introducing a variety of inference control heuristics, each one giving an algorithm for choosing which inferences to carry out, in which order, in a given context. However, in keeping with the *integrated intelligence* theme that pervades OCP, we have chosen an alternate strategy for PLN: a single, quite simple inference control scheme, which relies partly on structures coming from outside PLN proper. The requisite variety of inference control strategies is provided by variety in non-PLN structures such as:

- OpenCogPrime:HebbianLinks existing in the AtomTable
- patterns recognized via pattern-mining in the corpus of prior inference trails

### Inference Control in PLN

We will now describe the basic "inference control" loop of PLN in OCP. We will discuss it in the context of backward chaining; the case of forward chaining is very similar.

Given an expression to evaluate via inference (according to any one of the query operations mentioned above), there is a collection of Evaluator objects that match the expression.

First, each Evaluator object makes a preliminary assessment of how likely it is to come up with decent results. This assessment draws on three sources: preliminary exploration of which Atoms it might use to draw its conclusions; study of various links (including HebbianLinks) relating to its actions on these Atoms (we will give some examples of this shortly); and information regarding which Evaluators have been useful in prior inferences in related contexts (stored in a structure called the InferencePatternRepository, to be discussed below).

Then, the overall evaluation process looks at the assessments made by each Evaluator and decides how much computational resource to allocate to each Evaluator. This we may call the "Evaluator choice problem" of inference control.

Finally, each chosen Evaluator needs to make choices regarding which Atoms to use in its inference. This choice must be made using existing links and the information in the InferencePatternRepository. This is the "Atom choice problem" of inference control.
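The three steps above can be sketched as follows. This is an illustrative sketch only; the `Evaluator` interface, the proportional resource allocation, and the pruning threshold are assumptions made for the example, not OCP's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an OCP Evaluator: a rule plus its scoring hooks.
@dataclass
class Evaluator:
    name: str
    assess: Callable[[str], float]        # preliminary estimate of promise
    choose_atoms: Callable[[str], List]   # the Atom choice for this rule
    apply: Callable[[str, List], object]  # run the rule on the chosen Atoms

def backward_chain_step(expression, evaluators, min_share=0.05):
    """One step of the control loop: assess, allocate, then choose Atoms."""
    # 1. Each matching Evaluator makes a preliminary assessment of how
    #    likely it is to produce decent results.
    scores = {ev.name: ev.assess(expression) for ev in evaluators}
    # 2. Allocate computational resource in proportion to assessed promise
    #    (the "Evaluator choice problem").
    total = sum(scores.values()) or 1.0
    shares = {name: s / total for name, s in scores.items()}
    # 3. Each sufficiently promising Evaluator chooses Atoms and fires
    #    (the "Atom choice problem"); the rest are pruned.
    results = {}
    for ev in evaluators:
        if shares[ev.name] >= min_share:
            results[ev.name] = ev.apply(expression, ev.choose_atoms(expression))
    return results
```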

As an example of the choices to be made by an individual Evaluator, consider that to evaluate (Inheritance A C) via a deduction-based Evaluator, some collection of intermediate nodes B must be chosen, each supplying a premise pair (Inheritance A B), (Inheritance B C). In the case of higher-order deduction, each deduction may involve a number of complicated subsidiary steps, so perhaps only a single intermediate node will be chosen. This choice of intermediate nodes must also be made via context-dependent probabilities. Other Evaluators besides deduction must make other, similar choices.

So the overall algorithm is standard backward-chaining inference, but with aggressive, adaptive pruning, using HebbianLinks and other existing knowledge, at every stage.

#### The Evaluator Choice Problem as a Bandit Problem

The Evaluator choice problem, as described above, is an example of the "multi-armed bandit problem" commonly studied in statistics. The Atom choice problem is also a bandit problem. Both can be approached using standard bandit-problem heuristics.

The paradigm bandit problem involves a slot machine (a "multi-armed bandit") with N levers to pull, each of which may have different odds of yielding a reward when pulled. The player's goal is to maximize his earnings, but when he starts playing he has no idea which levers lead to which levels of reward. So, at any given point, he must balance two factors:

- Exploitation: pulling the lever that seems to give maximum reward, based on experience so far
- Exploration: trying out various levers to get a sample of their performance, so as to get more confident estimates of the reward level offered by each lever

Obviously, as more experience is gained, the bias should shift from exploration towards exploitation. There is a substantial body of mathematics regarding bandit problems, but most of the results prove inapplicable in real-world situations because they rest on statistical assumptions that rarely hold in practice. In practice, the two most common algorithms for solving bandit problems are:

- epsilon-greedy: spend a fraction 1-epsilon of your time exploiting the best option found so far (with "best" defined in terms of expected reward), and a fraction epsilon of your time randomly exploring other options
- softmax: assign a probability to each option, using a heuristic formula borrowed from thermodynamics (the Boltzmann distribution) that assigns higher probabilities to options that have proved more successful in the past, with an exponential scaling that favors successful options non-linearly over unsuccessful ones
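For concreteness, here is a minimal sketch of both algorithms; the function names and the `temperature` parameter are choices made for this example, not OCP's actual implementation.

```python
import math
import random

def epsilon_greedy(expected_reward, epsilon=0.1, rng=random):
    """With probability 1 - epsilon exploit the best option so far;
    with probability epsilon explore an option uniformly at random."""
    options = list(expected_reward)
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda o: expected_reward[o])

def softmax_choice(expected_reward, temperature=1.0, rng=random):
    """Boltzmann selection: sample options with probability proportional
    to exp(reward / temperature), so better options are favored
    exponentially; lower temperature means heavier exploitation."""
    options = list(expected_reward)
    weights = [math.exp(expected_reward[o] / temperature) for o in options]
    r = rng.random() * sum(weights)
    acc = 0.0
    for o, w in zip(options, weights):
        acc += w
        if r <= acc:
            return o
    return options[-1]
```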

The only modification we choose to make to these simple algorithms in the OCP context is to replace the probability with the product

`probability * weight_of_evidence`

This evidence-weighted probability may be used within both epsilon-greedy and softmax.
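As a sketch, assuming each option carries a (probability, weight-of-evidence) pair:

```python
def evidence_weighted_scores(options):
    """Combine each option's probability with its weight of evidence to get
    the score that epsilon-greedy or softmax then operates on.
    `options` maps an option name to a (probability, weight_of_evidence) pair."""
    return {name: p * w for name, (p, w) in options.items()}
```

The effect is that a moderately probable but well-evidenced option can outrank a highly probable but weakly evidenced one.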

Choosing an Evaluator for an inference step within PLN is a bandit problem, where prior experience is used to provide initial probabilities for the options (which are possible Evaluators rather than levers to pull), and then inferences done using each option are used to provide increasingly confident probabilities for each option. The default approach within OCP is softmax, though epsilon-greedy is also provided and may prove better in some contexts.

In Atom selection, the options (levers) are Atoms to use within an inference rule (an Evaluator).

It is important to note, however, that the statistics of the Atom-choice bandit problem differ from those of the Evaluator-choice bandit problem, because there are only a few Evaluators but a great many Atoms. It would be possible to merge the two into a single bandit problem whose options were (Evaluator, Atom-set) tuples, but this seems an inferior heuristic approach: the Evaluator bandit problem involves a relatively small number of options, which makes it more tractable on its own. So we have chosen to break the inference control process into two levels of bandit problems: Evaluator choice and Atom choice.
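The two-level decomposition might be sketched as follows. For clarity this uses greedy (pure-exploitation) selection at both levels, whereas in practice each level would run epsilon-greedy or softmax; all names here are illustrative assumptions.

```python
def choose_inference_step(evaluator_scores, atom_scores_by_evaluator):
    """Two nested bandit choices: first the small Evaluator-level problem,
    then the much larger Atom-level problem conditioned on the chosen
    Evaluator. Greedy selection stands in for the bandit algorithm here."""
    evaluator = max(evaluator_scores, key=evaluator_scores.get)
    atom_scores = atom_scores_by_evaluator[evaluator]
    atom = max(atom_scores, key=atom_scores.get)
    return evaluator, atom
```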

The complexity of the inference control problem can be well-understood by observing that each individual step poses two difficult statistical inference problems, one nested within the other! What allows pragmatic inferences to be achievable at all is, clearly, prior knowledge, which allows most of the bandit problems occurring in a complex inference to be "pre-solved" via assumption of prior probabilities. Normally, only a handful of difficult steps in an inference need to actually be studied statistically via numerous iterations of the epsilon-greedy or softmax algorithms. On the other hand, the first few inferences in an unfamiliar domain may not connect with the system's knowledge base of prior probabilities, and may thus need to be done in a much slower, more statistically thorough way.

## Chain of Thought

An alternate solution for inference control might be to segregate rules into pipeline stages. Thus, one might have two sets of rules, A and B, and know (either a priori or from experience) that no rule in set B will produce results until some rule in set A has already produced results. By structuring the rule sets into such a pipeline, one obtains an alternative, and very attractive, solution to the inference control problem.

In fact, one would want to push this to a hierarchical extreme: if rule A5 from set A fired, we might know that rules B3, B8 and B9 from set B were almost sure to follow, and that rules B1, B2, B4, etc. almost never fired; and so on for set C. These kinds of correlations can be discovered via entropy and mutual-information techniques.
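As a sketch of how such correlations might be mined, treating each prior inference trail as the set of rules that fired in it (this representation is an assumption made for the example):

```python
import math
from collections import Counter

def rule_mutual_information(trails, rule_a, rule_b):
    """Mutual information (in bits) between the events 'rule_a fired' and
    'rule_b fired' across a corpus of inference trails. A high value
    suggests a pipeline correlation such as 'A5 is usually followed by B3'."""
    n = len(trails)
    joint = Counter((rule_a in t, rule_b in t) for t in trails)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = sum(c for (x, _), c in joint.items() if x == a) / n
        p_b = sum(c for (_, y), c in joint.items() if y == b) / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```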

This sort of chain-of-thought processing would be most useful for processing data from input sources (*i.e.* from the environment, from reading text, from chat, from VR interactions), where forward chaining is the appropriate mechanism for transforming data. Most processing in most situations would follow well-established, previously learned chains of thought. Provided that chains of thought are not heavily branched (*i.e.* for any given rule in set A, only a small number of rules in set B follow, *etc.*), the 'logical deduction' or data processing of input can be performed quite rapidly.
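A minimal forward-chaining pipeline along these lines might look like this; the rule names and the `follows` map are invented for the example.

```python
def forward_chain_pipeline(data, stage_a, stage_b, follows):
    """Apply stage-A rules to the input, then try only those stage-B rules
    that learned correlations say tend to follow the A rules that fired.
    `follows` maps an A-rule name to the B-rule names that usually follow it."""
    a_results, fired = [], []
    for name, rule in stage_a.items():
        out = rule(data)
        if out is not None:
            a_results.append(out)
            fired.append(name)
    # Narrow branching: consider only the B rules correlated with fired A rules.
    candidates = {b for a in fired for b in follows.get(a, ())}
    b_results = {b: stage_b[b](a_results) for b in sorted(candidates)}
    return a_results, b_results
```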

At the same time, one would need to have a learning mechanism running in the background, exploring other possible "chains of thought" to see if they might produce useful results.