
Attention Allocation and Credit Assignment

Attention Allocation

The critical factor shaping real-world general intelligence is resource constraint. Without this constraint, we could just have simplistic program-space-search algorithms like AIXItl instead of complicated systems like the human brain or OCP. Resource constraint is managed implicitly within various components of OCP, for instance in the finite population size used in PEL algorithms, and in the finite depth of forward- or backward-chaining inference trees in PLN. But there is also a component of OCP that manages resources in a global and cognitive-process-independent manner: the attention allocation component.

The general principles the attention allocation process should follow are easy enough to see: History should be used as a guide, and an intelligence should make probabilistic judgments based on its experience, guessing which resource-allocation decisions are likely to maximize its goal-achievement. The problem is that this is a difficult learning and inference problem, and to carry it out with excellent accuracy would require a limited-resources intelligent system to spend nearly all its resources deciding what to pay attention to and nearly none of them actually paying attention to anything else. Clearly this would be a very poor allocation of an AI system's attention! So simple heuristics are called for, to be supplemented by more advanced and expensive procedures on those occasions where time is available and correct decisions are particularly crucial.

In this page and its children

We will describe how these attention allocation issues are addressed in the OCP design. Concretely, they are addressed via a set of mechanisms and equations for dynamically adjusting importance values attached to Atoms and MindAgents. Different importance values pertain to different time scales, most critically the Short-Term Importance (STI) and Long-Term Importance (LTI).

Two basic innovations are involved in the mechanisms attached to these importance values:

  • treating attention allocation as a data mining problem: the system records information about what it's done in the past and what goals it's achieved in the past, and then recognizes patterns in this history and uses them to guide its future actions via probabilistically adjusting the (often context-specific) importance values associated with internal terms, actors and relationships, and adjusting the "effort estimates" associated with Tasks
  • using an artificial-economics approach to update the importance values (attached to Atoms, MindAgents, and other actors in the OCP system) that regulate system attention
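The economic side of this can be sketched roughly as follows. This is a hedged illustration, not the actual OCP implementation: the `AttentionBank` name and the wage and rent values are assumptions introduced for the example. The key idea is that STI behaves like currency, flowing from a central pool toward recently useful Atoms while total currency stays roughly conserved.

```python
# Illustrative sketch of economics-style importance updating: each Atom
# holds STI "currency"; Atoms used in the current cycle earn wages from
# a central bank, and all Atoms pay rent, so attention drains away from
# idle Atoms and accumulates on useful ones.

class Atom:
    def __init__(self, name, sti=0.0):
        self.name = name
        self.sti = sti

class AttentionBank:
    def __init__(self, wage=10.0, rent=1.0):
        self.wage = wage   # STI paid to each Atom used this cycle (assumed value)
        self.rent = rent   # STI charged to every Atom each cycle (assumed value)
        self.funds = 0.0   # currency collected as rent, redistributed as wages

    def update(self, atoms, used_atoms):
        for a in atoms:
            paid = min(a.sti, self.rent)   # an Atom cannot pay more than it has
            a.sti -= paid
            self.funds += paid
        for a in used_atoms:               # reward Atoms that contributed to activity
            a.sti += self.wage
            self.funds -= self.wage

atoms = [Atom("dog", 5.0), Atom("cat", 5.0)]
bank = AttentionBank()
bank.update(atoms, used_atoms=[atoms[0]])
# "dog" gains net STI (rent paid, wage earned); "cat" only pays rent
```

Because the update is a cheap local bookkeeping step, it can run in real time on every cognitive cycle, while the data mining approach runs in the background and adjusts the same values more carefully when warranted.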

The integration of these two aspects is crucial. The artificial economics approach allows the system to make rough and ready attention allocation judgments in real time, whereas the data mining approach is slower and more resource-intensive but allows the system to make sophisticated attention allocation judgments when this is judged to be worth the effort.

In its particulars, this sort of thing is definitely not exactly what the human brain does, but we believe this is a case where slavish adherence to neuroscience is badly suboptimal. Doing attention allocation entirely in a distributed, formal-neural-nettish way is, we believe, extremely and unnecessarily inefficient, and given realistic resource constraints it necessarily leads to the rather poor attention allocation that we experience every day in our ordinary waking state of consciousness. Several aspects of attention allocation can be fruitfully done in a distributed, neural-nettish way, but not having a logically centralized repository of system-history information (regardless of whether it's physically distributed or not) simply can't be a good thing in terms of effective attention allocation. And we argue that, even for those aspects of attention allocation that are best addressed in terms of distributed, vaguely neural-nettish dynamics, an artificial-economics approach has significant advantages over a more strictly neural-net-like approach, due to the greater ease of integration with other cognitive mechanisms such as forgetting and data mining (more on this later).

A note on the role of inference in OCP attention allocation may be of value here. Although we are using probabilistic inference for attention allocation (along with other tools), the nature of this application of PLN is different from most other uses of PLN in OCP. Here — unlike in the ordinary inference case — we are not directly concerned with what conclusions the system draws, but rather, with what the system bothers to try drawing conclusions about. PLN, used in this context, effectively constitutes a nonlinear-dynamical iteration governing the flow of attention through the OCP system.

But, nevertheless, one should not underestimate the impact of attention allocation on the inferential conclusions that the OCP system draws. For example, if attentional dynamics causes X to get such a low long-term-importance value that it's forgotten, then X will never get a chance to be used in reasoning again, which may lead the system to come to significantly different conclusions. Furthermore, importance updating and credit-assignment-guided schema execution guide the formation of new schemata and predicates, which are then incorporated into the system's declarative knowledge base via inferred higher-order links.

The structure of these wiki pages on attention allocation is as follows. After some brief conceptual comments, the semantics of STI and LTI values are discussed, with special attention to the role of LTI in the forgetting process. Then, the data mining approach to importance estimation is described, in terms of the definition of the SystemActivityTable that collects information about the OCP system's activities and its progress toward its goals. Finally, the economic approach to importance updating is described, including how it may incorporate information from the data mining approach when available.

Philosophy of Importance Updating

The "attention allocation" process — embodied in OCP in the process of dynamically adjusting "importance values" associated with Atoms, MindAgents and other OCP actors — rests on a very simple principle: things that are important to the mind at any given time should undergo more cognitive processing, and things that are estimated likely to be important to the mind in the future should be retained in memory.

But how do you tell what's important to the mind at a given time? Or what is likely to be important to the mind in the future? There are several aspects.

First, the external world can give relevant indications — something directly related to perception or action can demand attention.

Second, there is Peirce's Law of Mind — the heuristic that, if X is important, everything related to X should become a little more important. And, as a corollary, if several parts of a certain "holistic pattern" are important, then the other parts of the pattern should become important.

Finally, there is goal-driven importance. If something has to be thought about in order to achieve an important goal, then it should become important. Critical here is the perennial systemic goal of keeping the mind full of valuable relationships: if thinking about X has been valuable lately, then X should be a good candidate to be thought about again in the near future, unless other reasons override.

OCP's importance updating process provides explicit mechanisms that take all these aspects into account.

In the brain, of course, the importance updating process is achieved by a complex combination of chemicals acting in different parts of the brain in response to various signals. But neurophysiology and neuropsychology really give us very little specific guidance as to how this process works, except to suggest various general factors that are involved. We have designed our approach via a combination of introspective intuition, pragmatic simulation-running and hand-calculation, and practical experience with importance updating in OCP itself.

The closest analogue of the importance updating process in symbolic AI systems is the "blackboard system." Such a system contains an object called the "blackboard," onto which the most important entities in the system go. These entities are acted on in appropriate ways, the least important of them are removed from the blackboard, and new entities are placed onto it. What is unique about the OCP approach to importance updating is the way it integrates blackboard-system-style functionality with the nonlinear dynamics of a huge population of agents.

Basic Atom Importance Measures

There are many kinds of patterns to be detected regarding the importance of attending to various Atoms and sets of Atoms in various contexts. In some cases it is worth the system's while to study complex and subtle patterns of this form very closely. We will discuss the recognition of subtle attentional patterns later on. In many cases, though, only the simplest level of analysis can be done, for reasons of computational resource preservation.

Every Atom in OCP has a minimum amount of importance-analysis done on it, the result of which is contained in two numbers: the Short-Term Importance (STI) and the Long-Term Importance (LTI), which are contained in the AttentionValue objects associated with Atoms. STI and LTI are also associated with other OCP actors such as MindAgents, but for the moment we will discuss them only in the context of Atoms.

Roughly speaking, for Atoms, STI determines CPU allocation in cases where a rough generic measure is needed. More concretely, many MindAgents use STI to make a default decision regarding which Atoms to pay attention to in the context of a particular cognitive process. Long-Term Importance (LTI), on the other hand, is used to determine which Atoms will be swapped out of RAM and onto disk, or deleted altogether. These numbers give a very coarse view of the value of attending to the Atom in various circumstances, but they're far better than nothing; and collecting and calculating finer-grained information is often not practical for computational resource reasons.
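The default STI-driven selection can be sketched as follows. The function name and the simplified `SimpleAtom` class are assumptions for illustration; the point is only that a MindAgent's generic attention decision reduces to ranking by STI, ignoring LTI.

```python
# Illustrative sketch (not the actual OCP API): a MindAgent that by
# default attends only to the highest-STI Atoms in the AtomSpace.
import heapq

def select_attentional_focus(atomspace, k=10):
    """Return the k Atoms with the highest Short-Term Importance."""
    return heapq.nlargest(k, atomspace, key=lambda atom: atom.sti)

class SimpleAtom:
    def __init__(self, name, sti, lti):
        self.name, self.sti, self.lti = name, sti, lti

atomspace = [SimpleAtom("dog", 8.0, 2.0),
             SimpleAtom("cat", 3.0, 9.0),
             SimpleAtom("rock", 1.0, 1.0)]
focus = select_attentional_focus(atomspace, k=2)
# focus holds "dog" then "cat": STI, not LTI, drives the processing choice
```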

The von Neumann architecture is obviously rearing its ugly head here: the split between Short-Term Importance and Long-Term Importance reflects the asymmetry between processor time and memory space in contemporary computers. Short-Term Importance determines how much processor time an Atom should get; Long-Term Importance determines whether it should be allowed to remain in memory or not. The equations governing these quantities are tuned so that STI is allowed to change more quickly than LTI, because it's easy to de-allocate processor time from an Atom and then reallocate it again later. LTI can't be as volatile, because once something is expelled from RAM and saved to disk, it's fairly computationally expensive to get it back. (In some cases it's very expensive to get it back — if one really wipes all direct trace of it from memory. On the other hand, if one keeps an index to an on-disk Atom without any other information about the Atom, then retrieving it from disk is slow only because of disk access time rather than because of search effort.)

It can also be useful to consider further sorts of importance beyond STI and LTI. For instance, one may introduce VLTI (Very Long Term Importance), defined as the importance of saving an Atom on disk rather than deleting it permanently. VLTI should be even less volatile than LTI, as permanent deletion is a very serious decision. However, even disk space is not unlimited, and, more crucially, saving Atoms to disk consumes valuable processor time; so, for instance, it may make sense in some configurations for most perceptual data to have VLTI=0, so that when its LTI decays to 0 it is simply deleted rather than being saved at further expense. For most of the following we will discuss only STI and LTI; but VLTI may be handled identically to LTI, though with different parameter values, and this will be mentioned occasionally as appropriate.
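The volatility ordering among the three quantities can be sketched with a simple exponential-decay model. The per-cycle retention multipliers below are illustrative parameter choices, not values from the OCP code; the only claim is the ordering, with STI decaying fastest and VLTI slowest.

```python
# Sketch of the volatility ordering: STI decays fastest (cheap to revisit
# a processor-time decision), LTI more slowly (disk swap-out is costly to
# undo), VLTI most slowly of all (deletion is irreversible).

STI_DECAY, LTI_DECAY, VLTI_DECAY = 0.5, 0.9, 0.99  # assumed per-cycle retention

class AttentionValue:
    def __init__(self, sti, lti, vlti):
        self.sti, self.lti, self.vlti = sti, lti, vlti

    def decay(self):
        self.sti *= STI_DECAY
        self.lti *= LTI_DECAY
        self.vlti *= VLTI_DECAY

av = AttentionValue(sti=100.0, lti=100.0, vlti=100.0)
for _ in range(10):
    av.decay()
# After 10 idle cycles: STI is nearly gone (~0.1), LTI has dropped to ~35,
# VLTI remains ~90 — so deletion decisions change far more slowly than
# processor-time decisions.
```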

How LTI and STI levels of Atoms are assigned and adjusted is a subtle issue to be discussed in later sections of this chapter.

Long-Term Importance and Forgetting

The LTI of an Atom A indicates how valuable it is to the system to keep A in memory. That is, it is roughly interpretable as the expected usefulness of having A in memory in the future, which is closely tied to the probability:

P(A will be useful in the future)

The main point of calculating this number and keeping it around is to guide decisions regarding which Atoms to remove from memory (RAM) and which ones to retain.

How does the forgetting process (carried out by the Forgetting MindAgent) work? The first heuristic is to remove the Atoms with the lowest LTI, but this isn't the whole story. Clearly, the decision to remove an Atom from RAM should depend on factors beyond just its LTI. For example, one should also take into account the expected difficulty of reconstituting the Atom from other Atoms. Suppose the system has the relations:

dogs are animals

animals are cute

dogs are cute

and the strength of the third relation is not dissimilar from what would be obtained by deduction and revision from the first two relations and others in the system. Then, even if the system judges it will be very useful to know dogs are cute in the future, it may reasonably choose to remove dogs are cute from memory anyway, because it knows it can be so easily reconstituted. Thus, as well as removing the lowest-LTI Atoms, the Forgetting MindAgent should also remove Atoms meeting certain other criteria such as the combination of:

  • low STI
  • easy reconstitutability in terms of other Atoms that have LTI not less than its own

The goal of the "forgetting" process is to maximize the total utility of the Atoms in the AtomSpace throughout the future.
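The combined heuristic above can be sketched as follows. The thresholds and the boolean reconstitutability flag are illustrative assumptions (in practice, judging reconstitutability would itself require inference, as in the dogs-are-cute example), but the two removal criteria mirror the ones just listed.

```python
# Hedged sketch of the forgetting heuristic: remove the lowest-LTI Atoms,
# and also remove Atoms that combine low STI with easy reconstitutability
# from other Atoms of at-least-equal LTI.

LTI_REMOVE_THRESHOLD = 1.0   # assumed cutoff
STI_LOW_THRESHOLD = 0.5      # assumed cutoff

class Atom:
    def __init__(self, name, sti, lti, reconstitutable=False):
        self.name = name
        self.sti = sti
        self.lti = lti
        # True if the Atom can be cheaply rebuilt (e.g. by deduction and
        # revision) from other Atoms whose LTI is not less than its own
        self.reconstitutable = reconstitutable

def atoms_to_forget(atomspace):
    doomed = []
    for a in atomspace:
        if a.lti < LTI_REMOVE_THRESHOLD:
            doomed.append(a)              # first heuristic: lowest LTI
        elif a.sti < STI_LOW_THRESHOLD and a.reconstitutable:
            doomed.append(a)              # low STI plus easy reconstitution
    return doomed

atomspace = [
    Atom("dogs are animals", sti=2.0, lti=5.0),
    Atom("animals are cute", sti=1.0, lti=4.0),
    Atom("dogs are cute", sti=0.1, lti=3.0, reconstitutable=True),
    Atom("stale percept", sti=0.0, lti=0.2),
]
forget = atoms_to_forget(atomspace)
# "dogs are cute" is dropped despite decent LTI, because it can be rebuilt
# by deduction; "stale percept" is dropped for low LTI.
```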