From OpenCog

Cognitive Configuration

This page discusses "cognitive configuration," or the particular use of the distributed computing architecture provided by the Mind OS to enable effective partitioning of cognitive processes across multiple Units.

The need for cognitive configuration is rooted in a familiar philosophical point: the difficulty that the limited computational capability of realizable substrates for intelligence poses for pragmatic AGI. Assuming massive processing power renders the AI problem trivial. Limited processing power, by contrast, requires practical intelligent systems to adopt a host of heuristics that constrain generality and flexibility, but that still permit reasonably flexible general intelligence together with effective specialized intelligence in selected domains. One reflection of this balance between generalization and specialization is the need for functional specialization. In practical terms, this means that intelligent systems will generally be divided into parts, each concerned with some particular sort of task or some particular aspect of intelligence. In abstract terms, this can be viewed in the SMEPH framework in terms of cluster hypergraphs.

A partitioned hypergraph is simply a hypergraph that is approximately divided into sub-hypergraphs, with the property that, on average, each node has a lot more nontrivial-strength connections to other nodes within its sub-hypergraph than to nodes outside it. This is a purely structural notion. A cluster hypergraph, then, is a partitioned hypergraph in which the vertices associated with each partition are roughly clustered together. In general, clustering-together may be measured in terms of any metric defined on the set of vertices. So there tend to be a lot of "nearby" nodes within a partition, and not so many "nearby" pairs spanning different partitions.
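The structural property above can be checked with a minimal sketch. It assumes a simplification not made in the text: every link is pairwise and carries a single strength value. The function names, the 0.1 strength cutoff and the 0.5 average threshold are illustrative choices, not part of the OCP design.

```python
from collections import defaultdict

def internal_fraction(edges, partition_of, min_strength=0.1):
    """For each node, the fraction of its nontrivial-strength links
    that stay inside the node's own partition."""
    inside = defaultdict(int)
    total = defaultdict(int)
    for (a, b), strength in edges.items():
        if strength < min_strength:
            continue  # ignore trivial-strength connections
        same = partition_of[a] == partition_of[b]
        for node in (a, b):
            total[node] += 1
            if same:
                inside[node] += 1
    return {node: inside[node] / total[node] for node in total}

def is_partitioned(edges, partition_of, min_strength=0.1, min_avg=0.5):
    """True if, on average, nodes have mostly internal links.
    min_avg is an arbitrary illustrative threshold."""
    fractions = internal_fraction(edges, partition_of, min_strength)
    if not fractions:
        return False
    return sum(fractions.values()) / len(fractions) > min_avg

# Hypothetical toy data: two partitions, one weak cross-partition link.
edges = {
    ("cat", "dog"): 0.9, ("dog", "pet"): 0.8, ("cat", "pet"): 0.7,
    ("run", "jump"): 0.9, ("pet", "run"): 0.3, ("cat", "jump"): 0.05,
}
partition_of = {"cat": "A", "dog": "A", "pet": "A", "run": "B", "jump": "B"}
print(is_partitioned(edges, partition_of))
```

Raising min_avg toward 1.0 would express the "a lot more" in the definition more strictly.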

In a SMEPH context, the metric is the Jaccard metric associated with SimilarityLinks between ConceptEdges and SchemaEdges. So in a clustered SMEPH hypergraph, each partition, roughly, deals with distinct sorts of static and dynamic patterns.
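For reference, the standard Jaccard distance between two sets is one minus the size of their intersection divided by the size of their union. The sketch below assumes, purely for illustration, that each ConceptEdge or SchemaEdge can be summarized by a set of member identifiers; how SMEPH actually derives those sets is not specified here.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets.
    Two empty sets are taken to be identical (distance 0)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical member sets for two concepts.
dog = {"fur", "tail", "barks"}
cat = {"fur", "tail", "meows"}
print(jaccard_distance(dog, cat))  # 2 shared out of 4 total -> 0.5
```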

Hypergraph clustering may occur on many different levels. The notion of functional specialization as we'll discuss it here has to do mostly with the highest level of clustering — with the division of an intelligent system into a set of modules dealing with distinct sorts of declarative and procedural knowledge. But the same notion also applies on finer levels of granularity.

As an example of functional specialization, the human brain is roughly divided into regions dealing with language processing, regions dealing with visual perception, regions dealing with temporal perception, regions dealing with motor control, and so forth. The consequence of this brain-level partitioning is that the derived hypergraph of the brain is correspondingly partitioned — and the structures and dynamics of the ConceptEdges and SchemaEdges corresponding to the different regions of the brain tend to cluster together (language Concepts/Schemata in one partition, vision Concepts/Schemata in another partition, etc.).

The key reason why this heuristic of functional specialization is valuable for intelligence is that solving hard problems is often rendered easier by hierarchical modular problem decomposition. Functional specialization is basically the implication, on the whole-intelligent-system level, of the simple problem-solving heuristic of breaking a hard problem down into components, solving each component, and then piecing together the componentwise solutions to form a whole solution. This heuristic doesn't work for all hard problems, of course. But we conjecture that if there's a hard problem this doesn't work for, then it's incredibly unlikely that an intelligence will evolve to solve this problem (because evolution tends to create somewhat modular solutions), and it's also unlikely that human beings will design a system to solve the problem (because humans naturally think in terms of modular-breakdown heuristics).

The term we have adopted for the construction of appropriate functionally specialized sub-units (analogous to functionally specialized brain regions) is cognitive configuration. The OCP design reflects the need for cognitive configuration directly, via the notion of ComplexAtomspaces. In a complex OCP configuration, each part of a multipart AtomSpace is supplied with MindAgents and Atoms that cause it to correspond to a certain specialized domain — which, assuming the dynamics unfolds as intended, causes the derived hypergraph of the system to take the form of a SMEPH cluster hypergraph.

More technically, in terms of the OCP design, cognitive configuration has several aspects:

  • How the overall AtomSpace is divided into parts (assuming a multipart AtomSpace, the general case)
  • What types of Atoms are initially placed into each part
  • What particular Atoms (from what data sources) are placed into each part
  • Which of the general CIM-Dynamics are placed into each part, and what parameters they're given
  • What specialized new CIM-Dynamics, if any, are placed into each part (this comes up only occasionally)
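The aspects listed above could be recorded in a declarative structure along the following lines. This is only a sketch: the class names, field names, and the example parts and parameters are all hypothetical and do not correspond to any actual OCP API.

```python
from dataclasses import dataclass, field

@dataclass
class DynamicSpec:
    """One CIM-Dynamic placed into a part, with its parameters."""
    name: str
    params: dict = field(default_factory=dict)
    specialized: bool = False  # True for a new, part-specific dynamic

@dataclass
class PartConfig:
    """One part of a multipart AtomSpace."""
    name: str
    atom_types: list    # which types of Atoms are initially placed here
    data_sources: list  # where the particular Atoms come from
    dynamics: list      # DynamicSpec entries for this part

@dataclass
class CognitiveConfiguration:
    parts: list

    def dynamics_for(self, part_name):
        """Names of the dynamics assigned to the named part."""
        for part in self.parts:
            if part.name == part_name:
                return [d.name for d in part.dynamics]
        raise KeyError(part_name)

# Hypothetical example values.
config = CognitiveConfiguration(parts=[
    PartConfig("language", ["ConceptNode", "SimilarityLink"], ["text-corpus"],
               [DynamicSpec("inference", {"min_confidence": 0.9})]),
    PartConfig("speculation", ["ConceptNode"], [],
               [DynamicSpec("inference", {"min_confidence": 0.1}),
                DynamicSpec("concept-blending", specialized=True)]),
])
print(config.dynamics_for("speculation"))
```

Keeping this description separate from any machine assignment mirrors the distinction drawn in the text between the cognitive configuration and the hardware that realizes it.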

All this is "CIM configuration" as distinct from underlying hardware configuration. There are relations between CIM configuration and underlying hardware configuration, but they're not strict ones; the same CIM configuration may be realized hardware-wise in many different ways. Generally speaking, each part of the multipart AtomSpace will correspond either to a single machine or to a tightly-connected cluster of machines.

In addition to the main reason for functional specialization mentioned above (the general power of the hierarchical-decomposition heuristic), there are also other reasons for constructing an OCP with a complex cognitive configuration, instead of just a big, teeming, self-organizing AtomSpace.

The first reason is related to hardware configuration. Messaging between machines in a distributed-computing system is relatively slow, and if we can break much of the business of mind down into a collection of Units many of which can operate on individual machines, then — given the realities of von Neumann hardware — we'll have a more efficient OCP.

The second reason is purely cognitive in nature. Each collection of CIM-Dynamics has its own emergent nature as a dynamical system. The dynamics of two loosely connected Units is not going to be the same as the dynamics one would get if one threw all the Atoms and CIM-Dynamics from the two Units into the same big pool. In some cases, it seems, the loosely connected dynamics are more cognitively desirable. We will consider some examples of this below. For instance, it may be useful to have a Unit in which inference acts very speculatively, and another one in which it acts very conservatively. Mixing these two up will result in a Unit in which inference acts moderately speculatively, but consistently moderately speculative inference does not generally lead to the same results as the combination of conservative and highly speculative inference. In fact, there is ample psychological evidence indicating that it is important to creativity for the mind to have some kind of partially isolated subsystem in which anything goes (see Ben Goertzel's book "From Complexity to Creativity").
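A deliberately tiny illustration of the last point, under the assumption (made only for this sketch) that "speculative" and "conservative" inference differ in nothing but a confidence threshold. The candidate conclusions and thresholds are made up.

```python
def infer(candidates, min_confidence):
    """Accept every candidate conclusion at or above the threshold."""
    return {name for name, conf in candidates.items() if conf >= min_confidence}

# Hypothetical candidate conclusions with confidence values.
candidates = {"A": 0.95, "B": 0.6, "C": 0.2}

conservative = infer(candidates, 0.9)  # one Unit, strict threshold
speculative = infer(candidates, 0.1)   # another Unit, loose threshold
moderate = infer(candidates, 0.5)      # a single mixed Unit

# Two loosely connected Units keep trusted results apart from hypotheses.
two_units = {"trusted": conservative, "hypotheses": speculative - conservative}
print(two_units)
print(moderate)
```

The mixed Unit never considers the low-confidence candidate and does not distinguish well-established conclusions from tentative ones, whereas the two-Unit arrangement does both.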

The basic cognitive configuration to be used for an embodied learning focused OCP is described in the overall architecture diagram in the Introduction. Elaboration of aspects of the configuration described there is given in later chapters, where for instance the cognitive arguments underlying the AttentionalFocus Unit are described (very loosely, this corresponds to the notion of "working memory" in the human mind: it is a collection of Atoms that are marked as extremely important and deserving of intensive processing on a pool of dedicated machines).

In the remainder of this section we discuss a couple of critical ways in which the cognitive configuration described in that diagram may be extended. Some potential extensions are fairly basic, such as:

  • A Psynese Unit, for dealing with interactions with other OCPs using the Psynese language, a scheme that's been developed for separate OCP systems to communicate via exchanging sets of Atoms directly, rather than by exchanging information in English, Lojban or other human languages
  • A "speculation" Unit, similar to the GlobalAttentionalFocus Unit, but with parameters of CIM-Dynamics specifically tuned for the creation of potentially interesting low-confidence Links and hypothetically interesting Nodes.

Others are more involved and subtle.

Multiple Interaction Channels

The architecture diagram given in CognitiveArchitecture is somewhat simplified in that it shows a single "interaction channel" for interaction with the outside world. This is correct for an OCP instance that is specialized to control a single agent in a simulation world, but it is not the most general case. One big difference between humans and OCPs is that, in principle, an OCP can carry on a large number of separate interactions at the same time; for instance, a single OCP could control a large number of agents in various different simulation worlds, along with a number of physical robots and some disembodied chatbots. The human brain is not designed for this kind of multitasking, but an OCP, with a proper cognitive configuration, can be. In OCP lingo, we describe this difference by saying that an OCP can have several interaction channels, not just one as humans have.

Each interaction channel (InteractionChannel), in an interaction-configured OCP, should come with a dedicated AttentionalFocus. There may also be a need for additional CIM-Units corresponding to different sensation and action modalities. For instance, in an embodied-agent-control scenario, an InteractionChannel would correspond to a single agent being controlled. If a single OCP instance were controlling several agents, this would correspond to multiple InteractionChannels. But a single InteractionChannel may wind up dealing with a number of qualitatively different modalities, such as vision, hearing, smell, rolling on wheels, arm movement, and noisemaking. Some of these modalities may be sufficiently complicated as to require a whole specialized Unit of their own, feeding into the channel-specific AttentionalFocus (as opposed to simpler modalities, which may just feed data directly into the channel-specific AF). The prototypical example of a complex sensory modality is vision. On the action side, rolling on wheels clearly doesn't require its own Unit, but merely the execution of simple hard-wired schemata; moving legs with complexity comparable to human legs, however, may require enough complexly coordinated decisions to merit its own Unit.
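The routing described in this paragraph can be sketched as follows. All class, field and method names are hypothetical, and the "digest" that a modality Unit passes on to the AttentionalFocus is a placeholder for whatever processing that Unit would really perform.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    name: str
    inbox: list = field(default_factory=list)

@dataclass
class InteractionChannel:
    agent: str
    complex_modalities: set                   # modalities given their own Unit
    focus: Unit = field(init=False)           # channel-specific AttentionalFocus
    modality_units: dict = field(init=False)  # modality name -> dedicated Unit

    def __post_init__(self):
        self.focus = Unit(f"{self.agent}-AF")
        self.modality_units = {
            m: Unit(f"{self.agent}-{m}") for m in self.complex_modalities
        }

    def perceive(self, modality, datum):
        unit = self.modality_units.get(modality)
        if unit is None:
            # Simple modality: feed data directly into the channel AF.
            self.focus.inbox.append((modality, datum))
        else:
            # Complex modality: the dedicated Unit handles the raw data
            # and feeds a (placeholder) digest into the channel AF.
            unit.inbox.append(datum)
            self.focus.inbox.append((modality, f"digest of {datum}"))

# One channel per controlled agent; several channels in one OCP instance.
channels = [InteractionChannel("robot-1", {"vision"}),
            InteractionChannel("chatbot-1", set())]
channels[0].perceive("wheels", "bump")
channels[0].perceive("vision", "frame-0")
print(channels[0].focus.inbox)
```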

A channel-specific AttentionalFocus Unit is dedicated to rapid analysis of things that are important as regards the particular interaction channel in question. This is a different CIM-Unit from the global AF of the whole system, because what's important in the context of the current conversation isn't necessarily what's important to the long-term goals of the system. On the other hand, there is still a need for a part of the mind that embodies a deep and thoroughgoing quasi-real-time integration of all mental processes; this is the global AF. The AF bifurcation enables the system to respond rapidly to perceptual stimuli (including conversations), without interrupting systematic long-term thought about important things. In the human mind there's a constant tension between (the analogues of) interaction-specific AF and global AF, but there's no need to port this kind of confusion into the OCP design.

Self-Modification Configuration

Next, what if we want an OCP that can modify its own CIM-Dynamics? This is the first step toward a deeply, fully self-modifying OCP. The architecture diagram provides for this via a "control schema learning" Unit, but in the most general case things become more complex than that. To enable maximally powerful and general self-modification, we need a configuration that is a step up in abstraction from what we've been discussing so far. We need to introduce the notion of a UnitSet: a set of Units that work together as a coherent group, carrying out cognitive processes collectively.

Define a schema-adaptive configuration as follows:

  • A collection of UnitSets, each one devoted to carrying out some set of standard cognitive tasks
  • A process devoted to controlling these experimental Units: feeding them new cognitive schema and new tasks, and recording their performance
  • A Unit devoted to thinking about candidate cognitive schema to try out in the population of evolving UnitSets
  • An AttentionalFocus Unit devoted to thinking hard about extremely good candidate cognitive schema to try out in the population of evolving UnitSets

This is a generic self-modification-oriented configuration, devoted to the learning of new cognitive schema.
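The controller process in this configuration can be sketched as a simple propose-evaluate-record loop. Everything concrete here is an assumption made for illustration: a "cognitive schema" is reduced to a single numeric parameter, the benchmark is an arbitrary function peaking at 0.7, and the proposer stands in for the Units that think about candidate schema.

```python
import random

def propose(best, rng):
    """Hypothetical proposer: mutate the best schema found so far,
    or start from a random one if nothing has been tried yet."""
    if best is None:
        return rng.random()
    candidate, _score = best
    return candidate + rng.gauss(0.0, 0.1)

def evaluate(unit_set, candidate):
    """Hypothetical benchmark run inside an experimental UnitSet:
    the score is highest when the parameter equals 0.7."""
    return -(candidate - 0.7) ** 2

def run_controller(unit_sets, rounds=5, seed=0):
    """Feed candidate schema to each UnitSet and record performance."""
    rng = random.Random(seed)
    best = None
    history = []
    for _ in range(rounds):
        for unit_set in unit_sets:
            candidate = propose(best, rng)
            score = evaluate(unit_set, candidate)
            history.append((unit_set, candidate, score))
            if best is None or score > best[1]:
                best = (candidate, score)
    return best, history

best, history = run_controller(["UnitSet-1", "UnitSet-2", "UnitSet-3"])
print(len(history), best)
```

Proposing mutations of the best candidate so far loosely corresponds to the AttentionalFocus Unit concentrating on extremely good candidates; a fuller design would also keep a broader, more exploratory proposer.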

Now, in a schema-adaptive configuration, the CIMDynamics are fixed. The next step is to do something similar for the CIMDynamics themselves. One may define a CIMDynamic-adaptive configuration as follows:

  • A collection of UnitSets, each one devoted to carrying out some set of standard cognitive tasks
  • A process devoted to controlling these experimental Units: feeding them new CIMDynamics and new tasks, and recording their performance
  • A Unit devoted to thinking about candidate CIMDynamics to try out in the population of evolving UnitSets
  • An AttentionalFocus Unit devoted to thinking hard about extremely good candidate CIMDynamics to try out in the population of evolving UnitSets

One may define a fully adaptive configuration similarly, except that each UnitSet in the population adapts not merely its CIMDynamics but also other aspects of its source code. This requires recompilation of each experimental UnitSet at the time it is launched. Of course, at this stage, one may arrive at a new UnitSet whose evolved code implies discarding the whole framework of UnitSets, evolutionary programming, and so forth, although it is unlikely that this will be an early-stage result.

Finally, it is clear that if one wanted to take the next step and have the system modify its own source code, the same kind of architecture could be used. Instead of experimental Units, one would have experimental OCPs, and the controller process would monitor and launch these OCPs. Analysis of the results of experimentation, and creation of new candidate OCPs, would proceed basically as in the picture sketched above. One merely has a learning problem that's an order of magnitude more difficult. Clearly, it makes sense to have the CIM-Dynamics of the current (or a near descendant) OCP implementation optimized as far as possible through self-modification, before turning this optimized OCP to the much harder task of modifying its own source.

By this point, we have gone rather far out into speculative-design-space, relative to the current state of practical OCP experimentation. However, we believe it's important to go this far out at the design phase, in order to determine, with reasonable confidence, whether one's AGI design is in principle capable of going all the way. We believe that the OCP design is sufficiently flexible and sufficiently powerful to support full, radical self-modification. We know exactly what OCP structures and dynamics will be useful for radical self-modification, what configuration will be required, and so forth. The path ahead is fairly clearly visible, as are the various obstacles along it.

Use of Globally Distributed Processing

The role of globally distributed computing in OCP's quest for intelligent self-modification is a supporting one, rather than a central one, but is worth briefly noting nonetheless. The evolution of ensembles of slightly varying OCP systems is something that must take place on powerful dedicated machines, at least in the relatively near future. However, much of the schema learning that is involved in self-modification can be farmed out to a population of weaker machines. For instance, suppose the system wants to learn a better deductive reasoning rule; this may involve a great deal of empirical testing of candidate deductive reasoning rules on datasets, which is a kind of operation that is easily farmed out to a globally distributed network.
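The farming-out pattern in the deductive-rule example can be sketched as below. A local thread pool stands in for the globally distributed network, and both the candidate rules and the tiny dataset are made up for illustration; none of this reflects actual OCP inference rules or data.

```python
from concurrent.futures import ThreadPoolExecutor

def mean_error(rule, dataset):
    """Average absolute error of a candidate rule on the dataset."""
    return sum(abs(rule(ab, bc) - expected)
               for (ab, bc), expected in dataset) / len(dataset)

def rank_rules(rules, dataset, workers=4):
    """Test every candidate rule in parallel; return (name, error)
    pairs sorted from lowest to highest error."""
    names = list(rules)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(
            lambda name: mean_error(rules[name], dataset), names))
    return sorted(zip(names, errors), key=lambda pair: pair[1])

# Hypothetical candidates: estimate the strength of A->C from the
# strengths of A->B and B->C.
rules = {
    "product": lambda ab, bc: ab * bc,
    "minimum": min,
}
# Made-up test cases: ((strength_ab, strength_bc), expected_strength_ac).
dataset = [((0.5, 0.5), 0.25), ((0.8, 0.5), 0.4), ((1.0, 0.2), 0.2)]

ranking = rank_rules(rules, dataset)
print(ranking[0][0])
```

Because each rule is scored independently of the others, the same structure carries over to remote workers: only the rule, the data and the resulting score need to cross the network.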