DestinOpenCog

NOTE: Most of the ideas on this wiki page have now been written up more carefully and in more detail, in papers that are linked from http://wp.goertzel.org/?p=404

This page presents some new, somewhat speculative but (in the author's opinion) rather interesting ideas about how to map DeSTIN-recognized patterns into OpenCog. This is a controversial approach because it involves some significant changes (simplifications, really) to DeSTIN itself. But it's the best approach I've thought of so far.

(NOTE: After reading this posting, Abram Demski observed that he had previously made suggestions quite similar to Uniform DeSTIN Part One on an email list -- but, I believe, with regard to other similar architectures such as HTM, rather than DeSTIN. He didn't, however, make the explicit suggestions I've made here regarding pattern mining on state-trees, which is really the key point -- Uniform DeSTIN is proposed here basically as a means to that end.)

Apologies that the discussion on this page is somewhat compressed, and will probably only be comprehensible to someone who already understands how DeSTIN works.

Also, after this page was written, a new page was written suggesting a radical DeSTIN Redesign. This radical redesign would more easily and cleanly support many of the ideas described on this page, but is not necessary to enable these ideas to be realized.

Uniform DeSTIN, Part One

First of all, I'm going to propose a somewhat radical revision of the DeSTIN approach.

In this revision, the algorithmic operation within a single DeSTIN node remains the same, and the overall compositional spatiotemporal deep learning network architecture remains the same (i.e. each node still refers to the same region of the input pixel network as in classic DeSTIN). The basic logic of the CUDA implementation can be about the same too.

Hah, you may be wondering -- so then what could possibly be so different?

My basic suggestion is: All the nodes on the same level of the DeSTIN hierarchy should share the same library of patterns. That is, all the nodes on the same level should share the same list of centroids!

That is what I mean by "uniform DeSTIN."

Note that, in this approach, it's computationally feasible to have a much larger library of patterns utilized by each node. I think this is important, because if one is going to have a DeSTIN network recognize a large number of visual patterns in a great variety of visual scenes, one is going to need a heck of a lot of centroids at the middle- and higher-level nodes. Using just a few centroids in each node, as has been done in DeSTIN experiments so far, is OK for supervised classification experiments with a relatively small number of categories, but IMO will not work well in a robot vision or other AGI-ish context where a vastly greater variety of visual patterns must be recognized.

Suppose we have an nxn pixel grid, where the lowest level has nodes corresponding to 4x4 squares. Then there are n^2/16 nodes on the lowest level, and on the k'th level there are n^2/4^k nodes (counting so that the lowest level is k=2; each level-k node covers a 2^k x 2^k pixel region). For instance, with n = 256, the lowest level has 256^2/16 = 4096 nodes. This means that, without increasing computational complexity (actually decreasing it), in Uniform DeSTIN we can have a factor of n^2/4^k more centroids on level k.

To actually see a decrease in computational complexity (with a factor of n^2/4^k more centroids), one would have to use a clever data structure such as a cover tree to store the centroids at each level. Then the nearest-neighbor matching of input patterns against the library (centroid) patterns would be very rapid -- much faster than linearly comparing the input to each pattern in the list.
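To make this concrete, here is a minimal Python sketch of the shared-library lookup, using scipy's cKDTree as a stand-in for a cover tree (the class and method names are my own illustrative inventions, not part of any existing DeSTIN codebase):

import numpy as np
from scipy.spatial import cKDTree

class LevelCentroidLibrary:
    """One shared centroid library for ALL DeSTIN nodes on a given level."""
    def __init__(self, centroids):
        # centroids: (num_centroids, dim) array; every node on this level
        # matches its input against this single shared list
        self.centroids = np.asarray(centroids)
        self.tree = cKDTree(self.centroids)  # stand-in for a cover tree

    def nearest(self, observation, k=1):
        # fast nearest-neighbor query, rather than a linear scan
        # over the whole centroid list
        dists, idxs = self.tree.query(observation, k=k)
        return idxs, dists

# one library per level, shared by every node on that level
libraries = {level: LevelCentroidLibrary(np.random.rand(1000, 16))
             for level in range(2, 9)}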

The algorithms for updating centroids and matching inputs to centroids can stay the same as in classic DeSTIN.

Conceptual Justification for Uniform DeSTIN

But why is Uniform DeSTIN OK? Why is it acceptable to use the same library of centroid patterns in every node on the same level?

Because, as time goes on and the system sees many things, the set of visual patterns to be perceived in each kxk region of the input pixel grid is -- to within a close enough degree of approximation -- going to be the same.

For instance, consider a 16x16 pixel subgrid A near the bottom right of the input pixel grid, and a 16x16 pixel subgrid B near the top middle of the input pixel grid. If the input pixel grid is looking at somebody's headshot, and the face is in the center of the pixel grid, then: A will show the person's right shoulder, and B will show his forehead and his hair. But if later the input pixel grid is looking at someone's head from a different angle -- say, looking at the wall above someone's head -- then B may show nothing, and A may show his forehead and hair. If later the system is looking at a person's body, then B may show his shoulder, and A may show part of his arm, etc. In general, there will be few systematic patterns regarding which kinds of visual patterns are shown to which part of the pixel grid.

Generally speaking, it seems to me that: IF the class of images that the system will see is invariant with respect to linear translations, THEN without loss of generality, we can assume that the library of patterns at each node on a given level is the same.

In reality this assumption isn't quite going to hold. For instance, for an eye attached to a person or humanoid robot, the top of the pixel grid will probably look at a person's hair more often than the bottom ... because the person stands rightside-up more often than they stand upside-down, and because they will often fixate the center of their view on a person's face, etc. For this reason, we can recognize our friend's face better if we're looking at them directly, with their face centered in our vision.

However, I suggest that this kind of peculiarity is not really essential. There's no reason you can't have an intelligent vision system that recognizes a face just as well whether it's centered in the visual field or not. (In fact you could easily realize this kind of bias within Uniform DeSTIN if you wanted to, by doing the cross-DeSTIN-state pattern recognition appropriately, but I don't really see the point of that.)

By and large, in almost all cases, it seems to me that in a DeSTIN system exposed to a wide variety of real-world inputs in complex situations, the libraries of patterns in the different nodes at the same level would turn out to be pretty much the same. Even if they weren't exactly the same, they would be close, embodying essentially the same regularities. But of course, this sameness would be obscured, because centroid 7 in a certain node X on level 4 might actually be the same as centroid 18 in some other node Y on level 4 -- and there would be no way to tell that centroid 7 in node X and centroid 18 in node Y were actually referring to the same pattern, without doing a lot of work.

But Uniform DeSTIN is so Biologically Unrealistic !!

So bloody what....

The brain has a lot of neurons; adding new neurons was fairly easy and cheap for evolution, so the brain tends to do things in a massively parallel manner, with great redundancy. For the brain, it's not so problematically expensive to have the functional equivalent of a lot of DeSTIN nodes on the same level, all simultaneously using and learning libraries of patterns that are essentially identical to each other. Using current computer technology, on the other hand, this is rather inefficient.

In the brain, messaging between separated regions is expensive, whereas replicating function redundantly is cheap. In current computers, messaging between separated regions is fairly cheap (so long as those regions are stored on the same machine, anyway), whereas replicating function redundantly is expensive. Thus, even in cases where the same concept and abstract mathematical algorithm can be effectively applied in both the brain and a computer, the specifics needed for efficient implementation may be quite different.

Mapping Uniform DeSTIN into OpenCog

And now we come to the punchline you've been waiting for.

Mapping classic DeSTIN's states into a symbolic pattern-manipulation engine like OpenCog is possible, but is sort of a pain. Doing the same thing with Uniform DeSTIN is, relatively speaking, a piece of cake.

In Uniform DeSTIN, Cluster 7 means the same thing in ANY node on level 4. So after a Uniform DeSTIN system has seen a fair number of images, you can be pretty sure its library of patterns is going to be relatively stable. Some clusters may come and go as learning progresses, but there's going to be a large and solid library of clusters at each level that persists, because all of its member clusters occur reasonably often across a variety of inputs.

Define a DeSTIN state-tree as a (quaternary) tree with one node for each DeSTIN node; and, living at each node, a small list of (integer pattern_code, float weight) pairs. That is, at each node, the state-tree has a short list of the patterns that closely match the state at that node. The weights may be assumed to lie between 0 and 1. The integer pattern codes have the same meaning for every node on the same level.
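In code, a state-tree node might look something like the following minimal Python sketch (the field names are mine, for illustration only):

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StateTreeNode:
    level: int
    # (integer pattern_code, float weight) pairs: the short list of
    # library patterns closely matching this node's current state;
    # weights are assumed to lie in [0, 1]
    matches: List[Tuple[int, float]] = field(default_factory=list)
    # quaternary tree: up to four children, one per quadrant
    children: List["StateTreeNode"] = field(default_factory=list)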

As you feed DeSTIN inputs, at each point in time it will have a certain state, representable as a state-tree.

So, suppose you have a large database of DeSTIN state-trees, obtained by showing various inputs to DeSTIN over a long period of time. Then, you can do various kinds of pattern recognition on this database of state-trees.

Define a state-subtree as a (quaternary) tree with a single integer at each node.

Two state-subtrees may have various relationships with each other within a single state-tree -- for instance they may be adjacent to each other, or one may appear atop or below the other, etc.

An interesting kind of pattern recognition to do is: Recognize frequent state-subtrees in the stored library of state-trees; and then recognize frequent relationships between these frequent state-subtrees. The latter relationships will form a kind of "image grammar."
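As a naive sketch of the first step -- counting frequent depth-one state-subtrees, taking the top-weighted pattern code as each node's single integer label -- one might write something like this (building on the StateTreeNode sketch above; a real miner would handle deeper subtrees and the relationships between them):

from collections import Counter

def top_code(node):
    # the node's single integer label: its top-weighted pattern code
    return max(node.matches, key=lambda m: m[1])[0]

def depth_one_subtrees(state_tree):
    """Yield (level, node_code, child_codes) signatures for each internal node."""
    stack = [state_tree]
    while stack:
        node = stack.pop()
        if node.matches and node.children:
            child_codes = tuple(top_code(c) for c in node.children if c.matches)
            yield (node.level, top_code(node), child_codes)
        stack.extend(node.children)

def frequent_subtrees(state_trees, min_count=10):
    counts = Counter(sig for t in state_trees for sig in depth_one_subtrees(t))
    return {sig: n for sig, n in counts.items() if n >= min_count}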

Temporal patterns may be recognized in the same way as spatial ones, as part of the state-subtree grammar (e.g. state-subtree A often occurs right before state-subtree B; state-subtree C often occurs right before and right below state-subtree D; etc.).

The flow of activation from OpenCog back down to DeSTIN should also work fairly straightforwardly in this set-up. If relationships have been stored between concepts in OpenCog's memory and grammatical patterns among state-subtrees, then whenever a concept C becomes important in OpenCog's memory, this can cause a top-down increase in the probability of matching inputs to those DeSTIN node centroids that would cause the DeSTIN state-tree to contain the grammatical patterns corresponding to C.


Details of Mining Patterns in DeSTIN States

Assume one has Uniform DeSTIN in its simplest version -- i.e. uniformity on *each level*. This means that "centroid #4", for instance, means the same thing at any node on level 2 (and likewise within any other single level). But in this sort of Uniform DeSTIN, centroid #4 may mean something different on level 2 versus level 3. (Fancier versions of Uniform DeSTIN fix this problem, but we can ignore that for now.)

Then, consider the following six predicates:

  • hasCentroid(node N, int k)
  • hasParentCentroid(node N, int k)
  • hasNorthNeighborCentroid(node N, int k)
  • hasSouthNeighborCentroid(node N, int k)
  • hasEastNeighborCentroid(node N, int k)
  • hasWestNeighborCentroid(node N, int k)

For instance

hasNorthNeighborCentroid(N, 3)

means that N's north neighbor has centroid #3

(I'm using the four compass directions to denote the four directions in the DeSTIN square lattice. I'd actually rather DeSTIN had a hexagonal lattice, but let's not mess with that right now...)

Consider also the predicates

  • hasParent(node N, Node M)
  • hasNorthNeighbor(node N, Node M)
  • hasSouthNeighbor(node N, Node M)
  • hasEastNeighbor(node N, Node M)
  • hasWestNeighbor(node N, Node M)

Now suppose we have a stored set of DeSTIN states, saved from the application of DeSTIN to multiple different inputs...

What we want to find are predicates P that are *conjunctions* of instances of the above eleven predicates, and which occur frequently in the stored set of DeSTIN states.

For instance, a simple example of such a predicate would be the conjunction of

  • hasNorthNeighbor($N, $M)
  • hasParentCentroid($N, 5)
  • hasParentCentroid($M, 5)
  • hasNorthNeighborCentroid($N,6)
  • hasWestNeighborCentroid($M,4)

This predicate could be evaluated at any pair of nodes ($N,$M) on the same DeSTIN level. If it is true for atypically many of these pairs, then it's a "frequent pattern", and should be detected and stored.

We have a software tool called Fishgram (Frequent SubHyperGraph Miner) in OpenCog, that is able to mine this sort of conjunction from sets of relationships that are stored in OpenCog's hypergraph (Atomspace) format.

To use Fishgram for this sort of problem, basically, one would need to

  • Write a script translating each DeSTIN state into a set of relationships drawn from: hasNorthNeighbor, hasSouthNeighbor, hasEastNeighbor, hasWestNeighbor, hasCentroid, hasParent
  • Import these relationships, describing each DeSTIN state, into the OpenCog Atomspace
  • Run Fishgram on this Atomspace.
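As a rough illustration of the first step, here is a sketch that flattens a state-tree into (predicate, arguments) tuples, reusing the StateTreeNode sketch from earlier. The tuple format is purely illustrative -- it is not Fishgram's actual input format, and the Atomspace import itself is omitted:

def state_to_relationships(root):
    """Flatten a DeSTIN state-tree into (predicate, args...) tuples."""
    rels = []
    def visit(node, node_id, parent_id):
        if node.matches:
            rels.append(("hasCentroid", node_id, top_code(node)))
        if parent_id is not None:
            rels.append(("hasParent", node_id, parent_id))
        for i, child in enumerate(node.children):
            visit(child, 4 * node_id + i + 1, node_id)
        # hasNorthNeighbor etc. would be derived from the level's
        # lattice geometry; omitted here for brevity
    visit(root, 0, None)
    return rels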

And so...

All in all, we seem to have a pretty smooth way of mapping subsymbolic visually-recognized states into symbolic pattern-grammars.

The first step along this path would be to tweak DeSTIN into Uniform DeSTIN. The next step would be to implement a database of state-trees. The next step after that, to do some basic state-subtree pattern mining on that database. And so on -- intelligent computer vision, just a few moderately difficult steps away....

Uniform DeSTIN, Part Two

After writing the above, I thought of a further step in the same direction. (Sorry, I just noticed I use Level and Layer interchangeably in the below -- ignore that, as I mean the same thing by both of them, but I'm too lazy to fix that right now...)

Uniform DeSTIN Part One, as discussed above, makes DeSTIN's pattern recognition intrinsically translation invariant.

Uniform DeSTIN Part Two, as I'll discuss just below, makes DeSTIN's pattern recognition mostly (not wholly) scale invariant.

What I'll describe here is a way to map centroids on one level of DeSTIN, into centroids on the other levels of DeSTIN. This means that when a centroid has been learned on one level, it can be experimentally ported to all the other levels, to see if it may be useful there too.

Recall some DeSTIN basics:

Centroid on layer N = spatial arrangement (e.g. kxk square lattice) of beliefs of layer N-1

Belief on layer N = probability distribution over centroids on layer N. For simplicity we can think about this as a mixture of Gaussians

So: Belief on layer N = probability distribution over spatial arrangements of beliefs on layer N-1

On layer 1, the role of centroids is played by simple kxk squares of pixels. Layer 1 beliefs are probability distributions over these small pixel squares.

Layer 2 centroids are hence spatial arrangements of probability distributions over small pixel-squares; and layer 2 beliefs are probability distributions over spatial arrangements of probability distributions over small pixel-squares.

A small pixel-square S may be mapped into a single pixel P via a heuristic algorithm such as:

  • if S has more black than white pixels, then P is black
  • if S has more white than black pixels, then P is white
  • if S has an equal number of white and black pixels, then use some heuristic. For instance if S is 4x4, you could look at the central 2x2 square and assign P to the color that occurs most often there. If that is also a tie, then you can just arbitrarily assign P to the color that occurs in the upper left corner of S.
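In code, this heuristic might look as follows (a sketch assuming binary images stored as numpy arrays, with the convention black = 1, white = 0):

import numpy as np

def square_to_pixel(S):
    """Map a kxk binary pixel square to a single pixel, per the heuristic above."""
    black, white = np.sum(S == 1), np.sum(S == 0)
    if black != white:
        return 1 if black > white else 0
    # tie: look at the central 2x2 square
    c = S.shape[0] // 2
    center = S[c-1:c+1, c-1:c+1]
    cb, cw = np.sum(center == 1), np.sum(center == 0)
    if cb != cw:
        return 1 if cb > cw else 0
    return int(S[0, 0])  # still tied: use the upper-left corner of S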

A probability distribution over small pixel-squares may then be mapped into a probability distribution over pixel values (B or W). A probability distribution over the two values B and W may be approximatively mapped into a single pixel value -- the one that occurs most often in the distribution, with a random choice made to break a tie.

This tells us how to map Layer 2 beliefs into spatial arrangements of pixels. Thus, it tells us how to map Layer 2 beliefs into Layer 1 beliefs.

And this, inductively, tells us how to map Layer N beliefs into Layer N-1 beliefs.

Remember, a Layer N belief is a probability distribution (pdf for short) over spatial arrangements of beliefs on layer N-1

For example: A Layer 3 belief is a pdf over arrangements of Layer 2 beliefs. But since we can map Layer 2 beliefs into Layer 1 beliefs, this means we can map a Layer 3 belief into a pdf over arrangements of Layer 1 beliefs -- which means we can map a Layer 3 belief into a Layer 2 belief.

Etc.

Of course, this also tells us how to map Layer N centroids into Layer N-1 centroids. A Layer N centroid is a pdf over arrangements of Layer N-1 beliefs; a Layer N-1 centroid is a pdf over arrangements of Layer N-2 beliefs. But Layer N-1 beliefs can be mapped into Layer N-2 beliefs, so Layer N centroids can be re-expressed as pdfs over arrangements of Layer N-2 beliefs, and hence mapped into Layer N-1 centroids.

In practice, one could implement this idea by moving from the bottom up. Given the mapping from Layer 1 "centroids" to pixels, one could iterate through the Layer 1 beliefs and identify which pixels they correspond to. Then one could iterate through the Layer 2 beliefs and identify which Layer 1 beliefs they correspond to. Etc. Each Level N belief could be explicitly linked to a corresponding Level N-1 belief. In parallel, as one moves up the hierarchy, Layer N centroids could be explicitly linked to corresponding Layer N-1 centroids.

Note, there are in principle more possible Level N beliefs than Level N-1 beliefs. So the mapping from level N beliefs to level N-1 beliefs is many-to-one. This is a reason not to simply maintain a single centroid pool across levels.

However, when a new centroid C is added to the Level N pool, it could be mapped into a Level N-1 centroid to be added to the Level N-1 pool (if not there already). And, it could also be used to spawn a Level N+1 centroid, drawn randomly from the set of possible Level N+1 centroids that map into C.

And so, Bob's your uncle ;-)

Uniform DeSTIN, Part Three

An even further step in the same direction was suggested by Jared Wigmore to me, in conversation.

What Jared suggested was that, in comparing

  • an input A to a Level N node

to

  • a Level N centroid B

one could consider various rotations of A, and see which rotation gives the closest match.

The tweak one needs to make this work well would seem to be: When you match a centroid to an input observation-or-belief, you could record the rotation angle corresponding to the match...

In other words, one should use the tweaked definitions

Centroid on layer N = spatial arrangement (e.g. kxk square lattice) of beliefs of layer N-1

Belief on layer N = probability distribution over (angle, centroid) pairs on layer N. For simplicity we can think about this as a mixture of Gaussians

So: Belief on layer N = probability distribution over (angle, spatial arrangement of beliefs) pairs on layer N-1

An additional complexity here is that two different (angle, centroid) pairs (on the same level) could be (exactly or approximately) equal to each other. This necessitates an additional step of "centroid simplification", in which ongoing checks are made to see if there are any two centroids C1, C2 on the same level so that: There exist angles A1, A2 so that (A1, C1) is very close to (A2, C2). In this case the two centroids may be merged into one.
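Here is a rough sketch of rotation-aware matching and centroid merging, restricted to quarter-turn rotations so that numpy's rot90 applies exactly (arbitrary angles would require interpolation, e.g. via scipy.ndimage.rotate; the function names and the image-like centroid representation are my own assumptions):

import numpy as np

def best_rotation_match(patch, centroid_imgs):
    """Return (best_centroid_index, quarter_turns, distance), recording
    the rotation angle corresponding to the match."""
    best = (None, None, np.inf)
    for rot in range(4):  # 0, 90, 180, 270 degrees
        rotated = np.rot90(patch, rot)
        for i, c in enumerate(centroid_imgs):
            d = np.linalg.norm(rotated - c)
            if d < best[2]:
                best = (i, rot, d)
    return best

def merge_rotationally_equal(centroid_imgs, eps=1e-3):
    """Centroid simplification: drop a centroid if some rotation of it
    is within eps of an already-kept centroid."""
    kept = []
    for c in centroid_imgs:
        if not any(min(np.linalg.norm(np.rot90(c, r) - k) for r in range(4)) < eps
                   for k in kept):
            kept.append(c)
    return kept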

Given these modifications, everything seems to work fine (on paper ;)

Oh, and by the way -- what works for rotation, in this context, should work for shear as well, although shear invariance is a somewhat stranger idea. One could simply replace "rotation angle" in the above by "(rotation angle, shear factor) pair" and the whole story seems to work just fine.

Comments on Temporal Perception

Note that Uniform DeSTIN Part One and Part Two could potentially be applied perfectly well if the inputs to DeSTIN, at level One, were movies rather than static images. Then, in the simplest version, Level 1 would consist of pixel cubes instead of pixel squares, etc. (the third dimension in the cube representing time).

The scale invariance achieved by Uniform DeSTIN Part Two would then be scale invariance in time as well as in space. (Rotation invariance through time seems unnecessary to deal with ;-)

Another comment: it seems to me that in this sort of scenario, we might want to enable rectangular shapes as well as cubes. That is, we might want to look at a Level N centroid consisting of m time-slices of a kxk arrangement of Level N-1 beliefs -- without requiring that m=k.... But this wouldn't affect the Uniform DeSTIN points particularly. It would just make the centroid learning algorithm a little more complex, because at each level one would want to consider centroids with various values of m, from m=1,...,k (or maybe m>k also).

Interpretation of What DeSTIN Is Doing

This is just a brief comment on the topic: What is DeSTIN doing, exactly? That is: What is the theoretical/conceptual justification for what DeSTIN does?

There are a lot of ways of looking at this; here is one way that makes sense to me -- and it seems to hold up regardless of the details of how one calculates beliefs in each node.

The centroids in the DeSTIN library represent points in "spatial pattern space", i.e. they represent exemplary spatial patterns. DeSTIN's beliefs, as probability distributions over centroids, represent guesses as to which of the exemplary spatial patterns is the best model of what's currently being seen in a certain space-time region.

This matching between observations and centroids might seem to be a simple matter of "nearest neighbor matching", but the subtle point is, it's not immediately obvious how to best measure the distance between observations and centroids. The optimal way of measuring distance is going to depend on context; that is to say, on the actual distribution of observations in the system's real environment over time.

DeSTIN's algorithm for calculating the belief at a node, based on the observation and centroids at that node PLUS the beliefs at other nearby nodes, is essentially a way of tweaking the distance measurement between observations and centroids, so that this measurement accounts for the context (the historical distribution of observations).

There may be many ways of doing this tweaking. Ideally one could use probability theory explicitly, but that's not always going to be computationally feasible, so heuristics may be valuable, and various versions of DeSTIN have contained various heuristics in this regard.

The various ways of "uniform-izing" DeSTIN (i.e. making its pattern recognition activity approximately invariant with respect to affine transformations), don't really affect this story -- they just improve the algorithm's ability to learn based on small amounts of data (and its rapidity at learning from data in general), by removing the need for the system to re-learn transformed versions of the same patterns, over and over again. So the uniformization just lets DeSTIN carry out its basic activity faster and using less data.

DeSTIN's Assumption of Hierarchical Decomposability

DeSTIN will work well to the extent that: The average distance between each part of an actually observed spatial pattern, and the closest centroid pattern, is not too large (note: the choice of distance measure in this statement is potentially subtle). That is: DeSTIN's set of centroids is supposed to provide a compact model of the probability distribution of spatial patterns appearing in experience of the cognitive system of which DeSTIN is a part.

DeSTIN's effective functioning relies on the assumption that this probability distribution is hierarchically decomposable -- i.e. that the distribution of spatial patterns appearing over a kxk region can be compactly expressed, to a reasonable degree of approximation, as a spatial combination of the distributions of spatial patterns appearing over its (k/2)x(k/2) subregions. This assumption of hierarchical decomposability greatly simplifies the search problem that DeSTIN would otherwise face, but it also restricts DeSTIN's capability to deal with more general spatial patterns that are not easily hierarchically decomposable. However, the benefits of this approach seem to outweigh the costs, given that visual patterns in the environments humans naturally encounter do seem (intuitively at least) to have this hierarchical property.

A Comment on Distance and Utility

Above I said the choice of distance measure involved in the assessment of DeSTIN's effective functionality is subtle. Further above, I said that the function of DeSTIN's belief assessment is basically to figure out the contextually best way to measure the distance between the observation and the centroids at a node. These comments were both getting at the same point.

But what ultimately is the right measure of distance between two spatial patterns? Ultimately, the right measure is: the probability that the two patterns A and B can be used in the same way. Ultimately, the system wants to identify observation A with centroid B if it has useful action-patterns involving B, and it can substitute A for B in these patterns without loss.

This is difficult to calculate in general, though. A rough proxy, often OK, is to measure the distance between A and B in terms of both

  • the basic (extensional) distance between the physical patterns they embody (e.g. pixel by pixel distance)
  • the contextual (intensional) distance, i.e. the difference between the contexts in which they occur

By enabling the belief at a node's parent to play a role in modulating that node's own belief, DeSTIN enables contextual/intensional factors to play a role in distance assessment.

Integrating Image Grammar into DeSTIN

The above suggestions (regarding various kinds of uniformization of DeSTIN, to embed invariance with respect to affine transformations into the architecture), are all still within the basic scope of a DeSTIN-like design. The following suggestions deviate further from the original DeSTIN concept, and constitute a sort of "DeSTIN++" algorithm that would fundamentally incorporate coupling with an external pattern recognition system (which could be OpenCog, or could be something simpler) into the perceptual pattern recognition process.

The basic question explored is: How might one integrate DeSTIN with an external pattern recognition network, to achieve a greater intelligence than DeSTIN (in its current form) can realize on its own?

Here I will describe one simple approach -- not the only possible approach by any means, but a good exemplar to help understand the concept.

Suppose one stores DeSTIN's states, over a period of time, in a database outside DeSTIN.

Then, suppose one uses an external pattern recognition algorithm to look for frequent patterns, and also surprising patterns, in this set of stored states. These patterns could be expressions in a simple language (let's call it STPL, for Spatiotemporal Language) involving

  • DeSTIN beliefs (i.e. probability distributions over centroids or STPL expressions)
  • Boolean operations (AND, OR, NOT)
  • the inheritance relation
  • some basic spatial relationships (next_to, above, below, right_of, left_of, near, etc.)
  • maybe some RCC spatial relationships also
  • temporal relationships (before, after, simultaneous, temporally_near)
  • tokens indicating patterns in STPL

One could then add these frequent or surprising patterns to DeSTIN's pattern library, so that the pattern library contains not only original-style DeSTIN centroids, but also these corresponding "image grammar" style patterns.

Then, when a new input comes into a DeSTIN node, in addition to being compared to the centroids at the node, it could be fed as input to the STPL expressions associated with the node.

What would be the advantage of this approach, compared to DeSTIN without a database, pattern recognition and STPL? The potential is to get more compact representation of a variety of spatial patterns. In many cases, a spatial pattern that would require a large number of DeSTIN centroids to represent, can be represented by a single, fairly compact STPL expression. It is an open question whether these sorts of patterns are really critical for human-like vision processing. But my intuition is that they are. Ultimately, DeSTIN is based on a fancy version of nearest-neighbor search, applied in a clever way on multiple levels of a hierarchy, using context-savvy probabilities to bias the matching. But there are going to be some visual patterns that are more compactly and intuitively represented using a more flexible language.

For example, consider the archetypal spatial pattern of a face as: {either two eyes that are next to each other, or sunglasses}, above a nose, which is in turn above a mouth. (Yes, I know this is an oversimplified toy example, but I'm positing it for illustration only. The same point applies to more complex and realistic patterns.) One could represent this in STPL as something like (using OpenCog notation)


AND
  inheritance(N, B_nose)
  inheritance(M, B_mouth)
  above(E, N)
  above(N, M)
  OR
    AND
       member(E1, E)
       member(E2, E)
       next_to(E1, E2)
       inheritance(E1, B_eye)
       inheritance(E2, B_eye)
    AND
       inheritance(E, B_sunglasses)
where e.g. B_eye is a DeSTIN belief that corresponds roughly to recognition of the spatial pattern of a human eye. To represent this using ordinary DeSTIN centroids, one couldn't represent the OR explicitly; instead one would need to split it into two different sets of centroids, corresponding to the eye case and the sunglasses case… unless the DeSTIN pattern library contained a belief corresponding to "eyes or sunglasses." But the question then becomes: how would DeSTIN actually learn a belief like this? In the suggested architecture, pattern mining on the database of DeSTIN states is proposed as an algorithm for learning such beliefs.

The STPL-enhanced DeSTIN will have advantages over the traditional version, only if the actual distribution of images observed by the system contains many (reasonably high probability) images modeled accurately by STPL expressions involving disjunctions and/or negations as well as conjunctions. If the system's perceived world is simpler than this, then good old DeSTIN will work just as well, and STPL is a needless complication.

Without STPL, how might DeSTIN be extended to include beliefs like "eyes or sunglasses"? One way would be to couple DeSTIN with a reinforcement learning subsystem, that reinforced the creation of beliefs that were useful for the system as a whole somehow. If reasoning in terms of faces (independent of whether they have eyes or sunglasses) got the system reward, presumably it could learn to form the concept "eyes or sunglasses."

Note: one could also define a neural-net based approach similar in spirit and function to the above suggestion. Instead of a database plus a pattern mining algorithm, one could use a neural net performing unsupervised pattern recognition, getting fed the same inputs as the proposed database. My belief is that the suggested approach will work better given the nature of contemporary computers.

Comparing DeSTIN Centroids from Different Time Points, or Different DeSTIN Systems

Suppose we store a database of DeSTIN states, obtained from a single DeSTIN system as it evolved over time.

In order to mine patterns from this database, we need to address the problem that a centroid labeled "k" at a certain point in time (whether with respect to a particular node as in classic DeSTIN, or a layer or the whole network as in uniform DeSTIN), won't necessarily have the same meaning as the centroid labeled "k" at some other point in time.

That is: one needs some sort of "objective" way to compare centroids with each other.

Note that: On the lowest level (assuming that nodes refer to 4x4 pixel grids), a centroid may be identified with a point in the 16-dimensional unit cube. So computing the distance between two centroids on the lowest level, is just a matter of using a metric on 16D space.

On higher levels, things are a little trickier. A centroid C is defined as a point in some M-dimensional cube, in an M-dimensional space whose axes correspond to

  • centroids on the level below, from the previous time-step
  • external signals (e.g. conjectured classification judgments)
  • centroids from the level above, from the previous time step
  • centroids from the present level, from the previous time step

Generally, most entries of the M-dimensional vector characterizing a centroid C will be close to zero -- so it will be a pretty sparse vector. This is because the entries of the vector largely denote the similarity of different parts of C to various centroids at the level below, and most of these similarities will be near zero.

Given two centroids on the same level, we can measure the basic distance

d_B(C1, C2)

defined as the distance between C1 and C2 in M-dimensional space (e.g. using the cosine metric). However this is not necessarily a good measure, since potentially the axes defining C1's M-dimensional space, may have a different meaning than the axes defining C2's M-dimensional space.

So, to measure the distance between two higher-level centroids (on the same level), one needs to do a recursive comparison, i.e.

d_R(C1, C2) = d_B(C1, C2) + d_child(C1, C2) + d_parent(C1, C2) + d_previous(C1, C2)

where

d_child(C1, C2) = SUM [ C1' a centroid in a child of C1, C2' a centroid in a child of C2 ; w(C1', C1) w(C2', C2) d_R(C1', C2') ]

with w(C1', C1) = the similarity between C1' and the corresponding part of C1 (i.e. the coordinate value of C1 in the coordinate corresponding to C1'), and likewise for w(C2', C2);

d_parent(C1, C2) = d_R(state of parent of C1, state of parent of C2)

d_previous(C1, C2) = d_R(C1'', C2'')

where C1'' = the previous state of C1, and C2'' = the previous state of C2.

How do we measure the distance between two *states*? The state of a node is a probability distribution over centroids, so we could measure the distance between these distributions, e.g. using the Jensen-Shannon divergence (http://en.wikipedia.org/wiki/Jensen–Shannon_divergence).

And finally, if C1 and C2 are on level 1, then

d_R(C1, C2) = d_B(C1, C2)

(thus bottoming out the recursion)
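As a sketch of the recursion, here is the child term only (the d_parent and d_previous terms are what turn this into simultaneous equations, as discussed just below). The centroid objects' .vector, .level and .child_centroids fields are hypothetical:

from scipy.spatial.distance import cosine, jensenshannon

def d_B(c1, c2):
    """Basic distance between two same-level centroid vectors."""
    return cosine(c1.vector, c2.vector)

def state_distance(p1, p2):
    """Distance between two node states (probability distributions over
    the same centroid list), per the Jensen-Shannon suggestion above."""
    return jensenshannon(p1, p2)

def d_R(c1, c2):
    """Recursive centroid distance: d_B plus the child term."""
    if c1.level == 1:
        return d_B(c1, c2)  # the recursion bottoms out at level 1
    total = d_B(c1, c2)
    for cc1, w1 in c1.child_centroids:  # (child centroid, weight) pairs
        for cc2, w2 in c2.child_centroids:
            total += w1 * w2 * d_R(cc1, cc2)
    return total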

This tells us how to measure distances between two centroids, which is nice. But it's a complex formula.

The trickiest aspect of this distance measure seems to be that it defines the distance between two centroids C1 and C2 in terms of the distance between the states at their parent nodes; but it likewise defines the distances between the centroids involved in those parent states in terms of the distances at the child nodes. So it gives us a system of simultaneous equations that needs to be solved, presumably iteratively. The top-level nodes have no parents, and the bottom-level nodes have d_R = d_B, which gives the "boundary conditions" for the simultaneous equations.

After one has done all this, then to make a simpler map of the centroids, one approach is to use dimensional embedding. That is, we can choose some N (say, N=50) and use OpenCog's dimensional embedding code to map all the centroids into N-dimensional vectors. Any metric space can be dimensionally embedded, and d_R is a metric on the space of centroids, so this should work OK.

Then, we can find clusters in the N-dimensional embedding space. These are persistent centroids that are defined in an objective, stable way. Or we can partition the N-dim embedding space into cells, and use these cells as persistent visual forms...
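Since OpenCog's dimensional embedding code isn't detailed here, the following sketch substitutes scikit-learn's MDS applied to a precomputed d_R distance matrix, then clusters in the embedded space (the parameter values are arbitrary):

import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def embed_and_cluster(centroids, d_R, n_dims=50, n_clusters=200):
    n = len(centroids)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = d_R(centroids[i], centroids[j])
    # map each centroid to an n_dims-dimensional vector
    coords = MDS(n_components=n_dims, dissimilarity="precomputed").fit_transform(D)
    # clusters in the embedding space = persistent, "objectively" defined centroids
    labels = KMeans(n_clusters=n_clusters).fit_predict(coords)
    return coords, labels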

A Simpler Approach to Maintaining Consistent Semantics for Centroids with the Same Name

The following seems a simpler approach to ensuring that centroids with the same name have the same meaning.

  • Store a record of "deprecated centroids", along with current centroids
  • Store a record, for each centroid based on more than N pieces of evidence, of how much it has moved since creation; and also of some of its prior versions
  • If a centroid named C1 [by "name" I mean "index number"] has moved too much, then:

1) Check whether the new position of the centroid C1 matches any of the centroids D in the deprecated centroids list;
1a) if yes, then assign the new position of the centroid C1 the name D (and now D is no longer deprecated);
1b) if no, then create a new name C2 for the new position of C1.

2) Check what happens to the centroids in the parent nodes of nodes using C1, due to the replacement of C1 with D or C2. Record the amount of change to each of these centroids, induced as a result of the change to C1. This may cause one of these centroids to "have moved too much", thus propagating the renaming process up the hierarchy.

3) Add an earlier version of the centroid C1 (before all the recent movement) to the deprecated centroids list (with the name C1 still attached).
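Here is a sketch of the bookkeeping involved (fresh_name and snapshot are hypothetical helpers, and the centroid object's .name, .position and .origin fields are my own assumptions; step 2's propagation is only stubbed):

import numpy as np

def maybe_rename(centroid, deprecated, match_eps, move_thresh):
    """Rename a centroid that has drifted too far since creation."""
    if np.linalg.norm(centroid.position - centroid.origin) < move_thresh:
        return  # hasn't moved too much; nothing to do
    # 1) does the moved centroid now match a deprecated centroid D?
    revived = next((d for d in deprecated
                    if np.linalg.norm(centroid.position - d.position) < match_eps),
                   None)
    # 3) deprecate the pre-movement version, with the old name still attached
    deprecated.append(snapshot(centroid))  # hypothetical helper
    if revived is not None:
        deprecated.remove(revived)
        centroid.name = revived.name       # 1a) D's name is revived
    else:
        centroid.name = fresh_name()       # 1b) hypothetical helper: new name C2
    # 2) propagating the induced change to centroids in parent nodes
    #    (possibly triggering further renames up the hierarchy) is omitted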

This approximatively preserves the idea that two centroids (at different points in time in the same DeSTIN system) have the same name if and only if they have the same meaning. But it does so in an organic, "bottom up" manner.... OTOH it requires tweaking DeSTIN, whereas the more complex approach suggested above (involving distance measures and dimensional embedding) just requires postprocessing of saved DeSTIN states.