Interfacing Between OpenCog and Deep Neural Nets for Vision Processing

(This page was written by Ben Goertzel, Dec 27 2016)

The set-up considered here is a DeSTIN-like collection of pattern recognizers (they will be referred to as "neural nets" or NNs here, but they could just as well be something else), each one associated with a certain spatial region of an input image or video coming from a camera. The input image may be 2D or 3D. We assume that the actual NN evaluation and learning is done outside OpenCog, via some "Server" that communicates with OpenCog and wraps up calls to third-party deep-learning tools (e.g. TensorFlow, Theano, whatever).

There is also a case where there are NNs corresponding to both spatial regions and temporal extents; e.g., for video processing one might have a 1-second and a 5-second network corresponding to the same spatial region. But we will ignore this case for the time being; it can be handled similarly.

The goal is to represent the outputs of each of these NNs in OpenCog, both for real-time behavior guidance and for storage for later offline pattern recognition. We also want OpenCog to be able to suggest new features to the NN learning software.

SpaceMaps and Regions

OpenCog has the capability to contain multiple SpaceMaps, and it seems to make sense to use this capability here. The human brain has an allocentric SpaceMap (top-down) as well as an egocentric SpaceMap (face-centered) and two eye-centered SpaceMaps. Similarly, in the case where OpenCog is controlling a Hanson robot, I think it should have

  • an allocentric SpaceMap
  • an egocentric SpaceMap, associated with the chest camera
  • a SpaceMap for each eye camera

Coordinating among these SpaceMaps is an important topic. For the purposes of this page, however, we will consider only the SpaceMap associated with the chest camera. This is the SpaceMap that will be associated with low-level visual pattern recognition from the chest camera…

Each region of the chest-camera input image that has a NN associated with it should have a corresponding RegionNode in the Atomspace. (Note that this is the case whether the regions are disjoint squares, overlapping squares, hexagons, or whatever…) If region A is contained in region B, we can create a link

SpatialContainsLink
   RegionNode "A"
   RegionNode "B"

and we can create other links denoting spatial relationships between RegionNodes as needed. For instance, if we have a 2D hierarchy of square regions, we would probably want links like

EvaluationLink
    PredicateNode "upper right corner"
    ListLink
        RegionNode "A"
        RegionNode "B"

that uniquely specify the relationship between each RegionNode and each of its children.
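
To make this concrete, here is a minimal Scheme sketch of how such region Atoms might be built. RegionNode and SpatialContainsLink are types proposed on this page rather than existing Atom types (they would need to be added to the atom-type definitions, or ordinary ConceptNodes and EvaluationLinks could stand in for them), and the node names are illustrative only, following the chest_camera-layer-x-y naming convention used later on this page.

; Sketch: a layer-1 parent region and one layer-0 child region, with the
; containment and corner relationships proposed above.  RegionNode and
; SpatialContainsLink are proposed types; the names are illustrative only.
(use-modules (opencog))

(define region-B (RegionNode "chest_camera-1-3-4"))   ; parent region
(define region-A (RegionNode "chest_camera-0-6-8"))   ; child region

; Region A is contained in region B ...
(SpatialContainsLink region-A region-B)

; ... and sits in B's upper right corner.
(EvaluationLink
   (PredicateNode "upper right corner")
   (ListLink region-A region-B))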

Streaming NN Output into Atomspace

We assume there is some external (non-Atomspace/CogServer) server that applies the hierarchy of deep NNs to the image from the chest camera, in real time. This "NN server" also takes care of updating and learning the NNs themselves. (OpenCog may send these NNs new input features, but we’ll deal with that a little later — for now let’s just consider the streaming of output from the NN server into OpenCog.)

I suggest we use two Atomspaces here: a main Atomspace and a PerceptionHistory Atomspace.

What I’m thinking is: every N seconds (N may be < 1), the NN server pushes to OpenCog the output of each of its NNs. This output goes to both the main Atomspace and the PerceptionHistory Atomspace...

So basically the NN server is just streaming Atoms into the two Atomspaces, sort of like the face-tracker and our other perception sources stream Atoms into OpenCog now...
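
To make the two-Atomspace arrangement concrete, here is a minimal Guile sketch, assuming the standard Scheme utilities for Atomspace management (cog-atomspace, cog-new-atomspace, cog-set-atomspace!). The record-in! helper is hypothetical; on each push cycle the NN server (or the OpenCog-side code receiving its stream) would use something like it to deposit output Atoms into both Atomspaces, in the representations described in the next two sections.

; Sketch: one main Atomspace plus a separate PerceptionHistory Atomspace,
; and a hypothetical helper for writing Atoms into a chosen Atomspace.
(use-modules (opencog))

(define main-atomspace (cog-atomspace))           ; the default Atomspace
(define perception-history (cog-new-atomspace))   ; separate history Atomspace

; Evaluate an Atom-producing thunk with the given Atomspace current,
; then switch back to whatever Atomspace was current before.
(define (record-in! atomspace thunk)
   (let ((previous (cog-atomspace)))
      (cog-set-atomspace! atomspace)
      (thunk)
      (cog-set-atomspace! previous)))

; Example: deposit a placeholder Atom into the history Atomspace.
(record-in! perception-history
   (lambda () (ConceptNode "NN output placeholder")))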

Representing NN Output in the PerceptionHistory Atomspace

In the PerceptionHistory Atomspace, the output of the NN server should be recorded as something like

AtTimeLink
    TimeNode "time-stamp-here"
    EvaluationLink <s>
          PredicateNode "NN output"
          ListLink
                SchemaNode "chest_camera-1-3-4-autoencoder"
                NumberNode "k"

Here

  • s is the k’th output of the neural net labeled “chest_camera-1-3-4-autoencoder” scaled into [0,1]
  • k is an index into the output vector of the neural net
  • the name "chest_camera-1-3-4-autoencoder" is for human consumption, and indicates that this NN comes from the chest camera, that it corresponds to the RegionNode on layer 1 with coordinates (3,4), and that it is an autoencoder NN
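
For concreteness, one such history record might look as follows in Scheme, with made-up time-stamp and values; the truth-value strength on the EvaluationLink carries s.

; One history record: at time 147003.25, output component k = 7 of the
; layer-1 (3,4) chest-camera autoencoder had scaled value s = 0.62.
; Values are illustrative only.
(use-modules (opencog))
; TimeNode and AtTimeLink come from the spacetime atom types, which must
; also be loaded (e.g. via the opencog spacetime module).

(AtTimeLink
   (TimeNode "147003.25")
   (EvaluationLink (stv 0.62 1.0)   ; strength = s, the scaled NN output
      (PredicateNode "NN output")
      (ListLink
         (SchemaNode "chest_camera-1-3-4-autoencoder")
         (NumberNode 7))))          ; k, the index into the output vector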

To associate this SchemaNode with a RegionNode in a semantically transparent way, we need something like

EvaluationLink
      PredicateNode "models"
      ListLink
            SchemaNode "chest_camera-1-3-4-autoencoder"
            RegionNode "chest_camera-1-3-4"

In this way the PerceptionHistory Atomspace contains a complete record of the output of the NNs corresponding to RegionNodes over time, in a format suitable for pattern mining ...

This record would then be used as input to pattern mining and other AI processes aimed at recognizing patterns in the states of the deep-NN hierarchy over time (the understanding being that the "states" we care about are the outputs of the NNs corresponding to the different regions, not the internal states inside the various NNs).
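
As a simple example of the kind of retrieval these processes would need, a GetLink can pull every recorded output of one particular NN out of the history. This is only a sketch, assuming the AtTimeLink / "NN output" convention above, and it would be run with the PerceptionHistory Atomspace as the current Atomspace.

; Sketch: fetch all (time-stamp, output-index) pairs recorded for one NN.
; The output strengths can then be read off the matching EvaluationLinks.
(use-modules (opencog) (opencog exec))

(define history-of-1-3-4
   (GetLink
      (VariableList (Variable "$t") (Variable "$k"))
      (AtTimeLink
         (Variable "$t")
         (EvaluationLink
            (PredicateNode "NN output")
            (ListLink
               (SchemaNode "chest_camera-1-3-4-autoencoder")
               (Variable "$k"))))))

; Returns a SetLink of (TimeNode, NumberNode) pairs.
(cog-execute! history-of-1-3-4)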

Representing NN Output in the Main (Real-time Processing) Atomspace

On the other hand, in the main Atomspace, what we probably want for representing NN output is a StateLink such as

StateLink
    ListLink
        ConceptNode "NN output set"
        SchemaNode "chest_camera-1-3-4-autoencoder"
        NumberNode "k"
    NumberNode "s"
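
The reason a StateLink (rather than an AtTimeLink) fits here is that a StateLink holds exactly one value per key: re-asserting it with a new NumberNode replaces the old value, so the main Atomspace always holds just the current output rather than a growing history. A small Scheme sketch of this behavior, with illustrative values:

; StateLink keeps only the latest value for a given key, so each push
; cycle simply overwrites the previous output component.
(use-modules (opencog))

(define key
   (ListLink
      (ConceptNode "NN output set")
      (SchemaNode "chest_camera-1-3-4-autoencoder")
      (NumberNode 7)))

(StateLink key (NumberNode 0.41))   ; output at one push cycle
(StateLink key (NumberNode 0.58))   ; next cycle: replaces 0.41

; (cog-incoming-set key) now contains a single StateLink, holding 0.58.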


We may then want a function something like

DefineLink
      DefinedPredicateNode "get NN output for region"
      LambdaLink
            VariableList
                  RegionNode $R
            > code to find the neural net corresponding to
              RegionNode $R, and then look up in the appropriate
              StateLink to find the current output of that neural net <
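
One way the elided lookup code might be realized (a sketch only, assuming the "models" and StateLink conventions above and the proposed RegionNode type) is as a GetLink that joins the two structures. It is shown here for a fixed region; wrapping it in the Lambda above would parameterize the region.

; Sketch of the elided lookup: find the NN that models a given region,
; then read that NN's current output components from the StateLinks.
(use-modules (opencog) (opencog exec))

(define get-output-of-1-3-4
   (GetLink
      (VariableList (Variable "$nn") (Variable "$k") (Variable "$s"))
      (AndLink
         (EvaluationLink
            (PredicateNode "models")
            (ListLink (Variable "$nn") (RegionNode "chest_camera-1-3-4")))
         (StateLink
            (ListLink
               (ConceptNode "NN output set")
               (Variable "$nn")
               (Variable "$k"))
            (Variable "$s")))))

; Returns a SetLink of (neural net, index, value) triples for the region.
(cog-execute! get-output-of-1-3-4)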


Real-time evaluation of predicates identifying visual patterns in the input would occur by applying these predicates to these StateLinks, or to other Atoms derived therefrom.

OpenCog Providing New Features to the NN Server

Now let us turn to a slightly later stage of development: we will want OpenCog processes to suggest new features for input to the deep NNs associated with different RegionNodes.

This might be done via a function such as

DefineLink
      DefinedSchemaNode "add features to neural net"
      LambdaLink
            VariableList
                  RegionNode $R
                  ListLink $L    ;; a list of StateLinks
            > code that tells the external NN server to add features to the NN
              for region $R, corresponding to the StateLinks in the List <
            > code that causes the states of these StateLinks to be pushed from the Atomspace
              to the NN server on a real-time basis, at a certain default frequency <

This requires some process on the OpenCog side to keep track of which StateLinks need their state sent where at what frequency; and it also requires some subtlety behind the scenes in the NN server, because the NN server has to decide when to train new models (based on the new features), when to use the new models versus the old models for sending outputs to OpenCog, etc. (Of course, additional parameters could be added to the DefinedSchemaNode to give the NN server or the OpenCog StateLink-based-messaging process more instructions on what to do…)
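
As one possible way to fill in the first placeholder above, the Lambda body could call out through a GroundedSchemaNode bound to a Scheme (or Python) function that does the actual talking to the NN server. This is only a sketch: the function name and its behavior are made up, and the transport to the NN server (socket, ROS topic, etc.) is elided.

; Hypothetical glue: a Scheme function, callable from Atomese via a
; GroundedSchemaNode, that forwards an "add these features" request to
; the external NN server.  Names and message format are made up, and
; RegionNode is the proposed type from earlier on this page.
(use-modules (opencog) (opencog exec))

(define (send-add-features region feature-list)
   ; ... serialize region and feature-list and send them to the NN server ...
   (ConceptNode "add-features-request-sent"))   ; return an Atom as the result

(cog-execute!
   (ExecutionOutputLink
      (GroundedSchemaNode "scm: send-add-features")
      (ListLink
         (RegionNode "chest_camera-1-3-4")
         (ListLink))))   ; the StateLinks to be used as new features go here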

Note that, depending on the configuration of the NN server, it might be the case that

  • adding features to one RegionNode adds the same features to all RegionNodes on that layer
  • adding features to one RegionNode adds the same features to RegionNodes on various layers, where the NN server decides which layers are able to use the features
  • etc.