Deep Learning in OpenCog: Proposal
Deep Learning for Vision in OpenCog (Design Notes/Suggestions)
This page is somewhat obsolete. See Interfacing Between OpenCog and Deep Neural Nets for Vision Processing for newer ideas.
This rough draft document gives some slightly more detailed ideas about how one might do deep learning for vision in OpenCog. (first version Ben Goertzel: Dec. 3, 2015 ...)
The core approach is to use OpenCog as a toolkit for connecting together nodes corresponding to different spatial or spatiotemporal regions, and for invoking lower-level (e.g. neural network) processes associated with these nodes (where the lower-level processes are then executed outside of OpenCog). So in a sense one is using OpenCog similarly to how one uses Theano or TensorFlow (though in fact one is using it as an additional wrapper around one of these toolkits, by making the GroundedSchemaNodes (GSNs) that OpenCog invokes use one of these toolkits, most likely TensorFlow initially). But one is also gaining more than one gains from a toolkit like Theano or TensorFlow, by representing one’s initial, final and intermediate knowledge structures in a general knowledge representation such as the Atomspace. This opens up many possibilities for future perception-cognition integration, and generally gives one great flexibility regarding how to do things.
NOTE: these are just ideas, there may be better ways to do these things, and improvisation and creative ideas are encouraged on the part of anyone working on this…
Some Preliminary Thoughts on Design/Architecture
See this page of rough notes on Outsourcing Array Operations From OpenCog
Some Design Ideas …
See Outsourcing Array Operations From OpenCog for ideas on defining and implementing ArrayNodes for representing multidimensional arrays in the Atomspace.
A RegionNode should correspond to a certain region within a 3DspaceMap (though it could be a 2D or 3D region). Initially we only need to support rectangular regions, but we don’t want to wire this assumption too deeply into the infrastructure; we want people to be able to play with regions of other shapes as well, later on.
Question: What is the best way to implement RegionNode and link it to the 3DspaceMap?
Array processor schema
This is my informal term for a GroundedSchemaNode whose input is an ArrayNode, and whose output is an ArrayNode.
Such a schema could refer to a denoising autoencoder that takes in one array and outputs another, so we might have
ExecutionOutputLink
    GroundedSchemaNode “DenoisingAutoencoder_345.py”
    ArrayNode “3456”
(which would produce an ArrayNode ).
Array processor training schema
This is my informal term for a GroundedSchemaNode whose input is a GroundedSchemaNode plus an ArrayNode (or a list thereof), and whose action is to create a new GSN that is "trained" based on the ArrayNodes. See the example at the end of Outsourcing Array Operations From OpenCog.
One can also have schema for updating trained models. For instance, we might have
ExecutionOutputLink
    GroundedSchemaNode “UpdateDenoisingAutoencoder.py”
    ListLink
        GroundedSchemaNode “DenoisingAutoencoder_345.py”
        ArrayNode “3456”
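To make the pairing of an array processor schema with its update schema more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the proposed implementation: it swaps the denoising autoencoder for online k-means (mentioned later in these notes as another possible choice) to keep the code short, it uses plain Python lists in place of ArrayNodes, and the names OnlineKMeans, process and update are hypothetical rather than part of any OpenCog API.

```python
import math


class OnlineKMeans:
    """Hypothetical stand-in for an array processor and its update schema."""

    def __init__(self, centroids):
        # Copy the initial centroids so the caller's lists are not modified.
        self.centroids = [list(c) for c in centroids]
        self.counts = [0] * len(self.centroids)

    def _distances(self, x):
        # Euclidean distance from x to every centroid.
        return [math.dist(x, c) for c in self.centroids]

    def process(self, x):
        """Array in, array out: closeness of x to each centroid, in (0, 1]."""
        return [1.0 / (1.0 + d) for d in self._distances(x)]

    def update(self, x):
        """One incremental training step on a single input array."""
        distances = self._distances(x)
        k = distances.index(min(distances))
        self.counts[k] += 1
        rate = 1.0 / self.counts[k]  # running-mean learning rate
        self.centroids[k] = [
            c + rate * (xi - c) for c, xi in zip(self.centroids[k], x)
        ]
        return k
```

In this picture, process plays the role of “DenoisingAutoencoder_345.py” and update plays the role of “UpdateDenoisingAutoencoder.py”. A real GSN would read its input from, and write its output to, ArrayNodes.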
What I mean by tensor normalization is the process that takes in an array with nonnegative real entries (not all zero), and outputs an array whose entries sum to 1. E.g.
ExecutionOutputLink
    GroundedSchemaNode “NormalizeArray.py”
    ArrayNode “3456”
In a normalized array, each entry may be interpreted as a probability (though the particular interpretation depends on the context in which the array is constructed).
This is useful for getting PredicateNodes out of arrays.
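As a small illustration, the normalization step could look like the following Python sketch. It assumes a flat list of numbers stands in for the contents of an ArrayNode; the function name normalize_array is hypothetical and is not claimed to match what “NormalizeArray.py” would contain.

```python
def normalize_array(values):
    """Scale nonnegative entries so that they sum to 1."""
    if any(v < 0 for v in values):
        raise ValueError("entries must be nonnegative")
    total = sum(values)
    if total == 0:
        # An all-zero array has no well-defined normalization.
        raise ValueError("at least one entry must be positive")
    return [v / total for v in values]
```

For example, normalize_array([1.0, 1.0, 2.0]) returns [0.25, 0.25, 0.5].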
A spatiotemporal network is a set of RegionNodes connected by RegionChildLinks and RegionNeighborLinks. So e.g. we might say
RegionChildLink
    RegionNode “666”
    RegionNode “777”
meaning that region “777” is a child of the region “666”. The child and neighbor links specify the topology of the network.
For instance, in a 2D image processing application, where the image is divided into square sub-regions on multiple layers (e.g. the typical DeSTIN setup), the RegionNodes on all but the bottom layer would have 4 children (connected via ChildLinks), and the RegionNodes except for those on the edges would have 4 neighbors (connected via NeighborLinks).
A spatiotemporal network may be considered divided into layers, where
- Layer 1 = RegionNodes with no children
- Layer 2 = RegionNodes with only Layer 1 nodes as children
- Layer 3 = RegionNodes with only Layer 1 and Layer 2 nodes as children
(Note that if we wanted to have multiple different spatiotemporal networks involving the same spatiotemporal regions, in the above approach we would need to construct multiple RegionNodes corresponding to the same spatiotemporal regions. This doesn’t seem a big problem at the moment, so I think the above representation is OK for now.)
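The layer of a region can be derived from the child links alone. The sketch below assumes the convention that Layer 1 holds the childless regions and that every other region sits one layer above its highest child. The dictionary children, mapping a region name to the names of its children, is a hypothetical stand-in for the RegionChildLinks in the Atomspace.

```python
def layer_of(region, children):
    """Return the 1-based layer of `region` given a child-link mapping."""
    kids = children.get(region, [])
    if not kids:
        return 1  # no children: bottom layer
    # One layer above the highest-layer child.
    return 1 + max(layer_of(kid, children) for kid in kids)


# Region "666" has children "777" and "888"; region "999" is the parent of "666".
children = {"666": ["777", "888"], "999": ["666"]}
layers = {name: layer_of(name, children) for name in ["777", "888", "666", "999"]}
```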
There are many different ways to formally specify a spatiotemporal network. One approach would be to create SetLinks corresponding to the layers of a network, so that e.g. one could say
EvaluationLink
    PredicateNode “Layer”
    ListLink
        SpatiotemporalNetworkNode “1234”
        NumberNode “3”
        SetLink
            RegionNode “1234”
            RegionNode “555”
            …
to specify the 3rd layer of a certain network (where the SetLink lists all the regions in the 3rd layer), and
EvaluationLink
    PredicateNode “depth”
    ListLink
        SpatiotemporalNetworkNode “1234”
        NumberNode “5”
to specify that the network in question has 5 layers.
Simple Deep Spatiotemporal Learning Schemes
A “simple spatiotemporal deep learning scheme” (or “deep learning scheme” for short) consists of a Spatiotemporal Network,
- all of whose included RegionNodes use the same kind of array processor schema and array processor training schema (e.g. they all use denoising autoencoders, or they all use k-means clustering, or whatever)
- all of whose included RegionNodes use the same input-gathering schema, with obvious changes for the bottom and top layers as needed
- each of whose included RegionNodes is associated with a single GSN denoting the array processor schema for that node (e.g. RegionNode “456” might be associated with GroundedSchemaNode “DenoisingAutoencoder_456.py”)
An input-gathering schema is a GroundedSchemaNode whose input is a RegionNode contained in a SpatiotemporalNetwork, whose job is to build an ArrayNode for input to the array processor schema and array processor training schema corresponding to that RegionNode. E.g.
ExecutionOutputLink
    GroundedSchemaNode “BottomUpInformationGatherer.py”
    RegionNode “3456”
would produce an ArrayNode to be used as input for the array processor schema and array processor training schema associated with RegionNode “3456”.
So, formally, we could define e.g. the “simple 2D DeSTIN” deep learning scheme via
ConceptNode “Simple 2D DeSTIN”
EvaluationLink
    PredicateNode “input gatherer”
    ListLink
        ConceptNode “Simple 2D DeSTIN”
        GroundedSchemaNode “BottomUpInformationGatherer.py”
EvaluationLink
    PredicateNode “array processor schema”
    ListLink
        ConceptNode “Simple 2D DeSTIN”
        GroundedSchemaNode “makeDenoisingAutoencoder.py”
EvaluationLink
    PredicateNode “array update schema”
    ListLink
        ConceptNode “Simple 2D DeSTIN”
        GroundedSchemaNode “updateDenoisingAutoencoder.py”
Here ConceptNode “Simple 2D DeSTIN” simply serves as a hook to find the various dynamical functions associated with the “simple 2D DeSTIN” learning scheme.
The GSN GroundedSchemaNode “makeDenoisingAutoencoder.py”, when run on RegionNode “345”, would create the GSN “DenoisingAutoencoder_345.py” as output and link it to the RegionNode, i.e.
ExecutionLink
    GroundedSchemaNode “makeDenoisingAutoencoder.py”
    RegionNode “345”
    EvaluationLink
        PredicateNode “array processor schema”
        ListLink
            RegionNode “345”
            GroundedSchemaNode “DenoisingAutoencoder_345.py”
Note that the use of “_345” (i.e. of the RegionNode’s name) in the name of the GSN is for human convenience only. For internal processing, the important thing is the EvaluationLink joining this GSN with the RegionNode.
One can also define the “output array” of a RegionNode in a Spatiotemporal Network as the array resulting from applying an input-gathering schema and then applying an array processor schema, e.g.
DefinedSchemaNode “getOutputArray”
    RegionNode $R
    ExecutionOutputLink
        DefinedSchemaNode “get array processor schema”
            RegionNode $R
        ExecutionOutputLink
            GroundedSchemaNode “getInputArray”
            RegionNode $R
Conceptually, the learning associated with a deep learning scheme may be executed as follows, for a spatiotemporal network with N layers:
Repeat until bored…
    For L = 1 to N (iterate through layers)
        Iterate through RegionNodes R on layer L (or process them in parallel if possible)
            Apply the input gatherer, to gather the input tensor for R
            If no array processor exists for R, create one (using the schema associated with the network)
            If in training mode, run the array processor training schema on R
            Run the array processor schema for R
This can be written as a DefinedSchemaNode, of the general nature
DefinedSchemaNode “iterate deep learning scheme”
    ListLink
        ConceptNode $C \\ e.g. $C = “Simple 2D DeSTIN”
        SpatiotemporalNetwork $S
    SequentialAndLink
        \\ apply the above Repeat Until Bored loop
        \\ using the spatiotemporal network $S and
        \\ the schema associated with $C
        \\ (note we need to use tail recursion here)
in such a way that the same DefinedSchemaNode can be used even if one substitutes various different GSNs for the input gatherer, the array processor schema and the array processor training schema, and even if one switches from 2D to 3D, etc.
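The “repeat until bored” loop can also be sketched in plain Python, which may help when prototyping the GSNs before wiring them into the Atomspace. Everything here is an assumption made for illustration: the function iterate_scheme, its parameters, the fixed number of sweeps standing in for “until bored”, and the toy RunningMean processor, which is not a denoising autoencoder.

```python
def iterate_scheme(layers, gather, make_processor, processors, outputs,
                   training=True, sweeps=1):
    """Run `sweeps` bottom-up passes over a layered network.

    layers: list of lists of region names, bottom layer first.
    gather: callable(region, outputs) -> input array for that region.
    make_processor: callable(region, x) -> object with update(x), process(x).
    processors, outputs: dicts keyed by region name, filled in place.
    """
    for _ in range(sweeps):                  # "repeat until bored"
        for layer in layers:                 # L = 1 .. N
            for region in layer:             # could be done in parallel
                x = gather(region, outputs)
                if region not in processors:
                    processors[region] = make_processor(region, x)
                if training:
                    processors[region].update(x)
                outputs[region] = processors[region].process(x)
    return outputs


class RunningMean:
    """Toy incremental processor: outputs the input minus its running mean."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.n = 0

    def update(self, x):
        self.n += 1
        self.mean = [m + (xi - m) / self.n for m, xi in zip(self.mean, x)]

    def process(self, x):
        return [xi - m for xi, m in zip(x, self.mean)]
```

Because the gatherer, the processor factory and the network layout are all passed in as arguments, the same loop can be reused when any of them is swapped out, which is the property asked of the DefinedSchemaNode above.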
If the gathering of the input array includes parent as well as child information, then we have top-down feedback going on, meaning that running the top-level iteration over and over again may keep yielding different results.
Note that the learning here is totally localized, i.e. the learning occurs only locally within each RegionNode. This can only yield good results if the top-down feedback works well, i.e. if the input-gathering schema includes parent and neighbor information and this information is usefully incorporated in learning.
How can information get read out of one of these networks, for use e.g. by reasoning algorithms? One may construct predicates from normalized tensors, e.g.
DefinedPredicateLink “get output property value”
    ListLink
        RegionNode $R
        NumberNode $N
    ExecutionOutputLink
        GroundedSchemaNode “getArrayElement”
        ListLink
            ExecutionOutputLink
                GroundedSchemaNode “getOutputTensor.py”
                RegionNode $R
            NumberNode $N
would produce a probability value corresponding to the $N’th entry of the output tensor from the RegionNode $R.
So the logical output of a deep network, in this framework, is obtained from the predicatized, normalized versions of the output ArrayNodes of the various RegionNodes. Properties are of the form “the k’th property of the output tensor of the RegionNode R”, meaning “the k’th property that has been perceived/cognized to be currently true, as regards the region corresponding to R.”
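A rough Python analogue of this read-out is shown below. It assumes region outputs are kept in a dictionary of flat lists, and it uses a 0-based index for $N, which the notes above do not specify. The function name output_property_value is hypothetical.

```python
def output_property_value(outputs, region, n):
    """Probability-like value of the n'th output entry of `region`."""
    arr = outputs[region]
    if any(v < 0 for v in arr):
        raise ValueError("output entries must be nonnegative")
    total = sum(arr)
    if total == 0:
        raise ValueError("output array must not be all zero")
    # Entry n of the normalized output array.
    return arr[n] / total
```

The returned number could then serve as the strength of the truth value attached to the corresponding predicate.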
A Simple Initial Example
As an illustrative example and first case of the above approach, it might be a good idea to make a 2D unsupervised DeSTIN-style stacked autoencoder network using the above ideas. Suppose one began with a 32x32, 2D gray-scale image. Then one would have a simple SpaceMap, effectively 2D, divided into 32x32 little squares. Input images would be streamed into this SpaceMap.
One could have
- Layer 1 = 4x4 squares, i.e. 64 RegionNodes here
- Layer 2 = 8x8 squares, i.e. 16 RegionNodes here
- Layer 3 = 16x16 squares, i.e. 4 RegionNodes here
- Layer 4 = 1 RegionNode corresponding to the whole SpaceMap
ChildLinks would be built from a RegionNode R on layer k+1 to the RegionNodes on layer k representing R’s sub-regions.
NeighborLinks can be ignored for this simple example.
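The layer sizes and child links for this example can be checked with a short Python sketch. It assumes each region is identified by a hypothetical (square_size, row, col) tuple and that the image side is the base square size times a power of two. Nothing here touches the SpaceMap or the Atomspace.

```python
def build_quadtree(image_size=32, base=4):
    """Return (layers, children) for a square image split into nested squares.

    layers[k] lists the regions whose squares have side base * 2**k.
    children maps each non-bottom region to its four sub-regions.
    """
    layers = []
    children = {}
    size = base
    while size <= image_size:
        n = image_size // size  # regions per side on this layer
        layer = [(size, r, c) for r in range(n) for c in range(n)]
        layers.append(layer)
        if size > base:
            half = size // 2
            for (_, r, c) in layer:
                children[(size, r, c)] = [
                    (half, 2 * r + dr, 2 * c + dc)
                    for dr in (0, 1) for dc in (0, 1)
                ]
        size *= 2
    return layers, children


layers, children = build_quadtree()
layer_sizes = [len(layer) for layer in layers]  # 64, 16, 4, 1 regions
```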
The input of a node on Layer 1 consists of a 16-dim real vector built from its pixels’ gray-scale values (a vector being a 1D tensor…).
The input of a node on Layer k, for k>1, consists of a vector formed by concatenating the output vectors of its children.
The array processor schema associated with a RegionNode applies a denoising autoencoder that has been trained on the input to that RegionNode.
The array processor training schema associated with a RegionNode does incremental (online) updating of the denoising autoencoder associated with the RegionNode.
NOTE: If no convenient code is available for incremental training of denoising autoencoders, then we can use any other learning scheme here. It’s important that the learning be incremental rather than batch-mode, though; batch learning is not suitable for an AGI system.
The system suggested above includes no feedback, only feedforward processing. To include feedback in the simplest way, the parent RegionNode’s output could be included as part of the child node’s input.
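The input rules for this example, together with the simple feedback option, might be sketched in Python as follows. This is an illustration under assumptions. The function gather_input and its parameters are hypothetical, pixel values and outputs are held in plain dictionaries, and a missing parent output (as on the first sweep) is padded with zeros so that the input length stays fixed.

```python
def gather_input(region, children, outputs, pixels,
                 parent_of=None, parent_dim=0):
    """Build the input vector for `region`.

    Bottom-layer regions use their pixel values; other regions use the
    concatenated outputs of their children. If `parent_of` is given, the
    parent's latest output is appended as top-down feedback.
    """
    kids = children.get(region, [])
    if kids:
        gathered = [v for kid in kids for v in outputs[kid]]
    else:
        gathered = list(pixels[region])
    if parent_of is not None and region in parent_of:
        parent_out = outputs.get(parent_of[region])
        if parent_out is None:
            # Assumption: pad with zeros until the parent has produced output.
            parent_out = [0.0] * parent_dim
        gathered.extend(parent_out)
    return gathered
```

With feedback enabled, the parent’s output changes the child’s input on the next sweep, so repeated sweeps over the network can keep yielding different results.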