Deep Learning Perception in OpenCog
This page (initially written by Ben Goertzel in early November 2015) makes some suggestions regarding how one could usefully implement modern "deep learning perception" algorithms/structures in OpenCog, so as to make them work better via interfacing them with OpenCog's cognitive algorithms.
For a few notes by way of general background, see this OpenCog Brainwave blog post.
What Kind of Deep Network?
Firstly, what kind of network am I thinking of implementing inside OpenCog?
This is open to discussion and revision, but my current thinking, for the vision use-case, is to implement
- a stacked convolutional autoencoder, similar to the “convae-destin” network that Tejas Khot (working closely with Yuhuang Hu) implemented for OpenCog GSoC 2015. We would like to try this in either 2D or 3D.
- a variation where the denoising autoencoder is replaced with an LSTM network
- a variation (with either autoencoders or LSTMs) where the receptive field corresponding to a node on layer N takes input from the layers below and above N (and from layer N itself). This is a way of getting DeSTIN-style “parental advice” that I suggested during GSoC 2015, but nobody had time to implement…
We want flexibility to make networks with different topologies, and with different learning algorithms and different sorts of NNs (or other systems) associated with the internal nodes. This flexibility will let us use the same framework for processing auditory data, or any other kind of data with hierarchical structure.
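As a concrete (and deliberately tiny) illustration of the kind of building block to be stacked, here is a minimal numpy sketch of a single tied-weight denoising autoencoder layer. The class name, hyperparameters and toy data are all hypothetical; a real implementation would be convolutional and GPU-backed, as discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One tied-weight denoising autoencoder layer (hypothetical sketch)."""
    def __init__(self, n_in, n_hidden, noise=0.3, lr=0.01):
        self.W = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_v = np.zeros(n_in)
        self.noise, self.lr = noise, lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, x):
        # Corrupt the input, reconstruct, take one gradient step on MSE
        x_noisy = x * (rng.random(x.shape) > self.noise)
        h = self.encode(x_noisy)
        x_rec = self.decode(h)
        err = x_rec - x                        # d(MSE)/d(x_rec), up to a constant
        grad_v = err * x_rec * (1 - x_rec)     # backprop through decoder sigmoid
        grad_h = (grad_v @ self.W) * h * (1 - h)
        self.W -= self.lr * (x_noisy.T @ grad_h + grad_v.T @ h)  # tied weights
        self.b_h -= self.lr * grad_h.sum(axis=0)
        self.b_v -= self.lr * grad_v.sum(axis=0)
        return float(np.mean(err ** 2))

# Train on toy data; "stacking" means feeding encode() output to the next layer
x = rng.random((32, 16))
dae = DenoisingAutoencoder(16, 8)
losses = [dae.train_step(x) for _ in range(500)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Stacking such layers, and swapping the autoencoder for an LSTM at each node, is what gives the variations listed above.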
Implementation in the Atomspace
How to implement this (or some other) deep learning architecture in the Atomspace? Several strategies are possible; here I will summarize the one that seems easiest to me as a starting point.
In this design I am inspired at a high level by pylearn2 — note that pylearn2 lets one specify a deep network architecture in python, but in such a way that the “heavy lifting” is done in CUDA on a GPU, invoked “automagically” via calls to appropriate python functions. I suggest we want to follow broadly similar strategies in OpenCog. Architect the network in Atomese, pass information around the network in Atomese, but do the hard float-array-crunching work on GPUs, using functions invoked from within the Atomspace.
(In fact I am going to suggest shortly that we actually use some of the pylearn2 infrastructure inside OpenCog; but even if this design isn’t followed, pylearn2 will still have value as a general inspiration…)
A particular deep network could be a particular Atomspace — e.g. a robot could have an Atomspace containing its vision-processing deep network, and another Atomspace containing its audition-processing deep network. The Atoms within these could, of course, link to each other and to Atoms in other Atomspaces handling cognition.
In a DeSTIN-type deep learning architecture, we have nodes on a number of layers, and the nodes in each layer are typically arranged in a square (for 2D) or cubical (for 3D) array. Each node has a parent on the layer above, neighbors on its own layer, and children on the layer below. Each of these nodes could be represented in OpenCog by an Atom, say of a new type “HierarchicalLearningNode” (HLNode).
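The node-and-link arrangement just described can be sketched in plain Python (a hypothetical stand-in for HLNode Atoms with HLChildLink / HLNeighborLink relations; the class names and the 2x2 receptive-field choice are assumptions, not fixed design decisions):

```python
# Hypothetical sketch: a 2D pyramid of HLNodes, each layer halving the
# grid side, with child/parent/neighbor relations stored as plain sets.

class HLNode:
    def __init__(self, layer, row, col):
        self.layer, self.row, self.col = layer, row, col
        self.children, self.neighbors = set(), set()
        self.parent = None

def build_pyramid(base_side, n_layers):
    layers = []
    side = base_side
    for ell in range(n_layers):
        grid = [[HLNode(ell, r, c) for c in range(side)] for r in range(side)]
        # HLNeighborLink analogue: 4-connected neighbors on the same layer
        for r in range(side):
            for c in range(side):
                for dr, dc in ((1, 0), (0, 1)):
                    if r + dr < side and c + dc < side:
                        grid[r][c].neighbors.add(grid[r + dr][c + dc])
                        grid[r + dr][c + dc].neighbors.add(grid[r][c])
        # HLChildLink analogue: each node covers a 2x2 patch of the layer below
        if layers:
            below = layers[-1]
            for r in range(side):
                for c in range(side):
                    for dr in (0, 1):
                        for dc in (0, 1):
                            child = below[2 * r + dr][2 * c + dc]
                            grid[r][c].children.add(child)
                            child.parent = grid[r][c]
        layers.append(grid)
        side //= 2
    return layers

layers = build_pyramid(8, 3)            # 8x8 -> 4x4 -> 2x2
print(len(layers[1][0][0].children))    # 4: a 2x2 receptive field
```

In the Atomspace itself these relations would of course be Links rather than Python sets, so that cognitive processes could inspect and traverse them like any other Atoms.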
A few notes on particulars:
- The truth value strength of an HLNode does not seem useful, but the confidence may be useful (as a measure of how much evidence the HLNode’s output is based on).
- The AttentionValues (specifically, STI values) of these nodes should be useful. STI values of HLNodes may be used to guide how much attention gets paid to that node during perception processing.
- HLChildLink and HLNeighborLink could be used to link HLNodes, forming a deep learning network with flexible topology.
- In many cases, an HLNode would refer to a specific region of observed spacetime (or just space in the case of 2D vision processing, or just time in the case of nonlocalized sound processing). In this case, the HLNode would be linked by atTime and atLocation links indicating the temporal and spatial intervals it refers to.
An HLNode would also generally have (at least) two GroundedSchemaNodes associated with it, e.g.
ExecutionOutputLink
    GroundedSchemaNode “runTrainedAutoencoder.py”
    HLNode $H
would run the trained autoencoder network created for $H on the output of $H’s children (or, for HLNodes on the lowest level of the network, on an appropriate subset of the raw data). The schema would then put the output in a FloatArrayNode linked to $H, e.g.
EvaluationLink
    PredicateNode “AutoencoderOutput”
    HLNode $H
    FloatArrayNode $F
ExecutionOutputLink
    GroundedSchemaNode “trainAutoencoder.py”
    HLNode $H
would train an autoencoder network corresponding to $H.
Behind the scenes, we would need a table storing trained autoencoders (or other networks, as appropriate) as indexed by HLNodes.
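A minimal sketch of that table, with hypothetical Python stand-ins for the two schemata. Here “training” merely stores per-feature means and “running” subtracts them, as placeholders; a real schema would fit and apply an actual autoencoder (e.g. via Theano, on a GPU).

```python
import numpy as np

# Hypothetical behind-the-scenes table: trained networks indexed by HLNode.
trained_networks = {}

def train_autoencoder(hl_node_id, child_outputs):
    """Stand-in for the 'trainAutoencoder.py' schema: fit and store params.

    Here 'training' just records per-feature means; a real schema would
    train a (convolutional) autoencoder and store its weights.
    """
    trained_networks[hl_node_id] = child_outputs.mean(axis=0)

def run_trained_autoencoder(hl_node_id, child_outputs):
    """Stand-in for 'runTrainedAutoencoder.py': apply the stored network.

    The returned float array is what would be written into the
    FloatArrayNode linked to the HLNode via the AutoencoderOutput predicate.
    """
    params = trained_networks[hl_node_id]
    return child_outputs - params    # placeholder 'encoding'

rng = np.random.default_rng(1)
data = rng.random((10, 4))           # stacked outputs of $H's children
train_autoencoder("HLNode-42", data)
out = run_trained_autoencoder("HLNode-42", data)
print(out.shape)    # (10, 4)
```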
The actual python functions training and running autoencoders (or other networks) could make use of the Theano library (also used by pylearn2), which takes python multiarray functions and efficiently executes them on a GPU. So in this way the “hard work” is done on the GPU, but the network is defined in OpenCog, and the results of applying the trained neural nets to data are represented in OpenCog.
Using Theano here, rather than rolling our own library for mapping operations onto the GPU, has the advantage of allying our efforts with an already active open-source effort (Theano and pylearn2 are under active development and maintenance by Yoshua Bengio’s lab at U. Montreal).
Note that in this proposed approach, the actual contents of the trained autoencoders, LSTMs or other networks are not made available for inspection by OpenCog cognitive processes. This is not ideal for the long term but seems fine for the next stage of experimentation. When we are ready, it wouldn’t be hard to create functions for expanding these neural nets into Atom form, though we still might want to have them exist in another “compiled or whatever” form for rapid evaluation.
How would pattern mining work here? One approach would be to maintain a separate Atomspace as an “HL Network state history.” When a new “Autoencoder Output” (for example) is recorded for an HLNode, the old one is not just deleted, it’s moved to the HL History Atomspace, with an appropriate time-stamp. In the HL History Atomspace, the FloatArrayNodes are expanded into sets of EvaluationLinks representing properties (one property per coordinate of each array). The HL History Atomspace then has the right format for running the Pattern Miner.
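The archiving step might look like this minimal Python sketch (the record fields and names are hypothetical; in OpenCog proper these would be EvaluationLinks with time-stamps living in the HL History Atomspace):

```python
# Hypothetical sketch of archiving an HLNode's old output into the
# "HL History Atomspace": the float array is expanded into one
# (predicate, node, coordinate, value, timestamp) record per coordinate,
# i.e. the flat one-property-per-coordinate format the Pattern Miner needs.

def archive_output(history, hl_node_id, float_array, timestamp):
    for i, value in enumerate(float_array):
        history.append({
            "predicate": "AutoencoderOutput",
            "node": hl_node_id,
            "coordinate": i,       # one property per array coordinate
            "value": value,
            "at_time": timestamp,
        })

history = []
archive_output(history, "HLNode-7", [0.1, 0.9, 0.4], timestamp=1000)
print(len(history))    # 3 records, one per coordinate
```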
A pattern recognized in this way could then be automatically transformed into a pattern recognizable in a network of HLNodes and FloatArrayNodes. So then as the state of the network of HLNodes changes, as new perceptions come in, the library of “common and surprising perceptual patterns” can be applied to the network, and patterns identified as currently present can be linked to the appropriate HLNode. (Linas will note that this is “reverse pattern matching” of the same sort used in AIML or Sureal…)
And how would cognitive feedback guide the deep learning network here? By appending extra features, derived via inference (e.g. PLN or other methods) or from long-term conceptual memory (recognition, in the current network state, of familiar patterns extracted by the pattern miner), as additional entries on the FloatArrayNodes used as inputs to the HLNodes’ NN-training schema.
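A toy sketch of that feature-appending step, assuming plain Python lists in place of FloatArrayNodes (names and feature values are purely illustrative):

```python
# Hypothetical sketch of cognitive feedback: features derived by inference
# or pattern recognition are appended to the perceptual input array before
# it is fed to an HLNode's training schema, so cognition can bias perception.

def with_feedback(raw_input, cognitive_features):
    """Concatenate inference-derived features onto the perceptual input."""
    return list(raw_input) + list(cognitive_features)

raw = [0.2, 0.7, 0.5]        # e.g. child-node outputs
feedback = [1.0, 0.0]        # e.g. "familiar pattern currently present"
x = with_feedback(raw, feedback)
print(len(x))    # 5
```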