New DeSTIN Redesign Proposal


This page makes some speculative suggestions regarding a possible major redesign/reimplementation of the DeSTIN approach to machine perception.

Initially written June 18, 2014 by Ben Goertzel.

Goals

  • To create a flexible, easily tweakable framework for implementing, exploring and testing "deep learning" based perception systems
  • To implement a variant of DeSTIN within this framework (perhaps a variant combining aspects of the current DeSTIN with aspects of other deep learning algorithms)
  • Goals for this DeSTIN variant are:
    • Performance on standard test problems that is not dramatically worse than the state of the art
    • Export, over any brief interval of time, of discrete symbols indicating what the system recognizes in each region of its input (e.g. in DeSTIN for image processing, the centroids identified as the best match in each patch). This enables easy interfacing with symbolic systems like OpenCog.
  • Initially to test this DeSTIN variant on image classification; and then on video classification
  • Then to test the DeSTIN variant on speech data, and on other time series data (e.g. EEG)
  • To exploit GPU processing inasmuch as is possible without making the system difficult to experiment with


Implementation suggestion

I suggest using Python as the implementation language, and making use of the Theano library developed by Yoshua Bengio's deep learning group.

Motivations:

  • Theano allows judicious usage of the GPU for multidimensional array manipulation, thus providing much of the advantage of GPUs without needing to code everything in CUDA or similar (see the brief sketch after this list)
  • OpenCog works nicely with Python already, allowing MindAgents (cognitive processes) to be written in Python as well as in C++
  • Python code is relatively easy to experiment with, compared to C, C++ or CUDA. Having a DeSTIN version in Python should avoid the problem of developers wanting to experiment with algorithm variants in MATLAB and then needing to port them back to the main codebase
  • As a bonus, taking this approach would potentially attract collaboration from Bengio's students and others in his software ecosystem
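To make the first point concrete, here is a minimal sketch (my own illustration, not existing project code) of how Theano expresses an array computation as a symbolic graph that it can compile for CPU or GPU, with the device selected by configuration rather than by hand-written CUDA:

 # Minimal Theano sketch: build a symbolic expression, compile it, run it.
 import numpy as np
 import theano
 import theano.tensor as T
 
 x = T.matrix('x')                      # symbolic 2D input array
 W = theano.shared(np.random.randn(64, 64).astype(theano.config.floatX), name='W')
 y = T.nnet.sigmoid(T.dot(x, W))        # matrix ops are dispatched to the GPU if one is configured
 
 f = theano.function([x], y)            # compiled function; no CUDA code on our side
 print(f(np.ones((8, 64), dtype=theano.config.floatX)).shape)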

Speculative algorithm suggestion (DeSTIN-CNN)

One suggestion for a DeSTIN variant to try, for image processing, is as follows.

As a first step, I would try a layered network that alternates two kinds of layers: convolutional layers and k-means clustering layers.

That is, recall that the standard LeNet architecture as described in

http://deeplearning.net/tutorial/lenet.html

uses alternating convolutional and max-pooling layers…. I would replace the max-pooling layers with k-means clustering layers (that identify the cluster characterizing each local patch, based on a dictionary of cluster centroids that is common across a whole layer).
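For concreteness, here is a rough sketch (my own illustration, not code from the LeNet tutorial or from DeSTIN) of what such a k-means clustering layer could compute, with a centroid dictionary shared across the whole layer:

 # Illustrative sketch: a "k-means pooling" step that assigns each local patch of a
 # convolutional feature map to its nearest centroid from a layer-wide dictionary,
 # yielding one discrete symbol per patch.
 import numpy as np
 
 def kmeans_pool(feature_map, centroids, patch=2):
     """feature_map: (H, W, C) activations; centroids: (K, patch*patch*C) shared dictionary.
     Returns an (H//patch, W//patch) grid of centroid indices (the per-patch "symbols")."""
     H, W, C = feature_map.shape
     out = np.empty((H // patch, W // patch), dtype=np.int64)
     for i in range(0, H - patch + 1, patch):
         for j in range(0, W - patch + 1, patch):
             v = feature_map[i:i+patch, j:j+patch, :].reshape(-1)  # flatten the local patch
             d = np.sum((centroids - v) ** 2, axis=1)              # squared distance to each centroid
             out[i // patch, j // patch] = int(np.argmin(d))       # winning centroid = discrete symbol
     return out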

As it happens, Gong et al

http://arxiv.org/pdf/1403.1840.pdf

have implemented a variant of this idea already, with good results.

(note the paper relies heavily on methods from http://hal.archives-ouvertes.fr/docs/00/63/30/13/PDF/jegou_aggregate.pdf as well...)

Now, the Gong et al paper does not do everything DeSTIN does by any means. For instance, it lacks DeSTIN-style top-down feedback.

However, one could put DeSTIN-style top-down feedback into the Gong style architecture perfectly well, simply by taking the state of a higher k-means level and feeding it as "advice" into the vectors being clustered in the lower k-means level...
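A rough sketch of that "advice" idea follows (this is my reading of the proposal, not code from DeSTIN or from Gong et al.): before clustering a patch vector at level n, append a summary of the parent node's state at level n+1, so that the lower-level clustering is biased by higher-level context.

 # Append a scaled one-hot encoding of the parent's winning centroid to the child vector.
 import numpy as np
 
 def with_parent_advice(child_vec, parent_symbol, n_parent_centroids, advice_weight=0.5):
     advice = np.zeros(n_parent_centroids)
     advice[parent_symbol] = advice_weight        # parent state encoded as scaled one-hot
     return np.concatenate([child_vec, advice])   # this augmented vector is what gets clustered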

The main technical problem with putting top-down feedback into a CNN-based system is that one can no longer use plain vanilla backpropagation for learning (as is done in typical CNNs). One would need to use recurrent backpropagation, which is more complicated and much slower, perhaps prohibitively slow for large networks. However, it is possible to use evolutionary learning to learn the weights in CNNs

http://sprocom.cooper.edu/sprocom2/pubs/masters_thesis/cheung2011thesis.pdf

and I suspect that is the best option here.
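As a very small sketch of the evolutionary alternative to backprop, consider a plain (mu, lambda)-style evolution strategy over a flattened weight vector. This is a generic illustration, not the specific method of the cited thesis; the fitness function (e.g. classification accuracy of the CNN with those weights) is assumed to be supplied by the caller.

 import numpy as np
 
 def evolve_weights(fitness, n_weights, pop_size=50, n_parents=10,
                    sigma=0.05, generations=100, rng=np.random):
     pop = [rng.randn(n_weights) * 0.1 for _ in range(pop_size)]
     for _ in range(generations):
         ranked = sorted(pop, key=fitness, reverse=True)       # higher fitness is better
         parents = ranked[:n_parents]
         pop = [p + sigma * rng.randn(n_weights)               # mutate copies of the best
                for p in parents for _ in range(pop_size // n_parents)]
     return max(pop, key=fitness)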

Finally, one could handle time in this sort of architecture similarly to what they do in

http://research.microsoft.com/pubs/200804/CNN-Interspeech2013_pub.pdf

(in that paper they are analyzing speech so they're using 2D time/frequency inputs. But it seems one could just as well use 3D image/time or 4D image/depth/time inputs.... The convolution layer of a network doesn't really care about the dimensionality of the inputs...)
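A toy illustration of the dimensionality point (using scipy's generic N-dimensional convolution purely for brevity): the same convolution call applies to a 2D image, a 3D image/time volume, or a 4D image/depth/time volume; only the kernel's rank changes.

 import numpy as np
 from scipy.ndimage import convolve
 
 video = np.random.rand(64, 64, 16)       # x, y, time
 kernel_3d = np.random.rand(5, 5, 3)      # 5x5 spatial extent, temporal extent 3
 features = convolve(video, kernel_3d)    # the same call would accept 2D or 4D arrays
 print(features.shape)                    # (64, 64, 16)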

From an OpenCog perspective, an important thing about this sort of approach (or any DeSTIN-like approach) is that the k-means clusters, at each level, provide some sort of "symbolic" indication of what's happening in each patch on each level. This is lacking in CNNs in their traditional form. Also it would be possible to share cluster centroids among different levels, though one would need to take a different approach than we're taking in DeSTIN now....

Overall, the high-level idea here is to take CNNs and "DeSTIN-ize" them…. The main conceptual difference from DeSTIN is that the CNN weights are learned globally rather than locally. This moves away from a narrowly brainlike learning mechanism. However, it is a reasonably efficient approach on current computers.

Software Design Suggestions

So as to be able to experiment flexibly with different approaches, it would be good to have Python software components such as:

  • 1) A general-purpose CNN layer, which can handle inputs of configurable dimensionality. (That is, a given instance of the CNN layer will deal with inputs of dimension k, for some fixed k. But we want different instances to deal with different dimensions k.)
  • 2) A general-purpose DeSTIN layer, which can handle inputs of configurable dimensionality; and which also has a pluggable learning algorithm inside it (so that e.g. one could experiment with k-means, or EM, or autoencoders, or autoconceptors, or whatever). A minimal interface sketch appears after this list.
  • 3) An evolutionary learning algorithm hooked up to learn the weights of a layer of a deep learning network.
  • 4) A deep evolutionary learning framework enabling the different layers of a network to co-evolve. For instance, in a network with alternating CNN and DeSTIN layers, the CNN layers would be evolved concurrently with the DeSTIN layers doing clustering.
  • 5) A python wrapper for the frequent subtree mining code that works on the current DeSTIN version (or other code with similar functionality)
  • 6) A replication in python of the "self-organizing maps" centroid visualization code that exists in the current DeSTIN version
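Here is the minimal interface sketch for item 2 above; the class and method names are hypothetical, intended only to make the "configurable dimensionality plus pluggable learning algorithm" idea concrete. Patches of any source dimensionality are assumed to arrive flattened to vectors.

 import numpy as np
 
 class DestinLayer:
     def __init__(self, learner):
         self.learner = learner             # e.g. a k-means, EM, or autoencoder object
 
     def process(self, patches):
         """patches: (n_patches, patch_dim) array. Returns one discrete symbol per patch."""
         self.learner.update(patches)       # online learning step
         return self.learner.assign(patches)
 
 class OnlineKMeans:
     def __init__(self, k, dim, lr=0.05, rng=np.random):
         self.centroids = rng.randn(k, dim)
         self.lr = lr
 
     def assign(self, patches):
         d = ((patches[:, None, :] - self.centroids[None, :, :]) ** 2).sum(-1)
         return d.argmin(axis=1)            # index of nearest centroid per patch
 
     def update(self, patches):
         for idx, p in zip(self.assign(patches), patches):
             self.centroids[idx] += self.lr * (p - self.centroids[idx])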

Experimentation Suggestion

Having built the layers mentioned above, we would want to experiment with Python/Theano versions of:

  • A) pure DeSTIN
  • B) a LeNet style CNN built using the general-purpose CNN layer
  • C) hybrid DeSTIN-CNN without feedback
  • D) hybrid DeSTIN-CNN with feedback

For B we would want to experiment with both backprop and evolutionary learning, and compare the two options.

Testing of these versions could be done on standard image classification corpora, such as CIFAR. Then we could move on to video, audio, etc.