GSoC 2008: MOSES: the Pleasure Algorithm

From OpenCog
Jump to: navigation, search

Basic Details

Abstract

Learning: the cognitive process of acquiring skill or knowledge. There are now a lot of ways for machines to learn, but the question of speed, quality and overall effectiveness is always present. “Program Learning via Ensemble Analysis and SUbstitution of Repeated Entities.” or in short PLEASURE is a learning algorithm aimed at taking effective use of the strong bias toward hierarchically structured programs. The goal is to implement and test the effectiveness of the algorithm during the summer.

Contributors

Mentors

List of Suggestions

  • Ben Goertzel suggested using Moses deme management system and the Reduct library

Work Log 2008

Prior to May 26

  • Finished reading the Moshe Looks PhD thesis on MOSES
  • Started reading Pelican's thesis on BOA

May 26 - July 26

  • Read a few papers on computer learning (including Pelican's and Moshe Looks' thesis)
  • During discussions with Nil and Ben worked on specifying the algorithm, searching for the right source packages available for using withing the algorithm
  • Semi-final decision (if there are no serious arguments against) of using program tree enumeration up to a certain size as a sub-algorithm for PLEASURE. Since it is easy to code, adding support to tokens and subtrees isn't that difficult.
  • For the algorithm for the main algorithm, the idea of using the FREQT [1] package was brought up. Currently Nil and I are discussing, if it would be a good choice because some of the definitions seem different from the ones in PLEASURE, but the parts like tree representation are very similar, so it wouldn't be difficult to use it withing PLEASURE.
  • Worked out the clear specification on the algorithm:
    • Counting the node usage and defining its importance based on the score. Possible implementations include simple iteration on running the FREQT package with the size limit of one.
    • Counting the frequent subtrees in the Best and Worst trees. Until now, the idea is to use the FREQT package for the frequent subtree mining, and use relative entropy to check, if the subtree fits into the FU category.
    • SUE would include adding "template tokens" to the node list to use in program generation. The possible idea of finding the nodes that fit into the categories would be adding the templates to the program ensembles and then running the FREQT package to see, if there are other identical subtrees.
  • Started learning to use the classes and structures already available in the latest release of MOSES (such as the Metapopulation structure for storing the program trees)

July 26 - August 02

  • Discussed with Nil, Ben and Moshe about the possibility of using MOSES deme management in PLEASURE. For the moment, the idea is to go with a simple first implementation with only using Combo/Reduct from the MOSES code base, and later (probably after summer) to explore other different possibilities and implementations.

August 02 - August 09

  • Decided on how the enumeration algorithm in the primary implementation will work. Using the type-checker engine, a table will be created, containing each node with all its possible children, for example
  1. action_boolean_if [with type ->(boolean action_result action_result)]
  2. is_food [with type boolean]
  3. is_empty[with type boolean]
  4. go_forward [with type action_result]
  5. turn_left [with type action_result]
  6. eat_food [with type action_result]
  • The table from this list of nodes would look something like action_boolean_if : ( {is_food, is_empty} {go_forward, turn_left, eat_food} {go_forward, turn_left, eat_food} ). Since other nodes are leaves, their table is empty. Having these tables, generating trees up to a certain size is a trivial matter. Since the type checker supports custom nodes, subtrees can certainly be supported, and if creating the categories out of nodes with same signatures, then category nodes can be supported too.

August 09 - August 23

  • Implemented tree generation, reduction and evaluation.
  • For the tree generation, a simple tree enumeration up to a certain size is used. The method is described in the weekly update above. As suggested by Nil, additional features are added. First of all, there has to be a list of possible nodes to be passed to the program (the current implementation used a file to do that). The trees are only created from the nodes given to them. In addition, a signature is passed to the tree enumeration algorithm, so that it is ensured, that the generated programs have the same signature as intended. Since the algorithm works by adding new layers to the trees each run, on the first run the output type of the program is made to be the correct one. After that, each level added, the trees are checked to eliminate the ones that are impossible to meet the requirements. The main problem that arises, is that the number of trees increases exponentially, so a lot of memory needs to be used. Some optimization can be done here.
  • After the program trees are generated, they go through reduction. Again, the procedure takes in the trees and reduction rules and gives a set of reduced programs. This is needed to avoid the multiple programs that do the same things. One possible way of optimizing the work is reducing programs after generating each level. This might drastically improve the memory usage.