ImitationLearningHowTo (Embodiment)

Imitation Learning How To

Installation

First of all, compile the OpenCog code and the Multiverse proxy code.

Getting and installing OpenCog

Get OpenCog with

  bzr branch lp:opencog

then change into it

  cd opencog

create a build directory

  mkdir bin

then go into it and compile using CMake

  cd bin
  cmake ..
  make && make examples && make test

For more information on how to compile OpenCog, see the OpenCog build documentation.

Getting and installing Multiverse Proxy

To get the multiverse proxy:

  bzr branch lp:~opencog-dev/opencog/embodiment_MV1.5-Proxy

Then all instructions on how to compile the proxy can be found in the README at the root of the proxy. The README also contains instructions to install the Multiverse Client (MV client for short) under Windows.

Running the MV client and performing imitation trick learning

Running the MV client

Instructions to run the petBrain, the MV server and the MV client are given in the README at the root of the proxy code.

Imitation learning involving only the owner

Generic instructions

A pet's owner can teach his pet how to perform a trick by imitation learning. That is, he shows what we call an exemplar of the behavior he wants the pet to mimic and corrects it via a reward mechanism to help it reach the desired trick.

The generic commands are (X must be replaced by the name of the trick):

  learn X

The pet switches into learning mode. The owner can then inform the pet that he is going to show it the trick:

  I will X

Then the owner has to perform the trick; once it is performed, he must inform the pet that this is the end of the trick:

  done X

Then the owner has to wait a bit to let the pet discover what program is hidden behind the exemplar it has just been given. At any moment the owner can ask the pet to attempt to reproduce the trick with the following command:

  try X

The pet is going to perform the trick. If after a dozen seconds it does not (maybe it is doing something more important to itself, like eating), the owner can repeat the try command as many times as it takes to make the pet try the trick. After the pet has tried, the owner can reward it positively:

  good boy

or negatively:

  bad boy

At this point, if the owner is satisfied with the result, he can stop learning with the stop learning command:

  stop learning X

He can later, while in playing mode, recall the learned trick by issuing the command:

  do X

Or, if not satisfied, he can either wait a bit more (maybe with more computation the pet is going to reach the right program) or produce another exemplar (some tricks can only be learned by showing several exemplars, for instance tricks involving conditions on the context of execution; these are called contextual tricks). To show another exemplar the owner just needs to type again:

  I will X

show the other exemplar, and then type

  done X

just like for the first exemplar. He can repeat this as many times as he wishes in order to show many exemplars of the trick.
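
To sum up, a typical single-exemplar session strings the above commands together as follows (X is the trick name; the line in parentheses stands for the owner actually performing the trick in-world, it is not a typed command):

  learn X
  I will X
  (owner performs the trick)
  done X
  try X
  good boy
  stop learning X
  do X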

Examples

Going to sit near the bone

  learn sit_near_bone
  I will sit_near_bone

The owner must go to the bone and then sit (by typing the command /sit)

  done sit_near_bone

Then wait for a few seconds (on a modern computer)

  try sit_near_bone

The pet is eating and does not reproduce the trick, so let's ask it again:

  try sit_near_bone

The pet goes to the bone but does not sit; we inform it that this is not the right trick:

  bad boy

And wait a bit more... then after a few seconds we ask it to try again:

  try sit_near_bone

(Note that if the pet is already near the bone when you ask, it is better to grab the bone and move it away, to be sure the pet has caught on to the right trick.) The pet goes to the bone and sits; we must reward it:

  good boy

Then we stop learning:

  stop learning sit_near_bone

Contextual trick: sit if the bone is near the ball, bark if the bone is not near the ball

First of all, before starting learning, grab the bone and move it near the ball.

  learn context_trick
  I will context_trick

Enter the sit command /sit

  done context_trick

Then grab the bone and move it away from the ball, then type:

  I will context_trick

Enter the bark command /bark

  done context_trick

Then let the pet think a bit about what it has just seen... Note that we have entered 2 exemplars contiguously, without asking the pet to try after the first exemplar.

  try context_trick

The pet should bark because the bone is away from the ball. The problem here is that it is not possible to ask the pet to retry the same trick (we would want to do that to check that it sits when the ball is near the bone). If the owner retypes try context_trick, the learning server is going to send a new trick, as it interprets that retry as meaning that the owner is not happy with the current proposition and wants another one. We will fix this problem later, either by adding another command like retry or by applying a different control strategy that makes sense to the user.

But for the developer there is another way to check the trick that has been learned: consulting the log file of the learning server.

Consulting the log file of the learning server to know what trick has been learned

The log file of the learning server is located at /tmp/USER_NAME/Petaverse/Logs on the machine that runs the learning server and is named LS. These are usually very large files (several GBs), so you should open them with less or vim, or simply grep their contents (emacs would not open such huge files).

Let's grep LS to find out which trick has been learned after asking the pet to try:

   grep Trying LS

This returns all the tricks that have been sent to the OAC, one each time a successful try command is processed by the learning server.
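
For instance, to see only the most recent candidate, something like the following can help (a sketch: the exact log line format may vary between revisions):

   grep Trying LS | tail -n 1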

Imitation learning involving the owner and another avatar

It is possible to teach a trick with the help of another avatar; in this case the other avatar provides the exemplar.

Generic commands

In the commands given below, X corresponds to the trick name and Y to the avatar name.

  learn X with Y
  Y do X

Then Y shows an exemplar of X. All other commands are identical to imitation learning with the owner only.

Examples

Playing fetch with Jill

Let's say the owner is Fred and the avatar to imitate is Jill

  learn fetch_the_ball with Jill
  Jill will fetch_the_ball

Jill goes to the ball, grabs it, goes to the owner Fred and drops it.

  done fetch_the_ball

Then we wait for a few seconds; during this time, take the ball and put it away from Fred.

  try fetch_the_ball

The pet fetches the ball and brings it to Fred.

  good boy
  stop learning fetch_the_ball

Checking the effect of trick learning

To check that learning is taking place as expected, one can obviously watch the effect of the learned trick with the command

  try

in learning mode, or

  do TRICK_NAME

in playing mode.

But one can also print the exact Combo program that has been selected by the LS as well as the trace of the search. For that, look at the log file /tmp/$USER/Petaverse/Logs/LS. To find the Combo program suggested by the LS upon request (by a try command), search for the word "Trying"; to get the trace, search for "SPCTools". All log messages containing the word "SPCTools" are used for calibrating the Occam's razor penalty function (see OccamsRazorCalibration_(Embodiment)) and contain a lot of useful information regarding the trace of the search algorithm.
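
As a concrete starting point, the following plain grep invocations can be used (adjust the path to your own installation):

  # candidate Combo programs sent to the OAC on try commands
  grep Trying /tmp/$USER/Petaverse/Logs/LS
  # search trace used for calibrating the size penalty function
  grep SPCTools /tmp/$USER/Petaverse/Logs/LS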

More materials

A list of tricks that we're interested in and their status can be found at:

ImitationLearningTrickList

Videos of imitation learning on YouTube:

It is also good to know that when execution is done with the logger at the DEBUG level or above, the trace of the entire learning process is recorded in the learning server log; this can be a great source of information for understanding why the learning process fails or succeeds.

Tweaking the learning server to modulate the efficiency of learning for a given class of tricks

When learning is not going as fast as one would wish, there are several things one can do to improve that, at least for a particular trick or class of tricks.

All options described in this section can be found in the file src/dev_config/config.cfg of the embodiment project, or directly in the file dist/REV_NUMBER/config.cfg, the latter affecting only the current installation.
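
For orientation, the relevant part of config.cfg might look like the excerpt below (assuming the usual '#' comment syntax); the option names are the ones described in the rest of this section, but the values are purely illustrative placeholders, not tuned recommendations:

  # illustrative values only -- tune them for your own class of tricks
  NUMBER_OF_ESTIMATIONS_PER_CYCLE = 100
  ENTROPY_PERCEPTION_FILTER_THRESHOLD = 0.1
  SIMILARITY_ACTION_FILTER_THRESHOLD = 0.5
  ACTION_FILTER_SEQ_MAX = 2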

Tweaks shared by all learning algorithms

Number of fitness estimations per cycle

  NUMBER_OF_ESTIMATIONS_PER_CYCLE = N

Determines how many fitness estimations the learning server runs without interruption. If that number is too high, the learning server may be less reactive; if it is too low, more resources are wasted on learning server management rather than on learning itself.

Perception and action filter

  ENTROPY_PERCEPTION_FILTER_THRESHOLD = F

determines the threshold above which perceptions are considered for learning. If it is too low, too many perceptions are going to pass the filter, slowing down learning; if it is too high, the perceptions involved in the desired trick may not pass the filter and the trick may not be learnable. F is a threshold on the entropy of a perception occurring in the scene during the time interval of the exemplars.

  SIMILARITY_ACTION_FILTER_THRESHOLD = F

determines the threshold above which actions are considered for learning. If it is too low, too many actions are going to pass the filter, slowing down learning; if it is too high, the actions involved in the desired trick may not pass the filter and the trick may not be learnable. F is a threshold on the similarity of actions to the actions given in the exemplars.

  ACTION_FILTER_SEQ_MAX = N

determines the maximum number of actions in the exemplar-matching action sequences that are provided as building blocks for the trick to be learned. For instance, if the exemplar is the action sequence A B C and N=2, then and_seq(A), and_seq(A B), and_seq(B), and_seq(B C) and and_seq(C) are provided as action-sequence building blocks to the learning algorithm. A high value can greatly speed up learning involving long action sequences but usually slows down learning involving short action sequences.

Weighting operators

It is possible to weight a few operators in order to favor them or not in the search process (this greatly affects hillclimbing, probably less so MOSES, but still somewhat).

  WHILE_OPERATOR_SIZE = N

A small size would favor operators involving loops, like boolean_while or action_while. Negative values are possible as well.

  CONDITIONAL_SIZE = N

A small size would favor operators involving conditionals, like action_boolean_if or action_action_if. Negative values are accepted as well.

Solomonoff bias

The fitness estimation includes a size penalty in its calculation, so as to favor shorter candidates. It is possible to parameterize that size penalty function with 2 values:

  SIZE_PENALTY_COEF_A = F
  SIZE_PENALTY_COEF_B = F

These 2 coefficients can of course be changed manually, but they can also be found more optimally using the SPCTools, located under the scripts directory of the embodiment project.

Choosing the learning algorithm

There are currently 2 algorithms coded for imitation learning, HillClimbing and MOSES. The option name is:

  IMITATION_LEARNING_ALGORITHM = X

with X being HillClimbing or MOSES.
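
For example, to select the default hillclimbing algorithm explicitly, the config line would read:

  IMITATION_LEARNING_ALGORITHM = HillClimbing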

HillClimbing, although rather limited in principle, has been largely tweaked and optimized for imitation learning and offers quite good results; this is the default algorithm.

MOSES remains to be experimented with, tweaked and optimized, and should later replace HillClimbing or simply be used instead of HillClimbing when it is expected to provide better results (it might be possible to introduce a controller that decides which algorithm to run, with which particular parameters, according to a given problem; such a controller could be subject to learning as well, or should I say meta-learning).

Parameters affecting HillClimbing only

Neighborhood extension parameters

  ACTION_BOOLEAN_IF_BOTH_BRANCHES_HC_EXPENSION = B

If B=1, then the neighborhood extension algorithm used in hillclimbing fills the 2 branches with atomic actions when a conditional is inserted. For instance, if and_seq(A) is the candidate to extend, then and_seq(A action_boolean_if(P and_seq(B) and_seq(C))) will be considered as a neighbor.

Otherwise only the extension with the first branch will be considered (it does not matter that we do not consider the second branch, because not(condition) will automatically swap the 2 branches after reduction).

Policy when a new exemplar is received

  HC_NEW_EXEMPLAR_INITIALIZES_CENTER = B

If B=1, then when a new exemplar is received hillclimbing restarts the search from the empty candidate. This may sound like a bad idea but in some cases it is preferable, for instance when the first exemplar would drive hillclimbing away from the solution because the solution of the first exemplar alone is syntactically too different from the solution of the trick itself.

Parameters affecting MOSES only

None for the moment.

-- Main.NilGeisweiller - 29 Oct 2008