- 1 Imitation Learning How To
- 1.1 Installation
- 1.2 Running the MV client and performing imitation trick learning
- 1.2.1 Running the MV client
- 1.2.2 Imitation learning involving only the owner
- 1.2.3 Generic instructions
- 1.2.4 Examples
- 1.2.5 Going to sit near the bone
- 1.2.6 Contextual trick, sit if bone is near ball, bark if bone is not near ball
- 1.2.7 Consulting the log file of the learning server to know what trick has been learned
- 1.2.8 Imitation learning involving the owner and another avatar
- 1.2.9 Generic commands
- 1.2.10 Examples
- 1.2.11 Playing fetch with Jill
- 1.2.12 Checking the effect of trick learning
- 1.2.13 More materials
- 1.3 Tweaking the learning server to modulate the efficiency of learning for a given class of tricks
- 1.3.1 Tweaking shared by all learning algorithms
- 1.3.2 Number of fitness estimations per cycle
- 1.3.3 Perception and action filter
- 1.3.4 Weighting operators
- 1.3.5 Solomonoff bias
- 1.3.6 Choosing the learning algorithm
- 1.3.7 Parameters affecting HillClimbing only
- 1.3.8 Neighborhood extension parameters
- 1.3.9 Policy when a new exemplar is received
- 1.3.10 Parameters affecting MOSES only
Imitation Learning How To
First of all, compile the OpenCog code and the Multiverse proxy code.
Getting and installing OpenCog
Get OpenCog with:
bzr branch lp:opencog
Then change into the checkout, create a build directory, and compile using CMake:
cd opencog
mkdir build && cd build
cmake ..
make && make examples && make test
See the OpenCog wiki for more information on how to compile OpenCog.
Getting and installing Multiverse Proxy
To get the multiverse proxy:
bzr branch lp:~opencog-dev/opencog/embodiment_MV1.5-Proxy
Then all instructions on how to compile the proxy can be found in the README under the root of the proxy. The README also contains instructions to install the Multiverse Client (MV client for short) under Windows.
Running the MV client and performing imitation trick learning
Running the MV client
Instructions to run the petBrain, the MV server and the MV client are given in the README under the root of the proxy code.
Imitation learning involving only the owner
A pet's owner can teach his pet how to perform a trick by imitation learning. That is, he shows what we call an exemplar of the behavior he wants the pet to mimic, and corrects it via a reward mechanism to help it reach the desired trick.
The generic commands are as follows (X must be replaced by the name of the trick):
To switch the pet into learning mode, the owner informs it that he is going to show it the trick:
I will X
Then the owner has to perform the trick; once it is performed, he must inform the pet that this is the end of the trick.
Then the owner has to wait a bit, to let the pet discover what program is hidden behind the exemplar it has just been given. At any moment the owner can ask the pet to attempt to reproduce the trick with the following command:
try X
The pet is going to perform the trick; if after a dozen seconds it does not (maybe it is doing something more important to itself, like eating), the owner can repeat the try command as many times as it takes to make the pet try the trick. After the pet has tried, the owner can reward it positively.
At this point, if the owner is satisfied with the result, he can stop the learning with the command:
stop learning X
He can later recall the learned trick, while in playing mode, by issuing the corresponding command.
Or, if not satisfied, he can either wait a bit more (maybe with more computation the pet is going to reach the right program) or produce another exemplar (some tricks can only be learned by showing several exemplars, for instance tricks involving conditions on the context of execution of the trick; these are called contextual tricks). To show another exemplar the owner just needs to type again:
I will X
and then shows the other exemplar, just like the first one. He can repeat this as many times as he wishes, to show as many exemplars of the trick as needed.
Going to sit near the bone
I will sit_near_bone
The owner must go to the bone and then sit (by typing the command /sit)
Then wait for a few seconds (on a modern computer)
The pet is eating and does not reproduce the trick, so let's ask it again :
The pet goes to the bone but does not sit; we can inform it that this is not the right trick :
And wait a bit more...then after a few seconds we ask it to try again :
(Note that if the pet is already near the bone when asked, it is better to grab the bone and move it away, to be sure the pet has caught on to the right trick.) The pet goes to the bone and sits; we must reward it :
Then we stop learning :
stop learning sit_near_bone
Contextual trick, sit if bone is near ball, bark if bone is not near ball
First of all, before starting the learning, grab the bone and move it near the ball.
I will context_trick
Enter the sit command /sit
Then grab the bone and move it away from the ball, then type :
I will context_trick
Enter the bark command /bark
Then let the pet think a bit about what it has just seen... Note that we have entered 2 exemplars contiguously, without asking the pet to try after the first exemplar.
The pet should bark, because the bone is away from the ball. The problem here is that it is not possible to ask the pet to retry the same trick (we would want to do that to check that it sits when the ball is near the bone): if the owner retypes =try context_trick=, the learning server is going to send a new trick, as it interprets that retry as a sign that the owner is not happy with the current proposition and wants another one. We will fix this problem later, by adding another command like retry or by applying a different control strategy that makes sense to the user.
But for the developer there is another way to check the trick that has been learned: consulting the log file of the learning server.
Consulting the log file of the learning server to know what trick has been learned
The log file of the learning server is located at /tmp/USER_NAME/Petaverse/Logs, on the machine that runs the learning server, and is named LS. These files are usually very large (several GBs), so you may want to open them with less or vim, or simply grep their contents (emacs would not open such huge files).
Let's grep LS to find out which trick has been learned after asking the pet to try:
grep Trying LS
This returns all the tricks that have been sent to the OAC, one every time a successful try command is treated by the learning server.
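Since the LS file can reach several gigabytes, it can be handy to stream it rather than load it whole. Here is a minimal Python sketch; the helper name is our own invention, and only the "Trying" and "SPCTools" keywords and the log location come from this page:

```python
# Sketch: collect the most recent lines containing a keyword (e.g. "Trying")
# from the LS log, streaming the file so a multi-GB log is never held in memory.

def last_matches(path, keyword="Trying", n=5):
    """Return the last n log lines containing `keyword`."""
    hits = []
    with open(path, errors="replace") as f:
        for line in f:
            if keyword in line:
                hits.append(line.rstrip("\n"))
                if len(hits) > n:
                    hits.pop(0)  # keep only the n most recent matches
    return hits
```

For instance, last_matches('/tmp/USER_NAME/Petaverse/Logs/LS') would show the latest candidate tricks, and keyword="SPCTools" would pull the tail of the search trace instead.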
Imitation learning involving the owner and another avatar
It is possible to teach a trick with the help of another avatar; in this case the other avatar provides the exemplar.
In the commands given below, X corresponds to the trick name and Y to the avatar name.
learn X with Y
Y do X
Then Y shows an exemplar of X. All other commands are identical to the imitation learning with the owner only.
Playing fetch with Jill
Let's say the owner is Fred and the avatar to imitate is Jill.
learn fetch_the_ball with Jill
Jill will fetch_the_ball
Jill goes to the ball, grabs it, goes to the owner Fred and drops it.
Then we wait for a few seconds; during this time, take the ball and put it away from Fred.
The pet fetches the ball and brings it to Fred.
stop learning fetch_the_ball
Checking the effect of trick learning
To check that learning is taking place as expected, one can obviously watch the effect of the learned trick with the command
in learning mode or
in playing mode.
But one can also print the exact Combo program that has been selected by the LS, as well as the trace of the search. For that, look at the log file /tmp/$USER/Petaverse/Logs/LS. To find the Combo program suggested by the LS upon request (by a try command), search for the word "Trying"; to get the trace, search for "SPCTools". All log messages containing the word "SPCTools" are used for calibrating the Occam's razor penalty function (see OccamsRazorCalibration_(Embodiment)) and contain a lot of useful information regarding the trace of the search algorithm.
A list of tricks that we're interested in and their status can be found at :
Videos of imitation learning on YouTube:
Also, it is good to know that when the execution is done with the logger at DEBUG level or above, the trace of the entire learning process is recorded in the learning server log; this can be a great source of information to understand why the learning process fails or succeeds.
Tweaking the learning server to modulate the efficiency of learning for a given class of tricks
When learning is not going as fast as one would wish, there are several things one can do to improve it, at least for a particular trick or class of tricks.
All options described in this section can be found in the file src/dev_config/config.cfg of the embodiment project, or directly in the file dist/REV_NUMBER/config.cfg, the latter affecting only the current installation.
Number of fitness estimations per cycle
NUMBER_OF_ESTIMATIONS_PER_CYCLE = N
Determines how many fitness estimations the learning server runs without interruption. If that number is too high, the learning server may be less reactive; if it is too low, more resources are wasted on learning server management rather than on learning itself.
Perception and action filter
ENTROPY_PERCEPTION_FILTER_THRESHOLD = F
Determines the threshold above which perceptions are considered for learning. If it is too low, too many perceptions pass the filter, slowing down learning; if it is too high, the perceptions involved in the desired trick may not pass the filter and the trick may not be learnable. F is a threshold on the entropy of a perception occurring in the scene during the time interval of the exemplars.
SIMILARITY_ACTION_FILTER_THRESHOLD = F
Determines the threshold above which actions are considered for learning. If it is too low, too many actions pass the filter, slowing down learning; if it is too high, the actions involved in the desired trick may not pass the filter and the trick may not be learnable. F is a threshold on the similarity of actions with the actions given in the exemplars.
ACTION_FILTER_SEQ_MAX = N
Determines the maximum length of the action subsequences, matching the exemplars, that are provided as building blocks for the trick to be learned. For instance, if the exemplar is the action sequence A B C and N=2, then and_seq(A), and_seq(A B), and_seq(B), and_seq(B C) and and_seq(C) are provided as building blocks of action sequences in the learning algorithm. A high value can greatly speed up learning involving long action sequences, but usually slows down learning involving short action sequences.
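To make the combinatorics concrete, here is a small Python sketch (our own illustration, not code from the embodiment project) that enumerates the contiguous subsequences of length at most N that ACTION_FILTER_SEQ_MAX exposes as building blocks:

```python
def building_blocks(actions, n_max):
    """All contiguous subsequences of `actions` of length <= n_max,
    in the order they would be listed: by start index, then by length."""
    return [actions[i:j]
            for i in range(len(actions))
            for j in range(i + 1, min(i + n_max, len(actions)) + 1)]

# With the exemplar A B C and N=2, this yields the five blocks listed above:
print(building_blocks(["A", "B", "C"], 2))
# → [['A'], ['A', 'B'], ['B'], ['B', 'C'], ['C']]
```

Note how the count of building blocks grows with both the exemplar length and N, which is why a large N mainly pays off for long action sequences.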
Weighting operators
It is possible to weight a few operators in order to favor them (or not) in the search process (this greatly affects hillclimbing, probably less so MOSES, but still a bit).
WHILE_OPERATOR_SIZE = N
A small size would favor operators involving loops, like boolean_while or action_while. Negative values are possible as well.
CONDITIONAL_SIZE = N
A small size would favor operators involving conditionals, like action_boolean_if, action_action_if. Negative values are accepted as well.
Solomonoff bias
The fitness estimation includes a size penalty in its calculation, so as to favor shorter candidates. It is possible to parameterize that size penalty function with 2 values :
SIZE_PENALTY_COEF_A = F
SIZE_PENALTY_COEF_B = F
These 2 coefficients can of course be changed manually, but more optimal values can also be found using the SPCTools, located under the scripts directory of the embodiment project.
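Putting the options of this section together, a config.cfg fragment might look like the following; the option names come from this page, but the numeric values are purely illustrative assumptions, not recommended defaults:

```
NUMBER_OF_ESTIMATIONS_PER_CYCLE     = 100
ENTROPY_PERCEPTION_FILTER_THRESHOLD = 0.1
SIMILARITY_ACTION_FILTER_THRESHOLD  = 0.5
ACTION_FILTER_SEQ_MAX               = 2
WHILE_OPERATOR_SIZE                 = 3
CONDITIONAL_SIZE                    = 1
SIZE_PENALTY_COEF_A                 = 0.03
SIZE_PENALTY_COEF_B                 = 0.5
```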
Choosing the learning algorithm
There are currently 2 algorithms implemented for imitation learning, HillClimbing and MOSES. The option name is :
IMITATION_LEARNING_ALGORITHM = X
with X being HillClimbing or MOSES.
HillClimbing, although rather limited in principle, has been largely tweaked and optimized for imitation learning and offers quite good results; it is the default algorithm.
MOSES remains to be experimented with, tweaked and optimized; it should later replace HillClimbing, or simply be used instead of HillClimbing when it is expected to provide better results (it might be possible to introduce a controller that decides which algorithm to run, and with which parameters, according to a given problem; such a controller could be subject to learning as well, or should I say meta-learning).
Parameters affecting HillClimbing only
Neighborhood extension parameters
ACTION_BOOLEAN_IF_BOTH_BRANCHES_HC_EXPENSION = B
If B=1, then the neighborhood extension algorithm used in hillclimbing fills both branches with atomic actions when a conditional is inserted. For instance, if and_seq(A) is the candidate to extend, then and_seq(A action_boolean_if(P and_seq(B) and_seq(C))) will be considered as a neighbor.
Otherwise, only extensions filling the first branch will be considered (it does not matter that we do not consider the second branch, because not(condition) will automatically swap the 2 branches after reduction).
Policy when a new exemplar is received
HC_NEW_EXEMPLAR_INITIALIZES_CENTER = B
If B=1, then when a new exemplar is received, hillclimbing restarts the search from the empty candidate. This may sound like a bad idea, but in some cases it is preferable: for instance, when the first exemplar alone would drive hillclimbing away from the solution, because the solution of the first exemplar alone is syntactically too different from the solution of the trick itself.
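For illustration, enabling both HillClimbing-specific policies described above would look like this in config.cfg (1 being the "on" setting discussed in these two subsections):

```
ACTION_BOOLEAN_IF_BOTH_BRANCHES_HC_EXPENSION = 1
HC_NEW_EXEMPLAR_INITIALIZES_CENTER           = 1
```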
Parameters affecting MOSES only
None for the moment.
-- Main.NilGeisweiller - 29 Oct 2008