The system described here is obsolete! The code for this system was removed from GitHub circa 2015. This page is maintained only as a historical archive of what used to be.
This document presents the LearningServer (LS) design and also links to pages about specific parts of the LS.
The LearningServer is responsible for handling the learning of new tricks. The LS behaviour consists mostly of receiving messages from the OAC and controlling the execution of the learning algorithms. The LS is based on the EmbodimentCogServer class since it must retrieve and send messages. The message dynamics behaviour is based on the LearningModeDynamics document.
Before presenting the LS design itself, it is useful to define the four message types that the component processes and disseminates.
The LearnMessage carries the name of the trick/schema to be learned and an =NMXml= representation of the pet's AtomSpace, which contains the Behavior Descriptions of the actions enacted by the owner.
The message also contains the latest map from the SpaceServer. Although this component is part of the AtomSpace, currently there is no NMXml representation for it, so the data is serialized into a string with the following representation:
TODO: update the doc of the XML representation
1. timestamp of the latest map
2. map dimensions (width, height)
3. number of objects in the map
4. for each object:
   1. type and name from the object handle
   2. number of points for the object
   3. for each point:
      1. point coordinates (x, y)
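As a sketch, the flat layout above could be produced as follows. The struct and function names here are hypothetical stand-ins, not the real SpaceServer types:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real SpaceServer map types.
struct MapPoint { double x, y; };
struct MapObject {
    std::string type;              // from the object handle
    std::string name;
    std::vector<MapPoint> points;  // the object's footprint
};

// Serializes the latest map into the flat string layout described above:
// timestamp, dimensions, object count, then per-object type/name/points.
std::string serializeMap(unsigned long timestamp,
                         double width, double height,
                         const std::vector<MapObject>& objects) {
    std::ostringstream out;
    out << timestamp << ' ' << width << ' ' << height << ' '
        << objects.size();
    for (const MapObject& obj : objects) {
        out << ' ' << obj.type << ' ' << obj.name << ' ' << obj.points.size();
        for (const MapPoint& p : obj.points)
            out << ' ' << p.x << ' ' << p.y;
    }
    return out.str();
}
```

The LS side would parse the same fields back in the same order to rebuild the map.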
To create a new message there is no need to perform conversions; the LearnMessage class does all the work. It receives a reference to an AtomSpace, creates an NMXml representation of the data, and also serializes the latest map from the SpaceServer. After that the data is ready to be sent to the LS. On the LS side, with a pointer to an AtomSpace object, it is possible to reconstruct the original data, including the latest map.
The RewardMessage carries the name of the trick/schema being learned, the name of the candidate schema tried by the pet, and feedback from the user. Currently this reward is binary (good or bad) but, if necessary, the range can be expanded.
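A minimal sketch of the payload described above; the field names and the numeric encoding of good/bad are assumptions, not the real class:

```cpp
#include <string>

// Hypothetical shape of the RewardMessage payload.
struct RewardMessage {
    std::string schemaName;           // trick/schema being learned
    std::string candidateSchemaName;  // candidate the pet just tried
    double reward;                    // currently binary: good or bad
    // Assumed encodings; a wider range could replace these if needed.
    static constexpr double POSITIVE = 1.0;  // "good"
    static constexpr double NEGATIVE = 0.0;  // "bad"
};
```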
The LSCmdMessage carries the commands used to interact with the LS and the learning process:
- stop learning X
- try X
The SchemaMessage carries a =combo::vtree= representing the learned schema, if this is the final learned schema. It is also used to send candidate schemas to be tried out by the pet. Again, the user does not need to worry about data conversion. The message constructor receives a =combo::vtree= and produces its string representation to be sent over the network connection. On the OAC side this combo representation is reconstructed and sent to the ComboInterpreter (if this is the case) or saved into the ProcedureRepository.
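To illustrate the tree-to-string half of that round trip, here is a toy prefix-form renderer over a stand-in tree type. The =Node= struct and the exact output syntax are illustrative only; the real serialization belongs to the (removed) combo code:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for combo::vtree: a node label plus ordered children.
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Renders a tree in a combo-like prefix form,
// e.g. "and_seq(goto_obj(ball) grab(ball))".
std::string toString(const Node& n) {
    if (n.children.empty()) return n.label;  // leaf: just the label
    std::ostringstream out;
    out << n.label << '(';
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i) out << ' ';
        out << toString(n.children[i]);  // recurse into each child
    }
    out << ')';
    return out.str();
}
```

The receiving side would run the inverse parse to rebuild the tree before handing it to the interpreter or repository.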
In the design of the LS for the AI Skeleton there is no concurrent learning; that is, the LS can hold only one learning task at a given time. Also, there is no learning queue, since it doesn't make sense to leave the pet waiting for the start of a learning period. The post-Skeleton LS design can be extended as needed. As an extension to the NetworkElement interface, the LS may perform tasks during idle time. Currently there are no idle tasks associated with the LS component, although they may be necessary in the near future.
Every message received is processed by the =processNextMessage(Message * msg)= method. In this method, a specific function is called according to the message type (these functions remain empty since they are specific to the LearningAlgorithm in question).
If a LearnMessage is received and there is no learning task being executed by the LS, then a new learning process is started by calling the =initLearn(LearnMessage *)= method. This method should retrieve the behavior descriptors and world map and set up the learning environment according to the algorithm being used. When the LS is busy but the LearnMessage received is related to the trick/schema being learned, a new example is added to the learning process via the =addLearnExample(LearnMessage *)= method. Otherwise the message is discarded.
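The dispatch rules above can be sketched as follows. The state struct and the string labels naming each outcome are illustrative, not the real API:

```cpp
#include <string>

// Hypothetical LS state: at most one learning task at a time (no queue).
struct LsState {
    bool busy = false;          // a learning task is already running
    std::string currentSchema;  // trick/schema being learned, if busy
};

// Mirrors the rules above: idle + LearnMessage -> start learning;
// busy + same schema -> add example; busy + other schema -> discard.
std::string dispatchLearn(LsState& state, const std::string& schema) {
    if (!state.busy) {
        state.busy = true;
        state.currentSchema = schema;
        return "initLearn";          // set up the learning environment
    }
    if (state.currentSchema == schema)
        return "addLearnExample";    // one more example of the same trick
    return "discard";                // unrelated trick while busy
}
```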
If a RewardMessage is received, this information should be used to adjust the learning algorithm. The =rewardCandidateSchema(RewardMessage *)= method should be used.
If a LSCmdMessage is received, the command it carries determines the action to be taken. When a stop learning X command is received, the learning algorithm should be stopped and the best candidate so far should be sent to the OAC to be stored in the schemaRepository. When a try X command is received, the algorithm should be paused and the best candidate schema so far should be sent to the OAC to be executed by the ComboInterpreter. All these messages are encapsulated in a SchemaMessage.
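A minimal sketch of telling the two commands apart, assuming they arrive as plain strings of the form shown earlier; the function and enum names are hypothetical:

```cpp
#include <string>

// The two LSCmdMessage commands described above, plus a fallback.
enum class LsCommand { StopLearning, Try, Unknown };

// Parses "stop learning X" or "try X"; on success, writes the
// trick/schema name X into `schema` and returns the command kind.
LsCommand parseLsCommand(const std::string& cmd, std::string& schema) {
    const std::string stopPrefix = "stop learning ";
    const std::string tryPrefix  = "try ";
    if (cmd.compare(0, stopPrefix.size(), stopPrefix) == 0) {
        schema = cmd.substr(stopPrefix.size());
        return LsCommand::StopLearning;  // stop; send best candidate to OAC
    }
    if (cmd.compare(0, tryPrefix.size(), tryPrefix) == 0) {
        schema = cmd.substr(tryPrefix.size());
        return LsCommand::Try;           // pause; send candidate to try out
    }
    return LsCommand::Unknown;
}
```

In both recognized cases the reply carrying the candidate schema would go out as a SchemaMessage, as described above.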
For the conceptual methodology in the LS, see the paper
For some details on tricks used to make hillclimbing work effectively for simple pet trick learning, see