Enabling MOSES for Reinforcement Learning in the Unity3D World

From OpenCog

Goal: To enable use of MOSES for procedure learning in the Unity3D game world

Example of desired behavior: Have MOSES learn a procedure that will make the agent wander around and pick up batteries, to be triggered when the system needs energy


T1) Create EmbodiedProcedureLearningServer (EPLServer), a separate CogServer that communicates with the OAC and runs MOSES learning processes associated with learning procedures to be carried out in the game world. For now we will assume the EPLServer runs on the same machine as the OAC, so it can make easy reference to the OAC's AtomSpace via inter-process communication.

T1.1) Make a headless version of Unity3D, to enable rapid experimentation with procedures that don't require interaction with human players

T1.2) Write communication code that, based on a signal from the GoalDrivenProcedureLearning MindAgent in the OAC, tells the EPLServer when to launch a new MOSES process and when to halt one. This code should also send reasonable answers back from the EPLServer to the OAC once they are obtained. (In the simplest case, the best K procedures could be sent back to the OAC at the end of the MOSES run. In a more sophisticated version, MOSES could send interim results to the OAC while it keeps working to learn better ones.)
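To make the control flow concrete, here is a minimal sketch of what the OAC-to-EPLServer messages might look like, assuming a simple JSON-over-IPC transport. The message types ("LAUNCH_MOSES", "HALT_MOSES", "BEST_PROCEDURES") and their fields are illustrative assumptions, not an existing OpenCog protocol.

```python
import json

def make_message(msg_type, **payload):
    """Serialize a hypothetical control message as JSON for IPC transport."""
    return json.dumps({"type": msg_type, **payload})

# OAC -> EPLServer: launch or halt a MOSES process tied to a goal
start = make_message("LAUNCH_MOSES", goal="GetEnergy", max_evals=100000)
halt = make_message("HALT_MOSES", goal="GetEnergy")

# EPLServer -> OAC: return the best K procedures at the end of a run.
# (An interim-results message, sent mid-run, could have the same shape.)
result = make_message("BEST_PROCEDURES", goal="GetEnergy",
                      procedures=["and_seq(walk grab)",
                                  "and_seq(look walk grab)"])
```

A real implementation would route these through the CogServer's networking layer rather than raw JSON, but the three message types capture the launch/halt/answer cycle described above.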

T1.3) Make a customized MOSES fitness function that checks OpenPsi (in the OAC) to assess the fitness of a procedure. (I.e. while a procedure runs, it writes information into the AtomSpace in the OAC; and then, after it's been running a while, OpenPsi in the OAC is polled to assess the overall satisfaction level of the agent.)
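The run-then-poll scoring loop can be sketched as follows. This is a hedged outline only: `run_procedure` and `poll_satisfaction` are stand-ins for the real OAC-side procedure execution and OpenPsi query, which are not specified here.

```python
def psi_fitness(procedure, run_procedure, poll_satisfaction, settle_cycles=20):
    """Score `procedure` by the agent's OpenPsi satisfaction level.

    run_procedure: stand-in callable that executes the procedure in the
        game world (updating the AtomSpace as a side effect).
    poll_satisfaction: stand-in callable returning overall satisfaction,
        assumed to lie in [0, 1].
    """
    run_procedure(procedure)
    for _ in range(settle_cycles):
        pass  # stand-in for letting cognitive cycles / world effects settle
    return poll_satisfaction()

# Demo with fake callables, purely to show the call pattern:
satisfaction = {"level": 0.0}
def fake_run(proc):
    satisfaction["level"] = 0.7  # pretend the procedure raised satisfaction
score = psi_fitness("wander_and_grab", fake_run, lambda: satisfaction["level"])
```

The key design point is that fitness is read off the agent's motivational state after execution, rather than computed from the procedure itself.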

T2) Make MOSES support learning of procedures to be enacted in the Unity3D game world

T2.1) Enable "action nodes" in MOSES, corresponding to Unity3D actions

T2.2) Make a simple test program that runs a single Combo tree created by MOSES, in a way that causes it to make actions happen in the Unity3D world
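As a sketch of what the T2.2 test program would do, here is a tiny interpreter for a Combo-style action tree: `and_seq` nodes run their children in order, and leaves are action nodes sent to the Unity3D world. The node names and nested-tuple representation are illustrative assumptions modeled loosely on Combo's sequential-and, not the actual Combo runtime.

```python
def run_tree(tree, send_action):
    """Recursively execute a nested-tuple action tree, sending each leaf
    action (e.g. 'step_forward', 'grab') to a Unity3D proxy callable."""
    op, *children = tree if isinstance(tree, tuple) else (tree,)
    if not children:          # leaf: an action node
        send_action(op)
    elif op == "and_seq":     # sequential execution of children
        for child in children:
            run_tree(child, send_action)

# Collect the actions instead of sending them, to show the traversal order:
sent = []
run_tree(("and_seq", "step_forward", ("and_seq", "look_down", "grab")),
         sent.append)
```

In the real test program, `send_action` would be replaced by whatever proxy the OAC uses to issue actions to the Unity3D world.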

T2.3) Look at the MOSES Occam and diversity penalties in the context of Unity3D procedure learning. (I recall Nil did some special tuning with the Occam penalty in the context of MOSES imitation learning in the Multiverse world.)
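For orientation, the Occam-style penalty has the usual shape: raw fitness minus a term proportional to program complexity, so a larger procedure tree must behave proportionally better to win. The coefficient below is an arbitrary placeholder; the point of T2.3 is precisely that it would need retuning for the Unity3D setting.

```python
def penalized_score(raw_fitness, tree_size, occam_coeff=0.05):
    """Occam-penalized score: complexity (here, tree size) is traded off
    against raw fitness via a tunable coefficient (placeholder value)."""
    return raw_fitness - occam_coeff * tree_size
```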

T3) Create an ActiveSchemaPool object(?) associated with the OAC, which contains the schemata currently being executed, and mediates their execution

T3.1) Write code porting the plans emanating from Shujing's planner into Scheme format, and wrapping them in GroundedSchemaNodes

T3.2) Code the ActiveSchemaPool, along with a simple SchemaExecution MindAgent associated with it, that (each step through the cognitive cycle) allows the GSNs in the ActiveSchemaPool to propose actions. Code a simple ExecutionManager that chooses which of the proposed actions gets executed. (The very first version of the ExecutionManager could just allow one schema at a time to execute. A slightly more sophisticated version could include a matrix indicating which Unity3D actions can be carried out simultaneously without conflict, e.g. walking and talking.)
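The propose-then-choose step described above can be sketched as follows. This is a minimal illustration under assumed names: the pool collects one proposed action per active schema each cognitive cycle, and the manager greedily keeps a mutually compatible subset using a pairwise compatibility set (the "matrix" of actions that can co-occur, e.g. walking and talking).

```python
class ActiveSchemaPool:
    """Holds wrappers around the currently active GroundedSchemaNodes
    (illustrative stand-in; real GSNs live in the AtomSpace)."""
    def __init__(self):
        self.schemata = []

    def propose(self):
        """Collect one proposed action per active schema this cycle."""
        return [s.next_action() for s in self.schemata]

class ExecutionManager:
    def __init__(self, compatible):
        # `compatible`: set of frozenset action-pairs that may co-execute
        self.compatible = compatible

    def choose(self, proposals):
        """Greedily keep proposals that conflict with nothing already chosen.
        (The 'one schema at a time' first version is the special case where
        `compatible` is empty.)"""
        chosen = []
        for action in proposals:
            if all(frozenset((action, c)) in self.compatible for c in chosen):
                chosen.append(action)
        return chosen

# Demo with stub schemata standing in for GSN wrappers:
class _Wander:
    def next_action(self): return "walk"
class _Chat:
    def next_action(self): return "talk"
class _Grab:
    def next_action(self): return "grab"

pool = ActiveSchemaPool()
pool.schemata = [_Wander(), _Chat(), _Grab()]
compatible = {frozenset(("walk", "talk"))}  # walking and talking can co-occur
chosen = ExecutionManager(compatible).choose(pool.propose())
```

The greedy rule is order-dependent; a more careful version might weight proposals by goal importance before resolving conflicts.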

T3.3) Hand-code some simple Combo trees controlling actions and test the ActiveSchemaPool this way

T3.4) Write a clean-up agent that periodically checks the ActiveSchemaPool to see the extent to which the procedures there have ImplicationLinks to the system's current goals, so that anything useless there can be removed.
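One way the clean-up pass could work, sketched under assumed names: drop any pooled procedure whose ImplicationLink strength toward every current goal falls below a threshold. Here `implication_strength` is a stand-in for an AtomSpace query, and the threshold is a placeholder.

```python
def clean_pool(pool, current_goals, implication_strength, min_strength=0.1):
    """Keep only schemata that still plausibly imply some current goal.

    implication_strength: stand-in callable (schema, goal) -> strength,
    abstracting a lookup of ImplicationLink truth values in the AtomSpace.
    """
    return [s for s in pool
            if any(implication_strength(s, g) >= min_strength
                   for g in current_goals)]

# Demo: "sing" no longer implies the current goal strongly, so it is dropped.
strengths = {("patrol", "GetEnergy"): 0.6, ("sing", "GetEnergy"): 0.02}
kept = clean_pool(["patrol", "sing"], ["GetEnergy"],
                  lambda s, g: strengths.get((s, g), 0.0))
```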

T4) Connect OpenPsi with procedure learning

T4.1) Modify OpenPsi so that, when it identifies a goal as important, it looks to find a GSN with an ImplicationLink to that goal, and then puts that GSN in the ActiveSchemaPool… (The choice of GSN should depend on the importance of the goal, and the truth value of the ImplicationLink.)
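The selection rule suggested in T4.1 might be sketched as follows: score each candidate GSN by the goal's importance times the strength and confidence of its ImplicationLink to that goal, and pool the highest scorer. The tuple representation and multiplicative scoring are illustrative assumptions, not OpenPsi's actual mechanism.

```python
def select_gsn(candidates, goal_importance):
    """Pick the best GSN for an important goal.

    candidates: list of (gsn_name, strength, confidence) tuples, one per
    ImplicationLink pointing at the goal (stand-in for AtomSpace data).
    """
    def score(candidate):
        _, strength, confidence = candidate
        return goal_importance * strength * confidence
    return max(candidates, key=score)[0]

# Demo: a strong, confident implication beats a strong but unconfident one.
candidates = [("WanderAndGrab", 0.8, 0.9), ("SitStill", 0.9, 0.1)]
best = select_gsn(candidates, goal_importance=0.7)
```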

T4.2) Create a GoalDrivenProcedureLearning agent that, when there is an important goal, sends a signal to the EPLServer to ask MOSES to find a procedure potentially achieving this goal. (In general this should only be done for goals associated with embodied action. But for starters, as our immediate goal is a system that learns stuff in the Unity3D world, we can just make it generic.)