Embodiment Architecture (2012 Archive)
This document covers a conception of how embodiment was implemented in the 2010-2015 timeframe. It contains many good, spot-on ideas, but it also has many omissions and oversights. The code implementing this architecture was used to drive a number of game-world demos in the Unity3D virtual avatar space. The code was removed from GitHub in the summer of 2015 as part of a general cleanup. Although the code was removed, many of the general ideas here remain valid. The current architecture is referenced in the embodiment page.
Embodiment Architecture Description
This document covers a description of the architecture for Embodiment along with a high-level design for two important components responsible for avatar control and shared long term memory.
What's not covered: This document doesn't yet cover provisions for fault tolerance and recovery, which should be added later. It's purposefully light on scalability, because the correct path for scalability depends on requirements that have not yet been clarified.
This document also currently ignores the fact that we'll need multiple servers dedicated to OACs as well as multiple learning servers. So it won't get into details of load balancing, request routing, persistence, and redundancy, etc.
Multiple avatars design choice
A major architectural choice is the design for controlling multiple avatars.
There are two ways to go here:
- develop the OAC so a single process can handle multiple avatars using a thread pool, or
- have multiple processes running on each machine in the OAC cluster, with each avatar associated with one OAC process.
Both options have advantages and disadvantages. The former is ultimately more scalable, mostly because it uses less memory per avatar. The latter is much simpler and avoids issues such as multithreading.
With multiple avatars per process we can share objects among avatars, such as combo trees and, possibly, some perceptual data. Whether the benefits of doing so compensate for the complexity of multithreading and the possible performance impact remains to be seen. One key factor is whether avatars will "live" in dedicated Virtual World areas or wander around the whole virtual world. The initial vision for Avataraverse was to have all avatars living in "avatar islands". However, scalability issues (depending on the Virtual World) may render that idea impossible, as the system slows to a crawl when too many avatars occupy the same area. Clearly, with many avatars spending most or all of their lives in the same area, there's a large incentive to share perceptual data. If, on the other hand, avatars wander as often as their owners typically do, the memory savings would be much smaller.
Pragmatically, we have chosen to run multiple processes on each machine in the OAC cluster (although we don't have an OAC cluster implemented yet), with a single avatar per OAC process. This simplified the use of the AtomSpace and other data structures in an OAC and avoided potential multithreading issues at the beginning of the project.
The diagram below shows an updated view of the main components in the Embodiment "Brain". An overview of each component is given below the diagram:
- Embodiment Proxy (Proxy): this component abstracts away all details of interacting with a Virtual World. It's responsible for giving us perceptual data, action feedback data (including when an action fails; when a collision occurs, for example), training data (start and stop training sessions, sequences to imitate, rewards and punishment) and receiving commands -- which may be given as command sequences.
- Perception & Action Interface (PAI): this component converts data coming from the Embodiment Proxy into the format desired by OpenCog. It also encapsulates actions (represented as schemata) into requests sent back to the Embodiment Proxy.
- Global World Map (GWM): this is a data store that will maintain information about the location of places and objects in the world. The global world map is shared by all objects and is used for path formation between distant places in the virtual world, as well as for navigation within a given location.
- Operational Avatar Controller (OAC): this is the process in which an avatar "lives". It interacts with the Proxy (sometimes through the Perceptual Preprocessor) and implements the cognitive architecture that selects actions to perform through a Rule Engine, monitors their execution, and attempts to fulfill the system's goals. The OAC also creates learning tasks when requested by the avatar's owner, and dispatches those to learning servers. The results of these tasks are schemata (or procedures), which are added to the list of possible actions executed by the avatar. Initially, the OAC controls a single avatar. Avataravese interpretation is done in the OAC, in the context of each avatar that receives Avataravese commands.
- Collective Experience Store (CES): contains a large AtomSpace where all avatars dump perceptual, action, and state data. There are three processes acting on this: forgetting, querying, and abstraction. The first manages memory by dumping old perceptions and actions. Querying is used mostly by the learning servers. Abstraction covers both the process of converting raw perceptions into behavior descriptions and the process of mining patterns from these descriptions.
- (this component is not implemented yet)
- Learning Server (LS): this component handles GP or MOSES learning. Ideally, the learning server can be built around the learning algorithm so the same LS code can support both GP and MOSES, even if we decide to deploy two different learning pools for other reasons.
Communications among these components take place via socket messages. A router is used to exchange messages between components, including the Embodiment Proxy. Internally, OAC lives while the avatar is loaded within the Virtual World and it always acts as a client, initiating the communication with the CES and the LearningServer, which are assumed to always be up and running, listening for requests. See CommunicationsDesign for more details.
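The routing pattern described above can be illustrated with a minimal in-memory sketch. This is not the socket-based implementation: the `Router` class, its method names, and the component ids (`OAC_1`, `LS`) are hypothetical and exist only to show that every message passes through the router and that the OAC initiates each exchange.

```python
from collections import defaultdict, deque


class Router:
    """Hypothetical in-memory stand-in for the socket-based message router."""

    def __init__(self):
        self.registered = set()
        self.queues = defaultdict(deque)

    def register(self, component_id):
        # Components (Proxy, OAC, CES, LS) announce themselves before use.
        self.registered.add(component_id)

    def send(self, sender, recipient, payload):
        # Messages are only queued for components the router knows about.
        if recipient not in self.registered:
            raise KeyError(f"unknown component: {recipient}")
        self.queues[recipient].append((sender, payload))

    def receive(self, component_id):
        # Returns the oldest pending (sender, payload) pair, or None.
        queue = self.queues[component_id]
        return queue.popleft() if queue else None


router = Router()
router.register("OAC_1")
router.register("LS")

# The OAC acts as the client: it initiates the exchange, the LS answers.
router.send("OAC_1", "LS", "LEARN fetch")
sender, request = router.receive("LS")
router.send("LS", sender, "CANDIDATE fetch")
reply = router.receive("OAC_1")
```

In this sketch the LS never contacts the OAC first; it only replies to the sender of a request, which mirrors the client role the OAC plays towards the CES and the LearningServer.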
Communications between OAC and Proxy
Data communications between the OAC and the Proxy use XML messages. A set of XSD files defines the format of all XML messages. You can find the XSD files in the opencog/embodiment/Control/PerceptionActionInterface directory of the OpenCog codebase repository. The following kinds of messages are exchanged here:
1. Perceptual data from the Proxy to OAC. This includes physical data, sensorial perceptions, and kinesthetic perceptions.
2. Avataravese commands (instructions) from Proxy to OAC.
3. Action sequences from OAC to Proxy.
4. Status (success, errors, warnings) on actions or action plans from Proxy to OAC.
5. Emotional indicators (love, excitement, fear, etc.) from OAC to Proxy.
6. Informative text to the user (avatar's owner) from OAC to Proxy. This is for reporting important things about the avatar via text chat (or any other text-based widget), for example when the avatar enters or exits learning mode, or when an internal error happens and the user should be warned.
Type 4 (status) is automated: it generates errors when the avatar tries to walk through a wall or grab an obstructed object, warnings when a collision is about to occur, etc. User-level feedback during learning is given through Avataravese.
Perceptual data comes through in the following format:
<?xml version="1.0" encoding="UTF-8"?>
<avatar:avataraverse-msg xmlns:avatar="http://proxy.esheepco.com/brain"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://proxy.esheepco.com/brain BrainProxyAxon.xsd">
  <map-info global-position-x="-124000" global-position-y="186000" global-position-offset="64000">
    <blip timestamp="2009-05-12T20:09:06.515" visibility-status="visible" width="100.0" length="200.0" height="200.0" edible="false" drinkable="false" avatarHome="false" foodBowl="false" waterBowl="false" detector="true">
      <entity id="534322" name="p1" type="avatar" owner-id="317" owner-name="Sally"/>
      <position x="-91617.0" y="210417.0" z="0"/>
      <rotation pitch="0.0" roll="0.0" yaw="-2.1516163939229855"/>
      <velocity x="-0.0" y="-0.0" z="0"/>
    </blip>
    <blip timestamp="2009-05-12T20:09:06.515" visibility-status="visible" width="529.8" length="4537.6" height="2178.7" edible="false" drinkable="false" avatarHome="false" foodBowl="false" waterBowl="false" detector="true">
      <entity id="534306" name="fence3" type="structure"/>
      <position x="-81695.0" y="214680.0" z="0"/>
      <rotation pitch="0.0" roll="0.0" yaw="-1.5707963267948966"/>
      <velocity x="0.0" y="-0.0" z="0"/>
    </blip>
    <blip timestamp="2009-05-12T20:09:06.515" visibility-status="visible" width="300.0" length="50.0" height="5.0" edible="false" drinkable="false" avatarHome="false" foodBowl="false" waterBowl="false" detector="true">
      <entity id="534318" name="Stick" type="accessory"/>
      <position x="-99049.0" y="212552.0" z="0"/>
      <rotation pitch="0.0" roll="0.0" yaw="-1.5707963267948966"/>
      <velocity x="0.0" y="-0.0" z="0"/>
    </blip>
  </map-info>
</avatar:avataraverse-msg>
Perceptions given like this require some simple transformations. It's up to the Perception/Action Interface to turn these sensorial perceptions into a hypergraph representation to be added to the OAC's AtomSpace. Aside from these sensorial perceptions, we also have kinesthetic perceptions about eating, drinking, and rewards.
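As a rough illustration of the first step of that transformation, the sketch below reads the blips of a perceptual message into plain Python dictionaries using the standard library's `xml.etree.ElementTree`. The function name `parse_blips`, the dictionary keys, and the shortened sample message are assumptions made for this example; the real PAI is C++ code, and the conversion of such records into AtomSpace hypergraph nodes and links is not shown.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the message format shown above (one blip only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<avatar:avataraverse-msg xmlns:avatar="http://proxy.esheepco.com/brain">
  <map-info global-position-x="-124000" global-position-y="186000" global-position-offset="64000">
    <blip timestamp="2009-05-12T20:09:06.515" visibility-status="visible" edible="false">
      <entity id="534318" name="Stick" type="accessory"/>
      <position x="-99049.0" y="212552.0" z="0"/>
      <rotation pitch="0.0" roll="0.0" yaw="-1.5707963267948966"/>
      <velocity x="0.0" y="-0.0" z="0"/>
    </blip>
  </map-info>
</avatar:avataraverse-msg>"""


def parse_blips(xml_bytes):
    """Return one dictionary per <blip>; a hypothetical intermediate format."""
    root = ET.fromstring(xml_bytes)
    percepts = []
    # Only the root element carries the "avatar" namespace prefix, so the
    # child elements can be looked up by their plain tag names.
    for blip in root.iter("blip"):
        entity = blip.find("entity")
        position = blip.find("position")
        percepts.append({
            "id": entity.get("id"),
            "name": entity.get("name"),
            "type": entity.get("type"),
            "timestamp": blip.get("timestamp"),
            "position": tuple(float(position.get(axis)) for axis in ("x", "y", "z")),
            "yaw": float(blip.find("rotation").get("yaw")),
            "edible": blip.get("edible") == "true",
        })
    return percepts


percepts = parse_blips(SAMPLE)
```

Each resulting record holds what would be needed to create or update an entity and its spatial predicates in the avatar's AtomSpace.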
Avataravese commands are delivered as strings and parsed by the Avataravese interpreter, which generates the corresponding commands that will be handled by the OAC according to its current state.
Actions are always specified as sequences (action plans), even if a plan contains a single action. Typically, a plan corresponds to a schema (or procedure). However, a more elaborate or complex schema may be broken down into multiple plans based on decision points (loops and conditionals) or other aspects (a pause between two actions, for example). The schema interpreter is responsible for generating and sending out plans through the Perception/Action Interface.
An example plan is shown below:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<avatar:action-plan xmlns:avatar="http://proxy.esheepco.com/brain" entity-id="534322" id="3">
  <action name="drop" sequence="1"/>
  <action name="sniffAt" sequence="2">
    <param name="target" type="entity">
      <entity id="317" type="avatar"/>
    </param>
  </action>
</avatar:action-plan>
The current vocabulary of actions, along with their parameters, is defined in opencog/embodiment/Control/PerceptionActionInterface/ActionType.h.
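To make the plan format concrete, here is a minimal sketch that builds a message shaped like the example plan with Python's standard `xml.etree.ElementTree`. The helper `build_action_plan` and its argument layout are hypothetical, and only entity-typed parameters (the one kind shown in the example) are handled; the authoritative format is whatever the XSD files specify.

```python
import xml.etree.ElementTree as ET

NS = "http://proxy.esheepco.com/brain"


def build_action_plan(entity_id, plan_id, actions):
    """Serialize a list of (action_name, params) pairs as an action plan.

    `params` maps a parameter name to an (entity_id, entity_type) pair.
    This argument layout is an assumption made for this sketch.
    """
    ET.register_namespace("avatar", NS)  # emit the "avatar:" prefix on the root
    plan = ET.Element(
        f"{{{NS}}}action-plan",
        {"entity-id": str(entity_id), "id": str(plan_id)},
    )
    # Sequence numbers start at 1 and follow the order of the list.
    for sequence, (name, params) in enumerate(actions, start=1):
        action = ET.SubElement(plan, "action", {"name": name, "sequence": str(sequence)})
        for param_name, (target_id, target_type) in params.items():
            param = ET.SubElement(action, "param", {"name": param_name, "type": "entity"})
            ET.SubElement(param, "entity", {"id": str(target_id), "type": target_type})
    return ET.tostring(plan, encoding="unicode")


plan_xml = build_action_plan(
    534322,
    3,
    [("drop", {}), ("sniffAt", {"target": (317, "avatar")})],
)
```

Even a single action goes through the same function as a one-element list, matching the rule that actions are always sent as plans.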
One thing to note is that navigation isn't an action, even though it will be available through elementary schemata. Navigation's job is to generate an action plan that takes the avatar to the desired object or location, using any navigation algorithm (TangentBug, A*, or HPA*). This is another case in which a single schema involves multiple action plans, for robustness reasons.
A key point is that an action in the middle of a plan can fail. The OAC needs to be able to handle this. Since our combo schemata do not include "exception handling" code by default, a failed action will result in a failed schema -- execution of that schema will be halted and action selection will be invoked to pick a new schema. One notable exception is when navigation fails because the world has changed. In that case, failure will occur during execution of a goToObject or goToLocation function, and recovery is automated: a new path is generated and a new action plan is created.
Action execution status is useful not only to report failures, but also to indicate to the schema interpreter the last successful action.
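The failure handling described above can be sketched as follows. This is an illustrative model only: `execute_plan`, `go_to`, the `perform` and `plan_path` callbacks, and the retry limit are all hypothetical names and choices, not the OAC's actual interfaces.

```python
def execute_plan(plan, perform):
    """Run the actions of a plan in order.

    Returns (succeeded, index_of_last_successful_action); the index is -1
    when the very first action fails. `perform` is a hypothetical callback
    that returns True when the Proxy reports success for an action.
    """
    last_ok = -1
    for index, action in enumerate(plan):
        if not perform(action):
            return False, last_ok  # the plan (and its schema) halts here
        last_ok = index
    return True, last_ok


def go_to(target, plan_path, perform, max_attempts=3):
    """Navigation with automated recovery: re-plan when a path fails.

    `plan_path` is a hypothetical callback that returns a fresh action plan
    for the current state of the world; `max_attempts` is an assumed limit.
    """
    for _ in range(max_attempts):
        succeeded, _last_ok = execute_plan(plan_path(target), perform)
        if succeeded:
            return True
    return False
```

An ordinary schema would call `execute_plan` once and give control back to action selection on failure, while only the navigation functions wrap it in the retry loop.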
Communications with Learning Servers
As already mentioned, the OAC acts as a client of the LearningServer. When the user (the avatar's owner) decides to teach a trick to an avatar, they must enter a specific command so that the avatar enters learning mode. Then the user must say that someone (possibly the user) will give exemplars of the trick. After the exemplar is given, the OAC sends a request to the LS to start learning the trick. When the user asks the avatar to try the trick, the LS must send back to the OAC the best schema candidate it has found so far. Upon receiving that candidate, the OAC should execute it so that the user can see the avatar trying the trick. The user can then give reward or punishment feedback on what the avatar executed, which is passed to the LS so that it can decide which path to explore next. See LS design for more details on how the LS works.
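The exchange described above can be summarized in a small sketch. Everything here is hypothetical: the class and method names are invented for illustration, and `StubLearningServer` simply echoes the latest exemplar where a real LS would run GP or MOSES.

```python
class StubLearningServer:
    """Hypothetical stand-in for the LS; performs no actual learning."""

    def __init__(self):
        self.exemplars = {}
        self.feedback = {}

    def learn(self, trick, exemplar):
        self.exemplars.setdefault(trick, []).append(list(exemplar))

    def best_candidate(self, trick):
        # Placeholder policy: return the most recent exemplar unchanged.
        return list(self.exemplars[trick][-1])

    def give_feedback(self, trick, reward):
        self.feedback.setdefault(trick, []).append(reward)


class LearningSession:
    """OAC-side view of one teaching session (an illustrative model)."""

    def __init__(self, server, execute):
        self.server = server
        self.execute = execute  # callback that runs a schema on the avatar
        self.trick = None

    def start(self, trick):
        self.trick = trick  # the avatar enters learning mode

    def exemplar(self, actions):
        self.server.learn(self.trick, actions)  # OAC -> LS: learning request

    def try_trick(self):
        candidate = self.server.best_candidate(self.trick)  # LS -> OAC
        self.execute(candidate)  # the user watches the avatar try the trick
        return candidate

    def reward(self, value):
        self.server.give_feedback(self.trick, value)  # steers the LS search


executed = []
session = LearningSession(StubLearningServer(), executed.append)
session.start("fetch")
session.exemplar(["goToObject ball", "grab ball"])
candidate = session.try_trick()
session.reward(1.0)
```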
High Level Avatar Design
We now switch to a high-level design of the code for a single avatar. Since there is one avatar per OAC, this design covers both the OAC and the avatar living inside it. Initially, the separation between the OAC and Avatar classes should be implemented carefully, so that things like communications are handled at the OAC level, while things like feelings, predicates, schemata, and other avatar-specific data are stored inside the Avatar object. This should give us the flexibility to evolve the architecture as needed. However, most of the avatar-specific data is actually stored in an AtomSpace, and because the design of OpenCog servers uses the singleton pattern for the AtomSpace, evolving the architecture to support multiple avatars per OAC has become even harder.
An avatar contains:
- A bunch of basic data such as name, id, type, owner's id and personality (traits) parameters.
- A bunch of schemata, corresponding to built-in and learned behaviors. Built-in behaviors include wandering around and taking appropriate action when hungry, tired, thirsty, etc. Learned behaviors include both predetermined behaviors that the avatar will "learn" without reinforcement and those learned through imitation and reinforcement. Schemata are stored as combo trees (or combo procedures), which combine elementary actions, elementary perceptions, along with a basic vocabulary of generic schemata (decisions, loop, logical and arithmetical operations).
- A list of feelings (both emotional and physiological), which should be updated as time goes on and as new perceptions and actions take place.
- An AtomSpace holding a small hypergraph of recent perceptual data, recent actions, a list of known entities and their properties, and other important memories (such as locations where the avatar has found food and water in the past).
The avatar controller, in this initial single-avatar version, needs to provide:
- A schema (procedure) interpreter. This is based on the current combo interpreter. The schema interpreter needs to be able to "pause" execution implicitly as it waits for status from the Proxy regarding completion of an action or action plan. These pauses in schema execution are necessary when we need to evaluate loop termination, conditionals, or when we need to reach a given location, which is done through navigation abstracted away from the schema. As navigation may fail, multiple iterations may be needed.
- An Avataravese interpreter, which consists of a bunch of handlers for Avataravese commands. The design for Avataravese support is described in a separate document (see Avataravese parser).
- The Perception/Action Interface (PAI)
- Communication interfaces with the Proxy, CES, and LS
- Saving and loading support for an avatar.
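The "pause" behavior required of the schema interpreter in the list above can be mimicked with a Python generator, which suspends at each `yield` until a status is sent back in. This swaps in a Python language feature purely for illustration; it is not how the combo interpreter is implemented, and the schema, action strings, and status values below are hypothetical.

```python
def fetch_schema():
    """Hypothetical schema: go to the ball, then bring it to the owner."""
    # Execution pauses at each yield until the Proxy's status is sent in.
    status = yield ["goToObject ball"]
    while status != "done":
        # Navigation failed (e.g. the world changed): emit a new plan.
        status = yield ["goToObject ball"]
    yield ["grab ball", "goToObject owner", "drop"]


def run(schema, statuses):
    """Feed Proxy statuses to a schema and collect the plans it emits."""
    plans = [next(schema)]  # run until the first pause
    for status in statuses:
        try:
            plans.append(schema.send(status))  # resume with the status
        except StopIteration:
            break  # the schema has finished
    return plans


plans = run(fetch_schema(), ["failed", "done", "done"])
```

Here the navigation step is retried once after the `failed` status, which corresponds to the multiple iterations mentioned for the schema interpreter.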
Each avatar operates like an autonomous agent running an infinite loop, where each iteration is defined as one volley of updates from the Proxy and optional commands sent back to the Proxy (if the avatar is executing a long action sequence and everything is going fine, there may be no commands for a while). Updates to the CES are also sent after each iteration, and updates from the Proxy may trigger learning tasks sent to the LS.
See Perception/Action Interface document for details on this.
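The per-avatar loop described above can be sketched as below. The function `run_avatar`, the shape of a "volley" (a dictionary with `percepts` and an optional `status`), and the three callbacks are assumptions made for this example, and a finite list stands in for the infinite stream of Proxy updates.

```python
def run_avatar(volleys, select_plan, send_plan, send_to_ces):
    """Process one volley of Proxy updates per iteration.

    All parameter names and the volley layout are hypothetical.
    """
    plan_in_flight = False
    for volley in volleys:
        # A status report means the previous plan finished or failed.
        if volley.get("status") in ("done", "failed"):
            plan_in_flight = False
        # Only pick a new plan when nothing is currently executing, so a
        # long-running plan produces no commands for a while.
        if not plan_in_flight:
            plan = select_plan(volley["percepts"])
            if plan:
                send_plan(plan)
                plan_in_flight = True
        # The CES is updated after every iteration.
        send_to_ces(volley)


sent_plans, ces_log = [], []
run_avatar(
    [
        {"percepts": ["ball"]},
        {"percepts": ["ball"]},
        {"percepts": [], "status": "done"},
    ],
    select_plan=lambda percepts: ["fetch"] if percepts else None,
    send_plan=sent_plans.append,
    send_to_ces=ces_log.append,
)
```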
Built-in Elementary Schemata
Elementary schemata correspond to:
- Elementary world actions - individual actions like drop, sit, bark, etc., which need no arguments and can be executed at any time.
- Abstractions for navigation - which look like single schemata but are handled by a navigation algorithm, which converts these calls into a sequence of elementary world actions.
- General purpose schemata for logical and arithmetic operations, decisions, loop, etc.
Built-in Behaviors (Composite Actions)
These are combo trees (or combo schemata or procedures) that implement the avatar's innate behaviors. See examples of these behaviors here. These schemata include actions taken to handle emergencies (usually to meet the avatar's physiological and emotional needs), such as when the avatar is too hungry, thirsty, or tired, or when it's too excited, angry, or in love. One example is the wander_searching_food schema, which should be executed when the avatar's hunger or energy level reaches certain values. Another is the playInvitation schema, executed when the avatar gets excited or wants to avoid being bored.
Each built-in behavior/schema must be activated by a rule, whose precondition tells whether it is active or not -- e.g., the wander_searching_food schema can only be executed if the avatar is hungry and there's no known source of food nearby. This is done through the RuleEngine, which evaluates the preconditions of a set of rules that define the avatar's behavior in order to select the next action (schema) to execute, update the avatar's emotional feelings, and update the relations between the avatar and all other objects in the world. See more details on RuleEngine here.
Avatars need persistence, so we can save inactive avatars to disk and improve system scalability and resistance to failures. For avatar persistence we currently use OpenCog's AtomSpace saving and loading code to handle the information inside the avatar's AtomSpace. There is little information outside the AtomSpace that needs to be persisted, and it's currently saved in a single separate text file.
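A minimal sketch of precondition-based selection follows. The `Rule` dataclass, the `strength` field used to break ties between active rules, and the state keys are assumptions made for this example; the actual RuleEngine also updates feelings and relations, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    precondition: Callable[[dict], bool]  # evaluated against the avatar state
    schema: str                           # schema to execute when chosen
    strength: float                       # hypothetical tie-breaking weight


def select_schema(rules, state) -> Optional[str]:
    """Return the schema of the strongest rule whose precondition holds."""
    active = [rule for rule in rules if rule.precondition(state)]
    if not active:
        return None
    return max(active, key=lambda rule: rule.strength).schema


rules = [
    Rule(
        "search_food",
        lambda s: s["hungry"] and not s["food_nearby"],
        "wander_searching_food",
        0.9,
    ),
    Rule("play", lambda s: s["excited"], "playInvitation", 0.5),
]

chosen = select_schema(rules, {"hungry": True, "food_nearby": False, "excited": True})
```

With both rules active, the hungrier-avatar rule wins only because of the assumed `strength` values; a different conflict-resolution policy could be substituted without changing the precondition check.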
A simple forgetting algorithm is used, which removes Atoms based on age.
Perception data can be forgotten as soon as it's transformed into behavior descriptions. Action and state data can be discarded once it's been used by the RuleEngine. Behavior predicates should last longer, and patterns learned from these predicates should last even longer.
However, while atoms are being used in a learning task, they should be tagged so they won't be forgotten until the task is done. Also, learned schemata may include some of these Atoms, in which case they shouldn't be forgotten at all.
Both here and in the OAC we'll use AttentionValues for Atoms, but no economic attention allocation. Simple rules for STI decay enable forgetting, and we'll just boost the LTI of Atoms that shouldn't be removed.
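The decay-and-protect scheme described above can be sketched in a few lines. The `Atom` dataclass below is a simplified stand-in for an Atom with an AttentionValue, and the decay factor, threshold, and the rule "any positive LTI protects an Atom" are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    sti: float        # short-term importance, decays over time
    lti: float = 0.0  # long-term importance, boosted to protect an Atom


def decay_and_forget(atoms, decay=0.9, sti_threshold=1.0):
    """Apply one decay step and return the Atoms that survive it.

    An Atom is forgotten when its STI falls below the threshold, unless
    its LTI was boosted (e.g. it is used by a learning task or a schema).
    """
    survivors = []
    for atom in atoms:
        atom.sti *= decay
        if atom.sti >= sti_threshold or atom.lti > 0:
            survivors.append(atom)
    return survivors


atoms = [
    Atom("old_percept", sti=1.0),
    Atom("fresh_percept", sti=10.0),
    Atom("learned_schema_atom", sti=0.5, lti=1.0),
]
remaining = decay_and_forget(atoms)
```

Tagging Atoms for the duration of a learning task would amount to raising their `lti` when the task starts and resetting it when the task completes.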