EmbodimentLanguageComprehension DocumentationSketch

From OpenCog

Embodied Anaphora Resolution

The agent lives in a world with many objects, each with its own characteristics. For example, there may be multiple balls of varying colors and sizes. We represent this with a single node for the concept "ball", associated with the word "ball" (and possibly its synonyms), plus multiple nodes, each representing one instance.

As the agent interacts with the world, it acquires information about the objects it finds through perceptions. The perceptions associated with a given object are stored as other nodes linked to the node representing that specific object instance. All this information is represented using FrameNet relationships (described in the next section).

When the user says something like "Grab the red ball", the agent needs to find out which specific instance of ball the user is referring to. We call this process Reference Resolution (RR). RR selects instances using the information in the sentence together with a few heuristic rules.

In this example, we first select the concept nodes related to the word "ball". Then we examine all individual instances associated with these concepts, filtering them by the modifiers in the sentence (here the modifier is the adjective "red"; since the verb is "grab", we also look for objects that can be fetched). If we find more than one "fetchable red ball", a heuristic is used to select one (in this case, we choose the nearest instance).
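The selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ObjectInstance` class, the `VERB_REQUIREMENTS` table and the `resolve_reference` function are hypothetical names, and the attribute dictionaries stand in for the perception nodes linked to each instance.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class ObjectInstance:
    name: str        # instance identifier, e.g. "ball_2" (hypothetical)
    concept: str     # concept the instance belongs to, e.g. "ball"
    position: tuple  # (x, y) coordinates in the virtual world
    attributes: dict = field(default_factory=dict)  # perceived properties


# Assumption: each verb may require some attribute of its target object.
VERB_REQUIREMENTS = {"grab": {"fetchable": True}}


def resolve_reference(noun, modifiers, verb, instances, agent_position):
    """Return the nearest instance matching noun, modifiers and verb, or None."""
    required = {**modifiers, **VERB_REQUIREMENTS.get(verb, {})}
    candidates = [
        obj for obj in instances
        if obj.concept == noun
        and all(obj.attributes.get(k) == v for k, v in required.items())
    ]
    if not candidates:
        return None
    # Heuristic from the text: prefer the instance nearest to the agent.
    return min(candidates, key=lambda obj: dist(obj.position, agent_position))


world = [
    ObjectInstance("ball_1", "ball", (5.0, 5.0), {"color": "red", "fetchable": True}),
    ObjectInstance("ball_2", "ball", (1.0, 1.0), {"color": "red", "fetchable": True}),
    ObjectInstance("ball_3", "ball", (0.5, 0.5), {"color": "blue", "fetchable": True}),
]
target = resolve_reference("ball", {"color": "red"}, "grab", world, (0.0, 0.0))
print(target.name)  # ball_2
```

The blue ball is closer to the agent, but it is filtered out before the distance heuristic is applied, so the nearest red ball is selected.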

So Reference Resolution maps nouns in the sentences said by the user to actual objects in the virtual world, based on world knowledge obtained by the agent through perceptions.

We also need to map pronouns in the sentences to actual objects in the virtual world. For example, if the user says "I like the red ball. Grab it." we want to map the pronoun "it" to a specific red ball. This is done in two stages: first we use anaphora resolution to associate the pronoun "it" with the previously heard noun "ball"; then we use reference resolution to associate the noun "ball" with the actual object.

Anaphora resolution is an ambiguous process, since a given pronoun can have more than one candidate noun. Our anaphora resolution system is based on the Hobbs algorithm [hobbs]. Basically, when a pronoun (it, he, she, they and so on) is identified in a sentence, the Hobbs algorithm searches recent sentences for nouns that may fit this pronoun according to number, gender and other characteristics. The Hobbs algorithm is used to create a ranking of candidate nouns, ordered by time (most recently mentioned nouns come first).

We improve on the results of the Hobbs algorithm by using reference resolution, based on the agent's knowledge of the world, to choose the best candidate noun instead of relying on the original time-based ranking. Suppose the agent heard the following sentences:

1) "The ball is red."

2) "The stick is brown."

and then receives a third sentence:

3) "Grab it."

The anaphora resolver will build a list containing two options for the pronoun "it" in the third sentence: ball and stick. Given that the stick is the most recently mentioned noun, the agent will grab it instead of the ball, using the standard Hobbs ranking.

However, suppose the agent's history was:

1) "From here I can see a tree and a ball."

2) "Grab it."

The Hobbs algorithm returns "tree" and "ball" as candidate nouns, in this order. Using Reference Resolution, however, the agent concludes that a tree cannot be grabbed, so this candidate is discarded and "ball" is chosen.

So instead of relying only on syntactic information to perform anaphora resolution, we refine the candidates produced by the Hobbs algorithm using knowledge of the world obtained through embodied interaction with it.
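The filtering step in the two examples above can be sketched as follows. This sketch does not implement the Hobbs algorithm: the ranked candidate list is taken as given input, and `GRABBABLE`, `can_perform` and `resolve_pronoun` are hypothetical names standing in for the agent's world knowledge and for reference resolution.

```python
# Assumption: a toy stand-in for what the agent has learned about objects.
GRABBABLE = {"ball", "stick"}


def can_perform(verb, noun):
    """Return True if the action named by the verb can apply to the noun."""
    return verb != "grab" or noun in GRABBABLE


def resolve_pronoun(ranked_candidates, verb, can_perform):
    """Return the first candidate, in ranked order, that the verb can apply to."""
    for noun in ranked_candidates:
        if can_perform(verb, noun):
            return noun
    return None


# First history: both candidates can be grabbed, so the ranking decides.
print(resolve_pronoun(["stick", "ball"], "grab", can_perform))  # stick
# Second history: "tree" is ranked first but cannot be grabbed.
print(resolve_pronoun(["tree", "ball"], "grab", can_perform))   # ball
```

Keeping the ranking and the world-knowledge check separate means the original ordering is still used as a tie-breaker whenever more than one candidate is physically plausible.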

Query Processing System

Our agent is capable of answering simple questions about its feelings/emotions (happiness, fear, etc.) and about the environment in which it lives. To do that, we've created a module for processing questions. When a question is asked of the agent, it is first parsed by RelEx and classified as either a yes/no question or a discursive one. RelEx then rewrites the question as a list of Frames (based on FrameNet, http://framenet.icsi.berkeley.edu), which represent its semantic content. The question in Frames format is then processed by the agent, and the answer is also written in Frames. The answer Frames are sent to a module that converts them back to the RelEx format. Finally the answer, in RelEx format, is processed by another module, called NLGen, that generates the text of the answer in English.

Preparing/Matching Frames

In order to answer an incoming question, the agent tries to match the Frames list created by RelEx against the Frames stored in its own memory. However, it cannot use the incoming Frames in their original format, because they lack grounded information (information that connects the mentioned elements to the real elements of the environment). So two steps are executed before trying to match the Frames: Reference Resolution and Frames Rewriting. Reference Resolution was already explained in the Embodied Anaphora Resolution section. Frames Rewriting is a process that replaces the values of the incoming Frame elements with grounded values. Here is an example:

Incoming Frame (Generated by RelEx)

  DefinedFrameElementNode Color:Color
  WordInstanceNode "red@aaa"
  DefinedFrameElementNode Color:Entity
  WordInstanceNode "ball@bbb"
  WordInstanceNode "red@aaa"
  WordNode "red"

After Reference Resolution

  WordInstanceNode "ball@bbb"
  SemeNode "ball_99"

Grounded Frame (After Rewriting)

  DefinedFrameElementNode Color:Color
  ConceptNode "red"
  DefinedFrameElementNode Color:Entity
  SemeNode "ball_99"
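The rewriting shown in this example can be sketched as below. Atoms are modelled as plain (type, name) tuples, which is an illustrative encoding and not the AtomSpace API. The function `rewrite_frame` and the dictionaries `references` and `word_of` are hypothetical, and the rule that an unresolved word instance becomes a ConceptNode named after its word is an assumption inferred from the example.

```python
# Incoming frame: element name -> word instance produced by RelEx.
incoming_frame = {
    "Color:Color": ("WordInstanceNode", "red@aaa"),
    "Color:Entity": ("WordInstanceNode", "ball@bbb"),
}

# Word instance -> word, as in the incoming data above.
word_of = {("WordInstanceNode", "red@aaa"): ("WordNode", "red")}

# Output of Reference Resolution: word instance -> grounded object.
references = {("WordInstanceNode", "ball@bbb"): ("SemeNode", "ball_99")}


def rewrite_frame(frame, references, word_of):
    """Replace each element value with its grounded counterpart."""
    grounded = {}
    for element, value in frame.items():
        if value in references:
            # The word instance was resolved to an object in the world.
            grounded[element] = references[value]
        elif value in word_of:
            # Assumption: other words are mapped to a concept of the same name.
            grounded[element] = ("ConceptNode", word_of[value][1])
        else:
            grounded[element] = value
    return grounded


grounded_frame = rewrite_frame(incoming_frame, references, word_of)
print(grounded_frame["Color:Color"])   # ('ConceptNode', 'red')
print(grounded_frame["Color:Entity"])  # ('SemeNode', 'ball_99')
```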

Rewriting Frames is a necessary step to convert the incoming Frames to the same structure used by the Frames stored in the agent's memory. After rewriting, the new Frames are matched against the agent's memory. If all Frames are found there, the answer is known to the agent; otherwise it is unknown. If a yes/no question was asked and all Frames were matched successfully, the answer is "yes"; otherwise the answer is "no". If the question requires a discursive answer, the process is slightly different. For known answers, the matched Frames are converted into RelEx format by Frames2Relex and then sent to NLGen, which prepares the final English text of the answer. There are two types of unknown answers. The first occurs when at least one Frame cannot be matched against the agent's memory; the answer is then "I don't know". The second occurs when all Frames were matched successfully but they cannot be correctly converted into RelEx format, or NLGen cannot identify the incoming relations. In this case the answer is "I know the answer, but I don't know how to say it".
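For the yes/no case, the matching step amounts to checking that every grounded question Frame is present in memory. The sketch below assumes a deliberately simple encoding: each Frame is a name plus a set of element/value pairs, and the agent's memory is a Python set. The helper names `freeze`, `all_matched` and `answer_yes_no` are hypothetical.

```python
def freeze(name, elements):
    """Turn a frame into a hashable value so frames can be stored in a set."""
    return (name, frozenset(elements.items()))


# Hypothetical agent memory holding one grounded frame.
memory = {freeze("Color", {"Entity": "ball_99", "Color": "red"})}


def all_matched(question_frames, memory):
    """Return True if every grounded question frame is found in memory."""
    return all(frame in memory for frame in question_frames)


def answer_yes_no(question_frames, memory):
    return "yes" if all_matched(question_frames, memory) else "no"


is_red = [freeze("Color", {"Entity": "ball_99", "Color": "red"})]
is_blue = [freeze("Color", {"Entity": "ball_99", "Color": "blue"})]
print(answer_yes_no(is_red, memory))   # yes
print(answer_yes_no(is_blue, memory))  # no
```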


The Frames2Relex module is responsible for receiving a list of grounded Frames and returning another list containing the relations, in RelEx format, which represent the grammatical form of the sentence described by the given Frames. That is, the Frames list represents a sentence that the agent wants to say to another agent. Given that NLGen needs input in RelEx format in order to generate an English version of the sentence, Frames2Relex performs this conversion.

Frames2Relex was implemented as a rule-based system in which the preconditions are the required frames and the output is one or more RelEx relations. For example:

#Color(Entity,Color) => 
  present($2) .a($2) adj($2) _predadj($1, $2) 
    definite($1) .n($1) noun($1) singular($1) 
      .v(be) verb(be) punctuation(.) det(the)

Here the precondition comes before the symbol =>, and #Color is a frame that has two elements: Entity and Color. Each element is interpreted as a variable, with Entity = $1 and Color = $2. The effect, or output, of the rule is a list of RelEx relations.
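A minimal sketch of how such a rule could be applied is shown below. The dictionary layout of `RULE_COLOR` and the function `apply_rule` are assumptions made for illustration; only the rule content itself is taken from the example above.

```python
import re

# The #Color rule from the text, encoded as a hypothetical Python structure.
RULE_COLOR = {
    "frame": "Color",
    "elements": ["Entity", "Color"],  # positions of $1 and $2
    "output": [
        "present($2)", ".a($2)", "adj($2)", "_predadj($1, $2)",
        "definite($1)", ".n($1)", "noun($1)", "singular($1)",
        ".v(be)", "verb(be)", "punctuation(.)", "det(the)",
    ],
}


def apply_rule(rule, frame_name, element_values):
    """Return the rule output with variables replaced, or None if no match."""
    if frame_name != rule["frame"]:
        return None
    if any(name not in element_values for name in rule["elements"]):
        return None

    def substitute(match):
        # "$1" refers to the first element, "$2" to the second, and so on.
        index = int(match.group(1)) - 1
        return element_values[rule["elements"][index]]

    return [re.sub(r"\$(\d+)", substitute, rel) for rel in rule["output"]]


relations = apply_rule(RULE_COLOR, "Color", {"Entity": "ball", "Color": "red"})
print(relations[3])  # _predadj(ball, red)
```

Relations without variables, such as verb(be), pass through unchanged, so one rule can mix fixed grammatical material with slots filled from the frame.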


To generate sentences in natural language, we use the NLGen framework, which converts RelEx format into natural language according to a previously processed corpus. NLGen receives RelEx output as its input and matches this input against preprocessed sentences to see which one fits it best. The matched sentence is then returned. To illustrate the whole process, let's consider the sentence "The red ball is near the tree". When parsed by RelEx, this sentence is converted to:

_obj(near, tree)
_subj(near, ball)
_to-do(be, near)
_subj(be, ball)

So, if sentences with this format are in the NLGen corpus, these relations are stored by NLGen and will be used to match future relations that must be converted into natural language. It is worth noting that NLGen keeps the syntactic and semantic format of the sentence. So sentences like "The stick is next to the fountain" will also be matched even if the corpus contains only the sentence "The ball is near the tree". That is, the values of the nouns are not significant for relation matching.
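The idea that noun values do not affect matching can be illustrated with the sketch below. It is not NLGen's actual matching procedure: `abstract_nouns` is a hypothetical helper, relations are encoded as (name, head, dependent) tuples, and the set of nouns is assumed to be known in advance. The sketch abstracts nouns only, so it uses a second sentence with the same preposition ("The stick is near the fountain"), and it assumes both sentences list their relations in the same order.

```python
def abstract_nouns(relations, nouns):
    """Replace each noun with a numbered placeholder, keeping the structure."""
    placeholders = {}
    pattern = []
    for name, head, dependent in relations:
        args = []
        for word in (head, dependent):
            if word in nouns:
                # The same noun always gets the same placeholder.
                placeholders.setdefault(word, f"$noun{len(placeholders) + 1}")
                args.append(placeholders[word])
            else:
                args.append(word)
        pattern.append((name, *args))
    return frozenset(pattern)


corpus_relations = [
    ("_obj", "near", "tree"),
    ("_subj", "near", "ball"),
    ("_to-do", "be", "near"),
    ("_subj", "be", "ball"),
]
new_relations = [
    ("_obj", "near", "fountain"),
    ("_subj", "near", "stick"),
    ("_to-do", "be", "near"),
    ("_subj", "be", "stick"),
]

corpus_pattern = abstract_nouns(corpus_relations, {"tree", "ball"})
new_pattern = abstract_nouns(new_relations, {"fountain", "stick"})
print(corpus_pattern == new_pattern)  # True
```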

If the agent wants to say that "The red ball is near the tree", we need to invoke NLGen with the above RelEx contents as input. However, the knowledge that the red ball is near the tree is stored as frames, not in RelEx format. More specifically, in this case the stored frame is Locative_relation, containing the following elements and respective values:

  • Figure => red ball
  • Ground => tree
  • Relation_type => near

So we must convert this frame and its element values into the RelEx format accepted by NLGen. To do so, we have implemented a rule-based system in which the preconditions are the required frames and the output is the corresponding RelEx format that will generate the sentence representing the frames. The output of a rule may contain variables that must be replaced by the frame element values. For the example above, the output _subj(be, ball) is generated from the rule output _subj(be, $var1), with $var1 replaced by the value of the Figure element.

Considering specifically question-answering (QA), the Language Comprehension module (detailed in the Pet Brain section) represents the answer as a list of frames. In this case, we may have the following situations:

  • the frames match a precondition and the RelEx output is correctly recognized by NLGen, which generates the expected sentence as the answer;
  • the frames match a precondition, but NLGen does not recognize the RelEx output generated (maybe the sentence wasn't in its initial corpus). In this case, the answer will be "I know the answer, but I don't know how to say it", which means that the question was answered correctly by the Language Comprehension module, but NLGen could not generate the correct sentence;
  • the frames don't match any precondition, and the answer will also be "I know the answer, but I don't know how to say it" because the corresponding rule is missing, which doesn't mean that the agent doesn't know the answer;
  • finally, if no frames are generated as the answer by the Language Comprehension module, the agent's answer will be "I don't know".

If the question is a truth-question, then NLGen is not required. In this case, the creation of frames as the answer is taken as a "Yes"; otherwise, the answer will be "No" because it was not possible to find the corresponding frames.
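The situations listed above can be summarized in one small decision function. This is a sketch under assumptions: `answer_question` is a hypothetical name, and the two converter arguments are stand-ins for Frames2Relex and NLGen that are assumed to return None when no rule matches or when the relations are not recognized.

```python
DONT_KNOW = "I don't know"
CANT_SAY = "I know the answer, but I don't know how to say it"


def answer_question(answer_frames, frames_to_relex, nlgen, truth_question=False):
    """Choose the agent's reply from the frames produced as the answer."""
    if truth_question:
        # Truth-questions do not need NLGen at all.
        return "Yes" if answer_frames else "No"
    if not answer_frames:
        return DONT_KNOW
    relations = frames_to_relex(answer_frames)  # None if no rule matched
    sentence = nlgen(relations) if relations else None  # None if not recognized
    return sentence if sentence is not None else CANT_SAY


# Hypothetical stand-ins for the real converters.
frames = ["Locative_relation(red ball, tree, near)"]
ok_rules = lambda fs: ["_subj(be, ball)"]
no_rules = lambda fs: None
ok_nlgen = lambda rels: "The red ball is near the tree"
no_nlgen = lambda rels: None

print(answer_question(frames, ok_rules, ok_nlgen))  # The red ball is near the tree
print(answer_question(frames, ok_rules, no_nlgen))  # I know the answer, but ...
print(answer_question([], ok_rules, ok_nlgen))      # I don't know
```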

To illustrate the QA process, consider that the agent was asked "What is next to the tree?".


The question is parsed by RelEx, which creates the frames indicating that the sentence is a question regarding a location reference (next) to an object (tree). The frame that represents questions is called Questioning. It contains the elements Manner, which indicates the kind of question (truth-question, what, where, and so on); Message, which indicates the main term of the question; and Addressee, which indicates the target of the question. To indicate that the question is related to a location, the Locative_relation frame is also created, with a variable inserted in its element Figure, which represents the expected answer (in this specific case, the object that is next to the tree).

The question-answer module tries to match the question frames against the Atom Table in order to find a value for the variable element. Suppose that the object that is next to the tree is the red ball. The module will then match all the requested frames and determine that the answer is the value of the element Figure of the Locative_relation frame stored in the Atom Table. Location frames are then created indicating the red ball as the answer. These frames are converted into RelEx format by the Frames2Relex rule-based system as described above, and NLGen generates the expected sentence "the red ball is next to the tree".
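The variable-matching step can be sketched as below. The encoding is an assumption: frames are (name, elements) pairs held in a plain list rather than in the Atom Table, and `VARIABLE` and `match_query` are hypothetical names.

```python
VARIABLE = "$var"  # hypothetical marker for the element to be filled in

# Hypothetical stored knowledge: the red ball is next to the tree.
memory = [
    ("Locative_relation",
     {"Figure": "red ball", "Ground": "tree", "Relation_type": "next to"}),
]


def match_query(query, memory):
    """Return the values bound to the query variables, or None if no match."""
    q_name, q_elements = query
    for name, elements in memory:
        if name != q_name:
            continue
        bindings = {}
        for element, value in q_elements.items():
            if element not in elements:
                break
            if value == VARIABLE:
                bindings[element] = elements[element]
            elif elements[element] != value:
                break
        else:
            # Every element matched or was bound, so this frame answers the query.
            return bindings
    return None


question = ("Locative_relation",
            {"Figure": VARIABLE, "Ground": "tree", "Relation_type": "next to"})
print(match_query(question, memory))  # {'Figure': 'red ball'}
```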

Pet Brain

The whole system is composed of several modules. The environment is reproduced by a Virtual World System called Multiverse (http://www.multiverse.net). We've customized the Multiverse Server and turned it into a Proxy that relays communication between the Virtual World and the Pet Brain. As shown in Figure XXXX, RelEx is used by the Multiverse Proxy to parse the sentences given by the user via the Multiverse Client, and only the relations it produces are sent to the Pet Brain.


Pet Brain is an instance of the OpenCog (http://www.opencog.org) server. It is composed of several technologies that are used to create an artificial mind that controls a virtual dog which 'lives' in a virtual world. Pet Brain stores all its knowledge inside the AtomSpace [reference]. Part of this knowledge is produced by the agent's sensors (exteroceptive and proprioceptive) and handled by the Perception Manager. The agent's knowledge about the whole environment is used by the Language Comprehension module to link the elements mentioned in the sentences heard by the agent to the objects inside the virtual world. An agent can recognize and execute commands requested by another agent/avatar, in addition to answering questions.


[hobbs] Hobbs, Jerry R. 1978. Resolving pronoun references. Lingua, 44:311-338.

[goertzel] Relex paper reference