Microplanner

The Microplanner, together with SuReal, implements a natural language generation system within OpenCog. Given a set of abstract relationships, the two components convert them into grammatically (syntactically) valid sentences. This design replaces the older ideas described in SegSim. A small example can be found in /examples/sureal.

SuReal is responsible for constructing individual sentences. The microplanner is responsible for breaking up the set of conceptual relations into one or more sentences, so that no sentence is overly long and each sentence groups together the parts that are related.


Implementation

The microplanner is available in /opencog/nlp/microplanning and is based on the design discussed on this page.

This page gives a rough outline for an initial microplanning algorithm. It is subject to revision based on discussion and experimentation.

The purpose of the microplanner is to begin with a set S of Atoms (plus, perhaps, some parameters indicating what kind of utterance to produce), and produce a series of grammatical sentences constituting a purposeful utterance based on the "meaning" of the set S of Atoms. To do this, other Atoms outside the set S (but in the Atomspace) may need to be used as well.

For simplicity, let us assume that the microplanner begins with the intent to produce a certain type of utterance: either declarative, interrogative, imperative or interjective. For simplicity in this initial outline I'll focus on declarative and interrogative sentences. These can be implemented first, and then the microplanner can be extended to the other two cases.

Also, for simplicity, I will assume here that the nodes in S are linked directly by ReferenceLinks to WordNodes. So I will ignore the Word Selection problem, which however must be dealt with afterwards (for the case where some of the nodes in S are not directly linked to words, or are linked to many different words so that an expressive word choice must be made).

Rough Algorithm Sketch

For a declarative utterance, the microplanner's goal is to produce sentences conveying the body of information in S (S is then just some collection of nodes and links).

For an interrogative utterance, the microplanner's goal is to produce sentences either:

  • Asking for the truth value of some Link L (this link then constitutes the set S)
  • Asking for what value to fill into some VariableNode $V (the set S then constitutes the constraints characterizing $V; so basically S constitutes a set of Atoms similar to a PatternMatcher query, which is asking for a variable value to be filled in)
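The two interrogative cases can be told apart mechanically. As a minimal sketch, assuming Atoms are modelled as plain Python tuples of names and that variables are the names beginning with "$" (both are illustrative stand-ins, not the AtomSpace API), the hypothetical helper question_kind classifies a set S:

```python
from typing import Iterable, Set, Tuple

# Toy stand-in for an AtomSpace link, e.g. ("eat", "dog", "hat").
# This representation is an assumption made for illustration only.
Atom = Tuple[str, ...]


def variables_in(atoms: Iterable[Atom]) -> Set[str]:
    """Collect every name that looks like a variable (starts with '$')."""
    return {name for atom in atoms for name in atom if name.startswith("$")}


def question_kind(atoms: Iterable[Atom]) -> str:
    """Return 'variable-fill' if S constrains a variable, else 'truth-value'."""
    return "variable-fill" if variables_in(atoms) else "truth-value"
```

A set such as [("eat", "$V", "hat"), ("isa", "$V", "animal")] reads like a pattern-matcher query and is classified as a variable-fill question, while a single ground link is classified as a truth-value question.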

In either the declarative or interrogative case, it seems a similar procedure can be used. I'll elaborate in detail on the declarative case below, but the interrogative case could be filled in quite similarly.

At a high level, the procedure could look very roughly like this:

MakeUtterance(Atom-Set S, utterance type t)

\\ the utterance type t = declarative, interrogative, imperative, etc.

   Initialize S_leftover=S

   Repeat until S_leftover = empty  :

      Make a sentence by calling MakeSentence(S_leftover, S - S_leftover, t)

      Let S_used = the Atoms from S used inside the
      above invocation of MakeSentence (i.e. the Atoms expressed
      in the sentence that MakeSentence produces)

      Let S_leftover = S_leftover - S_used
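The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: Atoms are modelled as hashable strings, and make_sentence is a hypothetical callback standing in for MakeSentence that returns the sentence text together with the Atoms it expressed. The sketch passes the Atoms already expressed as the second argument, which is one reading of S_just_said, and it adds a guard (not in the outline) that stops when a call makes no progress, so the loop cannot run forever.

```python
from typing import Callable, Iterable, List, Set, Tuple

Atom = str  # stand-in for an AtomSpace atom; an assumption for illustration

# Hypothetical signature: (available, just_said, utterance_type) -> (sentence, used)
SentenceMaker = Callable[[Set[Atom], Set[Atom], str], Tuple[str, Set[Atom]]]


def make_utterance(
    atoms: Iterable[Atom],
    utterance_type: str,
    make_sentence: SentenceMaker,
) -> List[str]:
    """Emit sentences until every Atom in `atoms` has been expressed."""
    leftover: Set[Atom] = set(atoms)   # S_leftover
    just_said: Set[Atom] = set()       # Atoms expressed so far in this utterance
    sentences: List[str] = []
    while leftover:
        sentence, used = make_sentence(set(leftover), set(just_said), utterance_type)
        used = used & leftover         # only count Atoms that were still pending
        if not used:
            break                      # guard: nothing expressed, so stop
        sentences.append(sentence)
        just_said |= used
        leftover -= used               # S_leftover = S_leftover - S_used
    return sentences
```

With a toy make_sentence that expresses at most two Atoms per call, three Atoms yield two sentences, which mirrors the intended splitting behaviour.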


The real work is then done in something like the following (for the case of declarative utterances):


MakeSentence(Atom-Set S_available, S_just_said, utterance type t)

\\ S_available = the Atoms available for utterance, within S

\\ S_just_said = the Atoms just said within the same utterance,
\\     useful for inserting anaphora


   Initialize S_leftover=S_available

   Pick a top-level sentence form (e.g. for declarative it could be SV or SVO).

   Pick a set S_start consisting of Atoms in S_leftover that match the 
      top-level sentence form chosen
      (i.e. that match the output of the RelEx2Logic rule corresponding to the
      top-level sentence form)

   Let S_working = S_start

   If S_working contains Atoms from S_just_said, or from previous versions of
      S_working within this invocation of MakeSentence, then consider inserting
      anaphora to refer to them

   (**) Use SuReal (and a reverse of the chosen RelEx2Logic rule) to produce a 
      sentence corresponding to S_working  
   
          [NOTE: SuReal doesn't insert anaphora.  So Atoms indicating anaphoric
          words must be inserted before SuReal is invoked]

   If the sentence produced is too long or complex, goto step ***
       (i.e., decide that the sentence is done and additional Atoms must
       go into a new sentence, referring back to the current one)

   Now, pick a RelEx2Logic rule whose output matches some set S_new of Atoms that
      overlaps with S_working; if no such rule exists, goto step ***

   Let S_working = S_working + S_new

   Goto step **

   (***)

   DONE
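The grow-and-check loop of MakeSentence can be sketched in Python as below. Everything here is a simplification under stated assumptions: Atoms are toy tuples of names, "a RelEx2Logic rule whose output overlaps with S_working" is approximated by the hypothetical test shares_node (the candidate mentions a name already in the working set), SuReal is replaced by a caller-supplied realize function, and "too long or complex" is reduced to a word count. Anaphora insertion and the choice of sentence form are left out; pick_start stands in for picking S_start.

```python
from typing import Callable, Optional, Set, Tuple

# Toy stand-in for an AtomSpace link, e.g. ("eat", "dog", "hat").
Atom = Tuple[str, ...]


def shares_node(atom: Atom, working: Set[Atom]) -> bool:
    """True if `atom` mentions any name already used in `working`."""
    names = {name for w in working for name in w}
    return any(name in names for name in atom)


def make_sentence(
    available: Set[Atom],
    pick_start: Callable[[Set[Atom]], Optional[Atom]],
    realize: Callable[[Set[Atom]], str],
    max_words: int = 8,
) -> Tuple[str, Set[Atom]]:
    """Grow a working set of Atoms until the realized sentence gets too long."""
    start = pick_start(available) if available else None
    if start is None:
        return "", set()
    working: Set[Atom] = {start}           # S_working = S_start
    while True:
        sentence = realize(working)        # step (**): stand-in for SuReal
        if len(sentence.split()) > max_words:
            break                          # too long: this sentence is done
        candidates = sorted(
            a for a in available - working if shares_node(a, working)
        )
        if not candidates:
            break                          # nothing overlapping left to add
        working.add(candidates[0])         # S_working = S_working + S_new
    return sentence, working
```

As in the outline, the length check happens after realization, so the sentence that first crosses the limit is the one returned. A variant that backs off to the previous, shorter sentence would be an equally plausible design choice.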

Suggested Order of Proceeding

As this is new territory, I suggest proceeding incrementally.

  • First, make it work for declarative utterances, with no anaphora. Start with some Atom sets that could be turned into either one or two sentences, depending on parameters (i.e. the threshold for how long or complex a sentence may be)
  • Then implement simple insertion of pronominal anaphora (it, he, she)
  • Then make it work for interrogative utterances, in the case where the question is a single sentence
  • Then try multi-sentence interrogatives. Note that an interrogative utterance, although its head sentence should be a question, may also produce declarative sentences along the way, e.g. "Where did the dog go? The dog who ate my hat."
  • Make sure pronominal anaphora work correctly with interrogatives
  • Imperatives and interjectives (which should be simpler, actually)
  • Deal with nominal anaphora, at least in some simple cases as a placeholder for future work
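The "simple insertion of pronominal anaphora" step could start from something as small as the sketch below. It assumes the same toy tuple representation of Atoms, a hypothetical gender lookup table supplied by the caller, and a fixed pronoun table. It operates on the Atoms before realization, in line with the note that SuReal does not insert anaphora itself. Grammatical case (he/him) and ambiguity between several referents sharing a pronoun are deliberately ignored.

```python
from typing import Dict, Set, Tuple

Atom = Tuple[str, ...]  # toy stand-in, e.g. ("eat", "dog", "hat")

# Illustrative pronoun table; only nominative forms are handled.
PRONOUNS = {"male": "he", "female": "she", "neuter": "it"}


def insert_pronouns(
    working: Set[Atom],
    just_said: Set[str],
    gender: Dict[str, str],
) -> Set[Atom]:
    """Replace arguments that were already mentioned with a pronoun."""
    rewritten: Set[Atom] = set()
    for predicate, *args in working:
        new_args = tuple(
            PRONOUNS[gender.get(arg, "neuter")] if arg in just_said else arg
            for arg in args
        )
        rewritten.add((predicate,) + new_args)
    return rewritten
```

For example, if "dog" was mentioned in the previous sentence and has no gender entry, ("eat", "dog", "hat") becomes ("eat", "it", "hat") before it is handed on for realization.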

...

This should give a microplanner that works, though perhaps awkwardly. Future directions can then be discussed, such as using statistical corpus analysis to guide choices.