OpenCog Dialogue Application

Introduction

This page describes work in progress and planned work for an OpenCog-powered dialogue system.

Initial Prototype Release Functionality

There's a prototype chatbot, originally developed by Linas and recently updated by Rodas. It's very simple at the moment. It connects to IRC (channel #opencog on irc.freenode.net) as cogita-bot and listens for mentions of its nickname (or cogita, or cog). It acknowledges declarative sentences it is told, and it answers Wh* queries based on what it has already been told. Work is in progress on truth-value queries (is something true or false?) using the backward chainer.

The bot has no concept of chatting with specific people (it doesn't even distinguish a single person chatting with it from multiple people). It just parses input sentences and, when those are questions, uses the fuzzy pattern matcher to answer them. Still, it exercises both our natural language understanding and natural language generation pipelines.
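
As a toy illustration of this listening behaviour, here is a sketch in Python. The trigger names come from the paragraph above, but the Wh-word routing is a crude stand-in for the real RelEx-based pipeline; nothing here is the actual bot code:

    import re

    # Names the bot answers to, per the paragraph above.
    TRIGGERS = ("cogita-bot", "cogita", "cog")
    WH_WORDS = ("who", "what", "when", "where", "which", "why", "how")

    def is_addressed_to_bot(line: str) -> bool:
        """True if an IRC line mentions one of the bot's names."""
        words = re.findall(r"[\w-]+", line.lower())
        return any(t in words for t in TRIGGERS)

    def classify(text: str) -> str:
        """Crude stand-in for the real pipeline: route Wh-questions to
        question answering, everything else to acknowledgement."""
        first = text.lstrip().split(" ", 1)[0].lower().strip("?,.")
        return "wh-question" if first in WH_WORDS else "declarative"

    print(classify("What is an atom?"))   # -> wh-question
    print(classify("Berlin is a city."))  # -> declarative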

Initial Prototype Release Architecture

The architecture for the prototype chatbot is described here:

https://github.com/opencog/opencog/blob/master/opencog/nlp/chatbot/README

We have three processes, all communicating via sockets:

  1. A bridge daemon, which is the intermediary between OpenCog and IRC. It listens for input on the IRC channel and forwards text to OpenCog via scheme. It waits for the CogServer's answer and forwards it back to IRC.
  2. A RelEx server, which parses the incoming text. Inside the CogServer, a scheme function sends the text to RelEx and receives the resulting Atoms back as scheme.
  3. A CogServer whose purpose is to host an Atomspace and listen for queries from the bridge daemon. Once RelEx has parsed the text of a query, the result is inserted into the Atomspace and the Atomese parts of RelEx2Logic are run. If the sentence is a Wh-question, the fuzzy pattern matcher is called to look for answers, and SuReal converts those answers back into English.
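
To make the round trip above concrete, here is a minimal Python sketch of the bridge's side of things. It assumes the CogServer's default telnet shell port (17001) and its "scm" shell command, and `(process-query ...)` is a stand-in for the chatbot's scheme entry point, so treat the names and details as assumptions rather than a copy of the real daemon:

    import socket

    COGSERVER = ("localhost", 17001)  # default CogServer shell port (assumption)

    def ask_cogserver(text: str) -> str:
        """Open a connection to the CogServer's telnet scheme shell,
        hand the utterance to a chatbot entry point, and return the
        printed reply. `(process-query ...)` is a hypothetical stand-in
        for the real scheme function."""
        with socket.create_connection(COGSERVER) as sock:
            shell = sock.makefile("rw", encoding="utf-8", newline="\n")
            shell.write("scm hush\n")  # enter the scheme shell quietly
            shell.write('(process-query "irc-user" "{}")\n'
                        .format(text.replace('"', '\\"')))
            shell.flush()
            sock.shutdown(socket.SHUT_WR)  # tell the server we're done sending
            return shell.read()

The real daemon additionally handles IRC registration, reconnection, and so on; the point is just the socket round trip.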

An interesting note is that the CogServer runs no MindAgents. Chatting is done in a request-response way, so all of this could happen without the CogServer by running, for instance, a scheme server. All processing inside the CogServer happens in a connection-handling thread launched by the NetworkServer -- see Poor man's multithreading via Atomese. In principle this means we could process multiple queries in parallel (though with confusing results, given the lack of separation between interaction channels).

Figure 1. Prototype Chatbot architecture and MindAgent-less CogServer

Medium-Term Planned Functionality

Over the next several months we plan to introduce many improvements to this application. Right now I'm more concerned with how they impact the architecture of single-machine OpenCog than with the details of how the pieces will fit together, so this section will be brief (and anyone is more than welcome to add details).

Planned improvements include:

  • Background knowledge, initially from Simple English Wikipedia and ConceptNet.
  • Multiple speech act schemata: conventional opening, conventional closing, answer question, ask question, dump stream of consciousness.

  • ECAN for importance control, which is used to decide what gets said in "stream of consciousness" mode.
  • A background-running forward chainer for connecting input sentences to background knowledge.
  • A request-driven, shallower backward chaining heuristic for question answering.
  • OpenPsi to provide the chat agent with emotions and motivations, which are used to decide which speech act schema to use, and in the future, to guide what gets selected for expression.
  • Execution Management to handle coherent switching between speech act schemata if needed (i.e., so the bot doesn't lose context if it stops a stream of consciousness to answer a quick question).

Execution management is probably not strictly needed, but this provides a good testing application.
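
To make the interplay between OpenPsi and the speech act schemata concrete, here is a hedged sketch of schema selection. The urge names, values, and dispatch rule are invented for illustration; in the real system these would be OpenPsi demands living in the Atomspace:

    # Hypothetical motivation state; in the real system these would be
    # OpenPsi demands/urges represented as Atoms.
    urges = {"answer-question": 0.0, "socialize": 0.3, "express": 0.6}

    def select_schema(incoming_is_question: bool) -> str:
        """Pick the speech act schema with the strongest claim.
        A pending question overrides everything else, which is exactly
        the context switch Execution Management must keep coherent."""
        if incoming_is_question:
            return "answer-question"
        # Otherwise follow the strongest urge: stream of consciousness
        # when 'express' dominates, conventional opening when
        # 'socialize' does.
        name = max(urges, key=urges.get)
        return {"express": "stream-of-consciousness",
                "socialize": "conventional-opening"}.get(name, name)

    print(select_schema(False))  # -> stream-of-consciousness
    print(select_schema(True))   # -> answer-question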

Medium-Term Possible Architectures

Incorporating the improvements mentioned above means that a request-response model won't be sufficient anymore. We need multiple background processes: ECAN, reasoning, and OpenPsi. Currently, ECAN and OpenPsi are implemented through multiple MindAgents, and ECAN, at least, depends on the notion of the CogServer cycle, although removing this constraint might be straightforward.

This suggests that the ideal architecture for the medium-term chatbot would be the (properly) multithreaded CogServer. In the simplest case, we'd create one extra thread dedicated to background inference, and run ECAN and OpenPsi through the existing scheduler.

As this requires only one extra thread, a quick and dirty solution would be to trigger the background inference dynamics via a connection-handling thread created through the NetworkServer. I don't know if this would work (would a long-running scheme function screw up anything?), but if it does, it's supported by the existing CogServer.
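
A sketch of this quick-and-dirty variant: a client opens one extra, long-lived connection and feeds the CogServer a scheme loop, so the inference runs inside the connection-handling thread that the NetworkServer creates for that socket. The port and shell conventions are as above, and `(do-background-inference)` is a hypothetical placeholder for the real forward-chaining entry point:

    import socket
    import threading

    # Hypothetical scheme body; the real loop would invoke the forward
    # chainer over recently heard sentences.
    BACKGROUND_LOOP = '(while #t (do-background-inference) (usleep 100000))\n'

    def run_background_dynamics(host: str = "localhost", port: int = 17001):
        """Hold one long-lived connection open so the NetworkServer's
        connection-handling thread keeps executing the loop."""
        sock = socket.create_connection((host, port))
        sock.sendall(b"scm hush\n")
        sock.sendall(BACKGROUND_LOOP.encode("utf-8"))
        # Block forever; closing the socket would kill the handler thread
        # on the server side, and with it the background loop.
        while sock.recv(4096):
            pass

    threading.Thread(target=run_background_dynamics, daemon=True).start()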

So an implementation plan for background dynamics might be:

  1. Quick and dirty background inference via Scheme.
  2. Enable extra threads for long-running processes, and do background inference via one of those threads (see the sketch after this list).
  3. Refactor ECAN and OpenPsi so they don't depend on the cognitive cycle, do these as extra threads as well.
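
For step 2, the shape is an in-process worker instead of a network trick. A minimal sketch, with `forward_chain_step` standing in for whatever bounded unit of work the refactored chainer would expose:

    import threading
    import time

    stop = threading.Event()

    def forward_chain_step() -> None:
        """Placeholder for one bounded chunk of forward chaining."""
        time.sleep(0.1)

    def background_inference() -> None:
        # The long-running process the multithreaded CogServer would own
        # directly, instead of piggybacking on a connection handler.
        while not stop.is_set():
            forward_chain_step()

    worker = threading.Thread(target=background_inference, daemon=True)
    worker.start()
    # ... serve chat requests as before, then shut the worker down:
    stop.set()
    worker.join()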

We'd end up with an architecture like the one below:

Figure 2. Chatbot Architecture with Parallel Background Dynamics

This plan doesn't cover Execution Management, as we still need a detailed design for that.

Chatbot as Architecture Playground

The chatbot application lends itself to a few architecture-related explorations that are fun to think about.

For instance, how would we scale this so we can have many chatbots running at the same time, or, more interestingly, the same chatbot running separate conversations? Scaling the bridge and RelEx is trivial, as they can be scaled horizontally. But scaling the bits that run inside the CogServer is less so, especially if we integrate a lot of background knowledge. Launching one CogServer per session becomes a memory hog. This lets us play with the idea of a distributed Atomspace and some of the ideas being baked for Improving Single Machine Performance.

We can compare two deployment scenarios. In the simpler one, we rely on persistence to PostgreSQL and well-tuned ECAN to keep only a fraction of the background knowledge in RAM on each CogServer at any time. In the more complex one, we launch a dedicated background-knowledge server, and the chat-session CogServer instances hold only small Atomspaces with the current session, using IPC to access background knowledge.
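
A toy sketch of the read path in the second scenario: each session holds a small local store (standing in for its session Atomspace) and consults the shared background-knowledge server only on a miss. The host name and the "fetch" wire protocol are invented for illustration:

    import socket

    KNOWLEDGE_SERVER = ("kb-host", 17001)  # hypothetical shared server

    class SessionKnowledge:
        """Per-session view: a tiny local store standing in for the
        session Atomspace, plus remote lookups on a cache miss. The
        wire protocol here ('fetch <concept>') is invented for
        illustration only."""

        def __init__(self) -> None:
            self.local = {}

        def lookup(self, concept: str) -> str:
            if concept in self.local:
                return self.local[concept]
            with socket.create_connection(KNOWLEDGE_SERVER) as sock:
                sock.sendall(("fetch " + concept + "\n").encode("utf-8"))
                sock.shutdown(socket.SHUT_WR)
                answer = sock.makefile().read()
            # Cache locally; ECAN-style forgetting would trim this over time.
            self.local[concept] = answer
            return answer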

Another interesting idea is enabling multiple interaction channels inside one CogServer instance. This would require some sort of session identification in Atom form, to keep separate records of what's been said by each client. It would also further stress Execution Management with the need to handle a larger active schema pool.
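
One way to picture session identification in Atom form, with atoms sketched here as plain tuples. The "session:" naming convention is invented for illustration, though ConceptNode, SentenceNode, and MemberLink are ordinary Atom types:

    # Atoms sketched as nested tuples: (Type, name) or (Type, *outgoing).
    def record_utterance(session_id: str, sentence_atom: tuple) -> tuple:
        """Tie a parsed sentence to the interaction channel it came from,
        so later queries can be answered per session."""
        session = ("ConceptNode", "session:" + session_id)
        return ("MemberLink", sentence_atom, session)

    link = record_utterance("irc/#opencog/alice",
                            ("SentenceNode", "sentence@1234"))
    print(link)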

Finally, there's the idea of persistent conversations, in which the bot remembers each person and what has been said before. This is interesting from a MindCloud perspective, as many applications will share this requirement of storing and reloading a long-term memory of past interactions with a specific user.