Erlang Implementation of CogServer and AtomSpace

From OpenCog

Note: I started on this, but then realised a few things, and found CouchDB performance pretty abysmal. So after going back over the Distributed Architecture RFC, I think the first task should be benchmarking various Repository implementations and developing the C++ code. Eventually an Erlang CogServer could be a good idea, but we should develop this after the Repository design is somewhat finalised.

This document outlines a reimplementation of the CogServer in the programming language Erlang. It also argues strongly in favour of using CouchDB (itself written in Erlang) as the distributed AtomSpace.

Thus, instead of adapting the existing C++ code to allow for concurrency, parallelism, and distributed processing - which would inevitably be a painful and error-prone process - Joel Pitt has been researching the merits of using Erlang, whose design is immensely suitable for the task at hand (namely distributed, concurrent, fault-tolerant processing with live code updates).

There are many important tasks to be working on in OpenCog, so why waste time re-implementing the core of the project? I believe it's important to get the architecture correct, with scalability and the distributed nature of the AtomSpace as core aspects that MindAgents and modules need to at least be aware of, if not specifically tailored to, so that they can work with distributed knowledge without carrying out operations that are expensive in terms of network or processor usage. The problem of general intelligence is partly one of optimisation and efficiency: if we had infinite computational resources, we could just implement AIXI.

Outline

The general idea so far is:

  • Use CouchDB as a distributed and persistent AtomSpace.
  • Create a CogServer that allows access to the AtomSpace using Erlang.
  • Use Thrift to design the AtomSpace interface, which would allow the Erlang-implemented CogServer to service requests from a variety of languages.

CouchDB

CouchDB is a { key: value } data store, designed to be fault tolerant and to handle conflicts. It allows the creation of views, which are roughly equivalent to indexes maintained on CouchDB instances. Essentially each view is a map function, and a reduce function can optionally be stored alongside it. Views are only calculated on first retrieval, then stored as B-trees and updated as necessary.

Using this map/reduce functionality, it should be possible to easily run queries such as counting the number of atoms of each Type, while letting CouchDB handle updating the indexes/views (see here for some idea of how to construct these queries as map-reduce functions).
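As an illustration, a per-Type count might look something like the sketch below. The `type` field and document shape are assumptions about how atoms could be stored in CouchDB; the small harness at the bottom merely simulates, in Node.js, how CouchDB evaluates a grouped view (real views run inside CouchDB itself).

```javascript
// Hypothetical CouchDB view functions for counting atoms per Type.
// "doc.type" is an assumed field; CouchDB only sees JSON documents.
const map = function (doc) {
  if (doc.type) {
    emit(doc.type, 1); // one row per atom, keyed by its Type
  }
};

// CouchDB invokes reduce with (keys, values); summing the values
// yields a count per Type when the view is queried with group=true.
const reduce = function (keys, values) {
  return values.reduce((a, b) => a + b, 0);
};

// Local harness simulating CouchDB's view evaluation, purely for
// illustration of the map/reduce semantics.
function queryGrouped(docs) {
  const rows = [];
  global.emit = (key, value) => rows.push([key, value]);
  docs.forEach((doc) => map(doc));
  const grouped = {};
  for (const [key, value] of rows) {
    (grouped[key] = grouped[key] || []).push(value);
  }
  const result = {};
  for (const key of Object.keys(grouped)) {
    result[key] = reduce(null, grouped[key]);
  }
  return result;
}

const counts = queryGrouped([
  { type: 'ConceptNode', name: 'cat' },
  { type: 'ConceptNode', name: 'dog' },
  { type: 'InheritanceLink' },
]);
// counts → { ConceptNode: 2, InheritanceLink: 1 }
```

In a real deployment the map and reduce functions would live in a design document, and CouchDB would incrementally maintain the B-tree as atoms are added or removed.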

CouchDB also has strong crash-safety guarantees: if an instance dies (hopefully due to a power failure rather than a bug!), it can restart and things will gracefully resume. Disk operations are atomic at the appropriate points. A draft of the forthcoming O'Reilly book is available.

Optionally, given CouchDB's ability to replicate between nodes, it may be possible to run a local CouchDB database on each CogNode as a form of local cache to avoid network overhead.
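A CogNode could keep its local cache in sync using CouchDB's `/_replicate` endpoint. The sketch below only constructs the request body; the host names and the `atomspace` database names are hypothetical placeholders, not anything defined in this design.

```javascript
// Sketch of a replication request a CogNode might send to its local
// CouchDB instance. Host and database names are illustrative only.
const replicationRequest = {
  source: 'http://central-atomspace:5984/atomspace', // shared store (hypothetical)
  target: 'http://localhost:5984/atomspace_cache',   // local cache (hypothetical)
  continuous: true, // keep pulling changes as they arrive
};

// This body would be POSTed to http://localhost:5984/_replicate
// with Content-Type: application/json.
const body = JSON.stringify(replicationRequest);
```

Continuous replication would let the local cache track the shared store without each MindAgent paying the network round-trip cost per atom access.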

Erlang

Thrift

Thrift allows service interfaces to be defined using the Thrift IDL. From this, server and client libraries can be generated for various languages. We'd create a server in Erlang, and then allow client libraries to be built to simplify access to it.
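As a rough sketch, an AtomSpace service definition in Thrift IDL might look like the fragment below. The struct fields and method names are illustrative assumptions, not an agreed interface.

```thrift
/* Hypothetical AtomSpace interface sketch -- names and fields are
 * illustrative, not a settled design. */
struct Atom {
  1: i64 handle,
  2: string type,
  3: string name,
}

service AtomSpace {
  Atom getAtom(1: i64 handle),
  i64 addNode(1: string type, 2: string name),
  list<i64> getHandlesByType(1: string type),
}
```

The Thrift compiler would generate the Erlang server skeleton and client stubs for C++, Python, and other languages from a definition like this.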

This will add a networking layer to AtomSpace accesses, so yes, this will be slower than modules internal to the CogServer. Eventually we could expose Erlang calls directly to C++ and other languages to avoid the additional network layer, but this may come at the cost of breaking the concurrency and distributed nature of an Erlang CogServer.

One possibility is that the existing code base could continue to be used with minor changes, with the Erlang "CogServer" acting as a "backing store" (see opencog/atomspace/BackingStore.h). The in-memory AtomTable of the current C++ implementation would act as a cache, although it would need to be modified to account for conflicts in the distributed AtomSpace.