Distributed AtomSpace

Obsolete

This page presents ideas that are not really applicable to the current code base. See ProxyNode if you wish to get working code that can actually configure a distributed AtomSpace.

Distributed AtomSpace

There are several ways to network AtomSpaces together. Two AtomSpaces may exchange Atoms with each other, peer-to-peer. Alternatively, a large shared AtomSpace may be served up by a server, with many client AtomSpaces interacting with it, sending Atoms to it or receiving Atoms from it. These possibilities, and a few more, are given precise definitions in the Networked AtomSpaces wiki page.

The need for a distributed AtomSpace arises when a large shared AtomSpace can no longer meet performance requirements. This can happen when there are more clients than the number of network connections the server can handle. Perhaps a dataset is so large that the server cannot keep it entirely in RAM, so that every client request is forced to access the disk, which is slow. Perhaps clients are running complex queries, which burn large amounts of CPU in the server, and the server is overwhelmed. All three of these problems suggest a move to a distributed architecture. However, the specific "best" solution depends on which of these problems is actually arising. For the agi-bio/MOZI project, it is very much the third case. Please review the Networked AtomSpaces wiki page for additional issues that can arise.

The design of a distributed AtomSpace requires distinguishing between provider AtomSpaces, which work with one another to provide data, and user AtomSpaces, which are the clients that want to "do something" with that data. From the user point of view, the distributed AtomSpace needs to look just like a shared AtomSpace: some big blob that is holding the data that you want. Internally, things are very different: the providers need to be able to coordinate efficiently with one another to handle user requests. The principal design difficulty is to organize the providers into a meaningful structure so that they can get the job done. It might be useful to envision construction workers, or perhaps white-collar office workers, and ask: how can they be organized to get the job done? So it is with provider AtomSpaces.
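
The provider/user split can be illustrated with a toy model. The sketch below is plain Python, not OpenCog code: the class names Provider and DistributedFacade are hypothetical, Atoms are represented as plain strings, and assigning each Atom to a provider by hashing is only one possible way of organizing the providers, assumed here for brevity. The point is only that the user talks to a single object while the data is spread over several providers.

```python
import hashlib


class Provider:
    """Hypothetical provider AtomSpace: holds one slice of the data."""

    def __init__(self, name):
        self.name = name
        self.atoms = {}  # atom (as a string) -> value attached to it

    def store(self, atom, value):
        self.atoms[atom] = value

    def fetch(self, atom):
        return self.atoms.get(atom)


class DistributedFacade:
    """What a user AtomSpace sees: one big blob holding the data."""

    def __init__(self, providers):
        self.providers = list(providers)

    def _owner(self, atom):
        # Assumption: a stable hash of the atom picks its provider.
        # Other organizations (by type, by locality, ...) are possible.
        digest = hashlib.sha256(atom.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.providers)
        return self.providers[index]

    def store(self, atom, value):
        self._owner(atom).store(atom, value)

    def fetch(self, atom):
        return self._owner(atom).fetch(atom)


# The user never names a provider; it only talks to the facade.
providers = [Provider(f"provider-{i}") for i in range(3)]
space = DistributedFacade(providers)
space.store('(Concept "cat")', 0.9)
space.store('(Concept "dog")', 0.4)
print(space.fetch('(Concept "cat")'))
```

A real design would also have to deal with queries that span several providers, which this sketch does not attempt.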

It is assumed that the StorageNode provides exactly the correct API for multiple AtomSpaces to work with one another. This includes both peer-to-peer communications (the CogStorageNode works great for that) as well as client-server communications. For example, many client AtomSpaces can access a single large shared AtomSpace with the PostgresStorageNode. Presumably, there could be a "DistributedStorageNode", with which clients can interact with the distributed AtomSpace.
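
The idea that one storage API can front both a single shared server and a distributed back end can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the StorageNode API: the names StorageLike, SharedServerStorage, DistributedStorage, store_atom and fetch_atom are invented for this example, and the distributed variant simply replicates every write to all back ends, which is only one of many possible policies.

```python
from abc import ABC, abstractmethod


class StorageLike(ABC):
    """Hypothetical stand-in for the role a StorageNode plays."""

    @abstractmethod
    def store_atom(self, atom, value):
        ...

    @abstractmethod
    def fetch_atom(self, atom):
        ...


class SharedServerStorage(StorageLike):
    """Models a single large shared AtomSpace behind one server."""

    def __init__(self):
        self._data = {}

    def store_atom(self, atom, value):
        self._data[atom] = value

    def fetch_atom(self, atom):
        return self._data.get(atom)


class DistributedStorage(StorageLike):
    """Models a 'DistributedStorageNode' in front of several back ends."""

    def __init__(self, backends):
        self._backends = list(backends)

    def store_atom(self, atom, value):
        # Assumption: full replication, chosen only to keep this short.
        for backend in self._backends:
            backend.store_atom(atom, value)

    def fetch_atom(self, atom):
        # Return the first copy found; None if no back end holds the atom.
        for backend in self._backends:
            value = backend.fetch_atom(atom)
            if value is not None:
                return value
        return None


def bump_count(storage, atom):
    """User-side code, written once against the shared interface."""
    count = (storage.fetch_atom(atom) or 0) + 1
    storage.store_atom(atom, count)
    return count


atom = '(Concept "cat")'
single = SharedServerStorage()
distributed = DistributedStorage([SharedServerStorage(), SharedServerStorage()])
for storage in (single, distributed):
    bump_count(storage, atom)
    print(type(storage).__name__, bump_count(storage, atom))
```

Because bump_count depends only on the interface, the client code does not change when the single server is swapped for the distributed variant.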