Distributed AtomSpace

From OpenCog

There are several ways to network AtomSpaces together. Two AtomSpaces may exchange Atoms with each other, peer-to-peer. Alternatively, a large shared AtomSpace may be served up by a server, with many client AtomSpaces interacting with it, sending Atoms to it or receiving Atoms from it. These possibilities, and a few more, are given precise definitions in the Networked AtomSpaces wiki page.

The need for a distributed AtomSpace arises when a large shared AtomSpace can no longer meet performance requirements. This can happen when there are more clients than the number of network connections the server can handle. Perhaps a dataset is so large that the server cannot keep it entirely in RAM, and so every client request is forced to access the disk (and is thus slow). Perhaps clients are running complex queries, which burn large amounts of CPU on the server, and the server is overwhelmed. All three of these problems suggest a move to a distributed architecture. However, the specific "best" solution depends on which of these problems is actually arising. For the agi-bio/MOZI project, it is very much the third case: complex queries that overwhelm the server CPU. Please review the Networked AtomSpaces wiki page for additional issues that can arise.

The design of a distributed AtomSpace requires distinguishing between two roles. Provider AtomSpaces work with one another to provide data. User AtomSpaces are the clients that want to "do something" with that data. From the user point of view, the distributed AtomSpace needs to look just like a shared AtomSpace: some big blob that is holding the data that you want. Internally, things are very different: the providers need to be able to coordinate efficiently with one another to handle user requests. The principal design difficulty is to organize the providers into a meaningful structure so that they can get the job done. It might be useful to envision construction workers, or perhaps white-collar office workers, and ask: how can they be organized to get the job done? So it is with provider AtomSpaces.
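The provider/user split can be illustrated with a toy sketch in Python. This is not OpenCog code and not the project's design: the class names Provider and DistributedFacade are hypothetical, Atoms are stood in for by plain strings, and hash-based sharding is only one assumed way of organizing the providers. The point is only that the user sees a single store while the providers divide the work among themselves.

```python
import hashlib


class Provider:
    """Hypothetical provider: holds one shard of the data in memory."""

    def __init__(self, name):
        self.name = name
        self.atoms = {}

    def store(self, key, value):
        self.atoms[key] = value

    def fetch(self, key):
        return self.atoms.get(key)


class DistributedFacade:
    """Hypothetical user-facing view: looks like one big shared store."""

    def __init__(self, providers):
        if not providers:
            raise ValueError("at least one provider is required")
        self.providers = list(providers)

    def _owner(self, key):
        # Assumption: route each key to one provider by a stable hash.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.providers)
        return self.providers[index]

    def store(self, key, value):
        self._owner(key).store(key, value)

    def fetch(self, key):
        return self._owner(key).fetch(key)


# The user never addresses an individual provider.
providers = [Provider(f"provider-{i}") for i in range(3)]
space = DistributedFacade(providers)
for i in range(30):
    space.store(f'(Concept "gene-{i}")', {"count": i})
```

Sharding by hash addresses the "dataset too large for RAM" problem; it does nothing by itself for CPU-heavy queries, which is one reason the organization of the providers depends on which problem is actually being solved.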

It is assumed that the StorageNode provides exactly the correct API for multiple AtomSpaces to work with one another. This includes both peer-to-peer communications (the CogStorageNode works great for that) as well as client-server communications. For example, many client AtomSpaces can access a single large shared AtomSpace with the PostgresStorageNode. Presumably, there could be a "DistributedStorageNode", through which clients would interact with the distributed AtomSpace.
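The idea that a single storage API can hide very different backends can be sketched as follows. This is a toy model in Python, not the OpenCog bindings: the Storage protocol, its method names (open, close, store_atom, fetch_atom), and both backend classes are assumptions made for illustration, and Atoms are again represented as plain strings. The distributed backend here replicates every write to all providers, which is just one assumed strategy.

```python
from typing import Dict, List, Optional, Protocol


class Storage(Protocol):
    """Hypothetical minimal storage interface seen by a client."""

    def open(self) -> None: ...
    def close(self) -> None: ...
    def store_atom(self, atom: str, value: float) -> None: ...
    def fetch_atom(self, atom: str) -> Optional[float]: ...


class SingleServerStorage:
    """Toy stand-in for one shared server."""

    def __init__(self) -> None:
        self._data: Dict[str, float] = {}
        self._is_open = False

    def open(self) -> None:
        self._is_open = True

    def close(self) -> None:
        self._is_open = False

    def _require_open(self) -> None:
        if not self._is_open:
            raise RuntimeError("storage is not open")

    def store_atom(self, atom: str, value: float) -> None:
        self._require_open()
        self._data[atom] = value

    def fetch_atom(self, atom: str) -> Optional[float]:
        self._require_open()
        return self._data.get(atom)


class ToyDistributedStorage:
    """Toy stand-in for a distributed backend (assumed: full replication)."""

    def __init__(self, backends: List[SingleServerStorage]) -> None:
        self._backends = list(backends)

    def open(self) -> None:
        for backend in self._backends:
            backend.open()

    def close(self) -> None:
        for backend in self._backends:
            backend.close()

    def store_atom(self, atom: str, value: float) -> None:
        # Write to every replica so any of them can answer a read.
        for backend in self._backends:
            backend.store_atom(atom, value)

    def fetch_atom(self, atom: str) -> Optional[float]:
        # Return the first replica's answer that is not None.
        for backend in self._backends:
            found = backend.fetch_atom(atom)
            if found is not None:
                return found
        return None


def client_workflow(storage: Storage) -> Optional[float]:
    """Client code is identical for either backend."""
    storage.open()
    try:
        storage.store_atom('(Concept "foo")', 0.9)
        return storage.fetch_atom('(Concept "foo")')
    finally:
        storage.close()
```

Calling client_workflow with a SingleServerStorage or with a ToyDistributedStorage wrapping several of them gives the same result, which is the property a "DistributedStorageNode" would presumably need to preserve.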

Because there are many conflicting and complex requirements being made on a distributed AtomSpace, there is currently no single specific architecture for one. Some pre-alpha ideas and code are being maintained in a GitHub repo: https://github.com/opencog/atomspace-agents/ . The funny name "agents" arises because different kinds of AtomSpaces will play different kinds of roles in a distributed AtomSpace, and there will, in general, be more than two roles. Just like a construction crew or a corporate office, the organization may be complex.