Decentralized AtomSpaces

From OpenCog
A decentralized AtomSpace is not the same thing as a distributed AtomSpace; the two are related but distinct concepts. This page addresses some of the concepts surrounding multiple AtomSpaces.

Design and implementation are being tracked in issues #2138, #1855, #1502 and #1967.

The key concepts are:

  • distributed - there is one database (atomspace), but it is distributed over many machines. Local copies of the atomspace are just "views" into a single, central "whole" atomspace.
  • federated - multiple atomspaces, each accessible from an atomspace server, with atomspace updates published using zeromq, or REST, or protobuf, or whatever. Each server is authoritative for its data.
  • decentralized - multiple atomspaces, but without an atomspace server. The concept of "authoritative" version of the data is not baked into the system. Authority is determined at a different layer (e.g. consensus, collaboration, voting, respect, kudos, brute-force, violence, bribery, whatever).

These three concepts are often confused with one another and taken to be synonyms; they are not. The comments below unpack them in greater detail, listing the pros and cons.

Distributed AtomSpace

A distributed atomspace attempts to maintain the illusion that there is one single set of data, of which any particular machine might hold just a few shards. The concepts of "ACID" and "BASE" apply. For example, atomic updates and eventual consistency are both strategies for updating data in such a way that a consistent data state is maintained: either immediately, by locking ("ACID"), or eventually, by propagation of updates ("BASE").
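The difference between the two update strategies can be sketched in a few lines of Python. This is only an illustration, not AtomSpace code: the names Replica, acid_update and EventualReplicaSet are hypothetical, and each "machine" is modeled as a plain dictionary in one process.

```python
import threading


class Replica:
    """Hypothetical stand-in for one machine's copy of the data."""

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()


def acid_update(replicas, key, value):
    """'ACID' style: lock every replica, write everywhere, then unlock.

    Locks are always taken in list order, so two concurrent callers
    cannot deadlock against each other.
    """
    for r in replicas:
        r.lock.acquire()
    try:
        for r in replicas:
            r.data[key] = value
    finally:
        for r in replicas:
            r.lock.release()


class EventualReplicaSet:
    """'BASE' style: write locally now, deliver to the others later."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = []  # queued (target index, key, value) updates

    def update(self, origin, key, value):
        # The originating replica sees the write immediately.
        self.replicas[origin].data[key] = value
        for i in range(len(self.replicas)):
            if i != origin:
                self.pending.append((i, key, value))

    def propagate(self):
        # Deliver queued updates; afterwards all replicas agree.
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()
```

With acid_update, no reader ever sees the replicas disagree, at the cost of holding every lock for the duration of the write. With EventualReplicaSet, a write returns immediately, but the other replicas hold stale data until propagate() runs.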

The pros and cons:

  • Plus: when it's distributed, you can have a bigger dataset: one that is too big to fit on one machine.
  • Minus: issues arise with data ownership and contextual knowledge. The data ownership problem is described in #1855: there is one very large "master" dataset, containing the "best" copy of the data (e.g. genomic data), which should be treated as read-only. Layered on it are multiple read-write deltas, created and explored by individual researchers who want to try out new algos without wrecking the master copy, and who don't want to make a full copy of the master.
  • The ContextLink (issue #1967) only "partly" solves this. It still implicitly assumes that there's a single master. The very concept of "master copy" is still baked into the system.
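The "read-only master plus read-write deltas" arrangement described above can be sketched as an overlay. This is a minimal illustration under assumptions, not the design from #1855 or #1967: DeltaView and its methods are hypothetical names, and the master dataset is modeled as a read-only Python mapping.

```python
from types import MappingProxyType


class DeltaView:
    """Hypothetical per-researcher overlay on a shared, read-only master.

    Only the changes are stored here; the master is never copied
    and never written to.
    """

    def __init__(self, master):
        self._master = master   # shared, read-only mapping
        self._added = {}        # keys added or overridden in this view
        self._deleted = set()   # keys hidden in this view

    def get(self, key):
        if key in self._deleted:
            raise KeyError(key)
        if key in self._added:
            return self._added[key]
        return self._master[key]

    def set(self, key, value):
        self._deleted.discard(key)
        self._added[key] = value

    def delete(self, key):
        self._added.pop(key, None)
        self._deleted.add(key)


# Example data is made up for illustration.
master = MappingProxyType({"gene-A": 0.9, "gene-B": 0.5})
alice = DeltaView(master)
bob = DeltaView(master)
alice.set("gene-A", 0.1)   # visible to alice only
alice.delete("gene-B")     # hidden from alice only
```

Note that this sketch shares the limitation pointed out above: the notion of a single master copy is still baked in, since every view resolves misses against the same master.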

Please note that the atomspace is already distributed. See the demo in /examples/atomspace/distributed.scm. This can be made to work on a large scale, because postgres is already massively scalable. So in a certain sense, that part is done. What is unsolved is the multi-user and authority-of-update issues surrounding this.

Decentralized AtomSpace

A decentralized atomspace acknowledges that there is no single master copy, and that instead there are peers. Now some peers might be more authoritative, more correct, more knowledgeable than other peers, but the process for determining who is authoritative can be made to lie outside of the atomspace implementation. Determination of Authority is done at some other layer, and not hard-wired into the atomspace design.
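One way to picture "authority determined at some other layer" is to make conflict resolution a pluggable policy that each peer supplies, rather than something the storage layer decides. The sketch below is an assumption-laden illustration, not a proposed design: Peer, keep_mine and trust are hypothetical names, and atoms are modeled as dictionary entries.

```python
class Peer:
    """Hypothetical peer: holds its own atoms; authority is a plug-in."""

    def __init__(self, name, resolve):
        self.name = name
        self.atoms = {}
        # resolve(key, mine, theirs, sender) returns the value to keep.
        # 'mine' is None when this peer has no value for the key yet.
        self.resolve = resolve

    def receive(self, sender, key, value):
        mine = self.atoms.get(key)
        chosen = self.resolve(key, mine, value, sender)
        if chosen is not None:
            self.atoms[key] = chosen


def keep_mine(key, mine, theirs, sender):
    """Policy: accept new keys, but never overwrite local data."""
    return theirs if mine is None else mine


def trust(trusted):
    """Policy factory: accept updates only from the named peers."""
    def policy(key, mine, theirs, sender):
        return theirs if sender in trusted else mine
    return policy


a = Peer("a", keep_mine)
b = Peer("b", trust({"a"}))
a.atoms["x"] = 1
b.atoms["x"] = 2
b.receive("a", "x", 1)   # b trusts a, so b adopts a's value
a.receive("b", "x", 2)   # a keeps its own value
b.receive("c", "y", 5)   # b does not trust c, so nothing is stored
```

The Peer class itself has no notion of who is right. Swapping keep_mine for a voting or consensus function changes the authority model without touching how atoms are stored.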

Pros and cons:

  • Plus: performance is still good, because you can still have a copy of the data that you need, locally, in RAM.
  • Minus: the stuff that does not fit in RAM still has to go somewhere. Today, that means in your postgres backend.
  • Minus: the mechanics of decentralization are unclear, and need to be worked out.

Federated AtomSpace

The concept of federation is that everybody runs their own server, and the servers exchange data with one another. Classic examples of federation are email servers, IRC servers, Diaspora pods, etc. That is, there are owners/admins who run a server, and lots of users who use it.

For the atomspace, users communicate with the servers using REST, or protobuf, or zeromq, or ROS messages, or whatever. (I don't care, as long as the performance is good and the API is maintained.)

Pros and cons:

  • Federation gives the appearance of decentralization, without actually providing it. For example, it's a lot easier to use gmail or yahoo than it is to install and operate your own mail server.
  • Federation often leads to a lowest-common-denominator feature set. If server A has a whizz-bang feature that server B does not support, then all users of server B lose. Worse, the whizz-bang feature never catches on in popularity; it's blocked by adoption speedbumps. That's one reason why "web 1.0" standards like email and IRC remain stuck in the backwaters.

Lowest-common-denominator systems are killed by walled gardens. Facebook helped kill email. Slack helped kill IRC. But now you are locked into a walled garden that you cannot escape.

The goal here is to define some way of having decentralized atomspaces without the downsides of federation, and without the authority-control issues of a distributed atomspace.