Multiple AtomSpaces

From OpenCog

The idea of multiple atomspaces has to do with allowing the user to define multiple atomspaces so that they inter-operate appropriately. The discussion here is limited to multiple atomspaces in the same address space, that is, on the same machine, but possibly used in different threads. The discussion here does NOT cover multiple atomspaces distributed over the network.

Motivation

There are several reasons for wanting something like this:

  • Having a 'temporary' atomspace to hold temporary results, e.g. from PLN reasoning. (The "inference history repository")
  • Having an atomspace hold large context-specific information, that might otherwise be held in a ContextLink, or as discussed in the article claims and contexts. That is, because the truth value gives every Atom a default interpretation, a mechanism for allowing multiple interpretations is required. One way to do this is with Context Links. Another way to do this is with multiple atom spaces, where an atom can have a different interpretation in each atomspace.
  • Having the analog of C/C++/Java "stacks" or scheme/lisp "environments", so that variables have only local scope, and functions that are applied only have scope within that environment. Thus, for example, the BindLink allows lambda-like expressions to be defined; limiting the pattern matcher to explore only a local atomspace could be useful.

Why use atomspaces at all?

Atoms can be created and used just fine without putting them into an atomspace. Almost anything you can do with an atom, you can do without putting it in an atomspace. Atoms removed from an atomspace will continue to "live" as long as there is some pointer, somewhere, pointing to them.

The only things that an atomspace offers are:

A) atom de-duplication (only one atom of given type/name/outgoing-set)
B) fast searches for atoms by type, by name, etc. (the atomspace maintains indexes)
C) atom-changed signals
D) automated attention-allocation management.
E) Automatic atom fetch and save to a persistent database; automatic sharing of the database by multiple machines in a cluster.
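
Items A and B can be pictured with a small toy model. The sketch below is not the OpenCog implementation: ToyNode, ToySpace, add_node, get_by_type and remove_node are hypothetical names made up for illustration, and only nodes are modeled (no links, truth values, signals or attention values). Under those assumptions, it shows that adding the same type/name twice hands back the same object, that a search by type goes through an index, and that a removed atom stays alive while some pointer to it remains.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atom; NOT the OpenCog Atom class.
struct ToyNode {
    std::string type;
    std::string name;
};
using ToyNodePtr = std::shared_ptr<ToyNode>;

// Hypothetical stand-in for an atomspace that holds nodes only.
class ToySpace {
public:
    // (A) De-duplication: return the existing node with this type and
    // name if there is one, otherwise create and index a new node.
    ToyNodePtr add_node(const std::string& type, const std::string& name) {
        const auto key = std::make_pair(type, name);
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;
        auto node = std::make_shared<ToyNode>(ToyNode{type, name});
        table_.emplace(key, node);
        by_type_[type].push_back(node);
        return node;
    }

    // (B) Indexed search: all nodes of the given type.
    std::vector<ToyNodePtr> get_by_type(const std::string& type) const {
        auto it = by_type_.find(type);
        if (it == by_type_.end()) return {};
        return it->second;
    }

    // Remove a node from the space. The argument is taken by value so
    // that it cannot alias an entry being erased; any pointer the
    // caller still holds keeps the node itself alive.
    void remove_node(ToyNodePtr node) {
        table_.erase(std::make_pair(node->type, node->name));
        auto& bucket = by_type_[node->type];
        bucket.erase(std::remove(bucket.begin(), bucket.end(), node),
                     bucket.end());
    }

private:
    std::map<std::pair<std::string, std::string>, ToyNodePtr> table_;
    std::map<std::string, std::vector<ToyNodePtr>> by_type_;
};
```

A real atomspace keys links on their outgoing set as well; the toy leaves that out to keep the de-duplication idea visible.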

Status

Support for multiple atomspaces has been implemented, as of September 2014. A programmer can create multiple independent atomspaces, or nested (hierarchical, or 'contextual') atomspaces, or both, and have everything work "as expected" (as described below).

Some quick notes and caveats:

  • Links in one atomspace cannot contain atoms from another, independent atomspace (links in contextual atomspaces can contain those in the parent). Attempting to create such links will result in the atoms being copied and issued brand-new handle UUIDs. In particular, this means that a given atom is no longer universally unique: each atomspace might contain an atom with the same name or outgoing set, but with a different TV, AV and UUID.
  • Links in a child atomspace can contain atoms from the parent environment.
  • The semantics of using the pattern matcher, of using PLN, and of using the database backing-store are undocumented/undefined at this time. Things probably mostly 'work', maybe, with unclear side-effects or possibly buggy or strange behavior.
  • Although the use of individual atomspaces is thread-safe and supported as such, complicated atom insertion-deletion scenarios involving multiple atomspaces might not be. For example, one thread adding or removing an atom from one atomspace, while exactly the same atom is being removed or added in the parent or child, is almost surely not (yet!?) thread-safe. Maybe. Not sure. We'll work on this as the need and complexity arise.
  • Be careful with AtomSpace deletion. Delete child atomspaces first, then the parents. Things will crash if you reverse this order. Currently, there is no safety mechanism in place.

Theory of Multiple AtomSpaces

An assortment of issues arises when there is more than one atomspace. So, for example, if you create a bunch of nodes in atomspace A, and a bunch of links in atomspace B, should the nodes in A also automatically show up in B? Why, or why not? What happens if atomspace A is deleted? Should the deletion of A even be allowed, as long as there is a higher-up B? Should atomspaces be arranged hierarchically, so that B always points to A as a parent? Is it legal to have links in A that reference atoms in B? (This would be a circular reference between atomspaces, so the hierarchy principle would be broken... this has real implications in pointer-chasing, as resolving an atom might require breaking long circular loops, and this could chew up a lot of CPU time, for everyone ... we shouldn't add that as a feature if no one needs it.)

In short, saying "multiple atomspaces" is pretty meaningless until a meaning for it is defined.

There are two ways this can be done.

Independent AtomSpaces

Consider two entirely independent AtomSpaces A and B.

  • Any atom placed into either space is cloned upon insertion. If a link is inserted, all of the link's children are cloned.
  • Either AtomSpace can be deleted at any time.
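
The cloning rule can be sketched with a toy model. Everything named here (ToyAtom, IndependentSpace, add, independent_spaces_demo) is hypothetical and is not the OpenCog API; atoms are reduced to a type, a name and a single strength value standing in for the truth value. The point of the sketch is only that two independent spaces end up with distinct objects for the "same" atom, so their values can diverge.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a node with a truth-value strength.
struct ToyAtom {
    std::string type;
    std::string name;
    double strength;
};
using ToyAtomPtr = std::shared_ptr<ToyAtom>;

// Hypothetical model of an independent atomspace (nodes only).
class IndependentSpace {
public:
    // Insert an atom that may belong to some other space. If this space
    // already holds an atom with the same type and name, return it;
    // otherwise store a private clone, never the caller's object.
    ToyAtomPtr add(const ToyAtomPtr& atom) {
        const auto key = std::make_pair(atom->type, atom->name);
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;
        auto clone = std::make_shared<ToyAtom>(*atom);
        table_.emplace(key, clone);
        return clone;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::map<std::pair<std::string, std::string>, ToyAtomPtr> table_;
};

// Returns true when the two spaces hold distinct "ham" atoms whose
// strengths have diverged.
bool independent_spaces_demo() {
    IndependentSpace a, b;
    auto loose = std::make_shared<ToyAtom>(ToyAtom{"ConceptNode", "ham", 0.5});
    ToyAtomPtr in_a = a.add(loose);   // cloned on insertion into A
    assert(in_a != loose);            // A holds a clone, not the caller's object
    ToyAtomPtr in_b = b.add(in_a);    // cloned again on insertion into B
    in_b->strength = 0.9;             // only B's version changes
    return in_a != in_b && in_a->strength == 0.5 && in_b->strength == 0.9;
}
```

Because neither toy space holds a pointer into the other, either one can be destroyed first, which mirrors the second bullet above.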

Hierarchical AtomSpaces

Hierarchical atomspaces are similar to "environments" or "closures" in scheme/lisp. In what follows, AtomSpace B is created and contained in the environment of AtomSpace A. This means that every atom in A is visible to B, and in addition, B can contain atoms that are not in A. In this sense, B is "larger" than A. However, B does not copy or clone any atoms in A.

  • Links in atomspace B can reference atoms in atomspace A if and only if B was created with A as its parent.
  • A must not be deleted until B is. (Perhaps B could be automatically deleted when A is; but this is not implemented.)
  • If atomspace C is created independently of A, then it is impossible to put links into C that reference atoms in A, and vice versa. Such links would have their outgoing sets cloned/copied.
  • If B has A as a parent, and an atom is added to B that is identical to an atom that already exists in A, then the atom in A is returned. (Right? This is how it is currently implemented. Another possible choice would be to clone the atom in A, but that does not seem wise. If you want that, then just use independent atomspaces, right?)
  • If C is independent of A, then one can add atoms with the same type/name/outgoing-set, to both, and get back two different atoms (i.e. even though they have the same attributes, they are not really the same... that is, there would be two different uuids, two different truth values, two different attention values.)
  • Trying to add an atom to C that already exists in A would "fork" (copy) the atom, as above.
  • If B has A as a parent, then searches in B are always recursive. That is, asking B for all atoms of type X returns all atoms of type X in both B and A.
  • Attention allocation across B and A ... I dunno how this should work.

There are some points about adding atoms to make note of:

  • If one attempts to add an atom to atomspace B that is already in atomspace A, one will get back the one in A. It is not copied. This is more-or-less what it means to have B live in the environment of A. If you really want atoms in B to be copies of those in A, then do NOT use hierarchical atomspaces!
  • However, if an atom is added to B that does not exist in A, then it will be in atomspace B only. If later it is added to A, then two independent copies shall exist: one in B and one in A. It's not quite clear if this is the 'right' thing to do; but de facto, this is what happens.
  • If a link in atomspace B references an atom in atomspace A, and that atom is removed from A, then the atom is not automatically added to B; instead, the reference will (currently) dangle. A recursive remove across nested atomspaces will possibly crash, at this time. The nested atomspace implementation is minimal.
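
The lookup and insertion rules for nested atomspaces can be sketched as follows. This is a toy model under stated assumptions, not the OpenCog code: ToyNode, NestedSpace, lookup, add_node and get_by_type are hypothetical names, only nodes are modeled, and removal is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atom; NOT the OpenCog Atom class.
struct ToyNode {
    std::string type;
    std::string name;
};
using ToyNodePtr = std::shared_ptr<ToyNode>;

// Hypothetical model of an atomspace with an optional parent.
class NestedSpace {
public:
    explicit NestedSpace(const NestedSpace* parent = nullptr)
        : parent_(parent) {}

    // Search this space first, then its ancestors; nullptr if absent.
    ToyNodePtr lookup(const std::string& type, const std::string& name) const {
        auto it = table_.find(std::make_pair(type, name));
        if (it != table_.end()) return it->second;
        if (parent_) return parent_->lookup(type, name);
        return nullptr;
    }

    // If the node is visible here or in an ancestor, return that one
    // without copying; otherwise create it in this space only.
    ToyNodePtr add_node(const std::string& type, const std::string& name) {
        if (ToyNodePtr found = lookup(type, name)) return found;
        auto node = std::make_shared<ToyNode>(ToyNode{type, name});
        table_.emplace(std::make_pair(type, name), node);
        return node;
    }

    // Recursive search: nodes of this type held here and in ancestors.
    std::vector<ToyNodePtr> get_by_type(const std::string& type) const {
        std::vector<ToyNodePtr> out;
        if (parent_) out = parent_->get_by_type(type);
        for (const auto& entry : table_)
            if (entry.first.first == type) out.push_back(entry.second);
        return out;
    }

private:
    const NestedSpace* parent_;   // not owned; must outlive this space
    std::map<std::pair<std::string, std::string>, ToyNodePtr> table_;
};
```

With a parent A and a child B built as `NestedSpace a; NestedSpace b(&a);`, adding to B a node that A already holds returns A's object. Adding a node to B first and to A afterwards yields two distinct objects, because the parent never looks into its children; this matches the second point above.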

Use cases

The following are requirements from different use-cases. The ideal solution satisfies all these use cases.

PLN inference history

When PLN is running, it needs to create an inference trail.

  • Atoms in the history repository need to link to Atoms in the main Atomspace
  • Atoms in the main Atomspace don't need to link to Atoms in the history repository

This suggests a hierarchical arrangement. Let's call the main space A, and the history space B. B contains a superset of A.

There are two possibilities:

i) B is a fork of A (a *copy* of every atom in A is also in B; there are two versions of the atoms of A: the original, and the copy.)
ii) B contains A (every atom in A is visible in B also, but it's not a copy; there is only one version of the atoms of A)

Clearly option (i) uses more RAM, and also CPU cycles to perform the copy. Option (i) could be achieved simply by making atomspace B independent of A.

Option (ii) is the traditional comp-sci definition of "environment". For example, in C/C++ the environment is called the "stack"; in lisp/scheme, it's called the "closure".

However, there is this PLN requirement: In some of the above cases, we would want to let the different Atomspaces optionally contain different "versions" of the "same" Atom. That is, two Atomspaces might both contain versions of the "cheese" and "ham" nodes, with different strengths for the InheritanceLink between ham and cheese, and maybe different node probabilities for ham and cheese. Again, this can be achieved simply by maintaining two distinct atomspaces.

This requirement suggests that PLN wants option (i). This option is already 'minimally' supported. The minimal implementation does NOT automatically copy atoms from A to B or vice-versa; it is up to the user to do this themselves.

Contexts

In this use case, all atoms in a particular context (i.e. a particular ContextLink) live in their own atomspace.

Let's call the 'main' atomspace M and the context atomspace C.

Open questions are similar to above: should atoms in the main space and a sub-context be identical, or should they be copies of one another? Do the copies need to be kept in sync somehow? Do links in C need to be able to hold atoms in M? If so, are they copies, or not? Do links in M need to hold atoms in C? If so, are they copies, or not?

Are there special considerations needed to have PLN work, or have the pattern matcher work in this setup? Will PLN be accessing both atomspaces? If so, and the two spaces contain copies, then how does it deal with these duplicate atoms? Will it get confused? Likewise: the pattern matcher can only chase incoming and outgoing sets; if these don't cross the boundary between M and C, then a pattern search started in one space won't cross over into the other.

C++ API

To create a new atomspace in C++, simply say:

 AtomSpace* alt_as = new AtomSpace();

To create a nested atomspace in C++, just say:

 AtomSpace* existing_as = ...;
 AtomSpace* nested_as = new AtomSpace(existing_as);

The cogserver has a default atomspace, which is used if no other atomspace is specified. It can be obtained by saying:

  AtomSpace* default_as = &cogserver().getAtomSpace();

Only the default atomspace is automatically managed; any others that get created are not. If you create some atomspaces, you are responsible for later deleting them, as appropriate.
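
One way to keep the required deletion order (children first, then parents) is to let scoping do the work. The sketch below is a suggestion only, and uses a hypothetical ToySpace class in place of the real AtomSpace so that it stands alone; the child counter and the assertion in the destructor are inventions of the sketch, since the text notes that no such safety mechanism currently exists.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for AtomSpace; it only tracks nesting.
class ToySpace {
public:
    explicit ToySpace(ToySpace* parent = nullptr) : parent_(parent) {
        if (parent_) ++parent_->live_children_;
    }
    ~ToySpace() {
        // A parent going away while children still exist is the
        // situation the text warns about.
        assert(live_children_ == 0 && "delete children before parents");
        if (parent_) --parent_->live_children_;
    }
    ToySpace(const ToySpace&) = delete;
    ToySpace& operator=(const ToySpace&) = delete;

    int live_children() const { return live_children_; }

private:
    ToySpace* parent_;          // not owned
    int live_children_ = 0;
};

// Returns the parent's child count after the child has been destroyed.
int scoped_ownership_demo() {
    std::unique_ptr<ToySpace> parent(new ToySpace());
    {
        std::unique_ptr<ToySpace> child(new ToySpace(parent.get()));
        assert(parent->live_children() == 1);
    }   // child is deleted here, while parent is still alive
    return parent->live_children();
}
```

Local objects in one scope are destroyed in reverse order of declaration, so declaring the parent before the child in the same scope gives the same guarantee without the inner braces.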

Scheme API

The scheme atom creation and manipulation functions typically are not called with an atomspace argument, and so instead use the atomspace in the current scheme (dynamic) environment. This atomspace is stored in a scheme fluid, and so can be different in every thread. The atomspace to be used with a scheme evaluator is specified at the time that the evaluator is created in C++. So, for example:

  AtomSpace* my_atomspace = ...;
  SchemeEval* evaluator = new SchemeEval(my_atomspace);
  std::string used_as = evaluator->eval("(cog-atomspace)");
  std::cout << "My atomspace=" << ((void *) my_atomspace) << std::endl;
  std::cout << "Evaluator uses: " << used_as << std::endl;

The above will print the same hex address for the atomspace. Multiple evaluators can be created, each using the same or different atomspaces. The current atomspace is set by each evaluator whenever its eval() or eval_h() methods are called.

Atomspaces can also be created in scheme. So, for example:

 (define current-as (cog-atomspace))
 (define alt-as (cog-new-atomspace))
 (define nested-as (cog-new-atomspace current-as))

will do exactly what you think they will: they'll set current-as to the current atomspace, and create two new atomspaces, one nested, one not. There's a predicate too:

 (cog-atomspace? as)

returns #t if as is an atomspace, else it returns #f.

The current atomspace for an evaluator can be set by saying

 (cog-set-atomspace! as)

The above returns the previous atomspace, which perhaps you may want to save and restore later. Thus, for example, to execute some scheme code in a different atomspace, and then put the atomspace back to the original one, one might say this:

 (define alt-as (cog-new-atomspace))
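 (define (alt-env f)
    ; let* binds in order, so the atomspace is switched before f is called
    (let* ((curr-as (cog-set-atomspace! alt-as))
           (result (f)))
       (cog-set-atomspace! curr-as)   ; restore the original atomspace
       result
    ))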

The above can be tested in various ways. So for example

 (alt-env cog-atomspace)

should print the address of the alt-as. Similarly, atoms created in one atomspace will not be visible in the other, and vice-versa.
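
For readers coming from the C++ side, the same save-and-restore idiom, together with a per-thread "current" atomspace, can be modeled as below. This is a toy illustration and not how the scheme bindings are implemented: ToySpace, current_space, set_current_space, ScopedSpace and with_space are all hypothetical names. A thread_local variable plays the role of the scheme fluid, and a scope guard plays the role of alt-env.

```cpp
#include <cassert>

// Hypothetical stand-in for AtomSpace.
struct ToySpace {
    int id;
};

// Each thread sees its own "current" space, loosely like a scheme fluid.
inline ToySpace*& current_space() {
    thread_local ToySpace* current = nullptr;
    return current;
}

// Set the current space and return the previous one, in the manner
// described for cog-set-atomspace! above.
inline ToySpace* set_current_space(ToySpace* as) {
    ToySpace* previous = current_space();
    current_space() = as;
    return previous;
}

// Scope guard: switch on construction, restore on destruction, even
// if the guarded code throws.
class ScopedSpace {
public:
    explicit ScopedSpace(ToySpace* as) : previous_(set_current_space(as)) {}
    ~ScopedSpace() { set_current_space(previous_); }
    ScopedSpace(const ScopedSpace&) = delete;
    ScopedSpace& operator=(const ScopedSpace&) = delete;

private:
    ToySpace* previous_;
};

// Counterpart of (alt-env f): run f with alt as the current space.
template <typename F>
auto with_space(ToySpace* alt, F f) -> decltype(f()) {
    ScopedSpace guard(alt);
    return f();
}
```

The guard restores the previous space on every exit path, including exceptions; the scheme alt-env shown earlier restores it only when f returns normally.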

The atomspace to use for atom creation, querying and deletion can be explicitly given in the relevant command. So, for example:

 (cog-node 'ConceptNode "asdf")        ; does not yet exist
 (define alt-as (cog-new-atomspace))   ; define an alternative atomspace
 (cog-node 'ConceptNode "asdf" alt-as) ; doesn't exist in the alt-as either
 (ConceptNode "asdf" alt-as)           ; create it in the alt-as, only
 (cog-node 'ConceptNode "asdf" alt-as) ; now exists in alt-as
 (cog-node 'ConceptNode "asdf")        ; still doesn't exist in the main as
 (define n (cog-node 'ConceptNode "asdf" alt-as))  ; handy handle
 (cog-delete n)                        ; no-op, it never existed in the main as
 (cog-delete n alt-as)                 ; delete it from the alt-as
 (cog-node 'ConceptNode "asdf" alt-as) ; Indeed, it is now gone.

This form can be freely intermixed with the other ways of working with the atomspace.