The AtomSpace is an API for storing and querying hypergraphs, and is the central Knowledge Representation system provided by the OpenCog Framework. The hypergraphs stored in the AtomSpace consist of Atoms (the superclass for Nodes and Links).
Conceptually, knowledge in OpenCog is stored within large [weighted, labeled] hypergraphs with nodes and links linked together to represent knowledge. This is done on two levels: Information primitives are symbolized in individual or small sets of nodes/links, and patterns of relationships or activity are found in [potentially] overlapping and nesting networks of nodes and links. (OCP tutorial log #2).
The primary, driving design requirement is to efficiently support induction, reasoning, neural nets and other graphical algorithms with the best possible performance and flexibility. Essentially, the AtomSpace must offer an efficient substrate for Self-Modifying Evolving Probabilistic Hypergraphs.
Nodes and Links (and thus hypergraphs) can be created, manipulated and traversed without using the AtomSpace. The AtomSpace provides some additional utility that is not available by using "naked" atoms:
- Uniqueness of Atoms. An AtomSpace will contain only one single Node of a given name, and one single Link with a given outgoing set.
- Indexes. The AtomSpace contains several indexes to provide fast access to certain classes of atoms, such as queries by atom type, by atom name, by outgoing set, and the like. User-defined indexes are planned but have not been implemented (due to lack of demand).
- Persistence. The contents of an AtomSpace can be saved-to/restored-from some storage medium, such as a traditional SQL database.
- Distributed computing. Atoms can be shared across multiple network servers, by having them use a common database backend. See distributed AtomSpace for details.
- Pattern search. The AtomSpace can be searched for all subgraphs of some particular shape
- Change notification. The AtomSpace provides signals that are delivered when an atom is added or removed, thus allowing actions to be triggered as the contents change.
Hypergraph Database Design Requirements
The AtomSpace is nothing more than a container for storing (hyper-)graphs. It is optimized so that one can implement (probabilistic/fuzzy) inference-engines/theorem-provers, etc., such as PLN, on top of it. To achieve this, the Atomspace must have some database-like features:
- It must be queriable for all occurrences of certain hypergraphs, i.e. one must be able to perform pattern-matching against arbitrary query patterns.
- It must maintain user-defined indexes of certain common pattern types, using e.g. the RETE algorithm, and/or other database-like indexing systems.
The AtomSpace implementation has some additional requirements:
- Perform queries as fast as possible.
- Be thread-safe.
- Hold hypergraphs consisting of billions or trillions (or more) nodes/links; scale to petabytes (or more).
- Save & restore hypergraphs to media, such as disk, a more traditional SQL or non-SQL database, or other system (e.g. flat files, XML, etc.).
- Exchange, query and synchronize hypergraphs with other, network-remote atomspaces or servers, in a manner as fast as possible, while maintaining as much coherency as possible.
- Allow extremely fast access to an Atom's incoming set, outgoing set, TruthValue and AttentionValue.
The current implementation fails on some of these requirements, especially in scalability; there is room for improvement. Nonetheless, a basic system exists. The following is currently possible:
- The AtomSpace can be accessed using C++, Scheme, Python, a web interface (REST) or ZeroMQ. Low-level access via C++ is about 40x faster than access with scheme or python; thus scheme and python are better suited for invoking high-level functions (such as the pattern matcher, which can take a fair amount of time to search large atomspaces.) The REST and ZMQ interfaces are another order of magnitude slower than the scheme/python interfaces, and are thus usable only for casual inspection.
- Access to the AtomSpace via C++ is completely thread-safe. Access via scheme is thread-safe, but seems to somehow serialize to only one thread, presumably due to some currently-unknown design bug/feature. The python interface is not thread-safe.
- AtomSpace contents can be saved/restored as string s-expressions (i.e. scheme), and in an SQL database.
- The SQL storage backend allows a weak form of dynamic, on-demand saving and loading of hypergraphs. The contents of the AtomSpace can be bulk-saved/restored. In addition, individual atoms can be saved/restored on request. This is intentional: forcing all atoms to be automatically mirrored to the database would require a large CPU overhead, especially for workloads that rapidly create and destroy a lot of transient atoms, or modify truth values at a rapid clip. The store-back queues for the SQL database are fully parallel and asynchronous; the current design uses four store-back threads by default.
- Multiple AtomSpaces on multiple network-connected servers can communicate via the SQL backend. That is, they can share common atoms. User algorithms must explicitly push atoms to the database in order for them to become visible to other AtomSpaces. This is intentional: trying to maintain constant synchronization across multiple AtomSpaces would be extremely CPU-intensive, especially since many workloads create and destroy atoms at a rapid pace, and there is no need to share such atoms.
- A generic pattern matcher has been implemented. Queries may be specified as hypergraphs themselves, using ImplicationLink and BindLink. Procedures may be triggered using ExecutionLink. Low-level access to the pattern matcher is possible by coding in C++, scheme or (coming soon!) python.
- A half-dozen hard-coded indexes are kept by the atomspace. User-defined indexes are not yet supported (due to lack of demand).
- Atomspace contents may be viewed with several different visualization tools, including the atomspace visualizer.
- The benchmark tool measures the performance of a dozen basic atom and atomspace operations. A diary in the benchmark directory maintains a historical log of measurements.
Nodes and Links
There is a class for Node and another for Link as well as a common superclass Atom. So we refer to the AtomSpace as the set of nodes and links which represents the knowledge stored in an OpenCog database. Nodes are representation of entities in general, while Links are representation of some relationship among two or more Atoms (Links may link Nodes to Nodes, Nodes to Links or Links to Links).
Atoms are uniquely identified by a Handle; a Handle is essentially a 64-bit UUID-like identifier that can be used to track an atom. Knowing the Handle is enough to retrieve the corresponding atom (if it exists -- deleted atoms can no longer be retrieved!)
The AtomSpace provides signals that are delivered when an atom is added or removed, thus allowing actions to be triggered as the contents change.
Multiple atom spaces within a single program can be used, either independently, or as nested (hierarchical) environments, or both. Multiple atom spaces on multiple network-connected machines can (automatically) communicate via the SQL backend (if they are all pointed to the same backend).
The AtomTable provides the indexes and the signals for the AtomSpace. Its the main class that actuall holds the references to atoms. The
AtomSpaceImpl class provides two services: the interface to the persistence backend, and maintenance of the AttentionBank.
The AtomSpace class itself does nothing at all, other than to provide a uniform API to these other services.
Atoms can be stored on disk, or even another machine, rather than always kept in RAM. Thus, an Atom is fetched into local memory only when it is actually needed. See opencog/atomspace/BackingStore.h for details, and opencog/persist/README for a persistence implementation.
The persistence backend allows multiple AtomSpaces on multiple networked servers to communicate and share atoms. In practice, this can scale to possibly several dozen servers but probably no more than that, depending on the workload.
See distributed AtomSpace for details.
The diagram below is slightly out-of-date, but is mostly accurate. The
AtomSpaceAsync classes no longer exist. The
Time server classes are no longer directly connected to the AtomSpace.