The OpenCog AtomSpace is a knowledge representation (KR) database and the associated query/reasoning engine to fetch and manipulate that data, and perform reasoning on it. Data is represented in the form of graphs, and more generally, as hypergraphs; thus the AtomSpace is a kind of graph database, the query engine is a general graph re-writing system, and the rule-engine is a generalized rule-driven inferencing system. The vertices and edges of a graph, known as Atoms, are used to represent not only "data", but also "procedures"; thus, many graphs are executable programs as well as data structures.
There are pre-defined Atoms for many basic knowledge-representation and computer-science concepts. These include Atoms for relations, such as similarity, inheritance and subsets; for logic, such as Boolean and, or, for-all, there-exists; for Bayesian and other probabilistic relations; for intuitionist logic, such as absence and choice; for parallel (threaded) synchronous and asynchronous execution; for expressions with variables and for lambda expressions and for beta-reduction and mapping; for uniqueness constraints, state and a messaging "blackboard"; for searching and satisfiability and graph re-writing; for the specification of types and type signatures, including type polymorphism and type construction (dependent types and type variables TBD).
Because of these many and varied Atom types, constructing graphs to represent knowledge looks kind-of-like "programming"; the programming language is informally referred to as Atomese. It vaguely resembles a strange mashup of SQL (due to queriability), prolog/datalog (due to the logic and reasoning components), lisp/scheme (due to lambda expressions), haskell/caml (due to the type system) and rule engines (due to the graph rewriting and forward/backward chaining inference systems). This "programming language" is NOT designed for use by human programmers (it is too verbose and awkward for that); it is designed for automation and machine learning. That is, like any representation_knowledge representation system, the data and procedures encoded in "Atomese" are meant to be accessed by other automated subsystems manipulating and querying and inferencing over the data/programs. Also, viewed as a programming language, it can be very slow and inefficient and not scalable; it was not designed with efficiency and programming tasks in mind, nor with scalability; but rather, it was designed to allow the generalized manipulation of networks of probabilistic data by means of rules and inferences and reasoning systems. It extends the idea of probabilistic logic networks to a generalized system for automatically manipulating and managing data.
The use of the AtomSpace, and the operation and utility of Atomese, remains a topic of ongoing research and change, as various dependent subsystems are brought online. These include machine learning, natural language processing, motion control and animation, planning and constraint solving, pattern mining and data mining, question answering and common-sense systems, and emotional and behavioral psychological systems. Each of these impose sharply conflicting requirements on the system architecture; the AtomSpace and "Atomese" is the current best-effort KR system for satisfying all these various needs in an integrated way. It is likely to change, as the various current short-comings, design flaws, performance and scalability issues are corrected.
The examples directory contains demonstrations of the various components of the AtomSpace, including the python and scheme bindings, the pattern matcher, the rule engine, and many of the various different atom types and their use for solving various different tasks.
As a symbol table
Once an atom is placed in an atomspace, it becomes unique. Thus, atoms are the same thing as symbols, and the atomspace is a symbol table. The unique ID of a Node is it's string name, and the unique ID of a Link is it's outgoing set.
Examples of symbol tables can be found in python and scheme. In Prolog, symbols are called atoms. While Prolog makes a distinction between atoms, variables and (compound) terms, OpenCog does not: all of these are considered to be atoms of different types (e.g. the VariableNode).
In some programming languages, such as scheme and LISP, symbols can be given properties. The analogous concept here is that of a valuation: each OpenCog atom can be given various different properties, or values.
As a database
Nodes and Links (and thus hypergraphs) can be created, manipulated and traversed without using the AtomSpace. The AtomSpace provides some additional utility that is not available by using "naked" atoms:
- Uniqueness of Atoms. An AtomSpace will contain only one single Node of a given name, and one single Link with a given outgoing set.
- Indexes. The AtomSpace contains several indexes to provide fast access to certain classes of atoms, such as queries by atom type, by atom name, by outgoing set, and the like.
- Persistence. The contents of an AtomSpace can be saved-to/restored-from some storage medium, such as a traditional SQL database.
- Distributed computing. Atoms can be shared across multiple network servers, by having them use a common database backend. See distributed AtomSpace for details.
- Pattern search. The AtomSpace can be searched for all subgraphs of some particular shape.
- Change notification. The AtomSpace provides signals that are delivered when an atom is added or removed, thus allowing actions to be triggered as the contents change.
Hypergraph Database Design Requirements
The AtomSpace is nothing more than a container for storing (hyper-)graphs. It is optimized so that one can implement (probabilistic/fuzzy) inference-engines/theorem-provers, etc., such as PLN, on top of it. To achieve this, the Atomspace must have some database-like features:
- It must be queriable for all occurrences of certain hypergraphs, i.e. one must be able to perform pattern-matching against arbitrary query patterns.
- It must maintain user-defined indexes of certain common pattern types, using e.g. the RETE algorithm, and/or other database-like indexing systems.
The AtomSpace implementation has some additional requirements:
- Perform queries as fast as possible.
- Be thread-safe.
- Hold hypergraphs consisting of billions or trillions (or more) nodes/links; scale to petabytes (or more).
- Save & restore hypergraphs to media, such as disk, a more traditional SQL or non-SQL database, or other system (e.g. flat files, XML, etc.).
- Exchange, query and synchronize hypergraphs with other, network-remote atomspaces or servers, in a manner as fast as possible, while maintaining as much coherency as possible.
- Allow extremely fast access to an Atom's incoming set, outgoing set, TruthValue and AttentionValue.
The current implementation fails on some of these requirements, especially in scalability; there is room for improvement. Nonetheless, a basic system exists. The following is currently possible:
- The AtomSpace can be accessed using C++, Scheme, Python, Haskell, a web interface (REST) or ZeroMQ. Low-level access via C++ is about 40x faster than access with scheme or python; thus scheme and python are better suited for invoking high-level functions (such as the pattern matcher, which can take a fair amount of time to search large atomspaces.) The REST and ZMQ interfaces are another order of magnitude slower than the scheme/python interfaces, and are thus usable only for casual inspection.
- Access to the AtomSpace via C++ is completely thread-safe. Access via scheme is thread-safe, but seems to somehow serialize to only one thread, presumably due to some currently-unknown design bug/feature. The python interface is not thread-safe.
- AtomSpace contents can be saved/restored as string s-expressions (i.e. scheme), and in an SQL database.
- The SQL storage backend allows a weak form of dynamic, on-demand saving and loading of hypergraphs. The contents of the AtomSpace can be bulk-saved/restored. In addition, individual atoms can be saved/restored on request. This is intentional: forcing all atoms to be automatically mirrored to the database would require a large CPU overhead, especially for workloads that rapidly create and destroy a lot of transient atoms, or modify truth values at a rapid clip. The store-back queues for the SQL database are fully parallel and asynchronous; the current design uses four store-back threads by default.
- Multiple AtomSpaces on multiple network-connected servers can communicate via the SQL backend. That is, they can share common atoms. User algorithms must explicitly push atoms to the database in order for them to become visible to other AtomSpaces. This is intentional: trying to maintain constant synchronization across multiple AtomSpaces would be extremely CPU-intensive, especially since many workloads create and destroy atoms at a rapid pace, and there is no need to share such atoms.
- A generic pattern matcher has been implemented. Queries may be specified as hypergraphs themselves, using SatisfactionLink, BindLink and GetLink. Procedures may be triggered using ExecutionOutputLink. Low-level access to the pattern matcher is possible by coding in C++, scheme or (coming soon!) python.
- A half-dozen hard-coded indexes are kept by the atomspace. User-defined indexes are not yet supported (due to lack of demand).
- Atomspace contents may be viewed with several different visualization tools, including the atomspace visualizer.
- The AtomSpace benchmark tool measures the performance of a dozen basic atom and atomspace operations. A diary in the benchmark directory maintains a historical log of measurements.
Atoms are uniquely identified by a Handle; a Handle is essentially a pointer to an atom. Knowing the Handle is enough to retrieve the corresponding atom (if it exists -- deleted atoms can no longer be retrieved!)
The AtomSpace provides signals that are delivered when an atom is added or removed, thus allowing actions to be triggered as the contents change.
The current implementation of the AtomSpace signals is very strongly deprecated; it has a very strong, serious negative performance impact on all users. The current signals API is likely to be removed at some future date.
Multiple atom spaces within a single program can be used, either independently, or as nested (hierarchical) environments, or both. Multiple atom spaces on multiple network-connected machines can (automatically) communicate via the SQL backend (if they are all pointed to the same backend).
The AtomTable provides the indexes and the signals for the AtomSpace. Its the main class that actuall holds the references to atoms. The
AtomSpaceImpl class provides two services: the interface to the persistence backend, and maintenance of the AttentionBank.
The AtomSpace class itself does nothing at all, other than to provide a uniform API to these other services.
Atoms can be stored on disk, or even another machine, rather than always kept in RAM. Thus, an Atom is fetched into local memory only when it is actually needed. See opencog/atomspace/BackingStore.h for details, and opencog/persist/README for a persistence implementation.
The persistence backend allows multiple AtomSpaces on multiple networked servers to communicate and share atoms. In practice, this can scale to possibly several dozen servers but probably no more than that, depending on the workload.
See distributed AtomSpace for details.
Working with AtomSpaces
There are multiple ways of working with atomspaces. See the multiple AtomSpaces wikipage for more info.
If you are using the scheme bindings:
- ,apropos atomspace -- list all atomspace-related functions available to the scheme programmer. These are:
- cog-atomspace -- get the current atomspace used by scheme
- cog-set-atomspace! ATOMSPACE -- set the current atomspace
- cog-new-atomspace -- create a new atomspace
- cog-atomspace? OBJ -- return true if OBJ is an atomspace, else return false.
- cog-atomspace-uuid ATOMSPACE -- get the UUID of the ATOMSPACE
- cog-atomspace-env ATOMSPACE -- get the parent of the ATOMSPACE
- cog-atomspace-stack -- A stack of atomspaces
- cog-push-atomspace -- Push the current atomspace onto the stack
- cog-pop-atomspace -- Pop the stack
You can get additional documentation for each of the above by typing ,describe func-name. So, for example ,describe cog-new-atomspace prints the documentation for the cog-new-atomspace function.
You can share an atomspace with python by using this:
- python-call-with-as FUNC ATOMSPACE -- Call the python function FUNC, passing the ATOMSPACE as an argument.
Say ,describe python-call-with-as to show the documentation for this function.
The above is provided by the (opencog python) module. For example:
$ guile guile> (use-modules (opencog) (opencog python)) guile> (python-eval " from opencog.atomspace import AtomSpace, TruthValue from opencog.atomspace import types def foo(asp): TV = TruthValue(0.42, 0.69) asp.add_node(types.ConceptNode, 'Apple', TV) ") guile> (python-call-with-as "foo" (cog-atomspace)) guile> (cog-node 'ConceptNode "Apple")
If you are in python, you have these functions available:
- AtomSpace() -- create a new atomspace
- scheme_eval_as(sexpr) -- evaluate scheme sexpr, returning an atomspace.
- set_type_ctor_atomspace(ATOMSPACE) -- set the atomspace that the python type constructor uses.
An example usage of some of the above:
from opencog.atomspace import AtomSpace, TruthValue from opencog.atomspace import types from opencog.scheme_wrapper import scheme_eval_h, scheme_eval_as # Get the atomspace that guile is using asp = scheme_eval_as('(cog-atomspace)') # Add a node to the atomspace TV = TruthValue(0.42, 0.69) a1 = asp.add_node(types.ConceptNode, 'Apple', TV) # Get the same atom, from guile's point of view: a2 = scheme_eval_h('(cog-node \'ConceptNode "Apple")') # Print true if they are the same atom. They should be! print "Are they equal?", a1 == a2