Atom

Atoms are one of the main components of the AtomSpace; Atoms, together with Values, are what the AtomSpace stores. Atoms are named after atomic formulae (or atoms) in mathematical logic, although they are more general. The two primary types of Atoms are Nodes and Links; these are used to represent and store (hyper-)graphs, and they correspond very roughly to the vertices and edges of graph theory. Unlike the vertices and edges of ordinary graphs, Atoms are typed (in the sense of type theory), and thus can be used to store a large variety of information. Although the AtomSpace was originally intended to store logical assertions, as well as facts about the world, to which probabilistic TruthValues could be assigned, it has become more general than that. It is currently used to store natural language grammars, dictionaries and parsers, biochemical and biomedical data, robot control algorithms, machine learning algorithms, audio/video processing pipelines and deep learning neural networks.

All of these systems are built on a few cornerstone foundations. Nodes and Links are used to represent anything that resembles a graph-theoretical graph. Values are used to assign "valuations" to Atoms: Values can be boolean true/false valuations, but they can also be numbers, weights, vectors or strings, or any combination thereof. Every Atom has a key-value database attached to it, which can store any kind of information about that Atom. This distinction between the "shape of the graph" and the data that is "hung on it" is central to allowing high-speed graph traversal and generalized graph query.
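The split between graph shape and attached data can be sketched in a few lines of plain Python. This is only a toy model, not the AtomSpace API: the class ToyAtomSpace and its methods set_value and get_value are hypothetical names invented for illustration, and the key names "truth" and "embedding" are arbitrary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Node:
    """A typed, named vertex, e.g. Node("ConceptNode", "cat")."""
    type: str
    name: str


@dataclass(frozen=True)
class Link:
    """A typed hyperedge: an ordered tuple of other Atoms (Nodes or Links)."""
    type: str
    outgoing: Tuple[Any, ...]


class ToyAtomSpace:
    """Hypothetical container: Atoms give the graph its shape,
    and a separate per-Atom key-value store holds the data hung on it."""

    def __init__(self) -> None:
        self._values: Dict[Any, Dict[str, Any]] = {}

    def add(self, atom):
        # Adding an Atom creates an empty key-value store for it.
        self._values.setdefault(atom, {})
        return atom

    def set_value(self, atom, key, value):
        self.add(atom)
        self._values[atom][key] = value

    def get_value(self, atom, key, default=None):
        return self._values.get(atom, {}).get(key, default)


space = ToyAtomSpace()
cat = space.add(Node("ConceptNode", "cat"))
animal = space.add(Node("ConceptNode", "animal"))
cat_is_animal = space.add(Link("InheritanceLink", (cat, animal)))

# Values of different kinds can be attached under different keys.
space.set_value(cat_is_animal, "truth", (0.9, 0.8))
space.set_value(cat, "embedding", [0.1, 0.7, 0.2])
```

Traversing the graph only touches the Node and Link objects; the values are looked up separately, and only when they are needed.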

Another fundamental cornerstone is Atomese: the idea that everything, including graph queries, graph re-write rules, graph processing algorithms, data pipelines and the like, can all be represented with Atoms, and implemented with Atoms, at runtime. Thus, Atoms can be used not only to form abstract syntax trees representing some algorithm or procedure, but those trees are also executable. One of the things expressible as Atoms is the query language itself: thus one can "query for queries", which sounds perhaps arcane, but is how chatbots actually work (that is, how chatbots work not only in OpenCog, but in general).
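As a rough illustration of "trees that are also executable", the following toy Python sketch builds an arithmetic expression out of link-like objects, executes it, and then rewrites it before executing it again. The type names PlusLink, TimesLink and NumberNode are borrowed from Atomese, but the functions execute and retype are hypothetical helpers written for this sketch and are not part of OpenCog.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NumberNode:
    value: float


@dataclass(frozen=True)
class Link:
    type: str
    outgoing: Tuple["Atom", ...]


Atom = Union[NumberNode, Link]


def execute(atom: Atom) -> float:
    """Run the tree: each link type carries its own operation."""
    if isinstance(atom, NumberNode):
        return atom.value
    args = [execute(a) for a in atom.outgoing]
    if atom.type == "PlusLink":
        return sum(args)
    if atom.type == "TimesLink":
        result = 1.0
        for a in args:
            result *= a
        return result
    raise ValueError(f"cannot execute link type {atom.type}")


def retype(atom: Atom, old_type: str, new_type: str) -> Atom:
    """A trivial graph rewrite: swap one link type for another, everywhere."""
    if isinstance(atom, NumberNode):
        return atom
    new_outgoing = tuple(retype(a, old_type, new_type) for a in atom.outgoing)
    return Link(new_type if atom.type == old_type else atom.type, new_outgoing)


# The expression 2 + (3 * 4), stored as data.
tree = Link("PlusLink", (NumberNode(2),
                         Link("TimesLink", (NumberNode(3), NumberNode(4)))))

before = execute(tree)                                    # 2 + 12
after = execute(retype(tree, "PlusLink", "TimesLink"))    # 2 * 12
```

The point of the sketch is that the program is ordinary data: the same structure that is executed can be inspected, searched and rewritten by another program.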

A final foundational piece is that Values can be accessed and modified with Atoms themselves. That is, the algorithms that modify Values can also be expressed in Atomese. Thus, the "graph" can be thought of as plumbing, and Values are the fluid, the water that flows in this plumbing. This is somewhat similar to GPU programming, in that a GPU pipeline is the "plumbing", and 3D color and texture data is the "fluid". Unlike in GPU programs, the "fluid" can be used to alter the processing program (the pipeline) itself, somewhat like a hydraulic valve.

All of the above is possible because Atoms themselves (and Values) are extensible: they can be used to wrap existing C++ or Python (or Scheme or Haskell, etc.) libraries and export those functions in Atomese. This is different from merely being a "foreign function call", because Atoms are themselves stored in a database, the AtomSpace; and Atoms represent graphs, so graph re-write rules can be used to modify, at run-time, how the wrapped library is invoked, when it is invoked, and on what data it applies. Atomese allows one to write algorithms that modify algorithms. In this sense it resembles a compiler "intermediate language", except that it is stored in the AtomSpace, instead of being fleeting and discarded. Atomese can also be thought of as "source code", but instead of being stored in a flat file and written in a human-friendly programming language, it is stored in a database and written in a machine-learning-friendly language.

In a nutshell, this is what Atoms are. Clearly, to unpack and articulate all of the above will require a fair bit of work. Please note: none of this is theoretical: all of this is coded up, and works, and has been deployed in practical systems over the last 5-10 years. It's gone through multiple trials-by-fire.

All of the above aspects are illustrated with examples in the examples directory on GitHub. These examples can be run directly after installing the AtomSpace.

Atoms as logic statements

An Atom captures the notion of an atomic formula (or atom) in mathematical logic. The primary difference between atoms in logic and Atoms in OpenCog is that OpenCog Atoms have TruthValues that are more general than 'true' or 'false'. Another important difference is that AndLink and OrLink are Atoms themselves, and can be used to combine other Atoms, so OpenCog does not draw a clear distinction between atomic formulas and atomic sentences: both are given equal footing. Likewise, the ForAllLink and ExistsLink are Atoms that are analogs of the logic expressions ∀x and ∃y, while LambdaLink corresponds to the lambda-calculus notion of λx. Patterns with variables to be grounded are specified using the BindLink.
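What it means to "ground" the variables of a pattern can be sketched with a small matcher over nested tuples. This is a toy stand-in for the idea only; it is not how the BindLink or the OpenCog pattern matcher is implemented, and the function match and the tuple encoding of Atoms are assumptions made for this example.

```python
def match(pattern, atom, bindings):
    """Try to ground the variables in `pattern` against `atom`.

    Atoms are encoded as tuples: (type, name) for nodes and
    (type, child, child, ...) for links. Returns an extended
    bindings dict on success, or None on failure.
    """
    if isinstance(pattern, tuple) and pattern[0] == "VariableNode":
        name = pattern[1]
        if name in bindings:
            # A variable seen before must be grounded by the same atom.
            return bindings if bindings[name] == atom else None
        return {**bindings, name: atom}
    if isinstance(pattern, tuple) and isinstance(atom, tuple):
        if len(pattern) != len(atom):
            return None
        for sub_pattern, sub_atom in zip(pattern, atom):
            bindings = match(sub_pattern, sub_atom, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == atom else None


atoms = [
    ("InheritanceLink", ("ConceptNode", "cat"), ("ConceptNode", "animal")),
    ("InheritanceLink", ("ConceptNode", "dog"), ("ConceptNode", "animal")),
    ("InheritanceLink", ("ConceptNode", "oak"), ("ConceptNode", "plant")),
]

# "Find every $x that inherits from animal."
pattern = ("InheritanceLink", ("VariableNode", "$x"), ("ConceptNode", "animal"))

groundings = []
for atom in atoms:
    result = match(pattern, atom, {})
    if result is not None:
        groundings.append(result["$x"])
```

Here groundings collects the atoms that can stand in for $x. Note that the pattern is itself built from the same kind of tuples as the data it searches.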

Truth values can be taken to be probabilistic, and thus the resulting logic network can be understood to model a Bayesian logic network, or a Markov logic network. Things like Hidden Markov Models are a special case. Truth values need not be probabilities; they can be any collection of weights, numbers, vectors and strings, and so can model fuzzy logic and many other kinds of predicate formulae.

Because all Atoms carry an immutable type, they can be understood as terms, in the sense of logic or type theory. Atoms are just type instances.

Atom Types

Main article: Atom types

All Atoms in the AtomSpace have a type. The two most basic types of Atoms are the Node and the Link. The types form a type hierarchy: all atoms inherit from the type "Atom", and the type Atom itself inherits from ProtoAtom. The ProtoAtom is itself the base type for values (such as truth values) as well as atoms.
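The inheritance chain described above can be pictured as a parent table with a simple ancestry check. The table below is a hand-written illustration containing only a handful of types; it is not generated from the real type definitions, and the helper is_a is a hypothetical name.

```python
# Each type maps to its parent; ProtoAtom is the root.
PARENT = {
    "ProtoAtom": None,
    "TruthValue": "ProtoAtom",   # values inherit from ProtoAtom directly
    "Atom": "ProtoAtom",
    "Node": "Atom",
    "Link": "Atom",
    "ConceptNode": "Node",
    "InheritanceLink": "Link",
}


def is_a(type_name, ancestor):
    """Walk up the parent chain looking for `ancestor`."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


concept_is_atom = is_a("ConceptNode", "Atom")     # Node -> Atom
truth_is_atom = is_a("TruthValue", "Atom")        # values are not Atoms
truth_is_proto = is_a("TruthValue", "ProtoAtom")  # but share the root type
```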

The list of supported atom types can be found in Category:Atom Types.

There is an infrastructure for working with atom types: these are the type constructors, of which the TypeNode, the SignatureLink, the ArrowLink and the TypeSetLink are all examples. Polymorphic types can be specified using the TypeChoice link. The type, and more generally, the signature, can be checked with the type checker.

In essence, there is a fairly robust type system in Atomese, which allows Atomese to describe itself in terms of its own types.

Atoms as symbols

Atoms are normally stored in an AtomSpace, and, when they are placed in the AtomSpace, they become unique. Thus, in comparison to other programming languages, Atoms can be understood to be the same thing as symbols. The AtomSpace is essentially the same thing as a symbol table. A symbol table commonly has the property that all symbols in it are unique.

Once an Atom is placed in the AtomSpace, it gets a single, unique ID. For a Node, this unique ID is its string name; for a Link, it is its outgoing set (in both cases together with the Atom's type).
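The symbol-table behaviour can be sketched as an interning table: adding the same Node or Link twice hands back the one existing object. This is a minimal toy model assuming that a Node is keyed by its type and name and a Link by its type and outgoing set; the names ToyAtomSpace, add_node and add_link are hypothetical and are not the AtomSpace API.

```python
class Atom:
    """A bare record; identity is managed by the table below."""

    def __init__(self, type_, name=None, outgoing=()):
        self.type = type_
        self.name = name
        self.outgoing = tuple(outgoing)


class ToyAtomSpace:
    """Hypothetical interning table: one object per distinct Atom."""

    def __init__(self):
        self._atoms = {}

    def __len__(self):
        return len(self._atoms)

    def add_node(self, type_, name):
        key = (type_, name)                 # a Node is keyed by type and name
        if key not in self._atoms:
            self._atoms[key] = Atom(type_, name=name)
        return self._atoms[key]

    def add_link(self, type_, *outgoing):
        key = (type_, tuple(outgoing))      # a Link is keyed by type and outgoing set
        if key not in self._atoms:
            self._atoms[key] = Atom(type_, outgoing=outgoing)
        return self._atoms[key]


space = ToyAtomSpace()
cat_1 = space.add_node("ConceptNode", "cat")
cat_2 = space.add_node("ConceptNode", "cat")     # same object as cat_1
animal = space.add_node("ConceptNode", "animal")
link_1 = space.add_link("InheritanceLink", cat_1, animal)
link_2 = space.add_link("InheritanceLink", cat_2, animal)  # same object as link_1
```

Because every Atom in the table is unique, comparing two Atoms reduces to comparing object identity, which is what makes the symbol-table analogy apt.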

In some programming languages, such as Erlang, symbols are literally called atoms. This is not an accident! Likewise, symbols are called atoms in Prolog. However, Prolog distinguishes atoms, variables and (compound) terms; OpenCog does not. Thus, the VariableNode plays the role of a Prolog variable, and a Link is the OpenCog analog of a compound term.

Both LISP and Guile allow parameters to be attached to symbols. These are analogous to the Values that can be attached to Atoms.

Valuations

A TruthValue gives each Atom a valuation or an interpretation; thus all Atoms in a given, fixed AtomSpace always carry a default valuation/interpretation along with them. Additional interpretations can be created by using contexts and the ContextLink. When contexts are used, truth values resemble conditional probabilities; that is, the truth value of an atom A in context C can be roughly understood to behave as a probability P(A|C).

The more complex and more general notion of an interpretation or structure in model theory is handled by means of the EquivalenceLink and the SatisfyingSetLink. Atoms can be placed into multiple AtomSpaces for various convenience calculations.

The EvaluationLink allows a relational model to be specified (again, in the sense of model theory), and thus, the collection of Atoms in an AtomSpace can be thought of as being a generalization of a relational database. In a certain sense, the EvaluationLink is the most central and important atom type in OpenCog, as it is how OpenCog implements knowledge representation.
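The relational-database reading can be made concrete with a toy encoding in which each EvaluationLink pairs a PredicateNode with a ListLink of arguments, so that each link plays the role of one row in a table named by the predicate. The tuple encoding and the helpers evaluation and rows are assumptions made for this sketch, not OpenCog functions.

```python
def concept(name):
    return ("ConceptNode", name)


def predicate(name):
    return ("PredicateNode", name)


def evaluation(pred_name, *arg_names):
    """Build EvaluationLink(PredicateNode, ListLink(args...)) as nested tuples."""
    args = ("ListLink",) + tuple(concept(a) for a in arg_names)
    return ("EvaluationLink", predicate(pred_name), args)


atoms = {
    evaluation("eats", "cat", "fish"),
    evaluation("eats", "dog", "bone"),
    evaluation("chases", "dog", "cat"),
}


def rows(atoms, pred_name):
    """Roughly 'SELECT * FROM pred_name': one tuple of names per matching link."""
    result = []
    for atom in atoms:
        if atom[0] == "EvaluationLink" and atom[1] == predicate(pred_name):
            # atom[2] is the ListLink; skip its type tag and keep the names.
            result.append(tuple(name for _, name in atom[2][1:]))
    return sorted(result)


eats_table = rows(atoms, "eats")
chases_table = rows(atoms, "chases")
```

Unlike a fixed-schema table, nothing here constrains the number or type of arguments per predicate, which is one sense in which the AtomSpace generalizes a relational database.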

The atom types page provides a general overview of the various different atom types; see the Category:Atom Types for an index of the atom types currently in use.

Implementation

Atoms are implemented as C++ classes (with bindings for Scheme, Python and Haskell). Many atom types (in fact, most atom types) provide some specialized operation, such as searching for patterns, adding two numbers, beta-reducing an expression or distinguishing between free and bound variables. The code to perform these specialized operations is implemented in C++. Thus, most Atoms are executable: they have a C++ method on them to "run" the operation that they implement. For users of Atomese, these programming APIs are irrelevant, and thus are not documented here. The APIs are for system programmers only; for everyone else, it is sufficient to create Atoms, place them into the AtomSpace, and then trigger execution.

A more detailed sketch of how Atoms are extended with code is given at the bottom of the article on nodes.

At this time, most specialized Atom code is written in C++, although Python has been used in a few cases. There are two ways in which this can be done. One is a fairly traditional "foreign function interface"; this is accomplished with the GroundedSchemaNode and the GroundedPredicateNode. Another, more closely integrated variant is with GroundedSCMNode and GroundedPythonNode, which are able to cache JIT bytecode from these languages. These allow Atomese abstract syntax trees to be compiled into fast runtimes. See opencog/atoms/grounded for details.