Node

A Node is one of the basic structures stored in the AtomSpace. Informally, it should be thought of as a labelled vertex of a graph.

Nodes, together with Links, make up the basic (hyper-)graph structure of the AtomSpace. Thus, for example, a single Link that contains two Nodes can be thought of as a traditional "edge" of conventional graph theory. Links, however, are more general than ordinary edges: they can contain more than two Nodes (or fewer), and they can also contain other Links.

By convention, the label on a Node is a null-terminated byte-string. Some specialized Nodes can contain other data; for example, a NumberNode stores a number. Of course, this number could also be represented as a string; the NumberNode provides convenience utilities for working with its label as a number. (More precisely, a NumberNode holds a vector of numbers; this is useful for various arithmetic and numerical applications.)

There is more than one type of Node: besides NumberNode, there are ConceptNode and PredicateNode (used for knowledge representation), GeneNode and ProteinNode (used for biochemistry), WordNode and SentenceNode (used for natural language), and many others. All of these different types are Types in the formal, mathematical sense of Type Theory.

Formal definition

Nodes and Links are the two primary types of Atoms; all Atoms are either one or the other. A Node is specifically an Atom with a name (Links do not have names). The name together with the type of the Node form a globally-unique key. The AtomSpace ensures that there is only one such Node in the entire AtomSpace. Thus, any reference to a Node of a given type and name is always a reference to the same Node.

The AtomSpace implementation ensures this uniqueness; thus, inserting a "new" Node with the same name and type simply causes the AtomSpace to return the existing Node, if there is one. The general philosophical principle is that all Atoms are universally unique. However, if there are multiple AtomSpaces running on multiple machines, there is no automatic system for coordinating them to ensure uniqueness, unless you are running either the Postgres backend or a network of properly configured CogServers; that is an advanced topic.

The name and type of a Node cannot be changed after the node has been inserted into the AtomSpace: OpenCog atoms are immutable objects.
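The uniqueness and immutability rules above can be illustrated with a small, self-contained Python sketch. It is a toy model and not the real AtomSpace API: the names ToyNode, ToyAtomSpace and add_node are hypothetical and exist only for this illustration. The table is keyed by the (type, name) pair, so a second insertion returns the Node that is already stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a Node: an immutable (type, name) pair."""
    type: str
    name: str


class ToyAtomSpace:
    """Hypothetical toy container; not the real AtomSpace API."""

    def __init__(self):
        # Index keyed by (type, name), the globally-unique key of a Node.
        self._nodes = {}

    def add_node(self, type_, name):
        key = (type_, name)
        # setdefault stores the new node only if the key is absent;
        # otherwise the node already in the table is returned.
        return self._nodes.setdefault(key, ToyNode(type_, name))

    def __len__(self):
        return len(self._nodes)


space = ToyAtomSpace()
first = space.add_node("ConceptNode", "cat")
second = space.add_node("ConceptNode", "cat")    # same type and name
other = space.add_node("PredicateNode", "cat")   # same name, different type
```

Here first and second are the same object, while other is a distinct Node because its type differs. Since ToyNode is a frozen dataclass, assigning to first.name raises an error, which mirrors the rule that name and type cannot change after insertion.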

Nodes (and Atoms in general) do serve as anchor points for Values, such as TruthValues. That is, one can associate one or more (mutable) Values to a Node (or to any Atom). Values are stored as key-value pairs, so, in fact, every Node (every Atom) is an anchor point for a key-value database (in full generality). Thus, the Node name is used only to ensure global Node uniqueness; other values (of any kind: numerical, vector, video/audio, etc.) are stored or referenced in the Value database attached to the Node.
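The split between a fixed identity and mutable Values can be sketched as follows. This is an illustrative toy model only: ToyAtom, set_value and get_value are hypothetical names, and the assumption that keys are themselves Atoms is made for this sketch.

```python
class ToyAtom:
    """Hypothetical stand-in for an Atom: fixed identity, mutable Values."""

    def __init__(self, type_, name):
        self._type = type_
        self._name = name
        # Per-atom key-value store; in this sketch the keys are atoms too.
        self._values = {}

    @property
    def name(self):
        return self._name  # read-only: no setter is defined

    def set_value(self, key, value):
        self._values[key] = value

    def get_value(self, key):
        return self._values.get(key)  # None if nothing is stored under key


video = ToyAtom("ConceptNode", "holiday video")
strength_key = ToyAtom("PredicateNode", "strength")
frames_key = ToyAtom("PredicateNode", "frame sizes")

video.set_value(strength_key, 0.9)
video.set_value(frames_key, [640.0, 480.0])
video.set_value(strength_key, 0.5)  # Values are mutable: overwrite in place
```

The name identifies the atom and never changes; everything else hangs off the atom as a Value that can be replaced at any time.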

Examples

For example, one can define a directed edge by writing

Link
    Node "from vertex"
    Node "to vertex"

Here, indentation is used to indicate that the Link contains two Nodes. It is common to use s-expression notation for the above, and so it may be written as

(Link (Node "from vertex") (Node "to vertex"))

with the parentheses denoting the hierarchical grouping. Some users use the Python bindings to the AtomSpace, and so they will typically write

Link(Node("from vertex"), Node("to vertex"))

for the above. Mailing-list discussions most commonly use the s-expression style of notation. Core Atomese always uses the s-expression style.
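To make the relationship between the indented and s-expression notations concrete, here is a minimal Python sketch that renders one structure both ways. It does not use the real Python bindings: atoms are modelled as plain nested tuples, and the helper names to_sexpr and to_indented are hypothetical.

```python
def to_sexpr(atom):
    """Render a nested-tuple atom in s-expression notation."""
    kind, *rest = atom
    if len(rest) == 1 and isinstance(rest[0], str):
        # A node: its type followed by its quoted name.
        return f'({kind} "{rest[0]}")'
    # A link: its type followed by its rendered children.
    return "(" + " ".join([kind] + [to_sexpr(child) for child in rest]) + ")"


def to_indented(atom, depth=0):
    """Render the same atom in the indented notation."""
    kind, *rest = atom
    pad = "    " * depth
    if len(rest) == 1 and isinstance(rest[0], str):
        return f'{pad}{kind} "{rest[0]}"'
    lines = [pad + kind] + [to_indented(child, depth + 1) for child in rest]
    return "\n".join(lines)


edge = ("Link", ("Node", "from vertex"), ("Node", "to vertex"))
print(to_sexpr(edge))
print(to_indented(edge))
```

Both functions walk the same tree; only the way nesting is shown (parentheses versus indentation) differs.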

Labelled directed edges

There are many different ways in which a labelled directed edge can be represented in Atomese. One "obvious" possibility is to write

Link
    Node "edge label"
    Link
       Node "from vertex"
       Node "to vertex"

although there are other possibilities. The AtomSpace has a multi-decade history, and the conventional, traditional way of writing a labelled edge is

EvaluationLink
    PredicateNode "edge label"
    ListLink
       ConceptNode "from vertex"
       ConceptNode "to vertex"

This is a historical artifact. There is no code in the AtomSpace itself that assumes this format or expects it to be used; however, some subsystems, such as PLN, do expect this format. Many other subsystems, including the biology and the natural language subsystems, use a similar format with different node types (e.g. with GeneNodes and WordNodes in place of ConceptNodes, and so on).
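The conventional encoding can be sketched in Python as a pair of helpers that build and take apart a labelled edge. This is a toy illustration using nested tuples rather than real Atoms, and the function names labelled_edge and unpack_edge are hypothetical.

```python
def labelled_edge(label, source, target):
    """Build the conventional labelled-edge structure as nested tuples."""
    return (
        "EvaluationLink",
        ("PredicateNode", label),
        ("ListLink", ("ConceptNode", source), ("ConceptNode", target)),
    )


def unpack_edge(atom):
    """Return (label, source, target) from a structure built above."""
    link_type, predicate, pair = atom
    if link_type != "EvaluationLink" or pair[0] != "ListLink":
        raise ValueError("not a conventionally encoded labelled edge")
    _, label = predicate
    _, (_, source), (_, target) = pair
    return label, source, target


edge = labelled_edge("edge label", "from vertex", "to vertex")
label, source, target = unpack_edge(edge)
```

A subsystem that expects this layout would, in effect, perform the same kind of unpacking: the PredicateNode carries the label, and the ListLink carries the ordered pair of endpoints.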

System programming

Most users are expected to use pure Atomese via Scheme or Python (or Haskell); C++ programming is reserved for system programmers who are creating optimized and special-purpose subsystems, or are creating new, custom Node types.

The actual implementation of the AtomSpace is in C++, and the C++ interface looks, very roughly, like this:

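   class Node : public Atom
   {
       private:
         std::string name;   // the label; fixed once the Node is created
       public:
         Node(Type, const std::string&);
         // Returns the name of the Node.
         const std::string& getName() const;
         // Looks up the Value stored under `key` for this Node.
         Value getValue(const Atom& key) const;
         // Returns a string combining the type and the name.
         std::string toString() const;
   };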

The getName() method returns the name of the Node; the toString() method returns a string representation combining the type and name. The getValue() method is used to look up Values in the key-value database associated with this Node. The actual implementation can be found on GitHub, in the opencog/atoms/base directory.

Extensions

Atoms have been designed so that system programmers can create highly customized Atom types having specific properties and capabilities. For example, the C++ implementation of the NumberNode looks (very roughly) like this:

   class NumberNode : public Node
   {
       private:
         std::vector<double> values;   // the numbers held by this Node
       public:
         NumberNode(double);
         NumberNode(const std::vector<double>&);
         double get_value() const;                // a single number
         std::vector<double> get_values() const;  // the whole vector
   };

This allows vectors of numerical values to be stored directly as a part of the Node, avoiding any kind of funny business with converting those values to strings and back. The current NumberNode implementation can be found in the GitHub directory opencog/atoms/core.
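The same idea can be sketched in Python: a specialized Node keeps its numbers in native form while still having an ordinary string name. This is a toy model only. The classes ToyNode and ToyNumberNode are hypothetical, and the naming convention (space-separated numbers) is an assumption made for the sketch, not a statement about the real NumberNode.

```python
class ToyNode:
    """Hypothetical stand-in for a generic Node."""

    def __init__(self, type_, name):
        self.type = type_
        self.name = name


class ToyNumberNode(ToyNode):
    """Hypothetical Node that stores a vector of floats directly."""

    def __init__(self, *values):
        if not values:
            raise ValueError("at least one number is required")
        self.values = tuple(float(v) for v in values)
        # Assumed convention for this sketch: the name is the list of
        # numbers, separated by spaces.
        name = " ".join(repr(v) for v in self.values)
        super().__init__("NumberNode", name)

    def get_value(self):
        return self.values[0]      # the first (or only) number

    def get_values(self):
        return list(self.values)   # a copy of the whole vector


n = ToyNumberNode(1, 2.5)
total = sum(n.get_values())        # arithmetic needs no string parsing
```

Callers that want numbers use get_values(); callers that only care about identity can still rely on the type and name, as with any other Node.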

The above example is only the tip of the iceberg, though. One could design, for example, a NeuralNetNode. This is a hypothetical example for this wiki page, although something similar has been created for other subsystems. Thus, one might have

   class NeuralNetNode : public NumberNode
   {
       public:
         Atom& execute();
   };

where the execute() method sends data to a GPU, or perhaps starts GPU processing, or otherwise engages with special-purpose deep-learning libraries and systems. By default, all Atoms have two virtual methods, execute() and evaluate(), which return Atoms and Values, respectively. These have been used to create a wide variety of custom Atom types. Consider, for example, how one might implement a PlusLink: when the execute() method is called on the PlusLink, it should "obviously" add together the numerical values of its arguments. If the PlusLink has VariableNodes as its children, then it can perform symbolic algebra. (This is not just a theoretical example; the current PlusLink implementation actually does do basic symbolic algebra. Yet this is not the best way to do symbolic algebra; the ASMOSES infrastructure is currently redesigning/enhancing Reduct to provide a generic symbolic algebra framework.) The current implementation of PlusLink can be found in the GitHub directory opencog/atoms/reduct.
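A minimal Python sketch of this behaviour follows. It is not the real PlusLink: the classes ToyNumber, ToyVariable and ToyPlus are hypothetical, and the reduction rules (sum the numbers, keep everything else symbolic, drop a zero constant) are simplifying assumptions chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNumber:
    value: float


@dataclass(frozen=True)
class ToyVariable:
    name: str


@dataclass(frozen=True)
class ToyPlus:
    args: tuple

    def execute(self):
        """Add the numeric arguments; keep everything else symbolic."""
        numbers = [a.value for a in self.args if isinstance(a, ToyNumber)]
        symbols = tuple(a for a in self.args if not isinstance(a, ToyNumber))
        total = sum(numbers, 0.0)
        if not symbols:
            return ToyNumber(total)  # fully numeric: a plain sum
        if total == 0:
            # x + 0 reduces to x; several symbols stay under one ToyPlus.
            return symbols[0] if len(symbols) == 1 else ToyPlus(symbols)
        return ToyPlus(symbols + (ToyNumber(total),))


numeric = ToyPlus((ToyNumber(2), ToyNumber(3))).execute()
mixed = ToyPlus((ToyNumber(2), ToyVariable("x"), ToyNumber(3))).execute()
```

In both cases execute() returns another atom-like object: a single number when everything is numeric, and a smaller sum when variables remain.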

There are a large number of Atoms that are specialized in the above way. There is a node for holding Postgres database connections (see opencog/persist), for RocksDB storage backends (see https://github.com/opencog/atomspace-rocks), for Link Grammar dictionaries and parsing (see opencog/nlp/lg-dict and opencog/nlp/lg-parse), and many others, too numerous to easily list.

Note that all such wrappers adhere to the basic property that Nodes are universally unique, and that they are immutable. Thus, although they may be executable, a given Node sitting in the AtomSpace is unique, and neither its name nor its type can change. This is a basic assumption that all subsystems make, and it is a necessary one, since many subsystems are multi-threaded and run algorithms in parallel. Executable Atoms must be thread-safe.

JIT for Scheme, Python and other languages

It is often the case that Atoms need to wrap the functionality of existing Python libraries, or that Atoms that represent abstract syntax trees need to be compiled down to bytecode. Because this is a common requirement amongst AtomSpace system programmers, a specialized set of interfaces was developed to simplify such wrapping. These include the GroundedPythonNode and the GroundedSCMNode; see the GitHub repo directory opencog/atoms/grounded for details.
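The general idea of wrapping an existing library function behind a Node can be sketched as follows. This is a toy model and does not reproduce the actual grounded-node interfaces: the class ToyGroundedNode, the GROUNDINGS registry and the "py:" name prefix are all assumptions made for this illustration.

```python
import math

# Hypothetical registry mapping node names to existing Python callables.
GROUNDINGS = {
    "py:sqrt": math.sqrt,
    "py:hypot": math.hypot,
}


class ToyGroundedNode:
    """Hypothetical node whose name selects a wrapped Python function."""

    def __init__(self, name):
        if name not in GROUNDINGS:
            raise KeyError(f"no grounding registered for {name!r}")
        self.name = name

    def execute(self, *args):
        # The callable is looked up at execution time; the node itself
        # stores only its name.
        return GROUNDINGS[self.name](*args)


root = ToyGroundedNode("py:sqrt").execute(9.0)
length = ToyGroundedNode("py:hypot").execute(3.0, 4.0)
```

Because the node holds nothing but its name, it remains an ordinary immutable, uniquely named object; all behaviour lives in the function it refers to.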