Link

From OpenCog

A Link is one of the basic structures that is stored in the AtomSpace. Together with Nodes, Links are used to define (hyper-)graphs. Links generalize the idea of an edge in an ordinary graph. And ordinary graph edge is just a Link with two Nodes in it; the two Nodes are the vertexes at each end of an ordinary edge. Link generalize this idea by allowing more than two Nodes (or less!) to be contained in a Link. Furthermore, Links can also hold other Links.

Links, together with Nodes, are collectively called Atoms; they are the two basic types of Atoms. Thus, a Link can be very easily thought of as a container for Atoms. Since Links can contain other Links, they can collectively be visualized to be a tree. Thus, naively, one might say that the AtomSpace is a database that holds trees: it is a "multi-tree" or a "forest". Thinking of the AtomSpace in this way is very useful, but is only partly correct. This is because all Atoms in the AtomSpace are universally unique.

Consider, for example, the structure

Link
   Node "vertex A"
   Node "vertex B"

This can be understood to be either a tree with two branches, or it can be understood to be an edge, with it's two endpoints being given labels. Given a second edge

Link
   Node "vertex B"
   Node "vertex C"

the AtomSpace ensures that (Node "vertex B") is universally unique, i.e. that it is the same Node. Thus, an AtomSpace containing these two "trees" can also be thought of as containing a simple graph of two edges connecting three vertexes.

Just as the AtomSpace ensures the uniqueness of two identically-named Nodes, it also ensures the uniqueness of Links (trees), so that identical Links necessarily become references to the same Link. Thus, the AtomSpace becomes no longer just a collection of trees; these trees share leaves and branches, and so are more correctly visualized as a dense brush or rhizome. This is similar to how a graph is more than just a collection of edges: the edges share vertexes with one-another; this is what distinguishes a graph from a collection of edges. Likewise, a hypergraph shares sub-trees between different trees, so that they inter-connect.

Formal Defintion

A Link is a type of Atom that connects (associates together) other Atoms. The outgoing set of a Link is the contents of that Link. The incoming set of a Link is the set of all other Links that contain it. Both the outgoing and the incoming sets are explicitly maintained, so that a hypergraph can be traversed in the fastest possible way. This is a core design principle: it distinguishes Atoms from Values, in that the hypergraphs that Atoms form are easily and quickly traversable, whereas Values are not. (That is, Values can also encode graphs, but nothing special is done to make such graphs rapidly traversible.)

Links are typed, in the formal sense of Type Theory. Types allow a large variety of diffferent Link types to be defined, each having distinct properties. The most basic division is between the OrderedLink and the UnorderedLink, which, as the names suggest, maintain either a strict sequential ordering of their contents, or imply an arbitrary order. A special case of the UnorderedLink is the SetLink, which is explicitly meant to resemble the "set" of set theory. There are many other special cases; for example, the AndLink and OrLink are used to model the boolean-logic operators of the same name; InheritanceLink is used for knowledge representation and ontologies; PlusLink and MinusLink are used to build arithmetic expressions and abstract syntax trees; the SentenceLink is used in the natural language subsystem; there are many many other link types.

The AtomSpace ensures that Links are unique. Links are identified only by their type and outgoing set (and by nothing else). That is, once a Link is inserted in an AtomSpace, it becomes the sole, unique version of that Link. Inserting a second Link of the same type and outgoing set just returns the first Link. The general philosophical principle is that all Atoms are universally unique; However, if there are multiple AtomSpaces running on multiple machines, there is no automatic system for coordinating them to ensure uniqueness (unless you are running either the Postgres backend, or a network of properly configured CogServers; but this is an advanced topic.)

The type and outgoing set of the Link cannot be changed: Atoms are immutable objects. Immutability is key for parallel graph traversal: A Link cannot change while it is being traversed. The implementation uses smart pointers to track Atoms, so that any reference to an Atom will never go stale and will not be de-allocated.

Links (and Atoms in general) do serve as anchor points for Values, such as TruthValues. That is, one can associate one or more (mutable) Values to a Link (or to any Atom). Values are stored as key-value pairs, so, in fact, every Link (every Atom) is an anchor point for a key-value database (in full generality). Thus, Links are used only to define the hypergraph structure, the connectivity between Atoms. Actual data, including labels, weights, vectors and so on are meant to be stored in the Value database attached to that Link. The stored Values are mutable; only the graph structure is kept immutable.

The "meaning" of a Link

The above described a Link as a certain type of data structure that is implemented in the AtomSpace. The interpretation of a Link, or it's "meaning" is left open-ended and ultimately determined by the user. There is more than one way to represent edges with Atoms, and different OpenCog subsystems use different representations.

Consider, for example, the problem of representing a labelled edge. One "obvious" possibility is to write

Link
    Node "edge label"
    Link
       Node "from vertex"
       Node "to vertex"

Another possibility is to just use the unlabelled form

Link
   Node "from vertex"
   Node "to vertex"

and to store the edge label as a Value on the Link. There are distinct performance and searchability (queriability) implications for these two different forms. The first form allows a trivial and instant search for all edges carrying the label "edge label". This is trivial, because one need only ask for the incoming set of (Node "edge label") and one is done: that's it. This query is instant, because the incoming sets are maintained in RAM. The same query is wildly inefficient in the second case. It would require an exhaustive search of all Links, fetching the appropriate Value attached to it, and then comparing to see if it has the desired name. This performance and ease-of-query comes at a price: incoming sets require a significant amount of RAM to store; Values avoid this overhead. Also, Nodes are immutable, and so the edge label is unchangeable. A name stored in a StringValue can be changed at any time.

Predicative Relationships

The AtomSpace was originally designed to be a system for knowledge representation (only), although it has grown since then. It is appropriate to mention that the conventional, agreed-upon format for storing logical predicates as labelled edges is to write them as

EvaluationLink
    PredicateNode "edge label"  // the name of the predicative relation 
    ListLink
       ConceptNode "from vertex" // the first of two concepts in the relation
       ConceptNode "to vertex"

This is a historical artifact. There is no code in the AtomSpace that assumes this format, or expects it to be used; however some subsystems, such as PLN, do expect this format. Many other subsystems, including the biology and the natural language subsystems, use a format similar to this (with different node types, e.g. with GeneNodes and WordNodes in place of ConceptNodes, and so on.)

Relational Algebra

The above hints at the idea that the AtomSpace encodes a relational algebra. The query engine provides the query language of that relational algebra. The complete set of concepts and ideas from relational algebra have been implemented in the AtomSpace, albeit with somewhat idiosyncratic names, as the AtomSpace is more than "just" a relational algebra, and has been tuned to optimize various use cases and various common programming styles and algorithms that aren't easily or commonly expressed in relational terms. For starters, the AtomSpace also includes many ideas from Model Theory.

To be more precise, a Link can be thought of as a relation R between a set of atoms. The outgoing set of relationship R is the list of arguments of that relation. The incoming set of an entity E is the set of relationships that take E as an argument. That is, a relation R(E1, E2, ..., En) can be represented as

RelationshipLink
   RelationNode "R"
   ListLink
      EntityNode "E_1"
      EntityNode "E_2"
      ...
      EntityNode "E_n"

(At this time, there are no Links and Nodes with these explicit names, and so this is an example, only. There are many other Link and Node types to choose from, and new Atom types can be defined at run-time or compile time, on a project-by-project basis, as needed by the user.) Documentation located on the pattern matcher wiki page, and the constellation of related pages, illustrate how conventional relational algebra concepts can be mapped to the corresponding Atomese.

System programming

Most users are expected to use pure Atomese via scheme or python (or haskell) and C++ programming is reserved for system programmers who are creating optimized and special-purpose subsystems, or are creating new, custom Atom types.

The actual implementation of the AtomSpace is in C++, and the C++ interface looks vaguely, very roughly like this:

typedef std::shared_ptr<Atom> Handle;  // Handles are smart pointers to Atoms
 
class Link : public Atom
{
    private:
       std::vector<Handle> _outgoing;
    public:
       Link(Type, const std::vector<Handle>&);
         
       const std::vector<Handle>& getOutgoingSet()
            { return _outgoing; }
       size_t getArity() const {return _outgoing.size(); }
       std::string toString() const;
};

The toString() method returns a string representations of the Link.

The getArity() method returns the arity of the Link, i.e. the number of Atoms which participate in the relationship it represents.

The getOutgoingSet() method returns the outgoing set of the Atom.

The analogous method to return the incoming set is located on class Atom, because both Nodes and Links have incoming sets. Likewise, class Atom has methods for getting, saving and copying Values. Atoms are also both executable and evaluatable, in general, and so virtual methods are provided for these. (The difference being that execution returns an Atom; while evaluation returns a Value.) The implementation can be found in github, in the directory opencog/atoms/base.

Extensions

Atoms have been designed so that system programmers can create highly customized Atom types having specific properties and capabilities. For example, PlusLink has a corresponding C++ class that implements addition. Thus, not only can one write the abstract syntax tree

PlusLink
    NumberNode "1"
    NumberNode "41"

as an abstract representation of a formula, but one can also execute the formula to obtain the answer "42". There are many dozens, approaching a hundred different Atom types that have underlying C++ implementations that are capable of being executed or evaluated. See Node#Extensions for more details on the conceptual process of creating custom Atom types. The core set of Atom types provide actual examples of how this is done; see the github opencog/atoms/core directory.

Incoming and Outgoing Sets

The outgoing set of a Link is the set of atoms that it encompasses. The incoming set is the converse.

That is, one can visualize a Link as vertex in a directed tree, and its outgoing set being all of the child-vertexes under that vertex. By default, the outgoing set is always an ordered set, unless the link is a type of UnorderedLink.

Since, in OpenCog, an Atom can appear in many different trees (in many different links), the incoming set is the set of all the Links in which an atom might appear. The incoming set is always unordered, and there is no (natural) way to order it.

Below is an example. Consider an AtomSpace formed by four Nodes A, B, C and D and four (ordered) Links:

Link Outgoing set
L1 {A, B}
L2 {A, C}
L3 {D, B}
L4 {L1, A}

The following table shows the incoming and the outgoing sets of each Atom:

Atom Incoming set Outgoing set
A {L1,L2,L4} null
B {L1,L3} null
C {L2} null
D {L3} null
L1 {L4} {A,B}
L2 null {A,C}
L3 null {D,B}
L4 null {L1,A}

The outgoing set is stored in an std::vector of smart pointers to Atoms. The incoming set is stored inside the Atom and represented as a set of weak pointers. They are stored as weak pointers, to avoid the memory management issues that would otherwise ensue from reference counting of circular graphs. That is, Links always form a DAG, with the downward edges being smart pointers, and the upwards edges being weak pointers.