API

From OpenCog

The OpenCog system has a variety of different things that can be called an "API". This page attempts to provide a quick sketch.

Atomese

The primary API provided by OpenCog is Atomese, and, the funny thing is, it's not an API intended for humans. Rather, it is a kind of programming language designed so that complex, possibly non-human algorithms can gain full access to the knowledge representation system, and manipulate, move around and control that data... except that the algorithms themselves live within the knowledge base, and thus can act on themselves; they can self-modify, as it were. The primary interface to Atomese is the set of Atom types, most of which have underlying C++ implementations to do the actual heavy lifting. One of the most prominent Atomese interfaces is the pattern matcher.
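To give a rough feel for what "pattern matching over Atoms" means, here is a minimal sketch in plain Python. It is a toy model, not the OpenCog pattern matcher and not the OpenCog Python bindings: atoms are modelled as nested tuples, and the function names `match` and `query` are made up for this illustration. Note that the pattern is itself the same kind of data as the atoms it is matched against.

```python
# Toy model only: atoms are nested tuples, NOT the real AtomSpace API.
# A node is (type, name); a link is (type, child, child, ...).

def is_variable(term):
    """A variable is a two-element node whose type is 'Variable'."""
    return (isinstance(term, tuple) and len(term) == 2
            and term[0] == "Variable" and isinstance(term[1], str))

def match(pattern, atom, bindings=None):
    """Return a dict of variable bindings if pattern matches atom, else None."""
    bindings = dict(bindings or {})
    if is_variable(pattern):
        name = pattern[1]
        if name in bindings and bindings[name] != atom:
            return None          # same variable, two different groundings
        bindings[name] = atom
        return bindings
    if isinstance(pattern, str) or isinstance(atom, str):
        return bindings if pattern == atom else None
    if len(pattern) != len(atom):
        return None
    for sub_pattern, sub_atom in zip(pattern, atom):
        bindings = match(sub_pattern, sub_atom, bindings)
        if bindings is None:
            return None
    return bindings

def query(pattern, atoms):
    """Collect the bindings for every atom that the pattern matches."""
    results = []
    for atom in atoms:
        found = match(pattern, atom)
        if found is not None:    # an empty dict is still a successful match
            results.append(found)
    return results

atoms = [
    ("Inheritance", ("Concept", "cat"), ("Concept", "animal")),
    ("Inheritance", ("Concept", "dog"), ("Concept", "animal")),
    ("Inheritance", ("Concept", "rock"), ("Concept", "mineral")),
]
pattern = ("Inheritance", ("Variable", "$x"), ("Concept", "animal"))
animals = [b["$x"] for b in query(pattern, atoms)]
print(animals)
```

The real pattern matcher handles far more than this (typed variables, multi-clause graph queries, rewriting); the sketch only shows the basic idea of grounding variables against stored data.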

If you look at the various Atoms, it will gradually become clear that Atomese looks a lot like an ordinary programming language, except that it's much more verbose and awkward to use... for humans. For machine algorithms, we are working hard to make sure that it's a good fit, and is actually easy to use... by machines.

You can program in it directly, and many (most?) of the programmers here do. However, since it's verbose, it's a bit like programming in assembly language. Thus, you want to avoid getting caught in a trap here: you don't want to write much Atomese directly; you do want to write algorithms that do things with it.

Programming Language Bindings

Programming in OpenCog can currently be done in:

Possibly interesting future language bindings include Rust, Scala, Elixir and R. It might be best if the language supported parametric polymorphism (aka "higher-rank types"). Alternately, it would be best if the language binding provided parametric polymorphism "natively".

The reason that parametric polymorphism is interesting is that the AtomSpace can be viewed as a free-for-all graph data store, and it is often interesting to look at some portion of it as being of a certain kind of data: for example, some subset of Atoms can be thought of as an NxN sparse matrix. One would like to be able to specify some subset as being such a matrix, and then get "all possible matrix algorithms" "for free" -- i.e. without any additional programming, and also without having to write import/export functions that export data from the AtomSpace into someone else's notion of an NxN sparse matrix, do some calculation, and then import the results back into the AtomSpace. In general, import/export tends to be inefficient and hard to manage, and is thus best avoided.
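The "view it in place, don't export it" idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `store` dict is a toy stand-in for a graph store, and `SparseMatrixView`, `row_sums` and `times_vector` are invented names, not AtomSpace APIs. Python's duck typing stands in, loosely, for the parametric polymorphism discussed above: the algorithms are written once against the view's `entries()` method and never copy the data out.

```python
from collections import defaultdict

# Toy stand-in for a graph store: each key is a pair "atom", each value a count.
# This is NOT the AtomSpace API; it only illustrates viewing data in place.
store = {
    ("Pair", "the", "cat"): 3.0,
    ("Pair", "the", "dog"): 2.0,
    ("Pair", "a", "cat"): 1.0,
    ("Other", "unrelated"): 99.0,   # ignored by the matrix view
}

class SparseMatrixView:
    """Present the 'Pair' entries of a store as a sparse matrix, without copying."""

    def __init__(self, store, pair_type="Pair"):
        self.store = store
        self.pair_type = pair_type

    def entries(self):
        """Yield (row, column, value) for every stored pair of the chosen type."""
        for key, value in self.store.items():
            if key[0] == self.pair_type and len(key) == 3:
                yield key[1], key[2], value

# Generic algorithms: they work on anything that offers entries().
def row_sums(matrix):
    totals = defaultdict(float)
    for row, _col, value in matrix.entries():
        totals[row] += value
    return dict(totals)

def times_vector(matrix, vector):
    """Sparse matrix-vector product; missing vector entries count as zero."""
    result = defaultdict(float)
    for row, col, value in matrix.entries():
        result[row] += value * vector.get(col, 0.0)
    return dict(result)

view = SparseMatrixView(store)
sums = row_sums(view)
product = times_vector(view, {"cat": 1.0, "dog": 10.0})
print(sums, product)
```

Because the view reads the store directly, a pair added to the store later is seen by the next call to `row_sums(view)` with no import or export step.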

The above is one reason why R bindings to the AtomSpace are more interesting than SciPy bindings: the Rcpp bindings provide parametric polymorphism as the core, basic design principle, whereas SciPy does not (from what I can tell -- although this is a generic design issue with Python, it seems).

Network data access

There are currently three ways of accessing data remotely:

The Guile REPL server and the cogserver scheme network shell are conceptually similar, in that both allow you to run arbitrary scheme code at the network prompt. The cogserver shell is an order of magnitude faster and more scalable than the Guile REPL server. It is also more stable (the REPL server will crash and/or hang). The cogserver shell is extremely robust -- it's been pounded on more than just about any other component, and all the bugs and performance problems have been squeezed out.

The cogserver also provides an Atomese-only interface. This interface is much faster (maybe 4x or 10x faster) than the scheme cogserver interface. This is because absolutely everything but the bare minimum has been cut out. There is no scheme interpreter. The incoming Atomese is decoded, directly, with tuned C code that avoids copies and all un-needed operations. Incoming Atomese is immediately inserted into the AtomSpace; queries are run, with the Atomese results returned without any further format conversions or copying.
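As a rough illustration of what "decoding Atomese directly, with no interpreter" involves, here is a minimal recursive-descent decoder in Python. It is a sketch under stated assumptions, not the cogserver's decoder (which is tuned C code): the function names are invented, the output is nested tuples rather than real Atoms, and string escapes, numbers and values are deliberately not handled.

```python
def _skip_space(text, pos):
    """Advance pos past any whitespace."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def _parse(text, pos):
    """Parse one term starting at pos; return (term, next_pos)."""
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    if text[pos] == '"':
        # A quoted node name; escapes are not supported in this toy.
        end = text.find('"', pos + 1)
        if end == -1:
            raise ValueError("unterminated string")
        return text[pos + 1:end], end + 1
    if text[pos] != "(":
        raise ValueError(f"expected '(' or a quoted name at offset {pos}")
    pos = _skip_space(text, pos + 1)
    start = pos
    while pos < len(text) and not text[pos].isspace() and text[pos] not in '()"':
        pos += 1
    if pos == start:
        raise ValueError("missing atom type")
    atom = [text[start:pos]]          # first element is the atom type
    pos = _skip_space(text, pos)
    while pos < len(text) and text[pos] != ")":
        child, pos = _parse(text, pos)
        atom.append(child)
        pos = _skip_space(text, pos)
    if pos >= len(text):
        raise ValueError("missing ')'")
    return tuple(atom), pos + 1

def decode(text):
    """Decode a single Atomese-like s-expression into nested tuples."""
    atom, pos = _parse(text, _skip_space(text, 0))
    if _skip_space(text, pos) != len(text):
        raise ValueError("trailing input")
    return atom

decoded = decode('(Evaluation (Predicate "likes") (List (Concept "cat") (Concept "fish")))')
print(decoded)
```

The point of the sketch is the shape of the work: one pass over the input, no evaluation step, and a result that could be handed straight to the data store.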

The best way to perform AtomSpace-to-AtomSpace communications is to just use the CogStorageNode. It's simple, easy, direct, fast, and handles all the details for you.

In the past, there were ZeroMQ, Protocol Buffer, RESTful APIs, JSON and other systems for talking to the AtomSpace. All of these other communications channels were slower and clunkier, bloated and unmaintainable. Why? Because Atoms are tiny. Just about any kind of format conversion done to an Atom is pure overhead: Atoms are just some hundreds of bytes of RAM, and you can access a few hundred bytes very quickly. An operation as simple as copying an Atom will cut your performance in half! Copying it and converting the format to something else will be at least 10x slower. Any layering at all is slower than direct access. All those other layers do is create bottlenecks. A famous quote: the best part is no part. The best process is no process. All these other systems just add parts and processes, and offer nothing in return.

Distributed computing

The Networked AtomSpaces wiki page defines general terms for network computing with AtomSpaces, and describes several different organizational structures. As of this writing, there are two usable and practical means of doing networked computing with AtomSpaces.

  • The CogStorageNode provides a peer-to-peer mechanism for exchanging Atoms with another AtomSpace. It allows Atoms to be sent and received, and it allows remote queries to be performed (one can ask the remote AtomSpace to perform a BindLink or QueryLink on your behalf).
  • The PostgresStorageNode provides a shared-dataset service. Multiple AtomSpaces can attach to the same dataset, and send/receive Atoms, and perform queries. It uses exactly the same API as above: the StorageNode API.
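The practical consequence of both mechanisms sharing one API is that client code does not need to know which one it is talking to. The following Python sketch illustrates only that design point. All class and method names in it (`ToyStorage`, `ToyPeerStorage`, `ToySharedDataset`, `store_atom`, `fetch_all`, `publish`) are hypothetical and invented for this example; they are in-memory stand-ins, not the actual StorageNode API, and no networking or database is involved.

```python
from abc import ABC, abstractmethod

class ToyStorage(ABC):
    """Hypothetical common interface; the method names are invented."""

    @abstractmethod
    def store_atom(self, atom):
        """Send one atom to the backend."""

    @abstractmethod
    def fetch_all(self):
        """Return a copy of every atom the backend holds."""

class ToyPeerStorage(ToyStorage):
    """Peer-to-peer flavour: talks to exactly one other (toy) atom set."""

    def __init__(self, remote_atoms):
        self.remote_atoms = remote_atoms

    def store_atom(self, atom):
        self.remote_atoms.add(atom)

    def fetch_all(self):
        return set(self.remote_atoms)

class ToySharedDataset(ToyStorage):
    """Shared-dataset flavour: many clients attach to one named dataset."""

    _datasets = {}   # dataset name -> set of atoms, shared by all instances

    def __init__(self, name):
        self.atoms = self._datasets.setdefault(name, set())

    def store_atom(self, atom):
        self.atoms.add(atom)

    def fetch_all(self):
        return set(self.atoms)

def publish(storage, atoms):
    """Client code: written once, works with either backend."""
    for atom in atoms:
        storage.store_atom(atom)

# Peer-to-peer: one client, one remote atom set.
peer_space = set()
publish(ToyPeerStorage(peer_space), [("Concept", "cat")])

# Shared dataset: two clients attached to the same dataset name.
writer = ToySharedDataset("demo")
reader = ToySharedDataset("demo")
publish(writer, [("Concept", "dog")])
print(peer_space, reader.fetch_all())
```

In this toy, the atom written through `writer` is visible through `reader` because both attach to the dataset named "demo", which mirrors the shared-dataset role described above.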

Some very early, pre-alpha ideas and experiments for more complex organizations can be found in the git repo https://github.com/opencog/atomspace-agents/. It is given the funny name "agents" because different AtomSpaces will typically have different roles to play in a network organization. Solving a distributed networking problem is a lot like organizing a construction crew, or an office of white-collar workers: there are different roles, responsibilities and organizational structures, and a lot of different ways of doing things. The CogStorageNode provides the basic peer-to-peer communications mechanism on which to build these more complex systems.

Subsystem APIs

There are some large and important subsystems in OpenCog. These are reviewed in the AI Documentation page, and include:

The Cognitive API

Currently, OpenCog is mainly useful to AI researchers and AI application developers. For it to be useful to plain old application developers who want to use AI in their applications, a simpler and cleaner API will need to be developed (along with more robust, reliable functionality!). This has been summarized as the concept of a Cognitive API.

It has been suggested to first develop the Cognitive API concept in the context of data mining applications, creating an API allowing OpenCog to be used by non-AI software developers as a large-scale unsupervised data mining and querying tool.