From OpenCog

The OpenCog system has a variety of different things that can be called an "API". This page attempts to provide a quick sketch.


The primary API provided by OpenCog is Atomese, and, the funny thing is, it's not an API intended for humans. Rather, it is a kind of programming language designed so that complex, possibly non-human algorithms can gain full access to the knowledge representation system, and manipulate, move around and control that data... except that the algorithms themselves live within the knowledge base, and thus can act on themselves and self-modify, as it were. The primary interface to Atomese is the collection of Atom types, most of which have underlying C++ implementations to do the actual heavy lifting. One of the most prominent Atomese interfaces is the pattern matcher.

If you look at the various Atoms, it will gradually become clear that Atomese looks a lot like an ordinary programming language, except that it's much more verbose and awkward to use... for humans. For machine algorithms, we are working hard to make sure that it's a good fit, and is actually easy to use... by machines.

You can program in it directly, and many (most?) of the programmers here do. However, since it's verbose, it's a bit like programming in assembly language. Thus, you want to avoid getting caught in a trap here: you don't want to write much Atomese directly; you do want to write algorithms that do things with it.
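To illustrate the point above, here is a minimal sketch in plain Python (no OpenCog dependency) of writing an algorithm that generates Atomese rather than hand-writing it. The s-expression output follows standard Atomese notation; the helper functions themselves are hypothetical, not part of any OpenCog API.

```python
def sexpr(atom):
    """Render a (type, *outgoing) tuple as an Atomese s-expression string."""
    atom_type, *out = atom
    rendered = " ".join(
        f'"{x}"' if isinstance(x, str) else sexpr(x) for x in out
    )
    return f"({atom_type} {rendered})"

def inheritance_facts(names, parent):
    """Emit one verbose InheritanceLink per name -- the kind of
    repetitive Atomese you would not want to type by hand."""
    return [sexpr(("InheritanceLink",
                   ("ConceptNode", name),
                   ("ConceptNode", parent)))
            for name in names]

for fact in inheritance_facts(["cat", "dog"], "animal"):
    print(fact)
# prints, e.g.: (InheritanceLink (ConceptNode "cat") (ConceptNode "animal"))
```

A dozen lines of ordinary code stand in for arbitrarily many lines of hand-written Atomese; this is the "write algorithms that do things with it" approach.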

Programming Language Bindings

Programming in OpenCog can currently be done in:

Possibly interesting future language bindings include Rust, Scala, Elixir and R. It might be best if the language itself supported parametric polymorphism (and, ideally, higher-rank types); alternately, the language binding could provide parametric polymorphism "natively".

The reason that parametric polymorphism is interesting is that the AtomSpace can be viewed as a free-for-all graph data store, and it is often useful to look at some portion of it as being a certain kind of data: for example, some subset of atoms can be thought of as an NxN sparse matrix. One would like to be able to specify some subset as being such a matrix, and then get "all possible matrix algorithms" "for free" -- i.e. without any additional programming, and also without having to write import/export functions that export data from the atomspace into someone else's notion of an NxN sparse matrix, do some calculation, and then import the results back into the atomspace. In general, import/export tends to be inefficient and hard to manage, and thus is best avoided.
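The "matrix view" idea above can be sketched as follows, using a plain Python dict as a stand-in for the AtomSpace (this is hypothetical illustration, not the real AtomSpace API). Some atoms in the generic store are designated as word-pair counts; the view exposes them as a sparse matrix, so matrix-style operations run directly against the store with no export/import step.

```python
class SparseMatrixView:
    """View a subset of a generic graph store as an NxN sparse matrix."""

    def __init__(self, store, predicate):
        # store: generic graph data, here {(predicate, row, col): count}
        self.store = store
        self.predicate = predicate

    def __getitem__(self, rc):
        """Matrix entry lookup; absent pairs are zero, as in any sparse matrix."""
        row, col = rc
        return self.store.get((self.predicate, row, col), 0)

    def row_sum(self, row):
        """A generic 'matrix algorithm' running directly on the store."""
        return sum(v for (p, r, _), v in self.store.items()
                   if p == self.predicate and r == row)

# A free-for-all store; only the "word-pair" atoms form the matrix.
store = {
    ("word-pair", "the", "cat"): 3,
    ("word-pair", "the", "dog"): 2,
    ("word-pair", "a",   "cat"): 1,
}
m = SparseMatrixView(store, "word-pair")
print(m["the", "cat"], m.row_sum("the"))  # -> 3 5
```

The point of the design is that `row_sum` (and any other algorithm written against the view) never copies data out of the store; parametric polymorphism would let such views be written once and reused for every "matrix-shaped" subset.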

The above is one reason why R bindings to the AtomSpace are more interesting than SciPy bindings: the Rcpp bindings provide parametric polymorphism as the core, basic design principle, whereas SciPy does not (from what I can tell; this seems to be a generic design issue with Python).

Network data access

There are currently three ways of accessing data remotely:

The REST API provides RESTful-style access to the AtomSpace. It's terribly low-level, accessing individual atoms one at a time, and therefore it's terribly slow. Great for highly selective poking around, but that's it.

Slightly more robust are the two scheme network interfaces. In both cases, the general idea is that you can send atoms into the AtomSpace, and get results back, by treating them as ASCII (UTF-8) scheme strings. This is nicely general-purpose, but is partly limited by the speed of interpreting the strings being passed around. For current-generation CPUs and software, this runs at about 20K Atoms/second. This means that dealing with millions of atoms can take many minutes or more. There are numerous bottlenecks that determine this limit; one of them is the speed of insertion of atoms into the atomspace.
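A quick back-of-the-envelope check of the figure quoted above: at roughly 20K Atoms/second over the string interface, shipping millions of atoms does indeed take minutes. (The rate is the article's estimate, not measured here.)

```python
RATE = 20_000  # Atoms per second, per the estimate in the text

def transfer_minutes(n_atoms, rate=RATE):
    """Time to push n_atoms through the scheme-string interface, in minutes."""
    return n_atoms / rate / 60

# 5 million atoms at 20K/s -> about 4 minutes; 50 million -> over 40 minutes.
print(f"{transfer_minutes(5_000_000):.1f} min for 5M atoms")
print(f"{transfer_minutes(50_000_000):.1f} min for 50M atoms")
```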

The Guile REPL server provides network access, but is slower and less dependable than the scheme shell in the CogServer.

An ideal network server would provide the speed and robustness of the CogServer, login (aka capability) management, and encryption of the network traffic. One might think that there are pre-existing libraries and open-source projects that provide this, but I have not been able to find any.

Distributed computing

There are currently two ways of doing distributed computing in OpenCog, and a third proposal:

Performing distributed computing by having different atomspaces on different machines access the same SQL backend is a bit ad hoc. The different processes running in the different atomspaces all have to have some implicit or explicit coordination with one another, so that they achieve the desired results. This solution is not terribly scalable: it breaks down after about ten or twenty distinct processes, primarily because of the speed of updating the SQL database.

The Gearman interface is totally ad hoc. It allows users to send work-jobs, as scheme strings, to various cogservers, and to collect the results of those computations. That's all it is -- nothing more than a simple (even simplistic) message-passing, work-farm API.
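The work-farm pattern described above can be sketched like this. A trivial local "worker" stands in for a remote cogserver, and a thread pool stands in for Gearman's job dispatch; none of this is the actual Gearman or CogServer API, it just shows the shape of the interface: opaque scheme strings in, results collected back out.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(scheme_string):
    """Stand-in for a remote cogserver: a real worker would evaluate the
    scheme string in its scheme shell and return the printed result."""
    return f"evaluated: {scheme_string}"

# Jobs are opaque scheme strings, as in the Gearman interface.
jobs = [
    '(cog-new-node \'ConceptNode "a")',
    '(cog-new-node \'ConceptNode "b")',
]

# Farm the jobs out and collect the results in order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(worker, jobs))

print(results)
```

The simplicity is the point: there is no shared state and no coordination, just message passing, which is exactly why the interface is described as simplistic.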

Subsystem APIs

There are some large and important subsystems in OpenCog. These are reviewed in the AI Documentation page, and include:

The Cognitive API

Currently, OpenCog is mainly useful to AI researchers and AI application developers. For it to be useful to plain old application developers who want to use AI in their applications, a simpler and cleaner API will need to be developed (along with more robust, reliable functionality!). This has been summarized as the concept of a Cognitive API.

It has been suggested to first develop the Cognitive API concept in the context of data mining applications, creating an API allowing OpenCog to be used by non-AI software developers as a large-scale unsupervised data mining and querying tool.