Massively Multithreaded OpenCog
Speculative Notes on Refactoring OpenCog's Core to Better Exploit Multicore and Hardware Multithreading
(Ben Goertzel, October 20, 2013)
This speculative note outlines one fairly conjectural direction for refactoring the OpenCog Atomspace to make it more efficient. The focus is on better exploiting multicore systems and hardware multithreading, e.g. making an Atomspace that would run very effectively on hardware like the Intel Xeon Phi coprocessor or the (Cray) YarcData supercomputer. However, it is conjectured that the proposed modifications would also yield a more efficient Atomspace in the single-core, single-hardware-thread scenario.
At the moment this is a quite hypothetical proposal, posted to spur discussion.
Two key ingredients referenced here are:
- Qthreads, a library for massively multithreaded software, which runs both on the (massively hardware-multithreaded) Cray XMT and on standard commodity architectures (utilizing pthreads)
- MTGL, Multi-Threaded Graph Library, which is implemented using qthreads
In researching these ideas, some study was also put into GraphCT, an alternative to MTGL.
To quote the relevant webpages:
The qthreads API is designed to make using large numbers of threads convenient and easy, and to allow portable access to threading constructs used in massively parallel shared memory environments. The API maps well to both MTA-style threading and PIM-style threading, and is still quite useful in a standard SMP context. The qthreads API provides access to full/empty-bit (FEB) semantics, where every word of memory can be marked either full or empty, and a thread can wait for any word to attain either state.
The qthreads library on an SMP (i.e. the POSIX implementation) is essentially a library for spawning and controlling coroutines: threads with small (4k) stacks. The threads are entirely in user-space and use their blocked/unblocked status as part of their scheduling. The library's metaphor is that there are many qthreads and several "shepherds". Shepherds can be thought of as a thread mobility domain; they map to specific processors or memory regions. Qthreads are assigned to specific shepherds and do not migrate unless directed to migrate.
The API includes utility functions for making threaded loops, sorting, and similar operations convenient.
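To make the full/empty-bit idea in the qthreads description concrete, here is a minimal sketch that emulates a single FEB word with standard C++ primitives. It is not the qthreads API: the class FebWord, its method names, and feb_sum_demo are hypothetical, and a real FEB implementation would avoid a mutex per word. The sketch only shows the semantics: a writer blocks until the word is empty, a reader blocks until it is full.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical emulation of one full/empty-bit (FEB) word.
// NOT the qthreads API; it only mimics the semantics quoted above.
class FebWord {
public:
    // Block until the word is empty, store the value, then mark it full.
    void write_when_empty(std::uint64_t v) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !full_; });
        value_ = v;
        full_ = true;
        cv_.notify_all();
    }

    // Block until the word is full, take the value, then mark it empty.
    std::uint64_t read_and_empty() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });
        full_ = false;
        cv_.notify_all();
        return value_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t value_ = 0;
    bool full_ = false;  // in this sketch, words start out empty
};

// Producer/consumer hand-off through one FEB word; returns 1 + 2 + ... + n.
std::uint64_t feb_sum_demo(std::uint64_t n) {
    FebWord word;
    std::thread producer([&word, n] {
        for (std::uint64_t i = 1; i <= n; ++i) word.write_when_empty(i);
    });
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i) sum += word.read_and_empty();
    producer.join();
    assert(sum == n * (n + 1) / 2);  // every value was handed over exactly once
    return sum;
}
```

Because each write waits for the previous value to be consumed, the word itself acts as the synchronization point and no separate queue or lock is visible to the caller.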
The MultiThreaded Graph Library (MTGL) is a collection of algorithms and data structures designed to run on shared-memory platforms such as the massively multithreaded Cray XMT, or, with support from the Qthreads library, on Symmetric Multiprocessor (SMP) machines or multi-core workstations. The software and API is modeled after the Boost Graph Library, though the internals differ in order to leverage shared memory machines.
Summary of Suggested Approach
This section contains a fairly sketchy summary of the approach I’ve been musing about:
Don't Use Handles Except for Persistence and Distributed Processing
Re-frame OpenCog’s Handle system as something to be used for interfacing between an Atomspace on a local machine and (1) persistent stores and (2) Atomspaces on other machines.
Don’t use Handles to refer into an Atomspace for internal operations within a CogServer running on a single machine. Rather, Atoms within a single machine would have local IDs (like the ones currently created within MTGL). There would then be a table at the CogServer level, mapping Handles into local IDs.
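A minimal sketch of such a CogServer-level table, assuming Handles are 64-bit integers and local IDs are dense indices assigned in arrival order. The names Handle, LocalId, and HandleTable are illustrative only, and the sketch is single-threaded; a real table would need concurrent access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical types: a global 64-bit Handle and a dense per-machine local ID.
using Handle  = std::uint64_t;
using LocalId = std::size_t;

// Translates between global Handles and local IDs at the machine boundary.
class HandleTable {
public:
    // Return the local ID for a Handle, assigning the next dense ID if unseen.
    LocalId intern(Handle h) {
        auto it = to_local_.find(h);
        if (it != to_local_.end()) return it->second;
        LocalId id = to_handle_.size();
        to_handle_.push_back(h);
        to_local_.emplace(h, id);
        return id;
    }

    // Look up without inserting; returns false if the Handle is unknown here.
    bool lookup(Handle h, LocalId& out) const {
        auto it = to_local_.find(h);
        if (it == to_local_.end()) return false;
        out = it->second;
        return true;
    }

    // Reverse lookup, needed only when an Atom is persisted or sent elsewhere.
    Handle handle_of(LocalId id) const {
        assert(id < to_handle_.size());
        return to_handle_[id];
    }

    std::size_t size() const { return to_handle_.size(); }

private:
    std::unordered_map<Handle, LocalId> to_local_;  // Handle -> local ID
    std::vector<Handle> to_handle_;                 // local ID -> Handle
};
```

Internal operations would then touch only LocalId values, which can index directly into arrays; the hash lookup is paid once, when an Atom crosses the machine boundary.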
Make an AtomTable via extending MTGL's adjacency list to hypergraphs
Build a hypergraph adjacency list similar to the simple adjacency-list graph container used in MTGL.
The change needed is to make this a generalized hypergraph (n-ary links, and links pointing to links) rather than a straightforward binary graph.
Associate attributes with Atoms (nodes/links in the hypergraph) using property maps, as in
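As a rough illustration of the container described above, the following sketch gives every Atom (node or link) a dense ID, so a link's targets may themselves be links, and keeps attributes in a separate property map indexed by that ID. This is not MTGL code: HyperGraph, AtomPropertyMap, and AtomId are hypothetical names, and the sketch ignores concurrency and removal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using AtomId = std::size_t;  // hypothetical dense local ID

// Hypothetical hypergraph adjacency list. Nodes and links share one ID space,
// so a link can point at other links.
class HyperGraph {
public:
    AtomId add_node() {
        outgoing_.emplace_back();
        incoming_.emplace_back();
        return outgoing_.size() - 1;
    }

    // n-ary link: an ordered list of targets, each an existing node or link.
    AtomId add_link(std::vector<AtomId> targets) {
        AtomId id = outgoing_.size();
        for (AtomId t : targets) {
            assert(t < id);                // targets must already exist
            incoming_[t].push_back(id);    // record the back-reference
        }
        incoming_.emplace_back();
        outgoing_.push_back(std::move(targets));
        return id;
    }

    const std::vector<AtomId>& outgoing(AtomId a) const { return outgoing_[a]; }
    const std::vector<AtomId>& incoming(AtomId a) const { return incoming_[a]; }
    std::size_t size() const { return outgoing_.size(); }

private:
    std::vector<std::vector<AtomId>> outgoing_;  // empty for nodes
    std::vector<std::vector<AtomId>> incoming_;  // links containing this Atom
};

// Property map kept outside the container: one value per AtomId.
template <typename T>
class AtomPropertyMap {
public:
    explicit AtomPropertyMap(T default_value = T()) : default_(default_value) {}

    // Grows on demand so Atoms added later read as the default value.
    T& operator[](AtomId a) {
        if (a >= values_.size()) values_.resize(a + 1, default_);
        return values_[a];
    }

private:
    T default_;
    std::vector<T> values_;
};
```

Keeping attributes outside the container means a process that needs only, say, truth values can scan one contiguous array without touching the adjacency data.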
Make some MindAgents use thread iterators for efficient use of multiple cores and hardware threads
For MindAgents or other OpenCog processes that need to iterate over a large swath of the Atomspace, rewrite the core loops using thread iterators, as in the “Thread Iterators” section in https://software.sandia.gov/trac/mtgl/wiki/GraphAPI
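The general shape of such a loop can be sketched with plain std::thread, where each worker owns a private iterator over its share of the Atom IDs. This only imitates the idea and does not use the MTGL or qthreads APIs; StridedAtomIterator and parallel_for_each_atom are hypothetical names, and the sketch assumes Atoms are addressed by dense IDs in [0, num_atoms).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread iterator: worker k of n visits IDs k, k+n, k+2n, ...
class StridedAtomIterator {
public:
    StridedAtomIterator(std::size_t start, std::size_t stride, std::size_t end)
        : pos_(start), stride_(stride), end_(end) {}
    bool valid() const { return pos_ < end_; }
    std::size_t operator*() const { return pos_; }
    void operator++() { pos_ += stride_; }

private:
    std::size_t pos_, stride_, end_;
};

// Call visit(atom_id) once for every ID in [0, num_atoms), spread over
// num_threads workers. visit must be safe to call concurrently for
// different atom IDs.
void parallel_for_each_atom(std::size_t num_atoms, unsigned num_threads,
                            const std::function<void(std::size_t)>& visit) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (unsigned k = 0; k < num_threads; ++k) {
        workers.emplace_back([=, &visit] {
            // Each worker builds its own iterator; no shared cursor is needed.
            for (StridedAtomIterator it(k, num_threads, num_atoms); it.valid(); ++it)
                visit(*it);
        });
    }
    for (auto& w : workers) w.join();
}
```

A MindAgent's core loop would pass its per-Atom update as visit. Since no two workers ever receive the same ID, updates that touch only the visited Atom's own attributes need no locking.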
Wrap it all in a (tweaked?) Atomspace API
Wrap the above container in the Atomspace API, making any additions to the API needed to support, e.g., thread iterators.
Improve MTGL's handling of mutable graphs as needed
Besides the extension to hypergraphs, it may be necessary to improve MTGL’s handling of dynamic graphs in some ways. The documentation indicates that the MTGL developers have put more effort into optimizing it for static graphs that can be represented in their CSR format.
Tweak qthreads for efficient use of Intel Xeon Phi, thus making Xeon Phi into an efficient Atomspace coprocessor board
Qthreads runs on commodity hardware as well as the Cray XMT, but it seems it would be useful to tune it for optimal operation on existing multicore hardware with hardware multithreading, e.g. Intel Xeon Phi.