GSoC 2009: Distributed and Persistent AtomSpace

This page describes the GSoC 2009 project. The current distributed OpenCog architecture is described here: Distributed AtomSpace.

Note: The system designed below never worked correctly, due to a design flaw/misunderstanding -- the incoming set must not be stored as a table; it must be computed. The flaw should not be hard to fix. The code is here.
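
To make the note concrete, here is a minimal sketch of what "computed" could mean: the incoming set of an atom is derived by scanning the outgoing sets of the links, instead of being read from a separately stored table. The names Handle, OutgoingTable and compute_incoming are hypothetical and do not come from the OpenCog or Hypertable code; the linear scan is only for illustration and is not how the real system would need to do it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for an atom identifier.
using Handle = std::uint64_t;

// Hypothetical link table: each link handle maps to its outgoing set.
using OutgoingTable = std::map<Handle, std::vector<Handle>>;

// Derive the incoming set of `target` by scanning the outgoing sets,
// rather than reading it from a separately stored table.
std::set<Handle> compute_incoming(const OutgoingTable& links, Handle target) {
    std::set<Handle> incoming;
    for (const auto& entry : links) {
        for (Handle member : entry.second) {
            if (member == target) {
                incoming.insert(entry.first);  // this link points at target
                break;                         // one match per link is enough
            }
        }
    }
    return incoming;
}
```

Because the result is always derived from the outgoing sets, it cannot drift out of sync with them, which is the risk a stored incoming table carries.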

Abstract

I will be using a BigTable to create persistent storage for AtomTable. This will first require linking to the BigTable to implement a simple save/load API for Atom Handles. I will then use that implementation to provide just-in-time data persistence for AtomTable. That is, AtomTable will be modified to, in conjunction with the ECAN system, maintain in memory only those atoms which are currently most important. This will relegate the AtomTable into an AtomCache of the larger, persistent BigTable.
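
The caching idea in the abstract can be sketched as follows. This is only an illustration under stated assumptions: Atom, PersistentStore and AtomCache are hypothetical names, the persistent table is replaced by an in-memory map, and a single importance number stands in for whatever ECAN would actually maintain. None of this is the project's real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>

// Hypothetical stand-ins; none of these names come from OpenCog or Hypertable.
using Handle = std::uint64_t;

struct Atom {
    Handle handle;
    double importance;  // stand-in for the importance value ECAN would maintain
};

// Stand-in for the persistent table: an in-memory map with a save/load API.
class PersistentStore {
public:
    void save(const Atom& atom) { rows_[atom.handle] = atom; }
    bool load(Handle h, Atom& out) const {
        auto it = rows_.find(h);
        if (it == rows_.end()) return false;
        out = it->second;
        return true;
    }
    std::size_t size() const { return rows_.size(); }
private:
    std::map<Handle, Atom> rows_;
};

// Keeps at most `capacity` atoms in memory. When full, the least important
// atom is written to the store and dropped; a miss reloads from the store.
class AtomCache {
public:
    AtomCache(std::size_t capacity, PersistentStore& store)
        : capacity_(capacity), store_(store) { assert(capacity > 0); }

    void put(const Atom& atom) {
        if (cache_.count(atom.handle) == 0 && cache_.size() >= capacity_)
            evict_least_important();
        cache_[atom.handle] = atom;
    }

    bool get(Handle h, Atom& out) {
        auto it = cache_.find(h);
        if (it != cache_.end()) { out = it->second; return true; }
        if (!store_.load(h, out)) return false;  // unknown everywhere
        put(out);                                // bring it back into memory
        return true;
    }

    bool in_memory(Handle h) const { return cache_.count(h) != 0; }
    std::size_t resident() const { return cache_.size(); }

private:
    void evict_least_important() {
        auto victim = cache_.begin();
        for (auto it = cache_.begin(); it != cache_.end(); ++it)
            if (it->second.importance < victim->second.importance) victim = it;
        store_.save(victim->second);
        cache_.erase(victim);
    }

    std::size_t capacity_;
    PersistentStore& store_;
    std::unordered_map<Handle, Atom> cache_;
};
```

With a capacity of two, putting atoms with importance 0.9, 0.1 and 0.5 pushes the 0.1 atom out to the store; asking for it again reloads it and evicts the 0.5 atom instead. The linear scan for the eviction victim keeps the sketch short; a real cache would more likely keep an ordered index.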


Please see Storing the AtomSpace in HyperTable

June 16, 2009 Update

After circumstances dictated that I install hypertable three times on two operating systems, I've compiled a list of issues to look out for when installing it on your machine. First, some installation resources: the README file in the download or online, and the instruction pages here and here. Be warned that these contain some outdated information (particularly the last one). Now, on to the issues.

Thrift Installation

Compilation succeeds but does not produce a libthriftnb.so file (you won't notice this until you test hypertable, unless you manually check the output):

-Install libevent if you don't already have it and reconfigure thrift with

   ./configure --with-libevent=<path-to-libevent>

Compilation fails, complaining libboost_system-mt.so cannot be found.

-After confirming that your boost library is up-to-date (which it should be anyway for opencog), run configure again with the argument

    ./configure --with-boost=<path-to-boost>

You may need one, both, or neither of these depending on your system. Of course, if you need both, you can just string them together:

    ./configure --with-boost=<path> --with-libevent=<path>
   

Hypertable Installation

Compilation fails, complaining several boost functions are undefined.

-Run cmake again with the argument:

    -DBoost_INCLUDE_DIR=<path_to_boost_header_files>

-Alternatively, use ccmake to change the boost include directory interactively

Testing

The first thing to do before making alltests is to run

  stop-servers.sh 

and

  start-all-servers.sh local

from the bin/ directory of the installation. If start-all-servers.sh gives the output

  Waiting for Hyperspace to come up ... 
  Waiting for Hyperspace to come up ... 
  Waiting for Hyperspace to come up ... 
  ERROR: Hyperspace did not come up

and likewise for MasterServer, RangeServer, and ThriftBroker, you need to specify the shared library directory:

   echo "$prefix/$version/lib" | \
       sudo tee /etc/ld.so.conf.d/hypertable.conf
   sudo /sbin/ldconfig

Stop the servers again and restart them. Once they start correctly, run the regression tests.

Ubuntu 9.04 note

This may affect other operating systems as well, but on 9.04 I was unable to start the ThriftBroker server. I would get the warning "stack smashing detected" and the process would not be allowed to finish. The only solution I found was to install the thrift snapshot hosted at the apache incubator rather than the one hosted at hypertable. As far as I can tell, this is a problem with an older version of thrift being used with a newer version of the standard C library.


Project: Next Step

Continue integrating hypertable: after having done it manually, write code that will save/load a single atom to/from a hypertable.


June 23 Update

I spent most of this week looking through and working with the Hypertable API, and we're very close to being able to do interesting things with it -- fully implement BackingStore.h, run benchmarks, fit hypertable into the big picture of the rest of the framework -- but there are still some integration issues to work out. As smoothly as things have gone through the command line, my C++ save/load tests have been turning up some cryptic error messages from the hypertable codebase. Barring any major revelations while I'm asleep, I will likely need to get some help from the people on the hypertable-users mailing list so I can get it running soon.


Next steps:
-Get the save/load tests to run and pass
-Implement the rest of BackingStore.h
-Tidy up my code, start maintaining a branch
-Grab a stopwatch: From what I've read, we can expect to save at a rate of ~25k atoms per second.
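
For the stopwatch step, a minimal timing sketch might look like the following. The function name measure_store_rate and its callback are hypothetical; the callback stands in for whatever function ends up saving a single atom, and the sketch itself makes no claim about hypertable's actual performance.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Times `count` calls to `store_one` and returns the observed rate in
// calls per second. `store_one` is a hypothetical callback standing in
// for whatever function saves a single atom.
double measure_store_rate(std::size_t count,
                          const std::function<void(std::size_t)>& store_one) {
    assert(count > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < count; ++i) store_one(i);
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    // Guard against a zero reading on very fast runs or coarse clocks.
    const double seconds = elapsed.count() > 0.0 ? elapsed.count() : 1e-9;
    return static_cast<double>(count) / seconds;
}
```

Passing a callback that stores one atom per call and a count of a few hundred thousand would give a figure to compare against the ~25k atoms per second estimate.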


July 3 Update

Hypertable persistence is now functioning! The code can be retrieved from its bzr branch at

  lp:~jeremy-schlatter/opencog/hypertable

This code implements all of the functions in BackingStore.h and supports storage and retrieval of importance and truth values. Importantly, however, only getAtom(handle) and storeAtom(handle) have been tested. Confirming that the other functions actually work is my first priority. After that I will probably be ready to start running benchmarks and figuring out how to integrate hypertable storage with the rest of opencog. Both of these steps bring into consideration the plan to make this storage distributed. Even though we don't have online hypertable servers available yet, I may need to start working with Hypertable's Hadoop client.