SpaceServer

From OpenCog

The OpenCog embodiment system needs a subsystem that provides spatial and temporal perception, and that can be integrated with the reasoning and natural-language subsystems. For example, it is useful to be able to respond to queries such as "Is object A in front of, behind, or next to object B?" or "Is object A far away?". There have been numerous attempts to describe and implement this subsystem; none have been particularly successful. The below presents the latest ideas on implementing this for the 2019-2021 time-frame.

Requirements

There are requirements for both a real-time, numerical interface and a natural-language, common-sense interface. Consider, for example, the concept of "distance". Is object A far away? A numerical answer would provide the distance to object A, in meters. A natural-language, common-sense version would answer "yes, it is far away, compared to what we were talking about just now", or "yes, it's far away, because I cannot walk to it in the next few minutes".

There is a need for a multi-body representation system. Imagine, for example, that OpenCog is controlling a dozen robot bodies, simultaneously. The question "Is object A far away?" becomes a robot-body-relative question: "Is object A far away from robot body serial-number 42?". Thus, the location of "self" needs to become just another object in the space server.

Common-sense responses to spatial-relationship questions need to be modulated not only for locomotion style ("can I walk to it?", "can I reach out and touch it?") but also for relative sizes: "Is spider X close to the window?" gets a different response than "Is spider X close to fly Y?" simply because windows are much, much larger than spiders (the distance between spider and window is relative to the window size; the distance between spider and fly is relative to the size of the spider-web). Size-relative prepositional relations are pervasive: the answer to "Is X next to Y?" depends on the relative sizes of X and Y, and stops making sense when the relative size differences are too great ("Is this microbe next to the City of Chicago?").

The natural-language interfaces need to support the following:

  • All of the typical prepositional relations: next-to, in-front, behind, above, below, beyond, close-by, near, far.
  • The typical size-predicates: big, small, tiny, larger, smaller, tinier.
  • The typical containment prepositions: inside-of, outside-of, overlapping, touching, piercing (viz, long, narrow arrow-like object piercing roundish object), intertwined (viz. tangled up, caught in a web)
  • Time predicates: before, after, at-the-same-time, in-the-future, in-the-past.
  • Allen-interval time-predicates: overlaps-in-time, is-contained-in, etc.
  • Movement and speed predicates: coming, going, faster, slower, near-miss, collision-course.
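The Allen-interval predicates in the list above have a crisp numerical definition. Below is a minimal sketch, illustrative only and not part of any actual OpenCog API, that treats a time interval as a (start, end) pair in seconds and returns one of Allen's thirteen mutually-exclusive interval relations:

```python
def allen_relation(a, b):
    """Return the Allen interval relation between intervals a and b.

    Each interval is a (start, end) pair, in seconds, with start <= end.
    """
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:  return "before"
    if b1 < a0:  return "after"
    if a1 == b0: return "meets"
    if b1 == a0: return "met-by"
    if a0 == b0 and a1 == b1: return "equal"
    if a0 == b0: return "starts" if a1 < b1 else "started-by"
    if a1 == b1: return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1: return "during"
    if a0 < b0 and b1 < a1: return "contains"
    return "overlaps" if a0 < b0 else "overlapped-by"
```

Because the thirteen relations are mutually exclusive and exhaustive, predicates such as overlaps-in-time or is-contained-in reduce to an equality test on the returned relation name.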

The numerical interfaces need to support API calls similar to the above, but simplified and appropriate for pair-wise numerical relationships. Distances should be presented in meters, time in seconds. Distances and locations might be presented in robot-body-relative coordinates, venue-relative coordinates, or earth-centric coordinates (e.g. GPS coordinates).
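As a sketch of what such pair-wise numerical calls might look like (the function names are illustrative, not an actual OpenCog interface), with positions as (x, y, z) tuples in meters and times in seconds:

```python
import math

def distance_meters(pos_a, pos_b):
    """Euclidean distance between two (x, y, z) positions, in meters."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_a, pos_b)))

def seconds_between(time_a, time_b):
    """Signed time difference, in seconds (positive if time_b is later)."""
    return time_b - time_a
```

A coordinate-frame argument (robot-body-relative, venue-relative, earth-centric) would be an additional parameter on such calls; it is omitted here for brevity.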

Scalability, performance

All but one of the designs presented below are server-less, thus making them inherently distributed and scalable. A serverless design avoids lock contention and other bottlenecks for position updates. A server-based design might still be desirable for other reasons; right now, it is not known what those reasons might be.

API and Subsystem Architecture

There are several possible implementations, sharing a common API. Four are sketched below.

  • A ROS-based server, using OctoMap to track objects
  • A ROS-based system, avoiding OctoMap, or any server-based design at all.
  • A Tensorflow-based system, using neural-net algos for object position tracking.
  • A Tensorflow-based system, using neural-net algos for prepositional relations.

ROS+OctoMap Space-Time Server

This assumes that basic sensory input is collected and processed by ROS, for example, from some SLAM system (Simultaneous Localization and Mapping) or other ROS subsystem that collects and maintains object identities and positions. The layering is as follows:

  • Sensor devices (video cameras, sonar, lidar, other)
  • ROS object-tracking and sensor-fusion system.
  • OpenCog Octree spacetime server. This server can either subscribe to ROS events, to get regular high-speed position updates (e.g. 50 times per second), or it can query ROS for object locations, as needed. This SpaceTime server is optional (and in fact, its use is discouraged; see notes below).
  • AtomSpace Values attached to specific Atoms. The Atom is used as the object label, e.g. (ConceptNode "Gerardo's foot"). The associated Values track the 3D position of the object (Gerardo's foot), its relative size, the coordinate system in use, and other numeric and number-related data. These Values work with the OpenCog Octree server to obtain current 3D positions, when queried.
  • AtomSpace PredicateNodes that implement prepositions. These PredicateNodes, when used with an EvaluationLink, will evaluate to yes/no truth-values, based on the specific preposition (next-to, above, ...) and the specific 3D positions obtained from the object Values.
  • Natural language subsystem, which generates and evaluates such EvaluationLinks.

Implementation Example

The dataflow for the above system can be understood as follows. This example assumes that an Octomap server is being used. This Octomap server is not really needed; it mostly just gets in the way. An alternative design, without OctoMap, is given in a later section.

  • A chatbot (for example ghost) or any other natural-language processing subsystem is posed a question: "Is Gerardo's foot near the soccer ball?" As a result of processing this text, the following EvaluationLink is generated:
  EvaluationLink
      DefinedPredicateNode "is near"
      ListLink
          ConceptNode "Gerardo's foot"
          ConceptNode "soccer ball"
  • The NLP subsystem/chatbot evaluates the above EvaluationLink. This triggers the following predicate pseudo-code to run:
  is_near(Atom A, Atom B) :
       static location-key = PredicateNode("*-3D-location-key-*")
       static size-key = PredicateNode("*-size-key-*")
       loc-of-A = get_value(A, location-key)
       loc-of-B = get_value(B, location-key)
       size-of-A = get_value(A, size-key)
       size-of-B = get_value(B, size-key)
       if (size-of-A not comparable to size-of-B):
           return "unknown"
       
       # Here is the numerical calculation that provides the answer:
       # the size-relative distance, i.e. the length of the separation
       # vector, measured in units of the object size.
       distance = length(loc-of-B - loc-of-A) / size-of-A
       if (distance < 10):
           return "yes"
       else:
           return "no"
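The pseudocode above can be sketched as runnable Python, assuming (purely for illustration) that locations are plain (x, y, z) tuples, sizes are plain floats, and a size ratio above 100x means the sizes are "not comparable":

```python
import math

def is_near(loc_a, loc_b, size_a, size_b):
    """Size-relative nearness, following the pseudocode above."""
    # Wildly different sizes ("microbe" vs. "Chicago") are not comparable.
    if max(size_a, size_b) / min(size_a, size_b) > 100:
        return "unknown"
    # Distance measured in units of the first object's size.
    distance = math.dist(loc_a, loc_b) / size_a
    return "yes" if distance < 10 else "no"
```

The 100x comparability cutoff and the threshold of 10 size-units are arbitrary placeholders; the real implementation would fetch locations and sizes from Values attached to the two Atoms.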
  • The EvaluationLink calls the above pseudo-code in a generic fashion:
   apply(Predicate P, Atom A, Atom B):
        rv = code-of-P(A, B)
        if (rv == "yes"):
             TV = SimpleTruthValue(1,1)
        if (rv == "no"):
             TV = SimpleTruthValue(0,1)
        if (rv == "unknown"):
             TV = SimpleTruthValue(0.5,0)
        this->set_truth_value(TV)
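The yes/no/unknown-to-truth-value mapping in the pseudocode above can be written as a one-liner, assuming (for illustration) that a truth value is just a (strength, confidence) pair:

```python
def truth_value_of(rv):
    """Map a predicate's verbal answer to a (strength, confidence) pair,
    mirroring the SimpleTruthValue assignments in the pseudocode above."""
    return {"yes": (1.0, 1.0),
            "no": (0.0, 1.0),
            "unknown": (0.5, 0.0)}[rv]
```

Note that "unknown" is encoded not as a middling strength but as zero confidence: the strength of 0.5 is meaningless when the confidence is zero.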
  • The NLP subsystem can then access the TruthValue on the EvaluationLink, and use that to formulate a verbal reply to the question.
  • Implementation tips: The above pseudocode is written in a pseudo-Python, pseudo-C++ style. Either is possible, and would be adequate. There is a third possibility, and it might be the simplest: writing methods in Atomese. See the examples on the ValueOfLink page. These examples suggest that maybe all of the above can be accomplished without writing any Python/C++ code; just some basic Atomese might suffice. Maybe. This is unexplored. It is appealing, since it might be the simplest, least messy alternative. However, it won't work if the algos start getting big and complicated; complex algos require C++, scheme or python.
  • The get_value() method above is a method on the C++ class Atom. It accesses the value associated with a particular key. The location-value is managed by the OctoMapPositionValue, the C++ class given below (in simplified form, as PositionValue). It will query the OctoMap SpaceTime server to obtain the current object position. It is derived from FloatValue, so as to provide a generic 3D vector API.
The object to track, and the point in time at which to track it, are supplied in the constructor. The location is returned by FloatValue::value() which calls PositionValue::update(). The below is C++ pseudocode; the actual implemented API is slightly different. Note: As of October 2018, the code below has been written and (minimally) tested. See opencog/spacetime/octomap.
// Derives from FloatValue
class PositionValue : public FloatValue {
   private:
      // Things that this instance of PositionValue has to remember
      OctoMap* _octomap;
      Handle _object;
      Handle _when;

   protected:
      // Update the position. Called automatically by FloatValue::value()
      // Throws exception, if object is not in octomap.
      void update() {
          _value = _octomap->get_position_at_time(_object, _when);
      }

   public:
      // This value needs to know what to track.
      PositionValue(Handle obj, Handle time_offset) {
          _object = obj;
          _when = time_offset;
          _octomap = ...;
      }  
};
  • Object tracking initialization. In order for an object to get tracked, the constructor for the PositionValue given above needs to be called. This needs to be done in Atomese, in response to other system events. This can be done by invoking the following:
EvaluationLink
    DefinedPredicate "octomap-object-tracking-value"
    ListLink
        Concept "Main OctoMap Server"
        Predicate "*-3D-location-key-*"
        Concept "Gerardo's foot"
        TimeNode "00:00:00"
The arguments need some explanation. The Concept "Main OctoMap Server" indicates which OctoMap server to use. There may be more than one. The Predicate "*-3D-location-key-*" indicates what key to use, for placing the PositionValue. This key-name needs to be "well-known", and shared in common by all systems interested in obtaining positions. The Concept "Gerardo's foot" is the object whose position is to be tracked. The TimeNode "00:00:00" indicates the time offset for tracking. The value of zero means "right now", i.e. it is the "current time".
The result of evaluating the above EvaluationLink would be a PositionValue, fetching its data from the indicated octomap server, anchored at the given key on the given object.
There probably needs to be another predicate that will disconnect and remove the object-tracking PositionValue.
  • Alternate implementation. It might be more convenient/easier to design a custom C++ class called OctoMapLink, which would be used as
  OctoMapLink
     Concept "Main OctoMap Server"
     Predicate "*-3D-location-key-*"
     Concept "Gerardo's foot"
     TimeNode "00:00:00"
This is effectively the same API as before, just implemented a bit differently according to the tastes of the developer, with the goal of keeping things simple and maintainable.
  • In the above examples, there were two PredicateNodes that had to be coupled to executable code (written in C++, Python or scheme). These were PredicateNode "is near" and Predicate "octomap-object-tracking-value". In practice, it seems reasonable to implement these as GroundedPredicateNodes. It is also convenient to hide this implementation detail from the users. This can be done with the DefineLink; for example:
  DefineLink
     DefinedPredicateNode "is near"
     GroundedPredicateNode "py: is_near(Atom x, Atom y)"
This allows an early implementation to be written in python; later on, if that implementation is changed, or e.g. re-written in C++ or scheme, then only the DefineLink needs to be re-declared to get the new implementation. The DefineLink acts as a pluggable indirection.
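The same indirection pattern, sketched in plain Python (the names and the dictionary-based registry are illustrative, not how the AtomSpace actually stores DefineLinks): callers always look up the predicate by its stable name, so re-binding the name swaps the implementation without touching any caller.

```python
# Illustrative registry standing in for DefineLink: name -> implementation.
definitions = {}

def define(name, implementation):
    """Bind (or re-bind) a stable name to an implementation."""
    definitions[name] = implementation

def evaluate(name, *args):
    """Callers go through the name, never the implementation directly."""
    return definitions[name](*args)

# Early throw-away implementation ...
define("is near", lambda a, b: "unknown")
# ... later replaced, without touching any caller:
define("is near", lambda a, b: "yes")
```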
  • The OctoMap above is either maintaining a 3D value for the object position, obtained by subscribing to ROS events, or it knows how to query ROS for the current location of that object. Pseudo-code:
   subscribe to ROS events
This is not the right place to explain how ROS works. It's not hard.

Implementation Status

As of May 2019, the implementation status is this:

  • Generation and evaluation of EvaluationLink's by Ghost and/or other NLP subsystem: Not Implemented. There is no detailed design.
  • Generic EvaluationLink+PredicateNode interop: Partly (mostly?) implemented. Basic things like DefinedPredicateNodes and GroundedPredicateNodes were implemented many years ago. They work, they are heavily used in many other subsystems, and are functionally stable APIs. The details of how these should be used for implementing the prepositions for this specific project are up for discussion; there are multiple possibilities. Some possibilities have not been (fully) explored.
  • Preposition PredicateNodes: Not Implemented.
  • Octomap server: Code complete as of 2016. More unit tests are needed. A general design review is needed. See opencog/spacetime
  • ROS-to-Octomap driver: Not Implemented.

ROS-based serverless design

The OctoMap server is not actually needed for anything: it does not do anything useful for OpenCog or the AtomSpace -- it mostly burns CPU cycles and gets in the way. One can get 3D positions directly from ROS; there is no need for a server. This has several important design advantages: the elimination of lock contention, a simpler, smaller design, and a more flexible architecture that simplifies development and maintenance. The basic idea is this: let Values do all the work. Overall, the design is just like the above, except that there's no OctoMap.

Motivation for a serverless design

Values are designed to hold fleeting, time-varying data. By storing a 3D position with the particular Atom, we don't have to worry about lock contention or scalability, viz. how to track the location of a million objects without over-burdening a big, monolithic server. If one has the handle to an Atom, one can update that Atom without burdening any other part of the system.

Storing values per-Atom also tremendously simplifies relational queries: to discover if object A is next to object B, one needs only to formulate localized queries about objects A and B, rather than issuing a query to some master-clearing-house, which then wastes time and effort trying to find where A and B are.

Storing values per-atom also simplifies having multiple, competing implementations of relational queries: instead of having to create a single, monolithic, does-everything server, one can have specialized components that work independently of one another; e.g. one specializing in size-relations, another in time-ordering, a third in distance relations. All this, without having to have three competing servers, all three of which are trying to keep 3D coordinates for one object.

Values also allow data-streaming. A simple FloatValue is just an array of floating-point values, and is suitable for storing xyz and time data. However, there is no reason whatsoever to constantly pound the latest position data from some position-data-source into the Value. Instead, one could write a custom C++ object, say, a PositionValue object that inherits from FloatValue, and only fetches, updates and returns a position when it is queried; otherwise, it can sit quiescently, not burning up any CPU cycles at all.
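The fetch-on-query idea can be sketched in Python (the class shape mirrors the C++ pseudocode in the next section; the source callback stands in for ROS, OctoMap, or any other position source, and all names here are illustrative):

```python
class PositionValue:
    """A value that fetches a position only when queried; idle otherwise."""

    def __init__(self, source, obj):
        self._source = source   # callable: obj -> (x, y, z)
        self._obj = obj
        self._cached = None     # last position fetched, if any

    def value(self):
        # Update on demand, rather than streaming every sample into the
        # value; no CPU is burned between queries.
        self._cached = self._source(self._obj)
        return self._cached
```

A usage sketch: `pv = PositionValue(query_ros, "Gerardo's foot")`, after which `pv.value()` asks the source for the current position each time it is called.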

Implementation Example

This is very nearly like the earlier example. Notice, however, the direct access to ROS:

// Derives from FloatValue
class PositionValue : public FloatValue {
   private:
      // Things that this instance of PositionValue has to remember
      ROS_commo* _ros_api;
      Handle _object;
      Handle _when;

   protected:
      // Update the position. Called automatically by FloatValue::value()
      // Throws exception, if object is not known to ROS.
      void update() {
          _value = _ros_api->get_position_at_time(_object, _when);
      }

   public:
      // This value needs to know what to track.
      PositionValue(Handle obj, Handle time_offset) {
          _object = obj;
          _when = time_offset;
          _ros_api = ...;
      }  
};

The pseudocode here assumes that class ROS_commo knows how to implement the ROS_commo::get_position_at_time() method, which basically just sends the right kind of messages to the correct ROS server, and gets back the needed 3D data.

All other parts of the system are the same as before.

Implementation Status

As of May 2019, the above has not been implemented.

Tensorflow-based position tracking

Like the ROS serverless design, but using tensorflow as a source for object positions.

Implementation Example

Same as the ROS serverless design. More details and pseudocode need to be developed.

Implementation Status

Not implemented.

Tensorflow-based prepositional relations

Unlike any of the other designs, this one bypasses location tracking entirely, and instead directly offers prepositional relationships. The biggest problem here is that there are several hundred possible prepositional relationships, so getting good coverage might be difficult.

Implementation Example

TBD.

Implementation Status

Partly implemented. Ask Vitaly Bogdanov for details.

Older versions

Documentation describing older and obsolete versions can be found here: