This page originally written by Ben Goertzel, Jan 3, 2011
This page semi-formally gives some ideas related to spatial and spatiotemporal relationships. Not all are appropriate for immediate implementation at the time of writing. The document is written specifically in the context of the OpenCog HK project and its 3D Minecraft-like game world, though many of the ideas are more broadly applicable.
By an “entity” here I’ll mean a “physical object” (e.g. in the game world), which may be an object created in 3dsMax or similar and imported into the game world, or may be a block, or may be a construct made from blocks (like a wall or a step, a house, the wall or the roof of a house, a table, a whole staircase, etc.).
Any entity may have many different properties, and ultimately the system should learn new properties. For now I’ll just suggest some simple properties and some others that may be derived from them…
Some elementary properties, that we may want to supply for starters as “canned” procedures:
- Height (extent from highest to lowest part)
- Width (i.e. diameter)
- Extent (i.e. volume of the smallest 3D rectangular parallelepiped that contains the entity) … yeah this is a bit crude…
- Any others?
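As a concrete illustration, the elementary properties above could all be computed from an entity’s bounding box. The following Python sketch is hypothetical (none of these function names come from an existing OpenCog API); it assumes blocks are axis-aligned unit cubes identified by integer (x, y, z) corners, with z as the vertical axis.

```python
# Hypothetical sketch: elementary properties of a block-based entity,
# computed from the bounding box of its blocks. Blocks are unit cubes
# given by (x, y, z) integer corners, z vertical.

def bounding_box(blocks):
    """Smallest axis-aligned box containing all blocks, as (min, max) corners."""
    xs, ys, zs = zip(*blocks)
    return (min(xs), min(ys), min(zs)), (max(xs) + 1, max(ys) + 1, max(zs) + 1)

def height(blocks):
    """Extent from highest to lowest part."""
    (_, _, z0), (_, _, z1) = bounding_box(blocks)
    return z1 - z0

def width(blocks):
    """Horizontal 'diameter': the larger of the two horizontal extents."""
    (x0, y0, _), (x1, y1, _) = bounding_box(blocks)
    return max(x1 - x0, y1 - y0)

def extent(blocks):
    """Volume of the smallest rectangular parallelepiped containing the entity."""
    (x0, y0, z0), (x1, y1, z1) = bounding_box(blocks)
    return (x1 - x0) * (y1 - y0) * (z1 - z0)

# A 1x1 column of three stacked blocks:
column = [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
print(height(column), width(column), extent(column))  # 3 1 3
```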
Some other “secondary” properties that I think we can express as logical combinations of the elementary ones:
- Tall (meaning higher than other stuff in the same context)
- Towering (tall and narrow)
- Squat (short and wide)
- Hulking (tall and wide)
- Small (short and narrow)
- Any others?
I’m thinking that these secondary properties can be defined explicitly in the Atomspace, rather than via canned schema. The conceptual idea here is that
- we’re supplying the system with the primary properties as “preprogrammed intuitions” for the time being
- regarding the secondary properties, we’re supplying them as mutable content, which the system can then adapt based on its experience using reasoning (though initially it can use them in their provided form)
In this approach, for example, Towering becomes something like
    EquivalenceLink
        EvaluationLink Towering X
        ANDLink
            EvaluationLink Tall X
            EvaluationLink Narrow X
or properly contextualized,
    EquivalenceLink
        ContextLink
            EvaluationLink Towering X
            C
        ANDLink
            ContextLink
                EvaluationLink Tall X
                C
            ContextLink
                EvaluationLink Narrow X
                C
Tall becomes something like
    EquivalenceLink
        EvaluationLink Tall X
        AverageQuantifier Y:
            EvaluationLink GreaterThan
                ExOutLink Height X
                ExOutLink Height Y
which is most meaningful in contextualized form,
    EquivalenceLink
        ContextLink
            EvaluationLink Tall X
            C
        AverageQuantifier Y:
            ContextLink
                EvaluationLink GreaterThan
                    ExOutLink Height X
                    ExOutLink Height Y
                C
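To make the AverageQuantifier reading concrete: the truth value of Tall(X) in a context is the average, over entities Y in that context, of GreaterThan(Height(X), Height(Y)). The Python sketch below is an illustration of that semantics only (the function names and the choice of min for fuzzy AND are assumptions, not anything fixed by OpenCog).

```python
# Illustrative fuzzy reading of Tall, Narrow and Towering.
# tall(x) in context = average over Y in context of [height(X) > height(Y)].

def tall(x_height, context_heights):
    """Fuzzy truth of Tall(X): fraction of context entities that X is taller than."""
    if not context_heights:
        return 0.0
    return sum(h < x_height for h in context_heights) / len(context_heights)

def narrow(x_width, context_widths):
    """Fuzzy truth of Narrow(X): fraction of context entities that X is narrower than."""
    if not context_widths:
        return 0.0
    return sum(x_width < w for w in context_widths) / len(context_widths)

def towering(x, context_heights, context_widths):
    """Towering = Tall AND Narrow; fuzzy AND taken here as min (one common choice)."""
    x_height, x_width = x
    return min(tall(x_height, context_heights),
               narrow(x_width, context_widths))

# A 4-high, 1-wide column among squat neighbors is fully Towering:
print(towering((4, 1), [1, 2, 2], [3, 3, 2]))  # 1.0
```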
Prepositional Spatiotemporal Relationships
Next, what about relationships between entities?
One reasonable way to start thinking about spatial, temporal and spatiotemporal relationships is to look at common prepositions (ideally cross-linguistically, but it turns out these are not so language-dependent, so looking only at English isn’t a large loss). Relatively complete lists of English prepositions are easy to find; a shorter list of prepositional relations that appear likely applicable in the present context is:
across, against, along, amidst, among, apart, around, behind, beside, between, in front of, inside, into, near, next to, out of, outside, through, together, toward
above, atop, below, beneath, over, under
after, before, during, since, until
A couple others that seem useful for handling English conversation are:
- My_left (the left part of the agent’s body)
- My_right (the right part of the agent’s body)
- My_front (the front part of the agent’s body)
- My_back (the back part of the agent’s body)
As with the entity properties considered above, it seems that a few relationships may be defined as primary, and the rest considered secondary, defined in terms of them. More on this later!
Relationships Between Parts of Entities
Some important relationships hold between an entity and the sub-entities that are its parts, such as Part_of, Top_of, Interior_of and Boundary_of.
It seems that many of these can be treated as secondary, and defined simply in terms of the others.
For example, Top_of might initially be roughly defined as
    EquivalenceLink
        EvaluationLink Top_of T X
        AndLink
            EvaluationLink Part_of T X
            ImplicationLink
                AndLink
                    NotLink
                        SimilarityLink S T
                    EvaluationLink Part_of S X
                EvaluationLink Below S T
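The Top_of definition can also be read procedurally: T is a top of X if T is a part of X and every other part of X is below T. The Python sketch below is a hypothetical rendering of that reading, with parts represented simply by their vertical extents ("S below T" is taken as max_z(S) <= min_z(T)); it is not OpenCog code.

```python
# Procedural reading of the Top_of definition. Each part is a
# (name, min_z, max_z) tuple; z is the vertical axis.

def top_of(t, x_parts):
    """True if part t is a top of the entity whose parts are x_parts."""
    if t not in x_parts:
        return False  # Top_of requires Part_of
    # Every part S of X distinct from T must be below T.
    return all(s[2] <= t[1] for s in x_parts if s is not t)

roof = ("roof", 3, 4)
wall = ("wall", 0, 3)
floor = ("floor", 0, 1)
house = [roof, wall, floor]

print(top_of(roof, house))  # True
print(top_of(wall, house))  # False
```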
It must be emphasized that definitions like these are absolutely not intended to constitute
- Full definitions of the human concepts or natural language words referred to (e.g. “top”)
- Full, richly featured concepts suitable for an advanced AGI’s conceptualization
Rather, they are intended as initial conditions for learning, so that the AGI system can build its own concepts on top of them, eventually using its experience to capture many of the subtleties implicit in the commonsense human understanding of related concepts, and some subtleties of its own invention as well.
The right way to define these relations in terms of primitive concepts won’t always be obvious; and there may be many routes. For example, one could define “Y is interior_of X” as meaning “Y is part_of X, and Y is not next_to any Z unless Z is also part_of X”; and “boundary_of” X could then be defined correspondingly…
Mathematical Spatiotemporal Relationships
The material in this section is probably not needed for near-term implementation in the OpenCog HK project at time of writing, but is included here for interest and future applicability.
Another approach to spatiotemporal relationships is founded in mathematics rather than linguistics. There are well-known mathematical systems for representing such relationships, which are elegant and easy to work with computationally. These systems tend to ignore uncertainty, but in my prior work I have extended them to include both fuzzy and probabilistic uncertainty.
As a side note, the relationship between these mathematical systems and human cognition is unclear to me. Clearly humans can handle all such relationships both implicitly and explicitly, but some may be much easier for us than others, whereas these relative differences in difficulty are not apparent from the mathematics.
Allen’s Interval Algebra
The most traditional and well-developed theoretical framework for systematizing the qualitative relationships between time-intervals is Allen’s Interval Algebra, or simply Interval Algebra (IA). IA, in its original and simplest form, models temporal events as intervals, that is, processes that have a beginning and an end in time. Based on that, thirteen temporal relations may exist between any given pair of events. The following figure illustrates those relations with a simple bar representation for intervals:
There is a calculus that defines the possible relations between time intervals and provides a composition table that can be used as a basis for reasoning about temporal descriptions of events. Note that in the IA approach, time intervals are not necessarily represented by precise endpoints, but rather by their relationships to one another.
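For concreteness, the thirteen IA relations can be computed directly when interval endpoints happen to be known. The sketch below is a minimal illustration (the function name is an assumption); intervals are (start, end) pairs with start < end, and the relation names follow standard IA terminology.

```python
# Compute the Allen relation holding between two intervals a and b,
# each given as a (start, end) pair with start < end.

def allen_relation(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return "before"
    if e2 < s1:  return "after"
    if e1 == s2: return "meets"
    if e2 == s1: return "met-by"
    if s1 == s2 and e1 == e2: return "equal"
    if s1 == s2: return "starts" if e1 < e2 else "started-by"
    if e1 == e2: return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2: return "during"
    if s1 < s2 and e2 < e1: return "contains"
    if s1 < s2 < e1 < e2: return "overlaps"
    return "overlapped-by"

print(allen_relation((0, 2), (3, 5)))  # before
print(allen_relation((0, 4), (1, 3)))  # contains
print(allen_relation((0, 3), (2, 5)))  # overlaps
```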
Region Connection Calculus
The most direct analogue of IA for space instead of time is the Region Connection Calculus (RCC). The basic entities modeled by RCC are regions (as opposed to points), assumed to be embedded in the same space of some dimensionality (e.g. 2D in the most commonly considered case). The basic relations in RCC are defined in terms of connection or inclusion between two regions of the analyzed space. The number of possible relations varies according to the “granularity” of the RCC variety being used. The “coarser” variety, RCC8, is thus called due to the set of just 8 primitive relations that define it:
- disconnected (DC)
- externally connected (EC)
- equal (EQ)
- partially overlapping (PO)
- tangential proper part (TPP)
- tangential proper part inverse (TPPi)
- non-tangential proper part (NTPP)
- non-tangential proper part inverse (NTPPi)
which are depicted in the following figure:
From these basic relations, combinations can be built. For example, proper part (PP) is the union of TPP and NTPP.
As an example, here is a network illustrating some RCC relationships between some regions in a fictitious city:
A set of RCC relations stating spatial facts about a group of entities can be used to infer new spatial facts. For instance, if A is a non-tangential proper part of B and B is a non-tangential proper part of C, then A is a non-tangential proper part of C. If on the other hand A is a tangential proper part of B and B is a tangential proper part of C, then A can be either a non-tangential or a tangential proper part of C. Or, showing the same inferences predicate-wise:
    nonTangentialProperPart(A,B) ^ nonTangentialProperPart(B,C)
        ⇒ nonTangentialProperPart(A,C)

    tangentialProperPart(A,B) ^ tangentialProperPart(B,C)
        ⇒ tangentialProperPart(A,C) XOR nonTangentialProperPart(A,C)
RCC8 comes with a complete composition table embodying all such possible inferences.
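A composition table is naturally represented as a map from pairs of relations to sets of possible relations. The fragment below covers only the proper-part entries needed for the two inferences above (a full RCC8 table has 8 x 8 entries); the code is an illustrative sketch, not an existing OpenCog module.

```python
# Fragment of the RCC8 composition table: given R(A,B) and R(B,C),
# look up the set of relations that may hold between A and C.

COMPOSE = {
    ("NTPP", "NTPP"): {"NTPP"},
    ("TPP", "TPP"): {"TPP", "NTPP"},   # ambiguous, as noted in the text
    ("TPP", "NTPP"): {"NTPP"},
    ("NTPP", "TPP"): {"NTPP"},
}

def compose(r_ab, r_bc):
    """Possible relations R(A, C) given R(A, B) and R(B, C)."""
    return COMPOSE.get((r_ab, r_bc), set())

print(compose("NTPP", "NTPP"))  # {'NTPP'}
```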
There are also extensions of RCC enabling it to deal with more complex situations; for example, RCC23 handles convex regions. We suspect that RCC8 may be adequate for the initial Icarus test problems, but it’s also possible some benefit could be obtained from RCC23, in spite of the added complexity.
Finally, the above relationships are defined for 2D, but all of them apply perfectly well to 3D. There are some recent publications on RCC-3D that pursue this in detail (Julia Albath, Jennifer L. Leopold, Chaman L. Sabharwal and Kenneth Perry (2010). Efficient Reasoning with RCC-3D. Lecture Notes in Computer Science, Vol. 6291, 470-481).
Uncertain Spatial and Temporal Algebra
An issue with RCC and IA as traditionally formulated is that they are crisp, i.e. they don’t admit any uncertainty. They don’t let you talk about one interval being almost after another one, nor about one region being almost next to another one. In a forthcoming book entitled Real World Reasoning (and much more briefly in a paper published in the proceedings of the AGI-10 conference), my colleagues and I have explained how to add fuzzy and probabilistic uncertainty to the RCC and IA relationships, thus obtaining an uncertain calculus of spatial and temporal relationships.
It would seem that in order to get good use of IA and (especially) RCC for the OpenCog project, implementation of uncertain versions would be necessary. It would be fairly straightforward to implement this mathematics in the OpenCog system, but this work hasn’t yet been done.
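To illustrate what a fuzzified relation might look like: crisp "before" requires one interval to end strictly before the other starts, while a fuzzy version can assign a degree that decays smoothly as the intervals begin to overlap. The sketch below is only an illustration of the general idea, not the formulation from the Real World Reasoning book; the linear decay and the tolerance parameter are assumptions.

```python
# A minimal fuzzy version of the "before" interval relation.
# Crisp: before(a, b) iff end(a) < start(b).
# Fuzzy: degree 1.0 with a non-negative gap, decaying linearly
# as the overlap between the intervals grows.

def fuzzy_before(a, b, tolerance=1.0):
    """Degree in [0, 1] to which interval a is before interval b."""
    (_, e1), (s2, _) = a, b
    gap = s2 - e1
    if gap >= 0:
        return 1.0
    # Negative gap = overlap; degree falls off linearly with overlap size.
    return max(0.0, 1.0 + gap / tolerance)

print(fuzzy_before((0, 2), (3, 5)))    # 1.0 (clearly before)
print(fuzzy_before((0, 2.5), (2, 5)))  # 0.5 ("almost before")
```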
Parse Graphs and AND-OR Graphs for Object Representation
Another interesting question is how to represent complex objects in terms of primitive objects. Some conceptual guidance here is given by the theory of image grammars, a conceptually interesting (though not yet very practically successful) approach to image understanding. The following figures are drawn from Zhu and Mumford’s paper on stochastic image grammars (Song-Chun Zhu and David Mumford (2007). A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, Vol. 2, No. 4, 259–362). The first ones illustrate “parse graphs” for a picture involving a cat and a mouse, and for a car. The next one illustrates two parse graphs for clocks. The next illustrates an AND-OR graph for a clock, which incorporates both of the parse graphs into a single graph. Commentary follows.
To the reader familiar with OpenCog’s knowledge representation, it should be obvious that these parse and AND/OR trees can straightforwardly be represented in terms of ANDLinks, ORLinks and part_of relations.
Of course the hard part of image grammar research is actually abstracting the image grammars from images. But in a game-world context we don’t need to mess with this “hard part” – we just need to decompose entities defined within the game world into other entities defined within the game world. This still isn’t trivial but it’s a lot easier.
Identifying Significant Entities and Parts
How, then, do we decompose an entity in the game world into component parts? How do we build Atom structures corresponding to parse trees or AND/OR trees of entities?
In the case of entities imported from 3dsMax or similar, the decomposition basically has to be human-defined. Otherwise one would have to do image processing on the screen renderings of the objects.
The interesting case is entities composed of blocks, or combinations of blocks and other imported constructs.
Some criteria that seem relevant are:
- A combination of blocks that is similar to other combinations seen elsewhere, may be worth isolating as a distinct entity (these might be identified via Fishgram, for example)
- A combination of blocks that tends to move as a unit, may be worth isolating as a distinct entity
- A combination of blocks that form a connected region, and are all the same color (whereas the other blocks adjacent to them are different colors), may be worth isolating as a distinct entity
- A combination of blocks that has a pattern of arrangement distinct from the others near it (e.g. all lined up in a row whereas the others are messy; ascending steadily whereas the others vary unpredictably in height; etc.), may be worth isolating as a distinct entity
The easiest compact way to summarize these various cases (and others) seems to be: If the system’s mind identifies a simple predicate P so that (either always or during certain intervals of time):
- P is highly true for all the blocks in group G
- P is significantly less true for the other blocks near G
then maybe it makes sense to consider G an entity, and distinguish it via an Atom (probably a SpecificEntityNode).
The question then becomes what predicates P the system is oriented to look for. Predicates related to some of the above examples would be:
- “has color R”
- Is higher than the adjacent block to the left, and lower than the adjacent block to the right
- Is moving
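The grouping criterion above can be sketched directly: a block group G is a candidate entity for predicate P when P holds strongly inside G and much less strongly for the nearby blocks. The names, block representation, and threshold below are all assumptions for illustration.

```python
# Sketch of the entity-identification criterion: score a candidate
# group G against nearby blocks by the contrast in a predicate's
# average truth value.

def entity_score(pred, group, nearby):
    """Mean truth of pred on the group minus its mean truth on nearby blocks."""
    def mean(blocks):
        return sum(pred(b) for b in blocks) / len(blocks) if blocks else 0.0
    return mean(group) - mean(nearby)

def is_candidate_entity(pred, group, nearby, threshold=0.5):
    """G is worth distinguishing as an entity when the contrast is large."""
    return entity_score(pred, group, nearby) >= threshold

# Example predicate: "has color red" (blocks as (x, y, z, color) tuples).
red = lambda b: 1.0 if b[3] == "red" else 0.0
group = [(0, 0, 0, "red"), (1, 0, 0, "red")]
nearby = [(2, 0, 0, "blue"), (3, 0, 0, "blue")]
print(is_candidate_entity(red, group, nearby))  # True
```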