Event boundary recognition
This page records the results of a “thought experiment” oriented toward figuring out how to identify EVENT BOUNDARIES in quantitative data representing real-world happenings… and how to do this using a methodology in which
- program control structures and functions are defined in Atomspace
- number-crunching is done outside Atomspace, invoked via GroundedSchemaNodes (GSNs)
The particular case studied is that of recognizing the boundaries delimiting a coherent “facial expression”, e.g. the event of a “smile” or a “shocked expression.” The input assumed is the continuous stream of current values going into a set of motors controlling a robot head (the robot head making the facial expression). The use case is one where humans make facial expressions into a camera, and software such as Faceshift tracks the movements of control points on the person’s face. Those movements are then mapped into the current values that would be used to control the robot’s face, to make it imitate them. The point is that this gives continuous current values for the robot’s face motors, and doesn’t chop up the stream of face movements or current-value vectors into discrete “events” representing coherent facial expressions.
This particular use case is both a) of near-term practical interest, and b) a specific instance that may be used to elucidate some more general issues…
Regarding event boundary recognition, the paradigm I will assume is: the boundaries of an event are discontinuities of predictability (or equivalently, of surprisingness). Some exploration of the math of measuring surprisingness has been done, and implemented in the Pattern_Miner. But on this page we will not dig into those issues, and will focus more on using event boundary recognition as a case-in-point for exploring representational and operational issues related to OpenCog.
I will also assume some not-yet-implemented mechanisms in OpenCog for representing and manipulating quantitative data — i.e. ArrayNode and related mechanisms as discussed on Outsourcing_Array_Operations_From_OpenCog (and see also Deep_Learning_in_OpenCog:_Proposal).
PHILOSOPHY OF EVENT BOUNDARIES
It seems that the beginning and ending of an event is often characterized by a "discontinuity of predictability"... e.g.
"People perceive and conceive of activity in terms of discrete events. Here we propose a theory according to which the perception of boundaries between events arises from ongoing perceptual processing and regulates attention and memory. Perceptual systems continuously make predictions about what will happen next. When transient errors in predictions arise, an event boundary is perceived. "
(there is also a bunch of more recent literature)
So if we accept this as a conceptual methodology, then the problem is reduced to the issue of estimating the predictability of a multidimensional time series...
SOME ATOMESE REPRESENTATION ISSUES
Before getting back to the math of identifying discontinuities in predictability, I want to elaborate how this kind of math would be wrapped up in the Atomspace, for integration into a robot perception and control pipeline.
First of all, we need to create Atoms to represent the motors used to control a robot’s face.
Eventually an OpenCog system may be able to learn the characteristics of the robot body it’s connected to. But for now, we can assume there is a hand-built representation, in the Atomspace, of the body of any robot the system needs to control.
We can have a ConceptNode for each body part that the system has data about, or control over… e.g. for the Hanson Robot head stuff like
ConceptNode “face motor 1”
ConceptNode “face motor 2”
ConceptNode “neck motor 1”
We can build links such as
InheritanceLink
    ConceptNode “face motor 1”
    ConceptNode “motor”

EvaluationLink
    PredicateNode “part of”
    ListLink
        ConceptNode “face motor 1”
        ConceptNode “Sophia robot body”
Given a time interval, we should be able to get the time series giving the current going into that motor over that interval, e.g.
ExecutionOutputLink
    GroundedSchemaNode “py:getCurrentSeries”
    ListLink
        ConceptNode “face motor 3”
        TimeNode $T
        NumberNode $N
would output an ArrayNode indicating a 1D array containing the current going into face motor 3, during the time interval $T, measured at periods of $N seconds (so e.g. if $N=1/100, and $T had duration 5 seconds, then the array in the ArrayNode would have 500 entries).
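As a rough illustration of the sampling arithmetic, here is a minimal Python sketch of what a function behind “py:getCurrentSeries” might do. The `read_current` callback, the argument names, and the use of a plain list in place of an ArrayNode are all assumptions made for this sketch; none of them is an existing OpenCog API.

```python
def sample_count(duration_s, period_s):
    """Number of samples in an interval of duration_s sampled every period_s seconds."""
    if duration_s < 0 or period_s <= 0:
        raise ValueError("duration must be >= 0 and period must be > 0")
    # round() guards against floating-point error in divisions such as 5 / 0.01
    return int(round(duration_s / period_s))


def get_current_series(read_current, start_s, duration_s, period_s):
    """Sample one motor's current over an interval.

    read_current is a hypothetical callback mapping a time in seconds to the
    current going into the motor at that time.
    """
    n = sample_count(duration_s, period_s)
    return [read_current(start_s + i * period_s) for i in range(n)]


# Usage: 5 seconds sampled every 1/100 second gives 500 entries.
series = get_current_series(lambda t: 0.0, start_s=0.0, duration_s=5.0, period_s=0.01)
print(len(series))
```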
Now, here is some Atomese I would like to be able to write:
DefineLink
    DefinedSchemaNode “getRobotCurrentSeries”
    LambdaLink
        VariableList
            TypedVariableLink
                VariableNode "$T"
                TypeNode "TimeNode"
            TypedVariableLink
                VariableNode "$N"
                TypeNode "NumberNode"
        ExecutionOutputLink
            GroundedSchemaNode “py:accumulate_matrix”
            ExecutionOutputLink
                GroundedSchemaNode “map”
                ListLink
                    LambdaLink
                        VariableNode "$X"
                        ExecutionOutputLink
                            GroundedSchemaNode “py:getCurrentSeries”
                            ListLink
                                VariableNode "$X"
                                VariableNode "$T"
                                VariableNode "$N"
                    DefinedSchemaNode “Sophia robot motors”
where DefinedSchemaNode “Sophia robot motors” is used as a constant: it is defined to be a ListLink containing the robot's motors.
This works if the list of motors is hardwired or only filled up upon initialization. In case it changes, which shouldn't be often, it needs to be rebuilt. For that one may use a StateLink instead of a DefineLink to define “Sophia robot motors” and invoke the pattern matcher to build a ListLink consisting of every motor in the Atomspace that is recorded as being a part of the “Sophia robot”….
Notice some other features (which wouldn’t work currently) used in the above…
Firstly, what “py:accumulate_matrix” is doing is taking the ArrayNodes in the ListLink output by this ExecutionOutputLink, and merging them into a single ArrayNode. The thing I want to point out here is that the ArrayNodes in the ListLink never need to make their way into the Atomspace. They need to be created in the course of the pattern matching process, but then they can be thrown out immediately afterwards, since they are only used as interim structures…
Then, we are using the “map” construct to map a Schema over a list…
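To make the map-then-merge idea concrete, here is a minimal Python sketch of what “py:accumulate_matrix” and the “map” step might amount to on the Python side. Plain lists stand in for ArrayNodes, and `get_series` and `fake_data` are hypothetical placeholders for the per-motor call to “py:getCurrentSeries”; this is an assumption-laden illustration, not working OpenCog code.

```python
def accumulate_matrix(series_list):
    """Stack equal-length 1D series into a matrix (a list of rows, one per motor)."""
    rows = [list(s) for s in series_list]
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all series must have the same number of samples")
    return rows


def get_robot_current_series(motors, get_series):
    """Map get_series over the motors, then merge the results into one matrix.

    get_series is a hypothetical stand-in for the per-motor series lookup,
    with the time interval and sampling period already fixed.
    """
    # The per-motor series are interim values: only the merged matrix is returned.
    return accumulate_matrix(map(get_series, motors))


# Usage with made-up data for two motors and three time points.
fake_data = {"face motor 1": [0.1, 0.2, 0.3], "face motor 2": [0.0, 0.5, 0.4]}
matrix = get_robot_current_series(list(fake_data), fake_data.get)
print(len(matrix), len(matrix[0]))  # rows = motors, columns = time points
```

As in the Atomese version, the per-motor series exist only while the merge is being computed, which mirrors the point that the interim ArrayNodes never need to be stored.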
INVOKING EVENT BOUNDARY RECOGNITION CODE VIA GSN’S
And so, assuming all this worked —
The result of running “getRobotCurrentSeries” will be a matrix with one row per motor in the robot, where the row corresponding to a motor holds the current values for that motor at a number of discrete time-points.
The conceptual approach to event boundary recognition to be followed is, then:
- At each point in time (each column in the matrix), calculate the predictability of the values in that column, relative to the K previous columns
- At each point in time (each column in the matrix), calculate the degree to which the predictability at that point in time constitutes a big change in predictability relative to the previous K1 columns
- Create event beginning and ending boundaries where there is a big change -- a "significant discontinuity" in predictability
There is plenty to be tuned in getting an algorithm like the above to work, obviously... But here I am just trying to outline a generally workable framework.
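The three steps above can be sketched in Python as follows. This is only a rough illustration: the predictability measure used here (the negated error of a moving-average forecast) is a deliberately naive stand-in chosen for this sketch, not the measure proposed on this page, and the function names, K, K1 and the threshold are all assumptions.

```python
def column_predictability(matrix, k):
    """Step 1: predictability of each column relative to the k previous columns.

    Naive stand-in measure: forecast each motor's value as the mean of its k
    previous values, and score the column by the negated mean absolute error
    (zero = perfectly predicted, more negative = more surprising).
    Columns without k predecessors get None.
    """
    n_cols = len(matrix[0])
    scores = []
    for t in range(n_cols):
        if t < k:
            scores.append(None)
            continue
        error = sum(abs(row[t] - sum(row[t - k:t]) / k) for row in matrix)
        scores.append(-error / len(matrix))
    return scores


def predictability_change(scores, k1):
    """Step 2: how far each score departs from the mean of the k1 previous scores."""
    changes = []
    for t, score in enumerate(scores):
        history = [s for s in scores[max(0, t - k1):t] if s is not None]
        if score is None or not history:
            changes.append(0.0)
        else:
            changes.append(score - sum(history) / len(history))
    return changes


def boundary_indices(changes, threshold):
    """Step 3: time points where the change in predictability is large."""
    return [t for t, c in enumerate(changes) if abs(c) > threshold]


# One motor whose current jumps from 0 to 5 at time index 5.
matrix = [[0, 0, 0, 0, 0, 5, 5, 5, 5, 5]]
scores = column_predictability(matrix, k=2)
changes = predictability_change(scores, k1=2)
print(boundary_indices(changes, threshold=3.0))
```

In this toy example the flagged indices are 5, where predictability drops at the jump, and 7, where it recovers once the forecast has caught up.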
Given all this, representationally, we can do event boundary recognition via something like
DefineLink
    DefinedSchemaNode “getRobotCurrentSeriesPredictability”
    LambdaLink
        VariableList
            TypedVariableLink
                VariableNode "$T"
                TypeNode "TimeNode"
            TypedVariableLink
                VariableNode "$N"
                TypeNode "NumberNode"
        ExecutionOutputLink
            GroundedSchemaNode “py:getPredictability”
            ExecutionOutputLink
                DefinedSchemaNode “getRobotCurrentSeries”
                ListLink
                    VariableNode "$T"
                    VariableNode "$N"
which will then calculate the Predictability value for the robot current series during a certain interval.
We can then ask whether there has been a sudden change in Predictability, which would indicate the boundary of an event. We can do this, for instance, by creating a DefinedSchemaNode that will build a time series of Predictability values, i.e. something like
DefineLink
    DefinedSchemaNode “getRobotCurrentSeriesPredictabilityDiscontinuitySeries”
    LambdaLink
        VariableList
            TypedVariableLink
                VariableNode "$T"
                TypeNode "TimeNode"
            TypedVariableLink
                VariableNode "$N"
                TypeNode "NumberNode"
        ExecutionOutputLink
            GroundedSchemaNode “py:calculateDiscontinuities”
            ExecutionOutputLink
                DefinedSchemaNode “getRobotCurrentSeriesPredictability”
                ListLink
                    VariableNode "$T"
                    VariableNode "$N"

DefineLink
    DefinedSchemaNode “createEvents”
    LambdaLink
        VariableList
            TypedVariableLink
                VariableNode "$T"
                TypeNode "TimeNode"
            TypedVariableLink
                VariableNode "$N"
                TypeNode "NumberNode"
        ExecutionOutputLink
            GroundedSchemaNode "py:createEventFromDiscontinuity"
            ExecutionOutputLink
                DefinedSchemaNode “getRobotCurrentSeriesPredictabilityDiscontinuitySeries”
                ListLink
                    VariableNode "$T"
                    VariableNode "$N"
where "py:createEventFromDiscontinuity" would create start and end events when encountering discontinuities. For instance py:calculateDiscontinuities could return 1 when predictability suddenly rises, and -1 when predictability suddenly falls, and 0 otherwise. py:createEventFromDiscontinuity would use that to turn 1 into an event start, and -1 into an event end.
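A minimal Python sketch of this 1 / -1 / 0 convention is given below. The function names mirror the GSN names above, but the `jump` threshold, the comparison of consecutive values only, and the (start, end) tuple format are assumptions made for illustration.

```python
def calculate_discontinuities(predictability, jump):
    """Return 1 where predictability rises by more than jump, -1 where it
    falls by more than jump, and 0 otherwise (the first point is always 0)."""
    if not predictability:
        return []
    codes = [0]
    for prev, cur in zip(predictability, predictability[1:]):
        delta = cur - prev
        codes.append(1 if delta > jump else -1 if delta < -jump else 0)
    return codes


def create_events_from_discontinuity(codes):
    """Pair each event start (1) with the next event end (-1).

    Returns (start_index, end_index) tuples. An end with no open start is
    ignored, and a start still open when the series ends is dropped.
    """
    events, start = [], None
    for t, code in enumerate(codes):
        if code == 1 and start is None:
            start = t
        elif code == -1 and start is not None:
            events.append((start, t))
            start = None
    return events


# Usage with a made-up predictability series.
predictability = [0.2, 0.2, 0.9, 0.9, 0.9, 0.1, 0.1]
codes = calculate_discontinuities(predictability, jump=0.5)
print(codes)
print(create_events_from_discontinuity(codes))
```

How to treat unmatched starts and ends is a design choice; here they are simply discarded, which keeps every reported event well-formed at the cost of ignoring events that straddle the edge of the interval.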
As intended, in this approach, the actual number-crunching work is done in some outsourced python-or-whatever code accessed via GroundedSchemaNodes. SchemaNodes are used to pass around array values, thus using OpenCog sorta like one uses python when using a library like Theano or Scipy or Pylearn2 or whatever…
THE ACTUAL MATH OF EVENT BOUNDARY RECOGNITION
Finally, how can we address the actual number-crunching problem at the heart of the above? -- how to assess the predictability of a multivariate time series?
That is: in terms of the above sketchy Atomese code, what actually goes inside the proposed predictability function? We need some method to measure how predictable a (multivariate) time series is over a certain period of time.
For this, I think one good option is to experiment with multidimensional multivariate permutation entropy, which has been used to assess the predictability of biological time series.
Some systematic evidence that 1-D 1-time-scale permutation entropy is useful for assessing predictability of time series is here:
(It's clear conceptually that *some* kind of entropy method is gonna be the way to assess predictability of a time series. The work done in the above and other related papers, suggests that permutation entropy is one tractable approach. Basically by looking at ordinal information only, a lot of noise is thrown out...)
So, in this approach, one uses discontinuities in predictability to identify temporal boundaries of events. The occurrences between these boundaries are discrete events.
INCORPORATING MORE ABSTRACT PROPERTIES
The quantitative method suggested above (multivariate permutation entropy) is probably suitable for identifying event boundaries based on lower-level perceptual data.
On the other hand, when event boundaries need to be recognized based on more cognitive properties, then we will need the Pattern_Miner or something similar to help estimate surprisingness/predictability.... So the simple call to py:getPredictability in the above would need to be replaced by code that combines lower-level estimates of predictability (via multivariate permutation entropy or similar) with higher-level cognitive estimates of predictability (via the Pattern Miner or similar).
Suppose the above works, and we can recognize the beginning and ending of events with reasonable accuracy.
This just leads us to other problems! ... such as, how do we identify what is happening in these events? I.e., we may want to do unsupervised or supervised classification on the discrete events recognized...
Hypothetically: I propose that the *ordinal series* used in the multivariate permutation entropy may be useful features for unsupervised classification/clustering of events. The fact that this "trick" provides a kind of de-noising that's useful for assessing predictability, suggests to me that it may also provide a kind of de-noising that's useful for event clustering (or supervised classification for that matter). Perhaps associating an identified event with a set of ordinal time series, computed on multiple scales, will provide a good feature structure for simple "perceptual level" event classification/clustering. But this becomes a further research problem, obviously... an interesting one...
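One hypothetical way to turn this idea into features is sketched below in Python: for each time scale, coarse-grain the series by averaging non-overlapping blocks, then record the relative frequency of each ordinal pattern. The coarse-graining scheme, the scales, the pattern order and all function names are assumptions made for illustration; whether such features actually cluster events well is exactly the open research question.

```python
from collections import Counter
from itertools import permutations


def coarse_grain(series, scale):
    """Average non-overlapping blocks of `scale` samples (scale 1 = unchanged values)."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def pattern_histogram(series, order=3):
    """Relative frequency of every ordinal pattern of length `order`."""
    counts = Counter(
        tuple(sorted(range(order), key=lambda j: series[i + j]))
        for i in range(len(series) - order + 1)
    )
    total = sum(counts.values())
    # Fixed pattern ordering, so that feature vectors are comparable across events.
    return [counts[p] / total if total else 0.0 for p in permutations(range(order))]


def event_features(series, scales=(1, 2, 4), order=3):
    """Concatenate the pattern histograms computed at each scale."""
    features = []
    for scale in scales:
        features.extend(pattern_histogram(coarse_grain(series, scale), order))
    return features


# Usage: one motor's samples within one detected event (made-up data).
features = event_features(list(range(16)))
print(len(features))  # 3 scales x 3! patterns
```

The resulting fixed-length vectors could then be handed to any standard clustering or classification method; for a multi-motor event, one option would be to concatenate the per-motor vectors.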