BehaviorSimilarityAlgorithm (Embodiment)

From OpenCog

Problem definition (Ben's)

Define a behavior description as follows:

A set of pairs of the form (P,T), where

  • P is a token, drawn from a fixed finite set of possibilities [the letter P is because the tokens are really predicates...]
  • T is a time-interval

Time-intervals may be considered as relative, where the starting-point of the first time-interval in the behavior description is assumed to be time 0.

Tokens may have similarities between them; so we may assume an inter-token similarity measure.

Note that different tokens may correspond to overlapping time intervals (i.e. there may be multiple tokens going on at the same time).

The problem we have is then as follows.

We have a collection of exemplar behavior descriptions B1, B2, ..., Bn

And, we have a candidate behavior description B

We then want to calculate, essentially, the odds that B belongs to the category exemplified by B1, ... , Bn

One idea I had was to customize dynamic programming to make a pairwise similarity measurement between B and each of the Bj. Then one can use that to assess the required odds, e.g. most simply by finding the minimum of

distance(B, Bj)

for any i.

However, I did not think much about exactly how to customize dyn prog to this problem.

Some more comments on requirements...

For PB 1.0, the behavior descriptions will probably not be very large: we're talking about perhaps a few dozen elements in each behavior description, or less. On the other hand, later on we may work with more complex behavior descriptions, so it certainly wouldn't hurt to have a scalable approach (but it's not an absolute requirement).

White-hot speed is not critical for PB 1.0, because we will be initially using this to assess behavior descriptions produced in TL against exemplars produced by the pet owner, and I expect that executing a schema in TL is going to be slower than doing the similarity matching.

On the other hand, if we try fitness estimations besides TL, then we will want to match a behavior description against a large database of behavior descriptions, in which case speed will be critical. So like scalability to large behavior descriptions, speed will be important later on, but is not totally critical for the initial implementation.

Input data manipulation

The first point is not the algorithm itself but how we should manipulate input data to compute similarity measures that makes sense.

I'd suggest the following:

(1) For each behavior in the set of behaviors, sort the events (define an event as each pair (P,T) in a behavior description) regarding the start point of T. Lets assume a concrete example (after the sort):

Bj = {(p0,[t0, t2]), (p1, [t1, t2]), (p2, [t1,t4]), (p3, [t1, t3]), (p4, [t3, t4])}

(2) Use the sorted events to build a timeline like this:

   S0      S1      S2        S3
+-----==----+----------+---------+==
|        |    |          |         |
t0       t1   t2         t3        t4

S0 = {p0}
S1 = {p0, p1, p2, p3}
S2 = {p2, p3}
S3 = {p2, p4}

The timeline may be represented as a set {(S0, T0), (S1, T1), (S2, T2), (S3, T3)}
where Ti = ti+1 - ti.

(3) We build a timeline like this above for each behavior Bj in the set of behaviors.

At this point we may use different algorithms to manipulate the sets Si, either in an algorithm like the Ben suggested (computing similarity pairwise and using this to compute the odds that the behavior being considered (B) belongs to the category exemplified by B0, B1, ...) or in another one not based in pairwise similarity.

Assumptions regarding behavior descriptions

  • Interval lengths are usually relevant but there is no way to know a priori how relevant they are. This can only be determined once we have a set of behavior descriptions for the same behavior. Eg we don't know whether the difference between (p0,[0, 1]) (length = 1) and (p0,[0, 2]) (length = 2) is relevant or not without considering all the other behaviors in the list.
  • The sets Si which appear in a behavior description are obviously relevant but there is no way to determine how bad mismatches are unless we consider the whole set of behavior descriptions.
  • The order in which sets Si appear in a behavior description may or may not be relevant. That will also depend on the whole set of behavior description. E.g. if we have behavior descriptions for dancing, a behavior like "one-step-left than one-step-right" may be just as good as "one-step-right than one-step-left"

In other words: we can't define a good similarity measure to evaluate a pair of BDs (Bj, B) without considering the context, ie, all the other BDs in the category.

Proposed algorithm

General idea behind the algorithm

Given a collection of exemplar behavior descriptions (BD) B1, B2, ..., Bn the idea is to extract which features are more relevant amongst the Bj's to determine if a candidate behavior description B belongs to the category represented by B1,B2,...Bn.

Note that this algorithm is not based in pairwise similarity, i.e., it will not do something like iterating over Bj's trying to find the Bj which is most similar to B. The idea is to use the collection of BDs to extract information in such a way that we can determine a number which tells the odds that B belongs to the category represented by the BDs.

My first approach is to use three different features:

  1 The sets Si which appear on each BD timeline
  2 Relative order among the various Si on each BD
  3 Time length of each Si in BD timeline

Extracting features information

Let a behavior descriptor (BD) be represented as the timelines described above.

Let R be a set with all the sets Si which appear in all BDs of a given category: R = {S0, S1, ... Sm}.

Let COUNT(Si) be the number of occurrences of set Si in all BDs of the category.

  • 1 - The sets Si which appear on each BD*

For each Si in R, compute P(Si,v) = C(Si) / n, where C(Si) is the number of BDs which contains v instances of Si and n is the number of BDs in category (compute for all relevant v).

  • 2 The relative order among the various Si on each BD*

For each Bj compute a Matrix Mj as follows:

Mj is m X m indexed by the elements of R and initialized with zeros

Let Q be a vector with all sets in Bj, in the order they appear in Bj, with repetition
For q = 1 to size_of(Q) do
    For k = (q-1) to 0 do
        increment Mj(Q(q), Q(k)) by 1
    end
end

Then compute the matrix M by summing all Mj and dividing by n (M is an "average" matrix of all Mj).

  • 3 The time length of each Si in BDs*

For each Si in R, compute (L(Si), U(Si)) the lower and upper limits for the time length of Si in all BDs. Ie, L(Si) is the lowest time length amongst all occurrences of Si and U(Si) is the highest, considering all BDs in the category.

Measuring the relevance of each feature

We can now compute measures to evaluate how relevant each of these features is to the given category. Ie, given the variety of exemplars, how relevant is the set of Si in its BDs, the order in which these Si appear and their duration.

First, the relevance W1 of sets Si in each BD.

sum2 = 0
For each Bj do
    sum1 = 0
    For each different Si of Bj do
        compute v' = number of occurrences of Si in Bj
        sum1 += P(Si, v')
    end
    sum2 += (sum1 / size_of(Bj))
end

W1 = sum2 / n

Then the relevance W2 of the order in which the various Si appear.

sum = 0
For each Bj do
    Compute Mj as described above
    Compute Mj' = abs(M - Mj) // Mj'(i,j) = abs(M(i,j) - Mj(i,j))
    DMAX = (size_of(Bj) * (size_of(Bj - 1)) / 2
    sum += (sum of all elements in Mj') / DMAX
end

W2 = 1 - (sum / n)

And finally the relevance W3 of interval duration.

sum = 0
For each Si in R do
    d = MAX(|Ti - L(Si)|,|Ti - U(Si)|)
    sum += 1 - (d / (U(Si) - L(Si)))
end

W3 = sum / n

Evaluating a candidate behavior descriptor

Let B = {(Z0, T0), (Z1, T1), ... (Zn, Tn)} (forgive me for re-using n :-)) be a candidate behavior descriptor we want to evaluate. We seek for a number FITNESS in [0,1] which tells the odds that B belongs to the given category.

The first thing to do is to map from the sets Zi in B to the known sets in R (R is a set with all the sets Si which appear in all BDs of the given category).

This mapping is not trivial. There are several different approaches but for starters we can use a simple one.

For each Zi we can just pick up the most similar Si, attaching a "weight" which is the similarity measure itself. Thus B become:

B = {((S0,SIM0), T0), ((S1,SIM1), T1), ... ((Sn,SIMn), Tn)}

Then we use the mapped sets to evaluate FITNESS.

For each different Si in B do
    compute v' = number of occurrences of Si in B
    sum += P(Si, v') * SIMi
end

F1 = sum / n

Let M* be a matrix computed just like Mj but using B elements instead.

Compute M' = abs(M - M*) // M'(i,j) = abs(M(i,j) - M*(i,j))
DMAX = (size_of(B) * (size_of(B - 1)) / 2

F2 = 1 - ((sum of all elements in Mj') / DMAX)

Take the (L(Si), U(Si)) values compute as described above (L(Si) is the lowest time length amongst all occurrences of Si and U(Si) is the highest, considering all BDs in the category).

For each different Si in B do
    if (Ti is between L(Si) and U(Si)) then
        v = 1
    else 
        d = MIN(|Ti - L(Si)|, |Ti - U(Si)|)
        if (d > (U(Si) - L(Si))) 
            v = 0
        else 
            v = 1 - (d / (U(Si) - L(Si)))
        end
    end
    sum +=  v * SIMi
end

F3 = sum / n

Finally:

FITNESS = (F1 * W1 + F2 * W2 + F3 * W3) / (W1 + W2 + W3)