Agglomerative Clustering in Atomspace using the URE

From OpenCog
Jump to: navigation, search


It seems to me we can implement agglomerative clustering using the URE, via implementing a few inference rules such as:

C1)
A  \\not a SetLink
B  \\ not a SetLink
|-
SetLink {A, B|

C2)
A \\ not a SetLink
SetLink { L }
|-
SetLink{ L, A}

C3)
SetLink {L}
SetLink {M}
|-
SetLink{L,M}

Here:

  • C1 merges two Atoms into a set
  • C2 merges an Atom and a set by placing the Atom in the set
  • C3 merges two sets via producing their union

Applying these rules repeatedly to a set of Atoms — and placing the sets formed via the rules back into the set ongoingly — would enable a kind of agglomerative clustering to occur.

For this approach to yield meaningful clustering behavior, careful attention to premise selection would be needed. Otherwise the rules would just glom together Atoms and Atom-sets in a random way.

Some meaningful premise selection heuristics would be:

For C1) 
Given A, choose B w. prob. proportional to f( sim(A,B) )

For C2)
Given A, choose SetLink{L} with prob proportional to dist_L(A)

Given SetLink{L}, choose A with prob proportional to dist_L(A)

For C3)
Given SetLink{L}, choose SetLink{M} with prob proportional to avg_{x in L} dist_M{L}

Here f:R—>R is some monotone increasing function, maybe e.g. f(x)=x^p with p>1

In the simplest case, we can take

dist_L(x) = sim(x, centroid(L) )

where

centroid(L) =  C_L

is the centroid or “Weighted average” of all the members in SetLink{L}, i.e.

Member X C_L <s>

means

s = average of s_i for all Z_i in L

where

Member X Z_i <s_i>

To do this efficiently, we’d want to store the centroid of a SetLink e.g.

EvaluationLink
    PredicateNode “CentroidOf”
    ListLink
    SetLink $L
    ConceptNode $C