Harel and Koren's Dimensional Embedding Algorithm
Our technique for embedding OCP Atoms into high-dimensional space is based on an algorithm suggested by David Harel and Yehuda Koren (PDF). Their work is concerned with visualizing large graphs, and they propose a two-phase approach:
- embed the graph into a high-dimensional real space
- project the high-dimensional points into 2D or 3D space for visualization
In OCP, we don't always require the projection step (step 2); our focus is on the initial embedding step. Harel and Koren's algorithm for dimensional embedding (step 1) is directly applicable to the OCP context.
Of course this is not the only embedding algorithm that would be reasonable to use in an OCP context; it's just one possibility that seems to make sense.
Their algorithm works as follows.
Suppose one has a graph with symmetric weighted links. Further, assume that between any two nodes in the graph, there is a way to compute the weight that a link between those two nodes would have, even if the graph in fact doesn't contain a link between the two nodes.
In the OCP context, for instance, the nodes of the graph may be ConceptNodes, and the links may be SimilarityLinks. We will discuss the extension of the approach to deal with asymmetric links like InheritanceLinks, later on.
Let n denote the dimension of the embedding space (e.g. n=50). We wish to map graph nodes into points in Rn, in such a way that the weight of the graph link between A and B correlates with the distance between proj(A) and proj(B) in Rn.
Choose n "pivot points" that are roughly uniformly distributed across the graph.
To do this, one chooses the first pivot point at random and then iteratively chooses the i'th point to be maximally distant from the previous (i-1) points chosen.
One may also use additional criteria to govern the selection of pivot points. In OCP, for instance, we may use long-term stability as a secondary criterion for selecting Atoms to serve as pivot points. Greater computational efficiency is achieved if the pivot-point Atoms don't change frequently.
Estimate the similarity between each Atom being projected, and the n pivot Atoms.
This is expensive. However, the cost is decreased somewhat in the OCP case by caching the similarity values produced in a special table (they may not be important enough otherwise to be preserved in OCP). Then, in cases where neither the pivot Atom nor the Atom being compared to it have changed recently, the cached value can be reused.
Create an n-dimensional space by assigning a coordinate axis to each pivot Atom. Then, for an Atom i, the i'th coordinate value is given by its similarity to the i'th pivot Atom.
After this step, one has transformed one's graph into a collection of n-dimensional vectors.