PLN Rule to contextualize an atom A given no prior assumption about how A relates to C.
Specifically, it applies the inference:
C <TV1> A <TV2> |- ContextLink <TV3> C A
Let's start by assuming:
1. A and C are uniformly distributed
2. TV1.c = TV2.c = 1
Because of assumption #1, the mean of TV3.s is TV1.s.
Now let's study the distribution of TV3.s under these assumptions. We discretize it as follows:
P(n/CXn) = (C(CXn, n) * C(N-CXn, An-n)) / C(N, An)
where N is the size of the universe (or the discretization size), TV1.s = An/N, TV2.s = CXn/N, and n varies between 0 and CXn. C(n, k) is the number of k-combinations of a set of size n.
The left factor of the product counts the ways to choose the parts of A that are in the context of C, and the right factor counts the ways to choose the parts of A that are not in the context of C. C(N, An) is a normalization term, so P sums to 1.
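As a rough illustration, the discretized distribution can be tabulated directly. This is only a sketch: the function name context_strength_distribution is made up for this note and is not part of the PLN code. It uses the hypergeometric form, that is, the product of the two combination counts divided by C(N, An), which is the form for which the probabilities sum to 1.

```python
from math import comb


def context_strength_distribution(N, An, CXn):
    """Tabulate P(n/CXn) for a universe of size N.

    Hypothetical helper, not taken from the PLN sources.
    An  : number of elements of A (so An/N is the strength of A)
    CXn : number of elements of C (so CXn/N is the strength of C)
    Returns a dict mapping each candidate strength n/CXn to its probability.
    """
    if not (0 < CXn <= N and 0 <= An <= N):
        raise ValueError("need 0 < CXn <= N and 0 <= An <= N")
    total = comb(N, An)  # normalization term
    dist = {}
    for n in range(CXn + 1):
        outside = An - n  # elements of A that fall outside C
        # Skip counts that cannot occur (math.comb rejects negative arguments).
        if outside < 0 or outside > N - CXn:
            continue
        dist[n / CXn] = comb(CXn, n) * comb(N - CXn, outside) / total
    return dist


if __name__ == "__main__":
    dist = context_strength_distribution(10, 4, 5)
    for strength, prob in sorted(dist.items()):
        print(strength, prob)
    print("sum :", sum(dist.values()))
    print("mean:", sum(s * p for s, p in dist.items()))
```

With N = 10, An = 4 and CXn = 5 the probabilities sum to 1 and the mean strength comes out as 0.4 = An/N, up to floating-point rounding.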
As N tends to infinity, I suppose one can find a nice analytical expression for dP (the density of P) and maybe calculate the confidence TV3.c directly from it. Otherwise one can use the same technique as the indefinitizer code (see IndefiniteRule::conclusion in FormulasIndefinite.cc).
The cases where TV1.c and TV2.c are not 1 (assumption #2 does not hold) can be dealt with using the indefinitizer as well (in such a case CXn and An are obtained by sampling a beta distribution, etc.).
So this technique seems doable and would offer maximum accuracy of TV3.c (under assumption #1 of course) but it's also expensive.
So in addition I suggest a cheap (and probably inaccurate) heuristic:
TV3.c = TV1.c*TV2.c*(1-H(TV1.s)*H(TV2.s))
where H() is the entropy. Don't expect much from trying to find an interpretation of that formula; I chose it because it fits the expected behavior of the function that determines TV3.c at extreme values of TV1 and TV2, such as TV1.s = 0, 0.5 or 1, etc.
Since such a rule assumes that A and C are independent, it should be called as a last resort (after ContextualizerRule, for instance) and therefore have a low priority. In any case, since the confidence is going to be lowered, it should not do too much damage if the rule is called earlier.
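A minimal sketch of the heuristic follows. The note only says that H() is the entropy, so two things here are assumptions: H is taken to be the binary entropy of a strength, and it is measured in bits so that it ranges over [0, 1] with H(0.5) = 1. The function names are hypothetical.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy in bits (assumed reading of H); H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def heuristic_confidence(s1, c1, s2, c2):
    """TV3.c = TV1.c * TV2.c * (1 - H(TV1.s) * H(TV2.s))."""
    return c1 * c2 * (1.0 - binary_entropy(s1) * binary_entropy(s2))


if __name__ == "__main__":
    # Extreme strengths leave the product of the input confidences untouched.
    print(heuristic_confidence(0.0, 1.0, 0.5, 1.0))
    # Both strengths at 0.5 give maximal entropy and hence zero confidence.
    print(heuristic_confidence(0.5, 1.0, 0.5, 1.0))
```

Under these assumptions the result equals TV1.c*TV2.c whenever either strength is 0 or 1, and drops to 0 when both strengths are 0.5.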