New ANDRule for PLN

Scope

This page suggests a new (Feb 2014) approach to PLN formulas for the cases

1)

Implication
    AND L1 L2 ... Lk
    R

and

2)

AND L1 L2 ... Lk

Here it is assumed that the Li are links involving common variables, e.g.

Implication
    AND 
         Evaluation eat ($X cheese)
         Evaluation has ($X tail)
         Inheritance $X cute
    Inheritance $X mouse

or

AndLink
     Evaluation eat ($X $Y)
     Evaluation has ($Y tail)
     Inheritance $X cute
     NOT
           Inheritance $X mouse

The ideas presented are somewhat "experimental", as they involve approximations to the underlying probabilistic mathematics that PLN applies to these constructs, and it's not entirely obvious how accurate these approximations will be in practice.

Case 1 will be discussed in detail; Case 2 can obviously be handled in a similar way, using mostly the same code.

Direct Predicate Evaluation

First we discuss a subsidiary problem: how to estimate the truth value of a (generally, multivariable) predicate across the Atomspace.

In the following we assume that given a variable-bearing predicate such as

Evaluation eat ($X mouse)

or

AND
    Evaluation eat ($X mouse)
    Inheritance $X Chinese

or

AND
    Evaluation eat ($X $Y)
    Inheritance $Y Chinese

etc., we have a DirectPredicateEvaluation method to estimate the truth value of the predicate via "direct evaluation" across the Atomspace. That is, the posited method scans the whole Atomspace and figures out the average truth value of the given predicate, over all Atoms (or tuples of Atoms) in the Atomspace that fulfill the type restrictions of the VariableNodes.

For the case

AND
    Evaluation eat ($X mouse)
    Inheritance $X Chinese

this method would figure out the average degree of truth of "being Chinese and eating mice" over all ConceptNodes in the AtomSpace.

For the case

AND
    Evaluation eat ($X $Y)
    Inheritance $Y Chinese

this method would figure out the average degree of truth of "being a pair of items where the first eats the second, and the second is Chinese", over all pairs of ConceptNodes in the AtomSpace.

For

AND
    Evaluation $Z ($X $Y)
    Inheritance $Y Chinese
    Inheritance $X American

this method would figure out the average degree of truth of "being a triple of items where the first is a predicate, which is evaluated at two arguments, the first being Chinese and the second being American", over all triples (PredicateNode, ConceptNode, ConceptNode) extractable from the Atomspace.
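As a very rough sketch of what such a DirectPredicateEvaluation method does (in Python; get_atoms_by_type and the conjunction_strength callback are hypothetical stand-ins, not actual Atomspace API calls):

from itertools import product

def direct_predicate_evaluation(atomspace, arg_types, conjunction_strength):
    # Candidate fillers for each variable, restricted by type.
    # get_atoms_by_type and conjunction_strength(args) are hypothetical helpers.
    candidate_sets = [atomspace.get_atoms_by_type(t) for t in arg_types]
    total, count = 0.0, 0
    for args in product(*candidate_sets):
        # strength of the grounded conjunction for this argument tuple
        total += conjunction_strength(args)
        count += 1
    return total / count if count else 0.0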


Formulating Direct Predicate Evaluation as Integration

For the reader with some math background, some clarity about the nature of this problem may be achieved by recognizing that it's basically a multivariable integration problem, where the variables live in the Atomspace rather than (say) the real line. That is, we can look at what we're doing as calculating an integral such as, for example,

ExpectedValue[ f($X, $Y); $X in Af1, $Y in Af2]  =

Integral[ f($X, $Y) p($X,$Y) ; $X in Af1, $Y in Af2 ] 

where

1)

Af1 = the set of Atoms in the Atomspace with the 
right type to be f's first argument

Af2 = the set of Atoms in the Atomspace with the 
right type to be f's second argument

2)

As a simple initial assumption we may consider

p($X,$Y) = p($X) p($Y)   

p($X) = 
node/link probability of $X
----------------------------------------
sum of node/link probabilities over all $X in Af1

p($Y) = similar

More sophisticatedly we could make p($X,$Y)
take account of suspected dependencies between $X
and $Y, e.g. boost p($X,$Y) if there is a link between
$X and $Y already (since such a link is evidence that
$X and $Y are associated).  But we can ignore that
for now.

As a simplest workable assumption, one could use

p($X) = 
1
--------------
# of Atoms in Af1

p($Y) = similar

3)

We have

f($X,$Y) = truth value strength of:

AND
    Evaluation eat ($X $Y)
    Inheritance $Y Chinese

Now, having formalized the problem, how can we solve it?

The Simpler Case, Where We Can Assume All Highly Relevant Knowledge is Explicit

If we only or mostly care about argument-tuples on which the predicate has ALREADY been evaluated -- with the results stored in the Atomspace -- then things are relatively easy. We can just apply the Pattern Matcher in the right way, to return the list of all Atom-tuples satisfying the pattern, and then look at the truth values of the corresponding links. For example, the Pattern Matcher might respond with

AND <.2>
    Evaluation eat (people food)
    Inheritance food Chinese

AND <.5>
    Evaluation eat (Ben dog)
    Inheritance dog Chinese

...

etc.

If we assume that the pattern matcher gives us the list of Atoms for which f($X,$Y) has a truth value significantly larger than zero, then we can just take these Atoms and calculate the integral specified above using their truth value information. Done deal.
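
In code, this easy case might look something like the following sketch, where matched is the (hypothetical) list of (argument-tuple, strength) pairs returned by the Pattern Matcher, n_typed_tuples is the total number of appropriately-typed tuples, and tuples the matcher did not return are treated as having roughly zero strength:

def estimate_from_pattern_matcher(matched, n_typed_tuples):
    # matched: list of (args, strength) pairs for groundings already in the Atomspace.
    # Unmatched tuples are assumed to contribute (approximately) zero strength.
    if n_typed_tuples == 0:
        return 0.0
    return sum(strength for _args, strength in matched) / n_typed_tuples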

However, this doesn't necessarily give us the answer we need. What it calculates is, roughly, the percentage of appropriately-typed tuples that are explicitly recorded in the Atomspace as fulfilling the given predicate. In some cases this is what is desired, because we can assume that enough inference has already been done that, if a tuple fulfills the predicate to a significant degree, this will already be recorded in the Atomspace. If so, then direct evaluation is not so hard -- it basically involves running the pattern matcher and then doing some calculations based on the results.

On the other hand, the subtler case arises when we want to consider also Atom-tuples for which the truth value of the given predicate has not been evaluated before (or if it has been evaluated before, the result has not been stored in the Atomspace). This is the more general case for AGI-ish applications.

The Harder Case, where Much Relevant Knowledge is Implicit

In the more difficult case, for each tuple ($X,$Y) under consideration, at least a little bit of inference must be done to figure out the degree to which the predicate applies to ($X,$Y). We then have an inference control problem: how to guide inferences across the Atomspace, in such a way as to get a good estimate of the degree to which a predicate holds on average across the Atomspace, even when the predicate has not necessarily been evaluated that widely across the Atomspace yet.

That is: We then have a multivariable integration problem, where evaluation of the f($X,$Y) portion of the integrand requires some PLN backward chaining. Fun, fun, fun!

Having cast the problem as multivariable integration, we can now see that: Estimating the answer to this problem via randomly sampling pairs ($X,$Y) from the Atomspace, and doing backward chaining on f($X,$Y), is basically doing Monte Carlo estimation of the value of the expected value integral.

The problem with this kind of Monte Carlo sampling is that, for many relevant predicates, we can expect f($X,$Y) to be about zero for almost all pairs ($X,$Y). In these cases, doing MC with a modest number of samples is going to yield a very noisy estimate.
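
For concreteness, the naive Monte Carlo estimator is just the sketch below; evaluate_f here stands in for whatever backward-chaining-based evaluation is used to estimate f on a sampled tuple (a hypothetical hook, not an existing API):

import random

def monte_carlo_estimate(Af1, Af2, evaluate_f, n_samples=100):
    # Uniformly sample argument tuples and average the predicate strength.
    # Very noisy when f is near zero almost everywhere.
    total = 0.0
    for _ in range(n_samples):
        x = random.choice(Af1)
        y = random.choice(Af2)
        total += evaluate_f(x, y)
    return total / n_samples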

This leads to the idea of approaching the problem via importance sampling (see http://en.wikipedia.org/wiki/Monte_Carlo_integration#Importance_sampling ) or related methods...

The general notion of importance sampling for function integration is to do more function evaluations in those areas where the function (the integrand) varies more. For many common predicates in the present setting, this is largely the same as doing more predicate evaluations in the region of tuple space where f($X,$Y) is far from zero, because for tuples where f($X,$Y) = 0 there is usually not much variation nearby -- we usually have f($X1,$Y1) = 0 for nearby tuples ($X1,$Y1) as well.

To find tuples for which f($X,$Y) is different from 0, one useful approach is based on frequent itemset mining (FIM). That is, given a conjunction C of N links involving a tuple of K variables, one can follow an algorithm such as:

1) 
Scan the Atomspace to find all instantiated-variable tuples that (according to the knowledge 
explicitly represented in the Atomspace) fulfill any ONE of the links in C to a sufficient degree 
(as determined by some threshold T).  Call these tuples T11, T12, T13, ....  Let L(T1i) denote the 
subset of the N links that T1i fulfills to an above-threshold degree.

2) 
For each of the tuples T1i found in Step 1, scan the Atomspace to find all instantiated-variable 
tuples that fulfill the conjunction of two of the links in C to an above-threshold degree, where 
only pairs having at least one member in L(T1i) are considered.   Call these tuples T21, T22, T23,...

3)
For each of the tuples T2i found in Step 2, scan the Atomspace to find all instantiated-variable 
tuples that fulfill the conjunction of three of the links in C to an above-threshold degree, where 
only triples having at least one member in L(T2i) are considered.   Call these tuples T31, T32, T33,...

...

What this will yield is an estimate of the probability of the conjunction (the predicate), over a special sample of argument tuples.
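
One way to read the levelwise procedure above is the following sketch. It is a loose interpretation rather than a definitive algorithm: candidate_tuples is whatever pool of instantiated-variable tuples one scans, and strength(t, subset) is a hypothetical helper giving the strength of the conjunction of the links in subset, grounded with tuple t.

def levelwise_conjunction_mining(candidate_tuples, links, strength, threshold, max_level=3):
    # Level 1: tuples fulfilling at least one single link above threshold,
    # remembering which link-subsets each tuple fulfills (the L(T1i) of Step 1).
    survivors = {}
    for t in candidate_tuples:
        fulfilled = {frozenset([l]) for l in links if strength(t, [l]) >= threshold}
        if fulfilled:
            survivors[t] = fulfilled
    results = {1: survivors}

    # Levels 2, 3, ...: only grow link-subsets that already contain a
    # fulfilled smaller subset, mirroring Steps 2) and 3) above.
    for level in range(2, max_level + 1):
        next_survivors = {}
        for t, fulfilled in results[level - 1].items():
            grown = set()
            for subset in fulfilled:
                for extra in set(links) - set(subset):
                    bigger = subset | {extra}
                    if strength(t, list(bigger)) >= threshold:
                        grown.add(bigger)
            if grown:
                next_survivors[t] = grown
        if not next_survivors:
            break
        results[level] = next_survivors
    return results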

Now, this FIM approach will yield a very nonrepresentative sample, as it is explicitly constructed to comprise tuples making the conjunction as true as possible.

We need to estimate the probability of the conjunction C as evaluated over ALL appropriately-typed tuples in the AtomSpace, yet using the tuple mining approach, we have only evaluated C over an intentionally nonrepresentative subset of these tuples.

One approach to this problem is to assume we have THREE estimates: one from random sampling across all tuples drawn from the Atomspace, one from the frequent tuple mining approach, and one from simply running the pattern matcher.

How do we combine these three estimates? This is where importance sampling comes in.

Suppose we have

  • an estimate x1 based on evaluation of N1 tuples that were provided by the pattern matcher
  • an estimate x2 based on evaluation of N2 tuples that were evaluated via the FIM algorithm described above
  • an estimate x3 based on evaluation of N3 tuples that were found via random sampling of tuples from the AtomSpace (assumed to have N Atoms altogether)

The idea of importance sampling applies to a case where one is choosing samples (in this case tuples) to evaluate based on some probability distribution over sample space. It then says that one should estimate the integral of, say, F($X,$Y) = f($X,$Y) p($X,$Y) based on the average of, e.g.

F($X,$Y)
-------
P($X,$Y)

where P($X,$Y) is the probability of getting ($X,$Y) in one's search over sample space.

The simplest incarnation of this idea in the present context would be to use the estimator

x1 w1 + x2 w2 + x3 w3

w1 = 3 N1 / (N1+N2+N)

w2 = 3 N2/ (N1+N2+N)

w3 =  3 N/ (N1+N2+N)

It might also be good to add some confidence weighting, e.g. weighting the w3 term by an additional weight

c3 = 1 - s3/sqrt(N3)

where s3 is the standard deviation going along with x3. This represents the idea that MC sampling is quite unreliable, so that the important thing is really how many samples one has. The s3/sqrt(N3) is the standard formula for sample error, http://en.wikipedia.org/wiki/Standard_error#Standard_error_of_the_mean , which holds under reasonable assumptions. Then one would have

x1 w1 + x2 w2 + x3 w3

w1 = 3 N1 / (N1+N2+c3 N)

w2 = 3 N2/ (N1+N2+c3 N)

w3 =  3 c3 N/ (N1+N2+c3 N)
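
Written out as code, the combined estimator is just the arithmetic above, taken verbatim (including the factor of 3 and the use of the total Atom count N for the random-sampling term):

import math

def combined_estimate(x1, n1, x2, n2, x3, n3, s3, n_atoms):
    # Confidence weight for the Monte Carlo estimate x3 (s3 = its standard deviation)
    c3 = 1.0 - s3 / math.sqrt(n3)
    denom = n1 + n2 + c3 * n_atoms
    # Heuristic importance-sampling-inspired weights, exactly as given above
    w1 = 3.0 * n1 / denom
    w2 = 3.0 * n2 / denom
    w3 = 3.0 * c3 * n_atoms / denom
    return x1 * w1 + x2 * w2 + x3 * w3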

AndImplicationRule

Now we return to the case

Implication
    AND L1 L2 ... Lk
    R


The Simpler Case, where the Conjunction Has Few Terms

Consider for simplicity the case

Implication
    AND L1 L4 L7 L9
    R

Then, the issue at hand is to calculate

P( R | L1 & L4 & L7 & L9)

which may be done e.g. via

P( R | L1 & L4 & L7 & L9) = P( R & L1 & L4 & L7 & L9)  / P( L1 & L4 & L7 & L9)

Next, the numerator and denominator may be calculated using the intersection version of the inclusion-exclusion formula

http://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_intersections

This requires us to calculate a bunch of stuff like

P(~R & ~L1 & ~L4)
P(~R & ~L1 & ~L7)
...

and if we wanted to get even higher-order dependencies, stuff like

P(~R & ~L1 & ~L4 & ~L7)
P(~R & ~L1 & ~L7 & ~L9)
...

Note that each of these probabilities involves calculation of the probability of an AndLink, e.g.

AndLink
   NotLink R
   NotLink L1
   NotLink L4

for P(~R & ~L1 & ~L4). This may be done via the methods described in the previous section; or it could be done via applying the inclusion/exclusion based AndRule to be described below (which works similarly to the AndImplicationRule we are describing in this section now).
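
At the top level, then, the rule just divides two AndLink probabilities. A minimal sketch, where and_probability(terms) is a hypothetical estimator for the probability of a conjunction (obtained by direct evaluation as above, or by the inclusion/exclusion AndRule below):

def and_implication_strength(and_probability, R, links):
    # P(R | L1 & ... & Lk) = P(R & L1 & ... & Lk) / P(L1 & ... & Lk)
    numerator = and_probability([R] + list(links))
    denominator = and_probability(list(links))
    if denominator == 0.0:
        return 0.0  # conditional undefined; in practice the confidence should reflect this
    return numerator / denominator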

The Harder Case, where the Conjunction Has Many Terms

Next, if there is a larger number of Li in the conjunction at the source of the implication, then one approach is to approximate the inclusion-exclusion formula by leaving out small terms. This basically comes down to "frequent rule-combination mining" followed by inclusion-exclusion calculations based on the frequent rules only...

Then we can do basic FIM-style data mining to find the reasonably frequent pairs (~L_i & ~L_j) [where frequency is calculated using the methods outlined above for estimating predicate truth value across the Atomspace]

Then for the reasonably frequent pairs, we find the reasonably frequent triples (~L_i & ~L_j & ~L_k)

(I suspect that might be all the iterations needed in the vast majority of real-world cases, but in principle one could go on to quadruples as well. )

Then, one could do as suggested above in the small-conjunction case, but restricting attention only to those term-combinations identified as reasonably frequent.
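
A sketch of this truncated inclusion-exclusion, writing P(L1 & ... & Lk) = 1 - P(~L1 or ... or ~Lk) and keeping only the "reasonably frequent" pairs and triples of negated terms; neg_probability(subset) is a hypothetical estimator for the probability of the conjunction of the negated links in subset, obtained via the methods outlined earlier:

from itertools import combinations

def truncated_and_probability(neg_probability, links, threshold):
    # Singleton terms P(~Li) are always included.
    union_prob = sum(neg_probability(frozenset([l])) for l in links)

    # Subtract only the reasonably frequent pairs P(~Li & ~Lj).
    frequent_pairs = []
    for pair in combinations(links, 2):
        p = neg_probability(frozenset(pair))
        if p >= threshold:
            union_prob -= p
            frequent_pairs.append(frozenset(pair))

    # Add back reasonably frequent triples, grown from the frequent pairs.
    seen = set()
    for pair in frequent_pairs:
        for extra in links:
            triple = pair | {extra}
            if len(triple) == 3 and triple not in seen:
                seen.add(triple)
                p = neg_probability(triple)
                if p >= threshold:
                    union_prob += p

    # P(L1 & ... & Lk) = 1 - P(union of complements), clipped to [0, 1]
    return min(1.0, max(0.0, 1.0 - union_prob))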

AndRule

The same methods just outlined for

Implication
    AND L1 L2 ... Lk
    R


can obviously be applied to

AND L1 L2 ... Lk

equally well. For small conjunctions one can use inclusion-exclusion; for larger conjunctions one can use FIM on the conjuncted terms, to do inclusion-exclusion using only nontrivial terms.