OpenCogPrime:PLNBookErrata

From OpenCog

The PLN book really has more errors than it should. A version has been partially ported to this wiki (PLNBook), and the following errata could be applied to it.

Entering errata you find here will be much appreciated.

Relatively Substantial Errors

page 73-74

As noted by Jeremy Zucker:

I was working through the Heuristic Derivation of the Independence Assumption-based deduction rule ..., and found 3 typos.

They don't affect the main results ...


On pages 73 and 74 of the derivation, you replaced B with U-B and cranked through the algebra:

P((C\intersect A)|(U-B)) = P(A|(U-B)) P(C|(U-B)) = |A\intersect (U-B)||C\intersect(U-B)|/|U-B|^2

so far, so good, but in the next step, you replace |C\intersect (U-B)| with [P(C)-P(A\intersect B)] instead of [P(C)-P(C\intersect B)]

It is no big deal, because in the subsequent step, you go back to [P(C)-P(C\intersect B)]

However, later on in the derivation, when you expanded

P(C|A) = P(C\intersect A)/P(A)

you divided the second term by P(A) incorrectly, resulting in the expression
 
= P(A|B)P(C|B)P(B)/P(A) + [1-P(A\intersect B)][P(C) -P(C\intersect B)]/(1-P(B))

instead of:

= P(A|B)P(C|B)P(B)/P(A) + [1-P(A\intersect B)/P(A)][P(C) -P(C\intersect B)]/(1-P(B))

Fortunately, in the next step, you recover by replacing [1-P(A \intersect B)] with [1-P(B|A)], so the two mistakes cancel each other out.

Now you would be home free here, except that at the end, when you translate the final equation

P(B|A)P(C|B) + [1-P(B|A)][P(C) - P(B)P(C|B)]/(1-P(B))

into:

s_AC = s_AB s_BC + (1-s_AB)(s_C - s_C s_BC)/(1-s_B)

you claim that this "is the formula mentioned above", when it is actually:

s_AC = s_AB s_BC + (1-s_AB)(s_C - s_B s_BC)/(1-s_B)
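As a sanity check on the corrected formula, here is a minimal Python sketch. The function name `deduction_strength` and its argument names are made up for illustration; they are not taken from the book or from any OpenCog code. One quick test: if B and C are independent, so that s_BC = s_C, the corrected formula returns s_C whatever s_AB is, which the version with the typo does not do in general.

```python
def deduction_strength(s_ab, s_bc, s_b, s_c):
    """Independence-based deduction: estimate s_AC = P(C|A).

    s_ab = P(B|A), s_bc = P(C|B), s_b = P(B), s_c = P(C).
    """
    if not 0.0 <= s_b < 1.0:
        raise ValueError("s_b must be in [0, 1) so that 1 - s_b is nonzero")
    return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)


# If B and C are independent (s_bc == s_c), the estimate collapses to s_c.
print(deduction_strength(s_ab=0.7, s_bc=0.3, s_b=0.25, s_c=0.3))
```

The sketch does no consistency checking on the inputs beyond s_b; a real implementation would presumably also reject premise strengths that cannot co-occur.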

page 90

Top of the page

P(B|A) = P(A|B)P(A) / P(B)

should be

P(B|A) = P(A|B)P(B) / P(A)

Likewise

sAB = sBA*sA / sB

should be

sAB = sBA*sB / sA

I don't know if the induction formula suffered from that mistake, but it should be checked.
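The corrected inversion can be checked against direct counts on a toy universe. This is only an illustrative sketch; `inversion_strength` is a hypothetical name, and the counts are invented for the example.

```python
def inversion_strength(s_ba, s_a, s_b):
    """Bayes inversion: s_AB = s_BA * s_B / s_A.

    s_ab = P(B|A), s_ba = P(A|B), s_a = P(A), s_b = P(B).
    """
    if s_a <= 0.0:
        raise ValueError("s_a must be positive")
    return s_ba * s_b / s_a


# Toy universe of 100 items: |A| = 20, |B| = 40, |A and B| = 10.
s_a, s_b = 20 / 100, 40 / 100
s_ba = 10 / 40          # P(A|B) from the counts
s_ab_direct = 10 / 20   # P(B|A) from the counts
print(inversion_strength(s_ba, s_a, s_b), s_ab_direct)
```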

page 233

The formulas describing the semantics of the Context relationship, given at the bottom of the page, only hold when R is a form of inheritance/implication or similarity/equivalence relationship; they don't hold for relationships R generally.

page 255-256

This is an IMPORTANT one...

The text says

"Formally, we may introduce the ASSOC operator, defined as

ExtensionalEquivalence
        Member $X (ExOut ASSOC $C)
        AND
                ExtensionalImplication
                        Subset $Y $E
                        Subset $Y $C
        NOT
                ExtensionalImplication
                        Subset $Y $E
                        NOT [Subset $Y $C]

but it should actually be

ExtensionalEquivalence
        Member $E (ExOut ASSOC $C)
        ExOut
                Func
                    List
                        Inheritance $E $C
                        Inheritance
                                   NOT $E
                                   $C

where

Func(x,y) = [x-y]^+

and ^+ denotes the positive part ...

and where if $E and $C are relationships, Inheritance is replaced with Implication.
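Numerically, the corrected definition just takes the positive part of a difference of two strengths. A minimal sketch, assuming the two Inheritance strengths are already available as plain floats (the names `positive_part` and `assoc_strength` are hypothetical):

```python
def positive_part(z):
    """[z]^+ : z if z is positive, otherwise 0."""
    return max(z, 0.0)


def assoc_strength(s_e_c, s_not_e_c):
    """Func(x, y) = [x - y]^+ with
    x = strength of (Inheritance $E $C),
    y = strength of (Inheritance (NOT $E) $C).
    """
    return positive_part(s_e_c - s_not_e_c)


print(assoc_strength(0.8, 0.3))  # E makes C more likely than NOT E does
print(assoc_strength(0.2, 0.6))  # E makes C less likely, so membership is 0
```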

This one was a bad cut and paste error that somehow crept into the final version from a very old version of the manuscript ;-(

Note that a better notation is

AttractionLink E C

which is introduced at

http://www.opencog.org/wiki/OpenCogPrime:PredictiveAttraction

Generally speaking, the PLN book is unacceptably fuzzy on the interpretation of ASSOC in the initial discussion, and the formal definition that is supposed to clarify it is itself wrong, as noted above ...

The best way to interpret

P(cat | E)

in the discussion in this section would be as

P(Situation S involving cat | Situation S involving E)

This is how we actually did it when using intensional inference of this sort for NL corpus analysis (work that Izabela and Ben Goertzel did in 2005). For instance, we looked at

P(sentence involving word "cat" |  sentence involving word "fur")

So then, the intensional similarity of two words was basically a measure of whether the two words tended to co-occur-in-sentences with the same other words...
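A rough sketch of that sentence-level conditional probability, assuming each sentence has already been reduced to a set of words. This is not the code used in the 2005 work; the function name and the tiny corpus are invented for illustration.

```python
def sentence_conditional(sentences, target, given):
    """Estimate P(sentence involves `target` | sentence involves `given`)."""
    with_given = [s for s in sentences if given in s]
    if not with_given:
        return 0.0  # no evidence; returning 0.0 here is an arbitrary choice
    return sum(1 for s in with_given if target in s) / len(with_given)


corpus = [
    {"the", "cat", "has", "soft", "fur"},
    {"a", "fur", "coat"},
    {"the", "cat", "sleeps"},
]
print(sentence_conditional(corpus, "cat", "fur"))
```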

I think there was a more recent write-up of this stuff, but somehow what got into the book finally was something older and less clear, with some errors ... I look forward to correcting this in subsequent printings ;-p

Furthermore, Nil Geisweiller has suggested a modified approach which may be better in practice.

In the modified approach, we would replace

ExtensionalEquivalence
        Member $E (ExOut ASSOC $C)
        ExOut
                Func
                    List
                        Inheritance $E $C
                        Inheritance
                                   NOT $E
                                   $C

with

ExtensionalEquivalence
        Member $E (ExOut ASSOC_int $C)
        ExOut
                Func
                    List
                        IntensionalInheritance $E $C
                        IntensionalInheritance
                                   NOT $E
                                   $C

and

ExtensionalEquivalence
        Member $E (ExOut ASSOC_ext $C)
        ExOut
                Func
                    List
                        ExtensionalInheritance $E $C
                        ExtensionalInheritance
                                   NOT $E
                                   $C

and then define

ASSOC = ASSOC_int OR ASSOC_ext

where OR is an appropriate fuzzy OR (most simply, fuzzy OR is just max).

Typos and Formatting Errors and Such

from Charles Griffiths

p 6, rules 3 & 4, problem with < vs <=

p 16/17 repeated line at page boundary

p 17 "And there is a variety of probabilistic approaches..." (there are)

p 19 "there remain many similarities." (many similarities remain.)

p 26 "Chapter 10 The" (Chapter 10. The)

p 50 why do you mix [L, U] and <(L, U], ...> ? p 14 notation is <L, U, ...> and p 57 has <[L, U], ...>

p 80 "I balls in it, I black ones" (N balls in it, b black ones)

p 140 "D2 is the second-order distribution for D2." (for premise 2.)

p 164 red d

p 271 conbinations


page 25

In Section 2.2, we have the definition of Context as follows:

  Context
        C
        R A B <t>

is simply

  R (A AND C) (B AND C) <t>

I think this is highly ambiguous; a better representation would be:

  Context <t>
        C
        R A B

is simply

  R (A AND C) (B AND C) <t>


page 28

  • Relationships representing symmetrized higher-order conditional probabilities:
    • ExtensionalEquivalence
    • Equivalence (mixed)
    • IntensionalInheritance <== should be IntensionalEquivalence

page 108

from Kaj Sotala:

calculating the heuristic formula for G(A,B,C), there is the equation

P_subsume(A,B) = min[0, (P(A) - P(B)) / (P(A) + P(B))]

Since this formula would return negative probabilities in cases where P(B) > P(A), and a probability of 0 otherwise, I assume that the "min" should be "max"...
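A quick sketch of the "max" reading, for comparison with the "min" version printed in the book. The function name follows the book's P_subsume; the zero-denominator guard is an added assumption.

```python
def p_subsume(p_a, p_b):
    """max[0, (P(A) - P(B)) / (P(A) + P(B))], the reading proposed above."""
    if p_a + p_b == 0.0:
        return 0.0  # assumed convention when both probabilities are zero
    return max(0.0, (p_a - p_b) / (p_a + p_b))


print(p_subsume(0.6, 0.2))  # positive when P(A) > P(B)
print(p_subsume(0.2, 0.6))  # clipped to 0 when P(B) > P(A)
# With min instead of max, the second case would come out negative:
print(min(0.0, (0.2 - 0.6) / (0.2 + 0.6)))
```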

page 110

P(B) = P(B|A) + P(B|¬A) P(¬A)

P(A) is missing after P(B|A). It should be

P(B) = P(B|A) P(A) + P(B|¬A) P(¬A)
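The need for the P(A) factor is easy to confirm on a toy universe, since this is just the law of total probability. A minimal sketch with invented counts (`total_probability` is a hypothetical helper name):

```python
def total_probability(p_b_given_a, p_b_given_not_a, p_a):
    """Law of total probability with the P(A) factor in place."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)


# Toy universe of 10 items: |A| = 4, |A and B| = 1, |(not A) and B| = 3.
p_a = 4 / 10
p_b_direct = (1 + 3) / 10
print(total_probability(1 / 4, 3 / 6, p_a), p_b_direct)
```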

page 131

First page of Chapter 6

or are such that both of the intervals [L1_i, L_i] and
[U_i, U1_i] each have probability mass b_i/2, when interval-type
is symmetric.

it should be

(1-b_i)/2

instead of

b_i/2
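To see why the tail mass is (1-b_i)/2, here is a small sketch that builds a symmetric interval from samples. It assumes "symmetric" means equal probability mass in the lower and upper tails, and it treats the smallest and largest samples as the outer bounds; the function name and the sample data are made up for illustration and are not the book's procedure.

```python
def symmetric_interval(samples, b):
    """Return (lower, upper) holding roughly mass b, with (1 - b) / 2 in each tail."""
    if not 0.0 < b < 1.0:
        raise ValueError("b must be strictly between 0 and 1")
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - b) / 2.0
    lo_idx = int(round(tail * n))
    hi_idx = int(round((1.0 - tail) * n)) - 1
    if lo_idx > hi_idx:
        raise ValueError("too few samples for this credibility level")
    return xs[lo_idx], xs[hi_idx]


samples = [i / 1000 for i in range(1000)]  # evenly spread stand-in data
lower, upper = symmetric_interval(samples, b=0.9)
below = sum(1 for x in samples if x < lower) / len(samples)
above = sum(1 for x in samples if x > upper) / len(samples)
print(lower, upper, below, above)
```

With b = 0.9, each tail ends up holding 0.05 of the samples, i.e. (1 - 0.9)/2, not 0.9/2.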

page 133

Section 6.2.1

 "The first step, in our approach, is to obtain initial probability
 intervals. We obtain the following sets of initial probabilities
 shown in Tables 1-3, corresponding to credibility levels b of 0.95,
 and 0.982593, respectively"

Should be

 "The first step, in our approach, is to obtain initial probability
 intervals. We obtain the following sets of initial probabilities
 shown in Tables 1-2, corresponding to credibility levels b of 0.90,
 and 0.95, respectively"

page 138

Section 6.2.2

 "Deduction Rule Results Using Symmetic Intervals"

Should be

 "Asymmetric".

page 140

Section 6.2.5

",,"

page 143

Section 7.1

 "P ( Member Q C < y > | Member Q C < x > ) = t"

Should be

 "P ( Member Q C < y > | Member Q A < x > ) = t"

page 144

Section 7.2

(t_{1,A}=0, ..., t_{n,A}=1})

page 212

In Section 10.6.2.1 "Boolean Operators for Combining Terms", in the subsection "extensional union", the relationship

Equivalence
       Subset x (A ANDExt B)
       (Subset x A) AND (Subset x B)

is given.

Actually it seems this is best interpreted as a heuristic equivalence, which holds on average and doesn't have strength 1.

In the multiset approach described later on this errata page, we have

P(A & B) = P(mult(A & B))
P(A) = P(mult(A))

It follows that

P(X | A & B)  = P(X|A) P(X|B)

if we make two big assumptions:

1) A and B are independent

2) A and B are independent in X [meaning A&X and B&X are independent]

[note, this is if not iff ... the equation could still hold if the dependencies exist but go in opposite directions... e.g. if P(A&B) > P(A)P(B) but P(A&B&X) < P(A&X)P(B&X) ]

So, in the multiset based approach, that equivalence relationship is a heuristic which is "true on average" in the above sense, rather than a strength-1 equivalence...


page 252

" Or one can take a probabilistic approach to defining complexity by introducing some reference process H, and defining

c(F;T) = ASSOC(G,H;T) "

G should be F.

page 257

In the following PLN formula

Intensional Inheritance A B

Intensional Inheritance => IntensionalInheritance

page 259

Top of the page, there is a parenthesis problem in :

s = (OR(X,Y).tv.s

page 265-266

Last sentence on page 265, "...until it finds one that has a conclusion of a form ..." should be "...until it finds that it has a conclusion of a form ..."

page 267

First sentence, "... some set of known pre to the given target predicate.", should probably be, "some set of known premises to the given target predicate."

page 287

at the top of the page:

SS_Initiation(showering_event_43) => SS_InitiatedAt(showering_event_43)

SS_Initiation(shaving_event_33) => SS_InitiatedAt(shaving_event_33)

at the bottom of the page:

SS_Initiation(B) => SS_InitiatedAt(B)

SS_Initiation(A) => SS_InitiatedAt(A)

The latter pair is actually at the top of the next page.

index

Many index terms are underlined or shown in Courier font ... index terms really should all be in the same font without underlining

Clarifications and Comments (that aren't errata)

The Nature of Causality

The position taken in the PLN book (and elaborated more in The Hidden Pattern, from a philosophical perspective) is that causality is not an elegant, mathematical thing but rather a part of human "folk psychology" ... part of the way we humans intuitively understand the world ... that is a mixture of different factors. See http://goertzel.org/PLN_causality.pdf for an earlier version of the text from the book....

One psychologically important aspect of causality was left out of the discussion in the book, namely the relationship between causality and action. This factor was mentioned in The Hidden Pattern, but in a less formal way.

The most direct kind of causal relationship perceived by an organism is one that involves the organism's own actions, i.e.

PredictiveImplication
   I do X
   Y happens

or more formally

L :=
PredictiveImplication
   Execution X
   Y

Given another implication


M :=
PredictiveImplication
   A
   B

the mind may then assess M as causal if M derives a lot of positive evidence via inference based on L.

In general, a mind may assess a predictive implication relationship as causal if it derives a lot of positive evidence for this implication's truth value from predictive implications whose sources are ExecutionLinks.

The concept here is that: we think A causes B if we can imagine ourselves in a position where we enact A and this results in B happening.

Causation is thus getting reduced to the "feeling of free will."

I'm not positing this as the only, true or ultimate explanation of causality -- but I think it's a significant aspect of the mix of fuzzy intuitions that underlie our folk psychology notion of causality (and an aspect that was left out in the PLN book's discussion, though mentioned in The Hidden Pattern in a less formal way).

Relation between predicate and term logic representations of the same relationships

The same statement can be expressed in both term logic and predicate logic, e.g. "all ravens are black", "Joe is a raven".


Term logic:

    raven -> black
    Joe -> raven

InheritanceLink
___ ConceptNode: raven
___ ConceptNode: black

InheritanceLink
___ SemeNode: Joe
___ ConceptNode: raven

Predicate logic:
    raven(X) -> black(X)
    raven(Joe)

To avoid confusion let's write the latter as

isRaven(X) ==> isBlack(X)
isRaven(Joe)

which would be

AverageLink $X
__ImplicationLink
_____ EvaluationLink
________ PredicateNode: isRaven
________ VariableNode: $X
_____ EvaluationLink
________ PredicateNode: isBlack
________ VariableNode: $X


EvaluationLink
___ PredicateNode: isRaven
___ SemeNode: Joe



But then we have that

ForAll $X
__ Equivalence
_____ EvaluationLink
________ PredicateNode: isRaven
________ VariableNode: $X
_____ MemberLink
________ VariableNode: $X
________ SatisfyingSetLink
___________ PredicateNode: isRaven

So the predicate-argument relation is equivalent to a fuzzy membership relation.

So, the relation between predicate and term logic relationships in PLN boils down to the relation between fuzzy membership and probabilistic inheritance relationships.

About the semantics of SubsetLinks between fuzzy sets

The semantics of the equation at the top of page 29, in section 2.4.1.1, is not adequately clear ... and some heuristic formulas are given there without a clear explanation that they are in fact just heuristics that could be replaced by precise probabilistic formulas.

The following paper explains how to replace those heuristics with something that has a precise probabilistic foundation but is slightly complicated:

http://goertzel.org/MyPapers/FuzzyProbabilistic.pdf

About PLN and NARS

Someone asked Ben Goertzel:

NARS has been reimplemented by Pei four times in four different languages. Since it predates PLN, I wonder, were there ever any plans to just basically change the "truth value" formulas in one of the implementations, making it more "probabilistic"?

He answered...

That is how PLN began, when Pei was working for me at Webmind Inc. 
during 3 years in the 1990s. 

Jeff Pressing and I invented PLN originally as a 
"probabilistic version of NARS"

We had a software system then that could do either 
PLN or NARS reasoning (using a subset of the rules and 
formulas of each) depending on a parameter setting.

But it eventually became apparent that you can't 
really do things that way.

It's not just about the truth value formulas, 
it's about the underlying semantics, which are very 
different for NARS than for any probabilistic reasoning system.  
The meaning of an inferred frequency value for an 
inheritance relationship is just very different 
in NARS and in PLN, and this leads to all 
sorts of other differences.

For instance, in PLN one can derive higher-order 
inference rules [using "higher-order" in the NARS sense] 
from first-order rules... due to the probabilistic semantics.  
In NARS you can't.

In PLN one can attach semantics to variable expressions 
with unbound variables, using "mean value" semantics.  
In NARS you can't...

The way intension vs. extension is handled in NARS 
doesn't make sense once you introduce probabilistic 
truth values ... so if you want to talk about intension 
separately from extension, you need to do something else 
(the approach taken in the PLN book being one route...)

etc.

Basically, the deeper you get, the more it becomes clear 
that you can't just paste out NARS truth value formulas 
and paste in probabilistic ones, even though PLN and NARS 
do wind up with a lot of similarities...


Intensional Node Probabilities

Nil suggested the following:

In the PLN book it is said that:

A <w>

means

Subset Universe A <w>

I'm wondering if we could consider the intensional and mixed inheritance as well, that is:

A <w>

means

Inheritance Universe A <w>

or even

IntensionalInheritance Universe A <w>


Ben said:

Yes, if we look at

ExtensionalEquivalence

       Member $E (ExOut ASSOC $C)
       ExOut
               Func
                   List
                       Inheritance $E $C
                       Inheritance
                                  NOT $E
                                  $C


where $C is the universe, then we get

Member $E (ExOut ASSOC $C) = [E.tv - (NOT E).tv]^+ = [2 E.tv - 1]^+

so that where

IntensionalInheritance Universe A <w>

then w is a normalized sum of terms of the form

(Member E (ExOut ASSOC A)) * [2 E.tv - 1]^+

so it's pretty much a reweighting of the set A_ASSOC, right?

So what we find is that the intensional node probability of A is a measure of how much probability mass is concentrated in A's association-set.

Of course, E.tv may also be calculated intensionally, extensionally or mixedly in the above...

When doing purely intensional inference, presumably one would use purely intensional node probabilities

When doing mixed inference, one may want to use mixed node probabilities

One of the lessons I learned from Pei is that human commonsense inference mixes up intension and extension in complex ways...

ben


Calculating the truth values of unquantified variable expressions

In the book it says

"The VariableScope link is a kind of "average quantifier": the truth value of

BindLink $X
    F($X)

is defined as the weighted average of the truth value of F($X), i.e. as the sum

w($x) F($x) / normalizer

where w($x) is defined as the truth value of $x in the system."

It seems useful to elaborate a bit on the normalizer.

A simple approach is

normalizer = Sum{ w($x) ; $x goes over all entities in the system's
memory that match F's input type restrictions}
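A minimal sketch of this simple weighted average, assuming the atoms have already been filtered to those matching F's input type restrictions, and that w and F are given as plain dictionaries of strengths. The names `average_truth_value`, `w` and `f` are illustrative, not OpenCog API.

```python
def average_truth_value(atoms, w, f):
    """Weighted average of F over `atoms`: sum of w(x) F(x), divided by sum of w(x)."""
    normalizer = sum(w[x] for x in atoms)
    if normalizer == 0.0:
        raise ValueError("no weight on any matching atom")
    return sum(w[x] * f[x] for x in atoms) / normalizer


w = {"cat": 0.2, "dog": 0.6}   # truth value of each atom in the system
f = {"cat": 1.0, "dog": 0.5}   # truth value of F applied to each atom
print(average_truth_value(["cat", "dog"], w, f))
```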

There is a lot of formalism in the OpenCogPrime wikibook about input and output type restrictions of predicates.

We can make distinctions similar to those in a programming language, distinguishing

Concept

Concept --> Concept

(Concept --> Concept) --> Concept

and so forth

The subtle issue that arises here pertains to probabilistic overlap between Atoms.

Note for instance that in the Atom formalism, we represent

cat

as a ConceptNode, and also represent

{cat, dog}

as a ConceptNode ...

The thing is, to handle this sort of situation *right*, one would need to account for dependencies among the different Atoms in the system in defining the truth value of a VariableScope link...

In other words, rather than just using a weighted average, we'd have to use the inclusion-exclusion formula from set theory, and do stuff like the following

Let

h($x) = w($x) F($x)

Then, if F had domain ConceptNode, and the only ConceptNodes in the system were A and B, we'd need something like

w(A) F(A) + w(B) F(B) - w(A&B) F(A&B)

which bypasses the problem that w(A) and w(B) may both rely on the same pieces of evidence...
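For more than two Atoms, the same idea alternates signs over every non-empty subset. The sketch below assumes w and F can be evaluated on intersections, represented here as frozensets of atom names; that representation and the function name are assumptions for illustration. It also counts the terms, of which there are 2^n - 1 for n atoms.

```python
from itertools import combinations


def inclusion_exclusion_sum(atoms, w, f):
    """Sum of (+/-) w(S) F(S) over all non-empty subsets S of `atoms`.

    Odd-sized intersections are added, even-sized ones subtracted.
    Returns (total, number_of_terms).
    """
    total, terms = 0.0, 0
    for k in range(1, len(atoms) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(atoms, k):
            key = frozenset(subset)
            total += sign * w(key) * f(key)
            terms += 1
    return total, terms


# Two-atom case from the text: w(A) F(A) + w(B) F(B) - w(A&B) F(A&B)
w_tab = {frozenset("A"): 0.5, frozenset("B"): 0.4, frozenset("AB"): 0.2}
f_tab = {frozenset("A"): 1.0, frozenset("B"): 0.5, frozenset("AB"): 0.75}
print(inclusion_exclusion_sum(["A", "B"], w_tab.get, f_tab.get))

# Term count for 10 atoms, using dummy constant w and F.
print(inclusion_exclusion_sum(list(range(10)), lambda s: 1.0, lambda s: 1.0)[1])
```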

Then we still have the funniness of applying F to the logical intersection A&B ... but that funniness is just irreducible... I mean there is no semantic problem there, as if A and B are concepts then A&B is a concept and F is supposed to be applicable to concepts...

But the problem is that applying the inclusion-exclusion formula to a large space of concepts (or other Atoms) is just wholly intractable...

So, as a simple and tractable approximation, I suggest simply ignoring all dependencies and using a simple weighted average...

What else can be done?

This leads to all sorts of cognitive biases, but ... well ... so be it ...