Noisy Smokes Example: Experiments
This wiki page describes some experiments on PLN/ECAN synergy that can usefully be carried out in the near term (written in Dec 2015).
These experiments involve an example we refer to as the “noisy smokes example.”
For the basic motivations see
The second paragraph of Section 6 of the above says
“We hypothesize that if one modified the smokes example via adding a substantial amount of irrelevant evidence about other aspects of the people involved, then one would have a case where ECAN could help PLN, because it could help focus attention on the relevant relationships.”
The “noisy smokes” example is intended to explore this hypothesis via adding a bunch of irrelevant information to the standard “smokes” example.
For code/data comprising a version of the “noisy smokes” example, see
XX (Misgana, please link to code here) XX
What follows is a somewhat compressed description of a series of experiments that can be undertaken to explore this example in the near term. Once all these experiments are done, we will really be “doing it right”. (And just experiments 1-3 would constitute “doing it pretty much right”, I would say.)
Hack the truth values of nodes like (ConceptNode “smokes”) and (PredicateNode “friends”) by hand, setting them to the correct values given the data: i.e. the strength of “smokes” is the fraction of people in the Atomspace who smoke, and the strength of “friends” is the fraction of pairs of people in the Atomspace who are friends.
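As a rough illustration of what “the correct values given the data” means here, the following Python sketch computes both strengths from a toy data set. The people, the facts about them, and the function names are all made up for illustration, and friendship is assumed to be symmetric (so pairs are unordered); none of this is OpenCog code.

```python
from itertools import combinations

# Hypothetical toy data; names and facts are illustrative only.
people = ["Ben", "Bob", "Anna", "Edward"]
smokers = {"Anna", "Edward"}
# Assumption: friendship is symmetric, so each pair is stored unordered.
friends = {
    frozenset(("Ben", "Bob")),
    frozenset(("Anna", "Edward")),
    frozenset(("Anna", "Bob")),
}

def smokes_strength(people, smokers):
    """Fraction of people who smoke."""
    return sum(1 for p in people if p in smokers) / len(people)

def friends_strength(people, friends):
    """Fraction of unordered pairs of people who are friends."""
    pairs = [frozenset(pair) for pair in combinations(people, 2)]
    return sum(1 for pair in pairs if pair in friends) / len(pairs)

# Values one would then write by hand into the node truth values.
node_strengths = {
    "smokes": smokes_strength(people, smokers),
    "friends": friends_strength(people, friends),
}
```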
Using these values, surprisingness can be calculated for Atoms via a couple of methods. The nicest method is to use relative entropy, I suppose. Let’s see if Nil has implemented distributed-tv-based code we can use for this. Otherwise we could temporarily insert a bad hack like “surprisingness of s1 relative to s2 is abs(s1-s2)”, which would sorta work for initial experiments.
For starters, in this step, surprisingness is calculated locally. e.g. the surprisingness of “Ben smokes” is calculated relative to “$X smokes” and the surprisingness of “Ben and Bob are friends” is calculated relative to “$X and $Y are friends”. Broader use of context in estimating surprisingness will be handled in a later step.
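Both surprisingness measures mentioned above can be sketched in a few lines of Python. This is only a sketch under a loud assumption: each strength is treated as the parameter of a Bernoulli distribution, so relative entropy reduces to the KL divergence between two Bernoullis. A real distributional TV would carry a full distribution rather than a single number. The function names are hypothetical.

```python
import math

def abs_surprisingness(s1, s2):
    """The 'bad hack': surprisingness of s1 relative to s2 is abs(s1 - s2)."""
    return abs(s1 - s2)

def kl_surprisingness(s1, s2, eps=1e-9):
    """Relative entropy KL(Bernoulli(s1) || Bernoulli(s2)), in nats.

    Assumption: a strength is read as a Bernoulli parameter. Strengths are
    clamped away from 0 and 1 so the logarithms stay finite.
    """
    p = min(max(s1, eps), 1 - eps)
    q = min(max(s2, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Local surprisingness of "Ben smokes" relative to "$X smokes"
# (both strengths are made-up illustrative numbers).
ben_smokes, x_smokes = 0.9, 0.2
surprise_abs = abs_surprisingness(ben_smokes, x_smokes)
surprise_kl = kl_surprisingness(ben_smokes, x_smokes)
```

Both measures are zero when the specific strength equals the generic one; the KL version grows faster as the generic strength approaches 0 or 1.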
Given all this, we can then
A) stimulate Atoms with STI based on their surprisingness. I guess this could be done by the FC, for example? When the FC creates a new Atom, or changes the TV of an existing Atom, it can check the surprisingness of that Atom and then stimulate the Atom as appropriate.
B) run PLN forward chaining with premise selection guided by STI, i.e. once one has a premise P1 and a rule R chosen, one can then find several premises P2 that match (P1, R), and select among the several possible premises P2 via e.g. tournament selection with probability proportional to STI.
B1) Note, we can also try running PLN FC with the pattern-matcher using internal search guided by STI (Misgana made modified callbacks for this a while ago). Given the simplicity of this example I am unsure whether this will actually work better in this case, but it’s worth trying too…
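Steps A and B above might be sketched roughly as follows. The `Atom` class is a hypothetical stand-in, not the OpenCog API; the `gain` parameter and the abs-difference surprisingness are assumptions; and the selection shown is plain tournament selection (draw k candidates uniformly, keep the one with the highest STI), which is only one reading of “tournament selection with probability proportional to STI”.

```python
import random

class Atom:
    """Hypothetical stand-in for an Atomspace Atom; not the OpenCog API."""
    def __init__(self, name, strength, sti=0.0):
        self.name = name
        self.strength = strength
        self.sti = sti

def stimulate(atom, baseline_strength, gain=100.0):
    """Step A: add STI in proportion to the Atom's surprisingness.

    Assumption: surprisingness is abs(strength - baseline), and `gain`
    is an arbitrary scaling constant.
    """
    atom.sti += gain * abs(atom.strength - baseline_strength)

def tournament_select(candidates, k=3, rng=random):
    """Step B: draw k candidate premises uniformly, keep the highest-STI one."""
    contenders = rng.sample(candidates, min(k, len(candidates)))
    return max(contenders, key=lambda a: a.sti)

# Candidate premises P2 matching some (P1, R); strengths are illustrative.
candidates = [Atom("Ben smokes", 0.9), Atom("Bob smokes", 0.25), Atom("Anna smokes", 0.5)]
for atom in candidates:
    stimulate(atom, baseline_strength=0.2)
chosen = tournament_select(candidates, k=2, rng=random.Random(0))
```

A smaller k makes the choice closer to uniform; k equal to the number of candidates always picks the Atom with the highest STI.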
We can also introduce forgetting, so that Atoms with lowest LTI tend to get forgotten from RAM. Initially this shouldn’t change anything much…
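A very simple version of forgetting could look like the sketch below. It is a deterministic simplification (always drop the lowest-LTI Atoms once a capacity is exceeded), whereas the text only says low-LTI Atoms “tend to” get forgotten; the dict-based Atom records and the `capacity` parameter are assumptions made for illustration.

```python
def forget_lowest_lti(atoms, capacity):
    """Keep the `capacity` Atoms with the highest LTI.

    `atoms` is a list of dicts with at least an "lti" key (hypothetical
    representation). Returns (kept, forgotten).
    """
    ranked = sorted(atoms, key=lambda a: a["lti"], reverse=True)
    return ranked[:capacity], ranked[capacity:]

atoms = [
    {"name": "Ben smokes", "lti": 12.0},
    {"name": "Ben owns a laptop", "lti": 0.5},
    {"name": "Ben and Bob are friends", "lti": 8.0},
]
kept, forgotten = forget_lowest_lti(atoms, capacity=2)
```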
The next step would be to actually calculate the node probabilities for e.g. “smokes” and “friends” automatically. The easiest way to do this is to make the revision rule work correctly upon Atom creation/update, so that e.g. when a link asserting that a particular person smokes
is entered into the Atomspace, or has its count increased, then the node “smokes” has its count increased correspondingly. Eddie has been looking at this in the context of the bio-Atomspace.
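The count propagation described above might be sketched as a count-weighted running average. This is only in the spirit of revision and is not PLN's actual revision rule; the `CountStore` class and its method names are hypothetical.

```python
from collections import defaultdict

class CountStore:
    """Hypothetical bookkeeping for node strengths; not OpenCog code."""

    def __init__(self):
        self.positive = defaultdict(float)  # strength-weighted evidence per node
        self.total = defaultdict(float)     # total evidence count per node

    def observe(self, node, strength, count=1.0):
        """Called when a link about `node` is created or has its count increased."""
        self.positive[node] += strength * count
        self.total[node] += count

    def node_strength(self, node):
        """Count-weighted average strength, or 0.0 if nothing was observed."""
        total = self.total[node]
        return self.positive[node] / total if total else 0.0

store = CountStore()
# Four people enter the Atomspace; two of them smoke.
for strength in (1.0, 0.0, 0.0, 1.0):
    store.observe("smokes", strength)
```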
The next step is to take context into account. Actually, the probability that an arbitrary entity known to the Atomspace smokes is not very interesting. Who cares how many non-smoking laptops there are, for example? What matters is what percentage of *people* smoke.
What seems sensible is that when the FC (or BC) is launched, that instance of inference optionally will have a Context Atom associated with it. By default the Context will be passed along to all sub-inferences generated (though some of these sub-inferences may spawn new Contexts and propagate those, if they involve inference rules explicitly using ContextLink). Then all inferences will take place within that Context by assumption.
Suppose the Context is C. Then if an Atom A comes up in inference, and nothing is known specifically about the TV of A in context C, then the generic context-free TV of A may be assumed and used. But if specific knowledge about the TV of A in context C is there in the Atomspace, then it should be used.
And some inference rules may explicitly use this context C in doing their stuff. (An example would be the macro inference rule to be described in Experiment 3 below.)
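The fallback rule for truth values in a Context C can be sketched as a two-level lookup. The store layout (a dict keyed by `(context, atom)`, with `None` as the context-free key) and the numbers are assumptions for illustration only.

```python
def lookup_tv(tv_store, atom, context=None):
    """Return the TV of `atom` in `context` if known, else its context-free TV."""
    if context is not None and (context, atom) in tv_store:
        return tv_store[(context, atom)]
    return tv_store[(None, atom)]

# Hypothetical (strength, confidence) pairs.
tv_store = {
    (None, "smokes"): (0.01, 0.9),      # over all entities, laptops included
    ("people", "smokes"): (0.2, 0.9),   # specific knowledge within the context
    (None, "friends"): (0.05, 0.9),     # no context-specific knowledge exists
}

smokes_tv = lookup_tv(tv_store, "smokes", context="people")
friends_tv = lookup_tv(tv_store, "friends", context="people")
```

Here “smokes” resolves to its context-specific TV, while “friends” falls back to the generic one because nothing is known about it in the “people” context.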
The desire of the system to find surprising Atoms will bias estimates of node probabilities (and other things). For instance, if very few people smoke, but the system is looking for surprising information, then it will discover information about a lot of people who do smoke. If the node probability of “smokes” is updated in a naive way, then the system will as a consequence get an overestimate of the probability of “smokes”.
To avoid this we need to use a smarter inference approach. E.g. for estimating the node probability of “smokes” in the context of “people”, we could do something like: choose N random people from the Atomspace, and then use inference to estimate the degree to which each of them smokes. This “sampling” approach is also not the most brilliant possible strategy, but it does seem like a generally useful tool to have. This would seem to be an “inference macro” that the chainer should have at its disposal.
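A minimal sketch of such a sampling macro follows. The `infer_strength` callback stands in for whatever inference the chainer would run on each sampled person; here it is replaced by a simple lookup over made-up data, and all names are hypothetical.

```python
import random

def estimate_by_sampling(members, infer_strength, n, rng=random):
    """Estimate a node probability within a context by uniform sampling.

    `members` are the entities in the context (e.g. all people),
    `infer_strength(member)` returns the inferred degree in [0, 1],
    and `n` is the sample size (capped at the population size).
    """
    if not members:
        raise ValueError("cannot sample from an empty context")
    sample = rng.sample(members, min(n, len(members)))
    return sum(infer_strength(m) for m in sample) / len(sample)

# Made-up population: 100 people, 10 of whom smoke.
people = [f"person{i}" for i in range(100)]
smokers = set(people[:10])

def infer_smokes(person):
    """Stand-in for running inference on one person."""
    return 1.0 if person in smokers else 0.0

estimate = estimate_by_sampling(people, infer_smokes, n=20, rng=random.Random(42))
```

Because the people are chosen uniformly at random rather than by STI, the estimate is not skewed toward the surprising cases that attention allocation tends to surface.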