OpenCoggy Probabilistic Programming

EXPRESSING OPENCOG COGNITIVE PROCESSES IN TERMS OF PROBABILISTIC SAMPLING

This page contains preliminary notes toward a reformulation of most or all OpenCog cognitive processes in terms of iterated probabilistic-programming-style sampling queries ... via the introduction and appropriate use of a new link type called SampleLink.

The motivation is to make OpenCog simpler and more elegant, by abstracting the "hard parts" of the OpenCog cognitive algorithms into the SampleLink construct. Various cognitive algorithms must still be used together to evaluate the SampleLinks -- so we retain the whole "cognitive synergy" complexity of OpenCog, in a way. But the SampleLink construct serves as a focal point for that complexity, which may help simplify the design both theoretically and implementation-wise.

The bulk of the page goes through various OpenCog cognitive processes and attempts to formulate simplistic versions of them in terms of Atomese with SampleLinks.

Aside from SampleLink, some of the other Atomese used here doesn't exist yet either; but SampleLink is the only "big deal" -- the other new Atomese suggestions are just minor variations on what already works.

(The ideas here were conceived by Ben Goertzel and the bulk of the page was written by Ben, but Nil Geisweiller has reviewed the contents and fixed/improved various things. Thanks are also due to Alexey Potapov for inspiring Ben to think in this direction, though Alexey is not to be blamed for the specifics of these ideas....)

GENERAL PROCESS (Probabilistic Growth and Mining of Combinations)

The conceptual inspiration for the code sketches presented here is the notion of Probabilistic Growth and Mining of Combinations, or PGMC.

Conceptual Idea of PGMC

At the highest level, the PGMC idea is that the essential core of cognition can be summed up as two processes:

  • synthesis (forward combination) ... existing combinations of entities are grown by adding new elements onto them to make bigger combinations. The choice of which elements to add is biased via history, i.e. via which combinations have proved useful in the past.
  • analysis (backward combination) ... Given an existing entity, an effort is made to form it via combination of 2 or more other entities. Which "backward combinations" of this nature are formed is biased via history, i.e. via which combinations have proved useful in the past.

This is an extremely general rendition of cognitive processes; and in this view, there is a lot of flexibility regarding what kinds of entities in the mind combine with each other.

Also, there is the question of how the biasing works. A key idea is that patterns regarding which combinations have been successful in past history can themselves be expressed as combinations -- and the pattern recognition process itself can be executed via forward growth and backward combination.

The general philosophy underlying PGMC is discussed a bit here

http://www.goertzel.org/dynapsyc/2006/ForwardBackward.htm

and in more depth in the book The Hidden Pattern (Ben Goertzel, 2006).

Probabilistic Formulation of PGMC

To phrase this a little more mathematically ... here is a simple summary of the PGMC process as I see it being used in OpenCog in the near future ...

1) Probability Estimate Guided Growth (PEGG)

Given a quality measure F, and one or more combinatory operators C:

Choose x with probability proportional to F(x)

Choose y (and optionally also choose C from a set of options) so that C(x,y) holds, with probability proportional to estimated F(C(x,y))

Calculate the actual F(C(x,y)), and assign y and C(x,y) importance proportionately

2) Probability Estimation

In many cases, given the nature of C, there will be certain heuristics for choosing C and y that have a decent chance of making F(C(x,y)) large. An example is the greedy heuristic in pattern mining, which chooses new patterns that are direct extensions of existing patterns (instead of just exploring random patterns in the space of possible patterns).

But in some cases PEGG must also be used for Probability Estimation ... in this case the quality measure F for an Atom x measures how well x estimates the quality of some entity x1 according to some measure F1. An example is the use of pattern mining to estimate the surprisingness of a pattern in a set of multi-step inferences; in this case, the goal of the pattern mining is to get a good estimate of the surprisingness of an inference pattern P, because that surprisingness will be the quality measure used for PEGG-driven forward and backward chaining inference.
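
To make the PEGG loop concrete, here is a minimal Python sketch of a single step. All names here (pegg_step, estimated_F, combine) are hypothetical illustrations rather than existing OpenCog APIs, and the quality values are assumed positive:

import random

def pegg_step(entities, F, estimated_F, combine):
    # Choose x with probability proportional to F(x)
    x = random.choices(entities, weights=[F(e) for e in entities])[0]

    # Choose y so that combine(x, y) is defined, with probability
    # proportional to the *estimated* quality of the combination
    others = [e for e in entities if e is not x]
    y = random.choices(others,
                       weights=[estimated_F(combine(x, e)) for e in others])[0]

    # Calculate the actual F(combine(x, y)), which becomes the importance
    # assigned to y and to the new combination
    z = combine(x, y)
    return z, F(z)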

The guiding idea expressed on this wiki page, is to use the above ideas to reformulate OpenCog cognitive processes in concise and relatively standard Atomese, via relying on a new link type called SampleLink, and on the URE (Unified Rule Engine) to chain various SampleLink invocations together.

Basically, SampleLink is used for the "biasing" involved in choosing which forward or backward combinations to make. The URE is then used to chain together multiple steps that are chosen via the biasing operation.


A Note on DefineLink Syntax

In the remainder of this page I will assume a slightly different syntax for DefineLink than what is currently implemented.

DefineLink, as currently implemented, has the following format only:

DefineLink
  <alias>
  <definition>

So, typically, in order to define a function you need to wrap the variables and the body in a LambdaLink

DefineLink
  <alias>
  LambdaLink
    <variables>
    <body>

However, it has previously been suggested to have DefineLink support a different format, which I will use here:

DefineLink
  <alias>
  <variables>
  <body>

as syntactic sugar for the currently functional format.


SAMPLE LINK

The SampleLink (not currently implemented in OpenCog) would need to have the following functionality:

SampleLink
  PredicateNode F
  PredicateNode C

would need to choose a random sample from the set of Atoms X satisfying

EvaluationLink <s>
  PredicateNode C
  X

with s > 0, according to the distribution F (which may be unnormalized).

Note that in the currently proposed approach, SampleLink only returns one individual, and one needs to repeatedly call SampleLink to get a larger sample. One could also implement a SampleLink variant that returned a ListLink of n samples, where n was specified as an additional argument.

That is, SampleLink is driving "fitness-proportionate sampling" where F is the fitness, subject to the constraint C. This is closely related to "optimization queries" in probabilistic programming.
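
To pin down the intended semantics, here is a minimal Python sketch of fitness-proportionate sampling under a constraint. The function names are hypothetical, and a real implementation would of course not enumerate the whole Atomspace:

import random

def sample_link(F, C, atoms):
    # Candidates are the Atoms X for which C evaluates to s > 0
    candidates = [x for x in atoms if C(x) > 0]
    if not candidates:
        return None
    # Choose one candidate with probability proportional to F(x);
    # F may be unnormalized, since random.choices normalizes the weights
    return random.choices(candidates, weights=[F(x) for x in candidates])[0]

def sample_link_n(F, C, atoms, n):
    # Variant returning a ListLink-style batch of n samples
    return [sample_link(F, C, atoms) for _ in range(n)]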

The second argument C is not strictly necessary as one could equivalently say


SampleLink
  AndLink
    PredicateNode F
    PredicateNode C

but the use of a separate argument slot for C allows more convenient and efficient processing for the case where the sampling process is to be restricted to a small set of Atoms, e.g. if C is

DefineLink
  DefinedPredicate "C"
  VariableNode $X
  PresentLink <-- probably not necessary
    MemberLink
      VariableNode $X
      SetLink [123]

so that C is true only of elements in a certain SetLink ...

In that case processing can proceed first by enumerating the elements permitted by the filter C, and then sampling from this set according to F. Of course, this could be done in the format

SampleLink
  AndLink
    PredicateNode F
    PredicateNode C

as well, but the format

SampleLink
  PredicateNode F
  PredicateNode C

explicitly expresses C as a filter... We might also want to allow forms

SampleLink
  PredicateNode F
  ConceptNode C

(restricting sampling to members of C, according to MemberLinks) and

SampleLink
  PredicateNode F
  SetLink [123]

SampleLink
  PredicateNode F
  ListLink [123]

restricting attention to elements of the given set or list...

A third argument may optionally be added to indicate the sampling algorithm used

SampleLink
  PredicateNode F
  PredicateNode C
  SchemaNode "indicate sampling method here"

where the SchemaNode must in practice be either a GroundedSchemaNode or a DefinedSchemaNode

ACTUALLY DOING THE SAMPLING

Everything on this page is based on SampleLink, which raises the question of how to actually do the sampling. But I'm not going to focus on that here; that's the next step. Here my focus is on expressing simplified versions of all the different OpenCog cognitive processes in the language of "Atomese + sampling".

Clearly, basic Monte Carlo sampling will be inadequate except in very simple cases. Since SampleLink is doing "fitness proportionate sampling" with an aim toward resolving optimization queries, I believe PLN can be an appropriate tool for doing the sampling. With a history-based, sampling-driven control mechanism, PLN can be made to find Atoms satisfying a specified predicate, which is what SampleLink requires....

Thus it becomes "turtles all the way down", as we rely on SampleLink to drive PLN inference control, and rely on PLN inference control to drive SampleLink. But AGI is always turtles all the way down...

A little more detail

Specifically, how might we use PLN to solve the problem: "Choose a bunch of x, sequentially (or in batches), with probability of choosing x proportional to F(x)?"

One approach is to have PLN able to take, as a target,

Find x so that "F(x) is true to degree around p"

where p is some specific number. Or

Find x so that "F(x) is true with degree in interval I"

works also...

If we can run PLN with this sort of target, then

  • as we evaluate F(x) for multiple x, we can build a model of the distribution (histogram) of F(x) values for all x in the Atomspace
  • Given this distributional model D_F, when we want to choose a new x, we can choose p from D_F, and then search for x so that "F(x) is true with degree p"
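
As a minimal Python sketch of this two-stage scheme, assuming a simple fixed-bin histogram for the distributional model D_F (the backward-chainer call at the end is a placeholder, not an existing API):

import random

class DistributionModel:
    # Histogram model D_F of the F(x) values observed so far, for F in [0, 1]
    def __init__(self, num_bins=10):
        self.num_bins = num_bins
        self.counts = [0] * num_bins

    def observe(self, p):
        # Record an evaluated F(x) value p in [0, 1]
        b = min(int(p * self.num_bins), self.num_bins - 1)
        self.counts[b] += 1

    def sample_p(self):
        # Choose a target strength p from D_F (add-one smoothing keeps
        # empty bins reachable)
        b = random.choices(range(self.num_bins),
                           weights=[c + 1 for c in self.counts])[0]
        return (b + random.random()) / self.num_bins

# Usage: p = d_f.sample_p(), then ask the (modified) PLN backward chainer
# for some x so that "F(x) is true to degree around p".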

To support queries of the form

Find x so that "F(x) is true to degree around p"

in PLN, we would need to do a little work to propagate the p-values through the backward chainer. In other words, given a target Atom A (associated with a target strength value p) and an inference rule R, what target strength values should be set for the premises of R in backward chaining? This is not hard to answer given the specific quantitative truth value formulas for the rules R, but the backward chainer would have to be modified to incorporate this feature.

Also, to make this work, we would need the distribution model D_F to be stored for ongoing use, meaning that we need F to have a DTV (distributional truth value) rather than a simple truth value.

This is a strange kind of sampling then: What we are sampling are the p-values used as targets for PLN backward chaining.

Pattern Mining

In the above proposal, PLN is playing the leading/driving role in doing the "sampling" behind a SampleLink. However, for PLN to do its stuff, it will need help from other cognitive processes. Pattern Mining can learn new Atoms that can be used inside PLN inference; ECAN attention allocation can help guide PLN; etc.

The whole "cognitive synergy" mess of OpenCog is needed here, to evaluate SampleLinks. So the SampleLink proposal doesn't especially solve the subtle combinatorial-explosion problems at the heart of AGI, it just channels the combinatorial explosion problems underlying the various cognitive processes in OpenCog into one place (evaluating SampleLinks)....

FORMULATION OF VARIOUS OPENCOG ALGORITHMS IN TERMS OF ATOMESE WITH SAMPLE-LINK

Now comes the "meat" of the page.... What follows are a bunch of rough sketches of OpenCog "cognitive processes" formulated as Atomese making central use of SampleLink.

These are not intended as fully fleshed out, maximally functional AI processes. Rather, they are intended as simple examples, to illustrate how each TYPE of cognitive process considered can be expressed in terms of the style of probabilistic programming presented here.

EDA/GA

I will start with a very simplistic version of "something vaguely MOSES-ish", just for illustration...

Given a fitness function F (let's say mapping into [0,1])

Select x with probability proportional to F (or rather try to).

Select z so that z = Cross(x,y) for some y, with probability proportional to estimated F(z) (and also fitness of y? we can leave that out for now...)

Select w so that w = Mutate(z), with probability proportional to estimated F(w)

Atomese sketch:

SignatureLink
  GroundedSchemaNode "Best estimate"
  ArrowLink
    VariableTypeNode "PredicateNode"   \\ exact fitness function
    VariableTypeNode "PredicateNode"   \\ estimated fitness function


ExecutionOutputLink
  DefinedSchemaNode "mutation step"
  ExecutionOutputLink
    DefinedSchemaNode "crossover step"
    ExecutionOutputLink
      DefinedSchemaNode "initial selection step"

DefineLink
  DefinedSchemaNode "initial selection step"
  SampleLink
    PredicateNode "FitnessFunction"
    PredicateNode "Population"


DefineLink
  DefinedSchemaNode "crossover step"
  VariableNode $x
  SampleLink
    ExecutionOutputLink  \\ sampling according to the estimated fitness
      GroundedSchemaNode "Best estimate"
      PredicateNode "FitnessFunction"
    LambdaLink
      VariableNode $z
      MemberLink
        VariableNode $z
        RangeLink
          LambdaLink
            VariableNode $y
            ExecutionOutputLink
              GroundedSchemaNode "Cross"
              ListLink
                VariableNode $y
                VariableNode $x

where RangeLink denotes the codomain (or a subset of it) of the schema given as input.

DefineLink
  DefinedSchemaNode "mutation step"
  VariableNode $z
  SampleLink
    AndLink
      ExecutionOutputLink  \\ sampling according to the estimated fitness
        GroundedSchemaNode "Best estimate"
        PredicateNode "FitnessFunction"
      SatisfactionLink    \\ this clause makes really wild mutants less likely
        VariableNode $a
        SimilarityLink
          VariableNode $z
          VariableNode $a
    LambdaLink
      VariableNode $a
      MemberLink            \\ $a is produced by mutating $z
        VariableNode $a
        RangeLink
            ExecutionOutputLink
              GroundedSchemaNode "mutate"
              VariableNode $z

assuming SimilarityLink($z, $a) here measures the degree to which $a is a mutant of $z
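
Putting the three schema definitions together, a single step of this toy EDA/GA might look like the following minimal Python sketch (fitness, est_fitness, cross and mutate are assumed, hypothetical functions):

import random

def weighted_choice(items, weight):
    # Fitness-proportionate selection, i.e. what SampleLink does
    return random.choices(items, weights=[weight(i) for i in items])[0]

def eda_ga_step(population, fitness, est_fitness, cross, mutate, n_mutants=10):
    # "initial selection step": choose x proportional to exact fitness
    x = weighted_choice(population, fitness)

    # "crossover step": choose z = Cross(x, y) over candidate mates y,
    # proportional to the estimated fitness of the offspring
    offspring = [cross(x, y) for y in population if y is not x]
    z = weighted_choice(offspring, est_fitness)

    # "mutation step": choose among mutants of z, proportional to
    # estimated fitness (small mutations keep wild mutants unlikely)
    mutants = [mutate(z) for _ in range(n_mutants)]
    return weighted_choice(mutants, est_fitness)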

FORWARD CHAINING INFERENCE

Given a quality metric Q, such as "interestingness"

Given a source x, selected w/ probability proportionate to importance

Select a rule R and additional source y, with probability proportional to estimated Q( R(x,y))

Give the result z = R(x,y), and the formal representation of "z = R(x,y)", an importance boost proportional to the actual calculated Q( R(x,y) )

Atomese sketch:

DefineLink
  DefinedSchemaNode "forward step"
  SequentialExecutionLink
    ExecutionOutputLink
      DefinedSchemaNode "stimulate Atom"
    ExecutionOutputLink
      DefinedSchemaNode "importance_guided_combination_selection_and_evaluation"
      SampleLink
        DefinedSchemaNode "get STI"

The above uses SequentialExecutionLink, which doesn't exist in the code currently, and differs from SequentialAndLink in that it doesn't care what its arguments evaluate to.

DefineLink
  DefinedSchemaNode "importance_guided_combination_selection_and_evaluation"
  VariableNode $A
  ExecutionOutputLink
    GroundedSchemaNode "py: evaluate_and_supply_missing_premises_via_STI"
    ListLink
      SampleLink
        SatisfactionLink  \\ the set of PLN rule combinations that include $A as premise
          VariableNode $X
          AndLink
            MemberLink
              VariableNode $X
              ConceptNode "PLNRuleCombinationPatterns"
            EvaluationLink
              DefinedPredicateNode "combination includes premise matching"
              ListLink
                VariableNode $X
                VariableNode $A
      VariableNode $A

and

DefineLink
  DefinedSchemaNode "stimulate Atom"
  ExecutionOutputLink
    DefinedSchemaNode "forward chaining Atom stimulation constant"

BACKWARD CHAINING INFERENCE

Given a quality metric Q, such as "strength * confidence"

Given a target z, selected w/ probability proportionate to importance or via some external query

Select a rule R and sources x, y, with probability proportional to estimated Q( R(x,y))

Give the formal representation of "z = R(x,y)", and the entities x and y, an importance boost proportional to the actual calculated Q( R(x,y))


Atomese sketch:

DefineLink
  DefinedSchemaNode "backward step from target"
  VariableNode $Z
  SequentialAndLink
    ExecutionOutputLink
      DefinedSchemaNode "boost STI"
    SampleLink
      DefinedSchemaNode "get combination utility"
      SatisfactionLink
        VariableNode $X
        AndLink
          MemberLink
            VariableNode $X
            ConceptNode "CombinationPatterns"
          EvaluationLink
            DefinedPredicateNode "combination produces"
            ListLink
              VariableNode $Z
              VariableNode $X

See below for the definition of DefinedSchemaNode "get combination utility".
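
In the same style, a minimal Python sketch of a backward step from a target (boost_sti, combinations_producing and utility are hypothetical stand-ins):

import random

def backward_step_from_target(target, boost_sti, combinations_producing, utility):
    # Boost the STI of the backward-chaining target
    boost_sti(target)

    # From the combination patterns able to produce the target, sample
    # one with probability proportional to its combination utility
    combos = combinations_producing(target)
    if not combos:
        return None
    return random.choices(combos, weights=[utility(c) for c in combos])[0]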

FUZZY MATCHING

Given a cue x, selected w/ probability proportionate to importance or via some external query

Select a match y with probability proportional to estimated Match-Degree(x,y) [i.e. estimated quality of Match(x,y)]

Calculate actual Match-Degree(x,y), and return y as a response if it's high enough

Atomese sketch:

DefineLink
  DefinedSchemaNode "guess fuzzy match"
  VariableNode $x
  SampleLink
    LambdaLink
      VariableNode $y
      EvaluationLink
        GroundedPredicateNode "fuzzy estimated match degree"
        ListLink
          VariableNode $x
          VariableNode $y


To get the actual, accurate degree of the match found, one could do

EvaluationLink
  GroundedPredicateNode "fuzzy match degree"
  ListLink
    VariableNode $x
    ExecutionOutputLink
      DefinedSchemaNode "guess fuzzy match"
      VariableNode $x
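
A minimal Python sketch of this guess-then-verify pattern (est_degree and exact_degree are hypothetical functions; the threshold value is arbitrary):

import random

def fuzzy_match(cue, candidates, est_degree, exact_degree, threshold=0.5):
    # "guess fuzzy match": sample y with probability proportional to
    # the cheap, estimated match degree against the cue
    y = random.choices(candidates,
                       weights=[est_degree(cue, c) for c in candidates])[0]

    # Then compute the actual match degree, returning y only if it is
    # high enough
    return y if exact_degree(cue, y) >= threshold else None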

SURFACE REALIZATION

(simple algorithm similar to current SuReal)

Given a cue x

Select a match y from the set of sentence-structures corresponding to known sentences, with probability proportional to estimated Match-Degree(x,y)

Calculate actual Match-Degree(x,y), and return y as a response if it's high enough

The current (crude) SuReal algorithm is just like fuzzy matching, but restricts focus to Atoms that are SetLinks produced from interpreting sentences w/ the OpenCog NLP pipeline...

Atomese sketch:

DefineLink
  DefinedSchemaNode "guess fuzzy sentence match"
  VariableNode $x
  SampleLink
    LambdaLink
      VariableNode $y
      EvaluationLink
        GroundedPredicateNode "estimated fuzzy match degree"
        ListLink
          VariableNode $x
          VariableNode $y
    PredicateNode "is sentence interpretation"


EvaluationLink
  GroundedPredicateNode "fuzzy sentence match degree"
  ListLink
    VariableNode $x
    ExecutionOutputLink
      DefinedSchemaNode "guess fuzzy sentence match"
      VariableNode $x

AGGLOMERATIVE CLUSTERING

Select x that isn't already a member of some larger cluster (or, if one is building overlapping clusters, that isn't already a member of too many existing clusters)

Select z so that z = Merge(x,y), with probability proportional to estimated Cluster-Quality(z)

Calculate actual Cluster-Quality(z), and reject z if the value is too low

Atomese sketch:

\\ Basic clustering step

DefineLink
  DefinedSchemaNode "clustering step"
  ExecutionOutputLink
    DefinedSchemaNode "agglomerate onto"
    SampleLink
      PredicateNode "getSTI"
      SatisfactionLink
        TypedVariableLink
          VariableNode $x
          TypeChoice
            TypeNode "ConceptNode"
            TypeNode "SetLink"
        VariableNode $x

\\ The above assumes that clusters are concepts or sets ... of course the assumption could be
\\ made more flexible, this is just an illustration

\\ Given x, take a stab at trying to expand x into a bigger cluster

DefineLink
  DefinedSchemaNode "agglomerate onto"
  VariableNode $x
  ExecutionOutputLink
    DefinedSchemaNode "optionally make new cluster"
    ExecutionOutputLink
      DefinedSchemaNode "make cluster pair"
      VariableNode $x


\\ Given x, guesses some y to glom onto x to make a bigger cluster

DefineLink
  DefinedSchemaNode "merge step"
  VariableNode $x
  SampleLink
    ExecutionOutputLink  \\ sampling according to the estimated cluster quality
      GroundedSchemaNode "Estimated cluster quality"
    LambdaLink
      VariableNode $z
      MemberLink
        VariableNode $z
        RangeLink
          LambdaLink
            VariableNode $y
            ExecutionOutputLink
              SchemaNode "Merge"
              ListLink
                VariableNode $y
                VariableNode $x




\\ Makes a pair (proto-cluster, proto cluster quality)

DefineLink
  DefinedSchemaNode "make cluster pair"
  VariableNode $x
  ExecutionOutputLink
    DefinedSchemaNode "pair builder"
    ListLink
      VariableNode $x
      DefinedSchemaNode "merge step"
      DefinedSchemaNode "calculate cluster quality"

DefineLink
  DefinedSchemaNode "pair builder"
  VariableList
    VariableNode $x
    VariableNode $F
    VariableNode $C
  ListLink
    ExecutionOutputLink
      VariableNode $F
      VariableNode $x
    ExecutionOutputLink
      VariableNode $C
      ExecutionOutputLink
        VariableNode $F
        VariableNode $x


DefineLink "cluster creation threshold"
  NumberNode "0.2"

DefineLink "new cluster STI"
  NumberNode "0.6"

\\ Makes a new cluster from the proto-cluster if it's good enough

DefineLink
  DefinedSchemaNode "optionally make new cluster"
  VariableNode $pair
  ExecutionOutputLink
    DefinedSchemaNode "optionally make new Atom"
    ListLink
      VariableNode $pair
      DefinedSchemaNode "cluster creation threshold"
      DefinedSchemaNode "new cluster STI"

\\ Helper function used by the above, also useful in pattern mining as discussed later

DefineLink
  DefinedSchemaNode "optionally make new Atom"
  VariableList
    VariableNode $pair
    VariableNode $threshold
    VariableNode $startingSTI
  IfElseLink
    GreaterThanLink
      ElementAtLink       \\ element 2 of the pair is the quality score
        VariableNode $pair
        NumberNode "2"
      VariableNode $threshold
    SetSTILink
      ElementAtLink       \\ element 1 of the pair is the new Atom itself
        VariableNode $pair
        NumberNode "1"
      VariableNode $startingSTI
    SetSTILink
      ElementAtLink
        VariableNode $pair
        NumberNode "1"
      NumberNode "0"


PATTERN MINING

Given a concision measure Concision(x), which decreases as the size of x grows; i.e. Concision(0)=1, Concision(infinity)=0, and Concision(x) is monotone decreasing in the size of x.

Given a quality measure Q(x), such as frequency or surprisingness of the pattern x

Let WQ(x) = Concision(x) * Q(x)

Choose x with a probability proportional to WQ(x)

Select y so that Extends(x,y) is true, with a probability proportional to the estimate of WQ(y)

Atomese sketch:

(this is very similar to agglomerative clustering, structurally, though of course very different in effect)

\\ Pattern mining step

DefineLink
  DefinedSchemaNode "pattern mining step"
  ExecutionOutputLink
    DefinedSchemaNode "grow pattern"
    SampleLink
      PredicateNode "getSTI"
      PredicateNode "isPattern" \\ returns true for Atoms containing variables bound by Lambda?

\\ Given x, take a stab at trying to expand x into a bigger pattern

DefineLink
  DefinedSchemaNode "grow pattern"
  VariableNode $x
  ExecutionOutputLink
    DefinedSchemaNode "optionally make new pattern"
    ExecutionOutputLink
      DefinedSchemaNode "make pattern pair"
      VariableNode $x


\\ Takes a pattern x and chooses a possible extension of it (y)

DefineLink
  DefinedSchemaNode "expand pattern"
  VariableNode $x
  SampleLink
    LambdaLink
      VariableNode $y
      AndLink
        EvaluationLink
          PredicateNode "Extends"
          ListLink
            VariableNode $y
            VariableNode $x
        EvaluationLink
          PredicateNode "estimated WQ"
          VariableNode $y


\\ Make a pair (pattern, pattern quality)

DefineLink
  DefinedSchemaNode "make pattern pair"
  VariableNode $x
  ExecutionOutputLink
    DefinedSchemaNode "pair builder"
    ListLink
      VariableNode $x
      DefinedSchemaNode "expand pattern"
      DefinedSchemaNode "calculate WQ"


DefineLink "pattern creation threshold"
  NumberNode "0.2"

DefineLink "new pattern STI"
  NumberNode "0.6"

DefineLink
  DefinedSchemaNode "optionally make new pattern"
  VariableNode $pair
  ExecutionOutputLink
    DefinedSchemaNode "optionally make new Atom"
    ListLink
      VariableNode $pair
      DefinedSchemaNode "pattern creation threshold"
      DefinedSchemaNode "new pattern STI"


ATTENTION ALLOCATION

A simple STI-spreading process could be implemented like this:

DefineLink
  DefinedSchemaNode "spread STI to chosen Atom"
  VariableNode $sourceAtom
  SequentialExecutionLink
    ExecutionOutputLink
      DefinedSchemaNode "spread STI among"
      ListLink
        ExecutionOutputLink
          DefinedSchemaNode "get STI"
          VariableNode $sourceAtom
        ExecutionOutputLink
          DefinedSchemaNode "get links to neighbors"
          VariableNode $sourceAtom
    ExecutionOutputLink
      DefinedSchemaNode "decrement STI"
      VariableNode $sourceAtom

where the source of spreading could be chosen w/ probability proportional to STI

DefineLink
  DefinedSchemaNode "choose Atom and spread importance from it"
  ExecutionOutputLink
    DefinedSchemaNode "spread STI to chosen Atom"
    SampleLink
      DefinedSchemaNode "get STI"

SIMPLE (TOY) REINFORCEMENT LEARNING

For this simple toy example, let's consider the case of an OpenCog system with only one goal. Also, let us consider that forward and backward chaining are both going on continually. Perhaps multiple instances of each are going on continually. Balancing between forward and backward chaining activity can be done adaptively but we won't consider that here as it's not critical to the points we want to make.

Let us assume that importance (STI) propagation is happening in the OpenCog Atomspace, concurrently with the backward and forward chaining processes, and that a large fraction of the importance in the Atomspace is being distributed via the single goal. That is, the single goal is being allocated a substantial amount of importance at each point in time, and is spreading it along its links and throughout the Atomspace. In this scenario, importance is a reasonable proxy for "importance for the single goal."

A forward step could then be chosen via

DefineLink
  DefinedSchemaNode "forward step"
  SampleLink
    DefinedSchemaNode "get combination utility"
    SatisfactionLink
      VariableNode $X
      MemberLink
        VariableNode $X
        ConceptNode "CombinationPatterns"
where the utilities may be stored in the Atomspace,

DefineLink
  DefinedSchemaNode "get combination utility"
  VariableList
    VariableNode $C  \\ Atom whose utility is being assessed
    VariableNode $G  \\ The goal
  GetLink
    TypedVariableLink
      VariableNode $N
      TypeNode "NumberNode"
    ExecutionLink
      SchemaNode "utility"
      ListLink
        VariableNode $C
        VariableNode $G
      VariableNode $N

What the "forward step" above does, then, is to choose a random Atom according to the utility distribution, and then execute it.

Similarly -- but slightly more complexly -- a backward step could then be chosen via

DefineLink
  DefinedSchemaNode "backward step"
  SequentialAndLink
    ExecutionOutputLink
      DefinedSchemaNode "boost importance"
    SampleLink
      DefinedSchemaNode "get combination utility"
      SatisfactionLink
        VariableNode $X
        AndLink
          MemberLink
            VariableNode $X
            ConceptNode "CombinationPatterns"
          EvaluationLink
            DefinedPredicateNode "combination produces"
            ListLink
              ExecutionOutputLink
                DefinedSchemaNode "get bc target"
              VariableNode $X

Here, as compared to the forward step, we need the SampleLink to fulfill an additional requirement: the set of Atoms it samples from is restricted to Atoms that are able to produce the desired target of the backward chaining step. The desired target itself is to be chosen from a different distribution, which in the simplest case could just be

DefineLink
  DefinedSchemaNode "get bc target"
  SampleLink
    DefinedSchemaNode "get importance"

That is, in the simplest case, we can choose the target for backward chaining according to the "goal importance" values associated with Atoms. (Note that, syntactically, we can assume that if a SampleLink has only one argument, the second argument is implicitly the entire Atomspace.)

The DefinedSchemaNode "boost importance", invoked in the "backward step" Schema, is assumed to boost the goal importance values of the Atoms involved in the combination that has been produced by the SampleLink and then executed. To make this works sensibly, either there needs to be some importance decay process in the system (such as OpenCog's ECAN module has built in), or the "boost goal importance" function has to explicitly take goal importance from other Atoms to give it to the Atoms in the current combination.


Sugaring the Syntax A Bit

Just for fun, let's see what the above "Attention Allocation" example would look like if we were using a more concise Atomese syntax ... along the general lines suggested at Atomese_Syntax_Musings, but even more so...

The simple STI spreading algorithm given above would be written like this in a fairly well sugared Atomese:

Define "spread STI to chosen Atom"
  $sourceAtom
  SequentialExecution
    @ "spread STI among"
        @ "get STI"
        $sourceAtom
    @ "get links to neighbors"
        $sourceAtom
    @ "decrement STI"
        $sourceAtom

Define "choose Atom and spread importance from it"
  @ "spread STI to chosen Atom"
    Sample
      DefinedSchema "get STI"

(The new thing I've experimentally introduced here is @ as shorthand for ExecutionOutputLink applied to a DefinedSchemaNode -- i.e. @ "f" abbreviates (ExecutionOutputLink (DefinedSchemaNode "f") ...).)

These syntactic sugars are not currently implemented, and will probably be revised or replaced with other sugars before implementation happens. But the above example may give some flavor of what a more elegantly syntaxed "Atomese as probabilistic programming language" might look like...

Videos on Probabilistic Programming

  • Nil gives a tutorial on Probabilistic Programming with examples at the Cogathon (at Robotics Garage, Science Park in Hong Kong): https://www.youtube.com/watch?v=CvUDMvMnFVc
  • Probability & Pattern Mining for Inference Control - Video Lecture with Ben Goertzel: https://www.youtube.com/watch?v=5t_XxFWSoWo