Here's a quick sketch of how to set up some rules for the rule engine:
; Import the opencog and ure modules
(use-modules (opencog) (opencog ure))
; Define rules
(define foo (DefinedSchema "foo"))
(DefineLink foo (BindLink ...))
(define bar (DefinedSchema "bar"))
(DefineLink bar (BindLink ...))
; Define rulebase
(define rbs (Concept "my-rule-base"))
; Add and weight rules to the rulebase
(ure-add-rules rbs (list (cons foo (stv 0.4 0.1)) (cons bar (stv 0.6 0.5))))
; Create a convenience wrapper (fc stands for forward chainer)
(define (my-fc SRC) (cog-fc (Concept "my-rule-base") SRC))
Each of these steps is explained and reviewed in greater detail below.
- All configuration parameters for the URE live in the AtomSpace.
- Multiple configurations for different rule systems (PLN, R2L, Atomese Reduct, etc) may co-exist side by side on the same AtomSpace.
- Rule systems can be decomposed into subsystems (R2L into R2L-en, R2L-in, etc) [not fully implemented].
Given these requirements the following initial format has been suggested, inspired by Amen's r2l-en configuration /opencog/nlp/relex2logic.
A set of rules is named by using a ConceptNode. For example,
ConceptNode "PLN"
ConceptNode "R2L"
names two different rule systems: PLN and R2L. These systems may be subdivided into subsystems. For example:
ConceptNode "PLN-crisp"
ConceptNode "PLN-uncertain"
ConceptNode "PLN-quantifier"
Subsystems should be connected to each other by inheritance relationships [not implemented yet]:
InheritanceLink
    ConceptNode "PLN-quantifier"
    ConceptNode "PLN"
These inheritance relationships are used to pass configuration parameters to the subsystems. For instance, if the control policy is the same across all the PLN subsystems, the parameters pertaining to the control policy will only need to be defined on "PLN".
All systems may inherit from URE [not implemented yet]:
InheritanceLink
    ConceptNode "PLN"
    ConceptNode "URE"
Any parameter set on URE will automatically be inherited by all systems, unless it is overridden within the (sub-)system.
A rule is defined as follows:
DefineLink
    <rule-alias>
    <rule-body>
where <rule-body> is of the form
BindLink
    <variables>
    AndLink
        <clause-1>
        ...
        <clause-n>
    <conclusion>
<variables> is a variable or list thereof (it is strongly advised to type each variable so that the URE doesn't get trapped in infinite recursion, see TypedVariableLink),
<clause-i> is either a pattern to match or a precondition (a virtual clause),
<conclusion> is either a pattern or a formula call (note that in order to be compatible with the backward chainer it must be a formula call, see the related issue). A formula call has the form
ExecutionOutputLink
    <formula>
    <arguments>
where <arguments> is either
- a ListLink, in which case the first argument represents the conclusion pattern and the following ones the premises (note that the premises can be wrapped in a SetLink if the formula is symmetric, which may speed up the backward chainer),
- or something else, in which case it represents the conclusion.
It is also highly recommended to make the formula's premises optional, for instance in Scheme
(define (formula conclusion . premises) ...)
That way, if some premises are missing (which can happen when formula calls are nested and some results turn out to be undefined), no exception is raised inside the pattern matcher. This speeds up formula application, because processing exceptions is extremely slow.
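To make the rule format and the optional-premises advice concrete, here is a minimal sketch of a complete rule. All names (`my-deduction-formula`, `my-deduction-rule`, `my-deduction-rule-name`) are hypothetical, and the truth value computation is a toy placeholder chosen for illustration, not the PLN deduction formula. It assumes the `opencog` Guile modules are installed.

```scheme
(use-modules (opencog) (opencog exec) (opencog ure))

;; Hypothetical formula. The conclusion is mandatory, while the
;; premises are collected in a rest argument, so a call with missing
;; premises does not raise an exception.
(define (my-deduction-formula conclusion . premises)
  (if (= (length premises) 2)
      (let* ((ab (car premises))
             (bc (cadr premises))
             ;; Toy truth value computation, for illustration only
             (s (* (cog-mean ab) (cog-mean bc)))
             (c (min (cog-confidence ab) (cog-confidence bc))))
        (cog-set-tv! conclusion (stv s c)))
      ;; Premises are missing: return the conclusion untouched
      conclusion))

;; Rule body: typed variables, two clauses, and a formula call whose
;; ListLink holds the conclusion first, followed by the premises.
(define my-deduction-rule
  (BindLink
    (VariableList
      (TypedVariable (Variable "$A") (Type "ConceptNode"))
      (TypedVariable (Variable "$B") (Type "ConceptNode"))
      (TypedVariable (Variable "$C") (Type "ConceptNode")))
    (AndLink
      (Inheritance (Variable "$A") (Variable "$B"))
      (Inheritance (Variable "$B") (Variable "$C")))
    (ExecutionOutputLink
      (GroundedSchema "scm: my-deduction-formula")
      (ListLink
        (Inheritance (Variable "$A") (Variable "$C"))
        (Inheritance (Variable "$A") (Variable "$B"))
        (Inheritance (Variable "$B") (Variable "$C"))))))

;; Name the rule so that it can later be added to a rule base
(define my-deduction-rule-name (DefinedSchema "my-deduction-rule"))
(DefineLink my-deduction-rule-name my-deduction-rule)
```

Typing every variable as a ConceptNode keeps the rule from matching its own structure, which is the kind of runaway recursion the advice on typed variables is meant to prevent.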
Adding Rules to Ruleset
Rules are added to a ruleset by declaring them as members. Rules must be named to be added to a ruleset. For example:
MemberLink
    DefinedSchemaNode "my-rule"
    ConceptNode "PLN"
The truth value on the MemberLink may be set to define a preference for the usage of the rule. Semantically it represents the probability that the rule will produce the desired outcome. Uncertainty is taken into account, so be careful how you set the confidence. If no truth value is set, the default TV (stv 1 0) is used; its null confidence means that the rule will be picked according to a uniform distribution.
The reason we want to use the rule's name (a DefinedSchemaNode) as opposed to the rule itself (the BindLink) is to store the rule name in the AtomSpace. This is convenient to create more human-readable inference traces.
There exists a Scheme function `ure-add-rules` to easily define a rule set, for instance
(ure-add-rules my-rb (list rule-1 ... rule-n))
which will produce
MemberLink
    <rule-1>
    <my-rb>
...
MemberLink
    <rule-n>
    <my-rb>
If you wish to associate TVs with the rules you may use pairs of rule and TV instead
(ure-add-rules my-rb (list (cons rule-1 tv-1) ... (cons rule-n tv-n)))
which will produce
MemberLink <tv-1>
    <rule-1>
    <my-rb>
...
MemberLink <tv-n>
    <rule-n>
    <my-rb>
The URE accepts a number of parameters to control its search, such as the number of iterations, whether it will go more depth first or breadth first, etc. Most parameters apply to both the forward and the backward chainer, while some only apply to one or the other.
Parameters are stored directly in the atomspace, with the aim of making it easier for Atomese programs to control the URE. Given that, there are three ways to set URE parameters:
- Directly code the parameters in the atomspace.
- Use helpers from the rule-engine (see below).
- Call the forward or backward chainer with optional arguments (see usage).
Common Parameters to Forward and Backward Chainer
Maximum number of iterations
Each iteration corresponds to an expansion, either forward or backward, of a selected inference tree. For instance, to set the maximum number of iterations of the my-rb rule-base to 20, one may use the rule-engine helper
(ure-set-maximum-iterations my-rb 20)
which will add the following to the atomspace
ExecutionLink
    SchemaNode "URE:maximum-iterations"
    my-rb
    NumberNode 20
The complexity penalty controls how depth-first versus breadth-first the inference tree expansion is. For example
(ure-set-complexity-penalty my-rb 0.1)
will favor simpler inference trees, thus making the search somewhat breadth first. A value of 0 is neutral, while a negative value favors depth first.
Likewise it will add the following to the atomspace
ExecutionLink
    SchemaNode "URE:complexity-penalty"
    my-rb
    NumberNode 0.1
Forward Chainer Parameters
Retry exhausted sources
The forward chainer is actually an iterative chainer: instead of explicitly building an inference tree and running it, it applies each selected rule one after the other, collects the results and reuses them as sources. For that reason, depending on the knowledge and rule bases, the same rule may be applied to the same source and yet generate different results. This parameter allows sources to be retried indefinitely, even if they have already been used by all unifying rules. For example, to enable it
(ure-set-fc-retry-exhausted-sources my-rb #t)
which will add the following to the atomspace
EvaluationLink (stv 1 1)
    PredicateNode "URE:FC:retry-exhausted-sources"
    my-rb
If you enable retrying exhausted sources, it is highly recommended to set a maximum number of iterations, otherwise the URE might never end.
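The recommendation above amounts to always pairing the two helpers. A minimal configuration sketch, assuming a rule base named `my-rb` and an iteration bound of 100 picked arbitrarily for illustration:

```scheme
(use-modules (opencog) (opencog ure))

(define my-rb (Concept "my-rb"))

;; Allow sources to be retried even after all unifying rules
;; have been applied to them...
(ure-set-fc-retry-exhausted-sources my-rb #t)

;; ...and bound the number of iterations so that the forward
;; chainer is guaranteed to stop.
(ure-set-maximum-iterations my-rb 100)
```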
Backward Chainer Parameters
Maximum BIT size
Set a maximum number of inference trees the BIT (Back Inference Tree, which is in fact the population of inference trees) can hold. Negative (the default) means unlimited. For instance to limit it to 1000
(ure-set-bc-maximum-bit-size my-rb 1000)
which will add the following to the atomspace
ExecutionLink
    SchemaNode "URE:BC:maximum-bit-size"
    my-rb
    NumberNode 1000
The main ways to control the URE are
- Parameters, such as complexity penalty to control depth vs breadth, see configuration parameters.
- Inference rule weights, to determine the priority and uncertainty of the rules, see rule weights.
- Fitness functions, see fitness function. As of today only maximize confidence is implemented.
- Attention Allocation [not integrated yet]. Selecting the next inference tree to expand, source, premise or rule could be done via ECAN.
- Control rules. This is the finest mechanism to control the URE. The user can provide cognitive schematics as control rules, to essentially dynamically calculate the weights of the next inference rules to select.
For now only one fitness function is provided and the user has no control over it. It attempts to maximize the confidence of the target for the backward chainer. For the forward chainer no such fitness exists (besides simple parameters like complexity penalty).
However other fitness functions are desired. One in particular would be to maximize strength.
For instance, the rule weights
MemberLink <0.1 0.01>
    <PLN-modus-ponens-name>
    ConceptNode "PLN"
MemberLink <0.2 0.01>
    <PLN-deduction-rule-name>
    ConceptNode "PLN"
would indicate that the deduction rule has a probability of 0.2 of producing the desired outcome, twice that of the modus ponens rule. The confidence of 0.01 allows greater exploration; if it were 1 then, given the choice between these two rules, the deduction rule would always be picked. The URE uses Thompson Sampling to adequately balance exploration and exploitation based on the given weights.
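The effect of confidence on rule selection can be illustrated with a small, self-contained Guile sketch of Thompson sampling. This is not the URE's actual implementation: the conversion of a (strength, confidence) pair into Beta distribution parameters, the constant `lookahead-k`, and every function name below are assumptions made for illustration only. Each rule's success probability is sampled from its Beta distribution, and the rule with the highest sample wins.

```scheme
;; Assumed constant relating confidence to an evidence count
(define lookahead-k 800.0)

;; Marsaglia-Tsang gamma sampler, valid for shape >= 1
(define (sample-gamma shape)
  (let* ((d (- shape (/ 1.0 3.0)))
         (c (/ 1.0 (sqrt (* 9.0 d)))))
    (let loop ()
      (let* ((x (random:normal))
             (w (+ 1.0 (* c x)))
             (v (* w w w))
             (u (random:uniform)))
        (if (and (> v 0.0)
                 (< (log u)
                    (+ (* 0.5 x x) d (- (* d v)) (* d (log v)))))
            (* d v)
            (loop))))))

;; Beta(a, b) sample obtained from two gamma samples
(define (sample-beta a b)
  (let ((x (sample-gamma a))
        (y (sample-gamma b)))
    (/ x (+ x y))))

;; Sample a success probability from a (strength, confidence) weight.
;; Assumption: count = k*c/(1-c), with a uniform Beta(1, 1) prior.
;; Full confidence degenerates to the strength itself.
(define (sample-weight strength confidence)
  (if (>= confidence 1.0)
      strength
      (let ((n (/ (* lookahead-k confidence) (- 1.0 confidence))))
        (sample-beta (+ 1.0 (* strength n))
                     (+ 1.0 (* (- 1.0 strength) n))))))

;; Thompson sampling: draw one sample per rule, keep the best.
;; Each rule is a list (name strength confidence).
(define (select-rule rules)
  (let loop ((rest rules) (best #f) (best-score -1.0))
    (if (null? rest)
        best
        (let ((score (sample-weight (cadr (car rest)) (caddr (car rest)))))
          (if (> score best-score)
              (loop (cdr rest) (car (car rest)) score)
              (loop (cdr rest) best best-score))))))

;; Count how often a given rule is selected over a number of trials
(define (count-picks rules name trials)
  (let loop ((i 0) (hits 0))
    (if (= i trials)
        hits
        (loop (+ i 1)
              (if (eq? (select-rule rules) name) (+ hits 1) hits)))))
```

Calling `(count-picks '((modus-ponens 0.1 0.01) (deduction 0.2 0.01)) 'deduction 1000)` gives a rough idea of how often the deduction rule wins at low confidence, while setting both confidences to 1 makes it win every time, matching the behaviour described above.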
The URE terminates when one of the following criteria is met:
- The number of iterations has reached its maximum, see the maximum number of iterations parameter.
- In the case of the forward chainer, all sources have been exhausted, see the retry exhausted sources parameter to disable it.
- In the case of the backward chainer, all inference trees have been constructed and tried. Typically the space of inference trees is infinite, but in some cases, when the knowledge base and rule base are sufficiently limited, it is finite and the URE will terminate once it has tried them all.
User Defined Chainer [not implemented yet]
Ultimately, we need to allow users to define their own chainer (without having to fiddle with the C++ URE code). One possible way would be to expose to Scheme (or Python, Haskell) the primitives involved in the URE C++ code, such as selecting a rule, unifying a rule with a premise or target, applying such a unified rule, etc. If these primitives were also exposed to Atomese, then OpenCog itself would be able to implement its own chainer, and its own control policy as well.
So far, there are two main functions provided by the URE, the forward and the backward chainer, called, respectively, cog-fc and cog-bc. Before invoking either of them, you should have defined and loaded the rule-base, its rules and configuration parameters inside the atomspace, as described above on this page.
To use the chainers, you need to pass the following arguments:
- The rule-base.
- The source (for cog-fc) or target (for cog-bc). You may wrap multiple sources in a SetLink. The empty SetLink considers the entire atomspace (or focus set) as sources, but will apply all rules at once. Another way to consider the entire atomspace (or focus set) as sources/targets is to use a mere VariableNode, preferably typed but not necessarily.
- Optionally the variable declaration of the source/target. If you wish to use this argument add #:vardecl <my-vardecl> in the argument list.
- Optionally the focus set. If you wish to use this argument add #:focus-set <my-focus-set> in the argument list.
- Optionally, configuration parameters defined in URE_Configuration#Configuration_Parameters.
Given a rule-base
(define my-rb (Concept "my-rb"))
and a target
(define target (Inheritance (Variable "$X") (Variable "$Y")))
with variable declaration
(define vardecl
  (VariableList
    (TypedVariable (Variable "$X") (Type "ConceptNode"))
    (TypedVariable (Variable "$Y") (Type "ConceptNode"))))
one may call the backward chainer with 100 iterations as follows
(cog-bc my-rb target #:vardecl vardecl #:maximum-iterations 100)
which will output a SetLink of inferred grounded inheritance links.
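A forward chainer call follows the same pattern. The sketch below wraps it in a convenience function rather than running it directly, since it only makes sense once rules have been added to the rule base. The wrapper name `my-fc`, the example sources and the iteration bound are hypothetical, and it is assumed that `cog-fc` accepts the same `#:maximum-iterations` keyword as `cog-bc`.

```scheme
(use-modules (opencog) (opencog ure))

(define my-rb (Concept "my-rb"))

;; Hypothetical wrapper: run the forward chainer on a source with a
;; bounded number of iterations.
(define (my-fc source)
  (cog-fc my-rb source #:maximum-iterations 20))

;; Usage, once rules have been added to my-rb:
;;
;; Single source
;;   (my-fc (Inheritance (Concept "A") (Concept "B")))
;;
;; Multiple sources wrapped in a SetLink
;;   (my-fc (SetLink
;;            (Inheritance (Concept "A") (Concept "B"))
;;            (Inheritance (Concept "B") (Concept "C"))))
```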
For more help on cog-fc and cog-bc you may invoke guile's online help
(help cog-fc)
(help cog-bc)