URE Configuration Format
- 1 Quick Start
- 2 Configuration overview
- 3 Naming rulesets
- 4 Configuration Parameters
- 5 Defining Rules
- 6 Adding rules to a ruleset
- 7 Control Policy
- 8 Usage
Quick Start
Here's a quick sketch of how to set up some rules for the rule engine:
; Define some rules
(DefineLink (DefinedSchema "foo") (BindLink ...))
(DefineLink (DefinedSchema "bar") (BindLink ...))

; Give a name to the rulebase
(Inheritance (Concept "my-rule-base") (Concept "URE"))

; Add the rules to the rulebase, with a TV on each MemberLink
(Member (stv 0.4 1) (DefinedSchema "foo") (Concept "my-rule-base"))
(Member (stv 0.6 1) (DefinedSchema "bar") (Concept "my-rule-base"))

; Create a convenience wrapper.
(define (my-forward-chainer SRC)
  (cog-fc (Concept "my-rule-base") SRC))
Each of these steps is explained and reviewed in greater detail below.
Configuration overview
The configuration format is designed around the following constraints:
- All configuration parameters for the URE live in the AtomSpace.
- Multiple configurations for different rule systems (PLN, R2L, Atomese Reduct, etc.) may co-exist side by side in the same AtomSpace.
- Rule systems can be decomposed into subsystems (R2L into R2L-en, R2L-in, etc.).
Given those constraints, the following initial format is suggested. It is inspired by (and almost identical to) Amen's r2l-en config file /opencog/nlp/relex2logic/r2l-en-rulebase.scm.
Naming rulesets
A set of rules is named using a ConceptNode. For example,
ConceptNode "PLN"
ConceptNode "R2L"
names two different rule systems: PLN and R2L. These systems may be subdivided into subsystems. For example:
ConceptNode "PLN-crisp"
ConceptNode "PLN-uncertain"
ConceptNode "PLN-quantifier"
Subsystems should be connected to each other by inheritance relationships:
InheritanceLink
  ConceptNode "PLN-quantifier"
  ConceptNode "PLN"
These inheritance relationships are used to pass configuration parameters to the subsystems. For instance, if the control policy is the same across all the PLN subsystems, the parameters pertaining to the control policy only need to be defined on "PLN".
All systems may inherit from URE, for example:
InheritanceLink
  ConceptNode "PLN"
  ConceptNode "URE"
Any parameter set on URE is automatically inherited by all systems, unless it is overridden within the (sub-)system.
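As a concrete sketch, the hierarchy described above could be loaded from Scheme as follows. It assumes the `(opencog)` Guile module is available. The subsystem names are the examples from this section, and having R2L inherit from URE the same way PLN does is an illustrative assumption.

```scheme
(use-modules (opencog))

;; Top-level rule systems inherit from URE, so any parameter
;; set on "URE" applies to them unless they override it.
(InheritanceLink (ConceptNode "PLN") (ConceptNode "URE"))
(InheritanceLink (ConceptNode "R2L") (ConceptNode "URE"))

;; PLN subsystems inherit from PLN, hence indirectly from URE.
(InheritanceLink (ConceptNode "PLN-crisp") (ConceptNode "PLN"))
(InheritanceLink (ConceptNode "PLN-uncertain") (ConceptNode "PLN"))
(InheritanceLink (ConceptNode "PLN-quantifier") (ConceptNode "PLN"))
```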
Configuration Parameters
The maximum number of iterations can be set to 20 for the URE as follows:
ExecutionLink
  SchemaNode "URE:maximum-iterations"
  ConceptNode "URE"
  NumberNode 20
This can be overridden for PLN and all of the PLN subsystems as follows:
ExecutionLink
  SchemaNode "URE:maximum-iterations"
  ConceptNode "PLN"
  NumberNode 2000
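In Scheme, the two ExecutionLinks above might be written as follows. This is a sketch that assumes the `(opencog)` module. The numbers are passed as strings, which serve as the NumberNode names.

```scheme
(use-modules (opencog))

;; Default for every rule system: at most 20 iterations.
(ExecutionLink
  (SchemaNode "URE:maximum-iterations")
  (ConceptNode "URE")
  (NumberNode "20"))

;; Override for PLN. Subsystems such as PLN-quantifier inherit
;; 2000 as well, unless they set their own value.
(ExecutionLink
  (SchemaNode "URE:maximum-iterations")
  (ConceptNode "PLN")
  (NumberNode "2000"))
```

Assume both links are loaded together with the InheritanceLinks shown earlier on this page. A chainer run on "PLN-quantifier" would then be limited to 2000 iterations. A rule system inheriting directly from "URE" without its own setting would get 20.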
Defining Rules
A rule is defined as follows:
DefineLink
  <rule-alias>
  <rule-body>
where <rule-alias> is a DefinedSchemaNode naming the rule and <rule-body> is
BindLink
  <variables>
  AndLink
    <clause-1>
    ...
    <clause-n>
  <conclusion>
- <variables> is a variable or a list of variables (it is strongly advised to type each variable so that the URE doesn't get trapped in infinite recursion, see TypedVariableLink),
- <clause-i> is either a pattern to match or a precondition (a virtual clause),
- <conclusion> is either a pattern or a formula call (note that in order to be compatible with the backward chainer it must be a formula call, see the related issue).
A formula call has the form
ExecutionOutputLink
  <formula>
  <arguments>
where <arguments> is either
- a ListLink, in which case the first argument represents the conclusion pattern and the following ones the premises (note that one can wrap the premises in a SetLink if the formula is symmetrical; this may speed up the backward chainer),
- or something else, in which case it represents the conclusion.
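To make the template concrete, here is a sketch of a complete rule in Scheme. The rule itself is hypothetical: it is a transitivity rule over InheritanceLinks, with the alias `example-transitivity-rule` and the formula `example-transitivity-formula`. The precondition uses `NotLink`/`IdenticalLink` as a virtual clause. This assumes an AtomSpace version in which IdenticalLink is supported there.

```scheme
(use-modules (opencog))

(DefineLink
  ;; <rule-alias>: a DefinedSchemaNode naming the rule
  (DefinedSchemaNode "example-transitivity-rule")
  (BindLink
    ;; <variables>: each one typed, to avoid infinite recursion
    (VariableList
      (TypedVariableLink (VariableNode "$A") (TypeNode "ConceptNode"))
      (TypedVariableLink (VariableNode "$B") (TypeNode "ConceptNode"))
      (TypedVariableLink (VariableNode "$C") (TypeNode "ConceptNode")))
    ;; <clause-1> ... <clause-n>
    (AndLink
      (InheritanceLink (VariableNode "$A") (VariableNode "$B"))
      (InheritanceLink (VariableNode "$B") (VariableNode "$C"))
      ;; precondition (virtual clause): $A and $C must differ
      (NotLink (IdenticalLink (VariableNode "$A") (VariableNode "$C"))))
    ;; <conclusion>: a formula call, as required by the backward chainer
    (ExecutionOutputLink
      (GroundedSchemaNode "scm: example-transitivity-formula")
      (ListLink
        ;; first argument: the conclusion pattern
        (InheritanceLink (VariableNode "$A") (VariableNode "$C"))
        ;; following arguments: the premises
        (InheritanceLink (VariableNode "$A") (VariableNode "$B"))
        (InheritanceLink (VariableNode "$B") (VariableNode "$C"))))))
```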
It is also highly recommended to make the formula's premises optional, for instance in Scheme:
(define (formula conclusion . premises) ...)
That way, if some premises are missing (which can happen when formula calls are nested and some results turn out to be undefined), the pattern matcher raises no exception. This speeds up formula application, because processing exceptions is very slow.
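The sketch below applies that advice to a hypothetical formula `example-transitivity-formula`. It is called with a conclusion (Inheritance A C) followed by the premises (Inheritance A B) and (Inheritance B C). Its truth-value computation (product of strengths, minimum of confidences) is a toy chosen for illustration, not PLN's deduction formula. The sketch assumes `cog-set-tv!`, `cog-mean`, `cog-confidence` and `stv` from the `(opencog)` module.

```scheme
(use-modules (opencog))

;; Hypothetical formula. The dotted rest argument makes the premises
;; optional: if some are missing, `premises` is just a shorter list and
;; no wrong-number-of-arguments error is raised.
(define (example-transitivity-formula conclusion . premises)
  (if (= (length premises) 2)
      (let ((ab (car premises))    ; (Inheritance A B)
            (bc (cadr premises)))  ; (Inheritance B C)
        ;; Toy TV: product of strengths, minimum of confidences.
        ;; This is NOT the PLN deduction formula.
        (cog-set-tv! conclusion
                     (stv (* (cog-mean ab) (cog-mean bc))
                          (min (cog-confidence ab) (cog-confidence bc)))))
      ;; Premises missing: return the conclusion unchanged.
      conclusion))
```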
Adding rules to a ruleset
Rules are added to a ruleset by declaring them as members. Rules must be named to be added to a ruleset. For example:
MemberLink
  DefinedSchemaNode "my-rule"
  ConceptNode "PLN"
The truth value on the MemberLink may be set to define a preference for the usage of the rule. Semantically, it represents the probability that the rule will produce the desired outcome. Uncertainty is taken into account, so be careful how you set the confidence. If no TV is set, the default TV (stv 1 0) is used. Its null confidence means that the rule will be picked according to a uniform distribution.
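The following is a sketch in Scheme that assumes the `(opencog)` module. The first link relies on the default TV, while the second states an explicit preference. The second rule name and its TV values are hypothetical.

```scheme
(use-modules (opencog))

;; No TV given: the default (stv 1 0) applies. With zero confidence,
;; this rule is picked according to a uniform distribution.
(MemberLink
  (DefinedSchemaNode "my-rule")
  (ConceptNode "PLN"))

;; Explicit TV: estimated 0.7 probability of producing the desired
;; outcome, with a low confidence (0.1) to leave room for exploration.
(MemberLink (stv 0.7 0.1)
  (DefinedSchemaNode "my-other-rule")
  (ConceptNode "PLN"))
```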
The reason we use the rule alias (the DefinedSchemaNode) as opposed to the rule itself (the BindLink) is to store the rule name in the AtomSpace. This makes inference traces more human-readable.
There is a Scheme function `ure-add-rules` to easily define a rule set, for instance
(ure-add-rules my-rbs (list rule-1 ... rule-n))
which will produce
MemberLink <rule-1> <my-rbs>
...
MemberLink <rule-n> <my-rbs>
If you wish to associate TVs with the rules, you may use pairs of rule and TV instead
(ure-add-rules my-rbs (list (list rule-1 tv-1) ... (list rule-n tv-n)))
which will produce
MemberLink <tv-1> <rule-1> <my-rbs>
...
MemberLink <tv-n> <rule-n> <my-rbs>
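For instance, the quick-start rule base (rules "foo" and "bar" with strengths 0.4 and 0.6) could be built with a single call, as sketched below. This assumes that `ure-add-rules` is exported by the `(opencog ure)` module (possibly `(opencog rule-engine)` in older releases). It also assumes the function accepts the rule/TV list form described above.

```scheme
(use-modules (opencog) (opencog ure))

(define my-rbs (ConceptNode "my-rule-base"))

;; Each element pairs a rule alias with the TV to put on its MemberLink,
;; yielding (MemberLink (stv 0.4 1) (DefinedSchemaNode "foo") my-rbs), etc.
(ure-add-rules my-rbs
  (list (list (DefinedSchemaNode "foo") (stv 0.4 1))
        (list (DefinedSchemaNode "bar") (stv 0.6 1))))
```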
Control Policy
Operation of the chainers is controlled by several parameters. These include fitness functions for selecting sources (for the forward chainer), targets (for the backward chainer) and rules, a termination criterion, and the balance between breadth and depth search.
There are at least three fitness functions involved:
- Fitness for choosing the next source (forward chainer)
- Fitness for choosing the next target (backward chainer)
- Fitness for choosing the next rule given a certain source or target
Fitness for choosing the next source
Currently in the code, the fitness for choosing the next source or target is hardwired in URECommons::tv_fitness. This will have to be addressed and this section updated accordingly.
Fitness for choosing the next target
These are not exposed as parameters yet. The default one is based on confidence: the target with the lowest confidence is chosen first, since the default BC goal is to maximize confidence.
Fitness for choosing the next rule
The easiest way to control rule choice is by their associated TVs on the MemberLink. For instance
MemberLink <0.1 0.01>
  <PLN-modus-ponens-name>
  ConceptNode "PLN"
MemberLink <0.2 0.01>
  <PLN-deduction-rule-name>
  ConceptNode "PLN"
would indicate that the deduction rule has a probability of 0.2 of producing the desired outcome, twice that of the modus ponens rule. The confidence of 0.01 allows greater exploration. If it were 1, then given the choice between these 2 rules, the deduction rule would always be picked.
TODO: further control is possible by specifying inference control rules; documentation is still in the making.
At this time, the only stopping criterion is the number of iterations. Parameters are stored in ExecutionLinks, which take the form:
ExecutionLink
  SchemaNode "URE:maximum-iterations"
  <rule-base>
  <max-iterations>
Boolean parameters are represented with EvaluationLinks, taking the form
EvaluationLink <TV>
  PredicateNode "URE:attention-allocation"
  <rule-base>
If the TV strength is greater than 0.5, attention allocation is enabled.
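Here is a minimal Scheme sketch of both kinds of parameters for a hypothetical rule base, assuming the `(opencog)` module. The iteration limit of 100 is an arbitrary illustrative value.

```scheme
(use-modules (opencog))

(define rbs (ConceptNode "my-rule-base"))

;; Termination: stop after at most 100 iterations.
(ExecutionLink
  (SchemaNode "URE:maximum-iterations")
  rbs
  (NumberNode "100"))

;; Boolean parameter: strength 1 (> 0.5) enables attention allocation.
;; A strength of 0 would disable it.
(EvaluationLink (stv 1 1)
  (PredicateNode "URE:attention-allocation")
  rbs)
```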
Breadth vs Depth Search
The backward chainer lets one control the balance between breadth-first and depth-first search via a complexity penalty parameter. The higher the complexity penalty, the closer it is to a breadth-first search. The lower the complexity penalty, the closer it is to a depth-first search.
ExecutionLink
  SchemaNode "URE:BC:complexity-penalty"
  <rule-base>
  <cpx>
<cpx> ranges from 0 to +inf.
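For example, the penalty could be set on a hypothetical rule base as follows. This is a sketch assuming the `(opencog)` module, and 0.5 is an arbitrary value, not a recommended default.

```scheme
(use-modules (opencog))

;; Larger values favor breadth-first search, smaller values
;; (down to 0) favor depth-first search.
(ExecutionLink
  (SchemaNode "URE:BC:complexity-penalty")
  (ConceptNode "my-rule-base")
  (NumberNode "0.5"))
```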
The backward chainer also lets one control the growth of the Back-Inference Tree (BIT). The following parameter only lets the BIT grow up to a certain size. Once this size is reached, portions of the BIT are trimmed based on their likelihood of being expanded: the portions least likely to be expanded are removed first.
ExecutionLink
  SchemaNode "URE:BC:maximum-bit-size"
  <rule-base>
  <size>
If <size> is negative or zero, the BIT can grow without limit.
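Similarly, here is a sketch capping the BIT of a hypothetical rule base, assuming the `(opencog)` module. The limit of 10000 is an illustrative value only.

```scheme
(use-modules (opencog))

;; Once the BIT grows past this size (illustrative value), the parts
;; least likely to be expanded are trimmed first. A negative value
;; or 0 would let the BIT grow without limit.
(ExecutionLink
  (SchemaNode "URE:BC:maximum-bit-size")
  (ConceptNode "my-rule-base")
  (NumberNode "10000"))
```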
User Defined Control Policy
Ultimately, we need to allow the user to define their own control policy (without having to fiddle with the C++ URE code). One possible way to do this would be to make the fitness functions and termination criteria user-redefinable. We would need to provide enough access functions to expose all knowledge relevant to controlling an inference, such as its previous N steps. This might then be just expressive enough to let the user define any control policy he/she wants, such as macros (applying a sequence of rules in a certain order), mutual exclusivity, etc.
Usage
So far, the URE provides two functions, the forward and the backward chainer, called cog-fc and cog-bc respectively. Before invoking either of them, you should have defined and loaded the rule-base, its rules and configuration parameters into the AtomSpace, as described above on this page.
To use the chainers, pass the following arguments:
- The rule-base.
- The source (for cog-fc) or target (for cog-bc). You may wrap multiple sources in a SetLink. The empty SetLink considers the entire AtomSpace (or focus set) as sources, but will apply all rules at once. Another way to consider the entire AtomSpace (or focus set) as sources is to use a mere VariableNode, which may optionally be typed.
- Optionally, the variable declaration of the source/target. If you wish to use this argument, add #:vardecl <my-vardecl> to the argument list.
- Optionally, the focus set. If you wish to use this argument, add #:focus-set <my-focus-set> to the argument list.
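The calls below sketch the argument combinations listed above. They assume that the chainers are exported by the `(opencog ure)` module and that "my-rule-base" has been configured as described on this page. They also assume the focus set is given as a SetLink. The concept names A, B and C are hypothetical.

```scheme
(use-modules (opencog) (opencog ure))

(define rbs (ConceptNode "my-rule-base"))

;; Forward chaining from a single source.
(cog-fc rbs (InheritanceLink (ConceptNode "A") (ConceptNode "B")))

;; Backward chaining on a target containing a variable, with an explicit
;; variable declaration restricting $X to ConceptNodes.
(cog-bc rbs
        (InheritanceLink (VariableNode "$X") (ConceptNode "C"))
        #:vardecl (TypedVariableLink (VariableNode "$X")
                                     (TypeNode "ConceptNode")))

;; Forward chaining using a mere VariableNode as source, restricted to
;; a focus set instead of the whole AtomSpace.
(cog-fc rbs (VariableNode "$S")
        #:focus-set (SetLink
                      (InheritanceLink (ConceptNode "A") (ConceptNode "B"))
                      (InheritanceLink (ConceptNode "B") (ConceptNode "C"))))
```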