- 1 Propositions About OCP
- 1.1 When PLN Inference Beats BOA
- 1.2 Conditions for the Usefulness of Hebbian Inference Control
- 1.3 Clustering-together of Smooth Theorems
- 1.4 When PLN is Useful Within MOSES
- 1.5 When MOSES is Useful Within PLN
- 1.6 On the Smoothness of Some Relevant Theorems
- 1.7 Recursive Use of "MOSES+PLN" to Help With Attention Allocation
- 1.8 The Value of Conceptual Blending
- 1.9 A Justification of Map Formation
- 1.10 Concluding Remarks
Propositions About OCP
On this page we present some speculations regarding the extension of the approach to MOSES-theory presented in OpenCogPrime:PropositionsAboutMOSES to handle OCP in general. This is of course a much more complex and subtle matter, yet we suggest that in large part it may be handled in a similar way. This way of thinking provides a different perspective on the OCP design — one that has not yet substantially impacted the practical aspects of the design, but may well be of use to us as we iteratively refine the design in the future, in the course of testing and teaching OCP AGI systems.
As with the propositions in the previous section, but even more so, the details of these heuristic propositions will likely change a fair bit when/if rigorous statement and proof are attempted. But we are intuitively fairly confident that the basic ideas described here will hold up to rigorous analysis.
Finally, one more caveat: the set of propositions listed here is not presented as complete. By no means! A complete theoretical treatment of OCP, along these lines, would involve a more substantial list of related propositions. The propositions given here are meant to cover a number of the key points, and to serve as illustrations of the sort of AGI theory we believe, and hope, may be possible to do in the near and medium term future.
When PLN Inference Beats BOA
This proposition explains why, in some cases, it will be better to use PLN rather than BOA within MOSES, for modeling the dependencies within populations of program trees.
Slogan 5 Complex cross-modular dependencies which have similar nature for similar fitness functions ==> PLN inference is better than BOA for controlling exemplar extension
Proposition 5 Consider the classification problem of distinguishing fit genotypes from less fit genotypes, within a deme. If
- significantly greater classification accuracy can be obtained by classification rules containing "cross-terms" combining genotype elements that are distant from each other within the genotypes, but
- the search space for finding these classification rules is tricky enough that a greedy learning algorithm like decision-tree learning (which is used within BOA) isn't going to find the good ones, and
- the classification rules tend to be similar, for learning problems for which the fitness functions are similar,
Then, PLN will significantly outperform BOA for exemplar extension within MOSES, due to its ability to take history into account.
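The first two conditions can be illustrated with a toy Python sketch (our own construction; the genotype encoding, the XOR fitness class, and all names are hypothetical, not actual MOSES/BOA internals). A fitness class defined by a cross-term between two distant genotype positions is invisible to any greedy single-position split, but trivial for a rule that contains the cross-term:

```python
import itertools

# Toy genotypes: 10-bit vectors. A genotype is "fit" iff two DISTANT
# positions disagree -- an XOR cross-term between bits 0 and 9.
# (Hypothetical setup for illustration only.)
GENOTYPES = list(itertools.product([0, 1], repeat=10))
FIT = {g: g[0] ^ g[9] for g in GENOTYPES}

def accuracy(rule):
    """Fraction of genotypes the classification rule labels correctly."""
    return sum(rule(g) == FIT[g] for g in GENOTYPES) / len(GENOTYPES)

# The best any greedy, decision-tree-style FIRST SPLIT can do:
# a test on a single genotype position.
best_single_split = max(
    accuracy(lambda g, i=i, v=v: int(g[i] == v))
    for i in range(10) for v in (0, 1)
)

# A rule containing the distant cross-term recovers the structure.
cross_term_rule = accuracy(lambda g: g[0] ^ g[9])

print(best_single_split)  # 0.5 -- no single-position split beats chance
print(cross_term_rule)    # 1.0 -- the cross-term rule is perfect
```

Since every single-position split is at chance level, a greedy learner has no gradient to follow toward the cross-term; a learner that can hypothesize the cross-term directly (as PLN could, especially given history from similar fitness functions) finds it immediately.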
Conditions for the Usefulness of Hebbian Inference Control
Now we turn from MOSES to PLN proper. The approximate probabilistic correctness of PLN is handled via PLN theory itself, as presented in the PLN book. However, the trickiest part of PLN in practice is inference control, which in the OCP design is proposed to be handled via "experiential learning." This proposition pertains to the conditions under which Hebbian-style, inductive PLN inference control can be useful.
Slogan 6 If similar theorems generally have similar proofs, then inductively-controlled PLN can work effectively
Proposition 6 Define:
- L_0 = a simple "base level" theorem-proving framework, with fixed control heuristics
- for n>0, L_n = theorem-proving done using L_(n-1), with inference control done via data mining over a DB of inference trees, utilizing L_(n-1) to find recurring patterns among these inference trees that are potentially useful for controlling inference
Then, if T is a set of theorems such that, within T, theorems that are similar according to "similarity provable in L_(n-1) using effort E" have proofs that are similar according to the same measure, L_n will be effective for proving theorems within T
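The L_0 / L_n layering can be caricatured in a few lines of Python. Everything below (the proof database, the feature sets, the rule names) is invented for illustration, and the actual theorem proving is elided down to the one thing the proposition concerns: the order in which inference rules are tried.

```python
from collections import Counter

# A hypothetical DB of past inference trees, flattened to
# (theorem features, rules that appeared in the proof).
PROOF_DB = [
    ({"arith", "ineq"}, ["induction", "monotone"]),
    ({"arith", "ineq"}, ["induction", "monotone"]),
    ({"sets"},          ["extensionality"]),
]

ALL_RULES = ["extensionality", "monotone", "induction", "case-split"]

def l0_rule_order(theorem_features):
    """L_0: fixed control heuristic -- always try rules in the same order."""
    return list(ALL_RULES)

def l1_rule_order(theorem_features):
    """L_1: mine the proof DB, ranking rules by how often they occur in
    proofs of SIMILAR theorems (similarity here = shared features)."""
    score = Counter()
    for features, rules in PROOF_DB:
        overlap = len(theorem_features & features)
        for rule in rules:
            score[rule] += overlap
    return sorted(ALL_RULES, key=lambda r: -score[r])

# For a theorem similar to past arithmetic theorems, L_1 promotes the
# rules that worked on those theorems to the front of the queue.
print(l1_rule_order({"arith", "ineq"})[:2])
```

The condition in the proposition is exactly what makes `l1_rule_order` pay off: if similar theorems did not have similar proofs, the mined rule ranking would be no better than the fixed L_0 ordering.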
Clustering-together of Smooth Theorems
This proposition is utilized within Proposition 8, below, which again has to do with PLN inference control.
Slogan 7 "Smooth" theorems tend to cluster together in theorem-space
Proposition 7 Define the smoothness of a theorem as the degree to which its proof is similar to the proofs of other theorems similar to it. Then, smoothness varies smoothly in theorem-space. I.e., a smooth theorem tends to be close-by to other smooth theorems.
Above it was argued that PLN is useful within MOSES due to its capability to take account of history (across multiple fitness functions). But this is not the only reason to utilize PLN within MOSES; Propositions 6 and 7 above give us another theoretical reason.
Proposition 8 If similar theorems of the form "Program A is likely to have similar behavior to program B" tend to have similar proofs, and the conditions of Slogan 6 hold for the class of programs in question, then inductively controlled PLN is good (and better than BOA) for exemplar extension. (This is basically Proposition 6 + Proposition 7)
We have explored theoretical reasons why PLN should be useful within MOSES, as a replacement for the BOA step used in the standalone implementation of MOSES. The next few propositions work in the opposite direction, and explore reasons why MOSES should be useful within PLN, for the specific problem of finding elements of a set given a qualitative (intensional) description of that set. (This is not the only use of MOSES for helping PLN, but it is a key use and a fairly simple one to address from a theoretical perspective.)
Proposition 9 In a universe of sets where intensional similarity and extensional similarity are well-correlated, the problem of finding classification rules corresponding to a set S leads to a population of decently fit candidate solutions with high syntactic/semantic correlation [so that demes are good for this problem]
Proposition 10 In a universe of sets satisfying Proposition 9, where sets have properties with complex interdependencies, BOA will be useful for exemplar extension (in the context of using demes to find classification rules corresponding to sets)
Proposition 11 In a universe of sets satisfying Proposition 10, where the interdependencies associated with a set S's property-set vary "smoothly" as S varies, PLN inference is better than BOA for exemplar extension
Proposition 12 In a universe of sets satisfying Proposition 10, where the proofs of theorems of the form "Both the interdependencies of S's properties, and the interdependencies of T's properties, satisfy predicate F" depend smoothly on the theorem statement, inductively controlled PLN will be effective for exemplar extension
On the Smoothness of Some Relevant Theorems
We have talked a bit about smooth theorems, but what sorts of theorems will tend to be smooth? If the OCP design is to work effectively, the "relevant" theorems must be smooth; and the following proposition gives some evidence as to why this may be the case.
Proposition 13 In a universe of sets where intensional similarity and extensional similarity are well-correlated, probabilistic theorems of the form "A is a probabilistic subset of B" and "A is a pattern in B" tend to be smooth....
Note that, for a set S of programs, saying "intensional similarity and extensional similarity are well-correlated" among subsets of S means the same thing as saying that syntactic and semantic similarity are well-correlated among members of S
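For program spaces, this syntactic/semantic correlation can be checked numerically on toy examples. The sketch below is entirely our own construction (not part of any OCP codebase): tiny arithmetic expressions serve as programs, Levenshtein distance on source text as syntactic distance, and mean absolute output difference over sample inputs as semantic distance.

```python
import math
from itertools import combinations

# Toy program space (hypothetical): source text paired with behavior.
PROGRAMS = [
    ("x+1",   lambda x: x + 1),
    ("x+2",   lambda x: x + 2),
    ("x*2",   lambda x: x * 2),
    ("x*2+1", lambda x: x * 2 + 1),
    ("x*x",   lambda x: x * x),
]
SAMPLES = [0, 1, 2, 3, 4]

def syntactic_dist(a, b):
    """Levenshtein edit distance between the two source strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def semantic_dist(fa, fb):
    """Mean absolute output difference over the sample inputs."""
    return sum(abs(fa(x) - fb(x)) for x in SAMPLES) / len(SAMPLES)

points = [(syntactic_dist(sa, sb), semantic_dist(fa, fb))
          for (sa, fa), (sb, fb) in combinations(PROGRAMS, 2)]

def pearson(pts):
    """Pearson correlation between the two coordinates of the points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    cov = sum((x - mx) * (y - my) for x, y in pts)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pts))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pts))
    return cov / (sx * sy)

print(pearson(points) > 0)  # syntactic and semantic distance co-vary
```

On this toy set the correlation is positive but modest, which matches the caveat in the summary below: the correlation is strong for "very similar" programs and much weaker in general.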
Proposition 14 The set of motor control programs, for a set of standard actuators like wheels, arms and legs, displays a reasonable level of correlation between syntactic and semantic similarity
Proposition 15 The set of sentences that are legal in English displays a high level of correlation between syntactic and semantic similarity.
(The above is what, in Chaotic Logic, I called the "principle of continuous compositionality", extending Frege's Principle of Compositionality. It implies that language is learnable via OCP-type methods.... Unlike the other Propositions formulated here, it is more likely to be addressable via statistical than formal mathematical means; but insofar as English syntax can be formulated formally, it may be considered a (roughly stated) mathematical proposition.)
Recursive Use of "MOSES+PLN" to Help With Attention Allocation
Proposition 16 The set of propositions of the form "When thinking about A is useful, thinking about B is often also useful" tends to be smooth — if "thinking" consists of MOSES plus inductively controlled PLN, and the universe of sets is such that this cognitive approach is generally a good one
This (Prop. 16) implies that adaptive attention allocation can be useful for a MOSES+PLN system, if the attention allocation itself utilizes MOSES+PLN
The Value of Conceptual Blending
Proposition 17 In a universe of sets where intensional similarity and extensional similarity are well-correlated, if two sets A and B are often useful in proving theorems of the form "C is a (probabilistic) subset of D", then "blends" of A and B will often be useful for proving such theorems as well.
This is a justification of conceptual blending for concept formation
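A minimal sketch of what a blend operator might look like, under a deliberately crude formalization that is ours rather than anything from the OCP design: concepts as feature-to-salience maps, with the blend keeping the shared features (weights averaged) plus each input's most salient distinctive feature.

```python
# Concepts as feature -> salience maps (a crude stand-in for intensional
# descriptions; the formalization and all names here are hypothetical).
def blend(a, b, keep=1):
    """Blend two concepts: shared features (averaged weight) plus each
    concept's `keep` most salient distinctive features."""
    shared = {f: (a[f] + b[f]) / 2 for f in a.keys() & b.keys()}
    def salient(src, other):
        distinct = {f: w for f, w in src.items() if f not in other}
        return dict(sorted(distinct.items(), key=lambda kv: -kv[1])[:keep])
    return {**shared, **salient(a, b), **salient(b, a)}

BIRD  = {"flies": 0.8, "animate": 0.9, "feathered": 0.7}
PLANE = {"flies": 0.9, "metal": 0.8, "winged": 0.6}

# The blend keeps "flies" (shared) and the most salient distinctive
# feature of each input: "animate" from BIRD, "metal" from PLANE.
print(sorted(blend(BIRD, PLANE)))  # ['animate', 'flies', 'metal']
```

In the terms of Proposition 17: if BIRD and PLANE have both proved useful in subset-theorems, a blend concentrated around their shared intension ("flies") is a plausible candidate for further such theorems.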
A Justification of Map Formation
Proposition 18 If a collection of terms A is often used together in MOSES+PLN, then similar collections B will often be useful as well, for this same process ... assuming the universe of sets is such that intensional and extensional similarity are correlated, and MOSES+PLN works well
This is a partial justification of map formation, in that finding collections B similar to A is achieved by encapsulating A into a node A' and then doing reasoning on A'
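The mechanism can be sketched as follows; the usage traces, Atom names, and thresholds are all hypothetical, chosen only to show the pattern "frequently co-used collection of Atoms gets encapsulated into a new node".

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage traces: which Atoms were used together in a single
# episode of MOSES+PLN activity (names invented for illustration).
USAGE_TRACES = [
    {"grab", "lift", "cup"},
    {"grab", "lift", "cup"},
    {"grab", "lift", "ball"},
    {"walk", "door"},
]

def frequent_maps(traces, size=2, min_count=3):
    """Mine collections of Atoms that co-occur in >= min_count traces."""
    counts = Counter()
    for trace in traces:
        for combo in combinations(sorted(trace), size):
            counts[combo] += 1
    return {combo for combo, c in counts.items() if c >= min_count}

# Encapsulate each frequent collection A into a new node A', on which
# reasoning (e.g. similarity search for collections B) can then operate.
maps = frequent_maps(USAGE_TRACES)
atomspace = {"Map:" + "+".join(m): {"members": m} for m in maps}
print(sorted(atomspace))  # ['Map:grab+lift']
```

Once "grab"+"lift" exists as a single node, finding the similar collections B of Proposition 18 reduces to ordinary similarity reasoning on that node, which is the point of the encapsulation step.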
The above set of propositions is certainly not complete. For instance, one might like to throw in conjunctive pattern mining as a rapid approximation to MOSES; and some specific justification of artificial economics as a path to effectively utilizing MOSES/PLN for attention allocation; etc.
But, overall, it seems fair to say that the above set of propositions smells like a possibly viable path to a theoretical justification of the OCP design.
To summarize the above ideas in a nutshell, we may say that the effectiveness of the OCP design appears intuitively to follow from the assumptions that:
- within the space of relevant learning problems, problems defined by similar predicates tend to have somewhat similar solutions
- according to OCP's knowledge representation, procedures and predicates with very similar behaviors often have very similar internal structures, and vice versa (and this holds to a drastically lesser degree if the "very" is removed)
- for relevant theorems ("theorems" meaning Atoms whose truth values need to be evaluated, or whose variables or SatisfyingSets need to be filled in, via PLN): similar theorems tend to have similar proofs, and the degree to which this holds varies smoothly in proof-space
- the world can be well modeled using sets for which intensional and extensional similarity are well correlated: meaning that the mind can come up with a system of "extensional categories" useful for describing the world, and displaying characteristic patterns that are not too complex to be recognized by the mind's cognitive methods
To really make use of this sort of theory, of course, two things would need to be done. For one thing, the propositions would have to be proved (which will probably involve some serious adjustments to the proposition statements). For another, some detailed argumentation would have to be given regarding why the "relevant problems" confronting an embodied AGI system actually fulfill the assumptions. This might turn out to be the hard part, because the class of "relevant problems" is not so precisely defined. For very specific problems, however (to name some examples quasi-randomly: natural language learning, object recognition, learning to navigate in a room with obstacles, or theorem-proving within a certain defined scope), it may be possible to make detailed arguments as to why the assumptions should be fulfilled.
Recall that what makes OCP different from huge-resources AI designs like AIXI (including AIXI-tl) and the Godel Machine is that it involves a number of specialized components, each with its own domain and biases (and some with truly general potential as well), hooked together in an integrative architecture designed to foster cross-component interaction and overall synergy and emergence. The strength and weakness of this kind of architecture is that it is specialized to a particular class of environments. AIXI-tl and the Godel Machine can handle any type of environment roughly equally well (which is: very, very slowly), whereas OCP has the potential to be much faster when it is in an environment that poses it learning problems matching its particular specializations. What we have done in the above series of propositions is to partially formalize the properties an environment must have in order to be "OCP-friendly." If the propositions are essentially correct, and if interesting real-world environments largely satisfy their assumptions, then OCP is a viable AGI design.