MindOntology:EstimationOfDistributionAlgorithms


Estimation of Distribution Algorithms

The somewhat awkward term Estimation of Distribution Algorithm (EDA) refers to an approach to optimization and program learning that combines aspects of evolutionary learning algorithms, such as genetic algorithms, with probabilistic modeling. It thus retains the breadth and flexibility of typical evolutionary algorithms while achieving greater learning efficiency and greater ease of integration.

While EDAs are quite different from Genetic Algorithms (GA) and Genetic Programming (GP), conceptually they derive largely from this tradition, and the reader will have an easier time understanding them if he is familiar with GA and GP, as obtained e.g. from reading Goldberg's (1989) and Koza's (1992) classic texts.

The essential idea of EDAs is to replace the crossover and mutation involved in evolution with a probabilistic inference process that explicitly analyzes an evolving population, observing which patterns in the genetic material of individuals tend to correlate with fitness, and then creating new entities that display these fitness-encouraging patterns.
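To make this concrete, here is a minimal generic EDA loop sketched in Python. The bitstring genome, OneMax fitness function, population sizes, and independent-per-gene model are all illustrative assumptions for the sketch, not part of any particular OpenCog component:

  import random

  GENOME_LEN = 20     # bits per genome (illustrative)
  POP_SIZE = 100      # population size (illustrative)
  SELECT = 30         # number of fit individuals used to build the model
  GENERATIONS = 40

  def fitness(genome):
      # OneMax: count the 1-bits; stands in for an arbitrary fitness function.
      return sum(genome)

  def random_genome():
      return [random.randint(0, 1) for _ in range(GENOME_LEN)]

  population = [random_genome() for _ in range(POP_SIZE)]
  for generation in range(GENERATIONS):
      # 1. Analyze the population: keep the fittest individuals.
      population.sort(key=fitness, reverse=True)
      selected = population[:SELECT]
      # 2. Build a probabilistic model of the fit individuals; here, just
      #    the per-gene frequency of 1s (the simplest possible model).
      probs = [sum(g[i] for g in selected) / SELECT for i in range(GENOME_LEN)]
      # 3. Sample a new population from the model, in place of crossover
      #    and mutation.
      population = [[1 if random.random() < p else 0 for p in probs]
                    for _ in range(POP_SIZE)]

  print("best fitness:", max(fitness(g) for g in population))

The model-building step (step 2) is where EDAs differ from one another; everything from simple gene frequencies to full Bayesian networks can be slotted in there.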

GA and GP have a great number of strengths compared to other optimization approaches. In the case of GP in particular, there are really no competing methodologies possessing anywhere near a comparable breadth of applicability. Yet anyone who has used GA/GP in practical applications has noticed the main shortcoming of the methodology: execution speed. Solving serious problems can require very large population sizes, and evolving large populations for an adequate number of generations can take a very long time. Because of this, it is desirable to look for ways to speed up GA/GP, or else for alternate methods that preserve the strengths of evolutionary algorithms but have faster execution speeds.

A number of techniques have been tried for speeding up GA/GP, and some have proven effective, but none dramatically so. One of the better techniques is fitness inheritance. In this strategy, for a certain percentage of the population, one does not actually evaluate fitness; instead one infers the likely fitness of the population element. A crude way of doing this is to assign an element the average fitness of its parents. A more sophisticated way is to do some probabilistic inference on the fitness of the existing population. Simple versions of this technique give modest improvements; on real-world problems the speedup is often around 20%. The speedup provided by advanced versions has not been well studied yet.
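As an illustration, here is a minimal sketch of the crude version of fitness inheritance described above. The function names, the cache keyed on genome tuples, and the 50% default inheritance rate are all hypothetical choices made for the example:

  import random

  def evaluate_with_inheritance(child, parents, true_fitness,
                                fitness_cache, inherit_rate=0.5):
      # fitness_cache maps a genome (as a tuple) to its stored fitness score,
      # which may itself have been inherited rather than truly evaluated.
      if random.random() < inherit_rate:
          # Crude inheritance: assign the average of the parents' scores,
          # skipping the (possibly expensive) real evaluation.
          return sum(fitness_cache[tuple(p)] for p in parents) / len(parents)
      # Otherwise pay for a genuine fitness evaluation.
      return true_fitness(child)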

The search for alternate optimization methods that preserve the strengths of evolutionary algorithms but provide greater efficiency has proved somewhat more fruitful. This is where Estimation of Distribution Algorithms come in. An EDA uses the same basic framework as GA/GP: it iteratively updates a population of candidate answers, gradually converging on a population containing a lot of very good answers. But it dispenses with crossover and mutation, replacing them with probabilistic modeling, which may then be extended in various ways to incorporate various sorts of data not considered in the standard evolutionary learning approach.

In terms of natural evolution, these probabilistic modeling approaches can be thought of as an eccentric sort of "fully eugenic evolutionary process." In this fully eugenic process, organisms do not mate with each other and produce offspring. Rather, new organisms are created by an "evolution master," who has the following abilities (rendered as an interface sketch after the list):

  1. The ability to observe the DNA of all the organisms in the population.
  2. The ability to create a new organism from a DNA sequence, and place this organism in the population, in place of one of the existing population members.
  3. The ability to observe the fitness of all the organisms in the population (according to the fitness criterion that interests him).
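These three abilities might be rendered abstractly as follows. This is purely an illustrative Python interface for the metaphor above, not an actual OpenCog API:

  from abc import ABC, abstractmethod

  class EvolutionMaster(ABC):
      @abstractmethod
      def observe_dna(self, population):
          """Return the DNA sequence of every organism in the population."""

      @abstractmethod
      def create_organism(self, dna, population, replace_index):
          """Create an organism from dna, replacing the population member
          at replace_index."""

      @abstractmethod
      def observe_fitness(self, population):
          """Return each organism's fitness under the master's criterion."""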

Note that the evolution master can proceed with no a priori theoretical understanding of the way the DNA gives rise to organisms, and with no knowledge useful for building such a theoretical understanding, except for his observations of the organisms' DNA sequences and fitnesses. On the other hand, if he does have a priori knowledge, there's nothing stopping him from using it to improve his judgments.

What will an evolution master fulfilling these criteria do? If he is highly rational, he will tabulate statistics regarding which genes, and which combinations of genes, tend to lead to high fitness. Then he will create new organisms embodying the combinations of genes that tend to lead to high fitness. He will do this according to probability theory: for instance, he won't create a whole batch of new organisms embodying exactly the maximally fitness-inducing genes found. Rather, he'll use the fitness-inducing genes and combinations found, with probabilities roughly proportional to their correlations with fitness. If he has a priori knowledge, this can be incorporated as prior knowledge in his probabilistic estimates.
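One simple way to cash out this idea in code is to weight each gene's statistics by the fitness of the organisms carrying it, so that fitness-correlated genes are sampled with proportionally higher probability. The sketch below is an illustrative assumption about how such weighting might look (it could replace step 2 of the loop sketched earlier), not a specified OpenCog algorithm:

  def weighted_gene_probs(population, fitness):
      # P(gene i = 1) is made proportional to the total fitness of the
      # organisms carrying a 1 at position i, so genes that co-occur with
      # high fitness are sampled more often.
      total = sum(fitness(g) for g in population)
      probs = []
      for i in range(len(population[0])):
          mass = sum(fitness(g) for g in population if g[i] == 1)
          probs.append(mass / total if total > 0 else 0.5)
      return probs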

A population ruled over by such an evolution master, it seems clear, will evolve more rapidly than one that operates by conventional evolution. The crossover operator performs a crude sort of probabilistic modeling over the evolving population. It is a heuristic relying on the assumption that a fit organism is likely to have fit genomic subsequences, so that the combination of the genomes of two fit organisms is likely to produce a genome leading to a fit organism. But this heuristic is just a crude approximation to what the evolution master is doing, which is actually gathering accurate statistics on the fitness of sets of genes.

Now, real-world evolution doesn't have the option to use an evolution master. A natural evolving population is a "fully distributed computing system," meaning that there is no centralized controller running the evolution process; rather, the evolution process must emerge from the local activities of the organisms in the population. Crossover is a wonderfully clever algorithm for distributed optimization. But in an AI system implemented on a cluster of von Neumann machines, such a widely distributed algorithm is not necessary. A centralized algorithm built around an evolution master incurs less rather than more overhead, and also provides far greater mathematical efficiency.

Implementing a fully rational evolution master, using probability theory to make maximally accurate analyses of gene sets, is not computationally viable. The best we could do in an OCP context would be to use full-on OCP cognition to make inferences about the fitness of genes and gene combinations. In some cases this might actually be warranted, if one were optimizing a function for which fitness evaluation was extremely expensive. In most cases, however, an approximation is needed: a probabilistic reasoning algorithm that is rapid to execute and captures a reasonable amount of statistical information about the fitness of genes and gene combinations in the population. Now, this is exactly what crossover and mutation are: an approximate probabilistic reasoning algorithm, rapid to execute at each individual generation (though taking a large number of generations to converge to a good answer). But it is possible to introduce approximate probabilistic inference algorithms that are more effective than the standard genetic operators, without introducing unacceptable overhead.

For example, this is what the Bayesian Optimization Algorithm (BOA; Pelikan, 2002), the first thoroughly tested and demonstrated EDA, does in a relatively simplistic way. In BOA, Bayes net or decision tree models are used to create probabilistic models of each gene in the genome. For instance, an ordering of the genes may be assumed, and then a decision tree may be used to predict the value of an element based on the values of the elements occurring earlier in the ordering. OCP-EDA moves away from BOA's very simple probabilistic inference framework and introduces subtler but more expensive probabilistic inference methods, optionally but not exclusively going to the extreme of using full OCP integrative intelligence for population modeling.
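As a rough illustration of the dependency idea behind BOA, the sketch below assumes a fixed gene ordering and conditions each gene only on its immediate predecessor (a first-order chain). BOA proper learns Bayes net or decision tree structure over all earlier genes; this simplification, and every name in it, is an illustrative assumption rather than BOA itself:

  import random
  from collections import defaultdict

  def build_chain_model(selected):
      # model[i][prev] = P(gene i == 1 | gene i-1 == prev), estimated from
      # the selected (fit) individuals; 0.5 when a context was never seen.
      n = len(selected[0])
      model = [defaultdict(lambda: 0.5) for _ in range(n)]
      p_first = sum(g[0] for g in selected) / len(selected)
      for i in range(1, n):
          for prev in (0, 1):
              group = [g for g in selected if g[i - 1] == prev]
              if group:
                  model[i][prev] = sum(g[i] for g in group) / len(group)
      return p_first, model

  def sample_chain(p_first, model):
      # Sample genes in order, each conditioned on the previous gene's value.
      genome = [1 if random.random() < p_first else 0]
      for i in range(1, len(model)):
          p = model[i][genome[i - 1]]
          genome.append(1 if random.random() < p else 0)
      return genome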