MOSES man page
From OpenCog
Man page for MOSES (raw dump).
MOSES(1) OpenCog Learning MOSES(1) NAME moses - meta-optimizing semantic evolutionary search solver SYNOPSIS moses -h|--help moses --version moses [-H problem_type] [-i input_filename] [-u target_feature] [--enable-fs=1] [-Y ignore_feature] [-m max_evals] [--max-time seconds] [-f logfile] [-l loglevel] [-r random_seed] [--mpi=1] [-n ignore_operator] [-N use_only_operator] [-o output_file] [--boost] [-A max_score] [-a algo] [-B knob_effort] [-b nsamples] [-C1] [-c result_count] [-D max_dist] [-d1] [-E reduct_effort] [-e exemplar] [-F] [-g max_gens] [-I0] [-j jobs] [-k prob‐ lem_size] [-L1] [-P pop_size_ratio] [-p probability] [--python=1] [--python3=1] [-Q hardness] [-q min] [-R threshold] [-S0] [-s] cache_size] [--score-weight column] [-T1] [-t1] [-V1] [-v temperature] [-W1] [-w max] [--well-enough=1] [-x1] [-y combo_program] [-Z1] [-z complexity_ratio] DESCRIPTION moses is the command-line wrapper to the MOSES program learning library. It may be used to solve learning tasks specified with file inputs, or to run various demonstration problems. Given an input table of values, it will perform a regression, generating, as output, a combo program that best fits the data. moses is multi-threaded, and can be distributed across multiple machines to improve run-time performance. EXAMPLE As an example, the input file of comma-separated values: a,b,out 0,0,0 0,1,0 1,0,0 1,1,1 when run with the flags -Hit -u out -W1 will generate the combo program and($a $b). This program indicates that the third column can be modelled as the boolean-and of the first two. The -u option specifies that it is the third column that should be modelled, and the -W1 flag indicates that col‐ umn labels, rather than column numbers, should be printed on output. OVERVIEW moses implements program learning by using a meta-optimization algorithm. That is, it uses two optimization algorithms, one wrapped inside the other, to find solutions. The outer optimization algorithm selects candidate pro‐ grams (called exemplars), and then creates similar programs by taking an exemplar and inserting variables (called knobs) in selected locations. The inner optimization algorithm then searches for the best possible 'knob set‐ tings', returning a set of programs (a deme) that provide the best fit to the data. The outer optimization algorithm then selects new candidates, and performs a simplification step, reducing these to a normal form, to create new exemplars. The process is then repeated until a perfect score is reached, a maximum number of cycles is performed, or forward progress is no longer being made. Program reduction is performed at two distinct stages: after inserting new knobs, and when candidates are selected. Reduction takes a program, and transforms it into a new form, the so-called elegant normal form, which, roughly speaking, expresses the program in the smallest, most compact form possible. Reduction is an important step of the algorithm, as it prevents the evolved programs from turning into 'spaghetti code', avoiding needless complexity that can cost during program evaluation. The operation of moses can be modified by using a large variety of options to control different parts of the above meta-algorithm. The inner opti‐ mization step can be done using one of several different algorithms, including hillclimbing, simulated annealing, or univariate Bayesian opti‐ mization. The amount of effort spent in the inner loop can be limited, as compared to how frequently new demes and exemplars are chosen. The total number of demes or exemplars maintained in the population may be con‐ trolled. The effort expended on program normalization at each of the two steps can be controlled. In addition to controlling the operation of the algorithm, many options are used to describe the format of the input data, and to control the printing of the output. moses also has a 'demo' mode: it has a number of built-in scoring functions for various 'well-known' optimization problems, such as parity, disjunc‐ tion, and multiplex. Each of these typically presents a different set of challenges to the optimization algorithm, and are used to illustrate 'hard' optimization problems. These demo problems are useful both for learning and understanding the use of moses, and also for developers and maintainers of moses itself, when testing new features and enhancements. COMBO PROGRAMS This section provides a brief overview of the combo programming language. Combo is not meant so much for general use by humans for practical program‐ ming, as it is to fit the needs of program learning. As such, it is fairly minimal, while still being expressive enough so that all common programming constructs are simply represented in combo. It is not difficult to convert combo programs into other languages, or to evaluate them directly. In combo, all programs are represented as program trees. That is, each internal node in the tree is an operation and leaves are either constants or variables. The set of all of the variables in a program are taken as the arguments to the program. The program is evaluated by supplying explicit values for the variables, and then applying operations until the root node is reached. By convention, variables are denoted by a dollar-sign followed by a number, although they may also be named. Variables are not explicitly typed; they are only implicitly typed during program evaluation. Thus, for example, or ($1 $2) is a combo program that represents the boolean-or of two inputs, $1 and $2. Built-in constants include true and false; numbers may be written with or without a decimal point. Logical operators include and, or and not. Arithmetic operators include +, - * and / for addition, subtraction, multi‐ plication and division. Additional arithmetic operators include sin, log, exp and rand. The rand operator returns a value in the range from 0.0 to 1.0. The predicate operator 0< (greater than zero) takes a single continu‐ ous-valued expression, and returns a boolean. The operator impulse does the opposite: it takes a single boolean-valued expression, and returns 0.0 or 1.0, depending on whether the argument evaluates to false or true. The cond operator takes an alternating sequence of predicates and values, terminated by an 'else' value. It evaluates as a chain of if-then-else statements, so that it returns the value following the first predicate to evaluate to true. Thus, as an example, the expression and( 0<( +($x 3)) 0<($y)) evaluates to true if $x is greater than -3 and $y is greater than zero. The above operators are built-in; combo may also be extended with custom- defined operators, although C++ programming and recompilation is required for this. OPTIONS Options fall into four classes: those for specifying inputs, controlling operation, printing output, and running moses in demo mode. General options -e exemplar Specify an initial exemplar to use. Each exemplar is written in combo. May be specified multiple times. -h, --help Print command options, and exit. -j num, --jobs=num Allocate num threads for deme optimization. Using multiple threads will speed the search on a multi-core machine. This flag may also be used to specify remote execution; see the section Distributed processing below. --version Print program version, and exit. Problem-type options moses is able to handle a variety of different 'problem types', such as regression, categorization and clustering, as well as a number of demo problems, such as parity and factorization. The -H option is used to spec‐ ify the problem type; the demo problem types are listed in a later section. This section lists the various "table" problems, where the goal is to train moses on an input table of values. -H type, --problem-type=type The type of problem may be one of: it Regression on an input table. That is, the input table consists of a set of columns, all but one considered 'inputs', and one is considered an output. The goal of regression is to learn a combo program that most accurately predicts the output. For boolean-valued and enumerated outputs, the scoring function simply counts the number of incorrect answers, and tries to minimize this score. That is, this scorer attempts to maximize accuracy (defined below). For contin-valued outputs, the mean-square varia‐ tion is minimized. pre Regression on an input table, maximizing precision, instead of accuracy (that is, minimizing the number of false posi‐ tives, at the risk of sometimes failing to identify true positives). Maximization is done while holding activation constant. This scorer is ideal for learning "domain experts", which provide a result only when they are certain that the result is correct. So, for a binary classifier, the output is meant to be interpreted as "yes, certain" or "no, don't know". An ensemble of such combo expressions is commmonly called a "mixture of experts". prerec Regression on an input table, maximizing precision, while attempting to maintain the recall (sensitivity) at or above a given level. recall Regression on an input table, maximizing recall (sensitiv‐ ity) while attempting to maintain the precision at or above a given level. This scorer is most commonly used when it is important to guarantee a certain level of precision, even if it means rejecting most events. In medicine and physics/radio applications, recall is exactly the same thing as sensitivity: this option searches for the most sensitive test while holding to a minimum level of preci‐ sion. bep Regression on an input table, maximizing the arithmetic mean of the precision and recall, also known as the "break- even point" or BEP. f_one Regression on an input table, maximizing the harmonic mean of the precision and recall, that is, the F_1 score. select Regression on an input table, selecting a range of rows from a continuous-valued distribution. The range of rows selected are expressed as percentiles. That is, if the input table has N rows, then rows that are selected will be rows N*lower_percentile through N*upper_percentile, where the rows are ordered in ascending rank of the output col‐ umn. That is, this scorer will rank (sort) the rows according to the output column, and then select those only in the indicated range. The lower and upper percentile can be sepcified with the -q and the -w options (the --min-rand-input and --max-rand-input options), and should lie between 0.0 and 1.0. ip Discovery of "interesting predicates" that select rows from the input table. The data table is assumed to consist of a number of boolean-valued input columns, and a contin-valued (floating point) target column. moses will learn predicates that select the most "interesting" subset of the rows in the table. The values in the output columns of the selected rows form a probability distribution (PDF); this PDF is considered to be "interesting" if it maximizes a linear combination of several different measures of the the PDF: the Kullback-Leibler divergence, the skewness, and the standardized Mann-Whitney U statistic. kl Regression on an input table, by maximizing the Kullback- Leibler divergence between the distribution of the outputs. That is, the output must still be well-scored, but it is assumed that there are many possible maxima. (XXX???) Huh? ann-it Regression on an input table, using a neural network. (kind-of-like a hidden Markov model-ish, kind of. XXX Huh???) For boolean-valued data tables, the scorers make use of the following gen‐ erally-accepted, standard defintions: TP True-positive: the sum of all rows for which the combo model pre‐ dicts T for a data-table row marked T. FP False-positive: the sum of all rows for which the combo model pre‐ dicts T for a data-table row marked F. FN False-negative: the sum of all rows for which the combo model pre‐ dicts F for a data-table row marked T. TN True-negative: the sum of all rows for which the combo model pre‐ dicts F for a data-table row marked F. accuracy Defined as the formula (TP + TN) / (TP + TN + FP + FN) activation Defined as the formula (TP + FP) / (TP + TN + FP + FN) precision Defined as the formula TP / (TP + FP) recall Also known as sensitivity, this is defined as the formula TP / (TP + FN) F_1 Harmonic mean of precision and recall, defined as the formula (2 * precision * recall) / (precision + recall) = 2TP / (2TP + FP + FN) bep Break-even point, the arithmetic mean of precision and recall, defined as the formula (precision + recall) / 2 Input specification options These options control how input data is specified and interpreted. In its primary mode of operation, moses performs regression on a a table of input data. One column is designated as the target, the remaining columns are taken as predictors. The output of regression is a combo program that is a function of the predictors, reproducing the target. Input files should consist of ASCII data, separated by commas or white‐ space. The appearance of # ; or ! in the first column denotes a comment line; this line will be ignored. The first non-comment row, if it is also non-numeric, is taken to hold column labels. The target column may be spec‐ ified using the -u option with a column name. The printing of column names on output is controlled with the -W1 flag. -i filename, --input-file=filename The filename specifies the input data file. The input table must be in 'delimiter-separated value' (DSV) format. Valid separators are comma (CSV, or comma-separated values), blanks and tabs (white‐ space). Columns correspond to features; there is one sample per (non-blank) row. Comment characters are hash, bang and semicolon (#!;) lines starting with a comment are ignored. The -i flag may be specified multiple times, to indicate multiple input files. All files must have the same number of columns. -u column, --target-feature=column The column is used as the target feature to fit. If no column is specified, then the first column is used. The column may be numeric, or it may be a column label. If it is numeric, it is taken to be the number of the column, with column 1 being the left-most. If column begins with an alphabetic character, it is taken to be a column label. In this case, the first non-comment row of the input file must contain column labels. -Y column, --ignore-feature=column The column should be ignored, and not used as input. Columns are specified as above. This option may be used multiple times, to ignore multiple columns. --score-weight=column The column is used to weight the score for each row. If this option is not used, then each row in the table contributes equally to the evaluation of the accuracy of the learned model. However, if some rows are more important than others to get right, this option can be used to indicate those rows. The accuracy of the model will be weighted by this number, when evaluating the score. A weight of zero effectively causes the row to be ignored. A negative weight enourages the system to learn models that get the row incorrect. For boolean problems, this is the same as flipping the output value. This option can only be used once, and, if used, it should specify a column containing an integer or floating-point value. -b num, --nsamples=num The number of samples to be taken from the input file. Valid values run between 1 and the number of rows in the data file; other values are ignored. If this option is absent, then all data rows are used. If this option is present, then the input table is sampled randomly to reach this size. -G1, --weighted-accuracy=1 This option is only used for the discretize_contin_bscore (when --discretize-threshold is used), if enabled, then the score corre‐ sponds to weighted accuracy Useful in case of unbalanced data. --balance=1 If the table has discrete output type (like bool or enum), balance the resulting ctable so all classes have the same weight. Algorithm control options These options provide overall control over the algorithm execution. The most important of these, for controlling behavior, are the -A, -a, -m, --max-time, -r, -v and -z flags. -a algorithm, --algo=algorithm Select the algorithm to apply to a single deme. This is the algo‐ rithm used in the 'inner loop': given a single exemplar decorated with tunable knobs, this algorithm searches for the best possible knob settings. Once these are found (or a timeout, or other termi‐ nation condition is reached), control is returned to the outer opti‐ mization loop. Available algorithms include: hc Hill-climbing. There are two primary modes of operation; each has strengths and weaknesses for different problem types. In the default mode, one begins with an initial collection of knob settings, called an instance. The settings of each knob is then varied, in turn, until one setting is found that most improves the score. This setting then becomes the new instance, and the process is repeated, until no further improvement is seen. The resulting instance is a local maximum; it is returned to the outer loop. The alternate mode of operation is triggered by using the -L1 flag (usually with the -T1 flag). In this case, as before, all knob settings are explored, one knob at a time. After finding the one knob that most improves the score, the algo is done, and the resulting instance is returned to the outer loop. If no knob settings improved the score, then all possible settings of two knobs are explored, and then three, etc. until improvement is found (or the allotted iterations are exceeded). In this alternate mode, the local hill is not climbed to the top; instead, any improvement is immediately handed back to the outer loop, for another round of exemplar selection and knob- building. For certain types of problems, including maximally misleading problems, this can arrive at better solutions, more quickly, than the traditional hill-climbing algorithm described above. sa Simulated annealing. (Deprecated). The -D flag controls the size of the neighborhood that is searched during the early, "high-temperature" phase. It has a significant effect on the run-time performance of the algorithm. Using -D2 or -D3 is likely to provide the best performance. The current implementation of this algorithm has numerous faults, making it unlikely to work well for most problems. un Univariate Bayesian dependency. -A score, --max-score=score Specifies the ideal score for a desired solution; used to terminate search. If the maximum number of evaluations has not yet elapsed (set with the -m option), and a candidate solution is found that has at least this score, then search is terminated. -m num, --max-evals=num Perform no more than num evaluations of the scoring function. Default value is 10000. --max-time= secs Run the optimizer for no longer than secs seconds. Note that timing is polled only in a small number of points in the algorithm; thus, actual execution time might exceed this limit by a few seconds, or even many minutes, depending on the problem type. In particular, knob-building time is not accounted for right away, and thus prob‐ lems with a long knob-building time will exceed this limit. If using this option, be sure to set the -m option to some very large value. Default value is 42 years. -n oper, --ignore-operator=oper Exclude the operator oper from the program solution. This option may be used several times. Currently, oper may be one of div, sin, exp, log, impulse or a variable #n. You may need to put variables under double quotes. This option has the priority over the -N option. That is, if an operator is both be included and ignored, then it is ignored. This option does not work with ANN. --linear-regression=1 When attempting to fit continuous-valued features, restrict searches to linear expressions only; that is, do not use polynomials in the fit. Specifying this option also automatically disables the use of div, sin, exp and log. Note that polynomial regression results in search spaces that grow combinatorially large in the number of input features; That is, for N features, a quadratic search will entail O(N^2) possibilities, a cubic search will explore O(N^3) possibili‐ ties, and so on. Thus, for any problem with more than dozens or a hundred features, linear regression is recommended. -r seed, --random-seed=seed Use seed as the seed value for the pseudo-random number generator. -v temperature, --complexity-temperature=temperature Set the "temperature" of the Boltzmann-like distribution used to select the next exemplar out of the metapopulation. A temperature that is too high or too low will make it likely that poor exemplars will be chosen for exploration, thus resulting in excessively long search times. The recommended temperature depends strongly on the type of problem being solved. If it is known that the problem has false maxima, and that the distance from the top of the false maxi‐ mum to the saddle separating the false and true maximum is H, then the recommended temerature is 30*H. Here, H is the 'height' or dif‐ ference in score from false peak to saddle, the saddle being the highest mountain pass between the false and true minumum. Varying the temperature by a factor of 2 or 3 from this value won't affect results much. Too small a temperature will typically lead to the system getting trapped at a local maximum. The demo parity problem works well with a temperature of 5 whereas the demo Ant trail problem requies a temperature of 2000. -z ratio, --complexity-ratio=ratio Fix the ratio of score to complexity, to be used as a penalty, when ranking the metapopulation for fitness. This ratio is meant to be used to limit the size of the search space, and, when used with an appropriate temperature, to avoid gettting trapped in local maxima. Roughly speaking, the size of the search space increases exponen‐ tially with the complexity of the combo trees being explored: more complex trees means more of them need to be examined. However, more complex trees typically result in higher scores. If an increase of N bits in the complexity typically leads to an increase of s points of the score, then the complexity ratio should be set to about N/s. In this way, the exploration of more complex tree is penalized by an amount roughly comparable to the chance that such complicated trees actually provide a better solution. The complexity ratio is used to calculate a scoring penalty; the penalty lowers the score in proportion to the solution complexity; specifically, the penalty is set to the complexity divided by the complexity ratio. Setting the ratio too low causes the algorithm to ignore the more complex solutions, ranking them in a way so that they are not much explored. Thus, the algorithm may get trapped examining only the simplest solutions, which are probably inappropriate. Setting this ratio too high will prevent a good solution from being found. In such cases, the algorithm will spend too much time evalu‐ ating overly-complex solutions, blithly leaving simpler, better solutions unexplored. The relationship between the score change and the complexity change is very strongly data-dependent, and must (currently) be manually determined (although it might be possible to measure it automati‐ cally). Input data tables with lots of almost-duplicate data may have very low ratios; complex problems with sparse data may have very high ratios. Initial recommended values would be in the range from 1 to 5; with 3.5 as the default. The parity demo problem works well with the 3.5 default, the Ant trail demo problem works well with 0.16. -Z1, --hc-crossover=1 Controls hill-climbing algorithm behavior. If false (the default), then the entire local neighborhood of the current center instance is explored. The highest-scoring instance is then chosen as the new center instance, and the process is repeated. For many datasets, however, the highest-scoring instances tend to cluster together, and so an exhaustive search may not be required. When this option is specified, a handful of the highest-scoring instances are crossed- over (in the genetic sense of cross-over) to create new instances. Only these are evaluated for fitness; the exhaustive search step is skipped. For many problem types, especially those with large neigh‐ borhoods (i.e. those with high program complexity), this can lead to an order-of-magnitude speedup, or more. For other problem types, especially those with deceptive scoring functions, this can hurt performance. --hc-crossover-min-neighbors=num If crossover is enabled, and an instance is smaller than this size, then an exhaustive search of the neighborhood will be performed. Otherwise, the search will be limited to a cross-over of the highest scoring instances. Exhaustive searches are more accurate, but can take much longer, when instances are large. Recommended values for this option are 100 to 500. Default is 400. --hc-crossover-pop-size=num If crossover is enabled, this controls the number of cross-ever instances that are created. A few hundred is normally sufficient to locate contain the actual maximum. Default value is 120. --boost=1 Enables the use of boosting. Currently, only the AdaBoost algorithm is implemented. If boosting is enabled, then the system focuses on learning combo programs that correctly classify those values that were mis-classified in the previous round of learning. The output is a set of weighted combo trees. At this time, boosting can only be used to learn binary classifiers: that is, to learn problems in which the scorer can provide a yes/no answer. Boosted dynamic feature selection Problems with a large number of input features (typically, hundreds or more) can lead to excessively long run-times, and overwhelming amounts of memory usage. Such problems can be tamed in two different ways: by static feature pre-selection (i.e. by giving moses fewer features to work with) or by reducing the number of features considered during the knob-building stage. Recall that, as moses runs, it iterates over two loops: an outer loop where an exemplar is selected and decorated with knobs, and an inner loop, where the knobs are 'turned', searching for the best knob-settings. Each knob corresponds to one feature added to a particular location in the exemplar tree. A single, given feature will then be used to build a number of dif‐ ferent knobs, each in a different place in the tree. As trees get larger, there are more places in each tree that can be decorated with knobs; thus, the number of knobs, and the search space, increases over time. If there are M places in the tree that can be decorated, and there are N features, then M*N knobs will be created. The search space is exponential in the number of knobs; so, e.g. for boolean problems, three knob settings are explored: present, absent and negated, leading to a search space of 3^(M*N). Worse, the resulting large trees also take longer to simplify and evaluate. Clearly, limiting the number of knobs created can be a good strategy. This can be done with dynamic, boosted feature selection. When enabled, a feature-selection step is performed, to find those features that are most likely to improve on the examplar. These are the features that will be strongly (anti-)correlated with incorrect answers from the currently- selected exemplar. By using only these features during knob building, the total number of knobs can be sharply decreased. The resulting size ofthe decorated tree is smaller, and the total search space is much smaller. Dynamic, boosted feature selection differs from static feature pre-selec‐ tion in many important ways. In dynamic selection, the total number of features available to moses is not reduced: just because a feature was not used in a given round of knob-decoration does not mean that it won't be used next time. Dynamic feature selection also resembles boosting, in that it focuses on fixing the wrong answers of the current exemplar. By con‐ trast, static pre-selection permanently discards features, making them unavailable to moses; it is also unable to predict which features are the most likely to be useful during iteration. Boosted, dynamic, feature selection is enabled with the --enable-fs=1 option. The number of features to use during knob building is specified using the --fs-target-size option. A number of additional flags control the behaviour of the feature selection algorithm; these are best left alone; the defaults should be adequate for almost all problems. The man page for the feature-selection command describes these in greater detail. --enable-fs=1 Enable integrated feature selection. Feature selection is disabled by default. --fs-target-size=num Select num features for use. This argument is mandatory if feature selection is enabled. --fs-algo Choose the feature-selection algorithm. Possible choices are sim‐ ple, inc, smd, hc and random. The default value is simple, which is the fastest and best algo. The simple algorithm computes the pair- wise mutual information (MI) between the target and each feature, and then selects the list of num features with the highest MI. It is strongly recommended that this algo be used. The smd algorithm implements "stochastic mutual dependency", by computing the MI between a candidate featureset and the target. Starting with an initially empty featureset, features are randomly added one-by-one, with the MI then computed. Only the featureset with the highest MI is kept; the process is then repeated until the featureset has the desired number of features in it, or the MI has stopped increasing. Note that the smd algorithm effectively prevents redundant features from being added to the featureset. Note that smd runs orders of magnitude more slowly than simple, and probably does not provide better results. The inc is similar to the smd algo, except that it adds many features, all at once, to the featureset, and then attempts to find the redundant features in the featureset, removing those. This is iteratively repeated until the desired number of fea‐ tures is obtained. Note that inc runs orders of magnitude more slowly than simple, and probably does not provide better results. The hc algorithm runs moses hill-climbing to discover those features most likely to appear. The random algorithm selects features ran‐ domly. It is useful only for limiting the numberof knobs created, but not otherwise slantnig the choice of features. --fs-threshold=num Set the minimum threshold for selecting a feature. --fs-inc-redundant-intensity=fraction When using the inc algorithm, set the threshold to reject redundant features. --fs-inc-target-size-epsilon=tolerance When using the inc algorithm, set the smallest step size used. --fs-inc-interaction-terms=num_terms When using the inc algorithm, set the number of terms used when com‐ puting the joint entropy. --fs-hc-max-score TODO write description --fs-hc-confidence-penalty-intensity TODO write description --fs-hc-max-evals TODO write description --fs-hc-fraction-of-remaining TODO write description Large problem parameters Problems with a large number of features (100 and above) often evolve exem‐ plars with a complexity of 100 or more, which in turn may have instances with hundreds of thousands of knobs, and thus, hundreds of thouands of nearest neighbors. Exploring one nearest neighbor requires one evaluation of the scoring function, and so an exhaustive search can be prohibitive. A partial search can often work quite well, especially when cross-over is enabled. The following flags control such partial searches. Note, however, that usually one can obtain better results by using dynamic feature selection, instead of using the options below to limit the search. The reason for this is that the options below cause the search to be lim‐ ited in a random fashion: the knobs to turn are selected randomly, with a uniform distribution, without any guidance as to whether they might mae a difference. By contrast, dynamic feature selection also limits the search space, by creating knobs only for those features most likely to make a dif‐ ference. Unless the scoring function is particularly decietful, limiting the search to the likely directions should perform better than limiting it to a random subset. --hc-max-nn-evals=num Controls hill-climbing algorithm behavior. When exploring the near‐ est neighborhood of an instance, num specifies the maximum number of nearest neighbors to explore. An exhaustive search of the nearest neighborhood is performed when the number of nearest neighbors is less than this value. --hc-fraction-of-nn=frac Controls hill-climbing algorithm behavior. When exploring the nearest neighborhood of an instance, frac specifies the fraction of nearest neighborhood to explore. As currently implemented, only an estimate of the nearest-neighborhood size is used, not the true size. However, this estimate is accurate to within a factor of 2. Thus, to obtain an exhaustive search of the entire neighborhood, set this to 2.0 or larger. Algorithm tuning options These options allow the operation of the algorithm to be fine-tuned for specific applications. These are "advanced" options; changing these from the default is likely to worsen algorithm behavior in all but certain spe‐ cial cases. -B effort, --reduct-knob-building-effort=effort Effort allocated for reduction during the knob-building stage. Valid values are in the range 0-3, with 0 standing for minimum effort, and 3 for maximum effort. Larger efforts result in demes with fewer knobs, thus lowering the overall dimension of the prob‐ lem. This can improve performance by effectively reducing the size of the problem. The default effort is 2. -Ddist, --max-dist=dist The maximum radius of the neighborhood around the exemplar to explore. The default value is 4. -d1, --reduce-all=1 Reduce candidates before scoring evaluation. Otherwise, only domi‐ nating candidates are reduced, just before being added to the metapopulation. This flag may be useful if scoring function evalua‐ tion expense depends strongly one the structure of the candidate. It is particularly important to specify this flag when memoization is enabled (with -s1). -E effort, --reduct-candidate-effort=effort Effort allocated for reduction of candidates. Valid values are in the range 0-3, with 0 standing for minimum effort, and 3 for maximum effort. For certain very symmetric problems, such as the disjunct problem, greater reduction can lead to significantly faster solu‐ tion-finding. The default effort is 2. -g num, --max-gens=num Create and optimize no more than num demes. Negative numbers are interpreted as "unlimited". By default, the number of demes is unlimited. --discard-dominated=1 If specified, the "dominated" members of the metapopulation will be discarded. A member of the metapopulation is "dominated" when some existing member of the metapopulation scores better on *every* sam‐ ple in the scoring dataset. Naively, one might think that an indi‐ vidual that does worse, in every possible way, is useless, and can be safely thrown away. It turns out that this is a bad assumption; dominated individuals, when selected for deme expansion, often have far fitter off-spring than the off-spring of the top-scoring (domi‐ nating) members of the metapopulation. Thus, the "weak", dominated members of the metapopulation are important for ensuring the vital‐ ity of the metapopulation as a whole, and are discarded only at con‐ siderable risk to the future adaptability of the overall population. Put another way: specifying this flag makes it more likely that the metapopulation will get trapped in a non-optimal local maximum. Note that the algorithms to compute domination are quite slow, and so keeping dominated individuals has a double benefit: not only is the metapopulation healthier, but metapopulation management runs faster. -L1, --hc-single-step=1 Single-step, instead of hill-climbing to the top of a hill. That is, a single uphill step is taken, and the resulting best demes are folded back into the metapopulation. Solving then continues as usual. By default, the hill-climbing algorithm does not single-step; it instead continues to the top of the local hill, before folding the resulting demes back into the metapopulation. If using this flag, consider using the -T1 flag to allow the search to be widened, so that if the initial exemplar is already at the top of a local hill, a search is made for a different (taller) hill. -N oper, --include-only-operator=oper Include the operator oper, but exclude others, in the solution. This option may be used several times to specify multiple operators. Currently, oper may be one of plus, times, div, sin, exp, log, impulse or a variable #n. Note that variables and operators are treated separately, so that including only some operators will still include all variables, and including only some variables still include all operators). You may need to put variables under double quotes. This option does not work with ANN. -P num, --pop-size-ratio=num Controls amount of time spent on a deme. Default value is 20. -p fraction, --noise=fraction This option provides an alternative means of setting the complexity ratio. If specified, it over-rides the -z option. For discrete problems, fraction can be interpreted as being the fraction of score values that are incorrect (e.g. due to noisy data). As such, only values in the range 0 < fraction < 0.5 are meaningful (i.e. less than half of the data values are incorrect). Typical recommended values are in the range of 0.001 to 0.05. For continuous-valued problems, it can be interpreted as the standard deviation of a Gaussian noise in the dependent variable. For the discrete problem, the complexity ratio is related to the fraction p by the explicit formula: complexity_ratio = - log(p/(1-p)) / log |A| where |A| is the (problem-dependent) alphabet size. See below for a detailed explanation. -T1, --hc-widen-search=1 Controls hill-climbing algorithm behavior. If false (the default), then deme search terminates when a local hilltop is found. If true, then the search radius is progressively widened, until another ter‐ mination condition is met. Consider using the -D flag to set the maximum search radius. Note that the size of the search space increases exponentially with the search radius, so this kind of search becomes very rapidly intractible. --well-enough=1 For problems with an enumerated ('nominal') output, the learned combo program is always of the form cond(pred_1 value_1 pred_2 value_2 ... pred_n value_n else_val) where pred_1 is a predicate, which, if true, causes the output to be value_1. If false, then pred_2 is tested, and so on. If none of the predicates evaluate to true, then the value of the cond expression is the else_val. The well-enough algorithm attempts to find predicates that maximize pre‐ cision, the point being that if a perfectly precise pred_1 can be found, then it can be left alone ('leave well-enough alone'), thus simplifying the remainder of the search problem. Performing this evaluation is costly, and may lead to a slow-down, without improving overall accuracy. Controlling RAM usage Deploying moses on large problems can cause all available RAM on a given machine to be used up. Actual RAM usage depends strongly on the dataset, and on various tuning paramters. Typical tasks require systems with dozens or hundreds of gigabytes of RAM. There are three primary consumers of RAM within moses: the metapopulation (the set of combo trees being evolved), the deme (the set of instances describing the knob setttings, when a single combo tree is being explored), and the score casche. The following options can be used to limit the size of each of these to a more managable size. -s num, --cache-size=num Enable memoization of candidate scores. This allows the number of scoring function evaluations to be reduced, by maintaining a cache of recently scored candidates. If a new candidate is found in the cache, that score is used, instead of a scoring function evaluation. The effectiveness of memoization is greatly increased by also using the -d1 flag. The maximum number of cached scores is set at num, which defaults to 3000. Careful: scores can be quite large, and use a lot of memory. --cap-coef=num A coefficient used to control the size of the metapopulation. The metapopulation is trimmed, by removing a random subset of the lowest scorers, so that the maximum size is given by the formula pop-size = cap_coef * (N + 250) * (1 + 2 * exp(-N/500)) where N is the number of metapop iterations (tree expansions) so far. The default value for this option is 50. Note that the prefered way of keeping the metapopulation small is to set a low temperature (the --complexity-temperature option). For a given temperature, trees which have such a poor score that they have a probability of less than one in a billion of being selected are automatically discarded from the metapopulation. In most cases, this is enough to keep the metapopulation under control. However, if there are a vast number of trees with almost exactly the same score, the temperature alone will not be enough to control the metapopula‐ tion size. If the metapopulation is growing out of control, then do verify that the temerpature is set to an appropriatte value, before making use of this flag. --hc-resize-to-fit-ram=1 If set to true, this will cause moses to try to limit memory usage so that it will fit in available RAM (and thus avoid an OOM kill). The RAM usage is controlled by automatically resizing the deme dur‐ ing the hillclimbing search. (The deme contains instances of knob settings that were explored). Note that the use of this option may introduce indeterminacy, when comparing runs from different machines, having different amounts of installed RAM. Output control options These options control the displayed output. Note that, be default, the ten best solutions are printed. These are printed in penalized-score-sorted order, but the actual printed score is the raw, un-penalized score. This can lead to the printed list seeming to be in random order, which can occur if the penalties are high. The penalties can be too high, if the complex‐ ity ratio is set too low, or the temperature is set too low. -C1, --output-only-best=1 Print only the highest-scoring candidates. -c count, --result-count=count The number of results to return, ordered according to score. If neg‐ ative, then all results are returned. -f filename, --log-file=filename Write debug log traces filename. If not specified, traces are writ‐ ten to moses.log. -F, --log-file-dep-opt Write debug log traces to a filename constructed from the passed option flags and values. The filename will be truncated to a maximum of 255 characters. -l loglevel, --log-level=loglevel Specify the level of detail for debug logging. Possible values for loglevel are NONE, ERROR, WARN, INFO, DEBUG, and FINE. Case does not matter. Caution: excessive logging detail can lead to significant program slowdown. The NONE option disables log file creation. This may make error debugging difficult. -o filename, --output-file=filename Write results to filename. If not specified, results are written to stdout. --python=1, --python3=1 Output the highest-scoring programs as python snippets, instead of combo. -S0, --output-score=0 Prevent printing of the score. -t1, --output-bscore=1 Print the behavioral score. -V1, --output-eval-number=1 Print the number of evaluations performed. -W1, --output-with-labels=1 Use named labels instead of position place-holders when printing candidates. For example, *("$temperature" "$entropy") instead of *($3 $4). This option is effective only when the data file contains labels in its header. -x1, --output-complexity=1 Print the complexity measure of the model, and the scoring penalty. Precision, recall, BEP, F_1 and prerec problem types The prerec, recall, bep, f_one and precision problem types are used to solve binary classification problems: problems where the goal is to sort inputs into one of two sets, while maximizing either the precision, the sensitivity, or some other figure of merit of the test. In moses, precision and recall (sensitivity) are defined as usual. Preci‐ sion is defined as the number of true positives, divided by the number of true positives plus the number of false positives. Classifiers with a high precision make very few mistakes identifying positives: they have very few or no false positives. However, precise classifiers may completely fail to identify many, if not most positives; they just don't make mistakes when they do identify them. Recall, also known as sensitivity, is defined as the number of true posi‐ tives divided by the sum of the number of true positives and false nega‐ tives. Classifiers with high recall will identify most, or maybe even all positives; however, they may also identify many negatives, thus ruining precision. A trivial way to maximize precision is to have a very low recall rate, and conversely, one can very easily have a good recall rate if one does not mind a poor precision. Thus a common goal is to maximize one, while hold‐ ing the other to a minimum standard. One common problem is find a classi‐ fier with the highest possible recall, while holding precision to a fixed minimum level; this may be accomplished with the -Hrecall option. Alter‐ nately, one may desire to maximize precision, while maintaining a minimum sensitivity; this may be accomplished with the -Hprerec option. Note that, although these two proceedures seem superficially similar, they can often lead to dramatically different models of the input data. This is in part because, during early stages, moses will choose exemplars that maximize one or the other, thus causing dramatically different parts of the solution space to be searched. A common alternative to maximizing one or the other is to maximize wither the arithmetic or the harmonic mean of the two. The arithmetic mean is sometimes called the "break-even point" or BEP; it is maximized when the -Hbep option is specified. The harmonic mean is known as the F_1 score, it is maximized when the -Hf_one option is specified. moses also provides a second way of maximizing precision, using the -Hpre option. This option searches for the test with the highest precision, while holding the 'activation' in a bounded range. The definition of 'activation' is idiosyncratic to moses; it is defined as the sum of true positives plus false positives: that is, it is the fraction of rows for which the trial combo program returned a positive answer, regardless of whether this was the right answer. Activation ranges from 0.0, to 1.0. It is never desirable to maximize activation; rather, most commonly, one wants to peg activation at exactly the fraction of positives in the training set. The minimum level to which a fixed component should be held may be speci‐ fied with the -q or --min-rand-input option. Thus, for the -Hrecall prob‐ lem, the -q flag is used to specify the minimum desired precision. Simi‐ larly, for the -Hprerec problem, the -q flag is used to specify the minimum desired recall. For the -Hpre problem, the -w or --max-rand-input option should be used to make sure the activation does not get too high. The -q and -w options also set lower and upper bounds for the BEP problem as well. When maximizing BEP, the system attempts to keep the absolute value of the difference between precision and recall less than 0.5. This maximum difference can be over-ridden with the -w option. Adherence to the bounds is done by means of a scoring penalty; combo pro‐ grams that fail to lie within bounds are penalized. The harshness or hard‐ ness of the penalty may be specified by means of the -Q or --alpha option. Values much greater than one enforce a hard boundary; values much less than one make for a very soft boundary. Negative values are invalid. Contin options Options that affect the usage of continuously-valued variables. These options specify values that are used in a variety of different ways, depending on the chosen problem type. See appropriate sections for more details. -Q hardness, --alpha=hardness The harshness of hardness of a limit that must be adhered to. Default 0.0 (limits disabled). -q num, --min-rand-input=num Minimum value for continuous variables. Default 0.0. -w num, --max-rand-input=num Maximum value for continuous variables. Default 1.0. -R num, --discretize-threshold=num Split a continuous domain into two pieces. This option maybe be used multiple times to split a continuous domain into multiple pieces: that is, n uses of this option will create n+1 domains. Demo options These options pertain to the various built-in demo and example problem modes. Such demo problems are commonly used to evaluate different machine learning algorithms, and are thus included here to facilitate such compari‐ son, as well as to simplify moses regression and performance testing. -H type, --problem-type=type A number of demonstration problems are supported. In each case, the top results are printed to stdout, as a score, followed by a combo program. type may be one of: cp Combo program regression. The scoring function is based on the combo program specified with the -y flag. That is, the goal of the run is to deduce and learn the specified combo program. When specifying combo programs with continuous variables in them, be sure to use the -q, -w and -b flags to specify a range of input values to be sampled. In order to determine the fitness of any candidate, it must be compared to the specified combo program. The comparison is done at a variety of differ‐ ent input values. If the range of sampled input values is inappropriate, or if there are not enough sampled values, then the fitness function may select unexpected, undesired candi‐ dates. dj Disjunction problem. The scoring function awards a result that is a boolean disjunction (or) of N boolean-valued variables. The resulting combo program should be or($1 $2 ...). The size of the problem may be specified with the -k option. mux Multiplex problem. The scoring function models a boolean digi‐ tal multiplexer, that is, an electronic circuit where an "address" of n bits selects one and only one line, out of 2^n possible lines. Thus, for example, a single address bit can select one of two possible lines: the first, if its false, and the second, if its true. The -k option may be used to specify the value of n. The actual size of the problem, measured in bits, is n+2^n and so increases exponentially fast. pa Even parity problem. The resulting combo program computes the parity of k bits, evaluating to true if the parity is even, else evaluating to false. The size of the problem may be specified with the -k option. sr Polynomial regression problem. Given the polynomial p(x)=x+x^2+x^3+...x^k, this searches for the shortest program consisting of nested arithmetic operators to compute p(x), given x as a free variable. The arithmetic operators would be addition, subtraction, multiplication and division; exponenti‐ ation is not allowed in the solution. So, for example, using the -k2 option to specify the order-2 polynomial x+x^2, then the shortest combo program is *(+(1 $1) $1) (that is, the solution is p(x)=x(x+1) in the usual arithmetical notation). -k size, --problem-size=size Specify the size of the problem. The interpretation of size depends on the particular problem type. -y prog, --combo-program=prog Specify the combo program to be learned, when used in combination with the -H cp option. Thus, for example, -H cp -y "and(\$1 \$2)" specifies that the two-input conjunction is to be learned. Keep in mind that $ is a reserved character in many shells, and thus must be escaped with a backslash in order to be passed to moses. COMPLEXITY PENALTY The speed with which the search algorithm can find a reasonable solution is significantly affected by the complexity ratio specified with the -z or -p options. This section provides the theoretical underpinning for the meaning of these flags, and how they affect the the algorithm. The complexity penalty has two slightly different interpretations, depending on whether one is considering learning a discretely-valued problem (i.e. boolean-val‐ ued) or a continuously-valued problem. The general structure of the argu‐ ment is broadly similar for both cases; they are presented below. Similar arguments apply for classification problems (learning to classify data into one of N categories), and for precision maximization. Discrete case Let M be the model to be learned (the combo program). Let D be the data, assumed to be a table of n inputs i_k and one output o, with each row in the form: i_1 ... i_n o Here, i_k is the k'th input and o the output. In the below, we write o = D(x) where x=(i_1, ..., i_n) is an input data row. We want to assess the probability P(M|D) of the model M conditioned on the data D. In particular, we wish to maximize this, as it provides the fit‐ ness function for the model. According to Bayes theorem, P(M|D) = P(D|M) * P(M) / P(D) Consider the log likelihood LL(M) of M knowing D. Since D is constant, we can ignore P(D), so: LL(M) = log(P(D|M)) + log(P(M)) Assume each output of M on row x has probability p of being wrong. So, P(D|M) = Prod_{x\in D} [p*(M(x) != D(x)) + (1-p)*(M(x) == D(x))] where D(x) the observed result given input x. Then, log P(D|M) = Sum_{x\in D} log[p*(M(x) != D(x)) + (1-p)*(M(x) == D(x))] Let D = D_eq \cup D_ne where D_eq and D_ne are the sets D_eq = {x \in D | M(x) == D(x) } D_ne = {x \in D | M(x) != D(x) } Then log P(D|M) = Sum_{x\in D_ne} log(p) + Sum_{x\in D_eq} log(1-p) = |D_ne| log(p) + |D_eq| log(1-p) = |D_ne| log(p) + |D| log(1-p) - |D_ne| log(1-p) = |D_ne| log(p/(1-p)) + |D| log(1-p) Here, |D| is simply the size of set D, etc. Assuming that p is small, i.e. much less than one, then, to second order in p: log(1-p) = -p + p^2/2 + O(p^3) So: log P(D|M) = |D_ne| log(p) - p (|D| - |D_ne|) + O(p^2) Next, assume P(M) is distributed according to Solomonoff's Universal Dis‐ tribution, approximated by (for now) P(M) = |A|^-|M| = exp(-|M|*log(|A|)) where A is the alphabet of the model, |A| is the alphabet size, and |M| is the complexity of the model. Note that this distribution is identical to the Boltzmann distribution, for an inverse temperature of log(|A|). Putting it all together, the log-likelihood of M is: LL(M) = -|M|*log(|A|) + |D_ne| log(p/(1-p)) + |D| log(1-p) To get an expression usable for a scoring function, just bring out the |D_ne| by dividing by -log(p/(1-p)), to get score(M) = - [ LL(M) - |D| log(1-p) ] / log(p/(1-p)) = -|D_ne| + |M|*log|A| / log(p/(1-p)) = -|D_ne| - |M| |C_coef| Note that, since p<1, that log(p) is negative, and so the second term is negative. It can be understood as a complexity penalty. That is, we define the complexity penalty as complexity_penalty = |M| |C_coef| The complexity ratio, as set by the -z option, is given by complexity_ratio = 1 / |C_coef| By contrast, the -p option may be used to set p directly, as given in the formulas above. The value of |A| is computed internally, depending on the specific problem type (discrete vs. continuous, number of included-excluded operators, etc.) The complexity of each solution is also computed, using an ad-hoc complexity measure. Continuous case A similar argument to the above holds for the case of a continuously-valued observable. Let dP(..) be the notation for a probability density (or measure). As before, start with Bayes theorem: dP(M|D) = dP(D|M) * P(M) / P(D) Since D is constant, one may ignore the prior P(D), and write the log like‐ lihood of M knowing D as: LL(M) = log(dP(D|M)) + log(P(M)) Assume the output of of the model M on input x has a Gaussian distribu‐ tions, of mean M(x) and variance V, so that dP(D|M), the probability den‐ sity of the data D given the modem M is: dP(D|M) = Prod_{x\in D} (2*Pi*V)^(-1/2) exp(-(M(x)-D(x))^2/(2*V)) As before, assume a model distribution of P(M) = |A|^-|M| where |A| is the alphabet size and |M| the complexity of the model. After simplification, and dropping a constant term that does not depend on either the model complexity or the dataset itself (the dataset size is a con‐ stant), one then can deduce a scoring function: score(M) = -|M|*log(|A|)*2*V - Sum_{x\in D} (M(x)-D(x))^2 As before, |M|*log(|A|)*2*V can be interpreted as a scoring penalty. Alternately, one may interpret each row x as a feature; then the penalty term |M|*log(|A|)*2*V can be interpreted as an additional feature that must be fit. DISTRIBUTED PROCESSING moses provides two different styles of distributed processing for cluster computing systems. One style is to use MPI (as implemented in the Open‐ MPI/MPICH2 systems), the second is to use SSH. The first style is best suited for local area networks (LANs) and compute clusters. The second style allows operation over the open Internet, but is more problematic, as it may require manual cleanup of log files and failed jobs. Because MPI is easy to install and manage, it is the recommended method for distributing moses operation across many machines. When moses is run in distributed fashion, one single node, the root node, maintains control over a pool of workers that execute on remote nodes. The root node maintains the set of candidate solutions, and assigns these to the workers for additional exploration, as the workers become free. The results are automatically collected by the root, and are automatically merged into the candidate population. When termination criteria are met, processing will terminate on all nodes, and the root node will report the merged, best results. MPI Using MPI requires a moses binary with MPI support compiled in, with either the OpenMPI or the MPICH2 implementations. MPI support in moses is enabled with the --mpi=1 command-line flag. When this flag is specified, moses may be run as usual in an MPI environment. Details will vary from one system configuration to another, but a typical usage might resemble the following: mpirun -n 15 --hostfile mpd.hosts moses --mpi=1 -j 12 <other moses params> The above specifies that the run should be distributed over fifteen nodes, with the node names specified in the mpd.hosts file. The --mpi=1 flag indicates to moses that it is being run in an MPI environment. The -j12 flag tells moses to use up to twelve threads on each node, whenever possi‐ ble; this example assumes each node has 12 cores per CPU. To maximize CPU utilization, it seems best to specify two MPI instances per node. This is because not all parts of moses are parallelized, and some parts are subject to lock contention. Thus, running multiple instances per node seems to be an effective way to utilize all available compute power on that node. It is almost always the case that moses RAM usage is relatively small, and so RAM availability is rarely a problem. The network utiliza‐ tion by moses is also very modest: the only network traffic is the report‐ ing of candidate solutions, and so the network demands are typically in the range of 1 Megabit per second, and are thus easily supported on an Ethernet connection. The moses workload distributes in an 'embarrassingly parallel' fashion, and so there is no practical limit to scaling on small and medium compute clusters. When performing input data regression, be sure that the input data file is available on all nodes. This is most easily achieved by placing the input data file on a shared filesystem. Each instance of moses will write a log file. In order to avoid name collision on the log files, the process id (PID) will automatically be incorporated into the log-file name when the --mpi=1 option is specified. Log file creation can be disabled with the -lNONE option; however, this is not recommended, as it makes debugging and progress monitoring difficult. SSH The SSH style of distributed processing uses the ssh command to establish communications and to control the pool of workers. Although moses provides job control when the system is running normally, it does not provide any mechanism for cleaning up after hung or failed jobs; this is outside the scope of the ssh implementation. The use of a job manager, such as LSF, is recommended. Remote machines are specified using the -j option, using the notation -j N:REMOTE_HOST. Here, N is the number of threads to use on the machine REMOTE_HOST. For instance, one can enter the options -j4 -j16:my_server.org (or -j16:user@my_server.org if one wishes to run the remote job under a different user name), meaning that 4 threads are allo‐ cated on the local machine and 16 threads are allocated on my_server.org. Password prompts will appear unless ssh-agent is being used. The moses executable must be on the remote machine(s), and located in a directory included in the PATH environment variable. Beware that a lot of log files are going to be generated on the remote machines. DIVERSITY moses can enforce diversity in the metapopulation, by taking into account a diversity pressure during ranking, effectively pushing down semantically redundant candidates to the bottom of the metapopulation. This has 2 effects 1. Selected exemplars for new demes are more likely to be diverse. 2. Top candidates output to the user at the end of learning are more likely to be diverse. The diversity pressure operates over the semantics of the candidates by measuring the distance between the behavioral scores of the current candi‐ date to be ranked and the top candidates already ranked. The exact algo‐ rithm is described in Section 1.3 of https://github.com/opencog/moses/blob/master/moses/moses/documenta‐ tion/diversity_report_v0.3_disclosable.pdf. Diversity options User options can be used to determine how much pressure to apply and how the distance between behavioral scores should be calculated. If you do not wish to understand much of these options but still want to enforce diver‐ sity simply set a positive diversity pressure (typically around 0.01) with option --diversity-pressure, and enable autoscale with option --diversity- autoscale=1. --diversity-pressure=num Set a diversity pressure on the metapopulation. Programs behaving similarily to others are more penalized. That value sets the impor‐ tance of that penalty. It ranges from 0 to +inf. If --diversity- autoscale is enabled (recommended) then it typically ranges from 0 to 1. Default 0.0. --diversity-autoscale=bool Automatically rescale the --diversity-pressure parameter so that a pressure may typically range from 0 to 1, regardless of the fitness function. Although this option is disabled by default for backward compatibility, it is highly recommended. Default 0. --diversity-exponent=num Set the exponent of the generalized mean (or sum, if --diversity- normalize is set to 0) aggregating the penalties between a candidate and the set of all candidates better than itself (taking into account diversity). If the value tends towards 0 it tends to the geometric mean, towards +inf it tends to the max function. If nega‐ tive or null is it the max function. Default -1.0. --diversity-normalize=bool If set to 1 then the aggregating function is a generalized mean. Otherwize it is a generalized sum (generalize mean * number of ele‐ ments). If --diversity-exponent is set to negatively then this doesn't have any impact as the aggregating function is the max any‐ way. Default 1. --diversity-dst=distance Set the distance between behavioral scores, then used to determine the uniformity penalty. 3 distances are available: p_norm, tanimoto and angular. Default p_norm. --diversity-p-norm=num Set the parameter of the p_norm distance. A value of 1.0 corresponds to the Manhatan distance. A value of 2.0 corresponds to the Euclidean distance. A value of 0.0 or less correspond to the max component-wise. Any other value corresponds to the general case. Default 2.0. --diversity-dst2dp=type Set the type of function to convert distance into penalty. 4 options are available: auto, inverse, complement and power. When auto is selected the function is selected depending on the distance, if the distance is p_norm, then inverse is selected, otherwise complement is selected. Default auto. TODO Finish documenting these algo flags: --fs-scorer -M -R discretize target var These input flags: -G Interesting patterns flags: -J -K -U -X SEE ALSO The eval-table man page. More information is available at http://wiki.opencog.org/w/MOSES AUTHORS moses was written by Moshe Looks, Nil Geisweiller, and many others. This manual page is being written by Linas Vepstas. It is INCOMPLETE. 3.7.0 March 8, 2019 MOSES(1)