# Exporting MOSES Models Into the Atomspace

## Theory

Often we need to pipe MOSES models into the AtomSpace to allow PLN to infer new knowledge.

### Thoughts on the Future of MOSES and the Atomspace

Currently, OpenCog’s MOSES “probabilistic evolutionary learning” component is implemented separately from the AtomSpace (OpenCog’s core knowledge store), although one can export programs learned via MOSES into the AtomSpace. This works, but it is an imperfect design that persists for historical reasons. OpenCog is considering reimplementing MOSES in the AtomSpace atop the Pattern Miner, in which case one could use the Pattern Miner infrastructure, with only fairly modest modification, to implement a variety of MOSES that operates directly in the AtomSpace. This would have various advantages, including enabling the hybridization of pattern mining and MOSES in various ways.

## Practice

The Basic Mapping section is, as the name suggests, a basic hands-on guide to getting MOSES models into the AtomSpace (including the files used). It is followed by a more detailed tutorial, 'Mapping MOSES models into the AtomSpace (Detailed)'.

### Basic Mapping

There are several scripts to help export MOSES models into the AtomSpace, though not all of them may be required.

First, one needs existing MOSES models to import into the AtomSpace. There is a hands-on tutorial on creating models in MOSES.

#### Files

All these files can be found in this git repo.

##### export_models_and_fitness.sh

Used to convert models and their scores into Scheme code readily dumpable into the AtomSpace.

export_models_and_fitness.sh (code)
```bash
#!/bin/bash

# Overview
# --------
# Little script to export in scheme format (readily dumpable into the
# AtomSpace) the models and their scores, given a CSV following
# Mike's format: 3 columns, the combo program, its recall (aka
# sensitivity) and its precision (aka positive predictive value).
#
# Usage
# -----
# Run it without argument to print the usage.
#
# Description
# -----------
# The model will be labeled FILENAME:moses_model_INDEX
#
# where FILENAME is the basename of the filename provided in argument,
# and INDEX is the row index of the model (starting by 0)
#
# The exported hypergraphs are
#
# 1. The model itself associated with its label (MODEL_PREDICATE_NAME)
#
#     EquivalenceLink
#         PredicateNode MODEL_PREDICATE_NAME
#         MODEL
#
# 2. The label associated with its accuracy
#
#     EvaluationLink
#         PredicateNode "accuracy"
#         ListLink
#             PredicateNode MODEL_PREDICATE_NAME
#             PredicateNode TARGET_FEATURE_NAME
#
# 3. The label associated with its balanced accuracy [REMOVED]
#
#     EvaluationLink
#         PredicateNode "balancedAccuracy"
#         ListLink
#             PredicateNode MODEL_PREDICATE_NAME
#             PredicateNode TARGET_FEATURE_NAME
#
# 4. The label associated with its precision [REMOVED]
#
#     ImplicationLink
#         PredicateNode MODEL_PREDICATE_NAME
#         PredicateNode TARGET_FEATURE_NAME
#
# 5. The label associated with its recall
#
#     ImplicationLink
#         PredicateNode TARGET_FEATURE_NAME
#         PredicateNode MODEL_PREDICATE_NAME

set -u                          # raise error on unknown variable read
# set -x                          # debug trace

####################
# Source common.sh #
####################
PRG_PATH="$(readlink -f "$0")"
PRG_DIR="$(dirname "$PRG_PATH")"
. "$PRG_DIR/common.sh"

####################
# Program argument #
####################
if [[ $# == 0 || $# -gt 3 ]]; then
    echo "Usage: $0 MODEL_CSV_FILE [-o OUTPUT_FILE]"
    echo "Example: $0 chr10_moses.5x10.csv -o chr10_moses.5x10.scm"
    exit 1
fi

MODEL_CSV_FILE="$1"
shift
OUTPUT_FILE="/dev/stdout"
while getopts "o:" opt; do
    case $opt in
        o) OUTPUT_FILE="$OPTARG"
           ;;
    esac
done

#############
# Functions #
#############

# Given
#
# 1. a model predicate name
#
# 2. a combo model
#
# return scheme code defining the equivalence between the model name
# and the model:
#
#     EquivalenceLink
#         PredicateNode MODEL_PREDICATE_NAME
#         MODEL
model_name_def() {
    local name="$1"
    local model="$2"
    cat <<EOF
(EquivalenceLink (stv 1 1)
    (PredicateNode "${name}")
    $model)
EOF
}

# Given
#
# 1. a model predicate name
#
# 2. a target feature name
#
# 3. an accuracy
#
# return scheme code relating the model predicate with the accuracy:
#
#     EvaluationLink
#         PredicateNode "accuracy"
#         ListLink
#             PredicateNode MODEL_PREDICATE_NAME
#             PredicateNode TARGET_FEATURE_NAME
model_accuracy_def() {
    local name="$1"
    local target="$2"
    local accuracy="$3"
    cat <<EOF
(EvaluationLink (stv $accuracy 1)
    (PredicateNode "accuracy")
    (ListLink
        (PredicateNode "$name")
        (PredicateNode "$target")))
EOF
}

# Like above but for balanced accuracy
model_balanced_accuracy_def() {
    local name="$1"
    local target="$2"
    local accuracy="$3"
    cat <<EOF
(EvaluationLink (stv $accuracy 1)
    (PredicateNode "balancedAccuracy")
    (ListLink
        (PredicateNode "$name")
        (PredicateNode "$target")))
EOF
}

# Given
#
# 1. a model predicate name
#
# 2. a target feature name
#
# 3. a precision
#
# return scheme code relating the model predicate with its precision:
#
#     ImplicationLink
#         PredicateNode MODEL_PREDICATE_NAME
#         PredicateNode TARGET_FEATURE_NAME
model_precision_def() {
    local name="$1"
    local target="$2"
    local precision="$3"
    cat <<EOF
(ImplicationLink (stv $precision 1)
    (PredicateNode "$name")
    (PredicateNode "$target"))
EOF
}

# Given
#
# 1. a model predicate name
#
# 2. a target feature name
#
# 3. a recall
#
# return scheme code relating the model predicate with its recall:
#
#     ImplicationLink
#         PredicateNode TARGET_FEATURE_NAME
#         PredicateNode MODEL_PREDICATE_NAME
model_recall_def() {
    local name="$1"
    local target="$2"
    local recall="$3"
    cat <<EOF
(ImplicationLink (stv $recall 1)
    (PredicateNode "$target")
    (PredicateNode "$name"))
EOF
}

########
# Main #
########

# Count the number of models and how to pad their unique numeric ID
rows=$(nrows "$MODEL_CSV_FILE")
npads=$(python -c "import math; print(int(math.log($rows, 10) + 1))")

# Check that the header is correct (if not maybe the file format has
# changed)
# (the header check from the original script is elided in this listing)

# Create a temporary pipe and save the scheme code
tmp_pipe=$(mktemp -u)
mkfifo "$tmp_pipe"

OLDIFS="$IFS"
IFS=","
i=0                             # used to give unique names to models
while read combo recall precision; do
    # Label the model FILENAME:moses_model_INDEX, zero-padded
    model_name="$(basename "$MODEL_CSV_FILE"):moses_model_$(printf "%0${npads}d" $i)"

    # Output model name predicate associated with model
    scm_model="$(combo-fmt-converter -c "$combo" -f scheme)"
    echo "$(model_name_def "$model_name" "$scm_model")"

    # Output model precision
    echo "$(model_precision_def "$model_name" aging $precision)"

    # Output model recall
    echo "$(model_recall_def "$model_name" aging $recall)"

    ((++i))
done < <(tail -n +2 "$MODEL_CSV_FILE") > "$OUTPUT_FILE"
IFS="$OLDIFS"
```
##### relate_features_and_genes.sh

Used to generate Scheme code relating MOSES features to their corresponding genes.

relate_features_and_genes.sh (code)
```bash
#!/bin/bash

# Script that takes a feature CSV file and generates the corresponding
# hypergraphs relating features to GeneNodes.  That is, for each
# feature of name <GENE_NAME>, produce:
#
#     PredicateNode <GENE_NAME>
#         VariableNode $X
#             PredicateNode "overexpressed"
#                 GeneNode <GENE_NAME>
#                 $X

set -u
# set -x

####################
# Source common.sh #
####################
PRG_PATH="$(readlink -f "$0")"
PRG_DIR="$(dirname "$PRG_PATH")"
. "$PRG_DIR/common.sh"

####################
# Program argument #
####################
if [[ $# == 0 || $# -gt 3 ]]; then
    echo "Usage: $0 FEATURE_CSV_FILE [-o OUTPUT_FILE]"
    echo "Example: $0 oldvscontrolFeatures.csv -o oldvscontrol-features-and-genes.scm"
    exit 1
fi

FEATURE_CSV_FILE="$1"
shift
OUTPUT_FILE="/dev/stdout"
while getopts "o:" opt; do
    case $opt in
        o) OUTPUT_FILE="$OPTARG"
           ;;
    esac
done

########
# Main #
########

# Check that the header is correct (if not maybe the file format has
# changed)
# (the header check from the original script is elided in this listing)

OLDIFS="$IFS"
IFS=","
while read feature freq level; do
    # \$ is escaped in the heredoc so that a literal $X is emitted
    cat <<EOF
(EquivalenceLink (stv 1 1)
    (PredicateNode "$feature")
    (EvaluationLink
        (PredicateNode "overexpressed")
        (ListLink
            (GeneNode "$feature")
            (VariableNode "\$X"))))
EOF
done < <(tail -n +2 "$FEATURE_CSV_FILE") > "$OUTPUT_FILE"
IFS="$OLDIFS"
```
##### test.sh

(Obsolete) A script to experiment with MOSES learning and PLN reasoning. You may need to configure settings.sh (e.g. to set your OpenCog path). Usage is as follows:

```
mkdir <MY_EXP>
cd <MY_EXP>
../scripts/test.sh ../scripts/settings.sh
```
test.sh (code)
```bash
#!/bin/bash

# Script test to attempt to load MOSES models in scheme format to the
# AtomSpace so that PLN can then reason on them.
#
# It performs the following
#
# 1. Launch an OpenCog server
#
# 2. Load background knowledge from a Scheme file (like feature
# definitions)
#
# 3. Split dataset into k-fold train and test sets
#
# 4. Run MOSES on some problem
#
# 5. Parse the output and pipe it in OpenCog
#
# 6. Use PLN to perform reasoning, etc.

set -u
# set -x

if [[ $# != 1 ]]; then
    echo "Usage: $0 SETTINGS_FILE"
    exit 1
fi

#############
# Constants #
#############

PRG_PATH="$(readlink -f "$0")"
PRG_DIR="$(dirname "$PRG_PATH")"
ROOT_DIR="$(dirname "$PRG_DIR")"
SET_PATH="$1"
SET_BASENAME="$(basename "$SET_PATH")"

#############
# Functions #
#############

# Given an error message, display that error on stderr and exit
fatalError() {
    echo "[ERROR] $@" 1>&2
    exit 1
}

warnEcho() {
    echo "[WARN] $@"
}

infoEcho() {
    echo "[INFO] $@"
}

# Convert human readable integer into machine full integer. For
# instance $(hr2i 100K) returns 100000, $(hr2i 10M) returns 10000000.
hr2i() {
    local val=$1
    val=${val/M/000K}
    val=${val/K/000}
    echo $val
}

# Pad $1 symbol with up to $2 0s
pad() {
    printf "%0${2}d" "$1"
}

# Split the data into train and test, renaming FILENAME.csv by
# FILENAME_train.csv and FILENAME_test.csv given
#
# 1. Dataset csv file with header
#
# 2. A ratio = train sample size / total size
#
# 3. A random seed
train_test_split() {
    local DATAFILE="$1"
    local RATIO="$2"

    # Reset random seed
    RANDOM="$3"

    # Define train and test outputs
    local DATAFILE_TRAIN=${DATAFILE//.csv/_train.csv}
    local DATAFILE_TEST=${DATAFILE//.csv/_test.csv}

    # Copy header into train and test files
    head -n 1 "$DATAFILE" > "${DATAFILE_TRAIN}"
    head -n 1 "$DATAFILE" > "${DATAFILE_TEST}"

    # Subsample
    while read line; do
        if [[ $(bc <<< "$RATIO * 32767 >= $RANDOM") == 1 ]]; then
            echo "$line" >> "${DATAFILE_TRAIN}"
        else
            echo "$line" >> "${DATAFILE_TEST}"
        fi
    done < <(tail -n +2 "$DATAFILE")
}

# Given
#
# 1. a model name
#
# 2. a combo model
#
# return a scheme code defining the equivalence between the model name
# and the model.
model_def() {
    local name="$1"
    local model="$2"
    echo "(EquivalenceLink (stv 1.0 1.0) (PredicateNode \"${name}\") $model)"
}

########
# Main #
########

# 0. Copy in experiment dir and source settings

infoEcho "Copy $SET_PATH to current directory"
cp "$SET_PATH" .
. "$SET_BASENAME"

# 1. Launch an OpenCog server

infoEcho "Launch cogserver"
cd "$opencog_repo_path/scripts/"
./run_cogserver.sh "$build_dir_name" &
cd -
sleep 5

# 2. Load background knowledge

if [[ "$scheme_file_path" =~ ^[^/] ]]; then # It is relative
    scheme_file_path="$ROOT_DIR/$scheme_file_path"
fi

(echo "scm"; cat "$scheme_file_path") \
    | "$opencog_repo_path/scripts/run_telnet_cogserver.sh"

# 3. Create train and test data

infoEcho "Create train and test data"
if [[ "$data_path" =~ ^[^/] ]]; then # It is relative
    data_path="$ROOT_DIR/$data_path"
fi
cp "$data_path" .
data_basename="$(basename "$data_path")"
train_test_split "$data_basename" "$train_ratio" "$init_seed"
data_basename_train=${data_basename//.csv/_train.csv}
data_basename_test=${data_basename//.csv/_test.csv}

# 4. Run MOSES

infoEcho "Run MOSES"
moses_output_file=results.moses
. "$PRG_DIR/moses.sh"

# 5. Parse MOSES output and pipe it in OpenCog

infoEcho "Load MOSES models into the AtomSpace"
(echo "scm"
i=0
while read line; do
    # name each model by its row index (naming scheme assumed here)
    moses_model_name="moses_model_$i"
    echo "$(model_def "$moses_model_name" "$line")"
    ((++i))
done < "$moses_output_file"
) | "$opencog_repo_path/scripts/run_telnet_cogserver.sh"

# 6. Use PLN to perform reasoning, etc.
# TODO

# 7. Kill cogserver
```

### Mapping MOSES models into the AtomSpace (Detailed)

Here are more detailed suggestions about how to go about mapping models, their scores, and their features into AtomSpace hypergraphs.

#### Models

Models are exported in the following format:

```
EquivalenceLink <1, 1>
    PredicateNode <MODEL_PREDICATE_NAME>
    <MODEL_BODY>
```
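As a concrete illustration, a hypothetical combo model such as `and($X1 $X2)` could be exported along these lines (the model label follows the FILENAME:moses_model_INDEX naming scheme from the export script; the AndLink rendering of the combo body and the feature names are invented for this example):

```
EquivalenceLink <1, 1>
    PredicateNode "chr10_moses.5x10.csv:moses_model_0"
    AndLink
        PredicateNode "$X1"
        PredicateNode "$X2"
```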

#### Features

We need to relate GeneNodes, used in the GO description, to PredicateNodes, used in the MOSES models. For that I suggest using the predicate "overexpressed" as follows:

```
EquivalenceLink <1, 1>
    PredicateNode <GENE_NAME>
    VariableNode "$X"
        PredicateNode "overexpressed"
            GeneNode <GENE_NAME>
            VariableNode "$X"
```

which says that the PredicateNode applied to individual $X is equivalent to "GeneNode is overexpressed in individual $X".
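To make the intended semantics concrete, here is a tiny Python sketch of the equivalence (the relation contents and all names are invented for illustration; this is not OpenCog API): the gene's predicate is just the "overexpressed" relation partially applied to that gene.

```python
# Toy model of the equivalence above: a gene's predicate over
# individuals X means "this gene is overexpressed in X".
# The relation's contents are invented for illustration.
overexpressed = {("GENE_A", "x1"), ("GENE_A", "x3"), ("GENE_B", "x2")}

def gene_predicate(gene):
    """Return the predicate equivalent to 'gene is overexpressed in X'."""
    return lambda x: (gene, x) in overexpressed

GENE_A = gene_predicate("GENE_A")
print(GENE_A("x1"))  # True: GENE_A is overexpressed in individual x1
print(GENE_A("x2"))  # False
```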

#### Fitnesses

Here we will discuss two fitnesses (as used by Mike): accuracy (1 − score, in Mike's terminology) and precision, followed by recall. We will then discuss confidence.

##### Accuracy

$ACC = (TP + TN)/(P + N)$

We define an Accuracy predicate that takes a model and a dataset (or target feature) as arguments. The model, $M, is itself a predicate that evaluates to 1 (confidence is set aside for now) when the individual $X is classified positively, and 0 when it is classified negatively.

Similarly, the target feature, $D, is also a predicate that evaluates to 1 when the individual $X has its target feature active, and 0 otherwise.

```
EquivalenceLink <1, 1>
    VariableList
        $M
        $D
    PredicateNode "accuracy"
        $M
        $D
    AverageLink
        VariableList
            $M
            $D
            $X
        GetStrength
            $M
            $X
        GetStrength
            $D
            $X
```

It turns out the $TV$ on the AverageLink is going to match the accuracy, given $M and $D. Indeed, the accuracy is the average number of times the model is correct with respect to the dataset. With this representation, given the dataset and the model, PLN can directly build the Accuracy predicate.
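The claim that accuracy is the average agreement between model and dataset can be checked with a small Python sketch (the 0/1 indicator vectors are invented for illustration):

```python
# Accuracy as the average agreement between the model predicate M and
# the target feature predicate D over the dataset.
def accuracy(model, target):
    assert len(model) == len(target)
    return sum(int(m == d) for m, d in zip(model, target)) / len(model)

M = [1, 0, 1, 1, 0]  # model's classification of each individual
D = [1, 0, 0, 1, 0]  # actual target feature values
print(accuracy(M, D))  # agrees on 4 of 5 individuals -> 0.8
```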

In the absence of a dataset, and given the accuracy of each model, one may directly write down the Accuracy predicate for each model and target feature:

```
EvaluationLink <model accuracy>
    PredicateNode "accuracy"
    PredicateNode <MODEL>
    PredicateNode <TARGET FEATURE>
```
##### Precision

The cool thing about precision is that it translates directly into an Implication $TV$ strength, that is:

```
ImplicationLink <TV.s = model precision>
    PredicateNode <MODEL>
    PredicateNode <TARGET FEATURE>
```

Indeed, according to PLN (assuming all individuals are equiprobable):

$TV.s = \sum_x \min(P(x), Q(x)) / \sum_x P(x)$

where $P$ corresponds to the predicate of the model, $Q$ to the target feature predicate, and $x$ runs over the individuals of the dataset.

This indeed corresponds to the precision:

$precision = TP/(TP + FP)$

as $\sum_x P(x)$ is indeed the number of positively classified individuals, $TP + FP$, and $\sum_x \min(P(x), Q(x))$ the number of correctly classified positives, $TP$.
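This identity is easy to sanity-check in Python on invented 0/1 indicator vectors:

```python
# Check that sum_x min(P(x), Q(x)) / sum_x P(x) equals TP/(TP+FP)
# for 0/1 indicator vectors (data invented for illustration).
P = [1, 1, 1, 0, 0]  # model's positive classifications
Q = [1, 0, 1, 1, 0]  # target feature values

strength = sum(min(p, q) for p, q in zip(P, Q)) / sum(P)
tp = sum(1 for p, q in zip(P, Q) if p == 1 and q == 1)
fp = sum(1 for p, q in zip(P, Q) if p == 1 and q == 0)
print(strength, tp / (tp + fp))  # both equal 2/3
```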

##### Recall

Similarly, recall is easily translated into an Implication $TV$ strength, in the reverse direction:

```
ImplicationLink <TV.s = model recall>
    PredicateNode <TARGET FEATURE>
    PredicateNode <MODEL>
```

given that:

$recall = TP/(TP + FN)$
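The same kind of Python sanity check works for recall, with the denominator summing over the target feature predicate (0/1 indicator vectors invented for illustration):

```python
# Check that sum_x min(P(x), Q(x)) / sum_x Q(x) equals TP/(TP+FN)
# for 0/1 indicator vectors (data invented for illustration).
P = [1, 1, 1, 0, 0]  # model's positive classifications
Q = [1, 0, 1, 1, 0]  # target feature values

strength = sum(min(p, q) for p, q in zip(P, Q)) / sum(Q)
tp = sum(1 for p, q in zip(P, Q) if p == 1 and q == 1)
fn = sum(1 for p, q in zip(P, Q) if p == 0 and q == 1)
print(strength, tp / (tp + fn))  # both equal 2/3
```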

##### Confidence

The confidence can be:

$c = n/(n + k)$

where $n$ is the number of individuals, and $k$ is a parameter.
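A one-line Python sketch of this formula (the value $k = 800$ is an arbitrary illustrative choice of the parameter):

```python
# Confidence c = n/(n+k): grows toward 1 as evidence accumulates.
# k = 800 is an arbitrary illustrative choice of the parameter.
def confidence(n, k=800):
    return n / (n + k)

print(confidence(200))   # 200/1000 = 0.2
print(confidence(8000))  # 8000/8800, roughly 0.909
```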

## Quiz

1. Why would you want to export MOSES models into the AtomSpace?

 - Because it's fun
 - God will bless your first born
 - You can pattern mine the MOSES models in Atomspace
 - OpenCog requires MOSES models to exist in Atomspace for PLN to work
 - OpenCog NLP will be crippled without the laws of MOSES
 - MOSES is far too metaphysical, and needs to be grounded in a bunch of atoms in order for it to make sense
 - Well, at some stage MOSES will be built into Atomspace, so at that time you won't need to export MOSES models into Atomspace
 - Because god will punish you if you don't

2. Are there any 10 commandments for using MOSES models in the Atomspace?

 - True
 - False


## Notes

Maintained by: Nil. Priority: Medium.

A bunch of scripts to run MOSES, pipe the models into OpenCog, and apply PLN to infer new knowledge.