MOSES for Noobs - Flowers

From OpenCog

Description

This page will show you how to use MOSES to classify various subtypes of the Iris flower.

Before you start, you may want to look at the MOSES terminology and browse through the MOSES man page. Working through the MOSES for Noobs tutorial and its follow-up, MOSES for Noobs - Logic Gates, may also be useful.

Prerequisites

This page has the following prerequisites:


A basic video tutorial on using MOSES, featuring Nil Geisweiller, is here

Contents

This tutorial builds on the previous MOSES for Noobs tutorials by letting MOSES classify different types of Iris flower. Biologists measured 4 features of 150 flowers, each belonging to one of three subgroups of the Iris flower (50 flowers per subgroup). We're going to let MOSES learn which of these features distinguish one subgroup from the others. For more information about the features recorded in the dataset and some examples of the flowers, see the original wiki page.

The original unprepared data file can be found here. To run MOSES for all 3 subgroups, you can use the modified files IrisSetosa.txt, IrisVersicolor.txt and IrisVirginica.txt.

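Let's first look at a few lines from the original iris.data. The four measurements are sepal length (SL), sepal width (SW), petal length (PL) and petal width (PW), all in centimetres.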

SL      SW      PL      PW      CLASS
5.1     3.5     1.4     0.2     Iris-setosa
4.9     3       1.4     0.2     Iris-setosa
4.7     3.2     1.3     0.2     Iris-setosa
4.6     3.1     1.5     0.2     Iris-setosa

As you can see, the target feature (CLASS) is in the last column of the data file rather than the first. Since MOSES expects the target feature to be in the first column by default, we'll have to name the target column on the command line with the -u flag.

Also, the original data file sorts the entries into 3 subgroups (Iris-setosa, Iris-versicolor and Iris-virginica). The three prepared data files have already been changed so that each one distinguishes only two groups. In IrisSetosa.txt, every instance of 'Iris-setosa' has been changed to '1', and every instance of 'Iris-versicolor' and 'Iris-virginica' to '0'. This way MOSES will hopefully find a function that separates Setosa from the rest. The other two files do the same for their own target subgroup.
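If you would rather prepare the binary files yourself, the relabelling described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the UCI layout of iris.data (comma-separated, class label in the last column). The output separator and header names are also assumptions, so compare the result with the provided IrisSetosa.txt before relying on it.

```python
import csv

def read_iris(path):
    """Read iris.data rows as [SL, SW, PL, PW, CLASS] string lists.

    Assumes comma-separated values; blank lines and a header row
    (last field 'CLASS'), if present, are skipped.
    """
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row and row[-1] != "CLASS"]

def binarize(rows, positive_class):
    """Replace the CLASS value with '1' for positive_class and '0' otherwise."""
    return [row[:4] + ["1" if row[4].strip() == positive_class else "0"] for row in rows]

def write_binary_file(rows, positive_class, out_path):
    # Assumed output format: comma-separated with a header row, so that
    # -W1 and -u CLASS can refer to the column names.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["SL", "SW", "PL", "PW", "CLASS"])
        writer.writerows(binarize(rows, positive_class))
```

For example, `write_binary_file(read_iris("iris.data"), "Iris-setosa", "Flowers/MyIrisSetosa.txt")` would write a Setosa-vs-rest file under a hypothetical new name, leaving the provided files untouched.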

Running MOSES

See the above section for the required data files.

To let MOSES process a file, put it somewhere MOSES can find it and pass its path on the command line. Assuming you put it in a subfolder called Flowers, the command would be:

moses -i Flowers/IrisSetosa.txt -W1 -u CLASS

The -W1 parameter tells MOSES to use the column names found in the file when printing its output, rather than referring to the input and output variables by index. The -u parameter specifies which column contains the target feature, in our case the CLASS column.
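To run all three files in one go, a small Python driver can build the same command line for each file and collect the output. This is a sketch: the function names are hypothetical, the Flowers/ paths follow the assumption above, and run_all needs the moses binary on your PATH. It uses only the -i, -W1 and -u flags from this tutorial.

```python
import subprocess

# Assumed layout: the three prepared files live in a Flowers/ subfolder.
DATA_FILES = [
    "Flowers/IrisSetosa.txt",
    "Flowers/IrisVersicolor.txt",
    "Flowers/IrisVirginica.txt",
]

def moses_command(path, target="CLASS"):
    """Build the command line used in this tutorial for one data file."""
    return ["moses", "-i", path, "-W1", "-u", target]

def run_all(files=DATA_FILES):
    """Run MOSES once per file and return a dict {path: printed output}."""
    results = {}
    for path in files:
        # check=True raises CalledProcessError if moses exits with an error.
        proc = subprocess.run(moses_command(path), capture_output=True, text=True, check=True)
        results[path] = proc.stdout
    return results
```

Keeping the command construction in its own function makes it easy to add further MOSES options later without touching the loop.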

Understanding the output

The output should be something like

0 not(0<(+(-1 *($PW $PW) $PW))) 
0 not(0<(+(-1 *(+(1 $PW) $PW)))) 
-1 not(0<(+(-0.5 $PW))) 
-1 not(0<(+(-1 $PW $PW))) 
-1 not(0<(+(-1 *(2 $PW)))) 
-50 not(0<($SL)) 
-50 not(0<($SW)) 
-50 not(0<($PL)) 
-50 not(0<($PW)) 
-50 and(not(0<($SL)) 0<($SW)) 

As we learned before, the functions with the best score appear first. In this case two functions achieve a perfect score of 0, and they are really the same expression, since (1 + PW) * PW equals PW squared plus PW. What MOSES is saying here is that if PW squared plus PW minus 1 is not greater than zero, the flower can be classified as Iris-setosa!
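A quick way to check this reading is to evaluate the combo expression by hand. Note that not(0<(x)) means "x is at most 0", not strictly "below zero". The sketch below is a manual Python translation (the names are hypothetical, not part of MOSES). It also shows that, for non-negative PW, the rule is just a petal-width threshold, because PW squared plus PW minus 1 has its positive root at (sqrt(5) - 1) / 2, about 0.618.

```python
import math

def setosa_combo(pw):
    """Hand translation of not(0<(+(-1 *($PW $PW) $PW))); True means Setosa."""
    return not (0 < -1 + pw * pw + pw)

# Positive root of PW^2 + PW - 1 = 0; for PW >= 0 the rule is PW <= this value.
PW_THRESHOLD = (math.sqrt(5) - 1) / 2  # about 0.618

def setosa_threshold(pw):
    """Equivalent single-threshold form of the same rule (for PW >= 0)."""
    return pw <= PW_THRESHOLD

# The four rows shown earlier as (SL, SW, PL, PW); all are Iris-setosa.
rows = [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2), (4.6, 3.1, 1.5, 0.2)]
print([setosa_combo(pw) for _, _, _, pw in rows])  # [True, True, True, True]
```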

Let's try the other files too. This one may run a little longer.

moses -i Flowers/IrisVirginica.txt -W1 -u CLASS

This gave the output:

-4 not(0<(+(*(-1 $PL $PW) $SL $PW))) 
-4 not(0<(+(*(+(1 *(-1 $PL)) $PW) $SL))) 
-4 and(not(0<(+(*(-1 $PL $PW) $SL $PW))) 0<($SL)) 
-4 and(not(0<(+(*(-1 $PL $PW) $SL $PW))) 0<($SW)) 
-4 and(not(0<(+(*(-1 $PL $PW) $SL $PW))) 0<($PL)) 
-4 and(not(0<(+(*(-1 $PL $PW) $SL $PW))) 0<($PW)) 
-4 or(not(0<(+(*(-1 $PL $PW) $SL $PW))) not(0<($SL))) 
-4 or(not(0<(+(*(-1 $PL $PW) $SL $PW))) not(0<($SW))) 
-4 or(not(0<(+(*(-1 $PL $PW) $SL $PW))) not(0<($PL))) 
-4 or(not(0<(+(*(-1 $PL $PW) $SL $PW))) not(0<($PW))) 

Sadly, there is no perfect score this time, but a score of -4 (4 of the 150 records misclassified, a 97.3% success rate) is still pretty good.
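The success rate follows from reading the score as minus the number of misclassified records, which is how this tutorial interprets it. The sketch below (hypothetical helper names) does that arithmetic and also hand-translates the top Virginica function, which says a flower is Iris-virginica when SL + PW - PL * PW is not greater than zero.

```python
def accuracy_from_score(score, n_records=150):
    """Fraction correct, assuming score = -(number of misclassified records)."""
    return (n_records + score) / n_records

def virginica_combo(sl, sw, pl, pw):
    """Hand translation of not(0<(+(*(-1 $PL $PW) $SL $PW))).

    sw is unused by this function; it is kept so all rules take the same
    (SL, SW, PL, PW) arguments as the data file.
    """
    return not (0 < -1 * pl * pw + sl + pw)

print(round(accuracy_from_score(-4), 3))  # 0.973
```

The same helper gives 101/150 for the Versicolor score of -49 further down.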

Finally let's run through IrisVersicolor too:

moses -i Flowers/IrisVersicolor.txt -W1 -u CLASS

This gives us:

-49 not(0<(+(-3 $SW))) 
-49 not(0<(+(-2 $SW))) 
-49 not(0<(+(-1.5 *(0.5 $SW)))) 
-49 not(0<(+(-1 *(0.333333 $SW)))) 
-49 not(0<(+(-1 *(0.5 $SW)))) 
-49 and(not(0<(+(-3 $SW))) 0<($SL)) 
-49 and(not(0<(+(-3 $SW))) 0<($SW)) 
-49 and(not(0<(+(-3 $SW))) 0<($PL)) 
-49 and(not(0<(+(-3 $SW))) 0<($PW)) 
-49 and(not(0<(+(-2 $SW))) 0<($SL)) 

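This gives a less satisfying result. All of these functions are simple thresholds on sepal width (SW at most 3, or at most 2), and the best of them classify only 101 of the 150 input records correctly. That is barely better than a function that always answers 0, which gets the 100 non-Versicolor records right (compare the -50 scores in the Setosa output above). In practice, since the other 2 subgroups did yield satisfying functions, we could classify any record that matches neither of them as Versicolor.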
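That fallback strategy can be sketched as a small three-way classifier built from the two good functions found earlier, translated by hand into Python. The function names are hypothetical. Setosa is checked first because its function scored perfectly, so if a record ever satisfied both rules the Setosa answer wins; the fallback also inherits the Virginica function's 4 errors.

```python
def is_setosa(sl, sw, pl, pw):
    # Best Setosa function from above: PW*PW + PW - 1 <= 0
    return not (0 < -1 + pw * pw + pw)

def is_virginica(sl, sw, pl, pw):
    # Best Virginica function from above: SL + PW - PL*PW <= 0
    return not (0 < -1 * pl * pw + sl + pw)

def predict_species(sl, sw, pl, pw):
    """Check Setosa, then Virginica; anything left over is labelled Versicolor."""
    if is_setosa(sl, sw, pl, pw):
        return "Iris-setosa"
    if is_virginica(sl, sw, pl, pw):
        return "Iris-virginica"
    return "Iris-versicolor"

print(predict_species(5.1, 3.5, 1.4, 0.2))  # Iris-setosa
```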

Next Steps

Of course, the functions we found above have not been validated; we could just be overfitting! MOSES does come with an evaluation command, eval-table, but using it for validation requires writing a wrapper that splits the dataset into folds and tests each train/test pair. This will be the subject of the next MOSES tutorial.
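The fold-splitting part of such a wrapper is straightforward. The sketch below (hypothetical function name, plain Python) produces k train/test pairs in which every record lands in exactly one test fold. Each train part would then be written to a file for moses, and the resulting function scored on the matching test part; the exact eval-table invocation is not shown here, so check its man page. This is plain shuffled k-fold; a stratified split that keeps the 1/0 ratio equal in every fold would be a reasonable refinement.

```python
import random

def k_fold_pairs(rows, k=10, seed=0):
    """Yield (train, test) row lists for k-fold cross-validation.

    Rows are shuffled once with a fixed seed so the folds are reproducible,
    then dealt round-robin into k folds.
    """
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set: every fold except the current test fold.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```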

Q&A

Any questions?

Then please leave them here.