Using The Pattern Miner

From OpenCog

Theory

The Pattern Miner mines frequent and interesting patterns from the AtomSpace. It is a hypergraph pattern miner. In the abstract, pattern mining is the process of extracting an (often large) number of patterns from some body of information, subject to some criterion regarding which patterns are of interest. Often (but not exclusively) the term refers to algorithms that are rapid or “greedy”, finding a large number of simple patterns relatively inexpensively.
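
As a toy illustration of a frequency criterion, the sketch below counts how many distinct individuals share each concept and keeps only the concepts that reach a minimum count. It is not the Pattern Miner's algorithm and does not touch the AtomSpace; the function name frequent_concepts, the flat "individual concept" input format, and the underscore in soda_drinker are all assumptions made for the example.

```shell
#!/bin/sh
# Toy frequency count, NOT the Pattern Miner's algorithm.
# Input (stdin): lines of "<individual> <concept>", one pair per line.
# Output: "<concept> <count>" for every concept shared by at least
# <min_frequency> distinct individuals, sorted by concept name.
frequent_concepts() {
    min_frequency=$1
    # sort -u drops duplicate pairs so each individual is counted once.
    sort -u | awk -v min="$min_frequency" '
        { count[$2]++ }
        END {
            for (concept in count)
                if (count[concept] >= min)
                    print concept, count[concept]
        }' | sort
}
```

For example, piping a few pairs into `frequent_concepts 2` prints only the concepts that at least two individuals inherit from. The real miner works on hypergraph patterns with variables rather than on flat pairs.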

Patternism is the philosophical principle holding that, from the perspective of engineering intelligent systems, it is sufficient and useful to think about mental processes in terms of (static and dynamical) patterns.

For further info see the Pattern Miner wiki page.

To dive even deeper, read about evaluating 'interestingness' in patterns at the wiki entry on Interesting Pattern Mining.

While you are here, read about Measuring Surprisingness and how that relates to mining patterns.

And for an example of how the Pattern Miner is used on Bio data see OpenCog Based Pattern Mining and Inference on Bio Knowledge.

On the general concept of patterns and novelty, here are a couple of books that may be of interest:

Quiz

1. What aspect of OpenCog does the Pattern Miner mine for interesting patterns?

'Moses'.
'Atomspace'.
'Head'
'The Singularity'

2. What is Patternism?

'from the perspective of engineering intelligent systems, it is sufficient and useful to think about mental processes in terms of (static and dynamical) patterns'.
'The creed of a cult of pattern worshipers who sit around smoking mandalas and excreting crossword puzzles'.
'A futurological principle of shnovelty which predicts that the future of life in the universe will seek to enlarge their wave functions by surfing the skin of super massive blackholes'.

3. What is a good way of identifying 'interestingness'?

'Reading Godel, Escher, Bach backwards'.
'As an omen which foreshadows degrees of fun'.
'Through evaluating "interaction information" and "Measuring Surprisingness"'.
'Watching Adam Ford's YouTube videos'.
'Watching the Kenny Everett video show and pressing the button marked idiot to see what comes out'.


Practice

Hands On

Tutorial on running the Pattern Miner in OpenCog

Where to find the source code

The pattern mining core algorithm code is in:

<OPENCOG_REPO>/opencog/opencog/learning/PatternMiner/

Both a non-distributed (single-machine) Pattern Miner and a distributed Pattern Miner are implemented. The non-distributed agent is in the following file under the above directory:

PatternMinerSCM.cc

The distributed server and client agents are in:

DistributedPatternMinerServer.h
DistributedPatternMinerServer.cc
DistributedPatternMinerClient.h
DistributedPatternMinerClient.cc

Install dependent libraries

  • Install boost

You may need to install the full Boost distribution, even if you have already installed part of it. Both cpprest and the Pattern Miner depend on many different Boost sub-libraries, so it is best to install the full set.

sudo apt-get install libboost-all-dev
  • Install cpprest

See the install instructions at https://github.com/Microsoft/cpprestsdk/wiki/How-to-build-for-Linux. Remember to run make install after make.
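
Before compiling, it can help to confirm that the headers of both libraries are visible. The following is a minimal sketch and not part of OpenCog: the function names have_header and check_pattern_miner_deps are made up for illustration, and the include directories you pass in (for example /usr/include and /usr/local/include) are assumptions about where apt-get and make install placed the files.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a header exists under any of the given
# include directories. Usage: have_header <relative/header> <dir>...
have_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            return 0
        fi
    done
    return 1
}

# Report whether one well-known header of Boost and of cpprest can be found
# in the include directories given as arguments.
check_pattern_miner_deps() {
    for header in boost/version.hpp cpprest/http_client.h; do
        if have_header "$header" "$@"; then
            echo "found:   $header"
        else
            echo "missing: $header"
        fi
    done
}
```

A call such as `check_pattern_miner_deps /usr/include /usr/local/include` prints one line per library. A "missing" line only means the header is not in the directories you listed; cmake remains the authoritative check.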

Steps to run a non-distributed pattern miner test

  • Test corpus file:

Go to <OPENCOG_REPO>/opencog/opencog/learning/PatternMiner/ and make sure the file ugly_male_soda-drinker_corpus.scm is in this folder.

  • Compile OpenCog.
  • Start the Cogserver (see Starting_cogserver to learn how). If the Cogserver starts successfully, you should see the following output:
username@xxxxx:~/opencog/build$ ./opencog/cogserver/server/cogserver -c ../lib/opencog_patternminer.conf 
...
...
Listening on port 17001

Then connect to the Cogserver with rlwrap telnet. If the connection succeeds, you should see the following output:

username@xxxxx:~/opencog/build$ rlwrap telnet localhost 17001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
opencog> 
  • Enter guile shell
opencog> scm
  • Load pattern miner scm module
guile> (use-modules (opencog patternminer))

If you get a compile error here, pull the latest AtomSpace and run make install on it: git pull https://github.com/opencog/atomspace.git master

  • Load the example knowledge base into the AtomSpace:
guile> (load "../opencog/learning/PatternMiner/ugly_male_soda-drinker_corpus.scm")

If it is loaded successfully, the last Link in this corpus will be output in the guile shell, for example:

(InheritanceLink (stv 1 1)
   (ConceptNode "Kecia")
   (ConceptNode "soda drinker")
)

guile>
  • Run pattern miner:
guile> (pm-run-patternminer)

Note: load the corpus into the AtomSpace before running this function. Mining can take a long time if the corpus is large (larger than 3M) or pattern-max-gram is larger than 3, so make sure you have enough RAM before running it. If you get a compile error here, pull the latest AtomSpace and run make install on it: git pull https://github.com/opencog/atomspace.git master

When the Pattern Miner starts to run, the cogserver terminal tab will output:

Debug: PatternMining start! Max gram = 3, mode = Depth_First
Using 1 threads. 
Corpus size: 60 links in total. 

If the test corpus is very small, mining will finish immediately. Otherwise, wait until it has finished. When the mining is finished, all the patterns with a frequency higher than the threshold are written to files, and the terminal will output:

Start thread 0: will process Link number from 0 to (excluded) 60
100% completed in Thread 0.d 0.
Finished mining 1~3 gram patterns.

processedLinkNum = 60
PatternMiner:  mining finished!
PatternMiner:  done frequent pattern mining for 1 to 3gram patterns!
gram = 1: 25 patterns found! 
Debug: PatternMiner: writing  (gram = 1) frequent patterns to file FrequentPatterns_1gram.scm

gram = 2: 9 patterns found! 
Debug: PatternMiner: writing  (gram = 2) frequent patterns to file FrequentPatterns_2gram.scm

gram = 3: 6 patterns found! 
Debug: PatternMiner: writing  (gram = 3) frequent patterns to file FrequentPatterns_3gram.scm

Then, if enable-interesting-pattern is set to true, the miner starts to evaluate the interestingness of all the patterns with a frequency higher than the threshold. It will output:

Calculating interestingness for 2 gram patterns by evaluating surprisingness
100% completed.PatternMiner:  done (gram = 2) interestingness evaluation!9 patterns found! Outputting to file ... 
Debug: PatternMiner: writing (gram = 2) interesting patterns to file SurprisingnessI_2gram.scm

Debug: PatternMiner: writing (gram = 2) interesting patterns to file SurprisingnessII_2gram.scm
surprisingness_II_threshold for 2 gram = 0.95
Debug: PatternMiner: writing (gram = 2) final top patterns to file FinalTopPatterns_2gram.scm

It then continues with the next gram. When everything is finished, it will output:

Pattern Mining Finished! Total time: 0 seconds. 
  • To check the results: go to <OPENCOG_REPO>/opencog/build/, where the following result files should have been generated:
FrequentPatterns_xgram.scm
SurprisingnessI_xgram.scm
SurprisingnessII_xgram.scm
FinalTopPatterns_xgram.scm
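
To see at a glance which result files a run produced, a small helper along these lines can be used. It is only a sketch built on the file names listed above; the function name report_result_files is made up for illustration. An "absent" line is not necessarily an error: in the sample run above, the interestingness files are shown only for 2-gram patterns, and they are only written when enable-interesting-pattern is true.

```shell
#!/bin/sh
# List which of the expected result files exist in a build directory.
# Usage: report_result_files <build_dir> <max_gram>
report_result_files() {
    build_dir=$1
    max_gram=$2
    gram=1
    while [ "$gram" -le "$max_gram" ]; do
        for prefix in FrequentPatterns SurprisingnessI SurprisingnessII FinalTopPatterns; do
            file="${prefix}_${gram}gram.scm"
            if [ -f "$build_dir/$file" ]; then
                echo "present $file"
            else
                echo "absent  $file"
            fi
        done
        gram=$((gram + 1))
    done
}
```

For the run shown above, this would be called from the build directory as `report_result_files . 3`.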

Results Samples

You can open the corpus file ugly_male_soda-drinker_corpus.scm. It is a tiny made-up corpus containing 10 men and 10 women; some are ugly, some drink soda. Let's look at the results, taking FrequentPatterns_3gram.scm as an example. The contents of this result file are:

Frequent Pattern Mining results for 3 gram patterns. Total pattern number: 6

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode woman)



Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode woman)



Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode man)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)



Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode man)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)



Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)


Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode man)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)

(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)

(For more details, see the related page Pattern_Miner_Scheme_Functions.)
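
A quick sanity check on a result file is to count its "Pattern: Frequency" entries and compare that with the total declared in the header line. The sketch below assumes the file on disk looks like the excerpt above (a "Total pattern number: N" header and one "Pattern: Frequency = N" line per pattern); the function names are made up for illustration.

```shell
#!/bin/sh
# Count the "Pattern: Frequency = N" entries in a result file.
count_patterns() {
    grep -c '^Pattern: Frequency = ' "$1"
}

# Print the number declared in the "Total pattern number: N" header line.
declared_total() {
    sed -n 's/.*Total pattern number: \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print "<frequency> <number of patterns with that frequency>" per line,
# ordered by frequency.
frequency_histogram() {
    sed -n 's/^Pattern: Frequency = \([0-9][0-9]*\).*/\1/p' "$1" |
        sort -n | uniq -c | awk '{ print $2 " " $1 }'
}
```

If `count_patterns FrequentPatterns_3gram.scm` and `declared_total FrequentPatterns_3gram.scm` disagree, the file format probably differs from the excerpt and the patterns in these helpers need adjusting.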

Steps to run a distributed pattern miner test

  • Prepare machines

You should have at least two machines: one with a lot of memory (such as 32 GB) to act as the central server, and any small machine (such as 2 GB of memory) to run the clients. Of course, if you just want to run a small test, you can still use the same corpus, ugly_male_soda-drinker_corpus.scm, in which case you don't need a big server machine. All your machines should be on the same LAN so that they can communicate with each other.

  • Compile OpenCog on all your machines.

If cpprest is not installed, running cmake for OpenCog will output "cpprest not found. Pattern Miner will not be built." If cpprest is installed, cmake will output "cpprest is found."

  • Test corpus file:

The test database is from http://wiki.dbpedia.org/, which extracts the infobox information from Wikipedia pages; it contains around half a million facts. Go to https://github.com/opencog/test-datasets/tree/master/pattern%20miner/opencog/learning/PatternMiner/, download the file dbpeida_data.zip, and unzip it. It contains 4 files:

dpedia.scm  - the whole test database
dpedia_part_1.scm - the first part of dpedia.scm 
dpedia_part_2.scm - the second part of dpedia.scm 
dpedia_part_3.scm - the third part of dpedia.scm 

Copy dpedia_part_1.scm to /opencog/learning/PatternMiner/ on your first client machine, dpedia_part_2.scm to your second client machine, and dpedia_part_3.scm to your third client machine. If you only have one client machine, just copy dpedia.scm to it. The mining client will not use a lot of memory, as long as it can load the corpus. You don't need to put any corpus file on the server machine.

  • Config file in the server

Open the /opencog/build/lib/opencog_patternminer.conf file.

    • Comment out the line "learning/PatternMiner/ugly_male_soda-drinker_corpus.scm"; you don't need to load any corpus on the server.
    • Find "Pattern_Max_Gram" and set it to 3: 2-gram patterns are too shallow, while 4-gram patterns are too numerous to mine and you will probably run out of memory.
Pattern_Max_Gram = 3
    • Find "PMCentralServerIP" and set it to your server's IP address; the loopback address "127.0.0.1" won't work. For example:
PMCentralServerIP = "192.163.0.110"
  • Config file in your clients

Open the /opencog/build/lib/opencog_patternminer.conf file on each of your client machines.

    • Comment out the line "learning/PatternMiner/ugly_male_soda-drinker_corpus.scm", unless you still use this corpus as the test corpus.
    • Uncomment whichever of the following lines names the corpus you want to run on this client:
#                        learning/PatternMiner/dpedia.scm
#                        learning/PatternMiner/dpedia_part_1.scm
#                        learning/PatternMiner/dpedia_part_2.scm
#                        learning/PatternMiner/dpedia_part_3.scm
    • Find "Pattern_Max_Gram" and set it to 3: 2-gram patterns are too shallow, while 4-gram patterns are too numerous to mine and you will probably run out of memory. This must match the value in your server config file.
Pattern_Max_Gram = 3
    • Find "PMCentralServerIP" and set it to the same IP address that you set in your server config file.
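
Because the same two settings have to be changed on the server and on every client, the edit can be scripted. The helper below is a sketch under the assumption that each setting sits on its own uncommented line in the "KEY = VALUE" form shown above; the function name set_conf_value is made up for illustration, and values containing "|" or "&" are not supported.

```shell
#!/bin/sh
# Replace the value of an existing, uncommented "KEY = VALUE" line.
# Usage: set_conf_value <conf_file> <key> <value>
# The result is written through a temporary file, so the helper does not
# rely on any particular "sed -i" behaviour.
set_conf_value() {
    conf_file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp) || return 1
    sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

For example, `set_conf_value /opencog/build/lib/opencog_patternminer.conf Pattern_Max_Gram 3` sets the gram limit, and passing PMCentralServerIP with a value such as '"<your server IP>"' (including the double quotes) sets the server address. Commenting and uncommenting the corpus lines is left as a manual step.
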
  • Start the pattern miner central server

On your server machine: as with the non-distributed miner, first start the Cogserver and connect to it with rlwrap telnet. When it outputs:

username@xxxxx:~/opencog/build$ rlwrap telnet localhost 17001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
opencog> 

Run pattern miner central server agent:

opencog> loadmodule opencog/learning/PatternMiner/libDistributedPatternMinerServer.so

If it is loaded successfully, you should be able to see the following output:

done
opencog> 

And it should output this message in the cogserver terminal:

Pattern Miner central server started! x threads using to parse patterns.
  • Start a pattern miner client

On one of your client machines: as with the non-distributed miner, first start the Cogserver and connect to it with rlwrap telnet. When it outputs:

username@xxxxx:~/opencog/build$ rlwrap telnet localhost 17001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
opencog> 

Run pattern miner client agent:

opencog> loadmodule opencog/learning/PatternMiner/libDistributedPatternMinerClient.so

If it is loaded successfully, you should be able to see the following output:

done
opencog> 

And it should output a message like this in the cogserver terminal:

Current client UID = bc4c1a62-9f54-4859-bcb8-94f19fd19332
Registering to the central server: 192.168.0.3

And then if it succeeded in connecting to the server, it will output:

Registered to the central server successfully! 
Start pattern mining work! Max gram = 3, mode = Depth_First
Start thread 0 from xxxx  to xxxxx 

At the same time, the server terminal should output:

A new worker connected! ClientID = 
ClientUID=bc4c1a62-9f54-4859-bcb8-94f19fd19332
xxxx received patterns parsed...

Start all your other clients in the same way.

  • Finish Mining

When a client finishes mining, it will output:

100% completed in Thread x.
Finished mining 1~x gram patterns.

processedLinkNum = xxxxxx 
Totally xxxxxxx patterns found!
Current pattern mining worker finished working! Total time: xxxxxx seconds. 

It will then report to the central server that this client has stopped. If the central server receives the report, the client will output the following message and quit:

Report to the central server this worker stopped successfully! 
Client quited!

At the same time the server will output:

Got request to ReportWorkerStop: ClientID = 
ClientUID=bc4c1a62-9f54-4859-bcb8-94f19fd19332

When all the clients have reported that they have finished working, the server will output:

All connected clients have finished and all the received patterns have been parsed by the server.
Enter 'y' or 'yes' to start evaluating pattern interestingness.
Enter any other words to keep waiting for more clients to connect

If you still have new clients that need to connect to the server, enter anything else. If you enter "y" or "yes", the server will start to run the interestingness evaluation and then output the result files.

  • Continue mining after network problems

If a network problem occurs, the mining should continue automatically once the network recovers, as long as you do not shut down the server and client processes.

Info

People who look after this page: Adam, Shujing or Mandeep

Partly adapted from : Tutorial_of_running_Pattern_Miner_in_Opencog

Priority: Medium?