Using The Pattern Miner
Warning: this page needs to be updated for the new pattern miner. Meanwhile, see [1], which contains references to examples at the end.
Theory
Our Pattern Miner mines frequent and interesting patterns from the AtomSpace. It is a hypergraph pattern miner. In the abstract, pattern mining is the process of extracting an (often large) number of patterns from some body of information, subject to some criterion regarding which patterns are of interest. Often (but not exclusively) it refers to algorithms that are rapid or “greedy”, finding a large number of simple patterns relatively inexpensively.
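To make the idea of "frequent patterns" concrete, here is a minimal Python sketch. It is not the Pattern Miner's algorithm: it only counts conjunctions of concepts that share a single variable, over a hypothetical toy corpus invented for this illustration. The names CORPUS and mine_frequent are assumptions, not part of OpenCog.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: (instance, concept) pairs standing in for
# InheritanceLinks. This is NOT the tutorial corpus, only an illustration.
CORPUS = [
    ("Ann", "woman"), ("Ann", "soda drinker"),
    ("Bea", "woman"), ("Bea", "soda drinker"),
    ("Cal", "man"), ("Cal", "soda drinker"),
    ("Dan", "man"),
]


def mine_frequent(corpus, gram, min_frequency):
    """Count conjunctions of `gram` concepts that share one variable.

    A pattern such as ("soda drinker", "woman") reads as: $var_1 inherits
    from "soda drinker" AND $var_1 inherits from "woman". Its frequency is
    the number of instances that can be bound to $var_1.
    """
    # Group the concepts by the instance they are attached to.
    concepts_of = {}
    for instance, concept in corpus:
        concepts_of.setdefault(instance, set()).add(concept)

    # Every size-`gram` subset of an instance's concepts is one match.
    counts = Counter()
    for concepts in concepts_of.values():
        for pattern in combinations(sorted(concepts), gram):
            counts[pattern] += 1

    # Keep only the patterns that reach the frequency threshold.
    return {p: n for p, n in counts.items() if n >= min_frequency}
```

Calling mine_frequent(CORPUS, 2, 2) keeps only the 2-gram pattern shared by Ann and Bea; the real miner works on arbitrary hypergraph structure rather than on flat pairs like these.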
Patternism is the philosophical principle holding that, from the perspective of engineering intelligent systems, it is sufficient and useful to think about mental processes in terms of (static and dynamical) patterns.
For further info see the Pattern Miner wiki page.
To dive even deeper, read about evaluating 'interestingness' in patterns at the wiki entry on Interesting Pattern Mining.
While you are here, read about Measuring Surprisingness and how that relates to mining patterns.
And for an example of how the Pattern Miner is used on Bio data see OpenCog Based Pattern Mining and Inference on Bio Knowledge.
On the general concept of patterns and novelty, here are a couple of books that may be of interest:
- Creativity and Art - Three Roads to Surprise by Margaret A. Boden
- The Hidden Pattern by Ben Goertzel (free PDF)
Quiz
Practice
Hands On
Tutorial of running Pattern Miner in Opencog
Where to find the source code
The pattern mining core algorithm code is in:
<OPENCOG_REPO>/opencog/opencog/learning/PatternMiner/
Both a non-distributed (single-machine) Pattern Miner and a distributed Pattern Miner are implemented. The non-distributed agent is in the following file, under the above directory:
PatternMinerSCM.cc
The distributed server and client agents are in:
DistributedPatternMinerServer.h
DistributedPatternMinerServer.cc
DistributedPatternMinerClient.h
DistributedPatternMinerClient.cc
Install dependent libraries
- Install boost
You may need to install the full Boost distribution, even if you have installed part of it before. Both cpprest and the pattern miner depend on many different Boost sub-libraries, so it is best to install all of them:
sudo apt-get install libboost-all-dev
- Install cpprest
See the install instructions at https://github.com/Microsoft/cpprestsdk/wiki/How-to-build-for-Linux. Remember to run make install after make.
Steps to run a non-distributed pattern miner test
- Test corpus file:
Go to <OPENCOG_REPO>/opencog/learning/PatternMiner/ and make sure the file ugly_male_soda-drinker_corpus.scm is in this folder.
- Compile opencog.
- Start Cogserver (see Starting_cogserver to learn how to). If the Cogserver is started successfully, you should be able to see the following output:
username@xxxxx:~/opencog/build$ ./opencog/cogserver/server/cogserver -c ../lib/opencog_patternminer.conf
... ...
Listening on port 17001
- Connect to Cogserver (see Connecting_to_the_Cogserver to learn how to).
If it's connected to the Cogserver, you should be able to see the following output:
username@xxxxx:~/opencog/build$ rlwrap telnet localhost 17001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
opencog>
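If the telnet connection fails, it can help to check whether anything is listening on the port at all. The following Python sketch does only that, using the standard socket module; the function name cogserver_is_listening is an assumption made for this example, and the default port is the 17001 shown in the transcript above.

```python
import socket


def cogserver_is_listening(host="localhost", port=17001, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...)
        # when nothing accepts connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A True result only shows that some process accepts connections on that port; it does not prove that the process is the Cogserver.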
- Enter guile shell
opencog> scm
- Load pattern miner scm module
guile> (use-modules (opencog patternminer))
If you get a compile error here, pull the latest AtomSpace and run make install on it:
git pull https://github.com/opencog/atomspace.git master
- Load the example knowledge base into the AtomSpace:
guile> (load "../opencog/learning/PatternMiner/ugly_male_soda-drinker_corpus.scm")
If it is loaded successfully, the last Link in this corpus will be output in the guile shell, for example:
(InheritanceLink (stv 1 1)
  (ConceptNode "Kecia")
  (ConceptNode "soda drinker")
)
guile>
- Run pattern miner:
guile> (pm-run-patternminer)
Note: load the corpus into the AtomSpace before running this function. Mining can take a long time if the corpus is large (larger than 3M) or pattern-max-gram is larger than 3, so make sure you have enough RAM before running it.
If you get a compile error here, pull the latest AtomSpace and run make install on it:
git pull https://github.com/opencog/atomspace.git master
When the Pattern Miner starts to run, the cogserver terminal tab will output:
Debug: PatternMining start! Max gram = 3, mode = Depth_First
Using 1 threads.
Corpus size: 60 links in total.
If the test corpus is very small, mining will finish immediately; otherwise, wait until it has finished. When the mining is finished, all patterns with a frequency higher than the threshold are written to files, and the terminal will output:
Start thread 0: will process Link number from 0 to (excluded) 60
100% completed in Thread 0.
Finished mining 1~3 gram patterns.
processedLinkNum = 60
PatternMiner: mining finished!
PatternMiner: done frequent pattern mining for 1 to 3gram patterns!
gram = 1: 25 patterns found!
Debug: PatternMiner: writing (gram = 1) frequent patterns to file FrequentPatterns_1gram.scm
gram = 2: 9 patterns found!
Debug: PatternMiner: writing (gram = 2) frequent patterns to file FrequentPatterns_2gram.scm
gram = 3: 6 patterns found!
Debug: PatternMiner: writing (gram = 3) frequent patterns to file FrequentPatterns_3gram.scm
Then, if enable-interesting-pattern is set to true, it will evaluate the interestingness of all patterns with a frequency higher than the threshold. It will output:
Calculating interestingness for 2 gram patterns by evaluating surprisingness
100% completed.
PatternMiner: done (gram = 2) interestingness evaluation!
9 patterns found! Outputting to file ...
Debug: PatternMiner: writing (gram = 2) interesting patterns to file SurprisingnessI_2gram.scm
Debug: PatternMiner: writing (gram = 2) interesting patterns to file SurprisingnessII_2gram.scm
surprisingness_II_threshold for 2 gram = 0.95
Debug: PatternMiner: writing (gram = 2) final top patterns to file FinalTopPatterns_2gram.scm
It then continues with the next gram. When everything is finished, it will output:
Pattern Mining Finished! Total time: 0 seconds.
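If you save the cogserver terminal output to a file, the per-gram pattern counts can be pulled out of it mechanically. The Python sketch below assumes the log contains lines of the form "gram = N: M patterns found!" as in the sample output above; the function name patterns_found_per_gram is an assumption made for this example.

```python
import re

# Matches the frequent-pattern summary lines, e.g. "gram = 2: 9 patterns found!".
# Lines such as "writing (gram = 2) frequent patterns ..." do not match,
# because there the number is followed by ")" rather than ":".
_FOUND = re.compile(r"gram = (\d+): (\d+) patterns found!")


def patterns_found_per_gram(log_text):
    """Map each gram size to the number of frequent patterns reported."""
    return {int(gram): int(count) for gram, count in _FOUND.findall(log_text)}
```

On the sample output in this tutorial, the function would report 25, 9 and 6 patterns for grams 1, 2 and 3.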
- To check the results: go to <OPENCOG_REPO>/opencog/build/, where the following result files should have been generated:
FrequentPatterns_xgram.scm
SurprisingnessI_xgram.scm
SurprisingnessII_xgram.scm
FinalTopPatterns_xgram.scm
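A quick way to confirm that a run produced its output is to look for these file names in the build directory. The Python sketch below builds the names from the four prefixes listed above; the function name missing_result_files is an assumption made for this example. Which gram sizes get interestingness files depends on your configuration (the sample log above only shows them for 2-gram patterns), so the caller chooses which grams to check.

```python
from pathlib import Path

# File name prefixes taken from the result file list in this tutorial.
PREFIXES = (
    "FrequentPatterns",
    "SurprisingnessI",
    "SurprisingnessII",
    "FinalTopPatterns",
)


def missing_result_files(build_dir, grams):
    """Return the expected result file names that are absent from build_dir."""
    build_dir = Path(build_dir)
    return [
        f"{prefix}_{gram}gram.scm"
        for gram in grams
        for prefix in PREFIXES
        if not (build_dir / f"{prefix}_{gram}gram.scm").exists()
    ]
```

For example, missing_result_files("build", [2]) returns an empty list once all four 2-gram files are present.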
Results Samples
You can open the corpus file ugly_male_soda-drinker_corpus.scm. It's a tiny made-up corpus containing 10 men and 10 women; some are ugly, some drink soda.
Let's look at the results, taking FrequentPatterns_3gram.scm as an example. This result file contains:
Frequent Pattern Mining results for 3 gram patterns. Total pattern number: 6

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode woman)

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode woman)

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode man)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode man)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode human)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)

Pattern: Frequency = 5
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode man)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode soda drinker)
(InheritanceLink )
  (VariableNode $var_1)
  (ConceptNode ugly)
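To inspect such a result file programmatically, something like the following Python sketch may be enough. It relies only on the layout visible in the sample above ("Pattern: Frequency = N" followed by the pattern's links), it only collects ConceptNode names, and it assumes those names contain no closing parenthesis. The function name parse_frequent_patterns is an assumption made for this example.

```python
import re

_HEADER = re.compile(r"Pattern: Frequency = (\d+)")
_CONCEPT = re.compile(r"\(ConceptNode ([^)]*)\)")


def parse_frequent_patterns(text):
    """Return a list of (frequency, [concept names]) tuples, in file order."""
    # Splitting on a pattern with one capture group yields
    # [preamble, freq1, body1, freq2, body2, ...].
    parts = _HEADER.split(text)
    results = []
    for freq, body in zip(parts[1::2], parts[2::2]):
        concepts = [name.strip() for name in _CONCEPT.findall(body)]
        results.append((int(freq), concepts))
    return results
```

Applied to the file above, the first entry would be (5, ["human", "ugly", "woman"]). Patterns that contain other node types would need a richer parser.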
(You can see the related pages for more details: Pattern_Miner_Scheme_Functions)
Steps to run a distributed pattern miner test
- Prepare machines
You should have at least two machines. One with a lot of memory (like 32 GB) should be the central server; any small machine (like 2 GB of memory) can be used to run a client. Of course, if you just want to run a small test, you can still use the same corpus ugly_male_soda-drinker_corpus.scm, so that you don't need a big server machine. All your machines should be on the same LAN so that they can communicate with each other.
- Compile OpenCog on all your machines.
If cpprest is not installed, running cmake for OpenCog will output "cpprest not found. Pattern Miner will not be built." If cpprest is installed, cmake will output "cpprest is found."
- Test corpus file:
The test database is from http://wiki.dbpedia.org/, which extracts the infobox information from Wikipedia pages; it contains around half a million facts. Go to https://github.com/opencog/test-datasets/tree/master/pattern%20miner/opencog/learning/PatternMiner/, download the file dbpeida_data.zip, and unzip it. It contains 4 files:
dpedia.scm - the whole test database
dpedia_part_1.scm - the first part of dpedia.scm
dpedia_part_2.scm - the second part of dpedia.scm
dpedia_part_3.scm - the third part of dpedia.scm
Copy dpedia_part_1.scm to /opencog/learning/PatternMiner/ on your first client machine, dpedia_part_2.scm to your second client machine, and dpedia_part_3.scm to your third client machine. If you only have one client machine, just copy dpedia.scm to it. The mining client does not need a lot of memory, as long as it can load the corpus. You don't need to put any corpus file on the server machine.
- Config file in the server
Open the /opencog/build/lib/opencog_patternminer.conf file.
- Comment out the line "learning/PatternMiner/ugly_male_soda-drinker_corpus.scm"; you don't need to load any corpus on the server.
- Find "Pattern_Max_Gram" and set it to 3: 2-gram patterns are too shallow, while 4-gram patterns are too numerous to mine and you will probably run out of memory.
Pattern_Max_Gram = 3
- Find "PMCentralServerIP" and set it to your server's IP address on the LAN; the loopback address "127.0.0.1" won't work. For example:
PMCentralServerIP = "192.163.0.110"
- Config file in your clients
Open the /opencog/build/lib/opencog_patternminer.conf file on each of your client machines.
- Comment out the line "learning/PatternMiner/ugly_male_soda-drinker_corpus.scm", unless you still use this corpus as the test corpus.
- Uncomment whichever of the following lines names the corpus you want this client to mine:
# learning/PatternMiner/dpedia.scm
# learning/PatternMiner/dpedia_part_1.scm
# learning/PatternMiner/dpedia_part_2.scm
# learning/PatternMiner/dpedia_part_3.scm
- Find "Pattern_Max_Gram" and set it to 3: 2-gram patterns are too shallow, while 4-gram patterns are too numerous to mine and you will probably run out of memory. This must be the same value as in your server config file.
Pattern_Max_Gram = 3
- Find "PMCentralServerIP" and set it to the same IP address as you have set in your server config file.
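Since the server and client config files must agree on these two settings, a small consistency check can save a failed run. The Python sketch below is only an illustration: it assumes the settings appear as simple "KEY = value" lines (as in the examples above), ignores every other line, and the function names read_settings and check_distributed_configs are made up for this example. It also flags a loopback address (127.x.x.x), which machines other than the server cannot reach.

```python
def read_settings(conf_text):
    """Collect 'KEY = value' lines; skip blanks, comments and other lines."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional double quotes.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_distributed_configs(server_conf, client_conf):
    """Return a list of human-readable problems (empty if none are found)."""
    server = read_settings(server_conf)
    client = read_settings(client_conf)
    problems = []
    for key in ("Pattern_Max_Gram", "PMCentralServerIP"):
        if server.get(key) != client.get(key):
            problems.append(
                f"{key} differs: server={server.get(key)!r}, "
                f"client={client.get(key)!r}"
            )
    if server.get("PMCentralServerIP", "").startswith("127."):
        problems.append("PMCentralServerIP is a loopback address")
    return problems
```

Pass the text of the two opencog_patternminer.conf files (for instance read with open(...).read()); an empty list means the two settings match.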
- Start the pattern miner central server
On your server machine: as with the non-distributed miner, first start the Cogserver and connect to it with rlwrap telnet. When it outputs:
username@xxxxx:~/opencog/build$ rlwrap telnet localhost 17001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
opencog>
Run the pattern miner central server agent:
opencog> loadmodule opencog/learning/PatternMiner/libDistributedPatternMinerServer.so
If it is loaded successfully, you should be able to see the following output:
done
opencog>
And it should output this message in the cogserver terminal:
Pattern Miner central server started! x threads using to parse patterns.
- Start a pattern miner client
On one of your client machines: as with the non-distributed miner, first start the Cogserver and connect to it with rlwrap telnet. When it outputs:
username@xxxxx:~/opencog/build$ rlwrap telnet localhost 17001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
opencog>
Run the pattern miner client agent:
opencog> loadmodule opencog/learning/PatternMiner/libDistributedPatternMinerClient.so
If it is loaded successfully, you should be able to see the following output:
done
opencog>
And it should output a message like this in the cogserver terminal:
Current client UID = bc4c1a62-9f54-4859-bcb8-94f19fd19332
Registering to the central server: 192.168.0.3
Then, if it succeeds in connecting to the server, it will output:
Registered to the central server successfully!
Start pattern mining work! Max gram = 3, mode = Depth_First
Start thread 0 from xxxx to xxxxx
At the same time, the server terminal should output:
A new worker connected! ClientID = ClientUID=bc4c1a62-9f54-4859-bcb8-94f19fd19332
xxxx received patterns parsed...
Start all your other clients in the same way.
- Finish Mining
When a client finishes mining, it will output:
100% completed in Thread x.
Finished mining 1~x gram patterns.
processedLinkNum = xxxxxx
Totally xxxxxxx patterns found!
Current pattern mining worker finished working! Total time: xxxxxx seconds.
It will then report to the central server that this client has stopped. If the central server receives the report, the client will output the following message and quit:
Report to the central server this worker stopped successfully!
Client quited!
At the same time the server will output:
Got request to ReportWorkerStop: ClientID = ClientUID=bc4c1a62-9f54-4859-bcb8-94f19fd19332
When all the clients have reported that they finished working, the server will output:
All connected clients have finished and all the received patterns have been parsed by the server.
Enter 'y' or 'yes' to start evaluating pattern interestingness.
Enter any other words to keep waiting for more clients to connect
If you still have new clients that need to connect to the server, enter anything else. If you enter "y" or "yes", the server will start to run the interestingness evaluation and then output the result files.
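The register / report-stop / confirm sequence described above can be summarized in a few lines of Python. This is only a sketch of the bookkeeping as this tutorial describes it, not the code of DistributedPatternMinerServer; the class WorkerRegistry and the function should_start_evaluation are names invented for this example.

```python
class WorkerRegistry:
    """Tracks which clients are still mining and which have reported a stop."""

    def __init__(self):
        self._running = set()
        self._finished = set()

    def register(self, client_uid):
        # A (re)connecting client counts as running again.
        self._running.add(client_uid)
        self._finished.discard(client_uid)

    def report_stop(self, client_uid):
        """Record a stop report; return False for an unknown client."""
        if client_uid not in self._running:
            return False
        self._running.remove(client_uid)
        self._finished.add(client_uid)
        return True

    def all_finished(self):
        # True only if at least one client has finished and none is running.
        return bool(self._finished) and not self._running


def should_start_evaluation(answer):
    """Mirror the prompt: only 'y' or 'yes' starts the evaluation."""
    return answer.strip() in ("y", "yes")
```

In this sketch, registering a new client after all others have finished makes all_finished() false again, which corresponds to answering the prompt with something other than "y" or "yes" and waiting for more clients.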
- Continue mining after network problems
If a network problem occurs, the mining should continue automatically once the network has recovered, as long as you do not shut down the server and client processes.
Info
People who look after this page: Adam, Shujing or Mandeep
Partly adapted from : Tutorial_of_running_Pattern_Miner_in_Opencog
Priority: Medium?