Pattern miner

From OpenCog
(Redirected from Pattern Miner)
Jump to: navigation, search

Introduction

The Pattern Miner mines frequent and interesting patterns from Atomspace. It's a hypergraph pattern miner. It can be seen as the invert function of pattern matching, that is given a dataset of grounding atoms, find a pattern matcher query that would return a frequent and interesting enough collection of matches.

What is a Pattern?

A pattern is a connected set of links and nodes with variables that occurs in the Atomspace repeatedly for multiple times. For example: Americans drink Coke:

(Lambda
   (Variable "$X")
   (And
      (Evaluation
         (Predicate "drink") 
         (List 
            (VariableNode "$X") 
            (ConceptNode “Coke")))
      (InheritanceLink
         (VariableNode "$var_1") 
         (ConceptNode "American"))))

If many instances can be found in the Atomspace that fit this pattern, then this is a frequent pattern. Such instances would be like:

(EvaluationLink
   (PredicateNode "drink") 
   (ListLink 
      (VariableNode "Ben") 
      (ConceptNode “Coke"))) 
(InheritanceLink
   (VariableNode "Ben") 
   (ConceptNode "American"))
...
(EvaluationLink
   (PredicateNode "drink") 
   (ListLink 
      (VariableNode "Andrew") 
      (ConceptNode “Coke"))) 
(InheritanceLink
   (VariableNode "Andrew") 
   (ConceptNode "American"))
...

Pattern Gram or Conjunct

  • 1-gram/1-conjunct pattern contains 1 clause.
  • 2-gram/2-conjunct pattern contains 2 clauses.
  • N-gram/N-conjunct pattern contains n clauses.
  • Gram indicates the number of clauses of a pattern.

Pattern Mining Algorithm

The pattern mining is implemented on top of the URE, so finding patterns is framed as a form of reasoning to justify the interestingness of patterns. The algorithm itself is still greedy in nature. First, frequent 1-conjunct patterns are produced

Extract frist gram patterns.jpg

then combined to produce N-conjunct patterns

Pattern growing.png

See [1] for an indept presentation of the algorithm.

Interesting Pattern Mining

Most of frequent patterns are boring. We use a surprisingness measure to find interesting patterns. For now only I-Surprisingness is implemented, see Interesting Pattern Mining and Measuring Surprisingness, in the future a broader measure of surprisingness, realizing the notion of surprisingness as what is contrary to expectation, will be implemented, see [2]

Usage and Examples

Use Cases

Here's a list of pattern miner use cases. This is handy to drive the pattern miner development, its API, etc. Ideally each of these should be turned into examples put under [3], as well as unit tests.

  • Mining of inference histories, for inference control
  • Mining of dialogue histories, for learning dialogue patterns (or more generally, verbal/nonverbal interaction patterns)
  • Mining of sets of genomic datasets or medical patient records, to find surprisingly common combinations of features
  • Mining of surprising combinations of visual features in the output of a relatively "disentangled" deep NN (such as the pyramid-of-InfoGANs that Ralf, Selameab, Tesfa, Yenat and I are working on)
  • Mining of surprising combinations of semantic relationships, in the R2L output of a large number of simple sentences read into Atomspace
  • Mining of surprising combinations of syntactic relationships, in an Atomspace containing a set of syntactic relationships corresponding to each word in the dictionary of a given language (to be done iteratively within the language learning algorithm Linas is implementing)
  • Mining of surprising (link-parser link combination, Lojban-Atomese-output combination) pairs, in a corpus of (link parses, Lojban-Atomese outputs) obtained from a parallel (English, Lojban) corpus

See Pattern_Miner_Prospective_Examples for detailed suggestions on some use cases.