# Pattern miner

(Redirected from Pattern mining)

## Introduction

The Pattern Miner mines frequent and interesting patterns from Atomspace. It's a hypergraph pattern miner. It can be seen as the invert function of pattern matching, that is given a dataset of grounding atoms, find a pattern matcher query that would return a frequent and interesting enough collection of matches.

### What is a Pattern?

A pattern is a connected set of links and nodes with variables that occurs in the Atomspace repeatedly for multiple times. For example: Americans drink Coke:

```(Lambda
(Variable "\$X")
(And
(Evaluation
(Predicate "drink")
(List
(VariableNode "\$X")
(ConceptNode “Coke")))
(VariableNode "\$var_1")
(ConceptNode "American"))))
```

If many instances can be found in the Atomspace that fit this pattern, then this is a frequent pattern. Such instances would be like:

```(EvaluationLink
(PredicateNode "drink")
(VariableNode "Ben")
(ConceptNode “Coke")))
(VariableNode "Ben")
(ConceptNode "American"))
...
(PredicateNode "drink")
(VariableNode "Andrew")
(ConceptNode “Coke")))
(VariableNode "Andrew")
(ConceptNode "American"))
...
```

### Pattern Gram or Conjunct

• 1-gram/1-conjunct pattern contains 1 clause.
• 2-gram/2-conjunct pattern contains 2 clauses.
• N-gram/N-conjunct pattern contains n clauses.
• Gram indicates the number of clauses of a pattern.

### Pattern Mining Algorithm

The pattern mining is implemented on top of the URE, so finding patterns is framed as a form of reasoning to justify the interestingness of patterns. The algorithm itself is still greedy in nature. First, frequent 1-conjunct patterns are produced

then combined to produce N-conjunct patterns

See the pattern miner README.md for an indept presentation of the algorithm.

### Interesting Pattern Mining

Most of frequent patterns are boring. We use a surprisingness measure to find interesting patterns. For now only I-Surprisingness is implemented, see Interesting Pattern Mining and Measuring Surprisingness, in the future a broader measure of surprisingness, realizing the notion of surprisingness as what is contrary to expectation, will be implemented, see [1]

## Use Cases

Here's a list of pattern miner use cases. This is handy to drive the pattern miner development, its API, etc. Ideally each of these should be turned into examples put under [2], as well as unit tests.

• Mining of inference histories, for inference control
• Mining of dialogue histories, for learning dialogue patterns (or more generally, verbal/nonverbal interaction patterns)
• Mining of sets of genomic datasets or medical patient records, to find surprisingly common combinations of features
• Mining of surprising combinations of visual features in the output of a relatively "disentangled" deep NN (such as the pyramid-of-InfoGANs that Ralf, Selameab, Tesfa, Yenat and I are working on)
• Mining of surprising combinations of semantic relationships, in the R2L output of a large number of simple sentences read into Atomspace
• Mining of surprising combinations of syntactic relationships, in an Atomspace containing a set of syntactic relationships corresponding to each word in the dictionary of a given language (to be done iteratively within the language learning algorithm Linas is implementing)
• Mining of surprising (link-parser link combination, Lojban-Atomese-output combination) pairs, in a corpus of (link parses, Lojban-Atomese outputs) obtained from a parallel (English, Lojban) corpus

See Pattern_Miner_Prospective_Examples for detailed suggestions on some use cases.