The reasons for abandonment are:
- It depends on the URE, which itself has been abandoned.
- Results were perhaps underwhelming, or perhaps non-scalable. Mining a few hundred patterns out of some ten thousand atoms is perhaps not hard, but isolating a hundred thousand of them out of tens of millions presents a scalability and performance challenge. (This challenge is met by the language learning project.)
- There's no clear answer to "what's next?" So you found a bunch of patterns ... now what?
A similar function is provided by the language learning project, though the representational details differ significantly. Here, in the pattern miner, the expressions to be discovered take the form of lambda expressions, i.e. "atomic terms" of the form f(x,y,z,...) with explicit variables x,y,z,... By contrast, the language learning system works with jigsaw-puzzle pieces, where "variables" are replaced by generic "connectors" that can connect things together. The idea is that connectors are agnostic about what is an "input" and what is an "output" (what is a "variable" and what is a "term"), whereas lambda expressions always have a forced hierarchical structure: when using lambdas (atomic terms), you must work with expressions (terms) and variables.
The language learning subsystem also explicitly organizes the discovered patterns into a sparse vector, so that conventional vector representations can be used. The goal is to bridge the gap between symbolic representations (graphlets) and vector representations (as typical of deep learning neural nets).
The Pattern Miner mines frequent and interesting patterns from the Atomspace. It is a hypergraph pattern miner. It can be seen as the inverse of pattern matching: given a dataset of grounded atoms, find a pattern-matcher query that would return a sufficiently frequent and interesting collection of matches.
What is a Pattern?
A pattern is a connected set of links and nodes, containing variables, that occurs repeatedly in the Atomspace. For example, "Americans drink Coke":
(LambdaLink
  (VariableNode "$X")
  (AndLink
    (EvaluationLink
      (PredicateNode "drink")
      (ListLink (VariableNode "$X") (ConceptNode "Coke")))
    (InheritanceLink
      (VariableNode "$X")
      (ConceptNode "American"))))
If many instances fitting this pattern can be found in the Atomspace, then it is a frequent pattern. Such instances would look like:
(EvaluationLink
  (PredicateNode "drink")
  (ListLink (ConceptNode "Ben") (ConceptNode "Coke")))
(InheritanceLink
  (ConceptNode "Ben")
  (ConceptNode "American"))
...
(EvaluationLink
  (PredicateNode "drink")
  (ListLink (ConceptNode "Andrew") (ConceptNode "Coke")))
(InheritanceLink
  (ConceptNode "Andrew")
  (ConceptNode "American"))
...
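To make the notion of frequency concrete, here is a minimal Python sketch (not part of the Pattern Miner API; all names are invented for illustration) that counts the groundings of the two-clause pattern above over a toy set of facts:

```python
# Toy illustration of pattern frequency: count the substitutions for
# the variable $X that make both clauses of the pattern true.
facts = {
    ("drink", "Ben", "Coke"), ("inherit", "Ben", "American"),
    ("drink", "Andrew", "Coke"), ("inherit", "Andrew", "American"),
    ("drink", "Pierre", "Wine"), ("inherit", "Pierre", "French"),
}

def groundings(facts):
    """Values of $X satisfying: drink($X, Coke) AND inherit($X, American)."""
    drinkers = {x for (p, x, y) in facts if p == "drink" and y == "Coke"}
    americans = {x for (p, x, y) in facts if p == "inherit" and y == "American"}
    return drinkers & americans

matches = groundings(facts)
print(sorted(matches))   # the groundings of $X
print(len(matches))      # the pattern's support (frequency)
```

The support — the number of distinct groundings — is what is compared against a minimum-frequency threshold to decide whether a pattern counts as frequent.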
Pattern Gram or Conjunct
The gram (or conjunct) count of a pattern indicates its number of clauses:
- A 1-gram/1-conjunct pattern contains 1 clause.
- A 2-gram/2-conjunct pattern contains 2 clauses.
- An N-gram/N-conjunct pattern contains N clauses.
For instance, the "Americans drink Coke" pattern above is a 2-conjunct pattern, since its conjunction contains two clauses.
Pattern Mining Algorithm
Pattern mining is implemented on top of the URE, so that finding patterns is framed as a form of reasoning justifying the interestingness of the patterns. The algorithm itself is still greedy in nature: first, frequent 1-conjunct patterns are produced, then these are combined to produce N-conjunct patterns.
See the pattern miner README.md for an in-depth presentation of the algorithm.
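The greedy strategy can be sketched in a few lines of Python. This is a highly simplified illustration, not the actual URE-based implementation: clauses are abstracted over a single subject variable, and the clause templates, thresholds, and function names are all invented for this example.

```python
from itertools import combinations

# Simplified sketch of the greedy mining loop: keep 1-conjunct patterns
# (single clauses abstracted over their subject $X) that meet a minimum
# support, then conjoin clauses into larger N-conjunct patterns.
facts = [
    ("drink", "Ben", "Coke"), ("inherit", "Ben", "American"),
    ("drink", "Andrew", "Coke"), ("inherit", "Andrew", "American"),
    ("drink", "Pierre", "Wine"),
]

def support(clauses, facts):
    """Number of values of $X grounding every clause in the conjunction."""
    per_clause = [{x for (p, x, y) in facts if (p, y) == c} for c in clauses]
    return len(set.intersection(*per_clause))

def mine(facts, min_support=2, max_conjuncts=2):
    # 1-conjunct candidates: abstract the subject of each fact into $X.
    singles = {(p, y) for (p, x, y) in facts}
    patterns = [(c,) for c in singles if support((c,), facts) >= min_support]
    # Greedily conjoin clauses into larger conjunctions, pruning by support.
    for n in range(2, max_conjuncts + 1):
        for combo in combinations(singles, n):
            if support(combo, facts) >= min_support:
                patterns.append(tuple(sorted(combo)))
    return patterns

patterns = mine(facts)
```

On the toy facts above, this yields the two frequent single clauses (drinking Coke, being American) plus their 2-conjunct combination; the infrequent "drinks Wine" clause is pruned and never extended.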
Interesting Pattern Mining
Most frequent patterns are boring. A surprisingness measure is used to single out the interesting ones. For now only I-Surprisingness is implemented; see Interesting Pattern Mining and Measuring Surprisingness. In the future a broader measure of surprisingness, realizing the notion of surprisingness as that which is contrary to expectation, will be implemented.
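One simplified reading of I-Surprisingness for a 2-conjunct pattern is sketched below: estimate the probability of the conjunction under the assumption that its two clauses are independent, and measure how far the observed probability deviates from that estimate. This is only an illustrative approximation — the actual measure considers all partitions of an N-conjunct pattern into subpatterns, and the function here is invented for illustration.

```python
def i_surprisingness(p_joint, p_clause1, p_clause2):
    """Simplified I-Surprisingness of a 2-clause pattern.

    Under independence, the conjunction's estimated probability is
    p_clause1 * p_clause2; surprisingness is the deviation of the
    observed joint probability from that estimate, normalized by the
    observed probability.
    """
    p_est = p_clause1 * p_clause2
    return abs(p_joint - p_est) / p_joint

# Observed: 20% of people both drink Coke and are American, while 40%
# drink Coke and 50% are American. Independence predicts 0.4 * 0.5 = 0.2,
# so the pattern, though frequent, is not surprising at all.
print(i_surprisingness(0.2, 0.4, 0.5))   # → 0.0

# If instead the conjunction holds for 40% of people, it is surprisingly
# frequent: |0.4 - 0.2| / 0.4 = 0.5.
print(i_surprisingness(0.4, 0.4, 0.5))   # → 0.5
```

The point of the normalization is that a fixed absolute deviation matters more for a rare pattern than for a common one.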
Usage and Examples
Here's a list of pattern miner use cases. This is handy for driving the development of the pattern miner, its API, etc. Ideally each of these should be turned into an example put under , as well as a unit test.
- Mining of inference histories, for inference control
- Mining of dialogue histories, for learning dialogue patterns (or more generally, verbal/nonverbal interaction patterns)
- Mining of sets of genomic datasets or medical patient records, to find surprisingly common combinations of features
- Mining of surprising combinations of visual features in the output of a relatively "disentangled" deep NN (such as the pyramid-of-InfoGANs that Ralf, Selameab, Tesfa, Yenat and I are working on)
- Mining of surprising combinations of semantic relationships, in the R2L output of a large number of simple sentences read into Atomspace
- Mining of surprising combinations of syntactic relationships, in an Atomspace containing a set of syntactic relationships corresponding to each word in the dictionary of a given language (to be done iteratively within the language learning algorithm Linas is implementing)
- Mining of surprising (link-parser link combination, Lojban-Atomese-output combination) pairs, in a corpus of (link parses, Lojban-Atomese outputs) obtained from a parallel (English, Lojban) corpus
See Pattern_Miner_Prospective_Examples for detailed suggestions on some use cases.