OpenCogPrime:NLP

This page provides a brief review of natural language processing within OpenCog. Additional details, including current status, HOW-TO's, and future plans, are provided in the NLP page.

NLP in OpenCog Prime

OpenCog Prime is sufficiently flexible to support multiple approaches to natural language comprehension and generation.

At risk of oversimplifying, I'll articulate four different possible approaches:

  1. rule-based, where grammatical and semantic-mapping rules are encoded by hand
  2. statistical linguistics, where such rules are induced by corpus analysis of large volumes of text
  3. experiential and embodied, where language is learned by an embodied agent, controlled by OpenCog Prime, that converses with human agents over the course of its life
  4. hybrid

An argument for the hybrid approach is given in the paper http://www.goertzel.org/new_research/WCCI_AGI.pdf, presented at the Special Session on Human-Level Intelligence at the WCCI conference in Hong Kong in June 2008.

A depiction of some of the shortcomings of current statistical and rule-based NLP technology is given in the fragmentary PowerPoint presentation attached here XXX.

RelEx and OpenCog Prime

The RelEx NLP framework, which is integrated with OpenCog, is a combination of rule-based and statistical NLP.

As such, it can be used in its current form as a sort of "linguistic input channel" to an OpenCog Prime system, feeding it Atoms formed from English sentences, and (once the NL generation portion of it is complete) producing English sentences from appropriate collections of Atoms.
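To make the "linguistic input channel" idea concrete, here is a minimal sketch in plain Python. It does not call RelEx or OpenCog. It assumes that the parser output arrives as strings of the form name(head, dependent), and it models Atoms as nested tuples. The input format, the function names, and the choice of Atom type names are all assumptions made for illustration.

```python
import re

# Assumed input format: name(head, dependent), e.g. "_subj(eat, cat)".
RELATION = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*$")


def relation_to_atom(text):
    """Turn one dependency-relation string into a nested-tuple stand-in for an Atom."""
    match = RELATION.match(text)
    if match is None:
        raise ValueError(f"not a binary relation: {text!r}")
    name, head, dependent = match.groups()
    # The type names below are illustrative labels, not calls into OpenCog.
    return (
        "EvaluationLink",
        ("PredicateNode", name),
        ("ListLink", ("ConceptNode", head), ("ConceptNode", dependent)),
    )


def sentence_to_atoms(relations):
    """Convert all relations extracted from one sentence."""
    return [relation_to_atom(relation) for relation in relations]


atoms = sentence_to_atoms(["_subj(eat, cat)", "_obj(eat, fish)"])
```

Generation would run the same mapping in reverse, from a collection of such tuples back to a sentence, which is the part the text describes as not yet complete.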

More interestingly, though, as described in XX, RelEx may be used as a seed for an experiential/embodied-learning based OpenCog Prime NLP facility.

In order to do this right, most of the existing RelEx code would have to be replaced with new code having similar functionality (but more flexibility) coded within OpenCog. However, the human-coded rule-sets that form the meat of RelEx would be preserved and imported into OpenCog as Atoms:

  1. the link parser dictionary
  2. the RelEx dependency mapping rules that map link parser output into dependency-grammar-style predicate-argument relationships
  3. the RelEx2Frame rules that map the dependency relations into more fully normalized semantic relationships

The link parser algorithm itself could be wrapped up in an OpenCogPrime:Task, and the application of the semantic-mapping and RelEx2Frame rules could be carried out by the PLN forward-chainer within OpenCog.
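The rule-application step can be illustrated with a toy forward-chainer. This is a sketch in plain Python, not PLN: it carries no truth values and does no probabilistic inference, and it simply applies rules to a set of facts until nothing new appears. The three rules and the frame name "Ingestion" are hypothetical examples, not actual RelEx or RelEx2Frame rules.

```python
def forward_chain(facts, rules, max_rounds=10):
    """Apply every rule to the fact set until no new facts appear."""
    known = set(facts)
    for _ in range(max_rounds):
        new = set()
        for rule in rules:
            new |= set(rule(known)) - known
        if not new:
            break
        known |= new
    return known


# Hypothetical dependency-mapping rule: an "S" link (subject to verb)
# yields a _subj relation.
def subject_rule(facts):
    for fact in facts:
        if fact[0] == "S":
            _, noun, verb = fact
            yield ("_subj", verb, noun)


# Hypothetical dependency-mapping rule: an "O" link (verb to object)
# yields an _obj relation.
def object_rule(facts):
    for fact in facts:
        if fact[0] == "O":
            _, verb, noun = fact
            yield ("_obj", verb, noun)


# Hypothetical frame rule: the subject and object of "eat" are combined
# into one normalized semantic relation.
def ingestion_rule(facts):
    subjects = {f[1]: f[2] for f in facts if f[0] == "_subj"}
    for f in facts:
        if f[0] == "_obj" and f[1] == "eat" and f[1] in subjects:
            yield ("Ingestion", subjects[f[1]], f[2])


links = {("S", "cat", "eat"), ("O", "eat", "fish")}
result = forward_chain(links, [subject_rule, object_rule, ingestion_rule])
```

The frame rule fires only in the second round, once the dependency relations it depends on have been derived. That layering mirrors the ordering of the two rule-sets described above.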

The link parser dictionary could be represented as a collection of ExtensionalInheritanceLinks between Nodes representing word senses and Nodes representing grammatical categories.
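As a rough picture of that representation, the sketch below models each dictionary entry as a tuple standing in for an ExtensionalInheritanceLink. It is plain Python and does not use the OpenCog API. The word-sense labels, the category names, and the generic "Node" type are assumptions chosen for illustration.

```python
from collections import defaultdict


def inheritance_link(word_sense, category):
    """Build a tuple standing in for one ExtensionalInheritanceLink."""
    return ("ExtensionalInheritanceLink", ("Node", word_sense), ("Node", category))


# Hypothetical entries: the sense labels and category names are invented here.
dictionary = [
    inheritance_link("cat_1", "common-noun"),
    inheritance_link("eat_1", "transitive-verb"),
    inheritance_link("eat_2", "intransitive-verb"),
]


def categories_of(links):
    """Index the links by word sense, collecting each sense's categories."""
    index = defaultdict(set)
    for _, (_, sense), (_, category) in links:
        index[sense].add(category)
    return dict(index)


index = categories_of(dictionary)
```

Because each entry is an ordinary link rather than a line in a parser-specific file, a learning process could add, remove, or reweight entries one at a time.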

In this way, the RelEx infrastructure would be cast in a form that is open to adaptation via OpenCog's learning mechanisms. Experiential adaptation in embodied contexts could then be used to improve the Atoms representing the initial, human-coded RelEx rules.

There is really nothing preventing this from being done; it's just a bit of work.

Moving from Link Grammar to Word Grammar

RelEx is founded on link grammar, which is a grammatical formalism with some plusses and some minuses.

A moderately detailed plan has been laid out for replacing the link parser within RelEx with an alternative parser based on Word Grammar, called PROWL Grammar (Probabilistic Word Link Grammar).

Moving to PROWL Grammar would not change the overall nature of RelEx or its role in OpenCog, but would "merely" constitute a technical swapping-out of one parsing approach for another. However, it's arguable that the Word Grammar parsing framework is more appropriate for AGI in general, and OpenCog in particular, than the Link Grammar framework. For details see the above-linked document.