PredicateLink Proposal

From OpenCog

These ideas were conceived by Ben Goertzel and Jim Rutt in September 2015, with the goal of simplifying the use of predicates in OpenCog, and the management of diverse truth value types, attention value types, etc.

Motivation

Before getting to the details, here is the motivation.

Jim's and Ben's main goal in thinking this through was to find a good way to replace wired-in Atom properties like TV.strength and AV.STI with a more flexible property table of some sort. The reason is that, time and again, several of us (including, e.g., Ben and Linas) have found ourselves wanting to put more numbers into Atoms.

But it seemed a drag to introduce a separate property-table data structure instead of just using the Atomspace. So we thought: perhaps properties like TV and AV could be represented as predicate-argument relations in the Atomspace.

But then the question becomes how to make this less of a step backwards in terms of efficiency.

To avoid a huge speed hit, one can cache selected predicate-argument relations in a property-table cache in each Atom. (Note: saving and loading need not worry much about these caches; the cache can be updated whenever a CachedPredicateLink, or whatever equivalent mechanism is used, is updated.)

To avoid a huge memory hit, one can get rid of all those EvaluationLinks by introducing PredicateLink as an alternative to PredicateNode.

And so....

First Version of the Idea

Idea 1: PredicateLink

To save memory and time in dealing with predicates, we could replace PredicateNode with PredicateLink.... Instead of

EvaluationLink
      PredicateNode "eat"
      ListLink
            ConceptNode "Ben"
            ConceptNode "steak"

we'd have

PredicateLink "eat"
      ConceptNode "Ben"
      ConceptNode "steak"

and instead of

EvaluationLink
      PredicateNode "give"
      ListLink
            ConceptNode "Ben"
            ConceptNode "dog"
            ConceptNode "steak"

we'd have

PredicateLink "give"
      ConceptNode "Ben"
      ConceptNode "dog"
      ConceptNode "steak"


The point here is just to decrease the amount of bureaucracy involved in dealing with a predicate, while still keeping everything represented explicitly in the Atomspace....
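To make the saving concrete, the toy sketch below counts the distinct Atoms in each representation of "Ben eats steak". The `Node` and `Link` classes and the `count_atoms` helper are hypothetical stand-ins written for this illustration, not the OpenCog API; the optional `name` field on `Link` models the proposed named PredicateLink.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical, minimal stand-ins for Atoms; not the OpenCog API.
@dataclass(frozen=True)
class Node:
    type: str
    name: str


@dataclass(frozen=True)
class Link:
    type: str
    outgoing: Tuple[object, ...]
    name: str = ""  # only the proposed PredicateLink would carry a name


def count_atoms(atom, seen=None):
    """Count the distinct Atoms reachable from `atom`, including itself."""
    if seen is None:
        seen = set()
    if atom in seen:
        return 0
    seen.add(atom)
    if isinstance(atom, Link):
        return 1 + sum(count_atoms(a, seen) for a in atom.outgoing)
    return 1


ben = Node("ConceptNode", "Ben")
steak = Node("ConceptNode", "steak")

# Current form: EvaluationLink + PredicateNode + ListLink + the two arguments.
evaluation = Link("EvaluationLink", (
    Node("PredicateNode", "eat"),
    Link("ListLink", (ben, steak)),
))

# Proposed form: one named PredicateLink directly over the arguments.
predicate = Link("PredicateLink", (ben, steak), name="eat")

print(count_atoms(evaluation))  # 5
print(count_atoms(predicate))   # 3
```

In a real Atomspace the PredicateNode "eat" and the ConceptNodes would be shared by every relation that uses them, so the per-relation saving is roughly one Atom (the ListLink) rather than two.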

Idea 2: Truth and Attention Values as PredicateLinks

(This idea is similar to the current work on the ProtoAtom).

We could then represent truth value and attention value in this sort of way, e.g. instead of

InheritanceLink <.8>
    ConceptNode "Ben"
    ConceptNode "silly"

we could have

PredicateLink "TV_strength"
    InheritanceLink
        ConceptNode "Ben"
        ConceptNode "silly"
    NumberNode ".8"

But wait: it would be really inefficient to have to follow links every time you want to get a truth value. Yet it's nice to have the truth value explicitly represented in the Atomspace, isn't it?

So perhaps the right solution is

CachedPredicateLink "TV_strength"
    InheritanceLink
        ConceptNode "Ben"
        ConceptNode "silly"
    NumberNode ".8"

-- where the meaning here is: when you have

CachedPredicateLink "L"
     X
     Y

then the pair

(L, name of node Y)   

or

(L, number that name of node Y denotes)

is added to the "property table" stored in Atom X (which, as in the example above, may be a link rather than a node).

This property table could be a very compact hashtable, or maybe just a list -- some experimentation could go into the choice of data-structure, but this is a standard CS problem...
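A minimal sketch of these semantics, assuming a plain dict as the property table. The classes `Atom` and `CachedPredicateLink`, the helper `node_value`, and the `set_y` update method are all hypothetical names chosen for illustration; the sketch simply shows that creating or updating a CachedPredicateLink "L" X Y writes (L, value of Y) into X's table.

```python
from typing import Dict, Union


# Hypothetical sketch of the per-Atom "property table"; not an OpenCog API.
class Atom:
    def __init__(self, label: str):
        self.label = label
        self.property_table: Dict[str, Union[float, str]] = {}


def node_value(name: str) -> Union[float, str]:
    """Y's contribution: the number its name denotes if it parses, else the name."""
    try:
        return float(name)
    except ValueError:
        return name


class CachedPredicateLink:
    """CachedPredicateLink "L" X Y: creating or updating it refreshes X's cache."""

    def __init__(self, predicate: str, x: Atom, y_name: str):
        self.predicate, self.x = predicate, x
        self.set_y(y_name)

    def set_y(self, y_name: str) -> None:
        self.y_name = y_name
        # The cache is refreshed whenever the link itself is updated.
        self.x.property_table[self.predicate] = node_value(y_name)


inh = Atom('InheritanceLink (ConceptNode "Ben") (ConceptNode "silly")')
link = CachedPredicateLink("TV_strength", inh, ".8")
print(inh.property_table["TV_strength"])  # 0.8
link.set_y(".9")
print(inh.property_table["TV_strength"])  # 0.9
```

A dict is used only because it is the simplest choice in Python; as noted above, a list or a more compact table could be substituted without changing the semantics.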

...

Indeed this would slow things down slightly for accessing TV, AV, etc., but this slight slowdown might be worth it in order to gain flexibility and transparency...

(Of course there is also some memory waste in keeping a PredicateLink plus a cached version.... But the transparency of having the PredicateLink may be worth a lot...)

Linas's Critique

The proposal as currently worded obviously requires a very large redesign of the atomspace internals. Less obvious is that, as written, it implies that either we take a huge hit in performance, or we abandon the concept of the atomspace. Maybe there's some other trick that does not require picking one of these two alternatives; it's hard to see what that would be.

There is an alternative proposal, the ProtoAtom, which addresses some of the ideas on this page. The ProtoAtom is compatible with the current atomspace, and is partly/mostly implemented. See the wiki page for more detail.

Criticisms of this proposal are:

  • Currently, links do not have names. Adding names to links requires a major tear-up of the atomspace, the backends, all existing datasets, the pattern matcher, the scheme, python and haskell bindings, as well as large quantities of "legacy" code that uses links without names. It would be very destabilising, and would perturb the code-base for a very long time. (I can't help noting that I made this same proposal when I first started with opencog many, many years ago, and it turned into a very prolonged, intense flame war. I eventually came to terms with the current structure.) That said, if we do make this change, better sooner than later.
  • Couldn't care less about eliminating *some* uses of ListLink, although please consider that if this were done in general, it would wreak havoc with type checking.
  • Modern programming philosophy strongly discourages keywords and hints like "Cached"; it is up to the internal implementation to "do the right thing" with regard to caching.
  • Adding a property table to all atoms would make them significantly fatter. Migrating TV and AV to it would not make the atom any smaller; the atom would still be larger, because it would now have to store (at least) two keywords as well as the actual numeric data. A naive implementation would probably double the size of the atom. It would also make access to TV and AV significantly slower, which may or may not impact various algorithms.
  • Storing numbers with NumberNode is particularly challenging, since NumberNodes are atoms, and thus drag along a lot of baggage. A light-weight version of this idea is implemented in the ProtoAtom work, and specifically in the opencog/atoms/base/FloatValue.h class.
  • Storing mutable data in atoms would cause a huge churn of the atomspace tables. Currently, atoms are immutable; this is a core assumption that allows them to be searchable in the atomspace indexes. Changing a TV on an atom would require removing the atom from the atomspace (and thus from the indexes), changing the TV, and reinserting it into the atomspace. If the atoms are now nested inside of other atoms, changing a single TV may cause hundreds, thousands, or tens of thousands of atoms to be re-indexed. It would be much simpler to abandon the concept of the atomspace entirely, and give up the ability to search for atoms by signature.
  • The above remarks also apply to UUIDs and the database. A major redesign would be needed to avoid this cascade effect.
  • Abandoning the atomspace would imply that there would be multiple instances of atoms; they would no longer be guaranteed to be unique. As long as users keep tabs on what they're using, things will be OK. Incoming sets would still work, so as long as you have a handle to some atom, you can still traverse the entire graph.
  • If you wanted to keep the atomspace, but make node names mutable, then you would not be able to index atoms by name or search for them by name.
  • If you wanted to keep the atomspace, but make node names fixed, except for NumberNodes, which would be mutable, then NumberNodes would not be unique (you could have thousands, all with the value "0.8"). You also would not be able to index them. If you wanted to index them, you would have to generalize the attention value bank. But then, this is kind of the whole point of the ProtoAtom work, and the opencog/atoms/base/FloatValue.h class.
  • If you want to save RAM, then perhaps the biggest saving would come from changing Nodes so that they have neither a TV nor an AV. This seems reasonable, as we essentially never use the TV or AV on individual atoms.
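The re-indexing cascade described in the bullet on mutable data can be illustrated with a toy model. This is a hypothetical sketch, not the AtomSpace implementation: `ToyAtom` and `atoms_to_reindex` are invented names, and the model assumes each atom's index key is its full signature (type, name, and the signatures of its outgoing set). Mutating a nested NumberNode then invalidates the key of every atom above it.

```python
# Hypothetical toy model, not the AtomSpace implementation: each atom's index
# key is its full signature (type, name, and signatures of its outgoing set).
class ToyAtom:
    def __init__(self, type_, name="", outgoing=()):
        self.type = type_
        self.name = name  # made mutable here only to illustrate the problem
        self.outgoing = list(outgoing)
        self.incoming = []
        for child in self.outgoing:
            child.incoming.append(self)

    def signature(self):
        return (self.type, self.name, tuple(c.signature() for c in self.outgoing))


def atoms_to_reindex(changed):
    """Count atoms whose signature depends on `changed`: it and all its ancestors."""
    stack, seen = [changed], set()
    while stack:
        atom = stack.pop()
        if id(atom) not in seen:
            seen.add(id(atom))
            stack.extend(atom.incoming)
    return len(seen)


num = ToyAtom("NumberNode", ".8")
inh = ToyAtom("InheritanceLink", outgoing=[ToyAtom("ConceptNode", "Ben"),
                                           ToyAtom("ConceptNode", "silly")])
tv = ToyAtom("PredicateLink", "TV_strength", [inh, num])
# Three further links that happen to mention the TV-carrying link.
users = [ToyAtom("ListLink", outgoing=[tv, ToyAtom("ConceptNode", f"c{i}")])
         for i in range(3)]

old_key = tv.signature()
num.name = ".9"                   # "mutate" the number in place
print(tv.signature() == old_key)  # False: tv's index key is now stale
print(atoms_to_reindex(num))      # 5: the NumberNode, tv, and the 3 ListLinks
```

Here only three links sit above the TV-carrying link; in a large Atomspace the ancestor set can be far bigger, which is the cascade the critique warns about.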

These are all really very large changes to the internals of the system, as well as some fundamental shifts to the overall philosophy of what opencog really is.

There is, however, a different proposal, the ProtoAtom, which does accomplish some (much?) of what this proposal proposes, while also being compatible with the current code base.

Second Version of the Idea

Ben's reply to Linas's critique...

I don’t see these proposed changes as involving any fundamental philosophical shifts. I just see them as aimed at making it easier to experiment with attaching different parameters to different Atoms.

To sum up:

1) Giving links names, and making node names mutable, are optional aspects of the proposal and we don't need to do them.

E.g., as noted in subsequent email discussions, we could get some memory savings, if we wanted, by getting rid of the ListLink used by EvaluationLink:

EvaluationLink
  PredicateNode "eat"
  ConceptNode "Ben"
  ConceptNode "steak"

But this is a side-issue so let’s not focus on it…

2) The proposed CachedPredicateLink or CachedEvaluationLink may be awkward constructs, but we can avoid using them and keep the choice of which predicates to cache behind the scenes; no problem there.

For instance, instead of creating a CachedEvaluationLink or whatever, we could simply create some Atomspace-wide registry of PredicateNodes, and then a method like

addToPredicateCache(PredicateNode P)

which has to be invoked if one wants a PredicateNode included in the caches in each Atom. Same effect; I agree this is a cleaner design.

So e.g. we would call addToPredicateCache on the argument "PredicateNode 'TruthValue strength'" in order to add the strength to the cache...
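A possible shape for such a registry, sketched under the assumption that a predicate is identified by its name string and that registration hands back a fixed slot index into each Atom's cache. `PredicateCacheRegistry`, `add_to_predicate_cache`, and `slot` are hypothetical names; only addToPredicateCache appears in the proposal itself.

```python
from typing import Dict, Optional


class PredicateCacheRegistry:
    """Hypothetical Atomspace-wide registry of predicates whose values are cached."""

    def __init__(self) -> None:
        self._slots: Dict[str, int] = {}  # predicate name -> slot in each Atom's cache

    def add_to_predicate_cache(self, predicate_name: str) -> int:
        """Register a predicate for caching (idempotent); return its slot index."""
        if predicate_name not in self._slots:
            self._slots[predicate_name] = len(self._slots)
        return self._slots[predicate_name]

    def slot(self, predicate_name: str) -> Optional[int]:
        """Slot index of a cached predicate, or None if it is not cached."""
        return self._slots.get(predicate_name)


registry = PredicateCacheRegistry()
print(registry.add_to_predicate_cache("TruthValue strength"))  # 0
print(registry.add_to_predicate_cache("AttentionValue STI"))   # 1
print(registry.add_to_predicate_cache("TruthValue strength"))  # 0 (already registered)
print(registry.slot("eat"))                                    # None
```

Making registration idempotent means code can call it defensively without allocating duplicate slots.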


3) Cached property structures (e.g. arrays) in the Atom can likely be made pretty compact and fast, without a huge performance hit

E.g., if we are just caching a handful of values in each node, we could just use an array instead of a {hashtable or similar}. Each Predicate being cached would have an int index associated with it. The assumption here is that removals of Predicates from the “cached predicate list” would be very infrequent, so they would not need to be handled efficiently.

Looking up

Atom.cachedPropertyArray[1]

may be slower than looking up

Atom.TruthValue.strength

but it doesn’t necessarily seem like a big deal.
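The array-based cache can be sketched as follows, assuming a single shared table that maps each cached predicate to a fixed int index. `CACHED_PREDICATES`, `SLOT`, `get_value`, `set_value`, and `remove_cached_predicate` are hypothetical names. Reads and writes are an index lookup; removal is the rare, slow path that compacts every Atom's array, matching the assumption stated above.

```python
from typing import Dict, List, Optional

# Hypothetical shared table: which predicates are cached, and at which index.
CACHED_PREDICATES: List[str] = ["TruthValue strength", "TruthValue confidence",
                                "AttentionValue STI"]
SLOT: Dict[str, int] = {name: i for i, name in enumerate(CACHED_PREDICATES)}


class Atom:
    def __init__(self) -> None:
        # One entry per cached predicate; None means "no value cached yet".
        self.cached_property_array: List[Optional[float]] = [None] * len(CACHED_PREDICATES)

    def get_value(self, predicate: str) -> Optional[float]:
        return self.cached_property_array[SLOT[predicate]]

    def set_value(self, predicate: str, value: float) -> None:
        self.cached_property_array[SLOT[predicate]] = value


def remove_cached_predicate(predicate: str, atoms: List[Atom]) -> None:
    """Rare, slow path: drop one slot and compact every Atom's array."""
    i = SLOT.pop(predicate)
    CACHED_PREDICATES.pop(i)
    for name in CACHED_PREDICATES[i:]:
        SLOT[name] -= 1  # later predicates shift down one position
    for atom in atoms:
        del atom.cached_property_array[i]


a = Atom()
a.set_value("TruthValue strength", 0.8)
a.set_value("AttentionValue STI", 12.0)
remove_cached_predicate("TruthValue confidence", [a])
print(a.get_value("AttentionValue STI"), a.cached_property_array)  # 12.0 [0.8, 12.0]
```

After the removal, the STI value lives at `cached_property_array[1]`, the analogue of `Atom.cachedPropertyArray[1]` above. Adding a new cached predicate after Atoms already exist would also require growing each array; that path is omitted here.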


4) Making it possible to frequently replace NumberNodes with other NumberNodes, without incurring huge garbage-collection overhead, may require some significant change. Introducing a NumberAtom (in place of a NumberNode) would be one solution.

Instead of a string name, a NumberAtom would have an internal int or float, which would be mutable. (Maybe we would have two types, FloatAtom and IntAtom?) We would not, of course, index NumberAtoms by their internal number value.
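A minimal sketch of the NumberAtom idea, assuming identity-based equality and hashing so that the atom is never indexed by its value. The class name comes from the proposal, but this Python form is hypothetical. Updating the number mutates the existing atom in place, so no replacement atom has to be created and no old one garbage-collected.

```python
class NumberAtom:
    """Hypothetical NumberAtom: holds a mutable number instead of a string name.

    It keeps Python's default identity-based equality and hashing, so it is never
    indexed by its value; two NumberAtoms holding 0.8 are distinct atoms.
    """

    __slots__ = ("value",)

    def __init__(self, value: float) -> None:
        self.value = value  # mutable; FloatAtom/IntAtom variants could pin the type

    def __repr__(self) -> str:
        return f"NumberAtom({self.value!r})"


strength = NumberAtom(0.8)
index = {strength: "TV_strength"}  # keyed by identity, not by 0.8
strength.value = 0.9               # update in place: no new atom to allocate or collect
print(index[strength], strength)  # TV_strength NumberAtom(0.9)
print(NumberAtom(0.8) == NumberAtom(0.8))  # False
```

The last line shows the trade-off raised in the critique: without value-based indexing, NumberAtoms holding the same number are not unique.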

...

All in all, these would be significant changes, but hardly major philosophy changes. We're just talking about

-- handling numbers differently

-- replacing the fixed list of numerical properties inside an Atom, with a more flexible/configurable list, which is a cache of certain properties represented explicitly in the Atomspace.

Reply to the Reply

The ProtoAtom work already implements all of the above desired features, including caching. It's compatible with the current atomspace design, and it mostly already works. There are some open questions and decisions to be made: see the ProtoAtom page for more. Let's use that.