PredicateLink Proposal
These ideas were conceived by Ben Goertzel and Jim Rutt in September 2015, with the goal of simplifying the use of predicates in OpenCog, and the management of diverse truth value types, attention value types, etc.
Motivation
Before the details, the motivation...
Jim's and Ben's main goal in thinking this stuff through, was to figure out a good way to replace wired-in Atom properties like TV.strength and AV.STI, etc., with a more flexible property-table of some sort... This is because, time and time again, various of us (including e.g. Ben and Linas) find ourselves wanting to put more numbers in Atoms...
But it seemed a drag to introduce a separate property-table data-structure instead of just using the Atomspace. So we thought: perhaps properties like TV, AV, etc. can be represented as predicate-argument relations in the Atomspace...
But then the question becomes how to make this less of a step backwards in terms of efficiency.
To avoid a huge speed hit, one can cache selected predicate-argument relations in a property-table cache in each Atom [note, saving and loading don't need to worry about these caches much ... cache update can be done upon updating of a CachedPredicateLink or whatever equivalent mechanism is used...]
To avoid a huge memory hit, one can get rid of all those EvaluationLinks, and introduce PredicateLink as an alternative to PredicateNode
And so....
First Version of the Idea
Idea 1: PredicateLink
To save memory and time in dealing with predicates, we could replace PredicateNode with PredicateLink.... Instead of
EvaluationLink PredicateNode "eat" ListLink ConceptNode "Ben" ConceptNode "steak"
we'd have
PredicateLink "eat" ConceptNode "Ben" ConceptNode "steak"
and instead of
EvaluationLink PredicateNode "give" ListLink ConceptNode "Ben" ConceptNode "dog" ConceptNode "steak"
we'd have
PredicateLink "give" ConceptNode "Ben" ConceptNode "dog" ConceptNode "steak"
The point here is just to decrease the amount of bureaucracy involved
in dealing with a predicate, while still keeping everything
represented explicitly in the Atomspace....
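To make the saving concrete, here is a minimal sketch that counts the atoms needed for the "eat" example in both forms. The `Atom` dataclass and `count_atoms` helper are illustrative assumptions, not OpenCog classes, and giving a link a name is exactly the hypothetical change being proposed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    """Toy stand-in for an atom: a type, an optional name, and an outgoing set."""
    type: str
    name: str = ""
    outgoing: Tuple["Atom", ...] = ()


def count_atoms(atom, seen=None):
    """Count the distinct atoms reachable from `atom`, including itself."""
    if seen is None:
        seen = set()
    if atom in seen:
        return 0
    seen.add(atom)
    return 1 + sum(count_atoms(child, seen) for child in atom.outgoing)


ben = Atom("ConceptNode", "Ben")
steak = Atom("ConceptNode", "steak")

# Current form: EvaluationLink + PredicateNode + ListLink + two ConceptNodes.
current = Atom("EvaluationLink", outgoing=(
    Atom("PredicateNode", "eat"),
    Atom("ListLink", outgoing=(ben, steak)),
))

# Proposed form: a named PredicateLink pointing directly at its arguments.
proposed = Atom("PredicateLink", "eat", outgoing=(ben, steak))

print(count_atoms(current))   # 5
print(count_atoms(proposed))  # 3
```

In this toy model each binary relation drops from five atoms to three, since the PredicateNode and the ListLink disappear.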
Idea 2: Truth and Attention Values as PredicateLinks
(This idea is similar to the current work on the ProtoAtom).
We could then represent truth value and attention value in this sort of way, e.g. instead of
InheritanceLink <.8> ConceptNode "Ben" ConceptNode "silly"
we could have
PredicateLink "TV_strength" InheritanceLink ConceptNode "Ben" ConceptNode "silly" NumberNode ".8"
But wait -- that's going to be really inefficient to have to follow links every time you want to get a truth value.... Yet, it's nice to have the truth value explicitly represented in the Atomspace, isn't it?
So perhaps the right solution is
CachedPredicateLink "TV_strength" InheritanceLink ConceptNode "Ben" ConceptNode "silly" NumberNode ".8"
-- where the meaning here is: when you have
CachedPredicateLink "L" X Y
then the pair
(L, name of node Y) or (L, number that name of node Y denotes)
is added to the "property table" stored in node X.
This property table could be a very compact hashtable, or maybe just a list -- some experimentation could go into the choice of data-structure, but this is a standard CS problem...
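A minimal sketch of the caching rule just described, assuming a plain dict as the property table. The `Atom` class, `add_cached_predicate_link`, and `_value_of` are hypothetical names used only for illustration; they are not part of any OpenCog API.

```python
class Atom:
    """Toy atom with a per-atom property table (assumed here to be a dict)."""

    def __init__(self, type_, name="", outgoing=()):
        self.type = type_
        self.name = name
        self.outgoing = tuple(outgoing)
        self.properties = {}  # the "property table" stored in the atom


def _value_of(node):
    """Return the number that the node's name denotes, or else the name itself."""
    try:
        return float(node.name)
    except ValueError:
        return node.name


def add_cached_predicate_link(label, x, y):
    """Create CachedPredicateLink "label" X Y and add (label, value of Y) to X's table."""
    link = Atom("CachedPredicateLink", label, (x, y))
    x.properties[label] = _value_of(y)
    return link


ben_silly = Atom("InheritanceLink", outgoing=(
    Atom("ConceptNode", "Ben"),
    Atom("ConceptNode", "silly"),
))
link = add_cached_predicate_link("TV_strength", ben_silly, Atom("NumberNode", ".8"))

print(ben_silly.properties["TV_strength"])  # 0.8
```

Reading the strength then costs one table lookup on the atom itself, while the link remains in the toy atomspace as the explicit record.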
...
Indeed this would slow things down slightly for accessing TV, AV, etc., but this slight slowdown might be worth it in order to gain flexibility and transparency...
(Of course there is also some memory waste in keeping a PredicateLink plus a cached version.... But the transparency of having the PredicateLink may be worth a lot...)
Linas's Critique
The proposal as currently worded obviously requires a very large redesign of the atomspace internals. Less obvious is that, as written, it implies that either we take a huge hit in performance, or we abandon the concept of the atomspace. Maybe there's some other trick that does not require picking one of these two alternatives; it's hard to see what that would be.
There is an alternative proposal, the ProtoAtom, which addresses some of the ideas on this page. The ProtoAtom is compatible with the current atomspace, and is partly/mostly implemented. See the wiki page for more detail.
Criticisms of this proposal are:
- Currently, links do not have names. Adding names to links requires a major tear-up to the atomspace, the backends, all existing datasets, the pattern matcher, the scheme, python and haskell bindings, as well as large quantities of "legacy" code that uses them without names. It would be very destabilising, and have a very long-lasting perturbation on the code-base. (I can't help noting I made this same proposal when I first started with opencog many many years ago, and it turned into a very prolonged, intense flame war. I eventually came to terms with the current structure.) Although, if we make this change, better sooner than later.
- Couldn't care less about eliminating *some* uses of ListLink, although please consider that if this was done in general, it would wreak havoc with type checking.
- Modern programming philosophy strongly discourages keywords and hints like "Cached"; it is up to the internal implementation to "do the right thing" with regards to caching.
- Adding a property table to all atoms would make them significantly fatter. If you migrated TV and AV to it, it would not make the atom any smaller; the atom would still be larger, because you'd now have to store (at least) two keywords, as well as the actual numeric data. A naive implementation would probably double the size of the atom. It would also make access to TV and AV significantly slower, which may or may not impact misc algorithms.
- Storing numbers with NumberNode is particularly challenging, since NumberNodes are atoms, and thus drag along a lot of baggage. A light-weight version of this idea is implemented in the ProtoAtom work, and specifically in the opencog/atoms/base/FloatValue.h class.
- Storing mutable data in atoms would cause a huge churn of the atomspace tables. Currently, atoms are immutable; this is a core assumption for allowing them to be searchable in the atomspace indexes. Changing a TV on an atom would require removing it from the atomspace, (removing it from indexes); changing the TV, reinserting it into the atomspace. If the atoms are now nested inside of other atoms, changing a single TV may cause hundreds or thousands or tens of thousands of atoms to be re-indexed. It would be much simpler to abandon the concept of the atomspace entirely, and give up the ability to search for atoms by signature.
- Above remarks also apply to UUIDs and the database. A major redesign would be needed to avoid this cascade effect.
- Abandoning the atomspace would imply that there would be multiple instances of atoms; they would no longer be guaranteed to be unique. As long as users keep tabs on what they're using, things will be OK. Incoming sets would still work, so as long as you have a handle to some atom, you can still traverse the entire graph.
- If you wanted to keep the atomspace, but make node names mutable, then you would not be able to index atoms by name or search for them by name.
- If you wanted to keep the atomspace, but make node names be fixed, unless they were NumberNodes which were mutable, then NumberNodes would not be unique, (you could have thousands all with the value "0.8"). You also would not be able to index them. If you wanted to index them, you would have to generalize the attention value bank. But then, this is kind-of the whole point of the ProtoAtom work, and the opencog/atoms/base/FloatValue.h class.
- If you want to save RAM, then perhaps the biggest saving would come from changing Nodes so that they have neither a TV nor an AV. This seems reasonable, as we essentially never use the TV or AV on individual atoms.
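The re-indexing cascade mentioned in the criticisms above can be illustrated with a toy model. This sketch assumes that an atom's index key (its signature) is built from its type, its name, and the signatures of its outgoing atoms, so that changing a node's name changes the key of every atom that transitively contains it. The `Atom` class and `atoms_to_reindex` are hypothetical and do not reflect the real atomspace index code.

```python
class Atom:
    """Toy atom that tracks its incoming set (the atoms that contain it)."""

    def __init__(self, type_, name="", outgoing=()):
        self.type = type_
        self.name = name
        self.outgoing = list(outgoing)
        self.incoming = []
        for child in self.outgoing:
            child.incoming.append(self)

    def signature(self):
        """Structural index key: type, name, and the signatures of the children."""
        return (self.type, self.name,
                tuple(child.signature() for child in self.outgoing))


def atoms_to_reindex(atom):
    """Return `atom` plus every atom that transitively contains it."""
    affected, seen, stack = [], set(), [atom]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        affected.append(current)
        stack.extend(current.incoming)
    return affected


# One shared NumberNode "0.8" used as the strength of 1000 links,
# each of which is in turn wrapped by one further link.
strength = Atom("NumberNode", "0.8")
links = [
    Atom("PredicateLink", "TV_strength", [Atom("ConceptNode", f"c{i}"), strength])
    for i in range(1000)
]
wrappers = [Atom("ListLink", outgoing=[link]) for link in links]

print(len(atoms_to_reindex(strength)))  # 2001
```

In this toy setup, mutating the one shared number invalidates the key of 2001 atoms: the node itself, the 1000 links that hold it, and the 1000 links that wrap those.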
These are all really very large changes to the internals of the system, as well as some fundamental shifts to the overall philosophy of what opencog really is.
There is, however, a different proposal, the ProtoAtom, which does accomplish some (much?) of what this proposal aims for, while also being compatible with the current code base.
Second Version of the Idea
Ben's reply to Linas's critique...
I don’t see these proposed changes as involving any fundamental philosophical shifts…. I just see them as aimed at making it easier to experiment with attaching different parameters to different Atoms….
To sum up:
1) Giving links names, and making node names mutable, are optional aspects of the proposal and we don't need to do them.
E.g., as noted in subsequent email discussions, we could get some memory savings if we want by getting rid of the ListLink used by EvaluationLink
EvaluationLink PredicateNode "eat" ConceptNode "Ben" ConceptNode "steak"
But this is a side-issue so let’s not focus on it…
2) The proposed CachedPredicateLink or CachedEvaluationLink may be awkward constructs, but we can avoid use of these and keep the identification of certain predicates for caching behind-the-scenes ... no problem there
For instance, instead of creating a CachedEvaluationLink or whatever, we could simply create some Atomspace-wide registry of PredicateNodes, and then a method like
addToPredicateCache(PredicateNode P)
which has to be invoked if one wants a PredicateNode included in the caches in each Atom. Same effect, I agree this is a cleaner design…
So e.g. we would call addToPredicateCache on the argument "PredicateNode 'TruthValue strength'" in order to add the strength to the cache...
3) Cached property structures (e.g. arrays) in the Atom can likely be made pretty compact and fast, without a huge performance hit
E.g., if we are just caching a handful of values in each node, we could just use an array instead of a {hashtable or similar}. Each Predicate being cached would have an int index associated with it. The assumption here would be that removals of Predicates from the “cached predicate list” would be very infrequent, so don’t have to be efficiently handled.
Looking up to find
Atom.cachedPropertyArray[1]
may be slower than looking up
Atom.TruthValue.strength
but it doesn’t seem necessarily a big deal..
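Points 2 and 3 can be sketched together: an Atomspace-wide registry hands out an int index per cached predicate, and each atom keeps a small array addressed by that index. All names here (`PredicateCacheRegistry`, `add_to_predicate_cache`, `cached_property_array`) are assumptions that mirror the wording above, not existing OpenCog code. Predicates are keyed by name for brevity, and removal from the registry is deliberately unsupported, in line with the assumption that removals are very infrequent.

```python
class PredicateCacheRegistry:
    """Atomspace-wide registry mapping each cached predicate to an array slot."""

    def __init__(self):
        self._index = {}

    def add_to_predicate_cache(self, predicate_name):
        """Register a predicate for caching and return its slot index.

        Registering the same predicate again returns the existing index.
        """
        return self._index.setdefault(predicate_name, len(self._index))

    def index_of(self, predicate_name):
        return self._index[predicate_name]


class Atom:
    """Toy atom holding a compact array of cached property values."""

    def __init__(self, registry):
        self.registry = registry
        self.cached_property_array = []

    def set_cached(self, predicate_name, value):
        i = self.registry.index_of(predicate_name)
        # Grow lazily, so atoms that never set a property stay small.
        missing = i + 1 - len(self.cached_property_array)
        if missing > 0:
            self.cached_property_array.extend([None] * missing)
        self.cached_property_array[i] = value

    def get_cached(self, predicate_name):
        i = self.registry.index_of(predicate_name)
        array = self.cached_property_array
        return array[i] if i < len(array) else None


registry = PredicateCacheRegistry()
registry.add_to_predicate_cache("TruthValue strength")  # slot 0
registry.add_to_predicate_cache("AttentionValue STI")   # slot 1

atom = Atom(registry)
atom.set_cached("AttentionValue STI", 12.0)

print(atom.cached_property_array[1])           # 12.0
print(atom.get_cached("TruthValue strength"))  # None
```

A lookup by predicate costs one dictionary access plus one array access; code that already holds the int index can read the array slot directly.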
4) Making it possible for NumberNodes to be frequently replaced with other NumberNodes, without incurring huge garbage collection overhead, may require some significant change. Introduction of a NumberAtom (in place of a NumberNode) would be one solution.
Instead of a string name, a NumberAtom would have an internal int or float, which would be mutable. (maybe we would have two types, FloatAtom and IntAtom ?) ... We would not of course index NumberAtoms by their internal number value...
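A minimal sketch of what such a number holder might look like, assuming the two-type variant. `FloatAtom` and `IntAtom` are hypothetical classes taken from the names suggested above. The points illustrated are that the value is updated in place, with no new object allocated, and that two holders with the same value stay distinct because nothing indexes them by value.

```python
class FloatAtom:
    """Hypothetical mutable float holder, compared by identity and not by value."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = float(value)


class IntAtom:
    """Hypothetical mutable int holder, the integer counterpart of FloatAtom."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = int(value)


a = FloatAtom(0.8)
b = FloatAtom(0.8)

# Same numeric value, but two distinct holders: no uniqueness by value.
print(a is b)  # False
print(a == b)  # False (default equality is identity)

# Updating in place keeps the same object, so nothing is discarded.
before = id(a)
a.value = 0.9
print(id(a) == before)  # True
```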
...
All in all, these would be significant changes, but hardly major philosophy changes. We're just talking about
-- handling numbers differently
-- replacing the fixed list of numerical properties inside an Atom, with a more flexible/configurable list, which is a cache of certain properties represented explicitly in the Atomspace.
Reply to the Reply
The ProtoAtom work already implements all of the above desired features, including caching. It's compatible with the current atomspace design. It mostly already all works. There are some open questions, and decisions to be made: see the ProtoAtom page for more. Let's use that.