I'm an OpenCog enthusiast (in complicity with several other people). You may contact me using one of the methods below.
I'm working on Culturia (see below).
|Talk Page||My Talk Page|
Scale: Novice => Proficient => Experienced => Expert
|C||Novice||data structures, libraries.|
|Java||Novice||data structures, libraries.|
|Python||Experienced||WebDev, async programming, django, cffi, microservices|
|Scheme (GNU Guile)||Various attempts at understanding the world at large.||dynamic ffi|
A Note On My Name
My ⴰⵎⴰⵣⵉⵖ first name is "Amiruc"; it comes from "admiral" (or the link is reversed). My pseudonym is "amz3", a compressed representation of semantic graph pattern matching, i.e. a combinatorial transderivational search (the induced experience is a reference to the short story Little Lost Robot by Isaac Asimov (spoiler)).
The name is a reference to Culture and Empire.
- Handle bigger than RAM databases on a single node
- Implement Hierarchical AtomSpaces cf. Multiple AtomSpaces
- Implement spatial features (neo4j spatial)
And, if there is interest, implement a version control system.
- Prepare a design that can be transposed to C/C++
- Prepare a design that can be transposed to a distributed key/value store
- Fully API compatible with OpenCog
- Create a mini-opencog that is easy to study and tweak
- Solve NLP tasks using OpenCog principles to get a better understanding of OpenCog
- Create a feedback loop pipeline on top of wikibase and other wikis
- Explore program generation
- Explore Personal Knowledge Base (PKL)
Things that I'd like to do:
- Fully implement Tinkerpop Gremlin query DSL (usable)
- Explore interaction between pattern matching and Gremlin (not interesting)
- Implement an Earley parser to do pattern matching (not started)
Why Does it Work
Instead of storing the data in PostgreSQL, loading everything in RAM and doing the necessary caching, using the lower-level wiredtiger will allow:
- to build more fine-tuned data models
- to build more versatile data models
Caching will not be perfect, but in a single-node context it can be enough. Using wiredtiger projections, it is possible to create indices that keep a subset of the data in RAM without the need to keep the whole database in RAM.
Note: More complex caching scheme involving graph partitioning could be investigated for multiple nodes.
This is a brain dump of what I want to do.
Mind the following equivalence:
- AtomSpace in Culturia is <culture>
- An Atom in Culturia is a <fact>.
<culture> is not unlike AtomSpace. It's not a hypergraph.
<fact> is the equivalent of the Atom in AtomSpace.
This is a rough sketch of the underlying data model. It is modified later, during the implementation of other features.
<fact> uses a single record type and one table to store facts. It also uses a single table, called arrows, to store the outgoing and incoming sets.
A <fact> has:
- a scheme association where one can store arbitrary values (a mapping)
And a separate table represents the <arrow>s:
- an incoming set
- an outgoing set
Note: In AtomSpace, a Node is uniquely identified by its type and name. In Culturia, all <fact>s are uniquely identified by their name.
Note: In AtomSpace, Links are identified by their type and outgoing set. I assume it's to speed up pattern matching. This idea hurts my brain right now.
Note: In AtomSpace, Links have something to do with arity. Probably related to the above note.
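To make the sketch above more concrete, here is a minimal in-memory version in Python. It is not the Culturia code: `Fact`, `FactStore`, `add_fact` and `add_arrow` are hypothetical names, and plain dictionaries stand in for the wiredtiger tables. It only shows a <fact> identified by its name with an association of arbitrary values, and a separate arrows structure giving the outgoing and incoming sets.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Fact:
    """One <fact> record: a unique name plus an association of arbitrary values."""
    name: str
    assoc: dict = field(default_factory=dict)


class FactStore:
    """Hypothetical in-memory stand-in for the facts table and the arrows table."""

    def __init__(self):
        self.facts = {}                   # name -> Fact (the single facts table)
        self.outgoing = defaultdict(set)  # start name -> end names
        self.incoming = defaultdict(set)  # end name -> start names

    def add_fact(self, name, **assoc):
        # Facts are identified by name only, so adding twice reuses the record.
        fact = self.facts.get(name)
        if fact is None:
            fact = self.facts[name] = Fact(name)
        fact.assoc.update(assoc)
        return fact

    def add_arrow(self, start, end):
        # Both ends of an arrow must already exist as facts.
        if start not in self.facts or end not in self.facts:
            raise KeyError("both ends of an arrow must be known facts")
        self.outgoing[start].add(end)
        self.incoming[end].add(start)


store = FactStore()
store.add_fact("cat", kind="concept")
store.add_fact("animal", kind="concept")
store.add_arrow("cat", "animal")
print(sorted(store.outgoing["cat"]))
print(sorted(store.incoming["animal"]))
```

Keeping the incoming sets next to the outgoing sets is what lets a query walk an arrow backwards without scanning every fact.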
Add the following indices:
- one index for generic truth values
- an index to reverse arrows
- an index to map properties
- It would be nice to be able to re-use <fact> for transient hypergraphs
It can be solved using a Z-order curve.
Are two "static" indices enough (one to index any time value and another for any space value), or are multiple "Z-indices" required? This is probably a question of performance.
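For reference, a Z-order (Morton) key is built by interleaving the bits of the coordinates, so that points close to each other tend to get close keys in a single sorted index. The Python sketch below assumes two non-negative integer coordinates of at most 16 bits each; `interleave_bits` and `deinterleave_bits` are illustrative names, not part of Culturia or wiredtiger.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Morton (Z-order) code of two non-negative integers."""
    if x < 0 or y < 0 or x >= 1 << bits or y >= 1 << bits:
        raise ValueError("coordinates must fit in the given number of bits")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return code


def deinterleave_bits(code: int, bits: int = 16) -> tuple:
    """Recover (x, y) from a Morton code produced by interleave_bits."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


print(interleave_bits(3, 5))   # 39, i.e. 0b100111
print(deinterleave_bits(39))   # (3, 5)
```

The same interleaving extends to more dimensions (for instance x, y and time), which is one way to look at the "how many indices" question: one key per combination of dimensions that is queried together.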
Stick to the following plan:
- the user calls something like
(culture-index fact (fact-key fact 'created-at) 'temporal)
- automatically, based on previously registered properties
- using a pattern matcher over the <fact>
This is an extension of the hypergraph to a multi-hypergraph, which means a forest of hypergraphs, in other words a set of nested hypergraphs. Multiple AtomSpaces explains why it's useful. It seems to me that the main argument is that it improves PLN, but I don't know PLN so I can't tell.
On the other side, from an application point of view, it's nice to organise data into several spaces, so I'm buying the idea.
The implementation I think of:
- prefix <fact> row with an integer representing the culture (fast lookup over
- Build, in a table, a tuple space for <culture> made of entities (culture, key, value)
with an index (key, value, culture)
- With that tuple space we can store arbitrary data
- That tuple space will be used to store key/value pairs and the <culture>
- When querying for <fact> we check that they are in the correct culture
"quickly" because the culture hierarchy is cached.
- When querying <arrow> we check that the end arrow is in a correct <culture>.
This impacts performance for all hops.
> Instead of using a tuple space, it's possible to use hypergraph 0 to represent the hierarchy [FIXME: why the other solution?].
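The tuple space idea can be sketched in Python as follows. This is only an illustration under assumptions: `CultureSpace` and its methods are hypothetical names, sets and dictionaries replace the wiredtiger table and its (key, value, culture) index, and the rule that a culture can see what is stored in its ancestors is my guess at what "the correct culture" means.

```python
class CultureSpace:
    """Hypothetical tuple space of (culture, key, value) with a cached hierarchy."""

    def __init__(self):
        self.tuples = set()     # (culture, key, value) entities
        self.by_key_value = {}  # (key, value) -> cultures, the reverse index
        self.parent = {}        # culture uid -> parent uid, cached in RAM

    def add_culture(self, uid, parent=None):
        self.parent[uid] = parent

    def put(self, culture, key, value):
        self.tuples.add((culture, key, value))
        self.by_key_value.setdefault((key, value), set()).add(culture)

    def ancestors(self, culture):
        # Walk the cached hierarchy from a culture up to its root.
        while culture is not None:
            yield culture
            culture = self.parent[culture]

    def lookup(self, culture, key, value):
        """Return the cultures holding (key, value) that are visible from `culture`."""
        visible = set(self.ancestors(culture))
        return self.by_key_value.get((key, value), set()) & visible


space = CultureSpace()
space.add_culture(0)
space.add_culture(1, parent=0)
space.add_culture(2, parent=0)
space.put(0, "name", "cat")
space.put(2, "name", "dog")
print(space.lookup(1, "name", "cat"))  # {0}: stored in the parent culture
print(space.lookup(1, "name", "dog"))  # set(): stored in a sibling culture
```

The extra set intersection on every lookup is the per-hop cost mentioned above; it stays cheap only as long as the hierarchy fits in the cache.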
version control system
I think this can be a nice feature to have. Creating a bug, deleting the database, re-loading the database or trying to fix the issue through code can be painful. Instead, there is the VCS.
- Simple solution: copy the database files. This requires twice the size of
the database. Ok.
- Extend the transaction system of wiredtiger
- Use a similar approach to multiple hypergraphs
Look for git internals
2015/09/22 - What Are The Civilian Applications?
I started the implementation and pushed some code on github. This includes FFI bindings of wiredtiger with documentation and a few examples (that may not work as-is). The best way to get started with wiredtiger is to read Schema, Columns, Column Groups, Indices and Projections.
This is a rough draft:
<culturia> is the primary object; it's a handle over the underlying wiredtiger database.
When <culturia> is created, a first <ego> is built
and the meta
main. Eventually it returns and the first
<ego> is a revision of the version control system,
<ego> is the database workspace. It should be defined as a (uid, name, parent, comment)
<ego> as the database workspace, it's the handle used
to interact with the data.
<culturia> is a wrapper for the wiredtiger creation logic.
<ego> has git-like commands:
- checkout-revision-or-branch
- commit
- create-branch
<ego> has procedures for direct access of the hypergraphs,
actually a hypergraph-hypergraph
<ego> is basically git and the file system, except the
files are hypergraphs represented by
<culture> and atoms represented by <fact>.
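To see how the git-like commands of the draft could fit together, here is a small Python sketch. It is an assumption-heavy illustration and not the Scheme implementation: the `Culturia` class keeps everything in memory, the method names only mirror the commands listed in the draft, and the only thing taken from the draft is that an ego is a (uid, name, parent, comment) record.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ego:
    """A revision / workspace record: (uid, name, parent, comment)."""
    uid: int
    name: str
    parent: Optional[int]
    comment: str


class Culturia:
    """Hypothetical in-memory stand-in for the wiredtiger handle."""

    def __init__(self):
        self._uids = itertools.count(1)
        self.egos = {}      # uid -> Ego
        self.branches = {}  # branch name -> uid of its latest ego
        # Creating the handle builds a first ego on the main branch.
        self.create_branch("main", parent=None, comment="initial ego")

    def create_branch(self, name, parent, comment=""):
        ego = Ego(next(self._uids), name, parent, comment)
        self.egos[ego.uid] = ego
        self.branches[name] = ego.uid
        return ego

    def commit(self, branch, comment):
        # A commit is a new ego whose parent is the current tip of the branch.
        return self.create_branch(branch, self.branches[branch], comment)

    def checkout(self, revision_or_branch):
        # Accept either a branch name or an ego uid.
        uid = self.branches.get(revision_or_branch, revision_or_branch)
        return self.egos[uid]

    def history(self, ego):
        # Follow parent links back to the first ego.
        while ego is not None:
            yield ego
            ego = self.egos.get(ego.parent)


db = Culturia()
second = db.commit("main", "add some facts")
db.create_branch("experiment", parent=second.uid)
print([ego.uid for ego in db.history(db.checkout("experiment"))])  # [3, 2, 1]
```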
- I did not bother making the code work with multiple threads; right now the
code is meant to work in a single thread. [http://source.wiredtiger.com/2.6.1/threads.html wiredtiger supports multiple threads] and transactions. Instead of using prefork, it's possible for the main thread (repl) to keep a set of wiredtiger database sessions ready to be used by other threads.
- I tried my best to compose <fact> index primary keys to minimize the number of indices (and queries), like for the <culture> hierarchy, but it doesn't work. A major decision is instead to build small indices and compose them in scheme. There is another pattern, which implies creating nested tables using multiple rows, but I'm not sure where it's going.
- I dropped generic truth values for the time being; I will factor them into
the "z-order" index code. It should play nice with the above "compose indices" thing.
- The whole system is really immutable. <fact>s are 'marked' deleted in a given
<ego> but not really deleted. Since an <ego> commit creates a new version of a <fact>, you cannot delete a <fact> without deleting all the history of the <fact>.
- Implement the main <culture> code in ego
- The storage of <culture> actually expects a tree through the facts table, which is the path of uids to the culture that contains the <fact>. The primary way to query and the way to do range queries in wiredtiger is through primary key prefixes. So if a fact has a path 1/2/3, it will be in culture 1, culture 2 and culture 3.
- In principle main could stored as a
main culture should be subject of
- Investigate again transient hypergraphs, define in plain text what they
should be able to do, and what they should not be able to do. Otherwise, use CLOS
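The primary key prefix note above (a fact stored under the culture path 1/2/3) can be illustrated in Python with a sorted list standing in for the table. This is a sketch under assumptions: the key layout `(path, fact uid)` and the name `facts_under` are mine, and `bisect` only imitates the kind of range scan an ordered key/value store offers.

```python
from bisect import bisect_left


def facts_under(sorted_keys, prefix):
    """Yield the uid of every fact whose culture path starts with `prefix`."""
    # (prefix,) sorts before any (prefix, uid) key, so the scan starts there.
    start = bisect_left(sorted_keys, (prefix,))
    for path, uid in sorted_keys[start:]:
        if path[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching range is over
        yield uid


# Composite primary keys: (culture path, fact uid), kept sorted like a table.
keys = sorted([
    ((1, 2, 3), "fact-a"),
    ((1, 2), "fact-b"),
    ((1, 4), "fact-c"),
    ((5,), "fact-d"),
])

print(list(facts_under(keys, (1, 2))))  # ['fact-b', 'fact-a']
print(list(facts_under(keys, (1,))))    # ['fact-b', 'fact-a', 'fact-c']
```

A fact stored under 1/2/3 shows up when scanning culture 1, culture 1/2 and culture 1/2/3, which matches the "it will be in culture 1, culture 2 and 3" remark.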
- Boolean satisfiability problem
- Graph Programs (GP lang)
- timely dataflow tutorial in Rust (distributed computation) To compare with Gremlin.
- Stan, a probabilistic programming language
- Prototype-based programming. I think kevo is a stack-based, prototype-based programming language. See whether this can work with graphs and PushGP
- PUSH Genetic Programming system implemented in Clojure (PushGP)
“The distinction between class-based and prototype-based systems reflects a long-lasting philosophical dispute concerning the representation of abstractions. Plato viewed forms — stable, abstract, “ideal” descriptions of things — as having an existence more real than instances of those things in the real world. Class-based languages such as Smalltalk, C++ or Simula are Platonic in their explicit use of classes to represent similarity among collections of objects. Prototype-based systems such as Self [UnS87], Omega [Bla91, Bla94], Kevo [Tai92, Tai93], GlyphicScript [Gly94] and NewtonScript [SLS94] represent another view of the world, in which one does not rely so much on advance categorization and classification, but rather tries to make the concepts in the problem domain as tangible and intuitive as possible. A typical argument in favor of prototypes is that people seem to be a lot better at dealing with specific examples first, then generalizing from them, than they are at absorbing general abstract principles first and later applying them in particular cases.
Prototypes give rise to a broad spectrum of interesting technical, conceptual and philosophical issues. In this paper we take a rather unusual, non-technical approach and investigate object-oriented programming and the prototype-based programming field from a purely philosophical viewpoint. Some historical facts and observations pertaining to objects and prototypes are presented, and conclusions based on those observations are derived.”