User talk:Amz3


About Me

I'm an opencog enthusiast (in complicity with several other people). You may contact me using one of the methods below.

I'm working on Culturia (see below).


Contact Information

Talk Page My Talk Page
Twitter @_amirouche_
Freenode Nick amz3
E-mail amirouche<at>hypermove.net
github amirouche
english blog hypermove.net
frenglish blog hyperdev.net

Skills

Scale: Novice => Proficient => Experienced => Expert

skill proficiency notes
C Novice data structures, libraries.
C++ Novice data structures.
Javascript Proficient interpreters, parsers, data structures, libraries, reactjs, forward.js
Java Novice data structures, libraries.
Python Experienced WebDev, async programming, django, cffi, microservices
Scheme (GNU Guile) Various attempts at understanding the world at large, dynamic FFI

A Note On My Name

My ⴰⵎⴰⵣⵉⵖ first name is "Amiruc"; it comes from "admiral" (or the link is reversed). My pseudonym is "amz3", a compressed representation of semantic graph pattern matching, i.e. a combinatorial transderivational search (the induced experience is a reference to the short story Little Lost Robot by Isaac Asimov (spoiler)).

Culturia

The name

The name is a reference to Culture and Empire.

Goal

Prototype a backend storage for AtomSpace using wiredtiger and Scheme.

  • Handle bigger than RAM databases on a single node

And, if there is interest, implement a version control system.

Sub goals

  • Prepare a design that can be transposed to C/C++
  • Prepare a design that can be transposed to a distributed key/value store

Stretch goals

  • Fully API compatible with OpenCog
  • Create a mini-opencog that is easy to study and tweak
  • Solve NLP tasks using opencog principles to get a better understanding of opencog
  • Create a feedback loop pipeline on top of wikibase and other wikis
  • Explore program generation
  • Explore Personal Knowledge Base (PKL)

Side goals

Things that I'd like to do:

  • Fully implement Tinkerpop Gremlin query DSL (usable)
  • Explore interaction between pattern matching and Gremlin (not interesting)
  • Implement Earley parser to do pattern matching (not started)

Why Does it Work

Instead of storing the data in PostgreSQL and loading everything into RAM with the necessary caching, using the lower-level wiredtiger will allow:

  1. to build a more fine-tuned data model
  2. to build more versatile data models

Caching will not be perfect, but in a single-node context it can be enough. Using wiredtiger projections, it is possible to create indices and load a subset of the data into RAM without needing to keep the whole database in RAM.

Note: More complex caching schemes involving graph partitioning could be investigated for multiple nodes.

Implementation draft

This is a brain dump of what I want to do.

Mind the following equivalence:

  • AtomSpace in Culturia is <culture>
  • An Atom in Culturia is a <fact>.

<culture>

<culture> is not unlike AtomSpace. It's not a hypergraph.

<fact>

<fact> is the equivalent of the Atom in AtomSpace.

This is a rough sketch of the underlying data model. It's modified later during the implementation of other features.

Data Model

I strip Node and Link definition from the storage data model and leave it for higher level layers.

<fact> uses a single record type and one table to store facts. A second table, called arrows, stores the outgoing and incoming sets.

A <fact> has:

  1. a scheme association where one can store arbitrary values (a mapping)

And a separate table represents the <arrow>s:

  1. an incoming set
  2. an outgoing set
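
The two tables above can be sketched in memory as follows. This is only an illustration, assuming plain Python dictionaries and sets in place of wiredtiger tables; the `FactStore` class and its method names are hypothetical and do not come from the Culturia code.

```python
class FactStore:
    """Toy in-memory stand-in for the facts table and the arrows table."""

    def __init__(self):
        self.facts = {}      # facts table: name -> association (a mapping)
        self.arrows = set()  # arrows table: (start name, end name) pairs

    def add_fact(self, name, **assoc):
        # A fact is uniquely identified by its name; re-adding merges values.
        self.facts.setdefault(name, {}).update(assoc)
        return name

    def add_arrow(self, start, end):
        # Both ends of an arrow must already exist in the facts table.
        if start not in self.facts or end not in self.facts:
            raise KeyError("both ends of an arrow must be known facts")
        self.arrows.add((start, end))

    def outgoings(self, name):
        # Outgoing set: ends of the arrows that start at `name`.
        return sorted(end for start, end in self.arrows if start == name)

    def incomings(self, name):
        # Incoming set: starts of the arrows that end at `name`.
        return sorted(start for start, end in self.arrows if end == name)


store = FactStore()
store.add_fact("cat", type="ConceptNode")
store.add_fact("animal", type="ConceptNode")
store.add_fact("cat-is-animal", type="InheritanceLink")
store.add_arrow("cat-is-animal", "cat")
store.add_arrow("cat-is-animal", "animal")
```

Here the type is just one more value in the association, which matches the idea of leaving the Node and Link distinction to higher layers. The linear scans in `outgoings` and `incomings` are what the arrow indices are meant to replace.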

Note: In AtomSpace, a Node is uniquely identified by its type and name. In Culturia, all <fact> are uniquely identified by their name.

Note: In AtomSpace, Links are identified by their type and outgoing set. I assume it's to speed up pattern matching. This idea hurts my brain right now.

Note: In AtomSpace, Links have something to do with arity. Probably linked to the above note.

Indices

Add the following indices:

  1. type
  2. name
  3. one index for generic truth value
  4. an index to reverse arrows
  5. an index to map properties

API

  • It would be nice to be able to re-use <fact> for transient hypergraphs


Spatio-Temporal features

It can be solved using a Z-order curve.

Are two "static" indices enough (one to index any time value and another for any space value), or are multiple "Z-index" required? This is probably a question of performance.
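
For reference, a Z-order (Morton) key is built by interleaving the bits of the coordinates. The sketch below handles two non-negative integer dimensions only; the function names and the 16-bit default width are assumptions made for the example, not part of Culturia.

```python
def interleave(x, y, bits=16):
    """Interleave the bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bits of x land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # bits of y land on odd positions
    return code


def deinterleave(code, bits=16):
    """Recover (x, y) from a Morton code produced by interleave."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


key = interleave(3, 5)
```

Points that are close in both dimensions tend to get close keys, so a single ordered index on the key can serve range scans over several dimensions, at the cost of some false positives that have to be filtered afterwards.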

Stick to the following plan:

  1. the user calls something like (culture-index fact (fact-key fact 'created-at) 'temporal)
  2. automatically, based on previously registered properties
  3. using a pattern matcher over the <fact>

Multiple <culture>

This is an extension of the hypergraph to a multi-hypergraph, which means a forest of hypergraphs or, said otherwise, a set of nested hypergraphs. Multiple AtomSpaces explains why it's useful. It seems to me that the main argument is that it improves PLN, but I don't know PLN so I can't tell.

On the other side, from an application point of view, it's nice to organise data into several spaces, so I'm buying the idea.

The implementation I think of:

  1. Prefix each <fact> row with an integer representing the culture (fast lookup over the culture)
  2. Build, in a table, a tuple space for <culture> made of entities (culture, key, value) with an index (key, value, culture)
  3. With that tuple space we can store arbitrary data
  4. That tuple space will be used to store key/value pairs and the <culture> hierarchy
  5. When querying for <fact> we check that they are in the correct culture "quickly", because the culture hierarchy is cached.
  6. When querying <arrow> we check that the end of the arrow is in a correct <culture>. This impacts performance for all hops.
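
The tuple space and the hierarchy check can be sketched as below. This is a toy model under stated assumptions: dictionaries stand in for the wiredtiger table and its index, the hierarchy is stored under a hypothetical "parent" key, and a <fact> is assumed to be visible from its own culture and from the cultures nested inside it. None of the names come from the Culturia code.

```python
class TupleSpace:
    """Toy (culture, key, value) store with a (key, value) -> cultures index."""

    def __init__(self):
        self.rows = {}   # (culture, key) -> value
        self.index = {}  # (key, value) -> set of cultures

    def put(self, culture, key, value):
        if (culture, key) in self.rows:
            # Keep the index in sync when a value is overwritten.
            old = self.rows[(culture, key)]
            self.index[(key, old)].discard(culture)
        self.rows[(culture, key)] = value
        self.index.setdefault((key, value), set()).add(culture)

    def get(self, culture, key):
        return self.rows.get((culture, key))

    def find(self, key, value):
        # Index lookup: which cultures carry this key/value pair?
        return set(self.index.get((key, value), ()))


def lineage(space, culture):
    """Return the culture followed by all of its ancestors."""
    chain = []
    while culture is not None:
        chain.append(culture)
        culture = space.get(culture, "parent")
    return chain


def visible(space, fact_culture, current_culture):
    """Assumed rule: a fact is visible from its culture and its descendants."""
    return fact_culture in lineage(space, current_culture)


space = TupleSpace()
space.put(1, "name", "main")
space.put(2, "parent", 1)
space.put(3, "parent", 2)
space.put(4, "parent", 1)
```

With this data, culture 3 is nested in 2, which is nested in 1, and culture 4 is a sibling of 2. In a real implementation the result of `lineage` would be cached, since it is needed on every hop.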

Note: Instead of using a tuple space, it's possible to use hypergraph 0 to represent the hierarchy [FIXME: why the other solution?].

Version control system

I think this can be a nice feature to have. Creating a bug, deleting the database, re-loading the database or trying to fix the issue through code can be painful. Instead, there is a VCS.

  1. Simple solution: copy the database files. This requires twice the size of the database. Ok.
  2. Extend the transaction system of wiredtiger
  3. Use a similar approach to multiple hypergraphs

Look at git internals.

Implementation Log

2015/09/22 - What Are The Civilian Applications?

I started the implementation and pushed some code to github. This includes FFI bindings of wiredtiger with documentation and a few examples (that may not work as-is). The best way to get started with wiredtiger is to read Schema, Columns, Column Groups, Indices and Projections.

This is a rough draft:

  1. <culturia> is the primary object; it's a handle over the underlying wiredtiger database.
  2. When <culturia> is created, a first <ego> is built, along with the meta <culture> called main. Eventually it returns the first <ego>.
  3. An <ego> is a revision of the version control system; the current <ego> is the database workspace. It should be defined as (uid, name, parent, comment).
  4. <ego>, as the database workspace, is the handle used to interact with the data. <culturia> is a wrapper for the wiredtiger creation logic.
  5. <ego> has git-like commands:
 - checkout-revision-or-branch
 - commit
 - create-branch
  6. <ego> has procedures for direct access to the hypergraphs, actually a hypergraph-hypergraph.
  7. <ego> is basically git and the file system, except that the files are hypergraphs represented by <culture> and atoms represented by <fact>.
  8. I did not bother making the code work with multiple threads; right now it is meant to run in a single thread. wiredtiger supports multiple threads (http://source.wiredtiger.com/2.6.1/threads.html) and transactions. Instead of using prefork, it's possible for the main thread (REPL) to keep a set of wiredtiger database sessions ready to be used by other threads.
  9. I tried my best to compose <fact> index primary keys to minimize the number of indices (and queries), for example to make <ego> work along the <culture> hierarchy, but it doesn't work. A major decision is to instead build small indices and compose them in Scheme. There is another pattern, which involves creating nested tables using multiple rows, but I'm not sure where it's going.
  10. I dropped generic truth values for the time being; I will factor them into the "z-order" index code. It should play nicely with the above "compose indices" thing.
  11. The whole system is really immutable. <fact> are 'marked' deleted in a given <ego> but not really deleted. Since an <ego> commit creates a new version of a <fact>, you cannot delete a <fact> without deleting all the history of that <fact>.
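
The revision model described in this list can be sketched as a chain of <ego> objects where deletion is only a marker. This is a minimal illustration, not the Culturia implementation: the `Ego` class, the `DELETED` tombstone and the behaviour of `commit` (freeze the current revision and return a child workspace) are assumptions made for the example.

```python
DELETED = object()  # tombstone: marks a fact as deleted in one revision


class Ego:
    """Toy revision defined by (uid, name, parent, comment)."""

    def __init__(self, uid, name, parent=None, comment=""):
        self.uid = uid
        self.name = name
        self.parent = parent
        self.comment = comment
        self.changes = {}  # fact name -> value, or DELETED

    def set(self, fact, value):
        self.changes[fact] = value

    def delete(self, fact):
        # Nothing is removed: the fact is only marked deleted in this revision.
        self.changes[fact] = DELETED

    def get(self, fact):
        # Walk up the revision chain until some revision mentions the fact.
        ego = self
        while ego is not None:
            if fact in ego.changes:
                value = ego.changes[fact]
                return None if value is DELETED else value
            ego = ego.parent
        return None

    def commit(self, uid, comment=""):
        # Assumed semantics: the new workspace is a child of this revision.
        return Ego(uid, self.name, parent=self, comment=comment)


root = Ego(1, "master", comment="initial")
root.set("cat", {"legs": 4})
workspace = root.commit(2, comment="remove cat")
workspace.delete("cat")
```

After this, the fact is gone from the point of view of `workspace` but still readable from `root`, which is why a real deletion would have to remove the whole history of the <fact>.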

Next steps:

  1. Implement the main <culture> code in ego
  2. The storage of <culture> actually expects a tree, through culture-path in the facts table, which is the path of uids to the culture that contains the <fact>. The primary way to query, and the way to do range queries, in wiredtiger is through primary key prefixes. So if a fact has a culture-path equal to 1/2/3, it will be in culture 1, culture 2 and culture 3.
  3. In principle main could be stored as a <culture>
  4. The main culture should be subject to ego
  5. Investigate transient hypergraphs again, define in plain text what they should be able to do, and what they should not be able to do. Otherwise, use CLOS
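
The culture-path idea from step 2 relies on the fact that, in an ordered key space, all keys sharing a prefix are contiguous. A small sketch of such a prefix range scan, assuming a sorted Python list in place of a wiredtiger table and tuples of uids in place of the 1/2/3 path encoding (the function name and sample data are hypothetical):

```python
import bisect


def facts_under(sorted_keys, prefix):
    """Return names of facts whose culture-path starts with `prefix`.

    `sorted_keys` is a sorted list of (culture_path, name) pairs,
    where culture_path is a tuple of culture uids.
    """
    # Binary search for the first key that is >= the prefix.
    start = bisect.bisect_left(sorted_keys, (prefix,))
    names = []
    for path, name in sorted_keys[start:]:
        if path[:len(prefix)] != prefix:
            break  # left the contiguous range of matching keys
        names.append(name)
    return names


keys = sorted([
    ((1,), "root-fact"),
    ((1, 2), "dog"),
    ((1, 2, 3), "cat"),
    ((1, 4), "bird"),
])
in_culture_2 = facts_under(keys, (1, 2))
```

A fact stored under the path (1, 2, 3) is found when scanning culture 1, culture (1, 2) or culture (1, 2, 3), which is the "in culture 1, culture 2 and culture 3" behaviour described above. Tuples of integers are used here because a textual path such as 1/10 would not sort in numeric order.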

Bookkeeping

AtomSpace

Pattern Matching

NLP

Programming languages

“The distinction between class-based and prototype-based systems reflects a long-lasting philosophical dispute concerning the representation of abstractions. Plato viewed forms — stable, abstract, “ideal” descriptions of things — as having an existence more real than instances of those things in the real world. Class-based languages such as Smalltalk, C++ or Simula are Platonic in their explicit use of classes to represent similarity among collections of objects. Prototype-based systems such as Self [UnS87], Omega [Bla91, Bla94], Kevo [Tai92, Tai93], GlyphicScript [Gly94] and NewtonScript [SLS94] represent another view of the world, in which one does not rely so much on advance categorization and classification, but rather tries to make the concepts in the problem domain as tangible and intuitive as possible. A typical argument in favor of prototypes is that people seem to be a lot better at dealing with specific examples first, then generalizing from them, than they are at absorbing general abstract principles first and later applying them in particular cases.

Prototypes give rise to a broad spectrum of interesting technical, conceptual and philosophical issues. In this paper we take a rather unusual, non-technical approach and investigate object-oriented programming and the prototype-based programming field from a purely philosophical viewpoint. Some historical facts and observations pertaining to objects and prototypes are presented, and conclusions based on those observations are derived.”

http://devblog.avdi.org/2012/10/13/classes-vs-prototypes-some-philosophical-and-historical-observations/