This page is a list of “open projects” in OpenCog, including some which the OpenCog “experts” feel would be generally appropriate for OpenCog newbies, and also some more substantial ones. The page was last seriously reviewed and updated January 2016.
OpenCog being what it is, none of these are extremely easy. But they are all pretty well understood by one or more OpenCog experts; and they are all things that should be approachable by someone with good programming skills and undergrad CS knowledge, but who hasn’t yet achieved a deep overall understanding of the OpenCog theories and codebase. I.e. all of these are “non-terrible places to start” in terms of really getting one’s feet wet with OpenCog in a serious way. And they are all quite useful things for moving OpenCog forward…
If you’re a newbie and want something smaller to do, varying on some of the examples in the tutorials at XX would be a good way to start. Also, look at the OpenCog GitHub repository and try fixing some pending issues — that’s always a good way to get the hang of a complex software system.
OpenCog is a flexible and open-ended framework that can be used for all sorts of different projects, and extended in many different ways. So the list of open projects here is very far from complete. If you have a cool idea that's not on this list, feel free to plunge right in and do it, or to discuss on the OpenCog email list.
The projects on this page fall into four categories:
- Minecraft-related projects … there is a specific initiative among a group of OpenCog volunteers to get OpenCog to do basic Minecraft stuff. Before too long the OpenCog-Minecraft initiative should leverage, and likely require improvements or tweaks to, all aspects of OpenCog. For now it’s focusing on a lot of specifically Minecraft-related stuff.
- A few open-ended, exploratory research projects (which are quite critical to OpenCog’s AGI ambitions). Working on these can take a short or long time depending on how far you want to go.
- Projects labeled Fairly Small — which it is estimated should take in the low tens of hours for someone who has figured out the basics of OpenCog. These mostly involve testing out or tweaking some existing specific AI functionality.
- Projects labeled Medium — which it is estimated might take from the high tens to the low hundreds of hours of effort, for someone who knows the basics of OpenCog. These projects more often include implementing some new AI functionality in OpenCog, or significantly changing or extending some existing AI functionality.
If you want to work on one of these projects please email the OpenCog Google Group and say so, and someone will give you guidance…
NOTE: This page is currently in rough form, with XX's in places where links to code or research papers should go. If you have time and relevant knowledge, please feel free to fill in some of the XX's! Otherwise someone else will do it before too long, likely during February 2016... Even with the XX's the page should be useful; most of the relevant resources can be found in Github pretty easily.
OpenCog for Minecraft
Fredy Villa is leading a team of other volunteer OpenCog developers toward the goal of making OpenCog play Minecraft in an interesting way.
There is a dedicated page http://wiki.opencog.org/wikihome/index.php/Minecraft_Bot_Development_Roadmap for the OpenCog-Minecraft project, with many subtasks etc.
Many OpenCog AI tasks and functionalities will be interestingly demonstrable in the Minecraft world. However, at this moment (Jan 2016) the focus of the Minecraft effort is on getting more basic functionalities working. As time goes on, the focus will shift to using Minecraft as a platform for OpenCog intelligence.
Of course, what is achieved on the OpenCog side for Minecraft can easily be ported to other Minecraft-like game worlds (e.g. via building a ROS proxy to those game worlds).
Fairly Small Projects
Test the Neo4J Backing store on Simple English Wikipedia
Hendy Irawan made a Neo4j backing store for OpenCog but nobody is using it.
You can find the code here: opencog/opencog-neo4j
Some dump of content from neo4j
A tutorial by Hendy
There are unit tests; the code seems to pass them and basically work OK.
The latest Simple English Wikipedia database dump can be downloaded here: SEW
There are scripts for reading and cleaning the database dump, leaving only plain text. See relex/src/perl
The cleaning process is good, but you may still find some leftover markup in the output, so updates to the above scripts that perfect the process are always welcome.
There are also scripts for parsing the text into the AtomSpace. See opencog/nlp/learn/misc-scripts
The README files in the above directories describe what each of those scripts does; it would be useful to read them while you are downloading the SEW.
Writing test code that saves SEW to Neo4J and reloads it would be a good test of the Neo4j backing store.
Making various OpenCog Pattern Matcher queries work effectively against the Neo4j backing store may be a more substantial task, involving creating cleverer Neo4j Cypher queries corresponding to different PM queries. More testing is required to understand the necessity and nature of this task.
Modify the OpenCog visualizer to allow elegant visualization of the Atoms involved in interpreting an English sentence
The ocViewer visualizer visualizes small networks of nodes and links in the Atomspace. It's currently under active development by Dagmawi Moges and Selameab. Plenty of improvements to the visualizer could be made.
One idea is to make a mode that is specialized for visualizing the Atoms involved in processing an English sentence. The sentence could be shown on the screen (each word in the sentence corresponds to a WordInstanceNode, so this is just a matter of telling the graph layout to arrange these WordInstanceNodes in a horizontal line). Then other NLP related Atoms could be shown, linked to these linearly arranged WordInstanceNodes. Controls could be created enabling various viewing options:
- link grammar Atoms
- RelEx Atoms
- RelEx2Logic Atoms
- Atoms produced via reasoning
Rodas Solomon, Amen Belayneh or Man Hin Leung could provide guidance on grabbing these different Atom sets.
Modify the OpenCog visualizer to allow elegant visualization of the Atoms involved in interpreting an LOJBAN sentence
If you like Lojban, you may be interested in doing the above exercise but for Lojban instead of -- or alongside -- English
Roman Treutlein built the Lojban comprehension pipeline and will be happy to explain it to anyone who's interested. Ben Goertzel used to know a bit of Lojban, groks the concepts well, and is happy to discuss/guide work on this as well.
Bio-AI: write scripts importing additional bio-databases into the Atomspace, e.g. pathway databases
This is a project for someone who likes bioinformatics and understands basic biology.
Scripts exist for importing the Gene Ontology and MSigDB into the Atomspace, where they can be queried, used for reasoning, etc. See
Similar scripts could be written for many other biology databases, e.g. pathway databases, drug databases like DrugBank or PubChem, etc.
Eddie Monroe is expert on these scripts; Mike Duncan can provide guidance on what datasets to look at and on the underlying biology.
Finish Deception Inference Example
In 2014 Amen Belayneh got most of the way through implementing a PLN inference chain related to deception. See
for the code.
At that time PLN was not as fully functional as now so he stopped to focus on other things. Now this example could be finished, based loosely on the general ideas in:
Test out the Pattern Miner on various datasets
Shujing Ke wrote a greedy pattern miner for the Atomspace, which mines frequent or surprising patterns. See opencog/learning/PatternMiner
She has tested it out on some sample data (e.g. from DBPedia), but so far nobody but Shujing has used it seriously.
Making more test cases, involving trying the pattern miner out on different datasets, would be interesting and valuable, e.g.
- Mining patterns from Atoms produced by applying the NLP pipeline to Simple English Wikipedia
- Mining patterns from data about different countries of the world, imported from e.g. Freebase or DBPedia (or wherever)
- Mining patterns from the Gene Ontology, which has been imported to the Atomspace
- any other structured data sources you can think of ...
Write a test application for the Gearman distributed-querying infrastructure
Mandeep Bhatia wrote code applying the Gearman framework to distribute OpenCog Scheme-shell functions across multiple machines (aimed initially at the use case where each of the "slave" machines contains an identical Atomspace; but the code is not intrinsically limited to this use case). See
This code has been tested and used only minimally.
It would be good to set up a configuration involving, say, one master and 3 slave machines, where each slave contains an identical Atomspace containing a significant chunk of Simple English Wikipedia. The master could process a series of fuzzy pattern matcher queries (corresponding to English language questions) and then pass them to the slaves for answering. As the number of slave machines increases and the rate of queries coming through the master increases, the rate of speedup in query response achieved by the Gearman setup can be measured.
An additional test would be to use the slaves for microplanning and surface realization. Suppose the master has a lot of atom-chunks to articulate; it could then send these to the slaves for processing via the Gearman framework.
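Purely as an illustration of the master/slave query fan-out described above (Gearman itself provides the real job-server machinery; everything here, including the toy knowledge table and the crude word-overlap matcher, is a hypothetical stand-in), a stdlib-only simulation using multiprocessing:

```python
# Simulated master/slave setup: each "slave" process holds an identical copy
# of the knowledge table and answers fuzzy queries; the master fans queries
# out across a pool of slaves. Uses only the Python standard library.
from multiprocessing import Pool

# Toy stand-in for an Atomspace chunk replicated on every slave.
KNOWLEDGE = {
    "capital of france": "paris",
    "capital of china": "beijing",
    "largest planet": "jupiter",
}

def answer_query(query):
    """Crude 'fuzzy' match: return the fact sharing the most words with the query."""
    qwords = set(query.lower().replace("?", "").split())
    best, best_score = None, 0
    for fact, value in KNOWLEDGE.items():
        score = len(qwords & set(fact.split()))
        if score > best_score:
            best, best_score = value, score
    return query, best

if __name__ == "__main__":
    queries = ["What is the capital of France?", "Which is the largest planet?"]
    with Pool(3) as slaves:          # three "slave" processes
        results = dict(slaves.map(answer_query, queries))
    print(results)
```

Timing runs of this kind of loop while growing the pool size is exactly the speedup measurement described above.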
Enhancing the NL comprehension pipeline to handle biological and biomedical texts
The OpenCog NLP pipeline -- the link parser, RelEx and RelEx2Logic -- handles a variety of simple English sentences well. However, parsing typical biomedical texts such as PubMed abstracts requires additional attention. Biomedical entity lists can easily be integrated with the link parser (and this has been done before), but attention must also be given to adjusting the link grammar dictionary and RelEx rule-bases to handle the particular syntactic and semantic constructs that are important in biological texts. This is heavier on linguistic intuition than programming skill, though it does require running complex software and editing linguistic rule-files with obscure syntax.
Test Boosted-MOSES and Subsampling-MOSES on various examples
The core MOSES learning algorithm has had some basic improvements integrated into it, e.g.
- Boosting, see XX
- Subsampling, see XX
Neither of these improvements has been extensively tested.
Test concept blending on a sizeable knowledge-base of concepts
A GSoC student implemented Concept Blending XXX for OpenCog in summer 2015, culminating in a cog-blend function that produces a blend of two Atoms using a heuristic based on interaction information. See
This has been run on some toy examples and seems to work OK. But it's never been seriously experimented with.
Let's try it on interesting selections of concepts derived from, say, DBPedia or Simple English Wikipedia.
Test concept blending on biology concepts
An interesting cog-sci/bio experiment would be to try out concept blending on Gene Ontology concepts. What kind of blends will cog-blend come up with? How does the interaction information measure need to be tweaked to make this work?
Create a MindAgent for Concept Creation from Predicates
The project here is to implement, and experiment with, a MindAgent that creates concepts from predicates.
The first part is to figure out which predicates are interesting to use as the source for ConceptNodes. For instance, if we have many links in the Atomspace of the form
EvaluationLink
    PredicateNode "in"
    ConceptNode "Beijing"
    ConceptNode "China"
EvaluationLink
    PredicateNode "in"
    ConceptNode "women"
    ConceptNode "China"
etc.
then a ConceptNode can be formed consisting of all $X so that
EvaluationLink
    PredicateNode "in"
    ConceptNode $X
    ConceptNode "China"
These ConceptNodes can then be fed into PLN inference and conclusions drawn about them. These conclusions will lead to new links being formed, which can then be mined to form new concepts, etc.
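The core grouping step can be sketched in a few lines (this is an illustration, not the actual MindAgent API; the triple representation and the `SatisfyingSet` naming are assumptions made for the example):

```python
# Illustrative sketch: form new concepts by collecting every $X that fills
# the same (predicate, object) slot in enough EvaluationLink-style triples.
from collections import defaultdict

def form_concepts(triples, min_members=2):
    """triples: (predicate, subject, obj) tuples standing in for EvaluationLinks.
    Returns {concept_name: set of member subjects}."""
    slots = defaultdict(set)
    for pred, subj, obj in triples:
        slots[(pred, obj)].add(subj)
    return {f"SatisfyingSet({p}, {o})": members
            for (p, o), members in slots.items()
            if len(members) >= min_members}

triples = [
    ("in", "Beijing", "China"),
    ("in", "women", "China"),
    ("in", "Paris", "France"),
]
concepts = form_concepts(triples)
assert concepts == {"SatisfyingSet(in, China)": {"Beijing", "women"}}
```

The `min_members` threshold is the simplest possible "interestingness" filter; a real MindAgent would presumably use something smarter (frequency, surprisingness, attention values).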
Integrate the PLN Modal Logic rule code into the main PLN code and test it
A GSoC student wrote code embodying PLN formulas for modal logic. See
The math underlying the formulas was thought out pretty thoroughly. This code however has not been integrated with the main URE-based rule engine, nor used for anything.
So there are two things to do here:
- Integrate the code with the PLN-based URE, writing appropriate test cases
- Make up some serious test cases embodying interesting instances of modal inference
Modal Logic for Theory of Mind in Minecraft
This is a not-so-easy-but-very-interesting task.... Following the above steps, it should be possible to do some interesting experimentation involving theory of mind with OpenCog agents acting in the Minecraft world.
2008 work by Selmer Bringsjord and his colleagues, described in some page (that should be referenced around page 17 of ) may provide some conceptual inspiration.
Complete implementation of Distributional Truth Values for PLN
OpenCog Atoms are labeled with TruthValue objects. Currently these are mostly SimpleTruthValues [(strength, confidence) pairs]. But the theory of PLN, as described in the PLN Book, also describes other forms of truth value, such as distributional and indefinite truth values.
Nil Geisweiller recently implemented an initial version of Distributional Truth Values, see .
But there is more to be done ... see the "Future Work" section on .
This project requires OK programming skill, but is more centered on mathematics -- it's good for someone familiar with probability and statistics, who wants to play with (and help create) new and more powerful forms of probabilistic inference.
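To give a flavor of the math involved (this is a hedged illustration, not the actual PLN code: the lookahead constant K and the Beta parameterization below are assumptions), a SimpleTruthValue can be read as summarizing a whole distribution over the "true" strength:

```python
# Sketch: interpret (strength, confidence) as a Beta distribution over the
# true strength. Confidence is mapped to an evidence count via the PLN-style
# formula confidence = count / (count + K); K = 800 is an assumed default.
K = 800.0

def to_count(confidence):
    return K * confidence / (1.0 - confidence)

def beta_params(strength, confidence):
    n = to_count(confidence)
    # Uniform Beta(1,1) prior updated with n observations at mean `strength`.
    return strength * n + 1.0, (1.0 - strength) * n + 1.0

def mean_and_variance(strength, confidence):
    a, b = beta_params(strength, confidence)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

m_lo, v_lo = mean_and_variance(0.9, 0.1)   # little evidence: wide distribution
m_hi, v_hi = mean_and_variance(0.9, 0.9)   # lots of evidence: sharp peak
assert v_hi < v_lo                          # more evidence => less spread
```

The point of distributional truth values is precisely that inference rules can then propagate the whole distribution (or a discretized version of it) rather than just the (strength, confidence) summary.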
Use OpenCog to control a simple wheeled robot
OpenCog can interact via ROS, and is now connected to a Hanson robot head, and to a Minecraft game character.
So it should be realistic now to connect it to a simple wheeled robot, taking account of the various sensors and actuators.
A Turtlebot would be one example, but cheaper/simpler robots could also be used.
This could lead into work on implementing integrated planning/navigation in the URE (see the below task regarding ocPlanner).
Some detailed comments:
If you have a turtlebot with a depth camera, then you can try something like
- make the depth camera output go into the OpenCog 3DSpaceMap...
- connect some trained classification model (trained outside OpenCog) to recognize some particular type of object, say a chair ... then have the locations of any chairs in the environment go to the 3DSpaceMap as well
- write some Scheme/python "Atomese" code that uses an external ROS navigation package to cause the robot to note where there is a chair in the environment and attempt to go there...
This of course is a complicated way to do something simple, but it would validate that all the "plumbing" is working in terms of using OpenCog to get robot perceptions and issue robot actions
After that, one could get OpenCog to do more stuff, like integrated planning/navigation for instance ... talking to people via speech to text (after a person-recognizer is integrated) etc.
For guidance on sending action commands out of OpenCog, you can look at the eva_behavior code created by Linas Vepstas for Hanson Robotics, which sends out ROS commands to control the movements of a Hanson robot head.
For guidance on getting perceptual data into the 3DSpaceMap, consult for instance Mandeep Bhatia ...
Reimplement the ocPlanner using the URE
Shujing Ke implemented a planner (ocPlanner) which integrated planning and navigation in a unique way. The code is here
and the concept is described here
The ocPlanner was tested in a Minecraft-like Unity3D game world, see the following YouTube videos for some simple examples:
However, the ocPlanner code was tied to an old version of the OpenCog Embodiment module, which has now been removed from the master branch (though it still exists in OpenCog GitHub archives).
So the tasks are:
- Implement a new version of the ocPlanner idea, using the URE (and integrating with PLN where appropriate)
- Connect the new ocPlanner for combined planning/navigation in Minecraft
Revive the dimensional embedding code and integrate it with the fuzzy pattern matcher
Back in the days of yore, code was written for embedding Atoms from the Atomspace into an N-dimensional space. This is useful because certain kinds of queries (e.g. "find me stuff similar to X") are much more efficiently done in a dimensional space than in a hypergraph.
This code needs to be dusted off and used for something. A clear candidate would be to plug it into the fuzzy pattern matcher, which Man Hin Leung (and Linas Vepstas) have recently modified so that it can use SimilarityLinks between Nodes to quantify fuzziness of matches. (So, e.g., if you search for "InheritanceLink $X cow", and we have a SimilarityLink between "cow" and "horse", then some $X such that (InheritanceLink $X horse) may be returned as a partial match...) The SimilarityLinks can come from anywhere. The current fuzzy matcher code uses SimilarityLinks that are explicitly present in the Atomspace. But we could also do the same thing using SimilarityLinks that are implicit in (autogenerated from) a dimensional embedding space generated from the Atomspace.
Also, currently the dimensional embedding space is set up to be constructed in batch mode. It should be changed to get updated incrementally.
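One simple way to do such an embedding (a landmark/pivot-style approach; this is an assumption made for illustration, not a description of the archived code) is to give each atom one coordinate per pivot atom, equal to its similarity to that pivot. "Find me stuff similar to X" then becomes a cheap nearest-neighbour search:

```python
# Landmark-style dimensional embedding sketch: coordinates are similarities
# to a fixed set of pivot atoms; similarity queries become distance queries.
import math

def embed(atoms, pivots, similarity):
    return {a: [similarity(a, p) for p in pivots] for a in atoms}

def nearest(target, embedding):
    tvec = embedding[target]
    return min((a for a in embedding if a != target),
               key=lambda a: math.dist(tvec, embedding[a]))

# Toy SimilarityLink table (symmetric); absent pairs default to 0.
SIM = {frozenset(p): s for p, s in [
    (("cow", "horse"), 0.8), (("cow", "mammal"), 0.9),
    (("horse", "mammal"), 0.9), (("rock", "mineral"), 0.9),
]}
def similarity(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

atoms = ["cow", "horse", "rock", "mineral"]
pivots = ["mammal", "mineral"]
emb = embed(atoms, pivots, similarity)
assert nearest("cow", emb) == "horse"   # cow and horse embed close together
```

Feeding distances in this space back to the fuzzy matcher as implicit SimilarityLinks is exactly the integration proposed above.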
Implement clustering using the Unified Rule Engine
explains how to implement a variety of agglomerative clustering using the Unified Rule Engine. This would allow clustering to interoperate with a lot of other AI algorithms in OpenCog. (e.g. if we are clustering Atoms based on SimilarityLinks, PLN can be triggered to learn new SimilarityLinks in the course of clustering).
Port the visualizer to Oculus Rift or your 3D visualization environment of choice
The ocViewer OpenCog visualizer
visualizes nodes and links in a Web browser, which is pretty cool...
But it would also be cool to see the OpenCog hypergraph in 3D virtual reality. The same REST API method could be used to interface with OpenCog, but of course the visual layout and UI design aspects would be totally different.
This is a good project for someone who knows, or wants to learn about, programming for Oculus Rift or other 3D environments...
Complete the Haskell MOSES deme started by Kaivan Shah, and integrate it into MOSES
An experimental framework for Genetic Programming in Haskell was implemented by Kaivan Shah in summer 2015, see
This still needs some work -- it *almost* got to the point of running actual GP experiments, but not quite. We were considering the possibility of enabling this Haskell deme as an alternate MOSES deme (to be plugged into MOSES, replacing what happens inside the MOSES deme, while leaving the deme level of MOSES intact).
This is good for someone who knows (or wants to learn) Haskell and genetic programming.
User-defined indexes for the Atomspace
The AtomSpace is essentially a certain kind of graph database. Like other databases, it should allow users to define the kinds of structures that they need to be able to quickly locate. This could be done with a user-defined index.
This requires work to be done in conjunction with the type checking proposal. The type checker, including unit tests, should take no more than two or three weeks for someone unfamiliar with the Atomspace.
This is mostly straight-ahead coding, little/no AI research needed. The hard part will be understanding the existing code base, to understand where/how to slot this in.
Write a Reduct library for Atomspace
Reduct is a critical part of MOSES which normalizes evolving program-trees into a standard "Elegant Normal Form".
It could be improved in various ways, especially regarding the use of MOSES to evolve programs with complex programmatic constructs like recursion and fold.
However, Reduct has its own special rule engine, and it would be nicer to reimplement Reduct using the Atomspace's Unified Rule Engine rather than adding more and more complexity into a standalone specialized rule engine. This would enable use of PLN to do uncertain program tree reduction, which opens up a whole host of other research areas.
This should appeal to someone who wants to make fundamental progress in evolutionary/inferential program learning.
OpenCog Visualization Workbench
The AtomSpace visualiser only addresses one aspect of a UI for understanding OpenCog dynamics. A generic GUI workbench, which can handle the parsing of log files and dynamic configuration of a running OpenCog instance, would be useful.
Other features that such a workbench might have:
- graphing dynamics through time:
- total atoms of various types
- distribution of truthvalues
- distribution of atom importance (STI/LTI)
- average coordination number (number of links atoms are connected with)
- network diameter
- # of disjoint clusters
- visualising and controlling expansion of the backward inference tree of PLN.
- active MindAgents and their resource usage (memory, CPU)
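A few of the time-series statistics listed above are straightforward to compute once the Atomspace is exported as a plain graph. A stdlib-only sketch over a toy adjacency structure (a real workbench would pull the graph from a running OpenCog instance instead):

```python
# Compute three of the workbench statistics above -- average coordination
# number, number of disjoint clusters, and network diameter -- over a toy
# undirected graph represented as {node: set_of_neighbours}.
from collections import deque

def bfs_dists(graph, start):
    dists = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dists:
                dists[v] = dists[u] + 1
                q.append(v)
    return dists

def avg_coordination(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def components(graph):
    seen, count = set(), 0
    for node in graph:
        if node not in seen:
            seen |= set(bfs_dists(graph, node))
            count += 1
    return count

def diameter(graph):
    # Longest shortest path within any single component.
    return max(max(bfs_dists(graph, n).values()) for n in graph)

g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
assert avg_coordination(g) == 1.0      # (1 + 2 + 1 + 0) / 4
assert components(g) == 2              # {a,b,c} and {d}
assert diameter(g) == 2                # a -> b -> c
```

Graphing these values over time, as atoms are added and removed, gives the dynamics views described in the list.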
This could also be integrated with, or made to incorporate the functions of, the OpenPsi GUI that Zhenhua Cai has developed. That UI uses:
- zeromq messages for monitoring state changes from OpenCog.
- implementation in Python.
- pyqt for widgets.
Enable Link Grammar to handle quoted text and dialogue
This is not exactly an OpenCog project, but it's a link grammar project, and the link grammar is closely allied with OpenCog (it's used as the first phase in the OpenCog NLP pipeline). See
for general info on link grammar, and
for the current link grammar code.
Currently, Link Grammar is unable to properly handle quoted text and dialogue. A mechanism is needed to disentangle the quoting from the quoted text, so that each can be parsed appropriately. This might be done with some amount of pre-processing (e.g. in RelEx), or possibly within Link Grammar itself.
It's somewhat unclear how to handle this within Link Grammar. It is somewhat related to the problems of morphology (parsing words as if they were "mini-sentences"), idioms (phrases that are treated as if they were single words), and set-phrase structures (if ... then ..., not only ... but also ...), which have a long-range structure similar to that of quoted text (he said ...).
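The pre-processing option mentioned above can be sketched very simply: pull quoted spans out of a sentence so that the carrier and the quote can each be parsed separately. (Purely illustrative; real dialogue, nested quotes, and apostrophes are much messier than this regex.)

```python
# Toy pre-processing pass: extract double-quoted spans and replace each with
# a placeholder token, so carrier and quote can be parsed independently.
import re

QUOTE = re.compile(r'"([^"]*)"')

def split_quotes(sentence):
    """Return (carrier_with_placeholders, [quoted spans])."""
    quotes = QUOTE.findall(sentence)
    carrier = QUOTE.sub("QUOTE", sentence)
    return carrier, quotes

carrier, quotes = split_quotes('John said, "the sky is blue," and left.')
assert quotes == ["the sky is blue,"]
assert carrier == "John said, QUOTE and left."
```

Each extracted span could then be fed to the parser as its own sentence, with the placeholder linking the parses back together.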
Refactor MOSES for Lifelong Learning and Transfer Learning
The goal here is to make MOSES generalize across problem instances, so that what it has learned from multiple problem instances can help prime its learning on new ones. This can be done by extending the probabilistic model building step to span multiple generations, but this poses a number of subsidiary problems, and requires integration of some sort of sophisticated attention allocation method into MOSES to tell it which patterns observed in which prior problem instances to pay attention to.
A much simpler approach is to exchange the MOSES inner loop with the outer loop. Unfortunately, this would be a major tear-up of the code base. Exchanging the order of the loops would allow an endless supply of new stimulus to be continuously processed by MOSES, in essence performing continuous, ongoing transfer learning. This should be contrasted with the current design: the input is a fixed-size, fixed-content table, and during training it is iterated over and over, within the inner loop. If the order of the loops were reversed, then equivalent function could be obtained simply by replaying the same input endlessly. However, there would now be the option to NOT replay the same input over and over; rather, it could slowly mutate, and the learned structures would mutate along with it. This is a much more phenomenologically and biologically correct approach to learning from input stimulus.
Uncertain temporal reasoning using fuzzy Allen Interval Algebra
Modify the Pattern Miner so its filters are provided in the Atomspace
Implement an “Atomese/atomic” language interpreter
Throughout 2015, "Atomese" (the coding of Atom constructs in Scheme, Python or Haskell, for loading into the Atomspace) has become more and more like a programming language. We can map and do tail recursion, and generally use Atomese as a unique combination of logic programming and functional programming, with probabilistic and other interesting aspects.
However, the surface-level syntax of Atomese is still a bit ugly and awkward. Some ideas for making it more elegantly readable and usable are given here
Implementing these or some variant thereof would be useful. There are many possible ways to do this, e.g. in Haskell, Scheme or python...
Implement the invocation of array operations from within OpenCog
Yenatfanta in Addis Ababa has gotten started on this already. See
for high-level design ideas.
Implement a deep learning vision algorithm using OpenCog as a framework
Go through the following articles:
for some ideas on how to do this.
The high level motivation is to enable cognitive feedback to perception algorithms. But before experimentation with this occurs, the "plumbing" of doing deep learning in OpenCog (using OpenCog as a wrapper for lower-level NN-and-similar algorithms running on GPU) needs to be completed....
Connect Lojban speech-to-text and text-to-speech to the OpenCog/Lojban QA system
Thanks to Roman Treutlein, we now have basic comprehension and generation pipelines in place in OpenCog, for the wonderful Lojban language.
Connecting Lojban STT and TTS systems to this, would give a basic Lojban spoken dialogue system.
Lojban TTS is very straightforward, if one's not overly worked up about aesthetics.
Lojban STT requires making a training corpus and then using existing OSS STT software to make appropriate models.
Markov Chain Based Adaptive Inference Control
The biggest challenge regarding automatic inference is "inference control" -- choosing what inference step to take at a certain point, based on the context, the prior inference steps taken, etc.
As a first step in this direction, we want to
- save the inferences done by PLN in a separate "inference history Atomspace"
- mine simple sequential patterns in this inference history Atomspace.
- We can start with patterns involving sequences of inference rules (rule R1, feeding output to rule R2, feeding output to rule R3, etc.).
  - Next we could look at contextualized patterns, e.g. "R1 acting on Atoms in category C, feeding output to rule R2, feeding output to rule R3"
- Each sequential pattern should be assigned a number indicating its count (the number of times it was used), and also a number indicating the amount of information generated via its use (on average)
- Then make the rule selection in the URE use these sequential patterns to guide its rule choice
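The first two mining steps above can be sketched with plain n-gram counting over recorded inference traces (everything here stands in for the real inference-history Atomspace; the rule names and the additive smoothing are illustrative assumptions):

```python
# Mine rule-sequence bigrams from inference traces, then bias next-rule
# selection toward rules that have often followed the current one.
from collections import Counter

def mine_bigrams(traces):
    """traces: lists of rule names, in order of application."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def rule_weights(counts, current_rule, candidates, smoothing=1.0):
    """Unnormalized selection weights for the rule to apply after `current_rule`."""
    return {r: counts[(current_rule, r)] + smoothing for r in candidates}

traces = [
    ["deduction", "modus-ponens", "deduction"],
    ["deduction", "modus-ponens", "abduction"],
    ["abduction", "deduction", "modus-ponens"],
]
counts = mine_bigrams(traces)
w = rule_weights(counts, "deduction", ["modus-ponens", "abduction"])
assert w["modus-ponens"] > w["abduction"]   # modus-ponens usually follows deduction
```

Weighting each pattern by the information generated per use (rather than the raw count), and extending bigrams to longer, contextualized sequences, are the natural next steps listed above.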
Pattern Mining Based Inference Control
A more sophisticated approach to history-based adaptive inference control is to use the Pattern Miner to identify patterns in the inference history Atomspace, and then use these patterns in the rule selector. This may require some careful tuning of the Pattern Miner and its filters, to enable it to learn patterns of sufficient complexity without getting overly bogged down in combinatorial explosions.
Parallel Backtracking for the Pattern Matcher
The Pattern Matcher is a powerful tool and an OpenCog workhorse (thanks Linas!), but in its current version a single PM query can only run in a single thread.
Relaxing this constraint is not that simple, because the core algorithm inside the PM is backtracking, which is serial by nature.
There is, however, a moderately large literature on algorithms for parallel backtracking. Backtracking parallelizes to GPUs horribly, but it parallelizes to multiple processors on a standard SMP multiprocessor architecture just fine. But the algorithms are moderately complicated, and the PM embodies a fairly complicated version of backtracking.
So this is a project for someone who
- really likes C++
- is very comfortable with data structures and algorithms (complex tree/hypergraph manipulations, etc. etc.)
For the right person this should be a huge amount of fun! After all, there aren't many chances these days to implement algorithms of this complexity, combining AI with efficiency optimization, and have a bunch of users delighted with your work...
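The standard "split the top of the search tree across workers" idea from the parallel-backtracking literature can be illustrated with n-queens as a stand-in for a pattern-match search (the real PM is C++; Python and the n-queens substitution are used here purely for brevity):

```python
# Parallel backtracking sketch: each process runs a serial backtracking
# search over one top-level branch (one first-row queen placement), and the
# master sums the results.
from multiprocessing import Pool

def count_completions(prefix, n):
    """Serial backtracking: count full solutions extending a partial column list."""
    row = len(prefix)
    if row == n:
        return 1
    total = 0
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(prefix)):
            total += count_completions(prefix + [col], n)
    return total

def parallel_nqueens(n):
    with Pool() as pool:  # one top-level branch per first-row placement
        return sum(pool.starmap(count_completions,
                                [([col], n) for col in range(n)]))

if __name__ == "__main__":
    print(parallel_nqueens(6))  # 4 solutions for 6 queens
```

The hard part in the PM is that its backtracking is far more irregular than this, so real implementations need work-stealing or dynamic branch splitting rather than a fixed top-level partition.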
More Open-Ended Research Projects
Extracting Symbolic Concepts from Deep Vision Networks
Building on the prototyping work with DeSTIN and OpenCog reported on the DestinOpenCog page, the idea is to use the "deep learning neural nets wrapped in OpenCog" framework (whose construction, based on the ideas in the article Deep Learning Perception in OpenCog, is currently underway as of Jan 2016) to form states based on observing images (and later videos), and then use the OpenCog Pattern Miner to recognize patterns among these states.
For instance, an initial test would be to feed the system pictures of cars, motorcycles, bicycles, unicycles and trucks (videos would be great, but let's start with pictures) and see if the whole system results in OpenCog getting ConceptNodes for car, motorcycle, bicycle, unicycle and truck, with appropriate logical links between them like
Inheritance car motor_vehicle <.9>
Inheritance truck motor_vehicle <.9>
Inheritance bicycle motor_vehicle <.1>
Similarity bicycle unicycle <.7>
Similarity bicycle motorcycle <.7>
Similarity motorcycle car <.6>
Similarity bicycle car <.3>
Note that while I'm using English terms here, in this experiment the OpenCog ConceptNodes would have no English names and English words would not be involved at all. I'm just using the English terms as shorthands.
This would be a first example of the bridging of a deep network's subsymbolic knowledge and OpenCog's symbolic knowledge, which would be pretty cool and a good start toward further work on intelligent vision ;)
Implement simplified Word grammar style parsing in the Atomspace
The idea here would be to do parsing in the Atomspace rather than in separate external software like the link parser -- but using the link grammar dictionary. (The link grammar dictionary has already been fed into the Atomspace as Nodes and Links.)
The parsing process I'm envisioning would proceed forward through a sentence, and when it encounters a word W, would try to find a way to link W to words occurring before W in the sentence, consistent with the links already drawn among the words before W. If this is not possible, the process would then backtrack and explore alternate linkages among the words occurring before W.
Implementation-wise, this in-Atomspace parser would be a chainer somewhat similar to the existing PLN backward chainer, but customized for the parsing process.
This project would require some intuition for linguistics, plus good C++ skills (e.g. it will require making a custom callback for the Pattern Matcher in C++).
This is in a loose sense a "word grammar style" parser, but using the link grammar dictionary. The first version would utilize only syntactic links but of course since it's all taking place in the Atomspace, there is also an option to use semantic information generated mid-parse from syntactic links to prioritize possible linkages.
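The left-to-right, backtracking linkage search described above can be sketched with a toy dictionary (the word-pair dictionary, and the simplification that any allowed pair may link regardless of connector direction, are assumptions made for this example; the real link grammar dictionary uses typed connectors):

```python
# Toy incremental parser: process words left to right; each new word must
# link to some earlier word, links must not cross, and the search backtracks
# over the choice of earlier word when it gets stuck.
ALLOWED = {("the", "cat"), ("cat", "sat"), ("the", "mat"), ("sat", "mat"),
           ("on", "mat"), ("sat", "on")}

def crosses(a, b):
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j

def parse(words, pos=1, links=()):
    """Return one complete non-crossing linkage, or None (backtracking on failure)."""
    if pos == len(words):
        return list(links)
    for prev in range(pos):                     # candidate earlier word to link to
        pair = (words[prev], words[pos])
        if pair in ALLOWED or pair[::-1] in ALLOWED:
            link = (prev, pos)
            if not any(crosses(link, old) for old in links):
                result = parse(words, pos + 1, links + (link,))
                if result is not None:
                    return result
    return None                                 # no valid linkage from here

linkage = parse(["the", "cat", "sat", "on", "mat"])
assert linkage is not None
```

In the Atomspace version, the dictionary lookup and the crossing check would be Pattern Matcher queries, and the backtracking would live in a custom chainer or PM callback as described above.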
A 2015 GSoC student got started on this project, but didn't make a huge amount of progress; see
for the partial start he made on the idea.
Improving the Core MOSES Algorithm
A couple ideas for improving MOSES have been bouncing around for a while. Neither is tremendously simple or easy; both seem quite promising.
Add Library of Known Routines
Currently, MOSES constructs programs out of a very minimal collection of functions and operators; for example, boolean expressions are constructed entirely from and, or, not. However, learning of programs could happen much faster if more complex primitives were provided: for example, xor, a half-adder, a multiplexer, etc. Rather than hard-code such functions into combo, this project calls for creating a generic library that could hold such functions, and the infrastructure to allow MOSES to sample from the library when building exemplars.
This is a needed step, on the way to implementing the pleasure algorithm below, as well as for transfer learning, etc.
Proof that it is working would be to define xor in some file (in combo, e.g. xor($1,$2) := and(or($1,$2), not(and($1,$2)))), and then show that this definition was actually used in solving the k-parity problem.
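To make the proposed test concrete, here is the same xor definition and the k-parity target rendered in plain Python rather than combo. This only illustrates what MOSES would be asked to learn; it is not combo code.

```python
# The xor primitive, written exactly as the combo definition above:
# xor($1,$2) := and(or($1,$2), not(and($1,$2)))
from functools import reduce
from itertools import product

def xor(a, b):
    return (a or b) and not (a and b)

def parity(bits):
    """k-parity: true iff an odd number of the k inputs are true."""
    return reduce(xor, bits, False)

# xor matches the exclusive-or truth table...
assert [xor(a, b) for a, b in product([False, True], repeat=2)] == \
       [False, True, True, False]
# ...and k-parity is just xor folded over the inputs, which is why a
# library-supplied xor shortcuts the search that and/or/not alone make hard.
assert parity([True, True, False]) is False
assert parity([True, True, True]) is True
```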
Currently, the combo boolean primitives are and, or, not. MOSES can learn to model input data (tables of boolean values, with hundreds or thousands of rows, and dozens or hundreds of columns) by randomly assembling and trying out trees of these three boolean ops.
If the input data happens to be exclusive-or truth tables, or multiplexer tables, or adders, or whatever, then, given enough time and effort, MOSES can discover the tree expressions that correspond to the truth tables. Unfortunately, it can take a huge (exponentially, combinatorially large) amount of time to do this.
So, the core idea is to short-cut the learning process by enriching the three primitives and/or/not with some extra ones. That way, instead of trying out random combinations of and/or/not, it would explore random combinations of and/or/not/other-stuff.
There are two ways to do this:
1) hand-code some new a-priori trees.
2) automatically discover and remember useful trees.
The second way is strongly preferred. So, that is the big picture. But it's open season on all of the details. How does discovery happen? What gets stored after being discovered? What format is used to store things? After being discovered, how does MOSES draw upon this knowledge base? Does it keep track of how useful some bit of "knowledge" is? How? What modifications are needed to the random-tree generator to make this all work?
These all need answers, and all need to be converted into C++ code, and it all needs to be done in a way that is efficient, well-designed, doesn't perturb the existing code base by too much, looks elegant, etc.
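As one possible shape for the answers to those questions, here is a hypothetical sketch (in Python, for brevity; the real work would be C++) of a routine library that tracks a usefulness score per primitive and lets the tree generator sample proportionally to it. All names here are illustrative, not part of the MOSES codebase.

```python
# Hypothetical "library of known routines": discovered subtrees are
# stored alongside the base primitives with a usefulness score, and the
# random-tree generator samples primitives in proportion to that score.
import random

class RoutineLibrary:
    def __init__(self):
        # name -> (arity, definition-as-string-or-None, usefulness score)
        self.routines = {
            "and": (2, None, 1.0),
            "or":  (2, None, 1.0),
            "not": (1, None, 1.0),
        }

    def add(self, name, arity, definition):
        """Remember a discovered (or hand-coded) routine."""
        self.routines[name] = (arity, definition, 1.0)

    def reward(self, name, amount=1.0):
        """Bump the score of a routine that appeared in a winning program."""
        arity, defn, score = self.routines[name]
        self.routines[name] = (arity, defn, score + amount)

    def sample(self, rng=random):
        """Pick a primitive, weighted by usefulness."""
        names = list(self.routines)
        weights = [self.routines[n][2] for n in names]
        return rng.choices(names, weights=weights, k=1)[0]

lib = RoutineLibrary()
lib.add("xor", 2, "and(or($1,$2), not(and($1,$2)))")
lib.reward("xor", 5.0)   # xor proved useful, e.g. on a parity problem
print(lib.sample())      # xor is now the most likely pick
```

This leaves open exactly the hard questions listed above (how discovery happens, how scores decay, how definitions are expanded during evaluation); the sketch only pins down one plausible storage-and-sampling interface.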
AtomSpace-ish Pleasure Algorithm
Start implementing (even partially) a version of the PLEASURE algorithm on the AtomSpace. Since MOSES models can be exported to the AtomSpace, many processes can take place there, such as the Pattern Miner, PLN, etc. The idea is to leverage these to implement one or a few ideas of the PLEASURE algorithm for program learning.
Planning via Temporal Logic & Consistency-Checking
A general and elegant way to do planning using PLN would be
- Use explicit temporal logic, so that the temporal dependencies between different actions are explicitly encoded in Atoms using relations like Before, After, During, etc.
- Improve the backward chainer so that, when it spawns a new Link in its chaining process, it checks whether this Link is logically consistent with other promising Links occurring elsewhere in the backward chaining tree... (where, if A and B are uncertain, logical inconsistency means that (A and B) has a much lower truth value strength than one would expect under an independence assumption.)
The search for logical consistency can be done heuristically, via starting at the top of the back-inference tree and working down. If quick inference says that a certain Link L is consistent with the link at a certain node N in the BIT, then consistency-checking with the children of N is probably not necessary.
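The pairwise consistency test mentioned above can be sketched very simply. This is a toy illustration under stated assumptions: truth values are reduced to a single strength number, and the threshold ratio is an arbitrary illustrative choice, not something specified by PLN.

```python
# Toy sketch of the consistency check: two uncertain links A and B are
# flagged as inconsistent when the observed strength of (A and B) falls
# far below the independence-assumption estimate strength(A)*strength(B).

def consistent(strength_a, strength_b, strength_a_and_b, ratio=0.5):
    """True unless (A and B) is much weaker than independence predicts."""
    expected = strength_a * strength_b
    if expected == 0.0:
        return True          # nothing was expected, so nothing is violated
    return strength_a_and_b >= ratio * expected

# Two links each believed with strength 0.8; independence predicts 0.64
# for the conjunction. An observed 0.1 is a red flag, 0.6 is fine.
assert not consistent(0.8, 0.8, 0.1)
assert consistent(0.8, 0.8, 0.6)
```

In the chainer itself this test would run top-down over the back-inference tree as described above, pruning the consistency checks at any node where a quick inference already confirms compatibility.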
This approach would subsume the heuristics generally used in planning algorithms into a generally sensible method for inference control...
This goes beyond, and could incorporate, the smaller project mentioned above on this page, of porting the ocPlanner (which synthesizes basic planning with navigation) to the Unified Rule Engine.
Unsupervised Language Learning
OpenCog’s current NL comprehension pipeline is based substantially on hand-coded rules, but this is not a desirable situation AGI-wise. A design for a system that replaces these rules with rules learned via unsupervised text analysis has been posted XXX, and implementation has begun XX, but there’s lots more to do. See this wiki page XX as well.
Probabilistic Programming in OpenCog
Ben Goertzel is in the midst of cooking up a scheme for representing OpenCog's various learning algorithms in a standardized way, in terms of (mostly optimization queries for) probabilistic programming... For instance, adaptive history-based inference control for PLN can be modeled in terms of sampling from the distribution of prior inference sub-DAGs.
See OpenCoggy Probabilistic Programming for some detailed ideas in this regard....
As a start along this path, it can also be interesting to implement simple probabilistic programming constructs and methods in OpenCog. For instance, one could implement a SampleLink of the general form
SampleLink
    <predicate outputting the probability or priority of an Atom; sampling is done proportionately to this number>
    <predicate restricting which Atoms to be sampled from>
    <optionally, a GroundedSchemaNode indicating the sampling scheme>
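The intended semantics can be sketched in a few lines of Python. This is a minimal illustration only; the tuple-based "atoms" and the function names are hypothetical stand-ins, not the actual OpenCog Atomspace API.

```python
# Minimal sketch of SampleLink semantics: atoms passing a filter
# predicate are sampled with probability proportional to a weight
# predicate (e.g. an STV strength or an attention value).
import random

def sample_link(atoms, weight_pred, filter_pred, rng=random):
    candidates = [a for a in atoms if filter_pred(a)]
    weights = [weight_pred(a) for a in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Fake atoms as (type, name, strength) tuples, purely for illustration
atoms = [("ConceptNode", "cat", 0.9),
         ("ConceptNode", "dog", 0.1),
         ("PredicateNode", "eats", 0.5)]

picked = sample_link(
    atoms,
    weight_pred=lambda a: a[2],                    # strength as weight
    filter_pred=lambda a: a[0] == "ConceptNode",   # restrict the pool
)
print(picked)
```

The optional third argument of SampleLink (a GroundedSchemaNode naming the sampling scheme) would correspond here to swapping out the default proportional `rng.choices` draw for some other sampler.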