Software Development Process Proposal

From OpenCog


This document presents a set of development practices that we believe would make OpenCog development more user and developer friendly, increase overall code quality, and make it easier for people to become involved with OpenCog, as users as well as developers.

We believe these suggestions would give us higher quality code and processes. More importantly, we believe these practices would play a significant role in fostering community growth.

We have agreed to deploy this process over the months of July to October, 2015, on at least one project: an OpenCog-driven chatbot, deployed through IRC. This project will be run by the Hong Kong OpenCog team and Linas, with help from others. It has a heavy emphasis on integration and releases, which makes it a good fit for the more controlled workflow we propose below. At the end of the project, we'll review the process and tweak it based on what we learn.

Objective: Improving the Community for Different Kinds of Members

We'd like to create a set of development practices that meets the following objectives:

  • Don't make life annoying: keep overhead low;
  • Make it easier for people to start contributing to OpenCog development;
  • Make it easier for contributors to grow into core developers;
  • Make it easier for people to use OpenCog.

All of these goals aim at enabling community growth, on three levels. To explain that, let's first look at the different groups of contributors we have.

Different Groups of Contributors

Different members of the OpenCog community have different needs, and it seems useful to group them broadly into three main roles (there's always going to be some overlap there):

  1. Users: right now, users are developers; they're just not OpenCog developers. One example is the earlier bioinformatics project that used MOSES: the people on that project just needed a tool that's easy to understand and adopt. Sometimes users will make contributions (bug fixes, etc.), as is often the case in open source communities. OpenCog should try to make it easy for people to use it (meaning we should improve access) and also make it easy for those users to contribute, which increases the chances that they'll become regular contributors.
  2. Feature/application developers: these are people who are developing code that is part of OpenCog, or applications heavily reliant on OpenCog. Typically their focus is on a single OpenCog-related or OpenCog-dependent project. OpenCog should make their lives easier by making it straightforward to understand the code and evolve it. One way to do that is to give them high quality, stable, documented code (so we should improve quality along multiple dimensions, not just by having well-written code). Giving these people a pleasant experience increases the chances that they'll become core developers.
  3. Core developers: these are the relatively few people whose main interest is OpenCog itself (and AGI, etc), rather than particular applications. They may work on applications from time to time, but even when doing so they tend to have a broad understanding of the OpenCog codebase and their application work often results in improvements to the fundamental components. Less experienced developers and feature/application developers often rely on them for guidance and help. We can help these people by a combination of: easing their helping burden (if less experienced/knowledgeable developers have an easier time doing their things, core developers have more time for core development) and improving and then maintaining system stability.

As of July 2015, the core developers are Linas, Nil, and William. We want to expand that group over time.

Specific Issues We'd Like to Address

There are many things we can do to help the three kinds of community members. What follows is a list based on previous discussions and efforts. It's not exhaustive by any means, and maybe there are higher priority issues. The ones below combine two properties: fixing them should make life significantly easier for at least some existing community members (so we'd benefit even if these changes don't increase the size of the community), and they're likely to remove at least some barriers for people to become users, or to transition from users to feature/app developers, and from developers to core developers.

Here's an initial list of issues we believe we can tackle with some collaborative effort:

  • Unstable development HEAD (as of this writing, the last time the master branch passed BuildBot tests was April 29, 2015, and passing all tests wasn't a regular event before then). This discourages high quality contributions (how can people be sure that their contributions don't break anything if the system is constantly failing tests?) and it discourages adoption.
  • Low testing coverage. Even if all tests passed, there's a feeling that they aren't comprehensive enough to give people confidence.
  • Bad documentation, mostly at the wiki (hard to navigate and full of obsolete, sometimes conflicting information). This makes it hard for people to become developers, or even users. We note that the code level documentation is often very useful.
  • Lack of releases and of a single, clear set of instructions on how to begin. There is ample documentation on how to begin to use or develop for OpenCog, but it hasn't been maintained, and there are multiple suggested avenues (Vagrant + VM, Docker, running dependency-installing scripts on your own Linux box, etc.).

The following sections give concrete suggestions on addressing all these issues, and how doing so would help the different kinds of community members described above.

Improving Stability: Proposed Branching Policy

We'd like to balance the importance of frequent commits to a shared branch with the importance of stable branches. People actively involved in writing code benefit from seeing everything that's going on, and frequent commits provide early warnings of potential conflicts, bugs, etc. On the other hand, people using the code and/or working on isolated components or applications benefit from being able to stay in touch with development without the risk of pulling in broken code.

We'd like to prioritize stability on the master branch. For that to happen, each github project's master branch should follow these rules:

  1. All tests must pass at all times. If a commit breaks the continuous integration tests, it should be rolled back immediately, and fixing it takes place off-branch.
  2. In order to enable the previous rule, only completed work can be merged into the master branch. No work actually takes place in master; development and bug fixes take place on other branches. master only moves through pull requests being merged.
  3. All work merged into this branch must be reviewed by a core developer before it can be merged (so core developers can merge their own code and anyone else's, but non-core contributors need their pull requests approved).
    1. On specific modules, it's best to appoint someone else as reviewer if none of Linas, Nil, or William knows the code well enough. One example is the REST API.
    2. We can use GitHub permissions if we want to enforce this restriction on who can merge onto master. Perhaps we can start adopting the process without those restrictions, but change to enforcing them if needed (e.g., if master branches remain broken).

Pull requests can only be merged in if they pass a set of criteria, which are defined below. In essence, this branch gets completed work on small tasks, and this branch must always build and pass all tests.

Work (development tasks, research, experiments, bug fixes) can't take place directly on the master branch. So we need a branching workflow, and it comes in two varieties: feature branches and personal branches. Feature branches tend to work better, and are pretty standard git workflow these days (see [1]).

People are encouraged to adopt the following workflow based on feature branches:

  1. When you start working on a new item, create a new feature branch on your machine based on the current master HEAD.
  2. As you work, commit code frequently to that branch (at least once a day), and fetch/rebase from the master branch daily as well (don't pull; it pollutes history with merges, while rebase doesn't). This will ensure that you keep up with commits by other people, and will make future merging easier.
  3. Once the task is ready for review, push your branch to the remote origin repository, but don't merge it back to master yet. Rather, create a pull request and, if you aren't a core developer, discuss on Slack or the dev mailing list to see who should review it.
  4. If you know what you're doing you can edit your commit history to make it easier to review. If this sounds scary, don't worry about it.

Once review is completed, the reviewer will merge the pull request into master, and delete the remote branch. Feel free to delete your local branch as well.
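As a rough illustration of the steps above, here is a minimal Python sketch that lists the git commands for one pass through the workflow and prints them in dry-run mode. The helper names (feature_branch_steps, run_steps) and the branch name fix-atom-leak are made up for this example, and the remote is assumed to be called origin; treat it as a sketch, not an official OpenCog script.

```python
import subprocess


def feature_branch_steps(branch, remote="origin", trunk="master"):
    """Return the git commands for one pass through the workflow, in order.

    `branch`, `remote` and `trunk` are illustrative parameters, not
    names mandated by the proposal.
    """
    return [
        # Step 1: create the feature branch from the current master HEAD.
        ["git", "fetch", remote],
        ["git", "checkout", "-b", branch, f"{remote}/{trunk}"],
        # Step 2 (repeat daily): fetch and rebase instead of pulling,
        # so history is not polluted with merge commits.
        ["git", "fetch", remote],
        ["git", "rebase", f"{remote}/{trunk}"],
        # Step 3: publish the branch so a pull request can be opened.
        ["git", "push", "--set-upstream", remote, branch],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in steps:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows the commands without touching any repository.
    run_steps(feature_branch_steps("fix-atom-leak"))
```

Running it as-is only prints the commands, which makes it safe to try outside a repository. Opening the pull request itself still happens on GitHub.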

A longer version of a similar workflow, with the actual git commands you'd need, is given here: [2]

A Note on Task Size and Merging Frequency

The above workflow tends to work very well when people keep their tasks small, so they're completed relatively often, and no one feature branch deviates from the HEAD too much, or stays open too long. So if we decide to go with a stable HEAD, we need people to break down their work into relatively small pieces, tracked through GitHub issues.

If someone needs to work on a multi-month project, they're still encouraged to break that down into a sequence of small tasks, each with its own GitHub issue and feature branch. We can adopt naming conventions and labeling for different projects on GitHub issues to make tracking easier.

More Visible Testing

In addition to the merging process, we should promote continuous integration more heavily. Amen is working on enabling Travis as a continuous integration and automated testing solution to replace the Buildbot. Travis integrates with GitHub, and can easily be set up to test pull requests automatically before they're merged, as well as to display the latest build status on the home page for each GitHub project.

Situations like the current one (last time we got a green build: April 29) should be rare and embarrassing. More visible build results should encourage more frequent green builds, and automated testing of pull requests should reduce breakage.

Improving Quality: Proposed Merging Requirements

If we adopt the above workflow, we need reviewers for every issue being merged. We can use the reviews to improve the overall quality of our source code, our tests, and our documentation. These improvements would make life easier for new developers as well as those whose involvement concerns a single project or component. That easier development would in turn make it easier for OpenCog to expand its core development team.

The checklist for a review could be:

  1. Code looks correct and follows good software design and development practices (if we adopt code guidelines, the review should check those). We should use tools to automate as much of the guidelines check as possible, because life is too short.
  2. Appropriate unit tests exist. Either existing ones have been updated, or new ones have been created. Obsolete unit tests should be removed.
  3. Documentation. If the development task provides functionality to be used by others, documentation is needed. It can be new documentation or updates to existing documents. This comes in two formats:
    1. The code should be adequately documented. This means every non-trivial class and function/method should have a brief explanation; and every tricky coding decision should have an explanation -- so people can understand the why, not just the how.
    2. Module-level documentation in markdown and/or examples. These will be automatically converted and published on the wiki.
  4. Dependencies. If the task creates new dependencies, verify that those are needed (can we do the same by using libraries we already depend on?). If so, verify that Dockerfiles and install scripts are updated.

The review checklist for bug fixes is a bit different. For each bug being fixed we should add an automated unit test to prevent regression. This is a test that will fail on the code before the fix and pass on the code after the fix.
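To make the "fails before the fix, passes after the fix" idea concrete, here is a small self-contained sketch in Python. The function split_message and its bug are invented purely for illustration (they are not taken from the OpenCog codebase); the point is only the shape of the regression test.

```python
def split_message_buggy(text, limit):
    """Hypothetical pre-fix version: splits text into chunks of `limit` chars.

    Bug: integer division drops the trailing partial chunk.
    """
    return [text[i * limit:(i + 1) * limit] for i in range(len(text) // limit)]


def split_message(text, limit):
    """Hypothetical post-fix version: the trailing partial chunk is kept."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def test_trailing_partial_chunk_is_kept():
    """Regression test added together with the fix.

    It fails if split_message is swapped back to the buggy version,
    because the final chunk "g" would then be missing.
    """
    assert split_message("abcdefg", 3) == ["abc", "def", "g"]
```

A test runner such as pytest would collect the test_ function automatically. The buggy version is kept here only to show what the test guards against; in a real fix it would simply be replaced.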

It's possible to do the reviews in two steps. Less experienced volunteers can take care of ensuring unit test coverage, code documentation and code guideline adherence. They can then mark the pull request as passing these requirements. Core developers can then evaluate the semantics of the code and decide on what to update on the wiki.
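One way to picture the two-step review is as a checklist with two groups of items, where a pull request is only ready to merge once both groups are complete. The sketch below is a hypothetical model: the item names and the Review class are assumptions made for this example, not an existing OpenCog or GitHub feature.

```python
from dataclasses import dataclass, field

# Items a less experienced volunteer can verify (names are illustrative).
FIRST_PASS = ("unit_tests", "code_documentation", "guideline_adherence")
# Items left to a core developer (names are illustrative).
CORE_PASS = ("code_semantics", "wiki_updates")


@dataclass
class Review:
    """Tracks which checklist items have been ticked for one pull request."""

    checked: set = field(default_factory=set)

    def mark(self, item):
        """Tick one checklist item, rejecting unknown item names."""
        if item not in FIRST_PASS + CORE_PASS:
            raise ValueError(f"unknown checklist item: {item}")
        self.checked.add(item)

    def first_pass_done(self):
        """True once every volunteer-level item has been ticked."""
        return all(item in self.checked for item in FIRST_PASS)

    def ready_to_merge(self):
        """True only when both the first pass and the core pass are complete."""
        return self.first_pass_done() and all(
            item in self.checked for item in CORE_PASS
        )
```

In practice the same effect could probably be had with GitHub labels or comments on the pull request; the model just makes the two stages explicit.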

If we adopt these review practices, we'll gradually increase our test coverage, code and wiki documentation. It may still take focused sprints to bring all of the above to a really high standard, but these practices will ensure that the movement is always in the right direction, which is a very good start.

Another benefit is that this workflow lets us ask interested but inexperienced volunteers to get their feet wet in a way that directly benefits the project: they could start contributing by adding unit tests to code that's currently untested. This is a relatively simple task in many cases (though certainly not all for OpenCog!), and it lets people easily become official "contributors", which tends to encourage community growth.

A Note on Code Documentation

Some modules and projects provide good documentation in the form of Readmes or other markdown documents as well as examples. This is a very good idea, and we should make it standard. Having module-level documentation as markdown documents kept in git means we can update documentation as part of any pull request. This makes compliance and review much easier.

If we adopt this idea, then we should have a prominent section on the wiki dedicated to module-level documentation, and a script, triggered on every change to master branches, that compiles the markdown documents into MediaWiki format and publishes them on the wiki. This would really help fight wiki obsolescence.
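To give a feel for the conversion step, here is a deliberately tiny Python sketch that handles only headings, top-level bullets, bold text, inline code and links. The function name markdown_to_mediawiki and the sample URL are assumptions for this example; it does not publish anything to the wiki. A dedicated converter such as pandoc, which has a MediaWiki output format, would likely be a sturdier choice for real documents.

```python
import re


def convert_inline(text):
    """Convert a few inline Markdown constructs to MediaWiki markup."""
    # **bold** -> '''bold'''
    text = re.sub(r"\*\*(.+?)\*\*", r"'''\1'''", text)
    # `code` -> <code>code</code>
    text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
    # [label](url) -> [url label]
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r"[\2 \1]", text)
    return text


def markdown_to_mediawiki(markdown):
    """Convert a small subset of Markdown to MediaWiki, line by line."""
    out = []
    for line in markdown.splitlines():
        heading = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if heading:
            # "## Title" -> "== Title ==" (one "=" per "#").
            marks = "=" * len(heading.group(1))
            out.append(f"{marks} {convert_inline(heading.group(2))} {marks}")
            continue
        # Top-level "- item" / "+ item" / "* item" -> "* item".
        line = re.sub(r"^[-*+]\s+", "* ", line)
        out.append(convert_inline(line))
    return "\n".join(out)


if __name__ == "__main__":
    sample = "# AtomSpace\n\nSee the **API** in [the docs](http://example.org/api).\n- first\n- second"
    print(markdown_to_mediawiki(sample))
```

Nested lists, tables and fenced code blocks are left untouched by this sketch, which is one more reason to treat it as a starting point only.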

We should also make sure that Doxygen builds the latest documentation on every change to master branches, and that documentation is uploaded to the OpenCog website and easy to reach.

Improving Access: Documentation and Releases

We should have official releases, including traditional release artifacts -- at the very least Debian package files and Release Notes. This would make it a lot easier for people to start using OpenCog or some of its sub-projects, including those who want to use the AtomSpace to do OpenCog development but will not be involved with AtomSpace development.

Releases would make it simple for people to just get OpenCog and run it without the frequent configuration work that is needed today. We should ensure that Dockerfiles install the latest releases, and have prominent instructions on the OpenCog site for people who want to just run OpenCog binaries (e.g. by providing, in addition to the Dockerfiles, instructions on running demos).

Releases should be supplemented by a nice set of well-documented demos that do interesting and diverse things. The upcoming OpenCog dialog system is one effort at creating such a demo for the NLP modules, and we should discuss improving other existing examples and creating new ones as well.

Summer/Fall 2015 Release Plan

For the OpenCog dialog system, we'd like to have an initial deployment of a very simple system by the first week of August. This should come with initial releases for the relevant projects:

  • Link grammar
  • RelEx
  • Cogutil
  • Atomspace
  • OpenCog

The last of these is the big challenge, of course, and this plan requires us to immediately fix the broken unit tests.

By the end of the project (currently scheduled for October) we'd like to have follow-up releases as needed for those projects. At that time we should have a discussion about release cycles and ideas like long-lived vs short-lived releases (like Ubuntu does).

Release Workflow

Before each release, we should stop development work and spend one or two weeks focused on the release process. This means people should prioritize:

  • Fixing issues
  • Ensuring documentation is brought up to date
  • Ensuring dependency information is up to date so packages can be built

The exact details need a bit more fleshing out.

Documentation Improvements

For the initial August release, we'd like to ensure at least the following documentation requirements:

  • All information on the wiki Homepage, Background pages and AI Documentation pages is current.
  • There is a single set of Getting Started pages, and it works reliably. This should perhaps be Docker based?
  • We have a script that automatically converts Markdown to MediaWiki, and it pulls selected documents from GitHub on every commit and publishes those to a new category on the Wiki, with a (manually generated) outline for code documentation.

Other efforts on cleaning up the Wiki will be ongoing but don't have to be completed by then.

Releases and Maintenance

The creation of releases brings up the issue of release maintenance. We'd have to decide what to do about that, as the ease of use comes with a cost in maintainer time. One possibility is commercial users funding a paid maintainer, part-time or full-time.