If you are new to open source, you may run into so many tools associated with a project’s technical infrastructure that you feel overwhelmed by the sheer amount of tooling and jargon you have to learn. At least that is what happened to me when I started my open source journey not so long ago. It didn’t help that this was also my first exposure to industrial-level software development. Anyway, having ridden out the initial learning curve, I now realize how vital these systems are to an open source project. So I thought I would write up what I have learned so far about the tools that typically make up an open source development infrastructure. The items mentioned here are not exclusive to open source, though. Any large software project needs these tools for effective collaboration among developers and for project monitoring.
What an open source project typically needs
– A Web site
– Version control software
– A Bug Tracker
– A Wiki
– A Mailing List
– An IRC channel
Web Site
A web site is a must for any serious software project. It is the central point from which information about the project is disseminated to the world. The project software is also hosted on the site so that users can download and use it. In addition, user guides and other information such as project mailing lists and source repository URLs are given on the site. So an informative and easily navigable web site is essential for a good open source project.
Version Control Software
Version control is a system that enables tracking and controlling changes to a project’s files, in particular source code, documentation, and web pages. One long-standing version control system is CVS. Recently, though, Subversion has become popular, with quite a few new and existing projects adopting it. Version control systems carry their own set of jargon, which anyone using them has to be familiar with.
Repository – Database where the source files and their changes are stored. Some version control systems have centralized databases while others have decentralized databases.
Commit – To make a change to the sources in the version control database so that it can be incorporated into future releases of the project. People with the authority to commit to the database are called committers, and committership usually comes with political powers in the project, such as voting rights during project votes. Usually you are made a committer when the community agrees via a vote, after inspecting the work you have done so far, that you are satisfactorily familiar with the project. Until you gain committership you can only provide patches, which existing committers review and then incorporate into the code base using their commit rights.
Patch – A text file showing the differences you made to the project sources, which is then sent to the project mailing list or submitted to the project’s issue tracker, depending on the project’s policy for submitting patches. Providing patches is usually how a newcomer starts contributing to a project. A patch is usually produced by running a diff on the working copy on your machine. See below for the definitions of diff and working copy.
Diff – A textual representation of a change. A diff shows which lines were changed and how, plus a few lines of surrounding context on either side. The term is usually synonymous with patch. Version control software usually has a command with a similar name that creates a diff of a changed source file (e.g. in Subversion it is svn diff).
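To make the idea of a diff concrete, here is a minimal sketch using Python’s standard difflib module rather than a real version control client. The file name greet.py, its contents, and the labels "revision 1" and "working copy" are made up for illustration. The output is a unified diff: removed lines start with "-", added lines start with "+", and unchanged context lines start with a space.

```python
import difflib

# Hypothetical file contents before and after a small change.
old_lines = [
    "def greet(name):\n",
    "    print('Hello ' + name)\n",
    "\n",
    "greet('world')\n",
]
new_lines = [
    "def greet(name):\n",
    "    print('Hello, ' + name + '!')\n",
    "\n",
    "greet('world')\n",
]

# unified_diff yields the header lines, a hunk marker, and then the
# changed lines together with their surrounding context.
diff_text = "".join(
    difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="greet.py (revision 1)",
        tofile="greet.py (working copy)",
    )
)
print(diff_text)
```

Saved to a file, text like this is essentially what gets sent around as a patch.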
Checkout – The process of obtaining a copy of the project from the project repository. This produces a directory tree, called a working copy, on the local machine.
Working copy – The developer’s private directory tree containing the project sources. A working copy also contains metadata managed by the version control system, telling the working copy what repository it comes from, what revisions (see below) of the files are present, etc. Generally, each developer has their own working copy, in which they make and test changes, and from which they commit.
Revision – One specific state that a file or directory is in or has been in. For example, if a file starts out at revision 1, then committing a change to it produces revision 2 of the same file.
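The terms so far fit together roughly as in the toy model below. This is only a sketch to fix the vocabulary, not how CVS or Subversion actually store data: the ToyRepository class and its methods are hypothetical, and it keeps whole snapshots in memory where a real system stores changes on disk.

```python
class ToyRepository:
    """A hypothetical in-memory repository: each commit stores a full snapshot."""

    def __init__(self):
        self._snapshots = []  # index 0 holds revision 1, index 1 holds revision 2, ...

    def commit(self, files):
        """Store the given {path: content} mapping and return the new revision number."""
        self._snapshots.append(dict(files))
        return len(self._snapshots)

    def checkout(self, revision=None):
        """Return a working copy (a private dict) of a revision, the latest by default."""
        if not self._snapshots:
            raise ValueError("repository has no revisions yet")
        if revision is None:
            revision = len(self._snapshots)
        if not 1 <= revision <= len(self._snapshots):
            raise ValueError(f"no such revision: {revision}")
        return dict(self._snapshots[revision - 1])


repo = ToyRepository()
first = repo.commit({"README": "first draft"})

working_copy = repo.checkout()           # private copy of the latest revision
working_copy["README"] = "second draft"  # a local change, not yet in the repository
second = repo.commit(working_copy)       # committing produces the next revision

print(first, second)                # 1 2
print(repo.checkout(1)["README"])   # first draft
```

Note that editing the working copy changes nothing in the repository until the commit, and that older revisions stay retrievable afterwards.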
Branch – A copy of the project under version control. Commits to a branch don’t affect the other copies in the repository or the main project directory tree, which is usually called the trunk. This isolates one line of development from the main development of the project.
There are several reasons for branches to exist in a project. Some of them are:
* The development work carried out on the branch may be experimental and outside the regular line of the project.
* A release may be near, so changes to the trunk are forbidden in order to keep its code stable before the release, and ongoing work moves to a branch in the meantime.
* Conversely, a branch can be used as the place to stabilize a new release. In this case changes to the release branch are restricted while regular development work continues on the main branch.
Merge – The final aim of creating a branch is usually to merge it back into the main branch, so that the changes made on the branch are transferred there. When a branch reaches a stage where it seems stable, developers can merge its changes into the trunk to bring the enhancements in.
Conflict – This happens when two people try to make different changes to the same part of the same source. The version control system detects the conflict and notifies the users, and it is then up to them to sort out the conflict between themselves and mark it as resolved in the version control system.
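Merging and conflict detection can be sketched as a three-way comparison against the common starting point. The function below is a deliberately simplified illustration that works on whole files, whereas real version control systems compare individual lines within a file. The name three_way_merge and the sample file contents are my own assumptions.

```python
def three_way_merge(base, ours, theirs):
    """Merge two sets of changes made from the same base.

    Each argument maps a file path to its content. Returns (merged, conflicts),
    where conflicts lists the paths both sides changed in different ways.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:      # both sides agree (or neither touched the file)
            result = o
        elif o == b:    # only "theirs" changed the file
            result = t
        elif t == b:    # only "ours" changed the file
            result = o
        else:           # both changed it differently: a person has to decide
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"README": "v1", "main.py": "print(1)"}
trunk = {"README": "v1, edited on trunk", "main.py": "print(1)"}
branch = {"README": "v1, edited on branch", "main.py": "print(2)"}

merged, conflicts = three_way_merge(base, trunk, branch)
print(merged)     # {'main.py': 'print(2)'}
print(conflicts)  # ['README']
```

Here main.py merges cleanly because only the branch changed it, while README is reported as a conflict because both sides changed it differently.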
Bug Tracker
Bug trackers, or more correctly issue trackers, are used for reporting bugs and tracking their status, handling feature requests, and submitting patches. The project’s developer mailing list is usually linked to the issue tracker, so that when a user or developer creates or resolves an issue, or submits a patch, a notification mail is sent to the developer list. Once an issue is created it is in the open state. A developer may then take the bug, or it may already have been assigned to someone when it was created. Others can comment on the issue, for example on how it could be solved or on its effect on the project. The bug is then reproduced and diagnosed using the information in the bug report. A developer creates a fix and submits it as a patch for community review, or as a direct commit if they have the rights to do so. Alternatively, depending on its nature, the bug gets scheduled for a future release (for example, when the fix requires extensive rework of some code that the time constraints of the current release do not allow, and the bug is not critical enough to be worth the trouble).
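The life cycle described above can be pictured as a small state machine. The sketch below is hypothetical: the state names and allowed transitions are my own reading of the description, and real issue trackers each define their own workflow.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    ASSIGNED = "assigned"
    PATCH_SUBMITTED = "patch submitted"
    RESOLVED = "resolved"
    DEFERRED = "scheduled for a future release"


# Assumed transitions, not taken from any particular tracker.
ALLOWED = {
    State.OPEN: {State.ASSIGNED, State.DEFERRED},
    State.ASSIGNED: {State.PATCH_SUBMITTED, State.RESOLVED, State.DEFERRED},
    State.PATCH_SUBMITTED: {State.ASSIGNED, State.RESOLVED},
    State.DEFERRED: {State.OPEN},
    State.RESOLVED: set(),
}


class Issue:
    def __init__(self, title):
        self.title = title
        self.state = State.OPEN  # every new issue starts out open
        self.history = [State.OPEN]

    def move_to(self, new_state):
        """Change state, rejecting transitions the workflow does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


issue = Issue("Crash on startup")
issue.move_to(State.ASSIGNED)         # a developer takes the bug
issue.move_to(State.PATCH_SUBMITTED)  # a fix is posted for community review
issue.move_to(State.RESOLVED)         # a committer applies the patch
print([state.value for state in issue.history])
```

In a real tracker each of these transitions would also trigger the notification mail to the developer list.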
Wiki
A wiki is not a must, and some projects do not have one, though having one is worthwhile if the project has a lot of documentation requirements. It is a web site that allows any visitor to edit or extend its content. It can be a place to build documents that grow over time and need user input, like FAQs. The refined documents can then be extracted from the wiki and transferred to the project web site.
IRC Channel
Most projects offer real-time chat rooms using Internet Relay Chat (IRC). Users and developers can ask each other questions there and get instant responses. But this is not a must, and some projects don’t have their own IRC channel.
Mailing List
The beauty of open source software development is that it brings individuals from across the globe together into a single development team. So an effective communication medium is necessary. Mailing lists are the nuts and bolts of communication in the open source world. Anyone interested in the project can subscribe to a project mailing list and receive the mails sent to it. Most projects have multiple mailing lists, and it is usual to have at least two: a developer list and a user list. The developer list is usually where bug reports and messages generated by the version control system are sent, although these can also go to a separate list. Developers also discuss project development topics and architectural issues on the developer list. The user list is where users can ask questions about general usage of the software or about problems they face when using it.
Though I had to learn most of the above in an ad hoc manner as I got along with coding, one fine day I got hold of the book “Producing Open Source Software” by Karl Fogel, which pretty much consolidated my understanding of how things work in open source. It is a really great book and covers many other aspects of open source, though I found that the material above is what you really need to get a good grasp of the basic technical infrastructure.
So, as can be seen, each component described above plays an important role in the development of open source software. I hope this rather long post proves useful to someone coming into the open source software development world. As usual, any comments about things I may have missed or got wrong are welcome.