Theory and Practice

In this post I'll give a brief presentation of some software development techniques that seem to work for me.

Incremental Development

Is it better to have a Big Design Up Front or to just be Agile? Both is better. If you have an idea of the overall picture before writing the first line of code (fiddling around with the technologies you'll use or with GUI prototypes doesn't count), then it is less likely that you'll be surprised by some issue that forces you to refactor a large portion of the code. The best way to get a good architecture is to study similar existing projects. Experience is a good asset. If you don't have it, then at least try to borrow it from others. But don't overdo it. You should not strive for a detailed design because it will surely be wrong. Instead, fill in the details as you go.

I prefer a vertical implementation style. First implement the minimum functionality, so that you have a program that actually runs and somewhat resembles the final product. Then add features one at a time. Before adding a feature, take time to complete your original design sketch up to a satisfactory level of detail. For an object-oriented language, satisfactory means specifying all classes and all public members. But adding other details will do no harm. Do not implement two features in parallel. Focus on one at a time. To do this you need a more or less orthogonal list of features. Finding orthogonal features is the hard part.

Whenever you add a feature, refactor the code of the other features to accommodate the new one if needed. Don't be afraid to refactor. Improving code is always good. But never leave a feature unfinished. You should never consider a feature finished if there is still something that doesn't work quite right. You also need to finish the modifications to other features. Things you leave behind will pile up and later they'll haunt you! Trust me. Whenever you find yourself saying, "I'll do this small tweak later, now I want to move to the next feature," think also: "If I say this, it means I'll never have time to get to it. In the next few days I'll be busy and after that I'll simply forget. So it's better to make a conscious decision now about whether this thing can be left out without repercussions."

Data Structures and Algorithms

Features tend to be associated with algorithms. The algorithms act on data structures. Each program tends to have a central set of data structures, and adding features mainly means adding new algorithms for handling those data structures. (Of course, there are also data structures that are local to a certain algorithm/feature.) It is VERY important to get right the design of the data structures on which a lot of algorithms will act! Changing such a data structure later means changing all the algorithms already implemented. So do this design in the beginning and do it right. Most of the features you implement will require the design of at least one algorithm. It might be trivial (more often than not) or it could be a really smart one. Whichever category it falls in, list the candidate algorithms and pay close attention to their complexity. Then plug in some numbers and see which choice is acceptable in terms of space and time and is also the easiest to implement and maintain. Choose that one! If profiling shows you were wrong and the program is too slow (or too memory hungry), then your algorithm inventory will help you choose a better one. Does it pay to optimize away constant factors? Usually not! But never be rigid; take rules of thumb (like the ones presented here) with a grain of salt. Sometimes you DO need to optimize for constant factors. But this is a rare situation.
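To make "plug in some numbers" concrete, here is a minimal sketch in Python. The two candidate algorithms, their cost formulas, and the sizes n and q are assumptions made up for illustration; the formulas ignore constant factors, so the results are only rough orders of magnitude.

```python
import math

# Hypothetical cost models: rough operation counts, constant factors ignored.
# n = number of stored items, q = number of lookups (both are assumptions).
CANDIDATES = {
    "linear scan per lookup": lambda n, q: n * q,
    "sort once, then binary search": lambda n, q: (n + q) * math.log2(max(n, 2)),
}


def estimate_costs(n, q):
    """Return {candidate name: estimated operation count} for the given sizes."""
    return {name: cost(n, q) for name, cost in CANDIDATES.items()}


def cheapest(n, q):
    """Name of the candidate with the smallest estimated cost."""
    costs = estimate_costs(n, q)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for n, q in [(8, 1), (1_000_000, 1_000_000)]:
        for name, cost in estimate_costs(n, q).items():
            print(f"n={n:>9} q={q:>9} {name:<32} ~{cost:.3g} operations")
```

With tiny inputs the trivial scan comes out ahead, so there is no reason to write the smarter version. With a million items and a million lookups the estimate for the scan is several orders of magnitude larger, and the choice is no longer a matter of taste.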

Test Suites

After you decide what the next feature is, even before you choose the algorithm (but usually after you have the data structures), you should generate a test suite for that feature. Even mathematicians work by trial and error. There is nothing wrong in using a limited set of tests that could never prove your implementation correct (Dijkstra said the last part). What is important is that very often just trying things out is a revealing experience. It shows bugs and stimulates intuition. Rigorous reasoning should temper this process. So, even if you create tests before implementing any new feature, don't be sure your code is 100% correct. Knowing this makes you smarter, not the other way around. ("People ask me very often how can I live without knowing. I don't know what they mean. It's simple, I do it all the time." Feynman.)
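A small sketch of what "tests first" can look like, in Python. The feature (counting the words in a line), the function name count_words and the cases themselves are hypothetical; the point is only that the table of cases is written from the specification, before the implementation.

```python
# Test cases for a hypothetical feature, "count the words in a line".
# They are written first, from the specification alone.
CASES = [
    ("", 0),
    ("one", 1),
    ("two words", 2),
    ("  extra   spaces  ", 2),
    ("tabs\tand\nnewlines", 3),
]


def count_words(text):
    """Implementation written after the cases above."""
    return len(text.split())


def run_suite():
    """Return the list of failing cases as (input, expected, actual)."""
    failures = []
    for text, expected in CASES:
        actual = count_words(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    failed = run_suite()
    print("all cases pass" if not failed else f"failures: {failed}")
```

An empty failure list only says that these five cases pass. It says nothing about the inputs nobody thought of, which is exactly the caveat above.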

Specification and Design Documentation

Ok. So now you know what the next feature will be and how it fits in the overall picture. You also know what data structures and algorithms you'll use. Write it down! Make a specification. But don't stop here. Also describe the structure of the code that you WILL write. Often you'll use diagrams to do it. For object-oriented design there is a well-developed graphical language: UML. When you write this document you'll often notice little things that you missed. Go back and revise your attack plan. Repeat. When you feel comfortable with the document you just wrote, and only after that, it's time to go hit the keyboard.
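Besides diagrams, one possible way to describe the code you will write is a skeleton that names the classes and their public members and nothing else. The sketch below is in Python; the WordIndex class and its methods are hypothetical, invented only to show the shape of such a skeleton.

```python
from typing import List


class WordIndex:
    """Maps each word to the line numbers where it occurs (hypothetical feature)."""

    def add_line(self, line_number: int, text: str) -> None:
        """Record every word of `text` as occurring on `line_number`."""
        raise NotImplementedError("specified in the design, not written yet")

    def lines_for(self, word: str) -> List[int]:
        """Return the sorted line numbers containing `word`; empty if absent."""
        raise NotImplementedError("specified in the design, not written yet")


def public_members(cls) -> List[str]:
    """List the public names a class defines, for pasting into the design document."""
    return sorted(name for name in vars(cls) if not name.startswith("_"))


if __name__ == "__main__":
    print(public_members(WordIndex))
```

The skeleton can be reviewed like any other document, and the docstrings are a natural place for the pre-conditions and post-conditions of each member.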

Logging

Before discussing code I shall make a short detour. There is a feature that is special. It is special because it helps the development process itself. This is a good enough reason to include this feature in the big picture and also to make it one of the first on your chronological feature implementation list. It is logging. This feature lets you know what is happening inside the program while it is happening. Logging should be off by default, but it should be easy (even for end users) to turn it on in order to diagnose a problem. Programs with a GUI are no exception. Their internals should be as visible as possible when running in debug mode. For more about this, read the transparency and discoverability chapter in The Art of UNIX Programming.
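Here is a minimal sketch of "off by default, easy to turn on", using Python's standard logging module. The logger name "myapp", the MYAPP_DEBUG environment variable and the load_settings function are hypothetical names chosen for this example; a command-line flag would work just as well as a switch.

```python
import logging
import os

# "myapp" and MYAPP_DEBUG are hypothetical names chosen for this sketch.
log = logging.getLogger("myapp")


def configure_logging(environ=None):
    """Turn diagnostic logging on only if MYAPP_DEBUG=1; return whether it is on."""
    environ = os.environ if environ is None else environ
    enabled = environ.get("MYAPP_DEBUG") == "1"
    log.handlers.clear()  # make repeated calls idempotent
    if enabled:
        handler = logging.StreamHandler()  # writes to stderr by default
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        log.addHandler(handler)
        log.setLevel(logging.DEBUG)
    else:
        log.addHandler(logging.NullHandler())
        log.setLevel(logging.CRITICAL + 1)  # above every standard level: silent
    return enabled


def load_settings(path):
    """Hypothetical application function that reports what it is doing."""
    log.debug("loading settings from %s", path)
    return {"path": path}


if __name__ == "__main__":
    configure_logging()
    load_settings("settings.ini")
```

An end user who hits a problem only has to set one environment variable and run the program again; nothing in the application code changes.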

Assertions

Assertions are very important both for understanding an algorithm and for proving its correctness, especially when it is expressed in an imperative language (e.g. BASIC, C/C++, Java, FORTRAN). They are propositions that are true at certain points of execution. The code between two batches of assertions always presumes that the assertions above it are true and promises that the assertions below it will be true. By explicitly checking assertions you do two things:

Help the guy who reads the code understand it.
Help the guy who debugs the code (probably you) pinpoint the piece of code that does not work correctly.

Assertions are old. They were used even by Turing. Knuth says in The Art of Computer Programming that, probably, assertions are very effective because they reflect the way humans think about algorithms. A special case of assertions are pre-conditions and post-conditions. These are simply the first and, respectively, the last set of assertions in a function. You should strive to always specify these, at least in comments. At least in comments? Why not in code (e.g. by using #include <assert.h> in C)? It's simple. Not all assertions are easy to code. Those which are not easy to code should not keep you from working. Just write them in comments using whatever abstractions you like (and the language at hand doesn't provide). Just a short note: UML has something named "constraints", which somewhat resemble the notion of assertion. Check them out!
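As an illustration, here is a sketch of an integer square root in Python with a pre-condition, a loop invariant and a post-condition checked in code, plus one assertion left as a comment because it is awkward to check. The function name isqrt_checked and the choice of algorithm (bisection) are mine, picked only because the invariant is easy to state.

```python
def isqrt_checked(n):
    """Return the largest integer r such that r*r <= n."""
    # Pre-condition: n is a non-negative integer.
    assert isinstance(n, int) and n >= 0

    lo, hi = 0, n + 1
    while hi - lo > 1:
        # Invariant: the answer lies in the half-open interval [lo, hi).
        assert lo * lo <= n < hi * hi
        # Assertion (comment only): hi - lo strictly decreases on every
        # iteration, so the loop terminates.
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid

    # Post-condition: lo is the integer square root of n.
    assert lo * lo <= n < (lo + 1) * (lo + 1)
    return lo


if __name__ == "__main__":
    print([isqrt_checked(n) for n in (0, 1, 10, 16, 17)])
```

If the update of lo or hi is ever broken by a later change, the invariant fails on the first bad iteration, which points at the faulty block instead of at some distant symptom.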

Comments

Ah... Comments. They are SO important. In literate programming these are taken to an extreme: the code becomes embedded in a story. What a wonderful way of publishing articles about algorithms, data structures and such! I'm afraid that real life is not so kind to verbosity. In fact, another way to put it: when you do trivial stuff (algorithms) it's not worth writing too much about them because they'll be easily understood. On the other hand, in an article you probably present something smart, so the reader will need all the help he can get. In writing comments it's probably a good idea to follow Dijkstra's notion of politeness: "if you can spend 20min to spare each of your readers 2min then it is worthwhile to do so". Some other good rules of thumb are:

Write comments delimiting code sections before the code itself. The places where you put these comments are usually also a good place to put assertions.
Whenever you read some code and don't understand it quickly, add comments explaining it (after you do manage to understand it).
Comment every important syntactic element. This means: classes, modules, functions, variables. Some trivial variables, like the i counter, are usually not commented.
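The rules above might look like this in practice. This is only a sketch in Python with a made-up function, mean_and_variance; the section comments come before the code they delimit, and the assertions sit at the section boundaries.

```python
def mean_and_variance(samples):
    """Return (mean, population variance) of a non-empty list of numbers."""
    # Pre-condition: there is at least one sample.
    assert len(samples) > 0

    # --- Section 1: compute the mean ---
    count = len(samples)         # number of samples
    mean = sum(samples) / count  # arithmetic mean

    # --- Section 2: compute the population variance ---
    # At this point `mean` is final; each term below is a squared deviation.
    squared_deviations = [(x - mean) ** 2 for x in samples]
    variance = sum(squared_deviations) / count

    # Post-condition: a variance is never negative.
    assert variance >= 0
    return mean, variance


if __name__ == "__main__":
    print(mean_and_variance([1, 2, 3, 4]))
```

The loop variable x gets no comment of its own, in the same spirit as the i counter.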

Peer Review

(Professional) communication with colleagues is a good way to turn your workplace into a place where you like to be. Asking them to review your work is likely to improve it. Never give or accept criticism directed at a person. The critique should always be directed at ideas. When to ask for help? In my opinion you should do it in a formal way at least twice per feature: once after you have written the specification and design document, and once after you have finished coding. In between you should strive for continuous and healthy communication with your colleagues and with your client. The effectiveness of peer review has been shown by the long-standing tradition of scientific journals. When the review process is rigorous and taken seriously, the quality is almost always better.

Tools (example)

In real life you need to be pragmatic. When deciding whether to apply a "Rule of Thumb" you should weigh its advantages against its disadvantages. Whenever a Rule of Thumb tells you to do something that you usually don't do, an important concern is whether the time spent will pay off. Well, if the time spent is short, it has more chances to pay off :) Here is where the TOOLS come into the picture.

Nothing compares to a good editor. The open source one I like is JEdit. The other ones are IntelliJ IDEA for Java and, of course o:) Visual Studio .NET.
Poseidon is a tool that helps you create UML models. It is a very good tool to assist you in specification and design.
Doxygen and javadoc are tools that automatically generate API documentation from code comments.
cccc is a tool that computes code metrics. You will have some objective measures of how well commented your code is, how complex the control structure is, how modular it is, etc.
CVS is a version control system. Every developer should feel suffocated without such a system.

Finally, I'll put here a link to an MIT course with much more information. It is, however, more detailed and hence you will need to devote more time to reading it.

Disclaimer

What I described here is more or less a guideline process/methodology. Always feel free to diverge when a particular situation asks for it.