Programming Style



CS Dept.

Notes on Programming Style

by C. S. Ferner

Program Design

Programming Style

Debugging

Program Design

Motivation

The majority of the cost of creating software is not in the design and development stages of the software life cycle. It is in debugging and maintenance. For example, it takes a great deal more effort to find and fix a bug that is reported by a customer months after the release of a product than it does to find and fix it while the software is being written. This means that it is essential to plan ahead and use good programming habits when writing software.

Does this relate to Computer Science students? Definitely! Many Computer Science students will seek employment as software developers when they graduate. What students often do not realize is that it is very unlikely that they will be developing software by themselves. Instead, they will be brought onto a project well after its initial creation and will likely not stay with that project for its lifetime. The probability that you will need to read, maintain, and modify someone else's code is very high. Likewise, it is very probable that someone else will have to read, maintain, and modify the code you have written.

Good software development is not simply a matter of knowledge; it is a skill. This means that Computer Science students need to learn by seeing examples, getting lots of practice, and receiving feedback. This document discusses some of the habits students should learn early on so that they are able to write better software.

Qualities of Good Software

Elegant code is the goal. Elegant code is software that solves the problem correctly, does so efficiently, is the simplest possible solution, and is easy to understand and debug. Simple is good. Complicated is bad. Simple code improves readability, writability, maintainability, and reliability.
Design from the top down. Start at the highest, most general level. Write a description of the algorithm or approach in English first. Then translate it to pseudo-code, then to real code. The real code should closely follow the English description.
Methods should be short and accomplish very specific tasks. If a method is too long (more than a page or so) or complicated, then break it into sub-methods.
Avoid repeating code. If more than one section of code does the same thing, then see if it can be generalized and placed into another method.
If an algorithm gets too complicated, then rethink the problem. There is probably a more elegant solution. Can you create another method that will take some of the hard work out and make the algorithm easier?

Who is the user?

We use the word user often. There are three distinct people (or roles) that are important when software is being developed: the designer, the developer, and the user. It is difficult but important for students to keep these roles separate when they develop different parts of their program, because students play all of these roles when they write programs for class assignments.

The designer is the person who decides what a product, program, package, class, method, API, etc. will do and how it will work. This usually means designing the interface. For a class, this means designing the public methods (or data). For a method, this means determining the number and type of parameters and the return value.

The developer is the person who decides on the implementation and actually writes and debugs the software.

The user is the person who is going to use the software being written or debugged. This may be the end user who is typing at the keyboard and using the mouse, or it may be the person writing other software that uses our software. Furthermore, the user may change depending on which part of our software we are developing.

For example, suppose we are writing a class that implements complex numbers. The designer decides that we should have methods to construct, add, subtract, multiply, and print complex numbers. We as the developer decide to implement complex numbers using two floating point values: one for the real part and one for the imaginary part. The user of our class is another developer, writing software that instantiates complex number objects and manipulates them using our methods.
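The three roles in this example can be sketched in Java. The class name Complex, the method names, and the choice of double for the two parts are assumptions made for illustration; the notes only name the operations and the two-value representation.

```java
// Illustrative sketch only: the names and the use of double are assumptions.
class Complex {
    // Developer's decision: two floating point values, hidden from the user.
    private final double real;
    private final double imaginary;

    // Designer's decision: the public interface (construct, add, subtract,
    // multiply, print). The user of the class sees only these methods.
    public Complex(double real, double imaginary) {
        this.real = real;
        this.imaginary = imaginary;
    }

    public Complex add(Complex other) {
        return new Complex(real + other.real, imaginary + other.imaginary);
    }

    public Complex subtract(Complex other) {
        return new Complex(real - other.real, imaginary - other.imaginary);
    }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    public Complex multiply(Complex other) {
        return new Complex(real * other.real - imaginary * other.imaginary,
                           real * other.imaginary + imaginary * other.real);
    }

    public double getReal() { return real; }

    public double getImaginary() { return imaginary; }

    // Printing is left to the caller: toString only builds the text.
    @Override
    public String toString() {
        String sign = (imaginary < 0) ? " - " : " + ";
        return real + sign + Math.abs(imaginary) + "i";
    }
}
```

Because the fields are private, the developer could later switch to a different representation (polar form, for instance) without the user of the class having to change anything.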

Think More Code Less

It is common for students to start writing their program without first giving much thought to how they are going to do it. This usually leads to programs that are more complicated and harder to debug, and it ends up taking longer in the end. For example, suppose we want to use a stack to evaluate an arithmetic expression. The algorithm states that when processing arithmetic operators, we should pop any operators off the stack that have higher or equal precedence than the current operator before pushing the current operator onto the stack. Consider the following solution:

if (currentOperator == '+' || currentOperator == '-') {
    top = stack.top( );
    while (top == '+' || top == '-' || top == '*' || top == '/') {
        System.out.print( stack.popAndTop( ) );
        top = stack.top( );
    }
    stack.push( currentOperator );
} else if (currentOperator == '*' || currentOperator == '/') {
    top = stack.top( );
    while (top == '*' || top == '/') {
        System.out.print( stack.popAndTop( ) );
        top = stack.top( );
    }
    stack.push( currentOperator );
} else ...

This algorithm is messy because of the verification of higher or equal precedence. It is not only difficult to read and understand, it is also difficult to maintain. So how can it be improved? By generalizing this verification of higher or equal precedence and moving it into a method, the algorithm is much cleaner:

top = stack.top( );
while ( precedence( currentOperator ) <= precedence( top ) ) {
    System.out.print( stack.popAndTop( ) );
    top = stack.top( );
}
stack.push( currentOperator );

The precedence method is actually very easy to implement and therefore is easy to debug. By removing this functionality, the above code is simpler, more elegant, easier to understand, and easier to debug. We also reduce duplicate code by generalizing the test for precedence.
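As a rough illustration of how small the precedence method can be, here is one possible version. The numeric levels, the class name OperatorPrecedence, and the decision to give every unrecognized character the lowest level are assumptions; the notes do not show the author's implementation.

```java
// One possible precedence method; the levels 2, 1, 0 are arbitrary choices.
class OperatorPrecedence {
    // Returns a larger number for operators that bind more tightly.
    static int precedence(char operator) {
        switch (operator) {
            case '*':
            case '/':
                return 2;
            case '+':
            case '-':
                return 1;
            default:
                // Assumption: anything else (for example a marker for an
                // empty stack) ranks below every real operator.
                return 0;
        }
    }
}
```

With this choice, the loop condition precedence( currentOperator ) <= precedence( top ) becomes false as soon as the top of the stack is not an operator, so the loop stops there.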

Passing the Buck

Well-written code should pass the details of implementation down to the lowest possible levels of abstraction and pass the handling of exceptions, errors, and communication with the user up to the highest levels of abstraction. Consider the following program segment, which asks the user for a value and determines whether that value is prime.

public static void main( String args[] )
{
    ...
    while ( ! done ) {
        System.out.println( "Please enter an integer (>= 0)" );
        inputLine = stdin.readLine();
        try {
            i = Integer.parseInt( inputLine );
            // main is the only place that talks to the user
            if (isPrime(i)) System.out.println( i + " is a prime number." );
            else System.out.println( i + " is not a prime number." );
            done = true;
        } catch( NumberFormatException e ) {
            System.out.println( "Sorry, cannot convert input to an integer" );
        } catch( Exception e ) {
            System.out.println( e );
            System.out.println( "Cannot check for Prime" );
        }
    }
}

public static boolean isPrime( int x ) throws Exception
{
    int i;

    // Invalid input is reported to the caller, not printed here
    if (x < 0 )
        throw new Exception ("Invalid number (must be >= 0)");
    else if (x == 0 || x == 1)
        return false;
    else {
        for (i = 2; i <= Math.sqrt( x ); i++)
            if (x % i == 0) return false;
    }
    return true;
}

The details of determining whether the input is a prime could have been handled in the main program. However, this would make the main program longer and more complicated. The details are moved into a new method so that the main method can be cleaned up. This is called creating an abstraction or abstracting out the details. From the point of view of the main method, the details of determining whether a number is prime are abstract. Now the main method only does those things that involve communication with the user.

Two other advantages of doing this are 1) the isPrime method can be used elsewhere, in other parts of this program or in other programs, and 2) we are free to change the algorithm for determining a prime. For example, it would be more efficient if we did not check whether x is divisible by every even number. If 2 does not divide x, then none of the other even numbers will either. The same is true of multiples of 3: if 3 does not divide x, then none of the multiples of 3 will either. This is the premise of the "Sieve of Eratosthenes" algorithm for finding prime numbers. We are free to choose other algorithms for finding prime numbers. Making this improvement to the algorithm is easier because we have abstracted out the details from main. Furthermore, if we have used this method elsewhere, there is only one location in the program for us to make a change.
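To make the improvement described above concrete, here is a sketch of an isPrime that skips multiples of 2 and 3. It is plain trial division with fewer candidates, not a full sieve, and the class name PrimeCheck is an assumption. The signature and the exception for negative input are kept the same as in the version above, so a caller such as main would not need to change.

```java
// Sketch of a faster isPrime with the same contract as the original.
class PrimeCheck {
    static boolean isPrime(int x) throws Exception {
        if (x < 0) {
            throw new Exception("Invalid number (must be >= 0)");
        }
        if (x < 2) return false;          // 0 and 1 are not prime
        if (x < 4) return true;           // 2 and 3 are prime
        if (x % 2 == 0 || x % 3 == 0) return false;

        // Every remaining candidate divisor has the form 6k - 1 or 6k + 1.
        // A long counter keeps i * i from overflowing for large x.
        for (long i = 5; i * i <= x; i += 6) {
            if (x % i == 0 || x % (i + 2) == 0) return false;
        }
        return true;
    }
}
```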

Notice also that we did not print any messages from the isPrime method, but rather passed the buck for dealing with the user back to the calling method. This brings up the next topic of consistent levels of abstraction.

Consistent Levels of Abstraction

This is one of the most common mistakes that students make. It is very easy to print the answer in the isPrime method. But that means we are doing some printing in the main method and some printing in a lower level method. That is inconsistent. However, that is not the worst of it.

What would happen if we used this method in another program where we did not wish to print a message? When we are writing the isPrime method, we must assume that the user (whoever is calling our method) knows best how to handle the case where the input is invalid. The primary purpose of our method is to check for prime and THAT IS ALL. If we cannot accomplish that (e.g. the input is invalid) then we return an error value or throw an exception so that the calling method can deal with the problem. This may mean printing a message to the user, or it may mean throwing out some values, or it may not even be an error. It is not up to us (i.e. the writers of the isPrime method) to decide that. The end user may have no idea what "Invalid number" refers to or what to do about it. Perhaps they just hit the "Save" button. Most people have seen error message windows that say something about a stack dump or register trace. Ask yourself if you have any idea what that means or what to do about it. This is why you should pass the buck upward.

The higher levels are the best place to handle problems, and the lower levels are the best place to handle implementation details.
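As a sketch of why this matters, consider a second caller of isPrime that never prints anything. The class PrimeFilter and the method keepPrimes are hypothetical; isPrime is the version from the notes. This caller decides that invalid values are simply thrown out, a choice it could not make if isPrime printed its own error message.

```java
import java.util.ArrayList;
import java.util.List;

class PrimeFilter {
    // isPrime as in the notes: reports invalid input by throwing, never prints.
    static boolean isPrime(int x) throws Exception {
        if (x < 0) {
            throw new Exception("Invalid number (must be >= 0)");
        }
        if (x == 0 || x == 1) return false;
        for (int i = 2; i <= Math.sqrt(x); i++) {
            if (x % i == 0) return false;
        }
        return true;
    }

    // Hypothetical caller: keeps the primes and silently skips invalid values.
    static List<Integer> keepPrimes(int[] values) {
        List<Integer> primes = new ArrayList<>();
        for (int value : values) {
            try {
                if (isPrime(value)) primes.add(value);
            } catch (Exception e) {
                // This caller's decision: an invalid value is not an error
                // worth reporting, so it is discarded.
            }
        }
        return primes;
    }
}
```

The main method shown earlier makes the opposite decision and tells the user about the problem. Both callers share one isPrime because the decision was left to them.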

Programming Style

Indentation

The statements within a block should be indented (at least 4 spaces). Nested blocks should be indented further. The beginning and ending braces should not be indented but aligned with the enclosing block or statement. The beginning brace should go in one of two locations: either underneath the enclosing statement or at the end of the statement. The following two examples are acceptable forms:

if (x != y)
{
    a = x * y;
    b = a * Math.PI;
} else
{
    a = x * x;
    b = a * 2.0 * Math.PI;
}

if (x != y) {
    a = x * y;
    b = a * Math.PI;
} else {
    a = x * x;
    b = a * 2.0 * Math.PI;
}

Use Constants Instead of Literals

There are a few literals that are appropriate to use directly in a program, but everything else should be a constant. Numbers such as zero, one, and two are usually okay as literals because their use is usually obvious (such as 2*pi in the example above). However, numbers like 42 in the example below are not obvious. The reader is likely to ask, "Why 42? Why not 43 or 52? What is significant about that particular value? What happens if I were to change that number? Would the program crash? Is it okay to make the value bigger/smaller?"

int a[];
a = new int [42];
for (i = 0; i < 42; i++)
    a[i] = myMethod(i);
...
for (i = 0; i < 42; i++)
    System.out.print (a[i] + " ");
System.out.println ((char) 42);

The same program written with constants:

static final char ASTERISK = '*';
static final int ARRAYSIZE = 42;

int a[];
a = new int [ ARRAYSIZE ];
for (i = 0; i < ARRAYSIZE ; i++)
    a[i] = myMethod(i);
...
for (i = 0; i < ARRAYSIZE ; i++)
    System.out.print (a[i] + " ");
System.out.println (ASTERISK);

Another reason to use constants is to reduce the modifications necessary should the value need to be changed. Suppose we did want to change the value of 42 to 80 in the above program. If we used a constant instead of literals, we would have only 1 location in the program to make the modification, as opposed to many locations if we stuck in 42 everywhere.

Furthermore, if we use literals instead of constants, as in the first program above, we run the risk of changing a literal that has nothing to do with the constant we intend to change, simply because it happens to have the same value. For example, the last line of that program prints an asterisk, whose ASCII value is 42. That has no relationship to the array size. Had we changed all occurrences of 42 to 80, we would have changed the behavior of the program in a non-trivial way: it would print a 'P' instead of an asterisk.

Use Descriptive Variable, Constant, and Method Names

You should use names that indicate the purpose of a variable, constant, or method. That implies that a variable has a single purpose. In other words, it is poor practice to reuse the same variable over and over, unless it is something like a loop index. Consider the two program segments:

x = stdin.readLine();
if (x != null) {
    y = Float.parseFloat(x);
    z = 38.0f;
    z = y / z;
}

The same segment with descriptive names:

final float milesPerGallon = 38.0f;

userInput = stdin.readLine();
if (userInput != null) {
    numOfMiles = Float.parseFloat(userInput);
    numOfGallons = numOfMiles / milesPerGallon;
}

It is difficult to understand the purpose of the first program segment. The variable names give no clue, and variable z serves two purposes. The second program segment does exactly the same thing. However, now it is clear what the author is trying to do. The program takes the user input as the number of miles and converts that to the number of gallons required, assuming 38 miles per gallon. Maintaining the second program is much easier.

Use Parameters and Local Variables as Much as Possible

The variables and constants used in a method should fall into one of three categories with respect to where they are declared: class data members, parameters, and local variables. When a person is reading a program, they should be able to look in one of these three locations to find the declaration of any variable used in a method. Data members should be used only for data that is essential to the class or object. They should not be used to provide information to a method. This is what parameters are for. Furthermore, if a variable's value is not used outside of a method, then it should be a local variable.
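A small sketch may help show the three categories side by side. The Account class and its projectedBalance method are invented for this illustration and do not come from the notes.

```java
// Hypothetical class illustrating where each kind of variable belongs.
class Account {
    // Data member: essential to the object, so it lives in the class.
    private double balance;

    Account(double openingBalance) {
        balance = openingBalance;
    }

    // annualRate and years are information for this one call, so they are
    // parameters rather than data members set before the call.
    double projectedBalance(double annualRate, int years) {
        // projected is not needed outside this method, so it is local.
        double projected = balance;
        for (int year = 0; year < years; year++) {
            projected += projected * annualRate;
        }
        return projected;
    }
}
```

A reader who meets annualRate inside projectedBalance finds its declaration in the parameter list, without having to search the class for a field that some other method may or may not have set.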

Documentation

Adding comments to your program will go a long way toward making it more readable and maintainable. Not everything needs to be commented. For example, if you use descriptive names and methods that have very specific purposes, then much of your program code will be self-documenting. However, even if your program is readable, there are still some things you should comment:

Class headers
Method headers
Non-obvious constants or literals
Obscure algorithms

Class headers should include: author, class name, purpose of the class, use of the class or objects, public data and methods, base or super classes, and related classes.

Method headers should include: method name, purpose of the method, parameters, return values, and assumptions about the parameters or input. The particular format that you use to provide this information is not as important as the fact that you have provided it.

Any program segment where something is not obvious to the reader should have a comment to aid the reader in understanding what it does. An example of non-obvious code is the following:

// METHOD NAME: daysApart
// PURPOSE: To compute the number of days between two distinct dates
// PARAMETERS: month, day, and year (integers) for two separate dates
// RETURN: the number of days between the two dates
// ASSUMPTIONS: The input is assumed to be 1 <= month <= 12, 1 <= day <= 31,
//              and the dates are assumed to be after March 1, 1900. If
//              we wanted to include prior dates, we would have to add
//              a 1 to N for anything from March 1, 1800 to Feb. 28, 1900,
//              and a 2 to N for anything prior to Feb. 28, 1800.
//
public static int daysApart( int month1, int day1, int year1,
                             int month2, int day2, int year2 )
{
    return (computeN(month1, day1, year1) -
            computeN(month2, day2, year2));
}

// METHOD NAME: computeN
// PURPOSE: To convert a date to an integer
// PARAMETERS: month, day, and year (integers)
// RETURN: an integer representing the date
// ASSUMPTIONS: The input is assumed to be 1 <= month <= 12, 1 <= day <= 31,
//              and the dates are assumed to be after March 1, 1900. If
//              we wanted to include prior dates, we would have to add
//              a 1 to N for anything from March 1, 1800 to Feb. 28, 1900,
//              and a 2 to N for anything prior to Feb. 28, 1800.
//
// DESCRIPTION: The following algorithm is provided by Stephen G. Kochan,
//              "Programming in ANSI C", Hayden Books, Carmel Indiana, 1988,
//              p. 188.
//
//              N = 1461 * f(year, month) / 4 + 153 * g(month) / 5 + day
//
//              where f(year, month) = year - 1     if month <= 2
//                                     year         otherwise
//
//              where g(month) = month + 13         if month <= 2
//                               month + 1          otherwise
//
private static int computeN( int month, int day, int year )
{
    int N;

    if (month <= 2) { year--; month += 13; }
    else month++;

    N = 1461 * year / 4 + 153 * month / 5 + day;
    return (N);
}

Notice that the body of the method computeN is rather obscure. However, with the description in the header, one can now understand what it is doing. Notice also that there are literals in the code, instead of constants. This is a case where constants do not really add much to the readability. It is not clear what the 1461 and the 153 do. Instead, a reference is given to a published source for this formula. This tells the reader that the author of this program segment does not have any more information to provide as to why 1461 is used, but if the reader needs more information, he or she has a place to turn. The point is this: comments are very useful when it is not clear why things are done the way they are. Also, the method headers provide useful information such as the assumption that the dates are after March 1, 1900. This kind of information is essential to properly maintain this code.

Always Check Validity of Inputs

The above code makes the assumption that the month, day, and year of each date are within a specific range. This is actually poor practice. What happens if the user enters the dates incorrectly? Obviously, they will get an incorrect answer, but more importantly, will they understand their mistake and be able to fix it? Depending on the interface and the number of levels of abstraction between the user and this segment of program code, they may have no idea why they are unable to get the proper results. The best practice is to check the validity of all inputs and not assume that they are correct. If the input is not correct, then we should handle it gracefully and report the problem to the next higher level (either through an exception or an error status; see Passing the Buck above).
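One way to turn the documented assumptions into checks is sketched below. The names DateNumber and computeNChecked, and the use of IllegalArgumentException, are assumptions for illustration. The sketch treats March 1, 1900 as the earliest accepted date, which is one reading of the header comment, and it checks only the ranges stated there (it does not verify, for example, that February has at most 29 days).

```java
// Sketch: validate the stated assumptions before applying the formula.
class DateNumber {
    static int computeNChecked(int month, int day, int year) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be in 1..12, got " + month);
        }
        if (day < 1 || day > 31) {
            throw new IllegalArgumentException("day must be in 1..31, got " + day);
        }
        if (year < 1900 || (year == 1900 && month < 3)) {
            throw new IllegalArgumentException("date must not be before March 1, 1900");
        }
        return computeN(month, day, year);
    }

    // The formula from the notes, unchanged.
    private static int computeN(int month, int day, int year) {
        if (month <= 2) { year--; month += 13; }
        else month++;
        return 1461 * year / 4 + 153 * month / 5 + day;
    }
}
```

The method does not print anything when the input is bad. It reports the problem upward, and the caller decides what to tell the user.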

A more important situation is when an incorrect input will cause major problems. Take the following program segment as an example:

public static float computeAverage( float [] a, int count )
{
    int i;
    float sum = 0.0f;

    // Guard against dividing by zero (or a negative count)
    if (count <= 0) return 0.0f;

    for (i = 0; i < count; i++) {
        sum += a[i];
    }
    return (sum / count);
}

This method computes the average of the floating point numbers in an array from index zero to count - 1. However, what happens if count is zero (or negative)? We could assume that the user provides the correct input. But if we are wrong, we could end up with a division by zero in the return statement. This is far from being an elegant solution. The user may have no idea why they are getting a "division by zero" error message (or, with floating point division in Java, a silent NaN result). Perhaps they just pressed the "Save" button on their GUI. How are they supposed to handle this error? What are they supposed to do now?

A better way is to prevent the division altogether by including the if statement that checks whether count is positive. Perhaps returning zero is still not the best solution, but it is much more elegant than division by zero.
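Since returning zero may not be the best solution, one alternative in the spirit of Passing the Buck is to report the bad count to the caller. The sketch below does this with IllegalArgumentException. The class name Averages, the choice of exception, and the extra check that count does not exceed the array length are assumptions, not part of the original method.

```java
// Sketch: report an invalid count to the caller instead of returning zero.
class Averages {
    static float computeAverage(float[] a, int count) {
        if (a == null) {
            throw new IllegalArgumentException("array must not be null");
        }
        if (count <= 0 || count > a.length) {
            throw new IllegalArgumentException(
                "count must be in 1.." + a.length + ", got " + count);
        }
        float sum = 0.0f;
        for (int i = 0; i < count; i++) {
            sum += a[i];
        }
        return sum / count;
    }
}
```

Which version is better depends on the caller. Returning zero hides the problem, while throwing forces the calling method to decide what an empty set of values should mean.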

Debugging

Debugging should be done from the bottom up. The basic idea is that if you test each class and method individually, then it is a much easier problem and can be done faster. It may seem like it's more work in the beginning, but it will save an enormous amount of time. If you spend a few moments concentrating on testing one method and can convince yourself that it is correct, then you have reduced the number of lines that remain to be tested. You will not have to repeatedly retest the same code. Also, when you find a problem, you will not have very much code to inspect to find the error and will therefore find it faster.

Unfortunately, the methods within a class are not independent and cannot completely be tested independently. You may only be able to partially test one method and then come back to it after you have tested another method, or you may have to test methods together. However, the incremental testing will speed up the process.

When you are ready to test a particular method do the following:

Comment out all other methods you have not already tested.
Be sure the method can handle invalid input (within reason). Are there assumptions about the data you can or cannot make?
Add print statements (which you should remove when you are done) to the method you are testing so that you can tell what is happening. When you have an error, you will end up adding them anyway, so you might as well take a moment to think about what information you will need to track down problems.
Compile the class (without the untested methods)
Create a main (or modify an existing one) that will test specifically this method. Be sure to test for the boundary conditions, not just the normal cases. Your objective is to find problems now instead of later, so test the method with incorrect input.
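The steps above can be sketched as a small test driver for the isPrime method from earlier in these notes. The class name IsPrimeTest, the helper runAll, and the particular test values are assumptions chosen for illustration. The cases include the boundary values 0, 1, and 2 as well as an incorrect (negative) input.

```java
// Sketch of a main that tests exactly one method, including boundary cases.
class IsPrimeTest {
    // Method under test, as given in the notes.
    static boolean isPrime(int x) throws Exception {
        if (x < 0) {
            throw new Exception("Invalid number (must be >= 0)");
        }
        if (x == 0 || x == 1) return false;
        for (int i = 2; i <= Math.sqrt(x); i++) {
            if (x % i == 0) return false;
        }
        return true;
    }

    // Runs every case and returns the number of failures.
    static int runAll() {
        int failures = 0;
        int[] inputs = { 0, 1, 2, 3, 4, 9, 97 };
        boolean[] expected = { false, false, true, true, false, false, true };

        for (int k = 0; k < inputs.length; k++) {
            try {
                if (isPrime(inputs[k]) != expected[k]) {
                    System.out.println("FAIL: isPrime(" + inputs[k] + ")");
                    failures++;
                }
            } catch (Exception e) {
                System.out.println("FAIL: unexpected exception for " + inputs[k]);
                failures++;
            }
        }

        // Incorrect input: a negative number must be rejected, not answered.
        try {
            isPrime(-1);
            System.out.println("FAIL: isPrime(-1) did not throw");
            failures++;
        } catch (Exception e) {
            // Expected: the method passed the problem up to us.
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " failure(s)");
    }
}
```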

You should test a class by testing the methods in a particular order (usually from simplest to most complex):

Test the constructors first. The main method should only instantiate objects using the different constructors and do nothing else.
Next you should test methods that will help you test other methods. For example, creating a print method that can display the data in a form that is easy to understand will allow you to see the modifications made by other methods.
Test the simple or trivial methods (isEmpty, isFull, makeEmpty, getValue, setValue, ...).
Then work your way up to the more complicated methods. You will probably have to go back and test some of the other methods again, or test them in combination. For example, a method called isEmpty, which is usually a trivial method, cannot be fully tested until a method like insert is working properly.

Once you have the subclasses working, develop the main method by using evolution. The basic idea is that functionality should be added and tested incrementally.

Start with a main method that does very little. Take an existing one and remove all the stuff in the middle. Compile it and run.
Add the reading and printing of the input. It should do nothing with the input except print it so that you know it is reading it correctly. The input can be either from a user or from a file.
Add the declarations and instantiations of the objects incrementally and compile and run after each.
Add the functionality incrementally and compile and run after each new step is added.
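As an illustration of the first two steps, a main at this early stage might look like the sketch below: it reads the input and prints it back, and does nothing else. The class name EchoStage and the helper readLines are hypothetical, and reading from standard input is just one of the two input sources mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Early-stage main: read the input and echo it, with no processing yet.
class EchoStage {
    // Reads every line from the given reader and returns them in order.
    static List<String> readLines(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        for (String line : readLines(stdin)) {
            // Printing the input confirms that it is being read correctly.
            System.out.println("read: " + line);
        }
        // Later stages add object creation and real processing here,
        // one step at a time, compiling and running after each step.
    }
}
```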


This page was last updated: August 18, 2003
