1-6 PROGRAM LAYOUT - THE ART OF MAKING PROGRAMS READABLE
*********************************************************
(Thanks to Arne Vajhoej and Clive Page for the good suggestions
and comments, Kenneth Plotkin for the good comments, and of
course to Craig Burley)
Program layout is the art of arranging program code in a READABLE
and EASY TO EDIT way.
Some practical methods to improve layout are:
1) Indenting control structures
2) Separating functional units
3) Choosing good identifiers
4) Adding comments, procedure headings etc
Indentation of control structures
---------------------------------
Control constructs should be properly indented to reveal the
internal structure. Avoid an excessive indentation step that
may take you too quickly to the last allowed column.
Recommended step sizes are in the range 2-4.
This form:
      DO 200 I = 1, 100
        DO 100 J = 1, 100
          WRITE(*,*) I, J
  100   CONTINUE
  200 CONTINUE
is surely more readable than:
      DO 200 I = 1, 100
      DO 100 J = 1, 100
      WRITE(*,*) I, J
  100 CONTINUE
  200 CONTINUE
It is advisable (for good programming style and to aid automatic
optimization) to avoid statement labels and GOTOs as much as possible.
If you must have them, make the label numbers strictly increasing.
Maybe the time has come to use DO ... END DO and other semi-standard
constructs? Most FORTRAN 77 compilers support them, and of course
all the Fortran 90 compilers.
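For comparison, the nested loop shown earlier might be written with DO ... END DO roughly as follows. This is only a sketch: the program name NESTDO and the smaller loop bounds are chosen for illustration, and a strict FORTRAN 77 compiler without the extension will reject it.

```fortran
      PROGRAM NESTDO
C     Nested loops without statement labels (DO ... END DO)
      INTEGER I, J
      DO I = 1, 3
        DO J = 1, 3
          WRITE(*,*) I, J
        END DO
      END DO
      END
```

With no labels to keep track of, the indentation alone shows which END DO closes which loop.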
If you must ensure absolute code portability, always use CONTINUE as
the terminal statement and only _one_ loop per terminal CONTINUE
statement, as shown above.
It is easy to say that you should avoid too deep nesting of control
structures inside each other and getting close to the last allowed
column. It is not always possible to follow such advice.
Separating functional units
---------------------------
An important concept that is not used in the Fortran standards,
nor in many Fortran texts, but is a popular term in the compiler
arena, is the BASIC BLOCK.
A BASIC BLOCK is a sequence of consecutive language statements
with the following properties:
1) Program flow doesn't jump into or from that block except
maybe to the first statement or from the last statement.
A simple example of such a jump is executing any GOTO
statement, but IF and DO statements are really constructed
out of similar implied jumps.
2) The sequence is maximal in the sense that adding one
more statement at the beginning of the block, or one
more statement at the end, will make the previous
requirement false.
When you examine the way program control flows, the basic blocks
are the natural building blocks, so making their boundaries stand
out will make the code easier to follow.
Simple ways to mark basic block boundaries are:
1) Using a separator line composed of:
{'*' or 'C' or '!' in column 1} // {5*SPACE} //
{dash|equal|asterisk line from column 7 to column 72}
2) An empty line
However, the basic block concept should be used only as a guide:
you may add extra separator lines where they help, or drop them.
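A small sketch of how separator lines might follow the basic blocks is given below. The program name BLOCKS and the variables are invented for illustration, and the division into blocks is only approximate (strictly speaking, the loop set-up belongs to the preceding block).

```fortran
      PROGRAM BLOCKS
      INTEGER I, N, TOTAL
C     ------------------------------------------------------------------
C     Block 1: straight-line initialization
      N = 5
      TOTAL = 0
C     ------------------------------------------------------------------
C     Block 2: loop body, entered only at the top
      DO 100 I = 1, N
        TOTAL = TOTAL + I
  100 CONTINUE
C     ------------------------------------------------------------------
C     Block 3: report the result
      WRITE(*,*) 'Sum of 1 ..', N, ' is', TOTAL
      END
```

Each separator is a comment line ('C' in column 1, five spaces, dashes through column 72), so it has no effect on the compiled program.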
In some cases the separator lines create visual clutter and
should be dropped:
1) Block IF statements actually contain two basic blocks,
with implicit conditional jumps selecting between them.
However if the statements are properly indented the
indentation does the job, and there is no need for
separating lines.
      IF (...) THEN
        .......
      ELSE
        .......
      ENDIF
2) Similar considerations apply to nested DO loops.
      DO (...)
        DO (...)
          .......
        END DO
      END DO
Limitations on identifiers
--------------------------
FORTRAN 77 imposes a limit of six characters, but many FORTRAN 77
compilers allow longer names, and Fortran 90 allows up to 31 characters.
It is not likely that new programs will be ported to a compiler
that doesn't support longer names, so maybe it's time to forget
this old restriction.
It is interesting to compare FORTRAN 77 with the C Standard that
came out much later. That standard sets no upper limit on the length
of an identifier (a conforming compiler need only accept logical
source lines of at least 509 characters), but:
1) Names used only internally (by the compiler) have to
be distinct in the first 31 characters
2) Names used externally (also seen by the linker) have
to be distinct in the first 6 characters, and be so
even ignoring the case of letters (sounds familiar?)
The '6 characters with no case' limit ensures that linkers on
all systems (remember that the linker is basic system software
and NOT specific to the language you use) can do their job.
Another common restriction is that names should begin with a
letter and contain alphanumeric characters (underscore is
considered a letter in C).
Choosing good identifiers
-------------------------
Identifiers should be meaningful and describe the variable, constant,
procedure, etc. that they represent.
Inventing good names may be difficult even if you can use long
identifiers. When you try to specify the role of a certain variable
you often end up with names of four words or more, for example:
      input_buffer_start_index, input_buffer_end_index
Of course you cannot put many such identifiers into the [7,72]
column range of FORTRAN 77, and you don't really want to.
Highly modular programs usually declare few variables in each
procedure, and have less need for long and complicated identifiers
to characterize them.
Classical abbreviation method
-----------------------------
A possible solution is to combine two different methods,
each of them capable of shortening a long name:
1) Separately truncate/abbreviate each part of
   the name, e.g.
      input   -->  in
      output  -->  out
      buffer  -->  buf
      begin   -->  beg
      index   -->  ix    (a little drastic, but still a mnemonic)
      number  -->  num
      error   -->  err
2) Drop 'non-essential' letters (a, e, i, o, u, y)
      next    -->  nxt
      keyword -->  kwrd
      format  -->  frmt
The result of both transformations is:
in_buf_beg_ix, in_buf_end_ix
The underscores are usually omitted in names composed
of two parts:
inbuf, begix
The identifier can be further shortened if you use
capitalization instead of underscores:
InBufBegIx, InBufEndIx
These methods are usually supplemented by some
conventions:
1) A prefix 'n' means 'number of' or 'size of'
e.g. nbits - number of bits
2) The letters I, J, K are reserved for loop
control variables, array indexes and
sub-string indexes
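As an illustration, abbreviated names of this kind might appear in a program roughly as follows. The program name ABBREV, the buffer contents and the index values are all made up for the example; the names are kept within the six-character FORTRAN 77 limit.

```fortran
      PROGRAM ABBREV
C     NCHARS : number of characters ('N' prefix = "number of")
C     INBUF  : input buffer
C     BEGIX  : begin index,  ENDIX : end index
C     I      : loop control variable / sub-string index
      INTEGER       NCHARS, BEGIX, ENDIX, I
      CHARACTER*16  INBUF
      INBUF  = 'take bus 99'
      NCHARS = 11
      BEGIX  = 6
      ENDIX  = 8
C     Print the characters between the two indexes, one by one
      WRITE(*,*) (INBUF(I:I), I = BEGIX, ENDIX)
      WRITE(*,*) NCHARS
      END
```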
A tentative naming convention
-----------------------------
Naming conventions are beneficial, because they provide programs with
a sense of continuity and style.
Many C programmers use the Hungarian naming convention which is highly
oriented towards the computer architecture, and encodes mainly info
about the variable's data type.
However, FORTRAN has a different character: it is more problem oriented,
so a FORTRAN naming convention MUST BE ADAPTED to the specific
requirements of every program.
A possible starting point may be the following:
Syntax:   XYY_TEXTn
Where:    X      =  G   General-purpose
                    I   Input
                    O   Output
                    S   System
                    U   User-supplied (interactively)
          Y      =  B   Buffer
                    F   Format (or file-name?)
                    G   General-purpose
                    P   Array index (pointer)
                    T   Text
          _TEXTn    modifier, e.g. len, top, bot, with
                    possibly one digit at the end
                    (if lacking, the identifier is general purpose)
Case usage
----------
      FORTRAN KEYWORDS    (IF, SUBROUTINE)
      constants           (pi, solar_const)
      Variables           (IP_Start, IP_End)
      Global Names        (COMMON /Blk1/, CALL MySub)
Remember that the only global names in FORTRAN are procedure and
common block names (and logical unit numbers).
Using this naming convention the two 4-word identifiers we had
above will become:
IP_Start, IP_End
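A short sketch of how this convention might look in actual code follows. The program name NamDem, the buffer IB_Line (Input, Buffer) and the values are hypothetical, and the long names with underscores need a compiler that accepts them (an extension in FORTRAN 77, standard in Fortran 90).

```fortran
      PROGRAM NamDem
C     Keywords in uppercase, constants in lowercase, variables in
C     mixed case with an XY_ prefix
      REAL          pi
      PARAMETER     (pi = 3.14159)
      CHARACTER*16  IB_Line
      INTEGER       IP_Start, IP_End
      IB_Line  = 'radius 2.0'
      IP_Start = 1
      IP_End   = 6
      WRITE(*,*) IB_Line(IP_Start:IP_End), ' pi =', pi
      END
```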
Some more layout techniques (FORTRAN 77 oriented)
-------------------------------------------------
A small example program:
      PROGRAM IDEXMP
      INTEGER BUS_NUM
      BUS_NUM = 99
      WRITE(*,*) ' TAKE BUS ', BUS_NUM
      END
can be transformed into:
C     +-----------------------------------------------------------------
C     | Program: Bus number advice
C     | Author: Abraham Agay
C     | Date: 28.11.1995
C     +-----------------------------------------------------------------
      PROGRAM IdExmp
C     ------------------------------------------------------------------
      INTEGER
     *              Bus_Num
C     ------------------------------------------------------------------
      Bus_Num = 99
      WRITE(*,*) ' Take bus ', Bus_Num
C     ------------------------------------------------------------------
      END
For a small program this layout technique just adds "visual clutter".
However, in a large program with long variable lists and many code
sections it may really help.
Using continuation lines in declarations makes it easy to edit
variable lists (The FORTRAN standard allows up to 19 continuation
lines - 20 lines total).
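A sketch of a longer declaration laid out this way is shown below. The program name DclExm and the variables Stop_Num and Fare are invented for the example. Each variable sits on its own continuation line ('*' in column 6), so adding or removing one is mostly a single-line edit; only the comma on the last remaining item needs attention.

```fortran
      PROGRAM DclExm
      INTEGER
     *              Bus_Num,
     *              Stop_Num,
     *              Fare
      Bus_Num  = 99
      Stop_Num = 12
      Fare     = 3
      WRITE(*,*) Bus_Num, Stop_Num, Fare
      END
```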
The separating comment lines end at column 72, so you can see at
a glance if some line is longer than the FORTRAN maximum of 72
columns. Amateur programmers make that mistake a lot, and sometimes
it's hard to trace (when professionals do it, it's even harder...).
Many compilers allow code extending beyond column 72 (usually with
a suitable compiler option), but it is not standard. By default,
characters beyond column 72 are ignored by most standard-conforming
compilers.
By the way, the presence of TAB characters may create confusion
when counting characters (e.g. to determine the line width).
Editors usually interpret a TAB as extending up to
the next tab-stop, and display it accordingly.
You can define some key combination that will make your
editor insert such a separating line in your code.
Separating lines
----------------
You can use various types of separating lines to mark
the executable part of the code, important loops etc.
C     +-----------------------------------------------------------------
C     | Program: Bus number advice
C     | Author: Abraham Agay
C     | Date: 28.11.1995
C     +-----------------------------------------------------------------
      PROGRAM IdExmp
C     ------------------------------------------------------------------
      INTEGER
     *              Bus_Num
C     ==================================================================
      Bus_Num = 99
      WRITE(*,*) ' Take bus ', Bus_Num
C     ==================================================================
      END
Adding comments
---------------
A recommended way to add little comments to a FORTRAN program
is putting an '!' at column 73 and typing the comment after it.
Using '!' to begin a comment is recognized by many compilers,
and anything typed after column 72 is supposed to be ignored.
However, a strict-checking compiler (maybe with some option)
may flag such practice as a warning-level error.
Your editor should be able to display lines longer than 80
characters. Some people don't like to work with longer lines
because on text terminals you usually get smaller fonts.
This commenting style is portable IF YOU CONVERT ALL TABS TO SPACES.
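The small bus-number program might be commented in this style roughly as follows. This is a sketch only: the comment texts are invented, the first line is a column ruler added to make column 72 visible, and the lines are padded with spaces (no TABs) so that each '!' is meant to fall in column 73.

```fortran
C23456789012345678901234567890123456789012345678901234567890123456789012
      PROGRAM IdExmp                                                    ! main program unit
      INTEGER Bus_Num                                                   ! bus line to take
      Bus_Num = 99                                                      ! value fixed for the example
      WRITE(*,*) ' Take bus ', Bus_Num                                  ! list-directed output
      END
```

A compiler that ignores everything past column 72 never sees these comments, and one that recognizes '!' treats them as ordinary trailing comments.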
Programming styles
------------------
Most FORTRAN 77 compilers (and all Fortran 90) accept lowercase
letters, and treat them as equivalent to the corresponding
uppercase ones. This is useful, as lowercase letters are more
readable than uppercase.
You can write Fortran code in a way that reminds one of
C code:
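      integer function strlen(st)
      integer       i
      character     st*(*)
c     an all-blank string has length 0; testing it first keeps the
c     index i inside the string in the loop below
      strlen = 0
      if (st .eq. ' ') return
      i = len(st)
      do while (st(i:i) .eq. ' ')
        i = i - 1
      enddo
      strlen = i
      return
      end
      integer function strchr(st,ch)
      integer       len, i, strlen
      logical       found
      character     st*(*), ch*1
      external      strlen
c     returns strlen(st)+1 if ch is not found; st(i:i) is examined
c     only after i .le. len has been checked, because .and. is not
c     guaranteed to short-circuit
      len = strlen(st)
      i = 1
      found = .false.
      do while ((i .le. len) .and. (.not. found))
        if (st(i:i) .eq. ch) then
          found = .true.
        else
          i = i + 1
        endif
      enddo
      strchr = i
      return
      end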
+------------------------------------------+
| IMPROVING PROGRAM LAYOUT IS IMPORTANT! |
+------------------------------------------+