 1-6  PROGRAM LAYOUT - THE ART OF MAKING PROGRAMS READABLE 
 *********************************************************

 (Thanks to Arne Vajhoej and Clive Page for the good suggestions
 and comments, Kenneth Plotkin for the good comments, and of
 course to Craig Burley)


 Program layout is the art of arranging program code in a READABLE 
 and EASY TO EDIT way. 

 Some practical methods to improve layout are: 

    1) Indenting control structures
    2) Separating functional units 
    3) Choosing good identifiers
    4) Adding comments, procedure headings etc


 Indentation of control structures
 ---------------------------------
 Control constructs should be properly indented to reveal the 
 internal structure. Avoid an excessive indentation step that 
 may take you too quickly to the last allowed column. 
 Recommended step sizes are in the range 2-4.

 This form:

      DO 200 I = 1, 100
        DO 100 J = 1, 100
          WRITE(*,*) I, J
100     CONTINUE
200   CONTINUE


 is surely more readable than:

      DO 200 I = 1, 100
      DO 100 J = 1, 100
      WRITE(*,*) I, J
100   CONTINUE
200   CONTINUE


 It is advisable (for good programming style and to aid automatic 
 optimization) to avoid statement labels and GOTOs as much as possible. 
 If you must have them, make the label numbers strictly increasing.

 Maybe the time has come to use DO ... END DO and other semi-standard
 constructs? Most FORTRAN 77 compilers support them as extensions, 
 and Fortran 90 made DO ... END DO part of the standard. 

 If you must ensure absolute code portability, always use CONTINUE as 
 the terminal statement and only _one_ loop per terminal CONTINUE 
 statement, as shown above.
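
 As a sketch, the nested loop shown earlier can be written without 
 statement labels by using DO ... END DO. The program name NestDo 
 and the loop bounds of 3 are illustrative choices only, and the 
 code assumes a compiler that accepts END DO:

```fortran
      PROGRAM NestDo
      INTEGER           I, J
C     Each DO is closed by its own END DO, so no labels are needed
      DO I = 1, 3
        DO J = 1, 3
          WRITE(*,*) I, J
        END DO
      END DO
      END
```

 The indentation now carries all the structural information that 
 the labels 100 and 200 carried in the CONTINUE version.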

 It is easy to say that you should avoid too deep nesting of control 
 structures inside each other and getting close to the last allowed 
 column. It is not always possible to follow such advice.


 Separating functional units
 ---------------------------
 An important concept that is not used in the Fortran standards, 
 nor in many Fortran texts, but is a popular term in the compiler 
 arena, is the BASIC BLOCK.

 A BASIC BLOCK is a sequence of consecutive language statements 
 with the following properties:

    1) Program flow doesn't jump into or from that block except 
       maybe to the first statement or from the last statement.

       A simple example of such a jump is executing any GOTO 
       statement, but IF and DO statements are really constructed 
       out of similar implied jumps.

    2) The sequence is maximal in the sense that adding one 
       more statement at the beginning of the block, or one 
       more statement at the end, will make the previous 
       requirement false.

 When you examine the way program control flows, the basic blocks 
 are the natural building blocks, so making their boundaries stand 
 out will make the code easier to follow.

 Simple ways to mark basic block boundaries are:

    1) Using a separator line composed of:

         {'*' or 'C' or '!' in column 1}  //  {6*SPACE}  //
         {dash|equal|asterisk line from column 7 to column 72}

    2) An empty line

 However, the basic block concept should be used only as a guide. 
 You may add extra separator lines where they help, or drop them.
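
 A minimal sketch of separator lines marking basic blocks follows. 
 The program name BlkDem and the variable Total are made up for the 
 illustration, and an explicit GOTO is used only to make the jump 
 visible:

```fortran
      PROGRAM BlkDem
      INTEGER           I, Total
C     ------------------------------------------------------------------
C     Block 1: straight-line initialization
      Total = 0
      I = 1
C     ------------------------------------------------------------------
C     Block 2: starts at label 100 (a jump target) and ends with
C     the conditional GOTO
  100 CONTINUE
      Total = Total + I
      I = I + 1
      IF (I .LE. 10) GOTO 100
C     ------------------------------------------------------------------
C     Block 3: reached only by falling out of the loop
      WRITE(*,*) ' Total = ', Total
      END
```

 Each separator sits exactly where control can enter or leave, so 
 the three blocks can be read one at a time.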

 In some cases the separator lines create visual clutter, and
 should be dropped:

   1) Block IF statements actually contain two basic blocks, 
      with implicit conditional jumps selecting between them. 
      However if the statements are properly indented the 
      indentation does the job, and there is no need for
      separating lines.

      IF (...) THEN 
        .......
      ELSE 
        .......
      ENDIF

   2) Similar considerations apply to nested DO loops.

      DO (...)
        DO (...)
          .......
        END DO
      END DO


 Limitations on identifiers
 --------------------------
 FORTRAN 77 limits identifiers to six characters, but many FORTRAN 77 
 compilers allow longer names, and Fortran 90 allows up to 31.

 It is not likely that new programs will be ported to a compiler 
 that doesn't support longer names, so maybe it's time to forget
 this old restriction.

 It is interesting to compare FORTRAN 77 with the original C standard 
 (ANSI C, 1989), which came out much later. That standard sets no 
 upper limit on the length of an identifier, but: 

   1) Names used only internally (by the compiler) have to
      be distinct in the first 31 characters

   2) Names used externally (also seen by the linker) have 
      to be distinct in the first 6 characters, and be so 
      even ignoring the case of letters (sounds familiar?)
   
 The '6 characters with no case' limit ensures that linkers on 
 all systems (remember that the linker is basic system software 
 and NOT specific to the language you use) can do their job.

 Another common restriction is that names should begin with a 
 letter and contain only alphanumeric characters (underscore is
 considered a letter in C). 


 Choosing good identifiers
 -------------------------
 Identifiers should be meaningful and describe the variable, constant, 
 procedure, etc that they represent. 

 Inventing good names may be difficult even if you can use long
 identifiers. When you try to specify the role of a certain variable 
 you often run into names of four words or more, for example: 

   input_buffer_start_index,  input_buffer_end_index

 Of course you cannot put many such identifiers into the [7,72]
 column range of FORTRAN 77, and you don't really want to. 

 Highly modular programs usually declare few variables in each
 procedure, and have less need for long and complicated identifiers 
 to characterize them.


 Classical abbreviation method
 -----------------------------
 A possible solution is to combine two methods, each of which 
 can shorten a long name:

    1) Separately truncate/abbreviate each part of 
       the name, e.g.

          input   -->   in
          output  -->   out
          buffer  -->   buf
          begin   -->   beg
          index   -->   ix    (a little drastic, but still a mnemonic)
          number  -->   num
          error   -->   err

    2) Drop 'non-essential' letters, usually vowels (a, e, i, o, u, y)

          next    -->   nxt
          keyword -->   kwrd
          format  -->   frmt

    Applying both transformations to the two long identifiers 
    given earlier (reading 'start' as 'begin') gives:

          in_buf_beg_ix, in_buf_end_ix

    The underscores are usually omitted in names composed 
    of two parts:

           inbuf, begix 

    The identifier can be further shortened if you use
    capitalization instead of underscores: 

           InBufBegIx, InBufEndIx

    These methods are usually supplemented by some 
    conventions: 

    1) A prefix 'n' means 'number of' or 'size of' 
       e.g. nbits - number of bits

    2) The letters I, J, K are reserved for loop 
       control variables, array indexes and 
       sub-string indexes



 A tentative naming convention
 -----------------------------
 Naming conventions are beneficial, because they provide programs with 
 a sense of continuity and style. 

 Many C programmers use the Hungarian naming convention which is highly 
 oriented towards the computer architecture, and encodes mainly info
 about the variable's data type.

 However, FORTRAN has a different character: it is more problem 
 oriented, so a FORTRAN naming convention MUST BE SPECIFICALLY ADAPTED 
 to the requirements of every program.

 A possible starting point may be the following:

   Syntax:      XYY_TEXTn

   Where:       X =       G   General-purpose
                          I   Input
                          O   Output
                          S   System
                          U   User-supplied (Interactively)
                
                Y =       B   Buffer
                          F   Format  (Or file-name?)
                          G   General-purpose
                          P   Array index (Pointer)
                          T   Text
                
                _TEXTn    modifier e.g. len, top, bot, with 
                          possibly one digit at the end.
                          (if lacking, the identifier is general purpose)

   Case usage 
   ----------
   FORTRAN KEYWORDS   (IF, SUBROUTINE)
   constants          (pi, solar_const)
   Variables          (IP_Start, IP_End)
   Global Names       (COMMON /Blk1/, CALL MySub)


 Remember that the only global names in FORTRAN are procedure and 
 common block names (and logical unit numbers).

 Using this naming convention the two 4-word identifiers we had
 above will become: 

   IP_Start, IP_End
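
 The following sketch shows the convention at work. The program 
 name NamDem, the buffer name IB_Line (Input, Buffer) and its 
 contents are assumptions made for the illustration. Underscores 
 in names are a common extension in FORTRAN 77 and standard in 
 Fortran 90:

```fortran
      PROGRAM NamDem
C     IB_Line           input buffer holding one line of text
C     IP_Start, IP_End  indexes of its first and last non-blank
      CHARACTER         IB_Line*20
      INTEGER           IP_Start, IP_End
C     ------------------------------------------------------------------
      IB_Line = '   TAKE BUS 99'
C     ------------------------------------------------------------------
C     Skip leading blanks, never moving past the end of the buffer
      IP_Start = 1
  100 IF (IP_Start .LT. LEN(IB_Line)) THEN
        IF (IB_Line(IP_Start:IP_Start) .EQ. ' ') THEN
          IP_Start = IP_Start + 1
          GOTO 100
        ENDIF
      ENDIF
C     ------------------------------------------------------------------
C     Skip trailing blanks, never moving below IP_Start
      IP_End = LEN(IB_Line)
  200 IF (IP_End .GT. IP_Start) THEN
        IF (IB_Line(IP_End:IP_End) .EQ. ' ') THEN
          IP_End = IP_End - 1
          GOTO 200
        ENDIF
      ENDIF
C     ------------------------------------------------------------------
      WRITE(*,*) ' Text: ', IB_Line(IP_Start:IP_End)
      END
```

 The prefixes tell the reader at a glance which names are buffers 
 and which are indexes into them. The program should print the 
 buffer text without its surrounding blanks.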


 Some more layout techniques (FORTRAN 77 oriented)
 -------------------------------------------------
 A small example program:

      PROGRAM IDEXMP 
      INTEGER BUS_NUM
      BUS_NUM = 99
      WRITE(*,*) ' TAKE BUS ', BUS_NUM
      END


 can be transformed into:

C     +-----------------------------------------------------------------
C     | Program:        Bus number advice
C     | Author:         Abraham Agay
C     | Date:           28.11.1995
C     +-----------------------------------------------------------------
      PROGRAM IdExmp
C     ------------------------------------------------------------------
      INTEGER 
     *                  Bus_Num
C     ------------------------------------------------------------------
      Bus_Num = 99
      WRITE(*,*) ' Take bus ', Bus_Num
C     ------------------------------------------------------------------
      END


 For a small program this layout technique just adds "visual clutter", 
 however in a large program with long variable lists and many code 
 sections it may really help.

 Using continuation lines in declarations makes it easy to edit 
 variable lists (The FORTRAN standard allows up to 19 continuation 
 lines - 20 lines total).
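
 As an illustration, here is the same layout with a longer variable 
 list. The program name DclDem and the variables Stop_Num and Fare 
 are hypothetical additions to the bus example:

```fortran
      PROGRAM DclDem
C     ------------------------------------------------------------------
C     One variable per continuation line: adding or removing a
C     variable means inserting or deleting a single line
      INTEGER
     *                  Bus_Num,
     *                  Stop_Num,
     *                  Fare
C     ------------------------------------------------------------------
      Bus_Num  = 99
      Stop_Num = 12
      Fare     = 3
      WRITE(*,*) ' Take bus ', Bus_Num, ' to stop ', Stop_Num
      WRITE(*,*) ' Fare: ', Fare
C     ------------------------------------------------------------------
      END
```

 Only the last name in the list lacks a trailing comma, so that is 
 the one line to watch when editing the list.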

 The separating comment lines end at column 72, so you can see at 
 a glance if some line is longer than the FORTRAN maximum of 72 
 columns. Amateur programmers make that mistake a lot and sometimes 
 it's hard to trace (when professionals do it, it's even harder...).

 Many compilers allow code extending beyond column 72 (usually with
 a suitable compiler option), but it is not standard. By default 
 characters beyond column 72 are ignored by most standard-conforming 
 compilers. 

 By the way, the presence of TAB characters may create confusion
 when counting characters (e.g. to determine the line width). 
 Editors usually display a TAB as blank space extending to the 
 next tab-stop, although it is a single character in the file. 
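
 Both problems can also be caught mechanically. The sketch below 
 reads source text from standard input and reports lines that 
 extend past column 72 or contain a TAB. The program name ChkCol, 
 the 200-character buffer and the ASCII code 9 for TAB are 
 assumptions of this example, and longer input lines are silently 
 cut at the buffer length:

```fortran
      PROGRAM ChkCol
      INTEGER           MaxCol, LineNo, Last
      PARAMETER         (MaxCol = 72)
      CHARACTER         Line*200, Tab*1
C     ------------------------------------------------------------------
C     CHAR(9) is the TAB character on ASCII systems
      Tab = CHAR(9)
      LineNo = 0
C     ------------------------------------------------------------------
  100 CONTINUE
      READ(*, '(A)', END = 200) Line
      LineNo = LineNo + 1
C     Find the last non-blank character (0 for an empty line)
      Last = LEN(Line)
  110 IF (Last .GT. 0) THEN
        IF (Line(Last:Last) .EQ. ' ') THEN
          Last = Last - 1
          GOTO 110
        ENDIF
      ENDIF
      IF (Last .GT. MaxCol) WRITE(*,*) ' Line ', LineNo,
     *  ' ends at column ', Last
      IF (INDEX(Line, Tab) .GT. 0) WRITE(*,*) ' Line ', LineNo,
     *  ' contains a TAB'
      GOTO 100
C     ------------------------------------------------------------------
  200 CONTINUE
      END
```

 The column reported is a character count, so for a line that 
 contains a TAB it may differ from the column your editor shows.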

 You can define some key combination that will make your 
 editor insert such a separating line in your code.


 Separating lines
 ----------------
 You can use various types of separating lines to mark
 the executable part of the code, important loops etc.
                
C     +-----------------------------------------------------------------
C     | Program:        Bus number advice
C     | Author:         Abraham Agay
C     | Date:           28.11.1995
C     +-----------------------------------------------------------------
      PROGRAM IdExmp
C     ------------------------------------------------------------------
      INTEGER 
     *                  Bus_Num
C     ==================================================================
      Bus_Num = 99
      WRITE(*,*) ' Take bus ', Bus_Num
C     ==================================================================
      END


 Adding comments
 ---------------
 A recommended way to add short comments to a FORTRAN program 
 is putting an '!' at column 73 and typing the comment after it.
 Using '!' to begin a comment is recognized by many compilers, 
 and anything typed after column 72 is supposed to be ignored.
 However, a strict-checking compiler (maybe with some option) 
 may flag such practice as a warning-level error.

 Your editor should be set up to display lines longer than 80 
 characters. Some people don't like to work with longer lines 
 because on text terminals you usually get smaller fonts.

 This commenting style is portable IF YOU CONVERT ALL TABS TO SPACES.
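
 A small sketch of this commenting style follows. The first line 
 is a column ruler added here only as an aid (it is an ordinary 
 comment ending at column 72), and the program name CmtDem and 
 the comment texts are illustrative:

```fortran
C23456789012345678901234567890123456789012345678901234567890123456789012
      PROGRAM CmtDem
      INTEGER           Bus_Num                                         ! route number
      Bus_Num = 99                                                      ! fixed for this demo
      WRITE(*,*) ' Take bus ', Bus_Num
      END
```

 The statements are padded with spaces only, so each '!' falls 
 just past the end of the ruler whatever the editor's tab settings.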


 Programming styles
 ------------------
 Most FORTRAN 77 compilers (and all Fortran 90) accept lowercase
 letters, and treat them as equivalent to the corresponding 
 uppercase ones. This is useful, as lowercase letters are more 
 readable than uppercase.

 You can write Fortran code in a way that reminds one of 
 C code:

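      integer function strlen(st)
c     length of st without trailing blanks (0 if st is all blank);
c     assumes len(st) >= 1, as FORTRAN 77 guarantees
      integer           i
      character         st*(*)
      i = len(st)
c     stop at i = 1 so that st(i:i) is never evaluated with i = 0
c     (.and. does not promise short-circuit evaluation)
      do while ((i .gt. 1) .and. (st(i:i) .eq. ' '))
        i = i - 1
      enddo
      if (st(i:i) .eq. ' ') i = 0
      strlen = i
      return
      end

      integer function strchr(st,ch)
c     position of the first ch in st, ignoring trailing blanks;
c     strlen(st) + 1 if ch is not found
      integer           n, i, strlen
      character         st*(*), ch*1
      external          strlen
      n = strlen(st)
      i = 1
c     stop at i = n so that st(i:i) always stays inside the string
      do while ((i .lt. n) .and. (st(i:i) .ne. ch))
        i = i + 1
      enddo
      if (st(i:i) .ne. ch) i = n + 1
      strchr = i
      return
      end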


  +------------------------------------------+
  |  IMPROVING PROGRAM LAYOUT IS IMPORTANT!  |
  +------------------------------------------+