Recursion


 1-12  RECURSION
 ***************
 (Thanks to Dieter Britz for the excellent comments, and to Craig Burley
  for the corrections)

 A small glossary
 ----------------
 ACTIVATION RECORD Context associated with a given execution of a procedure
 CONTEXT           The state (values) of all variables at a given time
 ENTRY POINT       The place in the code where a routine starts executing
 INSTANCE          Another word for procedure activation
 INVOCATION        Calling a procedure
 STACK             A dynamic data structure (see below)
 STACK POINTER     A variable keeping the place of the stack's top
 STATIC MEMORY     Pre-allocated by the compiler (at compile time)


 Introduction
 ------------
 Recursion is calling a procedure from itself, directly (simple recursion)
 or via calls to other procedures (mutual recursion). FORTRAN 77 does not 
 allow recursion, Fortran 90 does, (recursive routines must be explicitly 
 declared so).

 Most FORTRAN 77 compilers allow recursion, some (e.g. DEC) require 
 using a compiler option (see compiler options chapter). The GNU g77,
 which conforms strictly to the Fortran 77 standard, doesn't allow 
 recursion at all.

 Recursion is usually less computationally efficient, compiler generated 
 procedure calls introduce overhead when storing all variables before a
 procedure call and loading them back upon entering the procedure.

 Automatic optimizations reduce the overhead associated with procedure
 calls by eliminating redundant store/load operations to some degree, 
 but of course hand optimizing may improve upon that. 

 Efficiency is usually not the most important consideration when
 choosing an algorithm, in the name of good programming we already 
 agreed to gladly suffer some measure of inefficiency.

 In spite of the inherent inefficacy, recursion is important as the 
 natural solution to some problems is a recursive algorithm, 
 for example: 

    1) Identifying pixels belonging to the same 'blob'
    2) Searching a tree-structured database

 Writing a program using a recursive algorithm is usually difficult, 
 every small mistake, and in a recursive procedure, almost every 
 modification is a mistake, renders the procedure useless.

 If your compiler doesn't support recursion, or you want to write a
 fully portable program, or you want your program to be more efficient,
 you can implement a recursive algorithm by manual simulation.


 Simultaneous invocations of a procedure 
 ---------------------------------------
 Why the execution of recursive procedures requires special treatment?
 Non-recursive programs have the property that at any given time many 
 procedures may be 'active', but only one 'instance' of a given procedure 
 is 'active'. 

 That means that non-recursive programs can be implemented STATICALLY, 
 i.e. each of the various variables (and arrays) used in the program are 
 assigned a pre-determined location in main memory at compile time 
 (subject to relocation by the link editor).
 
 The characteristic property of recursion is that different invocation
 of the same procedure are active at the same time, and the variables
 associated with each invocation must be kept separately, so they may
 hold different values.

 We need to keep more than one copy of the procedure set of variables 
 (the code may be shared), one for each possible activation. Clearly
 the simple static memory allocation scheme fails for recursive procedures,
 and something more elaborate is needed. 


 Simple-minded implementation of recursion
 -----------------------------------------
 Let's say we want to implement a procedure that call itself, and
 for some reason or other we know that the 'call depth' is less than
 say 100 activations.

 We create 100 copies of the procedure, name them in some way to keep 
 the names unique, e.g. SUB001, SUB002, ... ,SUB100, the CALL statements
 inside each copy will refer to the next one, i.e. SUB001 will call
 SUB002, SUB002 will call SUB003, ... and instead of the CALL statement
 in SUB100 we will put a  STOP 'Not enough copies '.

 In this way we can perform recursion with a compiler that supports
 only a static memory allocation scheme. Of course this method is very
 inefficient and wasteful, We don't need to duplicate the procedure's 
 code, only the data part (activation record). 

 The solution to the problem of managing several 'activation records'
 at the same time is the STACK data structure.


 The stack data structure
 ------------------------
 A stack is a sequence of memory locations with an associated STACK POINTER
 that points to the 'current location', a stack can be manipulated by two 
 operations: PUSH and POP. 

 Stacks (like main-memory) are really one-dimensional entities, they go 
 from location 0 to location N, N is supposed to be large enough for the 
 task on hand ('real' stacks may go from -N to 0). 

 All locations 'below' the current value of the stack pointer are 
 considered to contain valid values, all locations above are considered 
 'garbage' left from previous operations. The stack can 'grow' into 
 the garbage area if needed.

 PUSH increments the stack pointer, and puts a value at the location 
 the stack pointer points to, so it will point to the next available
 place on the stack.

 POP takes the value pointed to by the stack pointer, and decrements
 the stack pointer, thus 'trashing' that location.

    0006   |013|
    0005   |140|   GARBAGE AREA
    0004   |112|
    0003   |358|<----Stack-pointer
    0002   |871|
    0001   |348|   VALID DATA AREA
    0000   |231|
           +---+
  Address  Data


 Placing activation records on the stack
 ---------------------------------------
 An ACTIVATION RECORD is a memory region holding the variables associated
 with a certain activation (instance) of a procedure, and some control
 information. Activation records are placed on the stack in the order 
 of activation.


 Recursion simulation
 --------------------
 You can simulate recursion in FORTRAN 77, in an efficient way. The
 warnings about the sensitivity of recursive procedures to programming
 errors, apply even more to simulated recursion.

 The general idea behind recursion simulation is to replace the recursion 
 with some suitable DO loop. 

 The straightforward method to simulate recursion is to closely imitate 
 the way compilers implement recursive procedure calls, that way we will 
 get a bulky and inefficient code, that can be manually optimized.

 With enough experience you can do it more informally, replace the call 
 to the recursive procedure with a GOTO and add the necessary data 
 structures as needed.

 Remember that the recursive algorithm may be equivalent to a simple
 iterative algorithm, and imitating a compiler may hide that simple
 algorithm behind redundant complexity.


 Implementation of recursion simulation: general scheme
 ------------------------------------------------------
 This is a general scheme for simulating one procedure calling
 itself (simple recursion). See the following text for explanation 
 of the scheme.


       Data-declarations (Of Modified data structures)
       .......................................
       stack-pointer = 1
       Initialize the stack (First activation)
    Entry-point:
       Read local variables from the stack
       .......................................
       .-----------------------------------------.
       |  A recursive call:                      |
       |     stack-pointer = stack-pointer + 1   |
       |     Store local variables on the stack  |
       |     GOTO Entry-point                    |
       '-----------------------------------------'
       .......................................
       stack-pointer = stack-pointer - 1
    IF (stack-pointer .GE. 1) GOTO Entry-point       (exit point!)


 Implementation of recursion simulation: data structures
 -------------------------------------------------------
 We need to keep the value of each variable for each activation of the
 procedure, so it's clear we should replace every scalar variable with
 a 1D array of the same data type (REAL etc.), and size large enough
 to hold all expected activations. The array will be indexed by the
 'call depth', the first activation will have index 1, the second and
 third will be 2,3...

 Similarly we will replace a 1D array by a 2D array, a 2D array by a 
 3D array, etc:

      REAL	scalar, array(10)

  Will be replaced by:

      INTEGER	MAXACT
      PARAMETER(MAXACT = 200)
      REAL	scalar, Xscalar(MAXACT), array(10), Xarray(10, MAXACT)

 We add the expanded variables to the original ones, the original
 variables will serve as 'local' variables (we will implement a
 call by value parameter passing mechanism), the expended variables
 taken together will serve as our simulated stack.

 The added dimension is put last, it's more efficient that way because 
 of FORTRAN row major storage order of arrays.


 Implementation of recursion simulation: stack pointer
 -----------------------------------------------------
 Our replacement for the stack pointer is an INTEGER variable holding
 the value of the 'call depth', with possible values ranging from 1 
 (first simulated activation) to MAXACT.

 On starting a new simulated activation the value of the 'stack pointer',
 will be incremented by 1, and used to index the modified procedure data
 structures (simulated stack).


 Implementation of recursion simulation: entry point
 ---------------------------------------------------
 After passing the simulated entry point, all variables are read from 
 the simulated stack. This step is necessary, as program control can 
 flow to the entry point from the 'exit point' (the conditional GOTO 
 at the end of the code).

 If the exit point transfers control to the entry point, we have to 
 read the variables from the stack using the decremented stack-pointer,
 the current values are no more good.

 The scheme can be improved to skip reading the variables if the
 current values are valid, i.e. the stack-pointer was not decremented.


 Implementation of recursion simulation: recursive call
 ------------------------------------------------------
 Instead of a 'recursive' CALL statement we will use a series of
 FORTRAN statements:

    1) Incrementing the stack-pointer by 1.
    2) Several assignments that will store all variables on the 
       stack, thus creating a simulated  'activation record'.
    3) Unconditional GOTO to the 'entry point'. 

 
 Implementation of recursion simulation: exit point
 --------------------------------------------------
 Recursion (simulated or not) must stop sometimes, the call depth may
 go up and down while executing, but in the end it must drop below 1.

 The natural exit point is at the end of the code, reaching the end
 means we finished an activation, so we decrement the stack-pointer.

 If the stack-pointer is zero, it means everything is over, otherwise
 we loop again to the entry point.


 Implementation of recursion simulation: termination condition
 -------------------------------------------------------------
 We exit the recursion when the call depth (value of stack-pointer) 
 drops below 1, something in the computation have to make the call
 depth eventually drop to zero.

 The termination condition of the recursion is a delicate matter,
 in our blobs example we mark more and more elements of matrix B,
 the marked elements keep a record of the places we have already
 visited, as we don't visit places we have already visited, we avoid
 the trap of going after blob points in circles.

 That is a subtle and important problem in multi-dimensional searches,
 because we manage to avoid it with the help of the history matrix B,
 we will eventually get to the end of the recursion. 


 Example: Identifying 'blobs'
 ----------------------------
 MATRIX(M,N) is the input array, elements larger than TRSH are 'blob' 
 pixels, the output array is BLOBS(M,N), when the double loop completes, 
 every element in BLOBS will contain the serial number of the 'blob' it 
 belongs to (or stay zero).

 MAXSTACK should be raised if necessary, it should be on the order 
 of M*N, to allow large blobs. 

 This is a corrected version of the procedure, to be more portable
 and understandable it was made a bit longer than necessary.

C     ------------------------------------------------------------------
C     |  Procedure:     Identify and mark "blobs" in a matrix.
C     |                 Input is matrix A, matrix B gives for each point, 
C     |                 the serial number of the blob it belongs to. 
C     |  Author:        Abraham Agay
C     |  Update:        
C     ------------------------------------------------------------------
      SUBROUTINE BLOBER(MATRIX, BLOBS, M, N, TRSH, CURBLB)
      IMPLICIT NONE
C     ------------------------------------------------------------------
      INTEGER
     *                  M, N, BLOBS(M,N), CURBLB, 
     *                  I, J, II, JJ,
     *                  SP, MAXSTACK
      REAL              MATRIX(M,N), TRSH
      PARAMETER         (MAXSTACK = 100)
      INTEGER           STACK(2,MAXSTACK)
      LOGICAL           NEWGOOD, GOAHEAD
C     ------------------------------------------------------------------
      NEWGOOD(I,J) = (MATRIX(I,J) .GE. TRSH) .AND. (BLOBS(I,J) .EQ. 0)
C     ==================================================================
      CURBLB = 0
      DO J = 1, N
        DO I = 1, M
          BLOBS(I,J) = 0
        ENDDO
      ENDDO
C     ------------------------------------------------------------------
      DO J = 1, N
        DO I = 1, M
          IF (.NOT. NEWGOOD(I,J)) GOTO 200 
          II = I
          JJ = J
          SP = 0
          CURBLB = CURBLB + 1
          GOAHEAD = .TRUE.
C     ------------------------------------------------------------------
100       CONTINUE
            IF (GOAHEAD) THEN
              SP = SP + 1
              IF (SP .GT. MAXSTACK) STOP 'BLOBER: Increase stack size! '
              STACK(1,SP) = II
              STACK(2,SP) = JJ
              IF (MATRIX(II,JJ) .GE. TRSH) BLOBS(II,JJ) = CURBLB
            ELSE
              SP = SP - 1
              IF (SP .LE. 0) GOTO 200
              II = STACK(1,SP) 
              JJ = STACK(2,SP) 
              GOAHEAD = .TRUE.
            ENDIF
C     ------------------------------------------------------------------
            IF (II .LT. M) THEN
              IF (NEWGOOD(II+1,JJ)) THEN
                II = II + 1
                GOTO 100
              ENDIF
            ENDIF
C     ------------------------------------------------------------------
            IF (JJ .LT. N) THEN
              IF (NEWGOOD(II,JJ+1)) THEN
                JJ = JJ + 1
                GOTO 100
              ENDIF
            ENDIF
C     ------------------------------------------------------------------
            IF (II .GT. 1) THEN
              IF (NEWGOOD(II-1,JJ)) THEN
                II = II - 1
                GOTO 100
              ENDIF
            ENDIF
C     ------------------------------------------------------------------
            IF (JJ .GT. 1) THEN
              IF (NEWGOOD(II,JJ-1)) THEN
                JJ = JJ - 1
                GOTO 100
              ENDIF
            ENDIF
C     ------------------------------------------------------------------
            GOAHEAD = .FALSE.
          GOTO 100
C     ------------------------------------------------------------------
200       CONTINUE
        ENDDO
      ENDDO
      RETURN
      END


 By the way, there is a good non-recursive algorithm by Yoav Levy 
 of the Hebrew University of Jerusalem (mailto:yoavl@vms.huji.ac.il).


 Bibliography
 ------------
 On simulating recursion:

    Data structures using Pascal (2nd edition)
    Aaron M. Tenenbaum, Moshe J. Augenstein
    Prentice-Hall 1986
    ISBN 0-13-196684-7

 On computer architecture:

    Computer Organization, 3rd edition 
    V. Carl Hamacher, Zvonko G. Vranesic, Safwat G. Zaky
    McGraw-Hill 1984
    ISBN 0-07-Y66313-0 (2nd edition)
Return to contents page