Copyright (C) 1998 Timothy C. Prince
Freely distributable with acknowledgment
branch history and prediction schemes: Uht, Sindagi, Somanathan "Branch Effect Reduction Techniques" IEEE Computer May 1997 pp 71-81.
cache prefetch:
Vander Wiel, Lilja "When Caches Aren't Enough: Data Prefetching Techniques"
IEEE Computer July '97 pp 23-30.
celefunt: Cody's accuracy test suite for FORTRAN complex math functions, netlib/toms714. Quite useful in its standard form, although not written for extended precision (such as Intel's).
directives: "Visual KAP for OpenMP User's Manual" www.kai.com/vkomp
divide/sqrt hardware techniques:
Soderquist, Leeser "Division and Square Root ..."
IEEE Micro July/Aug '97 pp 56-66
egcs: directories under ftp.cygnus.com and many mirror sites
elefunt: Accuracy test suite for FORTRAN math functions. Has some
portability problems (runs but results not right). Translated to C by
Plauger and further modified by Prince. Copyright by Plauger, possibly
available with permission.
f77/f90 comparison:
Einarsson, Shokin "Fortran 90 for the Fortran 77 Programmer"
http://www.nsc.liu.se/~boein/f77tof90
Computational Science Education Project "Fortran 90 and Computational Science".
f90 tutorial: Metcalf http://wwwcn.cern.ch/asdoc/WWW/f90/
Patrick Corde, Herve Delouis "Cours Fortran 90" idris.fr
f95 compilers and netlib software: many listed on www.fortran.com/fortran
look for modernized versions of netlib software elsewhere
e.g. http://www.vic.cmis.csiro.au/~alan
f95: FORTRAN 95 Handbook, Adams, Brainerd et al, MIT Press 1997, ISBN 0-262-51096-0.
fused MAC effects etc:
http://http.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps
Note that Kahan's quadratic code for fused MAC is not satisfactorily
programmable in standard FORTRAN, but can be done reasonably in C.
g77: gnu or egcs mirror sites; CD versions tend to be out of date.
HP PA-8000:
Kumar "The HP PA-8000 RISC CPU" IEEE Micro Mar/Apr '97 pp 27-32.
IEEE P754/854: Cody, IEEE Micro Aug. 1984 pp 84-100.
Intel Pentium Pro: Papworth "Tuning the Pentium Pro.." IEEE Micro April 1996 pp 8-15; Bhandarkar and Ding "Performance Characterization of the Pentium Pro" distributed on the Internet.
latency and instruction level parallelism, Newton and Goldschmidt schemes: Soderquist, Leeser "Division and Square Root..." IEEE Micro July/Aug 1997 pp 56-66.
Alan Miller's site for modernized netlib: http://www.ozemail.com.au/~milleraj
MIPS/SGI R10000: Yeager "The MIPS R10000.." IEEE Micro April 1996 pp 28-40.
pipelining: Smith, Weiss "PowerPC 601 and Alpha 21064..." IEEE Computer,
June 1994 pp 46-58.