100 time steps ++++++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU  mem           CPUsec   wall  run
machine      model               MHz  MB     comp    /step  hh:mm  date
------------ ------------------ ----  -----  -----  ------  -----  -------
mars         Intel Xeon i7x4    3600  24GB   gfort     3.5   0:06  26jun15
ares         Intel Xeon i7x4    3600  16GB   gfort     3.7   0:06  24aug15
darter       Cray XC30 XeonE5   2600  2GB    Cray      4.6   0:07  19sep15
householder  Intel Xeon 10      3000  192GB  gfort     4.8   0:08  26may14
frost        Intel XeonE5440    2830  4GB    ifort    10.8   0:18  3feb12
ThinkPad     Intel core i5      2400  1.5G   gfort    10.9   0:18  30apr11
midtown      AMD opteron2376    2300  2GB    ifort    15.0   0:25  6feb10
zeus         AMD opteron2376    2300  2GB    ifort    15.0   0:25  6feb10
newton       Intel Xeon_64      3200  4GB    ifort    18.6   0:31  5may06
ares         AMD Opteron2220    2800  16GB   gfort    18.8   0:32  30apr11
zeus         AMD opteron252     2600  2GB    g77      21.5   0:36  13apr06
zeus         AMD opteron252     2600  2GB    pgf95    21.9   0:37  13apr06
tiger        Cray opteron248    2200  8GB    pgf90    23.7   0:40  dec07
tiger        Cray opteron248    2200  8GB    g77      24.0   0:40  dec07
oic          Intel Xeon_64      3400  4GB    ifort    27.2   0:45  5may06
cheetah      IBM SP p690 pwr4   1300  1GB    xlf      28.9   0:48  16nov02
fubini       Intel P4 Xeon      3056  4GB    ifort    33.6   0:56  13oct03
hawk         AMD opteron242     1596  2GB    g77      31.8   0:55  25jan05
hawk         AMD opteron242     1596  2GB    ifort    34.1   0:57  25jan05
frodo        AMD opteron240     1396  2GB    g77      40.4   1:07  29oct04
agnesi       Intel P4 Xeon      2200  4GB    ifort    41.2   1:08  13nov02
abcd         Intel P4 Xeon      3200  4GB    ifort    52.1   1:26  28oct04
hawk         AMD opteron242     1596  2GB    pf90     53.3   1:31  29jan05
colt         Alpha SC ev67       667  2GB    f90      65.9   1:50  29apr01
knox3        Sun UltraSparc      900  1GB    f77      76.0   2:07  29apr01
barnard      Sun ultra80 dual    450  1024   f77     162.9   6:52  24nov01
vxa          Dell Latitude C600  752  256    g77     198.3   5:36  21nov01
capsicum     SGI IP27            250  4096   f77     220.8   6:10  30nov01
goliath      Intel PentiumIII    497  1028   f77     221.1   6:09  22nov01
larry        Sun ultra4          296  2048   f77     263.9   7:20  23nov01
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

300 time steps ++++++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU  mem           CPUsec   wall  run
machine      model               MHz  MB     comp    /step  hh:mm  date
------------ ------------------ ----  -----  -----  ------  -----  -------
mars         Intel Xeon i7x4    3600  24GB   gfort     4.8   0:24  26jun15
ares         Intel Xeon i7x4    3600  16GB   gfort     5.2   0:26  24aug15
householder  Intel Xeon         3000  192GB  gfort     6.0   0:33  26may14
frost        Intel XeonE5440    2830  4GB    ifort    14.2   1:11  3feb12
ThinkPad     Intel core i5      2400  1.5G   gfort    13.4   1:27  30apr11
midtown      AMD opteron2376    2300  2GB    ifort    20.8   1:44  6feb10
zeus         AMD opteron2376    2300  2GB    ifort    22.6   1:53  6feb10
newton       Intel Xeon_64      3200  4GB    ifort    25.6   2:08  5may06
ares         AMD Opteron2220    2800  16GB   gfort    26.6   1:27  30apr11
zeus         opteron 252        2600  2GB    pgf95    31.3   2:37  14apr06
zeus         opteron 252        2600  2GB    g77      33.8   2:49  13apr06
tiger        Cray opteron248    2200  8GB    pgf90    30.0   2:30  dec07
tiger        Cray opteron248    2200  8GB    g77      33.5   2:48  dec07
cheetah      IBM SP p690 pwr4   1300  1GB    xlf      38.7   3:14  16nov02
fubini       Intel P4 Xeon      3056  4GB    ifort    42.9   3:35  13oct03
oic          Intel Xeon_64      3400  4GB    ifort   49.03   4:05  5may06
frodo        opteron 240        1396  2GB    g77      53.2   4:27  29oct04
agnesi       Intel P4 Xeon      2200  4GB    ifort   53.98   4:30  13nov02
hawk         opteron 242        1596  2GB    ifort    53.8   4:33  30jan05
hawk         opteron 242        1596  2GB    g77      55.9   4:58  25jan05
hawk         opteron 242        1596  2GB    pf90     78.1   6:35  30jan05
abcd         Intel P4 Xeon      3200  4GB    ifort    79.4   6:37  28oct04
colt         Alpha SC ev67       667  2GB    f90     95.77   7:59  2nov01
capsicum     SGI IP27            250  4096   f77    219.15  18:21  1dec01
barnard      Sun ultra80         450  1024   f77    219.15  27:54  27nov01
vxa          Dell Latitude C600  752  261    g77    259.54  23:39  25nov01
goliath      dual PentiumIII     497  1028   f77    296.61  24:43  22nov01
larry        Sun ultra4          296  2048   f77    364.37  30:26  24nov01
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

400 time steps ++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU  mem           CPUsec   wall  run
machine      model               MHz  MB     comp    /step  hh:mm  date
------------ ------------------ ----  -----  -----  ------  -----  -------
mars         Intel Xeon i7x4    3600  24GB   gfort     5.7   0:37  26jun15
ares         Intel Xeon i7x4    3600  16GB   gfort     6.1   0:41  24aug15
darter       Cray XC30 XeonE5   2600  2GB    Cray      7.4   0:50  19sep15
householder  Intel XeonE5       3000  192G   gfort     7.7   0:51  26may14
darter       Cray XC30 XeonE5   2600  2GB    ifort     8.2   0:55  19sep15
frost        Intel XeonE5440    2830  4GB    ifort    16.2   1:48  3feb12
ThinkPad     Intel core i5      2400  1.5G   gfort    20.9   2:20  30apr11
midtown      AMD opteron2376    2300  2GB    ifort    23.2   2:35  6feb10
zeus         AMD opteron2376    2300  2GB    ifort    25.5   2:50  6feb10
newton       Intel Xeon_64      3200  4GB    ifort    29.8   3:19  6may06
zeus         opteron 252        2600  2GB    pgf95    34.8   3:52  14apr06
tiger        Cray opteron248    2200  8GB    g77      35.5   3:57  dec07
tiger        Cray opteron248    2200  8GB    pgf90    37.2   4:09  dec07
ares         AMD Opteron2220    2800  16GB   gfort    37.5   3:50  30apr11
zeus         opteron 252        2600  2GB    g77      38.6   4:18  13apr06
cheetah      IBM SP p690 pwr4   1300  1GB    xlf     46.70   5:11  17nov02
frodo        opteron 240        1396  2GB    g77      57.1   6:20  29oct04
fatou        Intel P4 Xeon      3056  4GB    ifort   55.80   6:12  1jul04
agnesi       Intel P4 Xeon      2200  4GB    ifort   61.48   6:50  14nov02
hawk         opteron 242        1596  2GB    ifort   65.07   7:14  30jan05
hawk         opteron 242        1596  2GB    g77     72.98   8:07  30jan05
abcd         Intel P4 Xeon      3200  4GB    ifort    75.8   8:26  28oct04
hawk         opteron 242        1596  2GB    pf90    79.48   8:51  30jan05
colt         Alpha SC ev67       667  2GB    f90    103.27  11:29  4nov01
barnard      Sun ultra80         450  1024   f77    255.40  52:45  30nov01
capsicum     SGI IP27            250  4096   f90    326.18  36:27  3dec01
goliath      dual PentiumIII     497  1028   ifort  339.73  38:11  25nov01
larry        Sun ultra4          296  2048   f77    425.70  47:21  2dec01
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

500 time steps ++++++++++++++++++++++++++++++++++++++++++++++++
                                 CPU  mem           CPUsec   wall  run
machine      model               MHz  MB     comp    /step  hh:mm  date
------------ ------------------ ----  -----  -----  ------  -----  -------
mars         Intel Xeon i7x4    3600  24GB   gfort    6.95   0:58  26jun15
ares         Intel Xeon i7x4    3600  16GB   gfort     7.4   1:02  24aug15
householder  Intel XeonE5       3000  192G   gfort    10.1   1:24  26may14
frost        Intel XeonE5440    2830  4GB    ifort    19.3   2:41  3feb12
ThinkPad     Intel core i5      2400  1.5G   gfort    22.8   3:10  30apr11
midtown      AMD opteron2376    2300  2GB    ifort    27.5   3:50  6feb10
zeus         AMD opteron2376    2300  2GB    ifort    29.6   4:06  6feb10
newton       Intel Xeon_64      3200  4GB    ifort    38.1   5:17  6may06
zeus         opteron 252        2600  2GB    pgf95    42.4   5:56  14apr06
tiger        Cray opteron248    2200  8GB    pgf90   42.95   5:59  dec07
zeus         opteron 252        2600  2GB    g77      44.7   6:13  13apr06
tiger        Cray opteron248    2200  8GB    g77      46.3   6:27  dec07
abcd         Intel P4 Xeon      3200  4GB    ifort    58.7   8:09  28oct04
cheetah      IBM SP p690 pwr4   1300  1GB    xlf      60.3   8:22  17nov02
hawk         opteron 242        1596  2GB    g77      65.0   9:04  25jan05
hawk         opteron 242        1596  2GB    ifort   67.13   9:20  30jan05
frodo        opteron 240        1396  2GB    g77      65.7   9:07  29oct04
fatou        Intel P4 Xeon      3056  4GB    ifort   66.65   9:16  1jul04
agnesi       Intel P4 Xeon      2200  4GB    ifort   76.09  10:34  15nov02
hawk         opteron 242        1596  2GB    pf90    106.9  14:53  29jan05
colt         Alpha SC ev67       667  2GB    f90    122.42  17:01  3nov01
barnard      Sun ultra80         450  1024   f77    306.76  82:07  5dec01
goliath      dual PentiumIII     497  1028   ifort  405.84  56:49  4dec01
larry        Sun ultra4          296  2048   f77    514.99  71:49  5dec01
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If you would be willing to run the benchmark on another machine
colt.ccs.ornl.gov    on one CPU
falcon.ccs.ornl.gov  on one CPU
  Compaq AlphaServer SC, 4 SMP CPUs per node, 2GB RAM
  CPU: ES40 processor: 21264a (ev67), 667 MHz, 64KB I-cache, 64KB D-cache, 8MB L2 cache
  On colt or falcon (with prun, no DFS/DCE)
  100 steps:
    uname -a: OSF1 colt13 V5.0 910 alpha
    f90 5.3: f90 -fast -O5 -tune ev6
      6588.46u 1.16s 1:49:54 99% 0+2709k                  29apr01
      ==> 65.88 CPUs/step in 1:49 hrs
    f95 Compaq Fortran Compiler X5.5-2801-48CAG
      6900.18u 0.73s 1:55:05 99% 0+2709k                  13nov02
      ==> 69.00 CPUs/step in 1:55 hrs
    uname -a: OSF1 falcon63 V5.1 732 alpha
    f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      7382.22u 2.71s 2:03:15 99% 0+2709k                  21nov01
      ==> 73.82 CPUs/step in 2:03 hrs
  300 steps:
    uname -a: OSF1 colt13 V5.0 910 alpha
    f90 5.3: f90 -fast -O5 -tune ev6
      23229.03u 0.96s 6:27:24 99% 0+2721k                 29apr01
      ==> 77.43 CPUs/step in 6:27 hrs
    uname -a: OSF1 colt13 V5.1 732 alpha
    f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      28730.36u 3.77s 7:59:15 99% 0+2720k                 2nov01
      ==> 95.77 CPUs/step in 7:59 hrs
    uname -a: OSF1 falcon63 V5.1 732 alpha
    f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      29594.60u 18.05s 8:14:27 99% 0+2720k                21nov01
      ==> 98.65 CPUs/step in 8:14 hrs
  400 steps:
    uname -a: OSF1 colt13 V5.0 910 alpha
    f90 5.3: f90 -fast -O5 -tune ev6
      35077.46u 1.11s 9:45:00 99% 0+2725k                 29apr01
      ==> 87.69 CPUs/step in 9:45 hrs
    uname -a: OSF1 colt13 V5.1 732 alpha
    f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      41308.00u 1.00s 11:28:52 99% 0+2725k                4nov01
      ==> 103.27 CPUs/step in 11:29 hrs
    uname -a: OSF1 falcon63 V5.1 732 alpha
    f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      42850.37u 0.88s 11:54:43 99% 0+2724k                22nov01
      ==> 107.13 CPUs/step in 11:54 hrs
  500 steps:
    uname -a: OSF1 colt13 V5.0 910 alpha
    f90 5.3: f90 -fast -O5 -tune ev6
      58239.35u 31.24s 16:11:57 99% 0+2728k               2may01
      ==> 116.48 CPUs/step in 16:12 hrs
    uname -a: OSF1 colt13 V5.1 732 alpha
    f95 Compaq Fortran Compiler V5.4A-1472-46B2F
      61211.11u 1.14s 17:00:46 99% 0+2729k                3nov01
      ==> 122.42 CPUs/step in 17:00 hrs
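
The "==> ... CPUs/step" figures in these notes appear to be the user CPU seconds reported by the shell's `time` divided by the number of time steps (for example 6588.46u over 100 steps gives 65.88). A minimal Python sketch of that arithmetic follows; the function names are hypothetical and the parser assumes the csh-style layout "USERu SYSs [[h:]m:]s ..." seen above.

```python
import re


def parse_clock(clock):
    """Convert a '[[h:]m:]s' wall-clock string to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line):
    """Return (user, system, wall) in seconds from a csh-style time line."""
    match = re.match(r"\s*([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)", line)
    if match is None:
        raise ValueError(f"not a csh-style time line: {line!r}")
    user, system, clock = match.groups()
    return float(user), float(system), parse_clock(clock)


def cpu_sec_per_step(line, steps):
    """User CPU seconds per time step (assumption: system time is ignored)."""
    user, _system, _wall = parse_time_line(line)
    return user / steps


colt_100 = "6588.46u 1.16s 1:49:54 99% 0+2709k"
print(round(cpu_sec_per_step(colt_100, 100), 2))
```

Using user time alone is an assumption inferred from the listed numbers; including system time would change the colt result in the second decimal.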
goliath.math.utk.edu
  Dell ???, dual PentiumIII, 497MHz, 1028MB, 512KB cache
  uname -a: Linux goliath 2.4.7-10smp #1 SMP i686 unknown
  g77 version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)
  g77 -O3
  100 steps: 22109.0u 5.75s 6:08:42                        22nov01
             ==> 221.09 CPUs/step in 6:09 hrs
  300 steps: 88982.47u 18.4s 24:43:02                      22nov01
             ==> 296.61 CPUs/step in 24:43 hrs
  400 steps: 135893.92u 42.170s 38:11:27.54                25nov01
             ==> 339.73 CPUs/step in 38:11 hrs
  500 steps: 202920.600u 100.160s 56:48:51.17 99.2%        4dec01
             ==> 405.84 CPUs/step in 56:49 hrs
bearcat.ccs.ornl.gov
  IBM Power3 ?
  uname -a: AIX bearcat 3 4 000729924C00
  xlf -O5
  100 steps: not enough memory to run !

power3.cs.utk.edu
  IBM Power3 dual SMP
  uname -a: AIX power3 3 4 00005F6B4C00
  xlf -O4 -qarch=auto -qnolm -qtune=pwr3
  100 steps: not enough memory to run !
barnard.math.ua.edu (N.Hannoun ran it)
  Sun ultra-80 dual SMP 450MHz, 1GB, solaris 5.8
  SunOS barnard 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-80
  f95: Sun WorkShop 6 update 1 Fortran 95 6.1 2000/09/11
  f95 -fast -O4
  100 steps: 16288.0u 1.0s 6:52:27 65%                     24nov01
             ==> 162.88 CPUs/step in 6:52 hrs
  300 steps: 65744.0u 1.0s 27:53:52 65%                    27nov01
             ==> 219.15 CPUs/step in 27:54 hrs
  400 steps: 102160.0u 2.0s 52:44:52 53%                   30nov01
             ==> 255.40 CPUs/step in 52:45 hrs
  500 steps: 153381.0u 2.0s 82:07:19 51%                   4dec01
             ==> 306.76 CPUs/step in 82:07 hrs
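
The barnard runs show why wall time and CPU time are listed separately: the percentage in each `time` line is the share of wall time the job actually spent on the CPU, and here it is only 51-65%. The sketch below recomputes that percentage from the other three fields. The helper names are hypothetical, and truncating to a whole percent is an assumption that happens to reproduce the barnard figures.

```python
def clock_to_seconds(clock):
    """Convert a '[[h:]m:]s' wall-clock string to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_utilization_percent(user, system, wall_clock):
    """Percent of wall time spent on the CPU, truncated to an integer."""
    wall = clock_to_seconds(wall_clock)
    return int(100 * (user + system) / wall)


# barnard, 100 and 400 steps (values copied from the entry above)
print(cpu_utilization_percent(16288.0, 1.0, "6:52:27"))
print(cpu_utilization_percent(102160.0, 2.0, "52:44:52"))
```

A low percentage suggests the machine was shared or the job was waiting on something other than the CPU, so the CPU seconds per step are the fairer basis for comparing machines.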
larry.cas.utk.edu
  Sun Ultra-4 296MHz, 2048MB, solaris 5.8
  SunOS larry 5.8 Generic_108528-07 sun4u sparc SUNW,Ultra-4
  f95: Sun WorkShop 6 2000/04/07 FORTRAN 95 6.0
  f95 -fast -O4
  100 steps: 26388.0u 2.6s 7:19:48                         22nov01
             ==> 263.88 CPUs/step in 7:20 hrs
  300 steps: 109310.20u 3.1s 30:26:39                      24nov01
             ==> 364.37 CPUs/step in 30:26 hrs
  400 steps: 172535.0u 10.0s 53:01:27 90%                  26nov01
             ==> 431.37 CPUs/step in 53:01 hrs
  f95 -fast -O4 -xtarget=ultra2
  100 steps: 26754.0u 2.0s 7:26:41 99%                     7dec01
  300 steps: 110351.0u 3.0s 30:45:31 99%                   6dec01
             ==> 367.84 CPUs/step in 30:46 hrs
  400 steps: 170279.0u 3.0s 47:21:55 99%                   1dec01
             ==> 425.70 CPUs/step in 47:21 hrs
  500 steps: 257494.0u 8.0s 71:48:49 -66%                  5dec01
             ==> 514.99 CPUs/step in 71:49 hrs
capsicum.epm.ornl.gov
  SGI 32 node SMP, 250MHz, 32K/4MB, 4096 MB
  uname -a: IRIX64 capsicum 6.5 04131233 IP27
  MIPSpro Compilers: Version 7.3.1.1m
  f90 -Ofast
  100 steps: 22080.3u 16.3s 6:10:06 99%                    30nov01
             ==> 220.80 CPUs/step in 6:10 hrs
  300 steps: 65744.8u 20.5s 18:21:09 99%                   1dec01
             ==> 219.15 CPUs/step in 18:21 hrs
  400 steps: 130473.3u 53.6s 36:26:51 99%                  3dec01
             ==> 326.18 CPUs/step in 36:27 hrs
vxa.math.utk.edu
  Dell Latitude C600 752MHz, 256MB, 256KB cache
  redhat7.1 linux2.4.2
  g77 version 2.96 20000731 (RedHat Linux 7.1.2.96-81)
  g77 -O3
  100 steps: 19829.04u 7.86s 5:35:57                       21nov01
             ==> 198.29 CPUs/step in 5:36 hrs
  300 steps: 77863.240u 56.950s 23:39:14.78 91.5%          25nov01
             ==> 259.54 CPUs/step in 23:39 hrs
knox3.rgrid.utk.edu (a node of knox OIT cluster)
  Sun UltraSparc 900MHz 1MB
  uname -a: SunOS knox1 5.9 Generic_112233-11 sun4u sparc SUNW,Sun-Fire-280R
  f95 -V: Forte Developer 7 Fortran 95 7.0 2002/03/09
  f95 -fast -O4
  100 steps: 7599.0u 0.0s 2:07:16                          09jan05
             ==> 75.99 CPUs/step in 2:07 hrs
agnesi.math.utk.edu:
  Dell ??? 2.2GHz Xeon, 4GB, 512KB cache
  Linux 2.4.9-31enterprise Red Hat Linux release 7.2
  Intel(R) Fortran Compiler for 32-bit applications,
  Version 6.0 Build 020312Z trial nov02
  ifc -O3 -mp1 -tpp7
  100 steps: 4114.890u 0.440s 1:08:34.60                   13nov02
             ==> 41.15 CPUs/step in 1:08 hrs
  300 steps: 16192.93u 0.62s 4:29:53                       13nov02
             ==> 53.98 CPUs/step in 4:30 hrs
  400 steps: 24590.580u 4.920s 6:49:55.30                  14nov02
             ==> 61.48 CPUs/step in 6:50 hrs
  500 steps: 38043.360u 0.600s 10:34:04.97                 15nov02
             ==> 76.09 CPUs/step in 10:34 hrs
cheetah.ccs.ornl.gov
  IBM pSeries System (p690)
  27 "Regatta" nodes, each with 32 processors on 16 chips
  CPU: 1.3 GHz Power4 processor, 64 KB L1 cache, 32 KB D-cache, 1.5 MB L2 cache
  estimated computational power 4.5 TeraFLOP/s
  OS: AIX 5.1.0.0
  uname -a: AIX cheetah0033 1 5 00207D8A4C00
  Fortran level: 7.1.1.3
  xlf_r -g -O4 -qnoipa -bmaxdata:0x40000000   run from GPFS area
  (default is 32bit, needs -bmaxdata for 1GB memory, faster than -q64)
  runs on single processor:
  100 steps: 3107.85u 0.64s 51:59    xlf_r -g -O4 -qnoipa -q64   16nov02
             ==> 31.08 CPUs/step in 52 min
             slower than 32bit:
             2886.1u 0.8s 48:09      xlf_r -g -O4 -qnoipa        16nov02
             ==> 28.86 CPUs/step in 48 min
  300 steps: 11608.9u 0.9s 3:13:32   xlf_r -g -O4 -qnoipa        16nov02
             ==> 38.7 CPUs/step in 3:14 hrs
  400 steps: 18679.9u 0.9s 5:11:19   xlf_r -g -O4 -qnoipa        17nov02
             ==> 46.7 CPUs/step in 5:11 hrs
  500 steps: 30138.3u 1.6s 8:22:17   xlf_r -g -O4 -qnoipa        17nov02
             ==> 60.28 CPUs/step in 8:22 hrs
fubini.math.utk.edu:
  Dell ??? 3.06GHz Xeon, 4GB, 512KB cache
  Red Hat Linux release 9(Shrike)
  Intel(R) Fortran Compiler for 32-bit applications,
  Version 7.1 Build 20030909Z
  100 steps: 3359.830u 4.730s 56:04.80        ifc -O3 -tpp7   13oct03
  300 steps: 12871.430u 12.480s 3:34:45.47    ifc -O3 -tpp7   13oct03
  400 steps: 34414.940u 4.390s 9:33:44.05     ? g77 -O3 ?     15oct03
fatou.math.utk.edu:
  Dell dual P4 Xeon 3.06GHz, 512KB cache, 4GB mem
  Linux 2.4.20-30.9bigmem #1 SMP ; Red Hat Linux release 9(Shrike)
  Statically compiled (on ares-linux) with:
  Intel(R) Fortran Compiler Version 8.0 Build 20031231Z
  and ran on fatou:
  400 steps: 22318.670u 34.470s 6:12:38.22    ifc -O3 -tpp7   1jul04
             ==> 55.8 CPUs/step in 6:12 hrs
  500 steps: 33327.400u 40.560s 9:16:09.12    ifc -O3 -tpp7   2jul04
             ==> 66.65 CPUs/step in 9:16 hrs
abcd.math.vanderbilt.edu
  dual Intel Pentium 4 XEON 3.20GHz 512KB cache, 4GB mem
  uname -a: 2.4.9-e.3smp #1 SMP i686 unknown
  ifc -v: Version 8.0
  compiled with: ifc -O3 -tpp7 -w95 -FI
  100 steps: 5206.730u 0.610s 1:26:47.28                   27oct04
             ==> 52.1 CPUs/step in 1:26 hrs
  300 steps: 23828.830u 0.770s 6:37:10.30                  27oct04
             ==> 79.4 CPUs/step in 6:37 hrs
  400 steps: 30334.310u 0.510s 8:25:35.10                  27oct04
             ==> 75.8 CPUs/step in 8:26 hrs
  500 steps: 29350.420u 0.460s 8:09:10.05                  27oct04
             ==> 58.7 CPUs/step in 8:09 hrs
frodo.sinrg.cs.utk.edu
  64 node linux cluster
  dual AMD Opteron 240 1.4GHz 1024KB cache 2GB mem
  uname -a: Linux head 2.4.19-NUMA #1 SMP x86_64
  gcc -v: gcc version 3.2.2 (SuSE Linux)
  g77 -O3 on head node
  100 steps: 3992.950u 43.300s 1:07:24.90                  27oct04
             ==> 40.4 CPUs/step in 1:07 hrs
  300 steps: 15968.600u 15.180s 4:26:50.86                 27oct04
             ==> 53.2 CPUs/step in 4:27 hrs
  400 steps: 22812.520u 29.470s 6:20:42.16                 27oct04
             ==> 57.1 CPUs/step in 6:20 hrs
  500 steps: 32839.200u 0.320s 9:07:19.75                  27oct04
             ==> 65.7 CPUs/step in 9:07 hrs
grig.sinrg.cs.utk.edu
  64 node linux cluster
  dual Intel Xeon 3.2GHz 1024KB cache 4GB mem
  uname -a: Linux grig-head 2.6.8 #1 SMP x86_64 GNU/Linux
  gcc -v: gcc version 3.3.5 (Debian 1:3.3.5-13)
  ran via PBS
  100 steps: walltime=00:43:18    g77 -O3                  26mar07
hawk.csm.ornl.gov
  head node of 50 node linux cluster
  dual AMD Opteron 242 1.4GHz 1024KB cache 2GB mem
  uname -a: Linux

  g77 -v: gcc version 3.3.3 (SuSE Linux)
  g77 -O3 on hawk1 (head node) without prun:
  100 steps: 3400.614u 0.555s 0:56:44 99.9%       w/out prun       25jan05
             ==> 34.0 CPUs/step in 0:57 hrs
  g77 -O3 -fPIC -fno-automatic -finit-local-zero -Wno-globals
  100 steps: 3658.311u 0.388s 1:01:00.24 99.9%    prun:hawk31      29jan05
             ==> 36.6 CPUs/step in 1:01 hrs: slower than -O3 on hawk1
  300 steps: 16770.257u 17.298s 4:57:49.58 93.9%  -O3 w/out prun   25jan05
             ==> 55.9 CPUs/step in 4:58 hrs
  400 steps: 29192.212u 7.433s 8:06:54.38 99.9%   -O3 prun:hawk29  30jan05
             ==> 72.98 CPUs/step in 8:07 hrs
  500 steps: 32517.853u 17.562s 9:03:53.24 99.7%  -O3 w/out prun   25jan05
             ==> 65.0 CPUs/step in 9:04 hrs

  g77 -v: gcc version 3.4.2   <----- faster than 3.3.3 ?
  g77-3.4.2 -O3 -fPIC -fno-automatic -finit-local-zero -Wno-globals
  100 steps: 3633.325u 1.362s 1:00:42.14 99.7%    no prun:hawk1    1feb05
  g77-3.4.2 -O3 -fPIC   <---- best:
  100 steps: 3280.363u 0.706s 54:46.54 99.8%      no prun:hawk1    1feb05
             ==> 31.8 CPUs/step in 0:55 hrs

  ifort : version 8.1
  ifort -O3 on hawk1 (head node) without prun:
  100 steps: 3411.446u 1.140s 0:56:59 99.8%                        25jan05
             ==> 34.1 CPUs/step in 0:57 hrs
  ifort -O3 -fpic -save -w95 -FI
  100 steps: 4197.278u 1.916s 1:10:03.41 99.8%    prun: hawk30     29jan05
             ==> 42.0 CPUs/step in 1:10 hrs: slower than -O3 on hawk1
  300 steps: 16135.677u 14.059s 4:32:54.38 98.6%  -O4 w/out prun   30jan05
             ==> 53.8 CPUs/step in 4:33 hrs
  400 steps: 26026.244u 1.833s 7:13:53.04 99.9%   -O4 prun:hawk29  30jan05
             ==> 65.07 CPUs/step in 7:14 hrs
  500 steps: 33564.316u 1.672s 9:20:07.35 99.8%   -O4 prun:hawk30  30jan05
             ==> 67.13 CPUs/step in 9:20 hrs

  pathscale EKO Version 1.4
  gcc version 3.3.1 (PathScale 1.4 driver)
  pathf90 -Ofast -fpic -static-data -msse2 -Wno-globals
  100 steps: 6285.782u 0.649s 1:44:47.61 99.9%    prun: hawk16     29jan05
             ==> 62.9 CPUs/step in 1:45 hrs
  pathf90 -Ofast -fpic -static-data -msse2 -mtune=opteron
  100 steps: 6486.800u 6.292s 1:48:48.50 99.4%                     29jan05
             ==> 64.9 CPUs/step in 1:49 hrs: worse with -mtune !
  pathf90 -Ofast   <-------- fastest for pf90, rest with -Ofast:
  100 steps: 5326.438u 4.774s 1:30:34.04 98.1%
             ==> 53.3 CPUs/step in 1:31 hrs
  300 steps: 23438.313u 18.280s 6:34:31.39 99.0%
             ==> 78.1 CPUs/step in 6:35 hrs
  400 steps: 31791.565u 0.343s 8:50:31.37 99.8%   prun:hawk31      30jan05
             ==> 79.48 CPUs/step in 8:51 hrs
  500 steps: 53463.715u 14.770s 14:52:43.87 99.8% w/out prun       28jan05
             ==> 106.9 CPUs/step in 14:53 hrs
zeus.math.utk.edu
  9+headnode Opteron 252 linux cluster
  dual AMD Opteron 252 2.6GHz 1024KB cache 2GB mem
  uname -a: Linux 2.6.12-1.1381_FC3smp x86_64 GNU/Linux

  g77 -v: gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)
  g77 -O3 on master or single nodes
  100 steps: 2151.284u 0.938s 35:53.67 99.9%               13apr06
             ==> 21.5 CPUs/step in 0:36 hrs   41% faster than fatou ifc
  300 steps: 10141.622u 0.503s 2:49:05.80                  13apr06
             ==> 33.8 CPUs/step in 2:49 hrs
  400 steps: 15446.104u 0.690s 4:17:32.58  slower than on 100steps  13apr06
             ==> 38.6 CPUs/step in 4:18 hrs   31% faster than fatou ifc
  500 steps: 22344.229u 0.746s 6:12:32.97                  13apr06
             ==> 44.7 CPUs/step in 6:13 hrs   33% faster than fatou ifc

  pgf95 -V: pgf95 6.1-3 64-bit target on x86-64 Linux      13apr06
  pgf95 -fast -O3 on master:            40% faster than fatou ifc !
  100 steps: 2194.774u 0.793s 36:38.14
             => 21.9 CPUs/step in 0:37 hrs
  pgf95 -fast -O3 -fastsse on n01:
  100 steps: 2202.048u 0.304s 36:43.15
             => 22.0 CPUs/step in 0:37 hrs
  pgf95 -fast -O3 -fastsse -Mconcur (with: setenv NCPUS 2) on n02:
  100 steps: 4327.326u 0.550s 36:05.19 199.8%
             => 43.3 CPUs/step 0:36 hrs
  pgf95 -fast -O3 -fastsse on single nodes:
  300 steps: 9390.347u 0.562s 2:36:34.67
             ==> 31.3 CPUs/step 2:37 hrs   7.4% faster than zeus g77
  400 steps: 13901.953u 0.580s 3:51:47.79   38% faster than fatou ifc
             ==> 34.8 CPUs/step 3:52 hrs   9.8% faster than zeus g77
  500 steps: 21353.843u 0.796s 5:56:02.18   36% faster than fatou ifc
             ==> 42.7 CPUs/step 5:56 hrs   4.5% faster than g77
             21207.959u 115.662s 5:55:31.74 on n02
             ==> 42.4 CPUs/step 5:56 hrs   4.5% faster than g77
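
The "% faster" annotations in the zeus entry are consistent with a relative reduction in CPU seconds per step, measured against the slower reference run. A small Python sketch of that formula, with a hypothetical function name, checked against the 400-step figures (fatou ifc 55.8, zeus g77 38.6, zeus pgf95 34.8):

```python
def percent_faster(cpu_per_step, reference_cpu_per_step):
    """Relative reduction in CPU sec/step versus a reference run, in percent."""
    saved = reference_cpu_per_step - cpu_per_step
    return 100.0 * saved / reference_cpu_per_step


# 400 steps: zeus g77 versus fatou ifc
print(round(percent_faster(38.6, 55.8)))
# 400 steps: zeus pgf95 versus zeus g77
print(round(percent_faster(34.8, 38.6), 1))
```

This is only an inferred reading of how the percentages were obtained; it reproduces the 400-step and 500-step annotations, which is the evidence for it.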
oic.ornl.gov
  325 node Xeon linux cluster
  dual Intel Xeon 3.4GHz 2048KB cache 4GB mem
  uname -a: Linux b06l02 2.6.9-22.0.2.ELsmp #1 SMP x86_64 GNU/Linux
  ? gigabit interconnect, PBS(torque)scheduler, Maui mgr , MPICH
  ifort -V: Intel(R) Fortran Compiler for Intel(R) EM64T-based applications,
            Version 9.0 Build 20051201
  on headnode: /opt/intel/fce/9.0/bin/ifort -fast
  100 steps: 2722.722u 0.848s 45:24.70                     5may06
             ==> 27.2 CPUs/step in 0:45 hrs
  300 steps: 14710.469u 3.493s 4:05:19.39
             ==> 49.03 CPUs/step in 4:05 hrs
  400 steps:
             ==>       CPUs/step in      hrs
  500 steps:
             ==>       CPUs/step in      hrs
  on a node via 'qsub PBSscript':
  /opt/mpich-ch_p4-icc-1.2.7/bin/mpif90 -fast -save -w95 -FI
  100 steps:                                               5may06
             ==> 27.2 CPUs/step in 45min
newton.usg.utk.edu
  head of 36-node Xeon linux cluster
  32 compute nodes: dual Xeon 3.2GHz
  uname -a: Linux 2.6.9-11.ELsmp #1 SMP x86_64 x86_64 GNU/Linux
  g77 -v: gcc version 3.4.3 20050227 (Red Hat 3.4.3-22.1)
  g77 -O3 -finit-local-zero -Wno-globals                   5may06
  ifort in /opt/intel/fce/9.0/bin/ifort:
    Intel(R) Fortran Compiler for Intel(R) EM64T-based v 9.0 Build 20050809
  runs on headnode:
  100 steps: 1857.528u 0.585s 30:59.99                     5may06
             ==> 18.6 CPUs/step in 0:31 hrs
  300 steps: 7669.275u 1.260s 2:07:58.78                   5may06
             ==> 25.6 CPUs/step in 2:08 hrs
  400 steps: 11916.587u 2.222s 3:18:52.91                  5may06
             ==> 29.8 CPUs/step in 3:19 hrs
  500 steps: 19029.789u 3.392s 5:17:29.36                  6may06
             ==> 38.1 CPUs/step in 5:17 hrs
  /opt/mpich/intel/bin/mpif90 -fast -save -w95 -FI
  /opt/mpich/intel/bin/mpirun -np ...                      5may06
tiger.ornl.gov (head of 72-node Cray XD1 linux cluster)
  70 compute nodes: dual Opteron 248, 8GB memory
  Cray RapidArray Interconnect (Hypertransport).
  LSS synchronizes nodes with global clock and co-schedules
  processes to avoid latency in global communication.
  Linux ch328-n6 2.6.5_H_01_04 #39 SMP x86_64 x86_64 GNU/Linux

  pgf95 -V: pgf95 7.0-2 64-bit target on x86-64 Linux
  pgf95 -fast -O3 -fastsse                                 dec07
  100 steps: 2367.934u 1.985s 39:33.49      => 23.7 CPUs/step 40 min
  200 steps: 5033.659u 5.028s 1:24:11.28    => 25.2 CPUs/step 1:24 hrs
  300 steps: 9000.059u 6.429s 2:30:23.08    => 30.0 CPUs/step 2:30 hrs
  400 steps: 14897.405u 10.201s 4:08:55.42  => 37.2 CPUs/step 4:09 hrs
  500 steps: 21475.207u 14.057s 5:58:52.68  => 42.95 CPUs/step 5:59 hrs

  g77 -v: gcc version 3.3.3 (SuSE Linux)
  g77 -O3 -Wno-globals -funroll-loops                      dec07
  100 steps: 2397.798u 1.850s 40:03.65      => 24.0 CPUs/step 40 min
  300 steps: 10055.587u 6.698s 2:47:57.66   => 33.5 CPUs/step 2:48 hrs
  400 steps: 14204.340u 9.494s 3:57:16.88   => 35.5 CPUs/step 3:57 hrs
  500 steps: 23164.182u 15.356s 6:26:59.00  => 46.3 CPUs/step 6:27 hrs
zeus.math.utk.edu (head of 52-cpu Linux cluster) installed aug2009
  head+2 nodes of dual Quad-Core AMD Opteron 2376 2.3GHz 2GB/node
  plus 15 dual Opteron 252 nodes 2GB/node
  uname -a: head.bw01.math.utk.edu 2.6.18-128.2.1.el5 #1 SMP x86_64
  ifort -V: Version 11.1 Build 20090630 ID: l_cprof_p_11.1.046
  ifort -fast -O3 on 1 cpu of head                         feb10
  100 steps: 1502.461u 0.162s 25:02.91      => 15.0 CPUs/step 25 min
  300 steps: 6795.622u 0.389s 1:53:16.53    => 22.6 CPUs/step 1:53 hrs
             PBS resources_used.mem=261756kb, vmem=591724kb
  400 steps: 10194.112u 0.427s 2:49:55.57   => 25.5 CPUs/step 2:50 hrs
  500 steps: 14785.711u 0.273s 4:06:28.02   => 29.6 CPUs/step 4:06 hrs
midtown.uthsc.edu (head of 56-cpu Linux cluster) installed nov2009
  7 nodes of dual Quad-Core AMD Opteron 2376 2.3GHz 2GB/node
  uname -a: midtown.bw01.uthsc.edu 2.6.18-164.9.1.el5 #1 SMP x86_64
  ifort -V: Version 11.1 Build 20091130 ID: l_cprof_p_11.1.064
  ifort -fast -O3 on 1cpu of a node                        feb10
  100 steps: 1545.806u 0.155s 25:46.20      => 15.0 CPUs/step 25 min
  200 steps: 3573.607u 0.383s 59:35.62      => 17.9 CPUs/step in 1:00 hrs
  300 steps: 6253.090u 1.014s 1:44:15.25    => 20.8 CPUs/step 1:44 hrs
  400 steps: 9273.045u 0.260s 2:34:34.12    => 23.2 CPUs/step 2:35 hrs
  500 steps: 13772.764u 0.300s 3:49:34.35   => 27.5 CPUs/step 3:50 hrs
....30 apr 2011....
ares.math.utk.edu
  64bit dual-core AMD Opteron 2220 x86_64 (Fedora 14)
  uname -a: 2.6.35.12-88.fc14.x86_64 #1 SMP x86_64 GNU/Linux
  gfortran -O3 (gcc 4.5.1)
  100 steps: 1877.161u 0.890s 31:45.09      => 18.8 CPUs/step in 32 min
  200 steps: 4355.763u 0.786s 1:13:36.91    => 21.8 CPUs/step in 1:14 hrs
  300 steps: 7974.257u 1.529s 2:14:51.88    => 26.6 CPUs/step in 1:27 hrs
  400 steps:                                =>      CPUs/step in      hrs
  500 steps: 18627.188u 3.381s 5:15:04.08   => 37.5 CPUs/step in 3:50 hrs
....30 apr 2011....
ThinkPad X201
  32bit Intel Core i5 CPU M 520 @ 2.40GHz i686 2-core
  uname -a: 2.6.34.8-68.fc13.i686 #1 SMP i686 GNU/Linux
  gfortran -O3 (gcc 4.4.5)
  100 steps: 1086.187u 1.822s 18:08.36      => 10.9 CPUs/step in 18 min
  200 steps: 2966.126u 10.776s 49:38.23     => 14.8 CPUs/step in 50 min
  300 steps: 4020.415u 3.957s 1:27:21.11    => 13.4 CPUs/step in 1:27 hrs
  400 steps: 8379.494u 13.460s 2:20:28.24   => 20.9 CPUs/step in 2:20 hrs
  500 steps: 11415.944u 24.757s 17:40:21.74 => 22.8 CPUs/step in 3:10 hrs
....3 feb 2012....
frost.ccs.ornl.gov (node of 2048 core Linux cluster) 3feb2012
  SGI Altix ICE 8200 cluster, 128 nodes x16=2048 cores, 24GB mem
  Intel Xeon CPU E5440 @ 2.83GHz 4GB
  uname -a: Linux frost3 2.6.18-128.7.1.el5 #1 SMP x86_64 GNU/Linux
  Red Hat Enterprise Linux Server release 5.3 (Tikanga)
  ifort -fast
  100 steps: 1075.660u 0.531s 17:56.71      => 10.8 CPUs/step 18 min
  200 steps: 2377.953u 0.461s 39:39.18      => 11.9 CPUs/step in 40 min
  300 steps: 4253.870u 0.209s 1:10:55.38    => 14.2 CPUs/step in 1:11 hrs
  400 steps: 6485.084u 0.458s 1:48:06.69    => 16.2 CPUs/step in 1:48 hrs
  500 steps: 9637.637u 0.379s 2:40:45.71    => 19.3 CPUs/step in 2:41 hrs
householder.math.utk.edu (20 core cluster) (Fedora19) 26may2014
  Two 10 core Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 192GB mem
  (Huge shared memory, no MPI, no scheduler, just the OS decides)
  uname -a: 3.14.4-100.fc19.x86_64 #1 SMP x86_64 GNU/Linux
  gfortran -O3 (gcc 4.8.2)
  100 steps: 485.462u 0.443s 8:07.22        => 4.8 CPUs/step in 8 min
  300 steps: 1981.121u 0.944s 33:07.25      => 6.0 CPUs/step in 33 min
  400 steps: 3064.296u 1.359s 51:13.86      => 7.7 CPUs/step in 51 min
  500 steps: 5047.879u 2.180s 1:24:25.39    => 10.1 CPUs/step in 1:24 hrs

  concurrent running 4 200-steps (in different directories)
    1197.249u 0.769s 20:01.92, 1194.257u 0.868s 19:59.24,
    1188.664u 1.166s 19:53.68, 1208.337u 0.814s 20:13.14
    so they actually ran a bit faster than the one alone!
  concurrent running 4 400-steps (in different directories)
    3423.242u 5.039s 57:22.24, 3308.207u 2.660s 55:25.01,
    3307.878u 3.612s 55:25.14, 3670.192u 2.316s 1:01:32.57
    all slower than single, one much slower(20%)!
  So performance IS affected by how many jobs are running.
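
The "20% slower" remark for the four concurrent 400-step runs on householder can be checked by comparing each run's user CPU time with the single 400-step run (3064.296u). The Python sketch below does that; the variable and function names are hypothetical, and user CPU time is assumed to be the quantity being compared.

```python
SINGLE_400_STEP_USER_SEC = 3064.296  # householder, one job running alone

# User CPU seconds of the four concurrent 400-step runs listed above
CONCURRENT_400_STEP_USER_SEC = [3423.242, 3308.207, 3307.878, 3670.192]


def slowdown_percent(concurrent, single):
    """Extra CPU time of a concurrent run relative to the single run, in percent."""
    return 100.0 * (concurrent - single) / single


slowdowns = [
    slowdown_percent(t, SINGLE_400_STEP_USER_SEC)
    for t in CONCURRENT_400_STEP_USER_SEC
]
for user_sec, pct in zip(CONCURRENT_400_STEP_USER_SEC, slowdowns):
    print(f"{user_sec:10.3f}u  {pct:5.1f}% slower than single")
```

All four runs come out slower than the single job, with the last one close to the 20% quoted in the notes.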
mars.math.utk.edu (4 core ) (Fedora21) Dell Optiplex ???? 26jun2015
  Two 2 core Intel(R) Xeon(R) CPU i7-4790 CPU @ 3.60GHz 24.6GB mem
  uname -a: 4.0.4-201.fc21.x86_64 #1 SMP x86_64 GNU/Linux
  gfortran -O3 (gcc 4.9.2-6)
  100 steps: 346.288u 0.142s 5:46.65        => 3.5 CPUs/step in 6 min
  300 steps: 1452.034u 0.176s 24:12.83      => 4.8 CPUs/step in 24 min
  400 steps: 2274.902u 0.390s 37:56.25      => 5.7 CPUs/step in 38 min
  500 steps: 3475.478u 0.528s 57:57.41      => 6.95 CPUs/step in 58 min
ares.math.utk.edu (4 core ) (Fedora21) Dell Optiplex 9020 26jun2015
  Two 2 core Intel(R) Xeon(R) CPU i7-4790 CPU @ 3.60GHz 16.4GB mem
  uname -a: 4.0.8-200.fc21.x86_64 #1 SMP x86_64 GNU/Linux
  gfortran -O3 (gcc 4.9.2-6)
  100 steps: 367.116u 0.056s 6:07.31        => 3.7 CPUs/step in 6 min
  300 steps: 1553.037u 0.059s 25:53.67      => 5.2 CPUs/step in 26 min
  400 steps: 2455.629u 0.071s 40:56.62      => 6.1 CPUs/step in 41 min
  500 steps: 3724.907u 0.287s 1:02:11.72    => 7.4 CPUs/step in 62 min
darter.nics.tennessee.edu
  Cray XC30 (Cascade) supercomputer
  724 compute nodes, each with 16 cores, 32 GB of memory.
  Cores: 2.6 GHz 64bit Intel XEON E5-2600
  Peak performance of 240.9 TF
  Cray Aries router (8GB/sec bandwidth)
  torque/4.2.9 , moab/7.2.9 scheduler, PBS

  runs with module PrgEnv-cray/5.2.40: crayftn mpich OMP
  ftn -o melt-cray.x -ffixed meltflowbnch.f (without OMP)
  100 steps: 460u 00:07:40     => 4.6 CPUs/step in 7.7 min
  400 steps: 2971u 00:49:52    => 7.4 CPUs/step in 50 min
  ftn -o melt-cray.x -ffixed -homp meltflowbnch.f (with OMP)
  100 steps: 459u 00:07:39     => 4.6 CPUs/step in 7.65 min

  runs with module PrgEnv-intel/5.2.40: ifort mpich OMP
  ftn -o melt-intel.x -fixed -fast -openmp meltflowbnch.f (OMP)
  400 steps: 3290u 00:54:50    => 8.2 CPUs/step in 55 min
Other benchmarking pages: