Accuracy and Reliability
in Scientific Computing
Chapter 1, What Can Go Wrong in Scientific Computing?
We include only examples where numerical problems have occurred,
not the more common pure programming errors (bugs). More examples
are given in the Stevenson paper [1]
and on Thomas Huckle's
web site Collection of Software Bugs [2].
Floating point precision
The floating point precision has to be sufficient to handle the task.
A well-known example where this was not the case is the
failure [3, 4] of a Patriot missile battery
on February 25, 1991,
at Dhahran, Saudi Arabia. The Patriot system
was designed, in order
to avoid detection, to operate for only a few hours at one location.
The velocity of the incoming missile is a floating point number but
the time from the internal clock is an integer, representing
the time in tenths of a second. Before that time is used, the integer number
is multiplied by a 24-bit numerical approximation of 0.1,
which introduces an error of 0.000000095 in the conversion factor.
The inaccuracy in the position of the target is proportional to
the product of the target velocity and the length of time the system has
been running. This is a somewhat oversimplified discussion; a more detailed
one is given in the references cited above [3, 4].
With the system up and running for 100 hours and a
velocity of the Scud missile of 1676 meters per second, an error of
573 meters is obtained, more than sufficient to cause failure of
the Patriot and success for the Scud, killing 28 Americans. Actually,
the Patriot battery did
not track the Scud, and no Patriot missile was fired.
Modified software, which compensated for the inaccurate time calculation,
arrived the following day. The potential problem had been identified by
the Israelis and reported to the Patriot Project Office on February 11.
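The arithmetic behind the 573 meter figure can be reproduced in a few lines. The following is a minimal sketch, not the Patriot software: it assumes that 0.1 is chopped (not rounded) to 23 bits after the binary point, an assumption chosen because it reproduces the quoted error of 0.000000095. The names chopped_tenth and tracking_error_m are hypothetical.

```python
from fractions import Fraction

# Assumption: 0.1 is chopped (not rounded) to 23 bits after the binary point,
# which reproduces the 0.000000095 error quoted in the text.
FRACTION_BITS = 23


def chopped_tenth(fraction_bits: int = FRACTION_BITS) -> Fraction:
    """Return 0.1 chopped to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    return Fraction(scale // 10, scale)  # floor(0.1 * scale) / scale


def tracking_error_m(hours_up: float, velocity_m_per_s: float) -> float:
    """Position error in meters after hours_up hours of uptime."""
    factor_error = Fraction(1, 10) - chopped_tenth()  # error per clock tick
    ticks = Fraction(hours_up) * 3600 * 10            # clock counts tenths of a second
    clock_error_s = factor_error * ticks              # accumulated time error
    return float(clock_error_s * Fraction(velocity_m_per_s))


factor_error = float(Fraction(1, 10) - chopped_tenth())
drift = tracking_error_m(100, 1676)
print(f"error in conversion factor: {factor_error:.3e}")
print(f"position error after 100 h: {drift:.0f} m")
```

With the unrounded factor error the sketch gives about 575 meters; the 573 meters in the text follows from using the rounded value 0.000000095. Exact rational arithmetic is used so that the sketch does not add rounding errors of its own.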
Other reported instances where roundoff has had significant effects:
- The Vancouver Stock Exchange [5] in 1982, where the index (with three decimals)
was updated (and truncated) after each
transaction. After 22 months it had fallen from the initial value 1000.000
to 524.881, but the correctly evaluated index was 1098.811. A simple
statistical analysis shows that, assuming 2000 transactions a day,
the index loses about one unit per day, since the mean truncation error is
0.0005 per transaction. Assuming 22 working days a month, truncation alone
would take the index from 1000 to 516, close to the actual (but false) 524.881.
- In the Schleswig-Holstein [6] local elections in 1992, one party got
5.0 % in the printout (correctly rounded to one decimal), but the correct value rounded to two decimals was 4.97 %, so the
party did not pass the 5 % threshold for getting into the local
parliament, which in turn caused a switch of majority. Similar rules apply not
only in German elections. A special rounding algorithm is required at
the threshold, truncating all values between 4.9 and 5.0 in Germany, or
all values between 3.9 and 4.0 in Sweden! In Florida no rounding
should be done.
- Criminal uses of roundoff have occasionally been reported [7],
involving many minor withdrawals from bank accounts. The common rounding
to whole units of the currency (for example to dollars, removing the cents)
implies that the cross sums do not
exactly agree, which diminishes the chance/risk of detecting the fraud.
- A current problem is connected with the new European currency, the euro,
which replaced 12 national currencies from January 1, 2002. Partly
due to the strictly defined conversion rules, the roundoff can have a
significant impact [8]. A problem is that
the conversion factors have 6 significant decimal digits, thus
permitting a varying relative error, and for small amounts the final
result is also to be rounded according to local customs to at most
two decimals. The article [8] discusses three arithmetic errors: conversion errors,
reconversion errors, and totalising errors.
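The drift described in the Vancouver Stock Exchange item can be illustrated with a small simulation. This is a sketch under stated assumptions: the size of the individual index changes (up to 0.05 units), the random seed, and the function name simulate_index are all hypothetical; only the three-decimal truncation and the 2000 transactions a day come from the text.

```python
import random

MICRO = 1_000_000  # index values are stored as integer millionths


def simulate_index(transactions: int, seed: int = 0) -> tuple[float, float]:
    """Return (untruncated, truncated) index values after the given updates."""
    rng = random.Random(seed)
    exact = truncated = 1000 * MICRO  # initial value 1000.000
    for _ in range(transactions):
        # Hypothetical index movement of up to +/-0.05 units per transaction.
        delta = rng.randint(-50_000, 50_000)
        exact += delta
        # Update, then drop everything beyond the third decimal.
        truncated = (truncated + delta) // 1000 * 1000
    return exact / MICRO, truncated / MICRO


exact, truncated = simulate_index(2000)  # one assumed trading day
daily_loss = exact - truncated
print(f"loss from truncation after one day: {daily_loss:.3f}")

# Closed-form estimate from the text: about one unit lost per day.
days = 22 * 22  # 22 months of 22 working days
print(f"estimated index after {days} days: {1000 - days}")
```

The simulated daily loss comes out close to one unit, as the statistical argument predicts. Rounding to nearest instead of truncating would make the individual errors average out to roughly zero instead of accumulating in one direction.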
Illegal conversion between data types
On June 4, 1996, an unmanned Ariane 5 rocket launched by the European Space
Agency exploded forty seconds after its lift-off from Kourou, French
Guiana. The report by the Inquiry Board [9, 10]
found that the failure was caused by the
conversion of a 64-bit floating-point number to a 16-bit signed integer.
The floating-point number was too large to be represented by a
16-bit signed integer (largest value 32767). It is worth noting that this part of the software
was required in Ariane 4 but not in Ariane 5.
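To make the range problem concrete, here is a minimal sketch of three ways a program might convert a floating-point number to a 16-bit signed integer. It is illustrative only and does not reproduce the Ariane flight software; the function names are hypothetical, and which policy is appropriate depends on the application.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(x: float) -> int:
    """Unchecked conversion: out-of-range values silently wrap around."""
    return (int(x) + 32768) % 65536 - 32768


def to_int16_checked(x: float) -> int:
    """Checked conversion: refuse values a 16-bit signed integer cannot hold."""
    n = int(x)  # truncates toward zero; raises for NaN and infinities
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(x: float) -> int:
    """Alternative policy: clamp to the nearest representable value."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))


print(to_int16_wrapping(40000.0))    # wraps to a negative number
print(to_int16_saturating(40000.0))  # clamps to 32767
```

A checked conversion only helps if the caller handles the resulting error; an exception that nobody catches merely moves the failure elsewhere.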
A somewhat similar problem is illegal mixing of different units of measurement
(SI, Imperial, and US). An example is the Mars Climate Orbiter which was
lost on entering orbit around Mars on September 23, 1999. The "root cause" of the
loss was that a sub-contractor failed to obey the specification that SI units should be used, and instead used Imperial units in their segment of the ground-based software, see [11].
See also pages 35 - 38 in the book by Telles and Hsieh [12].
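One common defence against mixed units is to carry the unit in the type, so that a bare number cannot cross an interface unlabeled. The sketch below assumes, for illustration, that the quantity exchanged is an impulse; the class name Impulse and its constructors are hypothetical and are not taken from the mission software.

```python
from dataclasses import dataclass

# One pound-force expressed in newtons.
LBF_TO_NEWTON = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in SI units (newton seconds)."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary so everything downstream sees SI only.
        return cls(float(value) * LBF_TO_NEWTON)


si_value = Impulse.from_newton_seconds(10.0)
imperial_value = Impulse.from_pound_force_seconds(10.0)
print(imperial_value.newton_seconds / si_value.newton_seconds)
```

The same numeric reading differs by a factor of about 4.45 depending on the unit, which is why the caller is forced to name the unit when constructing a value.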
A crew member of the USS Yorktown mistakenly entered a zero for a data value,
which resulted in a division by zero. The error cascaded and eventually shut down the ship's propulsion system in September 1997. The ship was dead in the water for 2 hours and
45 minutes, see [13, 14, 15].
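A simple way to keep one bad entry from cascading is to validate it where it is entered, before it reaches any division. This is a minimal sketch with hypothetical names (parse_positive, rate_per_unit); it says nothing about how the ship's software was actually structured.

```python
import math


def parse_positive(field_name: str, raw: str) -> float:
    """Parse an operator-entered value, rejecting zero, negatives, NaN and infinities."""
    value = float(raw)  # raises ValueError for non-numeric input
    if not math.isfinite(value) or value <= 0.0:
        raise ValueError(f"{field_name} must be a positive finite number, got {raw!r}")
    return value


def rate_per_unit(total: float, count_raw: str) -> float:
    """Divide total by an operator-entered count that has been validated first."""
    count = parse_positive("count", count_raw)
    return total / count


print(rate_per_unit(10.0, "4"))
try:
    rate_per_unit(10.0, "0")
except ValueError as err:
    print(f"rejected at input: {err}")
```

Rejecting the value at the point of entry gives the operator a chance to correct it, instead of letting the error surface later in an unrelated part of the system.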
Inaccurate finite element analysis
On August 23, 1991, the Sleipner A offshore platform went down in Gandsfjorden
near Stavanger, Norway. The conclusion of the investigation [16]
was that the loss
was caused by a failure in a cell wall, resulting in a serious crack and
a leakage that the pumps could not handle. The wall failed as a result of a
combination of a serious usage error in the finite element analysis
(using the popular NASTRAN code) and
insufficient anchorage of the reinforcement in a critical zone.
The shear stress was underestimated by 47 %, leading to insufficient design.
More careful FEM analysis after the accident predicted that failure
would occur at 62 meters depth; it did occur at 65 meters.
Incomplete analysis of the Millennium Bridge
The Millennium Bridge [17, 18, 19]
over the Thames in London was closed on its opening
weekend in May 2000, since it wobbled more than expected.
The simulations done during the design process handled the vertical
force (which is all that is required by the British Standards Institution)
of a pedestrian at around 2 Hz, but not the horizontal
at about 1 Hz. What happened was that the slight wobbling
(within tolerances) due to the wind caused the pedestrians to
walk in step (synchronous walking), which made the bridge wobble even more.
The bridge was reopened in 2002, after 37 viscous dampers and
54 tuned mass dampers had been installed and all the modifications had been carefully tested.
The modifications were completely successful.
In scene 3 of the movie "Harry Potter and the Half-Blood Prince" there is a
nice simulation of a collapse of the Millennium Bridge.
Stability of ships and aircraft
The two best-known stability problems in Sweden
are the capsizing of the Vasa [20]
on its maiden voyage in Stockholm harbour
in 1628 and the two crashes,
in front of national television, of the
JAS Gripen [21] aircraft. The Vasa capsized
because too many heavy guns were placed relatively
high up in the ship. In the 17th century there were no
scientific methods of calculating a ship's stability,
and certainly no computers to blame.
The aircraft crashed due to the control system's high
amplification of the pilot's rapid joystick movements. The
jet fighter has a very advanced fly-by-wire system, and the
aircraft is designed in principle to be unstable; it requires
the computer to obtain stability. The control
system has now been modified to prevent further incidents.
1. D. E. Stevenson,
A Critical Look at Quality in Large-Scale Simulations,
IEEE Computing in Science & Engineering, May-June 1999,
Vol. 1, Issue 3, pp. 53-63.
http://csdl.computer.org/comp/mags/cs/1999/03/c3toc.htm (full text requires a subscription)
2. Thomas Huckle, Collection of Software Bugs.
3. GAO Report Patriot Missile Defense -- Software Problem Led to System
Failure at Dhahran, Saudi Arabia, United States General Accounting Office,
B-247094, February 4, 1992.
4. Robert Skeel, "Roundoff Error Cripples Patriot Missile", SIAM News, Volume 25, Number 4, July 1992, page 11.
5. The Vancouver Stock Exchange, references communicated by Valerie Fraysse via G. W. Stewart:
- The Wall Street Journal, November 8, 1983, p.37.
- The Toronto Star, November 19, 1983.
- B.D. McCullough and H.D. Vinod, The Numerical Reliability of Econometric Software,
Journal of Economic Literature,
Vol XXXVII (June 1999), pp. 633-665.
http://www.aeaweb.org/journal/contents/june1999.html (full text requires a subscription)
6. Debora Weber-Wulff,
Rounding error changes Parliament makeup,
The Risks Digest,
Volume 13, Issue 37, 1992.
7. Ken Berkun, London
firms reportedly offer amnesty to "hacker thieves",
The Risks Digest,
Volume 8, Issue 85, 1989.
8. Desmet Gert, EURO page: Conversion Arithmetics,
Remark: The user ~gedesmet is no longer on this web server (June 20, 2001). Essentially the same article is however available
Background information is available in:
European Central Bank, Determination of the euro conversion rates.
9. J. L. Lions,
Ariane 501 - Presentation of Inquiry Board report,
23 July 1996, Paris.
http://www.esa.int/export/esaCP/Pr_33_1996_p_EN.html (Link to full report at the bottom)
Flight 501 failure - first information, 6 June 1996
Ariane-5 and Cluster update, 27 June 1996
10. SIAM News, "Inquiry Board Traces Ariane 5 Failure to
Vol. 29, Number 8, October 1996, pp. 1, 12, 13.
11. Douglas Isbell, MARS POLAR LANDER,
Mars Climate Orbiter Failure Board releases report 99-134.
12. Matt Telles and Yuan Hsieh, "The Science
of Debugging", Coriolis, Scottsdale, Arizona, 2001. ISBN 1-57610-917-8
13. Gregory Slabodkin, Software glitches
leave Navy Smart Ship dead in the water, Government Computer News,
GCN July 13, 1998.
Update in http://www.gcn.com/print/17_32/33639-1.html
14. Alden M. Hayashi, Rough Sailing for
Smart Ships, Scientific American, November 1998.
15. Harvey McKelvey, Letters to the editors, Seaworthy Software, Scientific American, March 15, 1999.
16. SLEIPNER A GBS Loss, Reports 1 - 17.
Available from SINTEF, Norway. Further information is
A popular summary is available in
17. New Scientist magazine, July 8, 2000 and February 19, 2002.
"Bad vibrations, how could the designers
of a revolutionary bridge miss something so obvious?".
18. ARUP, Summary report on the Millennium Bridge.
19. BBC News, The Millennium Bridge
20. Information on Vasa is found at the Vasa Museum web site
21. Information on the JAS Gripen is found at the SAAB web site
Remark: All web links above
were operating on April 25, 2005, and on February 29, 2008,
and on February 3, 2009.
IFIP WG 2.5 Project 68 on "Accuracy and Reliability
in Scientific Computing".
Last modified: December 6, 2009