# Accuracy and Reliability in Scientific Computing

## Chapter 1, What Can Go Wrong in Scientific Computing?

### Section 1.4 What Really Went Wrong in Applied Scientific Computing!

We include only examples where numerical problems have occurred, not the more common pure programming errors (bugs). More examples are given in the Stevenson paper [1] and in Thomas Huckle's web site Collection of Software Bugs [2].

• #### Floating point precision

The floating point precision has to be sufficient to handle the task. A well-known example where this was not the case is the failure of a Patriot missile battery [3, 4] to intercept a Scud missile on February 25, 1991, at Dhahran, Saudi Arabia. To avoid detection, the Patriot system was designed to operate for only a few hours at one location. The velocity of the incoming missile is a floating point number, but the time from the internal clock is an integer representing the time in tenths of a second. Before that time is used, the integer is multiplied by an approximation of 0.1 held in 24 bits, which causes an error of 0.000000095 in the conversion factor. The inaccuracy in the position of the target is proportional to the product of the target velocity and the length of time the system has been running. This is a somewhat oversimplified discussion; a more detailed one is given in [4]. With the system up and running for 100 hours and a Scud velocity of 1676 meters per second, an error of 573 meters is obtained, more than sufficient to cause failure of the Patriot and success for the Scud, which killed 28 Americans. In fact, the Patriot battery did not track the Scud, and no Patriot missile was fired.

Modified software, which compensated for the inaccurate time calculation, arrived the following day. The potential problem had been identified by the Israelis and reported to the Patriot Project Office on February 11.
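The arithmetic behind the Patriot figures can be reproduced with a few lines of exact rational arithmetic. The following is only a sketch under stated assumptions: it assumes that the 24-bit value keeps 23 bits after the binary point and is chopped (not rounded), which is a choice made here because it reproduces the quoted factor error; the function names are hypothetical and are not taken from the Patriot software.

```python
from fractions import Fraction

# Assumption: 23 bits after the binary point, chopped; this reproduces the
# factor error of about 0.000000095 quoted in the text.
FRACTION_BITS = 23


def chopped_tenth(fraction_bits=FRACTION_BITS):
    """Return 0.1 chopped to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    return Fraction(scale // 10, scale)  # floor(0.1 * scale) / scale


def patriot_drift(hours_up, target_speed_m_per_s, fraction_bits=FRACTION_BITS):
    """Return (factor error, clock error in seconds, range error in meters)."""
    factor_error = Fraction(1, 10) - chopped_tenth(fraction_bits)
    ticks = hours_up * 3600 * 10  # the clock counts tenths of a second
    time_error_s = factor_error * ticks
    range_error_m = time_error_s * target_speed_m_per_s
    return float(factor_error), float(time_error_s), float(range_error_m)


if __name__ == "__main__":
    factor_error, time_error_s, range_error_m = patriot_drift(100, 1676)
    print(f"factor error : {factor_error:.3e}")
    print(f"clock error  : {time_error_s:.4f} s")
    print(f"range error  : {range_error_m:.1f} m")
```

With the unrounded factor error the sketch gives a clock error of about 0.34 seconds and a range error of roughly 575 meters, slightly more than the 573 meters quoted above, because the text rounds the factor error to 0.000000095 before multiplying.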

Other reported instances where roundoff has had significant effects:

• The Vancouver Stock Exchange [5] in 1982, where the index (with three decimals) was updated (and truncated) after each transaction. After 22 months it had fallen from the initial value 1000.000 to 524.881, but the correctly evaluated index was 1098.811. A simple statistical analysis shows directly that, assuming 2000 transactions a day, the index will lose one unit per day, since the mean truncation error is 0.0005 per transaction. Assuming 22 working days a month, the index would be 516 instead of the actual (but false) 524.881.

• In the Schleswig-Holstein [6] local elections of 1992, one party got 5.0 % in the printout (which was correctly rounded to one decimal), but the correct value rounded to two decimals was 4.97 %. The party therefore did not pass the 5 % threshold for getting into the local parliament, which in turn caused a switch of majority. Similar rules apply not only in German elections. A special rounding algorithm is required at the threshold, truncating all values between 4.9 and 5.0 in Germany, or all values between 3.9 and 4.0 in Sweden! In Florida no rounding should be done.

• Criminal uses of roundoff have been reported [7] occasionally, involving many minor withdrawals from bank accounts. The common rounding to whole units of the currency (for example to dollars, removing the cents) implies that the cross sums do not exactly agree, which diminishes the chance/risk of detecting the fraud.

• A current problem is connected with the new European currency, the euro, which replaced 12 national currencies from January 1, 2002. Partly due to the strictly defined conversion rules, the roundoff can have a significant impact [8]. A problem is that the conversion factors have 6 significant decimal digits, thus permitting a varying relative error, and for small amounts the final result is also to be rounded according to local customs to at most two decimals. The article [8] discusses three arithmetic errors: conversion errors, reconversion errors, and totalising errors.
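The estimate in the Vancouver Stock Exchange item above can be checked both analytically and by a small simulation. The sketch below is illustrative only: the size of each simulated index change (up to ±0.05) and the function names are assumptions made here, not data from the exchange. The index is kept as an integer number of millionths so that chopping to three decimals is exact.

```python
import random
from fractions import Fraction


def expected_index(start=1000, trades_per_day=2000, days_per_month=22, months=22):
    """Index predicted by the mean-truncation-error argument in the text."""
    mean_error = Fraction(5, 10000)  # 0.0005 lost per transaction on average
    loss = trades_per_day * mean_error * days_per_month * months
    return float(start - loss)


def simulated_truncation_loss(n_updates, seed=0):
    """Index units lost by chopping to three decimals after every update.

    Each update is a hypothetical random change of at most 0.05 index units.
    """
    rng = random.Random(seed)
    exact = 1000 * 1_000_000  # 1000.000000 in millionths
    truncated = exact
    for _ in range(n_updates):
        change = rng.randint(-50_000, 50_000)
        exact += change
        truncated += change
        truncated -= truncated % 1000  # drop everything below 0.001
    return (exact - truncated) / 1_000_000


if __name__ == "__main__":
    print(expected_index())                 # 516.0
    print(simulated_truncation_loss(2000))  # close to one unit per day
```

Replacing the chopping step by rounding to the nearest 0.001 makes the individual errors roughly symmetric around zero, so they largely cancel instead of accumulating.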

• #### Illegal conversion between data types

On June 4, 1996, an unmanned Ariane 5 rocket launched by the European Space Agency exploded forty seconds after its lift-off from Kourou, French Guiana. The Report by the Inquiry Board [9, 10] found that the failure was caused by the conversion of a 64-bit floating-point number to a 16-bit signed integer. The floating-point number was too large to be represented by a 16-bit signed integer (it exceeded 32767, the largest such value). It is worth noting that this part of the software was required in Ariane 4 but not in Ariane 5.
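A range check before the narrowing conversion is the usual defence against this kind of failure. The following minimal sketch shows two possible policies, rejecting the value or saturating it; the function names are hypothetical, and the code is not meant to represent the Ariane flight software.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(x: float) -> int:
    """Convert x to a 16-bit signed integer, or raise if it does not fit."""
    if math.isnan(x) or math.isinf(x):
        raise OverflowError(f"{x!r} has no 16-bit integer representation")
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} is outside [{INT16_MIN}, {INT16_MAX}]")
    return n


def to_int16_saturating(x: float) -> int:
    """Convert x to a 16-bit signed integer, clamping out-of-range values."""
    if math.isnan(x):
        raise ValueError("NaN cannot be converted")
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), x))
    return int(clamped)


if __name__ == "__main__":
    print(to_int16_checked(123.9))       # 123
    print(to_int16_saturating(40000.0))  # 32767
    try:
        to_int16_checked(40000.0)
    except OverflowError as err:
        print("rejected:", err)
```

Which policy is appropriate depends on the application: rejecting forces the caller to handle the exceptional case explicitly, while saturating keeps the computation running with a bounded but wrong value.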

A somewhat similar problem is the illegal mixing of different units of measurement (SI, Imperial, and US). An example is the Mars Climate Orbiter, which was lost on entering orbit around Mars on September 23, 1999. The "root cause" of the loss was that a sub-contractor failed to obey the specification that SI units should be used, and instead used Imperial units in their segment of the ground-based software, see [11]. See also pages 35-38 in the book by Telles and Hsieh [12].
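One way to make unit mix-ups harder is to carry the unit together with the number and to convert explicitly at interfaces. The sketch below is an illustration only: the choice of impulse as the quantity, the `Impulse` class, and the unit labels are assumptions made here and are not taken from the mission software. The conversion factor follows from the definition 1 lbf = 4.4482216152605 N.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # one pound-force second in newton seconds


@dataclass(frozen=True)
class Impulse:
    """An impulse value tagged with its unit (hypothetical helper type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_si(self) -> "Impulse":
        """Return the same impulse expressed in newton seconds."""
        if self.unit == "N*s":
            return self
        if self.unit == "lbf*s":
            return Impulse(self.value * LBF_S_TO_N_S, "N*s")
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def total_impulse_si(impulses):
    """Sum impulses in newton seconds, converting each one explicitly."""
    return sum(i.to_si().value for i in impulses)


if __name__ == "__main__":
    readings = [Impulse(1.0, "lbf*s"), Impulse(1.0, "N*s")]
    print(total_impulse_si(readings))
```

Adding the raw numbers without conversion would silently be wrong by a factor of about 4.45 for every value supplied in pound-force seconds, whereas an untagged or unknown unit here fails loudly.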

• #### Illegal data

A crew member of the USS Yorktown mistakenly entered a zero for a data value, which resulted in a division by zero. The error cascaded and eventually shut down the ship's propulsion system in September 1997. The ship was dead in the water for 2 hours and 45 minutes, see [13, 14, 15].

• #### Inaccurate finite element analysis

On August 23, 1991, the Sleipner A offshore platform went down in Gandsfjorden near Stavanger, Norway. The conclusion of the investigation [16] was that the loss was caused by a failure in a cell wall, resulting in a serious crack and a leakage that the pumps could not handle. The wall failed as a result of a combination of a serious usage error in the finite element analysis (using the popular NASTRAN code) and insufficient anchorage of the reinforcement in a critical zone.

The shear stress was underestimated by 47 %, leading to an insufficient design. A more careful FEM analysis after the accident predicted that failure would occur at a depth of 62 meters; it did occur at 65 meters.

• #### Incomplete analysis of the Millennium Bridge

The Millennium Bridge [17, 18, 19] over the Thames in London was closed on its opening weekend in May 2000, since it wobbled more than expected. The simulations done during the design process handled the vertical force of a pedestrian at around 2 Hz (which is all that is required by the British Standards Institution), but not the horizontal force at about 1 Hz. What happened was that the slight wobbling (within tolerances) due to the wind caused the pedestrians to walk in step (synchronous walking), which made the bridge wobble even more.

The bridge was reopened in 2002, after 37 viscous dampers and 54 tuned mass dampers had been installed and all the modifications had been carefully tested. The modifications were completely successful.

In scene 3 of the movie "Harry Potter and the Half-Blood Prince" there is a nice simulation of a collapse of the Millennium Bridge.

• #### Stability of ships and aircraft

The two best-known stability problems in Sweden are the capsizing of the Vasa [20] on its maiden voyage in Stockholm harbour in 1628 and the two crashes, in front of national television, of the JAS Gripen [21] aircraft in Linköping in 1989 and in Stockholm in 1993.

The ship capsized because too many heavy guns were placed relatively high up in the ship. In the 17th century there were no scientific methods of calculating a ship's stability, and certainly no computers to blame.

The aircraft crashed because the control system strongly amplified the pilot's rapid joystick movements. The jet fighter has a very advanced fly-by-wire system, and the aircraft is designed in principle to be unstable; it requires the computer to obtain stability. The control system has now been modified to prevent further incidents.


### Bibliography


Remark: All web links above were operating on April 25, 2005, and on February 29, 2008, and on February 3, 2009.


IFIP WG 2.5 Project 68 on "Accuracy and Reliability in Scientific Computing".
boein@nsc.liu.se
