Fault Tolerance

Fault tolerance is the intrinsic ability of a software system to continue delivering service to its users in the presence of faults. This approach to software reliability addresses how to keep a system functioning after faults in the delivered system manifest themselves. The implementation of software fault tolerance is dramatically different from that of hardware. In a hardware fault-tolerant system, a second or third complete set of hardware runs in parallel, shadowing the execution of the main processor. If the main processor fails, a shadowing unit immediately picks up the application. This addresses the kind of fault shown in the bathtub curve, which depicts hardware wearing out.
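The shadowing arrangement described for hardware can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not a description of any real system: the names Replica and RedundantSystem are hypothetical, a failure is modeled as a simple flag, and every unit is assumed to receive the same inputs so that a standby always holds the same state as the main unit.

```python
class Replica:
    """Hypothetical processing unit; all replicas apply the same inputs."""

    def __init__(self, name):
        self.name = name
        self.total = 0        # application state, kept in step across replicas
        self.failed = False   # assumption: a failure is modeled as a flag

    def step(self, value):
        if self.failed:
            raise RuntimeError(f"{self.name} has failed")
        self.total += value
        return self.total


class RedundantSystem:
    """Runs every replica on each input; the first healthy one supplies the output."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def step(self, value):
        result = None
        for replica in self.replicas:
            try:
                out = replica.step(value)
            except RuntimeError:
                continue  # skip the failed unit; the others keep shadowing
            if result is None:
                result = (replica.name, out)  # first healthy unit answers
        if result is None:
            raise RuntimeError("all replicas have failed")
        return result


system = RedundantSystem([Replica("main"), Replica("standby")])
print(system.step(5))             # answered by the main unit
system.replicas[0].failed = True  # simulate the main unit wearing out
print(system.step(3))             # the standby answers with the same state
```

Because the standby processed the first input as well, it can take over without losing the running total. Note that this kind of duplication only masks failures of an individual unit: identical copies of a program contain identical design faults, which is one reason software fault tolerance is implemented differently.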

Fault tolerance begins in the implementation phase of product development and extends through installation, operations and support, and maintenance to final product retirement. As long as the software is running in production, the fault tolerance approach to reliability remains useful.

