How to survive a Byzantine attack: A guide to Byzantine fault tolerance

Imagine that you are a general in the Byzantine army, and you are planning to attack a city with your fellow generals. You have to communicate with them through messengers, and agree on a common strategy: either attack or retreat. Sounds simple, right? Well, not so fast. What if some of the messengers get lost, or some of the generals are traitors, or some of the messages are corrupted? How can you ensure that you and your loyal allies reach a consensus, and execute the same action, without being sabotaged by the enemies?

This is the essence of the Byzantine Generals’ Problem, a classic dilemma in distributed computing that illustrates the challenges of reaching consensus on unreliable networks. In this article, we will explain what Byzantine fault tolerance is, how it works, what are its types and applications, and how it relates to blockchain and cryptocurrency.

What is Byzantine fault tolerance?

Byzantine fault tolerance (BFT) is the property of a system that can resist the failures caused by the Byzantine Generals’ Problem. This means that a BFT system can continue to operate correctly, even if some of the nodes in the network fail or behave maliciously.

The Byzantine Generals’ Problem has several possible solutions, and therefore, there are different ways to build a BFT system. Similarly, blockchains have different methods to achieve BFT, and these are what we call consensus algorithms.

How does Byzantine fault tolerance work?

The main goal of a BFT system is to ensure that all the honest and reliable nodes in the network agree on the same value or decision, and ignore the faulty or dishonest ones. To achieve this, a BFT system usually follows a protocol that involves three steps: propose, vote, and commit.

Propose: One of the nodes in the network, called the leader or the proposer, suggests a value or a decision to the other nodes, called the followers or the acceptors. For example, the leader may propose to attack the city at dawn.
Vote: The followers receive the proposal from the leader, and send their votes to the leader and to each other. The votes can be either yes or no, depending on whether they agree with the proposal or not. For example, a follower may vote yes if he thinks that attacking the city is a good idea, or no if he thinks otherwise.
Commit: The leader collects the votes from the followers, and decides whether the proposal has reached a quorum or not. A quorum is the minimum number of votes required to accept the proposal. The leader then broadcasts the final decision to the followers, and they commit to it. For example, if the proposal has received more than two-thirds of the votes, the leader announces that the attack is on, and the followers prepare for it.

This protocol may vary depending on the type of BFT system, but the basic idea is the same. The leader and the followers have to exchange messages and votes, until they reach a consensus on the proposal.

What are the types of Byzantine fault tolerance?

There are different types of BFT systems, depending on the assumptions and the properties of the network. Here, we will distinguish between crash fault tolerance (CFT), practical Byzantine fault tolerance (PBFT), and probabilistic Byzantine fault tolerance (pBFT).

Crash fault tolerance (CFT)

Crash fault tolerance (CFT) is a type of BFT system that can tolerate nodes that fail by crashing or stopping. This means that the nodes either send correct messages, or send no messages at all. CFT systems assume that there are no malicious nodes in the network, and that the network is synchronous, meaning that there is a known upper bound on the message delivery time.

CFT systems are simpler and faster than other types of BFT systems, but they are also less robust and secure. They can only handle benign failures, such as hardware or software errors, but not malicious attacks, such as hacking or sabotage. CFT systems can tolerate up to one-third of the nodes failing, as long as the remaining nodes are honest and reliable.

An example of a CFT system is the Paxos algorithm, which is widely used in distributed databases and cloud computing.

Practical Byzantine fault tolerance (PBFT)

Practical Byzantine fault tolerance (PBFT) is a type of BFT system that can tolerate nodes that fail arbitrarily, meaning that they can send incorrect or inconsistent messages, or collude with other nodes. This means that the nodes can be either honest, faulty, or malicious. PBFT systems assume that the network is partially synchronous, meaning that there is an unknown upper bound on the message delivery time, but it eventually becomes known.

PBFT systems are more complex and slower than CFT systems, but they are also more robust and secure. They can handle both benign and malicious failures, and can tolerate up to one-third of the nodes being faulty or malicious, as long as the remaining nodes are honest and reliable.

An example of a PBFT system is the Tendermint algorithm, which is used by some blockchains, such as Cosmos and Binance Chain.

Probabilistic Byzantine fault tolerance (pBFT)

Probabilistic Byzantine fault tolerance (pBFT) is a type of BFT system that can tolerate nodes that fail arbitrarily, but with a probabilistic guarantee, meaning that the probability of reaching consensus is high, but not 100%. This means that the nodes can be either honest, faulty, or malicious, and the network can be either synchronous or asynchronous, meaning that there is no upper bound on the message delivery time.

pBFT systems are more scalable and faster than PBFT systems, but they are also less deterministic and reliable. They can handle both benign and malicious failures, and can tolerate more than one-third of the nodes being faulty or malicious, as long as the probability of reaching consensus is high enough.

An example of a pBFT system is the Nakamoto consensus algorithm, which is used by Bitcoin and other blockchains that rely on proof-of-work (PoW).

What are the applications of Byzantine fault tolerance?

Byzantine fault tolerance is a useful property for any system that involves distributed computing and coordination. Some of the applications of BFT systems are:

Blockchain and cryptocurrency: As we have seen, BFT systems are essential for achieving consensus on distributed networks, such as blockchains. BFT systems allow blockchains to operate in a decentralized, secure, and transparent way, without the need for intermediaries or trusted parties. BFT systems also enable blockchains to resist various types of attacks, such as double-spending, censorship, or 51% attacks.
Cloud computing and distributed databases: BFT systems are also widely used in cloud computing and distributed databases, where multiple servers or nodes need to store and process large amounts of data. BFT systems ensure that the data is consistent and available across the network, even if some of the nodes fail or act maliciously. BFT systems also improve the performance and scalability of the system, by allowing parallel processing and load balancing.
Aerospace and nuclear industries: BFT systems are also applied in critical domains, such as aerospace and nuclear industries, where safety and reliability are paramount. BFT systems ensure that the systems can function correctly, even if some of the components fail or are compromised. BFT systems also prevent catastrophic failures, such as crashes or meltdowns, by allowing the systems to recover from errors and resume normal operation.

Conclusion

The Byzantine Generals’ Problem is an interesting dilemma that gave rise to the BFT systems, which are widely used in various scenarios. Besides the blockchain industry, some of the use cases of BFT systems include aviation, aerospace, and nuclear industries. Achieving consensus on these systems requires continuous efforts, but the existing consensus algorithms have not overcome some limitations, such as scalability. Nevertheless, PoW and PoS are interesting approaches to BFT systems, and their potential applications inspire more innovations. In this article, we learned about BFT systems, starting from their background - the Byzantine Generals’ Problem, followed by the characteristics of BFT systems, and the consensus algorithms as solutions in the blockchain domain.