Incognito highway
Incognito uses a pBFT-based Proof-of-Stake consensus protocol. Combined with state sharding, this architecture allows us to maintain high TPS while preserving the security of the network.
Introduction
After 3 months of extensive testing on our public testnet, we encountered 2 problems that would hinder our ability to scale out further:
- A large percentage of Incognito validators will power the network through Incognito Physical Node devices. Since a Physical Node is typically set up at home, it sits behind a NAT/firewall, so other peers cannot connect to it. Consider the extreme case in which all validators of a shard are powered by Physical Node devices: none would be able to connect to the others.
- Each validator’s client must manage too many connections (to other validators in the same shard, to other shards, and to the beacon). To maintain sufficient data availability and to prevent message-filtering attacks, messages must be broadcast to everyone. This data eats up a lot of bandwidth as well as processing time.
We decided to tackle these problems with a new network topology, using highly dedicated nodes called highways. In simple terms, highways are proxies responsible for forwarding messages inside the network.
The design of highways embraces 4 key principles:
- Highly available: highways should always be available to serve node requests; we sacrifice consistency in the presence of a network partition.
- Incremental scalability: we should be able to scale out one highway at a time according to the needs of the network.
- Heterogeneity: work distribution must be proportional to the capabilities of the individual highway.
- Symmetry: every highway performs the same functionalities. This leads to better provisioning and maintenance in the long run.
New network topology
The above figure gives a highly simplified view of Incognito’s network topology. Each shard (and the beacon) is served by its own set of highways. Instead of connecting to peers directly, each node now connects to a highway that supports the shard it needs. All highways are connected to one another (using a mesh/ring or fully-connected topology) to ensure the flow of messages. Also, each highway must have a static public IP for other nodes to connect to.
When a node wants to broadcast data, it sends the data to its highway instead. The highway then broadcasts that data to the other highways as well as to the other peers connected to it. Because of this topology (coupled with some additional logic to filter out duplicate messages), we can reduce the redundant data that a peer needs to receive and process. Latency is also reduced, since a message passes through at most 3 hops between any 2 peers. The following diagram shows these 2 cases (the green path shows traces of a message M from A to B):
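The duplicate filtering mentioned above can be sketched as a simple seen-message cache. This is only an illustration under assumed details: the SHA-256 digest key and the unbounded map are placeholders, not the production filter or its eviction policy.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// seenCache remembers message digests so a highway forwards each message once.
type seenCache struct {
	mu   sync.Mutex
	seen map[[32]byte]bool
}

func newSeenCache() *seenCache {
	return &seenCache{seen: make(map[[32]byte]bool)}
}

// ShouldForward returns true the first time a message is observed,
// and false for every duplicate afterwards.
func (c *seenCache) ShouldForward(msg []byte) bool {
	h := sha256.Sum256(msg)
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.seen[h] {
		return false
	}
	c.seen[h] = true
	return true
}

func main() {
	c := newSeenCache()
	fmt.Println(c.ShouldForward([]byte("block-1"))) // true: first sighting, forward it
	fmt.Println(c.ShouldForward([]byte("block-1"))) // false: duplicate, drop it
}
```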
API
Just like the Incognito chain, the highway’s code is open-source. It is only a reference implementation, and everyone is welcome to contribute to it or implement a new one in their favorite programming language.
A highway needs to provide 3 main functions for an Incognito client:
- Broadcast and listen for new blocks as they are generated using a Publish/Subscribe model
- Request and serve old blocks via gRPC
- Broadcast and listen to the current state of the network
To put it more concretely, a highway satisfies this interface:
type Highway interface {
	BroadcastBlock(blockHeight int, blockData []byte, shardID int) error
	ListenToBlock(shardID int) (chan []byte, error)
	RequestBlock(blockHeight int, shardID int) ([]byte, error)
	PublishState(pubkey []byte, state []byte, shardID int) error
}
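To make the interface concrete, here is a toy in-memory stand-in (with PublishState omitted for brevity) showing how a client could drive it. This is purely illustrative: a real highway forwards these calls over the network, pub/sub for new blocks and gRPC for old ones.

```go
package main

import "fmt"

// memHighway is a toy, in-memory stand-in for the Highway interface.
type memHighway struct {
	blocks map[int]map[int][]byte // shardID -> blockHeight -> block data
	subs   map[int][]chan []byte  // shardID -> subscriber channels
}

func newMemHighway() *memHighway {
	return &memHighway{
		blocks: make(map[int]map[int][]byte),
		subs:   make(map[int][]chan []byte),
	}
}

func (h *memHighway) BroadcastBlock(blockHeight int, blockData []byte, shardID int) error {
	if h.blocks[shardID] == nil {
		h.blocks[shardID] = make(map[int][]byte)
	}
	h.blocks[shardID][blockHeight] = blockData // keep for later RequestBlock calls
	for _, ch := range h.subs[shardID] {
		ch <- blockData // push to every subscriber of this shard
	}
	return nil
}

func (h *memHighway) ListenToBlock(shardID int) (chan []byte, error) {
	ch := make(chan []byte, 16)
	h.subs[shardID] = append(h.subs[shardID], ch)
	return ch, nil
}

func (h *memHighway) RequestBlock(blockHeight int, shardID int) ([]byte, error) {
	data, ok := h.blocks[shardID][blockHeight]
	if !ok {
		return nil, fmt.Errorf("block %d of shard %d not found", blockHeight, shardID)
	}
	return data, nil
}

func main() {
	hw := newMemHighway()
	ch, _ := hw.ListenToBlock(3)
	hw.BroadcastBlock(100, []byte("block-100"), 3)
	fmt.Println(string(<-ch)) // the subscriber receives the broadcast block
	old, _ := hw.RequestBlock(100, 3)
	fmt.Println(string(old)) // and old blocks can be fetched on demand
}
```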
Scale
Incognito’s goal is to maintain a network of 64 shards, each with 256 validators working together. For the sake of further calculations, let’s assume that we will keep the blocktime (40 seconds/block) and blocksize (maximum 2MB/block).
We will now calculate the bandwidth each highway needs to sustain the network at maximum capacity. 3 main types of messages contribute to the workload:
- pBFT messages
- Block broadcasting messages
- Cross-shard messages
For the first type, let N be the total number of validators in one shard. If each highway supports K validators, we need at least N⁄K highways. Since each round of pBFT requires O(N²) messages, a single highway needs to handle O(N² ⁄ (N⁄K)) = O(N×K) messages (assuming K is much smaller than N). Plugging in the numbers (N = 256), we have 2MB×N×K ⁄ 40s = 12.8×K MB/s.
For the second type, assuming each validator has 10 substitutes, the highway needs to broadcast the latest block to K×10 other nodes. This requires 2MB×K×10 ⁄ 40s = 0.5×K MB/s.
For the last type, each highway receives a new block from the 63 other shards (plus 1 from the beacon) for each of its K validators, therefore we have 64×2MB×K ⁄ 40s = 3.2×K MB/s.
In total, each highway needs a bandwidth of 16.5×K MB/s to sustain the network at full load. Using a modest K = 32, we obtain a hard requirement of 528 MB/s. In practice, the actual figure is much lower because blocks rarely reach the 2MB cap: as of Feb 2020, most blocks on the Incognito mainnet are only around 10 KB each.
Another aspect we can look into is memory. To maintain low latency, a highway needs to keep the most frequently accessed blocks in memory. To store M epochs of blocks, we need M × 2MB/block × 350 blocks/epoch = M×700 MB. M doesn’t need to be large, since validators are almost always caught up to the latest blocks and old blocks are rarely requested.
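The back-of-envelope figures above can be checked with a few lines of Go:

```go
package main

import "fmt"

// bandwidthMBps totals the three message workloads from the section above,
// using the stated assumptions: 2MB blocks, 40s blocktime, N = 256 validators
// per shard, 10 substitutes per validator, 64 shards, K validators per highway.
func bandwidthMBps(K float64) float64 {
	const blockMB, blocktime = 2.0, 40.0
	pbft := blockMB * 256 * K      // O(N×K) pBFT messages per block interval
	broadcast := blockMB * K * 10  // latest block to K×10 substitute nodes
	crossShard := 64 * blockMB * K // blocks from 63 other shards plus the beacon
	return (pbft + broadcast + crossShard) / blocktime
}

// memoryMB is the in-memory block cache needed for M epochs (350 blocks/epoch).
func memoryMB(M float64) float64 {
	return M * 2 * 350
}

func main() {
	fmt.Println(bandwidthMBps(32)) // 528 MB/s at K = 32, i.e. 16.5×K
	fmt.Println(memoryMB(1))       // 700 MB per cached epoch
}
```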
Design choices
For the first design principle (availability), highways must be crash-tolerant. This is achieved through replication: each shard is served by at least two highways (preferably in different data centers). Facing a network partition, each highway acts on its own to serve client requests (based on the data it has at the moment). Since all messages are re-verified by Incognito validators anyway, there’s no need for strong consistency across highways. This design choice simplifies a lot of operations under the hood.
The 2nd and 3rd principles are satisfied using consistent hashing to choose which highway serves which Incognito nodes. This technique has 2 advantages:
- We can add/remove highways without affecting the whole network.
- In the case of an unstable network (nodes coming offline and online constantly), the topology will stay roughly the same.
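A minimal consistent-hash ring illustrates the idea: highways are hashed onto a ring, and each node is served by the first highway clockwise from the node’s own hash. The FNV hash and the identifiers here are placeholders, not the production scheme (which, in practice, also uses virtual nodes to balance load across heterogeneous highways).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring maps hashed highway positions back to highway identifiers.
type ring struct {
	points   []uint32
	highways map[uint32]string
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(highways []string) *ring {
	r := &ring{highways: make(map[uint32]string)}
	for _, hw := range highways {
		p := hash32(hw)
		r.points = append(r.points, p)
		r.highways[p] = hw
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the highway responsible for the given node: the first
// ring position at or after the node's hash, wrapping around if needed.
func (r *ring) Lookup(nodeID string) string {
	h := hash32(nodeID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.highways[r.points[i]]
}

func main() {
	r := newRing([]string{"highway-1", "highway-2", "highway-3"})
	// The same node always maps to the same highway, and adding or removing
	// one highway only remaps the nodes in its arc of the ring.
	fmt.Println(r.Lookup("validator-abc"))
}
```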
For the first function (publish and subscribe blocks), a simple publish/subscribe pattern works fine for our use-case. For maximum compatibility (with old versions of Incognito and with other blockchains), we used the highly modular network stack libp2p.
For the second function (requesting old blocks), instead of broadcasting request messages to every peer (which would be costly), the highway finds an appropriate peer and asks it directly. We implemented this function using the high-performance gRPC framework.
A small but quite important choice is how we maintain the membership of the network of highways. To keep the design simple, we reuse the pub/sub pattern above, using a dedicated topic to gossip the additions and removals of highways.
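The membership gossip can be sketched as join/leave events folded into each highway’s local view. The event format here is made up for illustration, and a plain Go channel stands in for the dedicated pub/sub topic:

```go
package main

import "fmt"

// event is a hypothetical gossip message announcing a highway joining or leaving.
type event struct {
	addr string
	join bool // true = addition, false = removal
}

// membership is one highway's local view of all known highways.
type membership struct {
	highways map[string]bool
}

// apply folds a single gossip event into the local view.
func (m *membership) apply(e event) {
	if e.join {
		m.highways[e.addr] = true
	} else {
		delete(m.highways, e.addr)
	}
}

func main() {
	topic := make(chan event, 8) // stands in for the dedicated pub/sub topic
	m := &membership{highways: make(map[string]bool)}

	topic <- event{"1.2.3.4:9330", true}
	topic <- event{"5.6.7.8:9330", true}
	topic <- event{"1.2.3.4:9330", false}
	close(topic)

	for e := range topic {
		m.apply(e)
	}
	fmt.Println(len(m.highways)) // 1: two joins, one leave
}
```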
Discussion and on-going research
There are a few subtle points that we can observe from the design of highway:
- Everything is cryptographically signed by validators, so a highway cannot forge messages in any way. A highway is essentially a messenger and does not interfere with the consensus process.
- If a significant number of highways become unavailable (due to an attack or a network outage), block production might slow down until new highways come online to cover more than ⅔ of the validators.
- The centralization aspect of this design: highway is currently run by the Core Development Team. We are working on a plan to enable other teams/individuals to run and connect their highways to the Incognito network.
Around mid-December 2019, we successfully deployed highways on mainnet. Best of all, it happened entirely in the background, so users didn’t have to do anything to accommodate the change. As of Feb 2020, highways are fully operational and currently being optimized.
As a separate on-going effort, we are evaluating other promising approaches. For example, WebRTC could be used to tackle the NAT/firewall problem. Other techniques like hole punching or circuit relay might also be worth experimenting with.