Highway: an upgrade to Incognito’s network topology

Incognito highway

Incognito uses a pBFT-based Proof-of-Stake consensus protocol. Combined with state sharding, this architecture allows us to maintain high TPS while preserving the security of the network.

Introduction

After 3 months of extensive testing on our public testnet, we encountered 2 problems that would hinder our ability to scale out further:

  • A large percentage of Incognito validators will power the network through Incognito Physical Node device. As the Physical Node is normally set up at home, it will sit behind NAT/firewall so other peers cannot connect to it. Consider an extreme case when all validators of a shard are powered by physical Node devices; none will be able to connect with the others.
  • Each validator’s client must manage too many connections (to other validators in the same shard, to other shards and to the beacon). To maintain sufficient data availability and to prevent message-filtering-attacks, messages must be broadcasted to everyone. This data eats up a lot of bandwidth as well as processing time.

We decided to tackle these problems with a new network topology, using highly dedicated nodes called highways. In simple terms, highways are proxies responsible for forwarding messages inside the network.

The design of highways embraces 4 key principles:

  1. Highly available: highways should always be available to serve node requests; we sacrifice consistency in the presence of network partition.
  2. Incremental scalability: we should be able to scale out one highway at a time according to the needs of the network.
  3. Heterogeneity: work distribution must be proportional to the capabilities of the individual highway.
  4. Symmetry: every highway performs the same functionalities. This leads to better provisioning and maintenance in the long run.

New network topology

14

The above figure gives a highly simplified view of Incognito’s network topology. Highways are responsible for each shard (and beacon). Instead of connecting directly, each node now connects to one highway supporting a shard that they need. All highways are connected (using a mesh/ring or fully-connected topology) to ensure the flow of messages. Also, each highway must have static public IP for other nodes to connect to.

When a node wants to broadcast data, it sends it to the highway instead. Then the highway will broadcast that data to other highways as well as other peers connected to it. Because of the topology (coupled with some additional logic to filter out duplicate messages), we can reduce the redundant data that a peer needs to receive and process. Also, latency is reduced since there are at most 3 hops to pass a message between 2 peers. The following diagram shows these 2 cases (the green path shows traces of a message M from A to B):

API

Just like Incognito chain, highway’s code is also open-source. This is only a reference implementation and everyone is welcome to contribute to it or implement a new one in their favorite programming language.

A highway needs to provide 3 main functions for an Incognito client:

  1. Broadcast and listen for new blocks as they are generated using a Publish/Subscribe model
  2. Request and provide some old blocks according to gRPC framework
  3. Broadcast and listen to the current state of the network

To put it more concretely, a highway satisfies this interface:

type Highway interface {
BroadcastBlock(blockHeight int, blockData []byte, shardID int) error
ListenToBlock(shardID int) (channel []byte, error)
RequestBlock(blockHeight int, shardID int) ([]byte, error)
PublishState(pubkey []byte, state []byte, shardID int) error
}

Scale

Incognito’s goal is to maintain a network of 64 shards, each with 256 validators working together. For the sake of further calculations, let’s assume that we will keep the blocktime (40 seconds/block) and blocksize (maximum 2MB/block).

We will now calculate the bandwidth required for each highway to maintain at maximum network capacity. 3 main types of messages contribute to the workload:

  1. pBFT messages
  2. Block broadcasting messages
  3. Cross-shard messages

For the first type, let’s call N the total number of validators of one shard. If each highway supports K validators, we need at least NK highways. Since each round of pBFT requires O(N2) messages, a single highway will need to receive O(N2NK) = O(N×K) messages (assuming K is much smaller than N). Plug the numbers in, we have 2MB×N×K ⁄ 40s = 12.8×K MB/s.

For the second type, assuming each validator will have 10 other substitutes, we need to broadcast the latest block to K×10 other nodes. This requires 2MB×K×10 ⁄ 40s = 0.5×K MB/s.

For the last type, each highway will receive a new block from 63 other shards (and 1 from beacon), therefore we have 64×2MB×K ⁄ 40s = 3.2×K MB/s.

In total, each highway needs a bandwidth of 16.5×K MB/s to maintain the network at full load. Using a modest K = 32, we obtain the hard requirement of 528 MB/s. In practice, this number is much lower. As of Feb 2020, most blocks on the Incognito mainnet are only around 10 KB each.

Another aspect we can look into is memory. To maintain low latency, highway needs to store in memory the most accessed blocks. To store M epochs of blocks, we need M×2MB/block × 350 blocks/epoch = M×700 MB. M doesn’t need to be high since validators are mostly always caught up to the latest blocks, therefore old blocks are rarely ever requested.

Design choices

For the first design principle (availability), highway must be crash-tolerant. This is achieved through replications. Each shard is supported by at least two highways (preferably on different data centers). Facing network partition, each highway acts on its own to serve the client’s requests (based on the data that it has at the moment). As all messages are rechecked by Incognito validators, there’s no need for strong consistency across all highways. This design choice simplifies a lot of operations under the hood.

The 2nd and 3rd principles are satisfied using consistent hashing, to choose which highway serves which Incognito nodes. This technique has 2 advantages:

  1. We can add/remove highways without affecting the whole network.
  2. In the case of an unstable network (nodes coming offline and online constantly), the topology will stay roughly the same.

For the first function (publish and subscribe blocks), a simple publish/subscribe pattern works fine for our use-case. For maximum compatibility (with old versions of Incognito and with other blockchains), we used the highly modular network stack libp2p.

For the second function (request old blocks), instead of broadcasting request messages to every peer (which would be costly), highway will find an appropriate peer and ask it directly. We implemented this function using the high-performant framework gRPC.

A small but quite important choice is how we maintain the membership of the network of highways. To keep the design simple, we reuse the pub/sub pattern above, using a dedicated topic to gossip the additions and removals of highways.

Discussion and on-going research

There are a few subtle points that we can observe from the design of highway:

  1. Everything is cryptographically signed by validators, highway cannot forge message in any way. Highway is essentially a messenger and doesn’t interfere with the consensus process.
  2. If a significant number of highways become unavailable (either due to attack or network outage), new blocks might be slowed down until new highway comes online to cover more than ⅔ of the validators.
  3. The centralization aspect of this design: highway is currently run by the Core Development Team. We are working on a plan to enable other teams/individuals to run and connect their highways to the Incognito network.

Around mid-December 2019, we successfully deployed highway on mainnet. The best thing is that it all happened in the background so users didn’t have to do anything to accommodate the change. As of Feb 2020, highway is fully operational and is currently being optimized.

As a separate on-going effort, we are evaluating other promising approaches. For example, WebRTC can be used to tackle the problem with NAT/firewall. Also, other techniques like Hole punching or Circuit relay might worth some experiments.

References

[GS12] S. Gilbert and N. Lynch, “Perspectives on the CAP Theorem” in Computer, vol. 45, no. 02, pp. 30-36, 2012.

[KLL97] David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing (STOC ’97). Association for Computing Machinery, New York, NY, USA, 654–663.

[CHJ07] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: amazon’s highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles (SOSP ’07). Association for Computing Machinery, New York, NY, USA, 205–220.

20 Likes

What does actually occur in this scenario? What is the potential effect on vNodes and validators?

1 Like

Hey @Mike_Despo, great questions!
In this situation, ‘attack’ means that someone is intentionally preventing highways from forwarding messages in the network (e.g., DDOS the highways, exploit bugs in code, …). Likewise, a network outage can occur (e.g., faulty equipment in the data centers, intercontinental network cable got broken, …) that will result in similar effects.

In these scenarios, validators (both vNodes and pNodes) cannot pass the data around to create new blocks (Incognito uses a variant of pBFT, votes/attestations for each block must be seen by 2/3 of the network for that block to be finalized). Validators will keep trying to create blocks to no avail. Since the block reward is proportional to the number of blocks created, validators might receive less PRV for that epoch.

Of course, when new highways go online, everything will resume to normal behaviors.

5 Likes

Interesting, thanks for the explanation!

1 Like

Is it currently possible to setup and run a highway node? If not, is this on the roadmap?

If I understand correctly this is currently the weakest link in incognito’s infrastructure as mentioned in the original post about someone DDOSing the limited amount of central highways currently established.

@hyng

2 Likes

:+1: SHARD VERY COOL! I get VERY VERY COOL!

4 Likes

I think two highways will not be enough for network availability if they are not functioning the same, as different highways support different sets of validators.

1 Like

In network design, if pBTF is employed, I think there should be at least 4 replications of the same thing to make sure the network still be functional.