[Shipped] Dynamic committee size

ruler · 20 April 2020 03:54

Great post @dungtran. There are two point in my mind about Dynamic Committee Size:

Committee Auto Adjustment between epoch. After each epoch the algorithm automatically adjust the committee size of all shards and beacon, so the can balanced the number of waiting candidate as well as committee in shard. I think fixed committee size in one epoch time is still reasonable at early stage of development. Because Incognito’s epoch is relatively short, in fact it’s really short (just 3 hour), compared to 24 hours of Harmony.one and others (maybe days). So if we got an honest validator from candidate list. The probability that validator still honest in next three hour is high. It’s too little time to corrupt that validator during an epoch.
Dynamic committee size during an epoch: I think this would help slashing become more effectively. Because Incognito Chain is using PBFT, so committee is very important to identify how many signatures proposer need to get so as to commit a block. So when nodes try to kick out some validator out of committee, we need to adjust the committee size in order to continue producing block with PBFT. In my opinion, this problem is very rare because of the assumption: we got 2f + 1 honest node in our system.

So I think Point 1 should be solved first then Point 2 next. The reason why Point 1 should be solved first because:

Incognito Chain currently has a lot of waiting candidate in list, it’s very much unbalanced 176 node in committee and more than 1000 node in waiting list. If we balance these two number then we greatly enhance the network security. Of course this also push a lot of pressure on network bandwidth and synchronization.
Beside committee size is algorithmically changed in a secure way. Not just waiting for an external force.

I think Point 1 match @dungtran Problem 1 then Point 2 match Problem 2. I haven’t thought much about Problem 3.

ruler · 20 April 2020 04:14

In Problem 2, i have a question. Why don’t we let shard change committee itself then update to beacon later? This enhance shard independent ability also avoiding beacon single point of failure. Is there any other problem?

dungtran · 20 April 2020 04:20

Thank you for interesting questions.

The committee size is actually determined by the protocol, not by Beacon committee. The Beacon chain only stores the aggregated signature on its chain as a proof of missing-vote nodes.
If the beacon chain failed to commit new blocks for a while, it can base on this rule to auto replace the disconnected nodes.

In practice, the aggregated signature might be mistaken to include the signatures from some validators. However, we are doing statistical analysis to find the error rates. For example if some validators missed their votes in more than 30% of committed blocks while these rates are less than 10% for all other validators. If it’s the case, it’s safe to say that some validators don’t have fairly contribute to the robustness of the chain. The chain could somehow limit their appearance in the committee until they improve the performance. The protocol could base on the beacon on-chain db to make the decisions.

dungtran · 20 April 2020 04:52

If a shard auto updates its committees, then it has to wait until the beacon updates this new committee on the beacon chain, and other shards can acknowledge the update, i.e the system (beacon/shards) has the same view. Therefore, if the beacon chain auto updates the shard committee by the protocol, then we should save one step of submitting a new committee by a shard to the beacon chain.

ruler · 22 April 2020 03:19

However, on the other hands if we let beacon chain auto updates shard committee by protocol, shard has to wait for beacon to acknowledge there’s some problem in shard. Then beacon chain has to committed new block or a kind of message that beacon chain agree to change shard committee.
If we let’s shard change committee by itself, shard could continue to work more quickly. So there’s a tradeoff here, I think we should choose what benefit to maximize.

Shard update committee itself: Shard can continue to work quickly, but it will delay beacon and other to receive new committee shard list. In this case, how about, shard auto adjust his committee size, produce new block with slashing proof and committee size adjustment
Beacon update shard committee: Shard will have to wait for beacon update, this would delay block creation. But help network to acknowledge new shard committee quickly

dungtran · 23 April 2020 03:46

Cross-shard tx won’t be accepted by Beacon or other shards, since they have no idea about the new committee.
Per security perspective, beacon itself cannot process any tx. However, a shard committee can process tx, the system may introduce more risk when allowing committee able to change by themselves then they process their favor tx. Even updating new committee must respect the defined protocol, but if there is a chance to update to attacker’s “friendly” committee, they are willing create the trigger for committee change happen.

ruler · 24 April 2020 02:24

We can add constraint in block like this. Said, no cross-shard-tx in this kind of block or block producer will consider byzantine.

I totally agree with this point. There will be a risk if we think shard change committee via some kind of tx.
However, if we take a look back at the figure 1 above. Shard node some how send the best collected signatures to beacon node to demonstrate that block can’t be finalized and committee size must be shrink.
No matter shard update committee by itself or beacon. Shard nodes must provide some kind of proof that verifiable by beacon. Otherwise, a malicious validator with enough power to change committees anytime they want. So I think the point here is, what kind of proof that shard present to beacon. This proof must be verifiable by beacon, or even it will be challenged by other nodes in committee

dungtran · 24 April 2020 10:03

Each node submits to the beacon its best signature collection, beacon will choose the biggest one.

dungtran · 24 April 2020 10:22

Update to April 24th:

Let n be the total number of validators (candidate + committee)
Let m be the committee size
Let k be the number of validators owned by attacker

Assume that the probability of a validator to be selected into committee is uniform and independent, i.e this probability = 1/n.

Let P be the probability of k validators of the attacker will be in committee at the same time.
P = C(m-k, n-k)/C(m, n)
where C(x, y) is the total possible combinations of x items from the set of y items.

Assume that given some fixed n, k, we are working on finding the best m (committee size) such that P is minimized.

mesquka · 25 April 2020 04:29

So the node members submit their signature collections to the beacon committee to check? What’s the contents of the batch, just the signatures? Is there a window for a node to challenge? If so how? By submitting their partial signature? Is it possible to verify this post-hoc?

ruler · 27 April 2020 02:46

I think signatures collection might reflect about one’s availability. One problem is, this collection totally depends on the node which create it. Byzantine node can easily delete one’s signatures in its collection. In this case, Byzantine node somehow can manage to kick other node out. Then signature collection doesn’t reflect one’s availability any more. Do you think about some bad situation we might fall in? There are some I think of:

What if we have two signature collection of the same size, but the are not entirely the same?
What if there are two collections, a bigger (B) one and a smaller (S) one, but S is not subset of B? Some signatures in S didn’t appear in B.
What if there are two disjoint set of signatures?

Moreover, should we have challenge time? During this period other node can submit and challenge their signature collect. The purpose is to figure out byzantine node or get the best available signature collection.

dungtran · 27 April 2020 04:26

In the normal case, a node collects enough signatures (> 2/3n) then commits and forward block header + aggregated signatures to the beacon committee. Nodes which failed to collect enough signatures won’t commit or send anything to the beacon committee. However, after multiple consecutive failed to commit blocks (~25-30 blocktimes), node starts to send whatever it can collect to the beacon. Beacon based on this information to update the new committee in case.

ruler · 2 May 2020 02:52

@dungtran after researching for a while, I have some more issues.

Balance shards’ committee size
We need to consider the number of committee between shard, we should try to balance them or one shard will get much more committee than other shard. Which mean one shard is more secure than other shard. Random assignment validator into shard might get equal distribution. But with dynamic committee size approach, one shard can kick off many validators than other shard. This leads to an unbalance in shards committee size.
Slowly adaptive:

If we keep adding without swapping any node out. Byzantine nodes can stay long enough then slowly adaptive and corrupt this shard.

Imagine we have many nodes which enough to corrupt one shard but not the entire blockchain, and we want to put them into one shard.

Step 1: We stake them into our network. Thanks to random assignment, these nodes will be distributed to shard randomly.
Step 2: Then we choose a target shard, which happens to have most of my nodes assigned in (this could be equal, so we can choose any target shard)
Step 3: For nodes, which fall into other shard, will try to get out then re-stake again. Until nodes get into a particular target shard.

I can shut down this node, so this will be kick out of committee. Because of most of my stake are safe. Re-staking would be an easy job until our node get into a target shard.

In case, kick out of committee mean slashing the stake amount. Node can waiting for one epoch the re-staking. This could make slowly adaptive harder to achieve but still possible
The more we tried to get in target shard, the more likely the waiting list will be > 120% committee size. So none of my node in that committee won’t be kicked out.

In step 3: if nodes in our target shard can stay long enough then we just can wait for our nodes in other shard to finish their job and get out then re-stake.
Also notice that, unbalanced in shard committee-size would make this kind of attack more practical. Shard with smaller committee can be chosen as target easily.

Notice: the more stake at the beginning one have, the more reward he/she will get. Which make attackers have more PRV to conduct their attack.

In summary, we could encounter these problem:

Balancing shard committee size
How to fight against slowly adaptive attacker?
A proper slashing mechanism
A supportive validator assignment and committee swap

annie · 8 May 2020 08:00

This received 5/5 vote and is funded @dungtran. Congrats! I will move this under “funded projects”.

Annie

dungtran · 8 May 2020 10:38

Update on May 8th, 2020

Work with @ruler regarding randomly assign nodes to shard

Discuss about possibility of applying “bounded cuckoo rule” http://www.cs.jhu.edu/~baruch/RESEARCH/Research_areas/Peer-to-Peer/2006_SPAA/virtual5.pdf
Life cycle of validator
- Common pool: new staking note
- Probation pool: penalized node by slashing rules
- Shard pool: node assigned in a shard, it may have 3 states: downloading, ready (candidate), duty (in committee).

Note: To avoid the join-leave attack (attacker would like to send its nodes into the same shards), unstake is executed only when it’s in the common pool.

Committee size: taking into account the feasibility of Ethereum bridge feature.

Case 1. Fewer than 30 committee members

Assign all candidates to committees, i.e allow empty waiting list
No lower bound of committee members

Case 2. 30-150 committee members

The size of the waiting list is 100-200% of the current committee size
1. Committee size is unchanged, 10% of committee members are replaced at each epoch.
If number of candidates in waiting list > 200% of the current committee size
1. Increase committee size by up to 15% of the current committee size.
2. While the committee size increases, don’t replace any current committee members.
If the number of candidates in waiting list < 100% of the current committee size
1. Reduce the committee size by up to 10% and keep committee size not less than 30.
2. While committee size reduces, replace 10% of current committee members (after reduced).

Case 3. Greater than 150 committee members

If the size of the waiting list is 400% of the current committee size, every additional 100% of the current committee size the staking amount increased by 1750 PRV for new staking nodes.

For example:
- 400-500% in waiting list, new staking amount = 1750 + 1750 = 3500PRV.
- 500-600% in waiting list, new staking amount = 3500 + 1750 = 5250PRV.
and so on.

Based on our analysis in the post on April 24th, the size of the waiting list is proportional to the security of the chain, but the earning of the validators is reduced. Hence to give credit for early adoption validators, increasing the staking amount for late one is reasonable.

We continue working on the problem of balancing shard committee size

hungngo · 19 May 2020 07:46

Update on May 15th, 2020
Our weekly progress has been update in this document: https://docs.google.com/document/d/19KChrg0B2LmfT_t-K8vovCG-mXUkAYG88j5FQsKPXT4/edit?usp=sharing

Achievement:

Define more specific constraints in section 1
Re-structuring role of nodes in shard pool. Refine staking flow, link.
New swapping rule, which satisfies pre-defined constraints
Incognito chain will give beacon node more power, which make them be able to fully verify shard block

Next Research Problem:

How we handle faulty block?
How to handle data availability in one shard?
What will chain do next if a faulty block is found
Dynamic sharding
Draw more details activity chart for each operation

raz · 19 May 2020 08:02

You are doing a great job @hungngo ! Thank you for being one of the technical saviors of Incognito!

hungngo · 19 May 2020 08:04

thanks a lot @raz

hungngo · 24 May 2020 15:49

Update on May 22th, 2020

Achievement:

Agreement on new feature beacon node: be able to verify all shard (all transactions and block).
Agreement on all problem statement and solutions (provided in link below)
Proposed implementation roadmap

Notice:

Some of minor issue or not urgent issue we still leave un-solved. Because, the more time we have, the more information and knowledge we occupy. Team will solve all any unclear problems later.

Our weekly progress has been update in this document: https://docs.google.com/document/d/19KChrg0B2LmfT_t-K8vovCG-mXUkAYG88j5FQsKPXT4/edit?usp=sharing

hungngo · 29 May 2020 08:39

Dear everyone, we are moving to implementation phase, follow us on Dynamic committee size and dynamic sharding: implementation phase