More about today's network issue

Hi everyone, here’s a follow-up regarding today’s network issues.

As some of you will have noticed, the mainnet is now back up and running. What happened on the backend is fairly straightforward – a few servers on which the core team hosts many of its nodes went down last night. The outage is linked to a single data center that hosts not just these servers, but also many others.

As you probably know, Incognito implements a variant of pBFT at the consensus layer. For every shard block to be created, the consensus layer requires votes (signatures) from more than ⅔ of the current shard committee. As a result of the outage, there weren’t enough signatures to produce a new shard block.
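To make the ⅔ rule concrete, here is a minimal sketch of the quorum check described above. It is not Incognito's actual consensus code: the function names and the committee size of 32 used in the usage note are assumptions chosen purely for illustration.

```python
def has_quorum(signatures: int, committee_size: int) -> bool:
    """Return True if strictly more than 2/3 of the committee has signed."""
    # Integer arithmetic avoids rounding errors:
    # signatures / committee_size > 2/3  <=>  3 * signatures > 2 * committee_size
    return 3 * signatures > 2 * committee_size


def min_signatures(committee_size: int) -> int:
    """Smallest number of signatures that is strictly more than 2/3."""
    return (2 * committee_size) // 3 + 1


def max_offline(committee_size: int) -> int:
    """How many committee members can be offline before blocks stop."""
    return committee_size - min_signatures(committee_size)
```

With a hypothetical committee of 32 members, `min_signatures(32)` is 22 and `max_offline(32)` is 10, so block production would stall as soon as 11 members become unreachable at once. That is why concentrating many committee nodes in one data center is risky.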

We immediately re-deployed our nodes across additional data centers, so that if any provider goes down in the future, there will be a much smaller risk of block creation being halted. This will do for the short term, but we have already begun work on longer-term solutions designed to keep the network stable and adaptable through multiple eventualities.

One such solution is Dynamic Committee Size. Once this is implemented (see Roadmap, Initiative 5, Objective 5.1), similar incidents will not halt block production. New blocks will continue to be produced even if some committee members go offline.

Additionally, we’re considering bringing slashing back. Slashing was part of the testnet consensus last year, but was later removed from the mainnet deployment as the initial design was deemed too stringent. We are currently revising our slashing policy so that it is friendly to validators, while remaining sufficiently rigorous to deter nodes from going offline.

Our aim, as it has always been, is to be as transparent as possible regarding every aspect of the project. So if you have any questions at all, please feel free to reach out.

With that said, I guess the issue I noticed earlier, the “recommendation” that the community set up new nodes on Digital Ocean as the only choice, should also be changed.

As you can see, holding most of the nodes on one VPS provider makes them dependent on that provider’s stability, which decreases decentralization and network security.

just my 2 cents)

I didn’t even know until now! So does this mean that the downtime is also a loss in mining crypto for that period? Thanks for the information!

thanks @nickvasilich. this is why we built node.

node solves 2 problems. the first one is obvious – it’s the ease of use for the everyday consumer. if we do it right, there will be a node in every home.

the second one is less obvious – but it’s for the security and stability of the network. nodes are not centrally located in one or a few data centers like VPS. nodes are spread out around the world in thousands of individual houses. so if one node goes down, it won’t affect the network as much as one data center going down.

for sure - the net was down - no blocks - no rewards per block

That beats all odds there! But it’s better to work the kinks and bugs out now, not when there are 2-3 million clients!! Good work on getting the mainnet back up!!! Also… is this the reason no badges were paid out yesterday?

sure, and what I mean is related to problem #2 you described above - we shouldn’t suggest that new nodes use only DO, as is mentioned here:

image

good point. @annie can you consider revising it so that virtual nodes are not too centrally located at a single provider (digital ocean)?

@ning I really appreciate your detailed answer! Thank you

These fixed nodes are one of the issues I questioned a few weeks ago in the Hard Questions thread. What happened yesterday is not tragic, and the team was able to recover relatively fast, but it still reflects the current centralization of the network. I could imagine situations where a similar issue has fatal consequences.

The technical background of this decision (fixed nodes) is clear: currently the network would be too small to risk it. So I really hope the flexible committee will heal this weakness of Incognito.

You also fail to mention that releasing more validator spots from the admin team would also help prevent this from happening. This is another reason not to push back the original timeline.

rick… the reason was stated in the roadmap initiative 5. it was also mentioned (and linked to) in the above update.

A list of providers and recommended options / settings across key countries would be a good start. In the U.K. for example a couple of prominent providers are ‘Fasthosts’ and ‘Cobweb’. Hope that helps!

@ning Please don’t add slashing. I think it would punish nontechnical users the most and make them turn away from running a node. I think the current decision not to punish someone who has a power outage at home is fair… technical users probably use the best cloud hosting providers anyway, with 100% uptime.

thanks @raz, that’s a fair perspective. i’m gonna loop @dungtran here on this - he’s heading up the dynamic committee size proposal.

@dungtran, let us know your thoughts!

Looking at UK Fasthosts VPS hosting - what would be the minimum to get away with running a virtual node?

Here are the available specs: https://www.fasthosts.co.uk/virtual-private-servers

The 20 a month one

I assume you mean the £24 / month one as that is not an option?

This one

  • 4 vCPU
  • 8GB RAM
  • 120GB SSD