Approve multiple candidates with a single signature (#1191)

Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,

## Overall idea

When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.

This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:

- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.

## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.

## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x]  Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
This commit is contained in:
Alexandru Gheorghe
2023-12-13 08:43:15 +02:00
committed by GitHub
parent d18a682bf7
commit a84dd0dba5
82 changed files with 5883 additions and 1483 deletions
@@ -22,8 +22,8 @@ use polkadot_node_network_protocol::{
grid_topology::{GridNeighbors, RequiredRouting, SessionBoundGridTopologyStorage},
peer_set::{IsAuthority, PeerSet, ValidationVersion},
v1::{self as protocol_v1, StatementMetadata},
v2 as protocol_v2, vstaging as protocol_vstaging, IfDisconnected, PeerId,
UnifiedReputationChange as Rep, Versioned, View,
v2 as protocol_v2, v3 as protocol_v3, IfDisconnected, PeerId, UnifiedReputationChange as Rep,
Versioned, View,
};
use polkadot_node_primitives::{
SignedFullStatement, Statement, StatementWithPVD, UncheckedSignedFullStatement,
@@ -1075,7 +1075,7 @@ async fn circulate_statement<'a, Context>(
})
.partition::<Vec<_>, _>(|(_, _, version)| match version {
ValidationVersion::V1 => true,
ValidationVersion::V2 | ValidationVersion::VStaging => false,
ValidationVersion::V2 | ValidationVersion::V3 => false,
}); // partition is handy here but not if we add more protocol versions
let payload = v1_statement_message(relay_parent, stored.statement.clone(), metrics);
@@ -1108,8 +1108,7 @@ async fn circulate_statement<'a, Context>(
.collect();
let v2_peers_to_send = filter_by_peer_version(&peers_to_send, ValidationVersion::V2.into());
let vstaging_to_send =
filter_by_peer_version(&peers_to_send, ValidationVersion::VStaging.into());
let v3_to_send = filter_by_peer_version(&peers_to_send, ValidationVersion::V3.into());
if !v2_peers_to_send.is_empty() {
gum::trace!(
@@ -1126,17 +1125,17 @@ async fn circulate_statement<'a, Context>(
.await;
}
if !vstaging_to_send.is_empty() {
if !v3_to_send.is_empty() {
gum::trace!(
target: LOG_TARGET,
?vstaging_to_send,
?v3_to_send,
?relay_parent,
statement = ?stored.statement,
"Sending statement to vstaging peers",
"Sending statement to v3 peers",
);
ctx.send_message(NetworkBridgeTxMessage::SendValidationMessage(
vstaging_to_send,
compatible_v1_message(ValidationVersion::VStaging, payload.clone()).into(),
v3_to_send,
compatible_v1_message(ValidationVersion::V3, payload.clone()).into(),
))
.await;
}
@@ -1472,10 +1471,8 @@ async fn handle_incoming_message<'a, Context>(
let message = match message {
Versioned::V1(m) => m,
Versioned::V2(protocol_v2::StatementDistributionMessage::V1Compatibility(m)) |
Versioned::VStaging(protocol_vstaging::StatementDistributionMessage::V1Compatibility(
m,
)) => m,
Versioned::V2(_) | Versioned::VStaging(_) => {
Versioned::V3(protocol_v3::StatementDistributionMessage::V1Compatibility(m)) => m,
Versioned::V2(_) | Versioned::V3(_) => {
// The higher-level subsystem code is supposed to filter out
// all non v1 messages.
gum::debug!(
@@ -2201,8 +2198,7 @@ fn compatible_v1_message(
ValidationVersion::V1 => Versioned::V1(message),
ValidationVersion::V2 =>
Versioned::V2(protocol_v2::StatementDistributionMessage::V1Compatibility(message)),
ValidationVersion::VStaging => Versioned::VStaging(
protocol_vstaging::StatementDistributionMessage::V1Compatibility(message),
),
ValidationVersion::V3 =>
Versioned::V3(protocol_v3::StatementDistributionMessage::V1Compatibility(message)),
}
}