Approve multiple candidates with a single signature (#1191)

Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,

## Overall idea

When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.

This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:

- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.

## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.

## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x]  Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
This commit is contained in:
Alexandru Gheorghe
2023-12-13 08:43:15 +02:00
committed by GitHub
parent d18a682bf7
commit a84dd0dba5
82 changed files with 5883 additions and 1483 deletions
+164 -1
View File
@@ -16,14 +16,23 @@
//! Time utilities for approval voting.
use futures::prelude::*;
use futures::{
future::BoxFuture,
prelude::*,
stream::{FusedStream, FuturesUnordered},
Stream, StreamExt,
};
use polkadot_node_primitives::approval::v1::DelayTranche;
use sp_consensus_slots::Slot;
use std::{
collections::HashSet,
pin::Pin,
task::Poll,
time::{Duration, SystemTime},
};
use polkadot_primitives::{Hash, ValidatorIndex};
const TICK_DURATION_MILLIS: u64 = 500;
/// A base unit of time, starting from the Unix epoch, split into half-second intervals.
@@ -88,3 +97,157 @@ pub(crate) fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick
let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
u64::from(slot) * ticks_per_slot
}
/// A list of delayed futures that gets triggered when the waiting time has expired and it is
/// time to sign the candidate.
/// We have a timer per relay-chain block.
#[derive(Default)]
pub struct DelayedApprovalTimer {
timers: FuturesUnordered<BoxFuture<'static, (Hash, ValidatorIndex)>>,
blocks: HashSet<Hash>,
}
impl DelayedApprovalTimer {
/// Starts a single timer per block hash
///
/// Guarantees that if a timer already exits for the give block hash,
/// no additional timer is started.
pub(crate) fn maybe_arm_timer(
&mut self,
wait_untill: Tick,
clock: &dyn Clock,
block_hash: Hash,
validator_index: ValidatorIndex,
) {
if self.blocks.insert(block_hash) {
let clock_wait = clock.wait(wait_untill);
self.timers.push(Box::pin(async move {
clock_wait.await;
(block_hash, validator_index)
}));
}
}
}
impl Stream for DelayedApprovalTimer {
type Item = (Hash, ValidatorIndex);
fn poll_next(
mut self: std::pin::Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Option<Self::Item>> {
let poll_result = self.timers.poll_next_unpin(cx);
match poll_result {
Poll::Ready(Some(result)) => {
self.blocks.remove(&result.0);
Poll::Ready(Some(result))
},
_ => poll_result,
}
}
}
impl FusedStream for DelayedApprovalTimer {
fn is_terminated(&self) -> bool {
self.timers.is_terminated()
}
}
#[cfg(test)]
mod tests {
use std::time::Duration;
use futures::{executor::block_on, FutureExt, StreamExt};
use futures_timer::Delay;
use polkadot_primitives::{Hash, ValidatorIndex};
use crate::time::{Clock, SystemClock};
use super::DelayedApprovalTimer;
#[test]
fn test_select_empty_timer() {
block_on(async move {
let mut timer = DelayedApprovalTimer::default();
for _ in 1..10 {
let result = futures::select!(
_ = timer.select_next_some() => {
0
}
// Only this arm should fire
_ = Delay::new(Duration::from_millis(100)).fuse() => {
1
}
);
assert_eq!(result, 1);
}
});
}
#[test]
fn test_timer_functionality() {
block_on(async move {
let mut timer = DelayedApprovalTimer::default();
let test_hashes =
vec![Hash::repeat_byte(0x01), Hash::repeat_byte(0x02), Hash::repeat_byte(0x03)];
for (index, hash) in test_hashes.iter().enumerate() {
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
}
let timeout_hash = Hash::repeat_byte(0x02);
for i in 0..test_hashes.len() * 2 {
let result = futures::select!(
(hash, _) = timer.select_next_some() => {
hash
}
// Timers should fire only once, so for the rest of the iterations we should timeout through here.
_ = Delay::new(Duration::from_secs(2)).fuse() => {
timeout_hash
}
);
assert_eq!(test_hashes.get(i).cloned().unwrap_or(timeout_hash), result);
}
// Now check timer can be restarted if already fired
for (index, hash) in test_hashes.iter().enumerate() {
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
}
for i in 0..test_hashes.len() * 2 {
let result = futures::select!(
(hash, _) = timer.select_next_some() => {
hash
}
// Timers should fire only once, so for the rest of the iterations we should timeout through here.
_ = Delay::new(Duration::from_secs(2)).fuse() => {
timeout_hash
}
);
assert_eq!(test_hashes.get(i).cloned().unwrap_or(timeout_hash), result);
}
});
}
}