Files
pezkuwi-subxt/polkadot/node/core/approval-voting/src/time.rs
T
Alexandru Gheorghe f9f886886b Introduce approval-voting/distribution benchmark (#2621)
## Summary
Built on top of the tooling and ideas introduced in
https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.

Currently this allows, us to simulate the behaviours of these systems
based on the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```

## The approach
1. We build a real overseer with the real implementations for
approval-voting and approval-distribution subsystems.
2. For a given network size, for each validator we pre-computed all
potential assignments and approvals it would send, because this a
computation heavy operation this will be cached on a file on disk and be
re-used if the generation parameters don't change.
3. The messages will be sent accordingly to the configured parameters
and those are split into 3 main benchmarking scenarios.

## Benchmarking scenarios

### Best case scenario *approvals_throughput_best_case.yaml*
It send to the approval-distribution only the minimum required tranche
to gathered the needed_approvals, so that a candidate is approved.

### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.

### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the used CPU and
necessary network bandwidth. by the approval-voting and
approval-distribution subsystem.

## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing locally prometheus and grafana, all real metrics for the
`approval-distribution`, `approval-voting` and overseer are available.
E.g:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">

<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">

<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">

<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">

### Profile with pyroscope
1. Setup pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenario with `--profile` as the
arguments.
2. Open the pyroscope dashboard in grafana, e.g:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">



### Useful  logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```

2. Cpu usage by the approval-distribution/approval-voting subsystems.
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

3. Time passed until a given block is approved
```
 Chain selection approved  after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved  after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```

### Using benchmark to quantify improvements from
https://github.com/paritytech/polkadot-sdk/pull/1178 +
https://github.com/paritytech/polkadot-sdk/pull/1191

Using a versi-node we compare the scenarios where all new optimisations
are disabled with a scenarios where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.

#### Best case scenario comparison(minimum required tranches sent).
Unoptimised
```
    Number of blocks: 10
    Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
    Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
    approval-distribution CPU usage 6.732s
    approval-distribution CPU usage per block 0.673s
    approval-voting CPU usage 9.523s
    approval-voting CPU usage per block 0.952s
```

vs Optimisation enabled
```
   Number of blocks: 10
   Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
   Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
   approval-distribution CPU usage 4.658s
   approval-distribution CPU usage per block 0.466s
   approval-voting CPU usage 6.236s
   approval-voting CPU usage per block 0.624s
```

#### Worst case all tranches sent, very unlikely happens when sharding
breaks.

Unoptimised
```
   Number of blocks: 10
   Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
   Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
   approval-distribution CPU usage 118.681s
   approval-distribution CPU usage per block 11.868s
   approval-voting CPU usage 124.118s
   approval-voting CPU usage per block 12.412s
```

vs optimised
```
    Number of blocks: 10
    Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
    Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
    approval-distribution CPU usage 84.061s
    approval-distribution CPU usage per block 8.406s
    approval-voting CPU usage 96.532s
    approval-voting CPU usage per block 9.653s
```


## TODOs
[x] Polish implementation.
[x] Use what we have so far to evaluate
https://github.com/paritytech/polkadot-sdk/pull/1191 before merging.
[x] List of features and additional dimensions we want to use for
benchmarking.
[x] Run benchmark on hardware similar with versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in
performance.
[ ] Rebase on latest changes for network emulation.

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
2024-02-05 06:46:22 +00:00

266 lines
7.2 KiB
Rust

// Copyright (C) Parity Technologies (UK) Ltd.
// This file is part of Polkadot.
// Polkadot is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// Polkadot is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
// You should have received a copy of the GNU General Public License
// along with Polkadot. If not, see <http://www.gnu.org/licenses/>.
//! Time utilities for approval voting.
use futures::{
future::BoxFuture,
prelude::*,
stream::{FusedStream, FuturesUnordered},
Stream, StreamExt,
};
use polkadot_node_primitives::approval::v1::DelayTranche;
use sp_consensus_slots::Slot;
use std::{
collections::HashSet,
pin::Pin,
task::Poll,
time::{Duration, SystemTime},
};
use polkadot_primitives::{Hash, ValidatorIndex};
pub const TICK_DURATION_MILLIS: u64 = 500;
/// A base unit of time, starting from the Unix epoch, split into half-second intervals.
pub type Tick = u64;
/// A clock which allows querying of the current tick as well as
/// waiting for a tick to be reached.
pub trait Clock {
/// Yields the current tick.
fn tick_now(&self) -> Tick;
/// Yields a future which concludes when the given tick is reached.
fn wait(&self, tick: Tick) -> Pin<Box<dyn Future<Output = ()> + Send + 'static>>;
}
/// Extension methods for clocks.
pub trait ClockExt {
fn tranche_now(&self, slot_duration_millis: u64, base_slot: Slot) -> DelayTranche;
}
impl<C: Clock + ?Sized> ClockExt for C {
fn tranche_now(&self, slot_duration_millis: u64, base_slot: Slot) -> DelayTranche {
self.tick_now()
.saturating_sub(slot_number_to_tick(slot_duration_millis, base_slot)) as u32
}
}
/// A clock which uses the actual underlying system clock.
#[derive(Clone)]
pub struct SystemClock;
impl Clock for SystemClock {
/// Yields the current tick.
fn tick_now(&self) -> Tick {
match SystemTime::now().duration_since(SystemTime::UNIX_EPOCH) {
Err(_) => 0,
Ok(d) => d.as_millis() as u64 / TICK_DURATION_MILLIS,
}
}
/// Yields a future which concludes when the given tick is reached.
fn wait(&self, tick: Tick) -> Pin<Box<dyn Future<Output = ()> + Send>> {
let fut = async move {
let now = SystemTime::now();
let tick_onset = tick_to_time(tick);
if now < tick_onset {
if let Some(until) = tick_onset.duration_since(now).ok() {
futures_timer::Delay::new(until).await;
}
}
};
Box::pin(fut)
}
}
fn tick_to_time(tick: Tick) -> SystemTime {
SystemTime::UNIX_EPOCH + Duration::from_millis(TICK_DURATION_MILLIS * tick)
}
/// assumes `slot_duration_millis` evenly divided by tick duration.
pub fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
u64::from(slot) * ticks_per_slot
}
/// Converts a tick to the slot number.
pub fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> Slot {
let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
(tick / ticks_per_slot).into()
}
/// Converts a tranche from a slot to the tick number.
pub fn tranche_to_tick(slot_duration_millis: u64, slot: Slot, tranche: u32) -> Tick {
slot_number_to_tick(slot_duration_millis, slot) + tranche as u64
}
/// A list of delayed futures that gets triggered when the waiting time has expired and it is
/// time to sign the candidate.
/// We have a timer per relay-chain block.
#[derive(Default)]
pub struct DelayedApprovalTimer {
timers: FuturesUnordered<BoxFuture<'static, (Hash, ValidatorIndex)>>,
blocks: HashSet<Hash>,
}
impl DelayedApprovalTimer {
/// Starts a single timer per block hash
///
/// Guarantees that if a timer already exits for the give block hash,
/// no additional timer is started.
pub(crate) fn maybe_arm_timer(
&mut self,
wait_untill: Tick,
clock: &dyn Clock,
block_hash: Hash,
validator_index: ValidatorIndex,
) {
if self.blocks.insert(block_hash) {
let clock_wait = clock.wait(wait_untill);
self.timers.push(Box::pin(async move {
clock_wait.await;
(block_hash, validator_index)
}));
}
}
}
impl Stream for DelayedApprovalTimer {
type Item = (Hash, ValidatorIndex);
fn poll_next(
mut self: std::pin::Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Option<Self::Item>> {
let poll_result = self.timers.poll_next_unpin(cx);
match poll_result {
Poll::Ready(Some(result)) => {
self.blocks.remove(&result.0);
Poll::Ready(Some(result))
},
_ => poll_result,
}
}
}
impl FusedStream for DelayedApprovalTimer {
fn is_terminated(&self) -> bool {
self.timers.is_terminated()
}
}
#[cfg(test)]
mod tests {
use std::time::Duration;
use futures::{executor::block_on, FutureExt, StreamExt};
use futures_timer::Delay;
use polkadot_primitives::{Hash, ValidatorIndex};
use crate::time::{Clock, SystemClock};
use super::DelayedApprovalTimer;
#[test]
fn test_select_empty_timer() {
block_on(async move {
let mut timer = DelayedApprovalTimer::default();
for _ in 1..10 {
let result = futures::select!(
_ = timer.select_next_some() => {
0
}
// Only this arm should fire
_ = Delay::new(Duration::from_millis(100)).fuse() => {
1
}
);
assert_eq!(result, 1);
}
});
}
#[test]
fn test_timer_functionality() {
block_on(async move {
let mut timer = DelayedApprovalTimer::default();
let test_hashes =
vec![Hash::repeat_byte(0x01), Hash::repeat_byte(0x02), Hash::repeat_byte(0x03)];
for (index, hash) in test_hashes.iter().enumerate() {
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
}
let timeout_hash = Hash::repeat_byte(0x02);
for i in 0..test_hashes.len() * 2 {
let result = futures::select!(
(hash, _) = timer.select_next_some() => {
hash
}
// Timers should fire only once, so for the rest of the iterations we should timeout through here.
_ = Delay::new(Duration::from_secs(2)).fuse() => {
timeout_hash
}
);
assert_eq!(test_hashes.get(i).cloned().unwrap_or(timeout_hash), result);
}
// Now check timer can be restarted if already fired
for (index, hash) in test_hashes.iter().enumerate() {
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
timer.maybe_arm_timer(
SystemClock.tick_now() + index as u64,
&SystemClock,
*hash,
ValidatorIndex::from(2),
);
}
for i in 0..test_hashes.len() * 2 {
let result = futures::select!(
(hash, _) = timer.select_next_some() => {
hash
}
// Timers should fire only once, so for the rest of the iterations we should timeout through here.
_ = Delay::new(Duration::from_secs(2)).fuse() => {
timeout_hash
}
);
assert_eq!(test_hashes.get(i).cloned().unwrap_or(timeout_hash), result);
}
});
}
}