mirror of
https://github.com/pezkuwichain/pezkuwi-subxt.git
synced 2026-06-14 04:01:10 +00:00
Introduce approval-voting/distribution benchmark (#2621)
## Summary

Built on top of the tooling and ideas introduced in https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces a synthetic benchmark for measuring and assessing the performance characteristics of the approval-voting and approval-distribution subsystems.

Currently this allows us to simulate the behaviour of these systems along the following dimensions:

```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```

## The approach

1. We build a real overseer with the real implementations of the approval-voting and approval-distribution subsystems.
2. For a given network size, we pre-compute for each validator all potential assignments and approvals it would send. Because this is a computation-heavy operation, the result is cached in a file on disk and re-used as long as the generation parameters don't change.
3. The messages are sent according to the configured parameters, which are split into 3 main benchmarking scenarios.

## Benchmarking scenarios

### Best case scenario *approvals_throughput_best_case.yaml*

It sends to approval-distribution only the minimum required tranches to gather the needed_approvals, so that a candidate is approved.

### Behaviour in the presence of no-shows *approvals_no_shows.yaml*

It sends the tranches needed to approve a candidate when we have a maximum of *num_no_shows_per_candidate* tranches with no-shows for each candidate.

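The on-disk caching described in step 2 of the approach can be illustrated with a minimal sketch. This is not the benchmark's actual code: `GenerationParams`, `cache_path`, `load_or_generate` and the file naming are hypothetical names chosen for illustration, and the real benchmark may key and serialise its cache differently. The idea shown is only that the cache file name is derived from the generation parameters, so the expensive generation runs again only when a parameter changes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical subset of the parameters that influence message generation.
#[derive(Hash)]
struct GenerationParams {
    n_validators: u32,
    n_cores: u32,
    num_blocks: u32,
    last_considered_tranche: u32,
    enable_assignments_v2: bool,
}

/// Derive a cache file path from the parameters: same parameters, same file.
/// Note: `DefaultHasher` output is not guaranteed to be stable across Rust
/// releases, so a toolchain upgrade may simply invalidate the cache.
fn cache_path(workdir_prefix: &Path, params: &GenerationParams) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    params.hash(&mut hasher);
    workdir_prefix.join(format!("approvals-messages-{:016x}.bin", hasher.finish()))
}

/// Return the cached bytes if the file exists, otherwise run `generate`
/// and store its output for the next run.
fn load_or_generate(
    workdir_prefix: &Path,
    params: &GenerationParams,
    generate: impl FnOnce() -> Vec<u8>,
) -> io::Result<Vec<u8>> {
    let path = cache_path(workdir_prefix, params);
    if path.exists() {
        return fs::read(&path);
    }
    let bytes = generate();
    fs::write(&path, &bytes)?;
    Ok(bytes)
}
```

Calling `load_or_generate` twice with the same parameters and the same `workdir_prefix` invokes the generation closure only on the first call; changing any field of the parameters yields a different file name and therefore a fresh generation.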
### Maximum throughput *approvals_throughput.yaml*

It sends all the tranches for each block and measures the CPU used and the network bandwidth needed by the approval-voting and approval-distribution subsystems.

## How to run it

```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance

### Use the real subsystems metrics

If you follow the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana to install prometheus and grafana locally, all real metrics for `approval-distribution`, `approval-voting` and the overseer are available. E.g.:

<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">

### Profile with pyroscope

1. Set up pyroscope following the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope, then run any of the benchmark scenarios with `--profile` as an argument.
2. Open the pyroscope dashboard in grafana, e.g.:

<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">

### Useful logs

1. Network bandwidth requirements:

```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```

2. CPU usage by the approval-distribution/approval-voting subsystems:

```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

3. Time passed until a given block is approved:

```
Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```

### Using benchmark to quantify improvements from https://github.com/paritytech/polkadot-sdk/pull/1178 + https://github.com/paritytech/polkadot-sdk/pull/1191

Using a versi-node we compare the scenario where all new optimisations are disabled with a scenario where tranche0 assignments are sent in a single message and a conservative simulation where the coalescing of approvals gives us just a 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes to process the necessary messages and a 30-40% reduction in the necessary bandwidth.

#### Best case scenario comparison (minimum required tranches sent)

Unoptimised

```
Number of blocks: 10
Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
approval-distribution CPU usage 6.732s
approval-distribution CPU usage per block 0.673s
approval-voting CPU usage 9.523s
approval-voting CPU usage per block 0.952s
```

vs optimisation enabled

```
Number of blocks: 10
Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
approval-distribution CPU usage 4.658s
approval-distribution CPU usage per block 0.466s
approval-voting CPU usage 6.236s
approval-voting CPU usage per block 0.624s
```

#### Worst case: all tranches sent (very unlikely, happens when sharding breaks)

Unoptimised

```
Number of blocks: 10
Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
approval-distribution CPU usage 118.681s
approval-distribution CPU usage per block 11.868s
approval-voting CPU usage 124.118s
approval-voting CPU usage per block 12.412s
```

vs optimised

```
Number of blocks: 10
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

## TODOs

- [x] Polish implementation.
- [x] Use what we have so far to evaluate https://github.com/paritytech/polkadot-sdk/pull/1191 before merging.
- [x] List of features and additional dimensions we want to use for benchmarking.
- [x] Run benchmark on hardware similar to versi and kusama nodes.
- [ ] Add benchmark to be run in CI for catching regressions in performance.
- [ ] Rebase on latest changes for network emulation.

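The unoptimised and optimised figures quoted in the comparison above can be turned into per-metric reductions with a few lines of arithmetic. The sketch below only copies the totals from those logs. The helper names `reduction_percent` and `report` are made up for illustration and are not part of the benchmark.

```rust
/// Percentage by which `optimised` is lower than `baseline`.
fn reduction_percent(baseline: f64, optimised: f64) -> f64 {
    (baseline - optimised) / baseline * 100.0
}

/// (metric, unoptimised, optimised) totals copied from the best case logs.
const BEST_CASE: [(&str, f64, f64); 4] = [
    ("bytes received (KiB)", 53289.0, 32141.0),
    ("bytes sent (KiB)", 52489.0, 37314.0),
    ("approval-distribution CPU (s)", 6.732, 4.658),
    ("approval-voting CPU (s)", 9.523, 6.236),
];

/// (metric, unoptimised, optimised) totals copied from the worst case logs.
const WORST_CASE: [(&str, f64, f64); 4] = [
    ("bytes received (KiB)", 746393.0, 503993.0),
    ("bytes sent (KiB)", 729151.0, 629971.0),
    ("approval-distribution CPU (s)", 118.681, 84.061),
    ("approval-voting CPU (s)", 124.118, 96.532),
];

/// Format one line per metric with its relative reduction.
fn report(title: &str, rows: &[(&str, f64, f64)]) -> Vec<String> {
    let mut lines = vec![format!("{title}:")];
    for (metric, baseline, optimised) in rows {
        lines.push(format!(
            "  {metric}: {baseline} -> {optimised} ({:.1}% lower)",
            reduction_percent(*baseline, *optimised)
        ));
    }
    lines
}
```

Printing `report("Best case", &BEST_CASE)` and `report("Worst case", &WORST_CASE)` line by line gives a quick way to see how each individual metric compares with the overall 30-40% estimate.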
---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
committed by GitHub
parent 90849b66b9
commit f9f886886b
@@ -55,11 +55,11 @@ pub struct OurAssignment {
 }

 impl OurAssignment {
-    pub(crate) fn cert(&self) -> &AssignmentCertV2 {
+    pub fn cert(&self) -> &AssignmentCertV2 {
         &self.cert
     }

-    pub(crate) fn tranche(&self) -> DelayTranche {
+    pub fn tranche(&self) -> DelayTranche {
         self.tranche
     }

@@ -225,7 +225,7 @@ fn assigned_core_transcript(core_index: CoreIndex) -> Transcript {

 /// Information about the world assignments are being produced in.
 #[derive(Clone, Debug)]
-pub(crate) struct Config {
+pub struct Config {
     /// The assignment public keys for validators.
     assignment_keys: Vec<AssignmentId>,
     /// The groups of validators assigned to each core.
@@ -321,7 +321,7 @@ impl AssignmentCriteria for RealAssignmentCriteria {
 /// different times. The idea is that most assignments are never triggered and fall by the wayside.
 ///
 /// This will not assign to anything the local validator was part of the backing group for.
-pub(crate) fn compute_assignments(
+pub fn compute_assignments(
     keystore: &LocalKeystore,
     relay_vrf_story: RelayVRFStory,
     config: &Config,

@@ -92,11 +92,11 @@ use time::{slot_number_to_tick, Clock, ClockExt, DelayedApprovalTimer, SystemClo
 mod approval_checking;
 pub mod approval_db;
 mod backend;
-mod criteria;
+pub mod criteria;
 mod import;
 mod ops;
 mod persisted_entries;
-mod time;
+pub mod time;

 use crate::{
     approval_checking::{Check, TranchesToApproveResult},
@@ -159,6 +159,7 @@ pub struct ApprovalVotingSubsystem {
     db: Arc<dyn Database>,
     mode: Mode,
     metrics: Metrics,
+    clock: Box<dyn Clock + Send + Sync>,
 }

 #[derive(Clone)]
@@ -444,6 +445,25 @@ impl ApprovalVotingSubsystem {
         keystore: Arc<LocalKeystore>,
         sync_oracle: Box<dyn SyncOracle + Send>,
         metrics: Metrics,
+    ) -> Self {
+        ApprovalVotingSubsystem::with_config_and_clock(
+            config,
+            db,
+            keystore,
+            sync_oracle,
+            metrics,
+            Box::new(SystemClock {}),
+        )
+    }
+
+    /// Create a new approval voting subsystem with the given keystore, config, and database.
+    pub fn with_config_and_clock(
+        config: Config,
+        db: Arc<dyn Database>,
+        keystore: Arc<LocalKeystore>,
+        sync_oracle: Box<dyn SyncOracle + Send>,
+        metrics: Metrics,
+        clock: Box<dyn Clock + Send + Sync>,
     ) -> Self {
         ApprovalVotingSubsystem {
             keystore,
@@ -452,6 +472,7 @@ impl ApprovalVotingSubsystem {
             db_config: DatabaseConfig { col_approval_data: config.col_approval_data },
             mode: Mode::Syncing(sync_oracle),
             metrics,
+            clock,
         }
     }

@@ -493,15 +514,10 @@ fn db_sanity_check(db: Arc<dyn Database>, config: DatabaseConfig) -> SubsystemRe
 impl<Context: Send> ApprovalVotingSubsystem {
     fn start(self, ctx: Context) -> SpawnedSubsystem {
         let backend = DbBackend::new(self.db.clone(), self.db_config);
-        let future = run::<DbBackend, Context>(
-            ctx,
-            self,
-            Box::new(SystemClock),
-            Box::new(RealAssignmentCriteria),
-            backend,
-        )
-        .map_err(|e| SubsystemError::with_origin("approval-voting", e))
-        .boxed();
+        let future =
+            run::<DbBackend, Context>(ctx, self, Box::new(RealAssignmentCriteria), backend)
+                .map_err(|e| SubsystemError::with_origin("approval-voting", e))
+                .boxed();

         SpawnedSubsystem { name: "approval-voting-subsystem", future }
     }
@@ -909,7 +925,6 @@ enum Action {
 async fn run<B, Context>(
     mut ctx: Context,
     mut subsystem: ApprovalVotingSubsystem,
-    clock: Box<dyn Clock + Send + Sync>,
     assignment_criteria: Box<dyn AssignmentCriteria + Send + Sync>,
     mut backend: B,
 ) -> SubsystemResult<()>
@@ -923,7 +938,7 @@ where
     let mut state = State {
         keystore: subsystem.keystore,
         slot_duration_millis: subsystem.slot_duration_millis,
-        clock,
+        clock: subsystem.clock,
         assignment_criteria,
         spans: HashMap::new(),
     };

@@ -549,7 +549,7 @@ fn test_harness<T: Future<Output = VirtualOverseer>>(

     let subsystem = run(
         context,
-        ApprovalVotingSubsystem::with_config(
+        ApprovalVotingSubsystem::with_config_and_clock(
             Config {
                 col_approval_data: test_constants::TEST_CONFIG.col_approval_data,
                 slot_duration_millis: SLOT_DURATION_MILLIS,
@@ -558,8 +558,8 @@ fn test_harness<T: Future<Output = VirtualOverseer>>(
             Arc::new(keystore),
             sync_oracle,
             Metrics::default(),
+            clock.clone(),
         ),
-        clock.clone(),
         assignment_criteria,
         backend,
     );

@@ -33,14 +33,14 @@ use std::{
 };

 use polkadot_primitives::{Hash, ValidatorIndex};
-const TICK_DURATION_MILLIS: u64 = 500;
+pub const TICK_DURATION_MILLIS: u64 = 500;

 /// A base unit of time, starting from the Unix epoch, split into half-second intervals.
-pub(crate) type Tick = u64;
+pub type Tick = u64;

 /// A clock which allows querying of the current tick as well as
 /// waiting for a tick to be reached.
-pub(crate) trait Clock {
+pub trait Clock {
     /// Yields the current tick.
     fn tick_now(&self) -> Tick;

@@ -49,7 +49,7 @@ pub(crate) trait Clock {
 }

 /// Extension methods for clocks.
-pub(crate) trait ClockExt {
+pub trait ClockExt {
     fn tranche_now(&self, slot_duration_millis: u64, base_slot: Slot) -> DelayTranche;
 }

@@ -61,7 +61,8 @@ impl<C: Clock + ?Sized> ClockExt for C {
 }

 /// A clock which uses the actual underlying system clock.
-pub(crate) struct SystemClock;
+#[derive(Clone)]
+pub struct SystemClock;

 impl Clock for SystemClock {
     /// Yields the current tick.
@@ -93,11 +94,22 @@ fn tick_to_time(tick: Tick) -> SystemTime {
 }

 /// assumes `slot_duration_millis` evenly divided by tick duration.
-pub(crate) fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
+pub fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
     let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
     u64::from(slot) * ticks_per_slot
 }

+/// Converts a tick to the slot number.
+pub fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> Slot {
+    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
+    (tick / ticks_per_slot).into()
+}
+
+/// Converts a tranche from a slot to the tick number.
+pub fn tranche_to_tick(slot_duration_millis: u64, slot: Slot, tranche: u32) -> Tick {
+    slot_number_to_tick(slot_duration_millis, slot) + tranche as u64
+}
+
 /// A list of delayed futures that gets triggered when the waiting time has expired and it is
 /// time to sign the candidate.
 /// We have a timer per relay-chain block.
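
The tick helpers made public in the last hunk can be tried out in isolation. The following is a minimal standalone sketch, not the crate's code: it assumes `Slot` can be stood in for by a plain `u64` (the real type is a wrapper imported by the crate), and otherwise mirrors the arithmetic shown in the diff.

```rust
/// Same value as the constant in the diff: one tick is half a second.
pub const TICK_DURATION_MILLIS: u64 = 500;

pub type Tick = u64;

/// Assumption for this sketch only: a plain integer instead of the real `Slot` type.
pub type Slot = u64;

/// First tick of `slot`; assumes the slot duration is a multiple of the tick duration.
pub fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
    slot * ticks_per_slot
}

/// Slot that contains `tick` (integer division rounds down).
pub fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> Slot {
    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
    tick / ticks_per_slot
}

/// Tick at which `tranche` of a block authored in `slot` starts:
/// one tranche corresponds to one tick after the slot's first tick.
pub fn tranche_to_tick(slot_duration_millis: u64, slot: Slot, tranche: u32) -> Tick {
    slot_number_to_tick(slot_duration_millis, slot) + tranche as u64
}
```

With a slot duration of 6000 ms (an assumed value for illustration) there are 12 ticks per slot, so slot 10 starts at tick 120, tranche 3 of that slot starts at tick 123, and any tick from 120 to 131 maps back to slot 10.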