Files
pezkuwi-subxt/polkadot/zombienet_tests/functional/0001-parachains-pvf.zndsl
T
Alexandru Gheorghe a84dd0dba5 Approve multiple candidates with a single signature (#1191)
Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,

## Overall idea

When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.

This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:

- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.

## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.

## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x]  Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
2023-12-13 08:43:15 +02:00

85 lines
6.3 KiB
Plaintext

Description: PVF preparation & execution time
Network: ./0001-parachains-pvf.toml
Creds: config
# Check authority status.
alice: reports node_roles is 4
bob: reports node_roles is 4
charlie: reports node_roles is 4
dave: reports node_roles is 4
eve: reports node_roles is 4
ferdie: reports node_roles is 4
one: reports node_roles is 4
two: reports node_roles is 4
# Ensure parachains are registered.
alice: parachain 2000 is registered within 60 seconds
bob: parachain 2001 is registered within 60 seconds
charlie: parachain 2002 is registered within 60 seconds
dave: parachain 2003 is registered within 60 seconds
ferdie: parachain 2004 is registered within 60 seconds
eve: parachain 2005 is registered within 60 seconds
one: parachain 2006 is registered within 60 seconds
two: parachain 2007 is registered within 60 seconds
# Ensure parachains made progress.
alice: parachain 2000 block height is at least 10 within 300 seconds
alice: parachain 2001 block height is at least 10 within 300 seconds
alice: parachain 2002 block height is at least 10 within 300 seconds
alice: parachain 2003 block height is at least 10 within 300 seconds
alice: parachain 2004 block height is at least 10 within 300 seconds
alice: parachain 2005 block height is at least 10 within 300 seconds
alice: parachain 2006 block height is at least 10 within 300 seconds
alice: parachain 2007 block height is at least 10 within 300 seconds
alice: reports substrate_block_height{status="finalized"} is at least 30 within 400 seconds
# Check preparation time is under 10s.
# Check all buckets <= 10.
alice: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
bob: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
charlie: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
dave: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
ferdie: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
eve: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
one: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
two: reports histogram polkadot_pvf_preparation_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2", "3", "10"] within 10 seconds
# Check all buckets >= 20.
alice: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
bob: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
charlie: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
dave: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
ferdie: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
eve: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
one: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
two: reports histogram polkadot_pvf_preparation_time has 0 samples in buckets ["20", "30", "60", "120", "+Inf"] within 10 seconds
# Check execution time.
# There are two different timeout conditions: DEFAULT_BACKING_EXECUTION_TIMEOUT(2s) and
# DEFAULT_APPROVAL_EXECUTION_TIMEOUT(12s). Currently these are not differentiated by metrics
# because the metrics are defined in `polkadot-node-core-pvf` which is a level below
# the relevant subsystems.
# That being said, we will take the simplifying assumption of testing only the
# 2s timeout.
# We do this check by ensuring all executions fall into bucket le="2" or lower.
# First, check if we have at least 1 sample, but we should have many more.
alice: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
bob: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
charlie: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
dave: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
ferdie: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
eve: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
one: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
two: reports histogram polkadot_pvf_execution_time has at least 1 samples in buckets ["0.1", "0.5", "1", "2"] within 10 seconds
# Check if we have no samples > 2s.
alice: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
bob: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
charlie: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
dave: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
ferdie: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
eve: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
one: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds
two: reports histogram polkadot_pvf_execution_time has 0 samples in buckets ["3", "4", "5", "6", "+Inf"] within 10 seconds