Approve multiple candidates with a single signature (#1191)

Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,

## Overall idea

When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.

This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:

- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.

## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.

## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x]  Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
This commit is contained in:
Alexandru Gheorghe
2023-12-13 08:43:15 +02:00
committed by GitHub
parent d18a682bf7
commit a84dd0dba5
82 changed files with 5883 additions and 1483 deletions
@@ -4,10 +4,13 @@ Reading the [section on the approval protocol](../../protocol-approval.md) will
aims of this subsystem.
Approval votes are split into two parts: Assignments and Approvals. Validators first broadcast their assignment to
indicate intent to check a candidate. Upon successfully checking, they broadcast an approval vote. If a validator
doesn't broadcast their approval vote shortly after issuing an assignment, this is an indication that they are being
prevented from recovering or validating the block data and that more validators should self-select to check the
candidate. This is known as a "no-show".
indicate intent to check a candidate. Upon successfully checking, they don't immediately send the vote instead
they queue the check for a short period of time `MAX_APPROVAL_COALESCE_WAIT_TICKS` to give the opportunity of the
validator to vote for more than one candidate. Once MAX_APPROVAL_COALESCE_WAIT_TICKS have passed or at least
`MAX_APPROVAL_COALESCE_COUNT` are ready they broadcast an approval vote for all candidates. If a validator
doesn't broadcast their approval vote shortly after issuing an assignment, this is an indication that they are
being prevented from recovering or validating the block data and that more validators should self-select to
check the candidate. This is known as a "no-show".
The core of this subsystem is a Tick-based timer loop, where Ticks are 500ms. We also reason about time in terms of
`DelayTranche`s, which measure the number of ticks elapsed since a block was produced. We track metadata for all
@@ -120,6 +123,13 @@ struct BlockEntry {
// this block. The block can be considered approved has all bits set to 1
approved_bitfield: Bitfield,
children: Vec<Hash>,
// A list of candidates we have checked, but didn't not sign and
// advertise the vote yet.
candidates_pending_signature: BTreeMap<CandidateIndex, CandidateSigningContext>,
// Assignments we already distributed. A 1 bit means the candidate index for which
// we already have sent out an assignment. We need this to avoid distributing
// multiple core assignments more than once.
distributed_assignments: Bitfield,
}
// slot_duration * 2 + DelayTranche gives the number of delay tranches since the
@@ -303,12 +313,12 @@ entry. The cert itself contains information necessary to determine the candidate
On receiving a `CheckAndImportApproval(indirect_approval_vote, response_channel)` message:
* Fetch the `BlockEntry` from the indirect approval vote's `block_hash`. If none, return `ApprovalCheckResult::Bad`.
* Fetch the `CandidateEntry` from the indirect approval vote's `candidate_index`. If the block did not trigger
* Fetch all `CandidateEntry` from the indirect approval vote's `candidate_indices`. If the block did not trigger
inclusion of enough candidates, return `ApprovalCheckResult::Bad`.
* Construct a `SignedApprovalVote` using the candidate hash and check against the validator's approval key, based on
the session info of the block. If invalid or no such validator, return `ApprovalCheckResult::Bad`.
* Construct a `SignedApprovalVote` using the candidates hashes and check against the validator's approval key,
based on the session info of the block. If invalid or no such validator, return `ApprovalCheckResult::Bad`.
* Send `ApprovalCheckResult::Accepted`
* [Import the checked approval vote](#import-checked-approval)
* [Import the checked approval vote](#import-checked-approval) for all candidates
#### `ApprovalVotingMessage::ApprovedAncestor`
@@ -402,10 +412,25 @@ On receiving an `ApprovedAncestor(Hash, BlockNumber, response_channel)`:
#### Issue Approval Vote
* Fetch the block entry and candidate entry. Ignore if `None` - we've probably just lost a race with finality.
* Construct a `SignedApprovalVote` with the validator index for the session.
* [Import the checked approval vote](#import-checked-approval). It is "checked" as we've just issued the signature.
* Construct a `IndirectSignedApprovalVote` using the information about the vote.
* Dispatch `ApprovalDistributionMessage::DistributeApproval`.
* IF `MAX_APPROVAL_COALESCE_COUNT` candidates are in the waiting queue
* Construct a `SignedApprovalVote` with the validator index for the session and all candidate hashes in the waiting queue.
* Construct a `IndirectSignedApprovalVote` using the information about the vote.
* Dispatch `ApprovalDistributionMessage::DistributeApproval`.
* ELSE
* Queue the candidate in the `BlockEntry::candidates_pending_signature`
* Arm a per BlockEntry timer with latest tick we can send the vote.
### Delayed vote distribution
* [Issue Approval Vote](#issue-approval-vote) arms once a per block timer if there are no requirements to send the
vote immediately.
* When the timer wakes up it will either:
* IF there is a candidate in the queue past its sending tick:
* Construct a `SignedApprovalVote` with the validator index for the session and all candidate hashes in the waiting queue.
* Construct a `IndirectSignedApprovalVote` using the information about the vote.
* Dispatch `ApprovalDistributionMessage::DistributeApproval`.
* ELSE
* Re-arm the timer with latest tick we have the send a the vote.
### Determining Approval of Candidate