Reduce dispute coordinator load (#5785)

* Don't import backing statements directly

into the dispute coordinator. This also gets rid of a redundant
signature check. Both should have some impact on backing performance.
In general this PR should make us scale better in the number of parachains.

Reasoning (aka why this is fine):

For the signature check: As mentioned, it is a redundant check. The
signature has already been checked at this point. This is even made
obvious by the used types. The smart constructor is not perfect as
discussed [here](https://github.com/paritytech/polkadot/issues/3455),
but is still a reasonable security.

For not importing to the dispute-coordinator: This should be good as the
dispute coordinator does scrape backing votes from chain. This suffices
in practice as a super majority of validators must have seen a backing
fork in order for a candidate to get included and only included
candidates pose a threat to our system. The import from chain is
preferable over direct import of backing votes for two reasons:

1. The import is batched, greatly improving import performance. All
   backing votes for a candidate are imported with a single import.
   And indeed we were able to see in metrics that importing votes
   from chain is fast.
2. We do less work in general as not every candidate for which
   statements are gossiped might actually make it on a chain. The
   dispute coordinator as with the current implementation would still
   import and keep those votes around for six sessions.

While redundancy is good for reliability in the event of bugs, this also
comes at a non negligible cost. The dispute-coordinator right now is the
subsystem with the highest load, despite the fact that it should not be
doing much during mormal operation and it is only getting worse
with more parachains as the load is a direct function of the number of statements.

We'll see on Versi how much of a performance improvement this PR

* Get rid of dead code.

* Dont send approval vote

* Make it pass CI

* Bring back tests for fixing them later.

* Explicit signature check.

* Resurrect approval-voting tests (not fixed yet)

* Send out approval votes in dispute-distribution.

Use BTreeMap for ordered dispute votes.

* Bring back an important warning.

* Fix approval voting tests.

* Don't send out dispute message on import + test

+ Some cleanup.

* Guide changes.

Note that the introduced complexity is actually redundant.

* WIP: guide changes.

* Finish guide changes about dispute-coordinator

conceputally. Requires more proof read still.

Also removed obsolete implementation details, where the code is better
suited as the source of truth.

* Finish guide changes for now.

* Remove own approval vote import logic.

* Implement logic for retrieving approval-votes

into approval-voting and approval-distribution subsystems.

* Update roadmap/implementers-guide/src/node/disputes/dispute-coordinator.md

Co-authored-by: asynchronous rob <rphmeier@gmail.com>

* Review feedback.

In particular: Add note about disputes of non included candidates.

* Incorporate Review Remarks

* Get rid of superfluous space.

* Tidy up import logic a bit.

Logical vote import is now separated, making the code more readable and
maintainable.

Also: Accept import if there is at least one invalid signer that has not
exceeded its spam slots, instead of requiring all of them to not exceed
their limits. This is more correct and a preparation for vote batching.

* We don't need/have empty imports.

* Fix tests and bugs.

* Remove error prone redundancy.

* Import approval votes on dispute initiated/concluded.

* Add test for approval vote import.

* Make guide checker happy (hopefully)

* Another sanity check + better logs.

* Reasoning about boundedness.

* Use `CandidateIndex` as opposed to `CoreIndex`.

* Remove redundant import.

* Review remarks.

* Add metric for calls to request signatures

* More review remarks.

* Add metric on imported approval votes.

* Include candidate hash in logs.

* More trace log

* Break cycle.

* Add some tracing.

* Cleanup allowed messages.

* fmt

* Tracing + timeout for get inherent data.

* Better error.

* Break cycle in all places.

* Clarified comment some more.

* Typo.

* Break cycle approval-distribution - approval-voting.

Co-authored-by: asynchronous rob <rphmeier@gmail.com>
This commit is contained in:
Robert Klotzner
2022-08-16 08:16:30 +02:00
committed by GitHub
parent 219fd5a3ed
commit ca6297c853
29 changed files with 2412 additions and 1014 deletions
+6 -87
View File
@@ -31,15 +31,15 @@ use futures::{
use error::{Error, FatalResult};
use polkadot_node_primitives::{
AvailableData, InvalidCandidate, PoV, SignedDisputeStatement, SignedFullStatement, Statement,
ValidationResult, BACKING_EXECUTION_TIMEOUT,
AvailableData, InvalidCandidate, PoV, SignedFullStatement, Statement, ValidationResult,
BACKING_EXECUTION_TIMEOUT,
};
use polkadot_node_subsystem::{
jaeger,
messages::{
AvailabilityDistributionMessage, AvailabilityStoreMessage, CandidateBackingMessage,
CandidateValidationMessage, CollatorProtocolMessage, DisputeCoordinatorMessage,
ProvisionableData, ProvisionerMessage, RuntimeApiRequest, StatementDistributionMessage,
CandidateValidationMessage, CollatorProtocolMessage, ProvisionableData, ProvisionerMessage,
RuntimeApiRequest, StatementDistributionMessage,
},
overseer, ActiveLeavesUpdate, FromOrchestra, OverseerSignal, PerLeafSpan, SpawnedSubsystem,
Stage, SubsystemError,
@@ -50,8 +50,8 @@ use polkadot_node_subsystem_util::{
};
use polkadot_primitives::v2::{
BackedCandidate, CandidateCommitments, CandidateHash, CandidateReceipt, CollatorId,
CommittedCandidateReceipt, CoreIndex, CoreState, Hash, Id as ParaId, SessionIndex,
SigningContext, ValidatorId, ValidatorIndex, ValidatorSignature, ValidityAttestation,
CommittedCandidateReceipt, CoreIndex, CoreState, Hash, Id as ParaId, SigningContext,
ValidatorId, ValidatorIndex, ValidatorSignature, ValidityAttestation,
};
use sp_keystore::SyncCryptoStorePtr;
use statement_table::{
@@ -380,7 +380,6 @@ async fn handle_active_leaves_update<Context>(
let job = CandidateBackingJob {
parent,
session_index,
assignment,
required_collator,
issued_statements: HashSet::new(),
@@ -411,8 +410,6 @@ struct JobAndSpan<Context> {
struct CandidateBackingJob<Context> {
/// The hash of the relay parent on top of which this job is doing it's work.
parent: Hash,
/// The session index this corresponds to.
session_index: SessionIndex,
/// The `ParaId` assigned to this validator
assignment: Option<ParaId>,
/// The collator required to author the candidate, if any.
@@ -783,8 +780,6 @@ async fn validate_and_make_available(
tx_command.send((relay_parent, make_command(res))).await.map_err(Into::into)
}
struct ValidatorIndexOutOfBounds;
#[overseer::contextbounds(CandidateBacking, prefix = self::overseer)]
impl<Context> CandidateBackingJob<Context> {
async fn handle_validated_candidate_command(
@@ -1014,21 +1009,6 @@ impl<Context> CandidateBackingJob<Context> {
)
};
if let Err(ValidatorIndexOutOfBounds) = self
.dispatch_new_statement_to_dispute_coordinator(ctx.sender(), candidate_hash, &statement)
.await
{
gum::warn!(
target: LOG_TARGET,
session_index = ?self.session_index,
relay_parent = ?self.parent,
validator_index = statement.validator_index().0,
"Supposedly 'Signed' statement has validator index out of bounds."
);
return Ok(None)
}
let stmt = primitive_statement_to_table(statement);
let summary = self.table.import_statement(&self.table_context, stmt);
@@ -1083,67 +1063,6 @@ impl<Context> CandidateBackingJob<Context> {
Ok(summary)
}
/// The dispute coordinator keeps track of all statements by validators about every recent
/// candidate.
///
/// When importing a statement, this should be called access the candidate receipt either
/// from the statement itself or from the underlying statement table in order to craft
/// and dispatch the notification to the dispute coordinator.
///
/// This also does bounds-checking on the validator index and will return an error if the
/// validator index is out of bounds for the current validator set. It's expected that
/// this should never happen due to the interface of the candidate backing subsystem -
/// the networking component responsible for feeding statements to the backing subsystem
/// is meant to check the signature and provenance of all statements before submission.
async fn dispatch_new_statement_to_dispute_coordinator(
&self,
sender: &mut impl overseer::CandidateBackingSenderTrait,
candidate_hash: CandidateHash,
statement: &SignedFullStatement,
) -> Result<(), ValidatorIndexOutOfBounds> {
// Dispatch the statement to the dispute coordinator.
let validator_index = statement.validator_index();
let signing_context =
SigningContext { parent_hash: self.parent, session_index: self.session_index };
let validator_public = match self.table_context.validators.get(validator_index.0 as usize) {
None => return Err(ValidatorIndexOutOfBounds),
Some(v) => v,
};
let maybe_candidate_receipt = match statement.payload() {
Statement::Seconded(receipt) => Some(receipt.to_plain()),
Statement::Valid(candidate_hash) => {
// Valid statements are only supposed to be imported
// once we've seen at least one `Seconded` statement.
self.table.get_candidate(&candidate_hash).map(|c| c.to_plain())
},
};
let maybe_signed_dispute_statement = SignedDisputeStatement::from_backing_statement(
statement.as_unchecked(),
signing_context,
validator_public.clone(),
)
.ok();
if let (Some(candidate_receipt), Some(dispute_statement)) =
(maybe_candidate_receipt, maybe_signed_dispute_statement)
{
sender
.send_message(DisputeCoordinatorMessage::ImportStatements {
candidate_hash,
candidate_receipt,
session: self.session_index,
statements: vec![(dispute_statement, validator_index)],
pending_confirmation: None,
})
.await;
}
Ok(())
}
async fn handle_second_msg(
&mut self,
root_span: &jaeger::Span,