mirror of
https://github.com/pezkuwichain/pezkuwi-subxt.git
synced 2026-04-26 11:07:56 +00:00
Reduce dispute coordinator load (#5785)

* Don't import backing statements directly into the dispute coordinator.

  This also gets rid of a redundant signature check. Both should have some impact on backing performance. In general this PR should make us scale better in the number of parachains.

  Reasoning (aka why this is fine):

  For the signature check: As mentioned, it is a redundant check. The signature has already been checked at this point. This is even made obvious by the types used. The smart constructor is not perfect as discussed [here](https://github.com/paritytech/polkadot/issues/3455), but still provides reasonable security.

  For not importing to the dispute-coordinator: This should be good as the dispute coordinator does scrape backing votes from chain. This suffices in practice as a super majority of validators must have seen a backing fork in order for a candidate to get included, and only included candidates pose a threat to our system. The import from chain is preferable over direct import of backing votes for two reasons:

  1. The import is batched, greatly improving import performance. All backing votes for a candidate are imported with a single import. And indeed we were able to see in metrics that importing votes from chain is fast.
  2. We do less work in general, as not every candidate for which statements are gossiped might actually make it on chain. With the current implementation, the dispute coordinator would still import and keep those votes around for six sessions.

  While redundancy is good for reliability in the event of bugs, it also comes at a non-negligible cost. The dispute-coordinator is currently the subsystem with the highest load, despite the fact that it should not be doing much during normal operation, and it is only getting worse with more parachains, as the load is a direct function of the number of statements.

  We'll see on Versi how much of a performance improvement this PR brings.
* Get rid of dead code.
* Don't send approval vote.
* Make it pass CI.
* Bring back tests for fixing them later.
* Explicit signature check.
* Resurrect approval-voting tests (not fixed yet).
* Send out approval votes in dispute-distribution. Use BTreeMap for ordered dispute votes.
* Bring back an important warning.
* Fix approval voting tests.
* Don't send out dispute message on import + test + some cleanup.
* Guide changes. Note that the introduced complexity is actually redundant.
* WIP: guide changes.
* Finish guide changes about dispute-coordinator conceptually. Requires more proofreading still. Also removed obsolete implementation details, where the code is better suited as the source of truth.
* Finish guide changes for now.
* Remove own approval vote import logic.
* Implement logic for retrieving approval-votes into approval-voting and approval-distribution subsystems.
* Update roadmap/implementers-guide/src/node/disputes/dispute-coordinator.md

  Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* Review feedback. In particular: Add note about disputes of non-included candidates.
* Incorporate review remarks.
* Get rid of superfluous space.
* Tidy up import logic a bit. Logical vote import is now separated, making the code more readable and maintainable. Also: Accept import if there is at least one invalid signer that has not exceeded its spam slots, instead of requiring all of them to not exceed their limits. This is more correct and a preparation for vote batching.
* We don't need/have empty imports.
* Fix tests and bugs.
* Remove error-prone redundancy.
* Import approval votes on dispute initiated/concluded.
* Add test for approval vote import.
* Make guide checker happy (hopefully).
* Another sanity check + better logs.
* Reasoning about boundedness.
* Use `CandidateIndex` as opposed to `CoreIndex`.
* Remove redundant import.
* Review remarks.
* Add metric for calls to request signatures.
* More review remarks.
* Add metric on imported approval votes.
* Include candidate hash in logs.
* More trace logs.
* Break cycle.
* Add some tracing.
* Clean up allowed messages.
* fmt
* Tracing + timeout for get inherent data.
* Better error.
* Break cycle in all places.
* Clarified comment some more.
* Typo.
* Break cycle approval-distribution - approval-voting.

Co-authored-by: asynchronous rob <rphmeier@gmail.com>
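
The "Tidy up import logic" bullet changes the acceptance rule for an import carrying invalid votes: instead of requiring *every* invalid signer to still have a free spam slot, it is enough that *at least one* of them does. A minimal sketch of the difference, assuming a hypothetical `SpamSlots` type with a per-validator counter and a fixed cap (not the dispute-coordinator's actual data structure):

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ValidatorIndex`.
type ValidatorIndex = u32;

/// Hypothetical spam-slot bookkeeping: slots used per validator and a fixed cap.
struct SpamSlots {
    used: HashMap<ValidatorIndex, u32>,
    max_per_validator: u32,
}

impl SpamSlots {
    /// True if `validator` has not yet exhausted its slots.
    fn has_free_slot(&self, validator: ValidatorIndex) -> bool {
        self.used.get(&validator).copied().unwrap_or(0) < self.max_per_validator
    }

    /// Previous rule: every invalid signer must still have a free slot.
    fn accept_if_all_free(&self, invalid_signers: &[ValidatorIndex]) -> bool {
        invalid_signers.iter().all(|&v| self.has_free_slot(v))
    }

    /// Rule described in the commit: one invalid signer with a free slot suffices.
    fn accept_if_any_free(&self, invalid_signers: &[ValidatorIndex]) -> bool {
        invalid_signers.iter().any(|&v| self.has_free_slot(v))
    }
}

fn main() {
    // Validator 1 has used both of its slots; validator 2 has used none.
    let slots = SpamSlots { used: HashMap::from([(1, 2), (2, 0)]), max_per_validator: 2 };
    let invalid_signers = [1, 2];
    println!("all-free rule accepts: {}", slots.accept_if_all_free(&invalid_signers));
    println!("any-free rule accepts: {}", slots.accept_if_any_free(&invalid_signers));
}
```

With these inputs the first line prints `false` and the second `true`: a single exhausted signer no longer blocks an import that also carries a vote from a signer with remaining slots.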
@@ -58,6 +58,9 @@ pub enum Error {
     #[error("failed to send message to CandidateBacking to get backed candidates")]
     GetBackedCandidatesSend(#[source] mpsc::SendError),
 
+    #[error("Send inherent data timeout.")]
+    SendInherentDataTimeout,
+
     #[error("failed to send return message with Inherents")]
     InherentDataReturnChannel,
 
@@ -35,7 +35,9 @@ use polkadot_node_subsystem::{
     overseer, ActivatedLeaf, ActiveLeavesUpdate, FromOrchestra, LeafStatus, OverseerSignal,
     PerLeafSpan, SpawnedSubsystem, SubsystemError,
 };
-use polkadot_node_subsystem_util::{request_availability_cores, request_persisted_validation_data};
+use polkadot_node_subsystem_util::{
+    request_availability_cores, request_persisted_validation_data, TimeoutExt,
+};
 use polkadot_primitives::v2::{
     BackedCandidate, BlockNumber, CandidateHash, CandidateReceipt, CoreState, DisputeState,
     DisputeStatement, DisputeStatementSet, Hash, MultiDisputeStatementSet, OccupiedCoreAssumption,
@@ -55,6 +57,8 @@ mod tests;
 
 /// How long to wait before proposing.
 const PRE_PROPOSE_TIMEOUT: std::time::Duration = core::time::Duration::from_millis(2000);
+/// Some timeout to ensure task won't hang around in the background forever on issues.
+const SEND_INHERENT_DATA_TIMEOUT: std::time::Duration = core::time::Duration::from_millis(500);
 
 const LOG_TARGET: &str = "parachain::provisioner";
 
@@ -153,6 +157,12 @@ async fn run_iteration<Context>(
                 if let Some(state) = per_relay_parent.get_mut(&hash) {
                     state.is_inherent_ready = true;
+
+                    gum::trace!(
+                        target: LOG_TARGET,
+                        relay_parent = ?hash,
+                        "Inherent Data became ready"
+                    );
 
                     let return_senders = std::mem::take(&mut state.awaiting_inherent);
                     if !return_senders.is_empty() {
                         send_inherent_data_bg(ctx, &state, return_senders, metrics.clone()).await?;
@@ -188,11 +198,19 @@ async fn handle_communication<Context>(
 ) -> Result<(), Error> {
     match message {
         ProvisionerMessage::RequestInherentData(relay_parent, return_sender) => {
+            gum::trace!(target: LOG_TARGET, ?relay_parent, "Inherent data got requested.");
+
             if let Some(state) = per_relay_parent.get_mut(&relay_parent) {
                 if state.is_inherent_ready {
+                    gum::trace!(target: LOG_TARGET, ?relay_parent, "Calling send_inherent_data.");
                     send_inherent_data_bg(ctx, &state, vec![return_sender], metrics.clone())
                         .await?;
                 } else {
+                    gum::trace!(
+                        target: LOG_TARGET,
+                        ?relay_parent,
+                        "Queuing inherent data request (inherent data not yet ready)."
+                    );
                     state.awaiting_inherent.push(return_sender);
                 }
             }
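
Together, the `run_iteration` and `handle_communication` hunks implement a "serve now or queue" pattern: a `RequestInherentData` is answered immediately when `is_inherent_ready` is set; otherwise its `return_sender` is parked in `awaiting_inherent`, which is drained with `std::mem::take` once the data becomes ready. The sketch below reproduces that shape with `std::sync::mpsc` channels in place of the subsystem's senders; `PerRelayParent`, `request`, `mark_ready` and the `String` payload are hypothetical simplifications, not the provisioner's real types.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical, simplified per-relay-parent state; field names follow the diff.
#[derive(Default)]
struct PerRelayParent {
    is_inherent_ready: bool,
    inherent: String,
    awaiting_inherent: Vec<Sender<String>>,
}

impl PerRelayParent {
    /// Answers at once if data is ready, otherwise queues the requester.
    fn request(&mut self, return_sender: Sender<String>) {
        if self.is_inherent_ready {
            // The requester may have gone away; a failed send is ignored.
            let _ = return_sender.send(self.inherent.clone());
        } else {
            self.awaiting_inherent.push(return_sender);
        }
    }

    /// Stores the data, flips the flag and drains every queued requester once.
    fn mark_ready(&mut self, inherent: String) {
        self.inherent = inherent;
        self.is_inherent_ready = true;
        // `mem::take` leaves an empty Vec behind, so nobody is answered twice.
        let return_senders = std::mem::take(&mut self.awaiting_inherent);
        for sender in return_senders {
            let _ = sender.send(self.inherent.clone());
        }
    }
}

fn main() {
    let mut state = PerRelayParent::default();

    let (early_tx, early_rx) = channel();
    state.request(early_tx); // queued: data not ready yet
    assert!(early_rx.try_recv().is_err());

    state.mark_ready("inherent data".to_string());
    println!("early requester got: {}", early_rx.recv().unwrap());

    let (late_tx, late_rx) = channel();
    state.request(late_tx); // answered immediately
    println!("late requester got: {}", late_rx.recv().unwrap());
}
```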
@@ -202,6 +220,8 @@ async fn handle_communication<Context>(
                 let span = state.span.child("provisionable-data");
                 let _timer = metrics.time_provisionable_data();
 
+                gum::trace!(target: LOG_TARGET, ?relay_parent, "Received provisionable data.");
+
                 note_provisionable_data(state, &span, data);
             }
         },
@@ -228,28 +248,42 @@ async fn send_inherent_data_bg<Context>(
         let _span = span;
         let _timer = metrics.time_request_inherent_data();
 
-        if let Err(err) = send_inherent_data(
+        gum::trace!(
+            target: LOG_TARGET,
+            relay_parent = ?leaf.hash,
+            "Sending inherent data in background."
+        );
+
+        let send_result = send_inherent_data(
             &leaf,
             &signed_bitfields,
             &backed_candidates,
             return_senders,
             &mut sender,
             &metrics,
-        )
-        .await
-        {
-            gum::warn!(target: LOG_TARGET, err = ?err, "failed to assemble or send inherent data");
-            metrics.on_inherent_data_request(Err(()));
-        } else {
-            metrics.on_inherent_data_request(Ok(()));
-            gum::debug!(
-                target: LOG_TARGET,
-                signed_bitfield_count = signed_bitfields.len(),
-                backed_candidates_count = backed_candidates.len(),
-                leaf_hash = ?leaf.hash,
-                "inherent data sent successfully"
-            );
-            metrics.observe_inherent_data_bitfields_count(signed_bitfields.len());
+        ) // Make sure call is not taking forever:
+        .timeout(SEND_INHERENT_DATA_TIMEOUT)
+        .map(|v| match v {
+            Some(r) => r,
+            None => Err(Error::SendInherentDataTimeout),
+        });
+
+        match send_result.await {
+            Err(err) => {
+                gum::warn!(target: LOG_TARGET, err = ?err, "failed to assemble or send inherent data");
+                metrics.on_inherent_data_request(Err(()));
+            },
+            Ok(()) => {
+                metrics.on_inherent_data_request(Ok(()));
+                gum::debug!(
+                    target: LOG_TARGET,
+                    signed_bitfield_count = signed_bitfields.len(),
+                    backed_candidates_count = backed_candidates.len(),
+                    leaf_hash = ?leaf.hash,
+                    "inherent data sent successfully"
+                );
+                metrics.observe_inherent_data_bitfields_count(signed_bitfields.len());
+            },
         }
     };
 
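
The `send_inherent_data_bg` change bounds the background send with `TimeoutExt::timeout`, whose output is `None` when `SEND_INHERENT_DATA_TIMEOUT` elapses, and then folds that `None` into `Error::SendInherentDataTimeout` so that a single `Result` is matched afterwards. The sketch below shows the same `Option`-to-`Result` folding using only the standard library: a thread plus `mpsc::Receiver::recv_timeout` stands in for the future and its timeout, and `run_with_timeout` and the `Assembly` error variant are hypothetical. Unlike dropping a timed-out future, this version cannot cancel the worker thread; it only stops waiting for it.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Error {
    /// Mirrors the variant added to the provisioner's `Error` enum.
    SendInherentDataTimeout,
    /// Hypothetical stand-in for the other ways sending can fail.
    Assembly(String),
}

/// Folds "timed out" (`None`) into an error, like the `.map(|v| match v { .. })` in the diff.
fn flatten_timeout<T>(v: Option<Result<T, Error>>) -> Result<T, Error> {
    match v {
        Some(r) => r,
        None => Err(Error::SendInherentDataTimeout),
    }
}

/// Hypothetical synchronous analogue of `future.timeout(limit)`: runs `work` on a
/// thread and waits at most `limit`. A timeout (or a worker that panicked before
/// sending) yields `None`; an overrunning thread is left running, not cancelled.
fn run_with_timeout<T, F>(limit: Duration, work: F) -> Result<T, Error>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, Error> + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    flatten_timeout(rx.recv_timeout(limit).ok())
}

fn main() {
    let fast = run_with_timeout(Duration::from_millis(500), || Ok("sent"));
    let slow: Result<(), Error> = run_with_timeout(Duration::from_millis(10), || {
        thread::sleep(Duration::from_millis(200));
        Ok(())
    });
    println!("fast: {:?}, slow: {:?}", fast, slow);
}
```

Folding the timeout into the existing error type keeps the downstream handling unchanged: the `Err(err)` arm logs and records the failure metric whether the send failed or simply took too long.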
@@ -312,12 +346,27 @@ async fn send_inherent_data(
     from_job: &mut impl overseer::ProvisionerSenderTrait,
     metrics: &Metrics,
 ) -> Result<(), Error> {
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?leaf.hash,
+        "Requesting availability cores"
+    );
     let availability_cores = request_availability_cores(leaf.hash, from_job)
         .await
         .await
         .map_err(|err| Error::CanceledAvailabilityCores(err))??;
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?leaf.hash,
+        "Selecting disputes"
+    );
     let disputes = select_disputes(from_job, metrics, leaf).await?;
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?leaf.hash,
+        "Selected disputes"
+    );
 
     // Only include bitfields on fresh leaves. On chain reversions, we want to make sure that
     // there will be at least one block, which cannot get disputed, so the chain can make progress.
@@ -326,9 +375,21 @@ async fn send_inherent_data(
             select_availability_bitfields(&availability_cores, bitfields, &leaf.hash),
         LeafStatus::Stale => Vec::new(),
     };
+
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?leaf.hash,
+        "Selected bitfields"
+    );
     let candidates =
         select_candidates(&availability_cores, &bitfields, candidates, leaf.hash, from_job).await?;
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?leaf.hash,
+        "Selected candidates"
+    );
+
     gum::debug!(
         target: LOG_TARGET,
         availability_cores_len = availability_cores.len(),
@@ -342,6 +403,12 @@ async fn send_inherent_data(
     let inherent_data =
         ProvisionerInherentData { bitfields, backed_candidates: candidates, disputes };
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?leaf.hash,
+        "Sending back inherent data to requesters."
+    );
+
     for return_sender in return_senders {
         return_sender
             .send(inherent_data.clone())
@@ -765,6 +832,12 @@ async fn select_disputes(
         active
     };
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Request recent disputes"
+    );
+
     // We use `RecentDisputes` instead of `ActiveDisputes` because redundancy is fine.
     // It's heavier than `ActiveDisputes` but ensures that everything from the dispute
     // window gets on-chain, unlike `ActiveDisputes`.
@@ -773,6 +846,18 @@ async fn select_disputes(
     // If the active ones are already exceeding the bounds, randomly select a subset.
     let recent = request_disputes(sender, RequestType::Recent).await;
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Received recent disputes"
+    );
+
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Request on chain disputes"
+    );
+
     // On chain disputes are fetched from the runtime. We want to prioritise the inclusion of unknown
     // disputes in the inherent data. The call relies on staging Runtime API. If the staging API is not
     // enabled in the binary an empty set is generated which doesn't affect the rest of the logic.
@@ -788,6 +873,18 @@ async fn select_disputes(
         },
     };
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Received on chain disputes"
+    );
+
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Filtering disputes"
+    );
+
     let disputes = if recent.len() > MAX_DISPUTES_FORWARDED_TO_RUNTIME {
         gum::warn!(
             target: LOG_TARGET,
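
The comments in this part of `select_disputes` state two goals for trimming the set down to `MAX_DISPUTES_FORWARDED_TO_RUNTIME`: prioritise disputes the runtime does not know about yet, and select a subset once the bound is exceeded. The sketch below is one hypothetical, deterministic way to combine those goals (disputes not yet on chain first, then known ones, truncated to the cap). `cap_disputes` and `DisputeKey` are made up for illustration, and the random selection mentioned in the code comments is deliberately not reproduced.

```rust
use std::collections::HashSet;

/// Hypothetical dispute key: (session index, candidate id).
type DisputeKey = (u32, u64);

/// Keeps at most `max` disputes, preferring those not yet known on chain.
/// Deterministic illustration only; it does not pick a random subset.
fn cap_disputes(
    recent: Vec<DisputeKey>,
    onchain: &HashSet<DisputeKey>,
    max: usize,
) -> Vec<DisputeKey> {
    if recent.len() <= max {
        return recent;
    }
    // Split while preserving the original order inside each group.
    let (unknown, known): (Vec<DisputeKey>, Vec<DisputeKey>) =
        recent.into_iter().partition(|d| !onchain.contains(d));
    unknown.into_iter().chain(known).take(max).collect()
}

fn main() {
    let recent = vec![(1, 10), (1, 11), (1, 12)];
    let onchain: HashSet<DisputeKey> = HashSet::from([(1, 10)]);
    println!("{:?}", cap_disputes(recent, &onchain, 2));
}
```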
@@ -805,20 +902,34 @@ async fn select_disputes(
         recent
     };
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Calling `request_votes`"
+    );
+
     // Load all votes for all disputes from the coordinator.
     let dispute_candidate_votes = request_votes(sender, disputes).await;
 
+    gum::trace!(
+        target: LOG_TARGET,
+        relay_parent = ?_leaf.hash,
+        "Finished `request_votes`"
+    );
+
     // Transform all `CandidateVotes` into `MultiDisputeStatementSet`.
     Ok(dispute_candidate_votes
         .into_iter()
         .map(|(session_index, candidate_hash, votes)| {
-            let valid_statements =
-                votes.valid.into_iter().map(|(s, i, sig)| (DisputeStatement::Valid(s), i, sig));
+            let valid_statements = votes
+                .valid
+                .into_iter()
+                .map(|(i, (s, sig))| (DisputeStatement::Valid(s), i, sig));
 
             let invalid_statements = votes
                 .invalid
                 .into_iter()
-                .map(|(s, i, sig)| (DisputeStatement::Invalid(s), i, sig));
+                .map(|(i, (s, sig))| (DisputeStatement::Invalid(s), i, sig));
 
             metrics.inc_valid_statements_by(valid_statements.len());
             metrics.inc_invalid_statements_by(invalid_statements.len());
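
With `CandidateVotes::valid` and `invalid` now `BTreeMap`s keyed by validator index (the commit's "Use BTreeMap for ordered dispute votes" bullet, and the test change from `vec![]` to `BTreeMap::new()`), each entry iterates as `(index, (kind, signature))`, hence the new `|(i, (s, sig))|` closures. Iteration follows ascending key order, and a mapped `btree_map::IntoIter` is still an `ExactSizeIterator`, which is what allows `valid_statements.len()` to be called before the iterator is consumed. A sketch with hypothetical, simplified stand-in types (`ValidKind` is a hash-free simplification, not the real statement-kind enum):

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for the primitives used in the diff.
type ValidatorIndex = u32;
type Signature = &'static str;

#[derive(Debug, PartialEq)]
enum ValidKind {
    BackingValid,
    ApprovalChecking,
}

#[derive(Debug, PartialEq)]
enum DisputeStatement {
    Valid(ValidKind),
}

/// Turns per-validator votes into `(statement, validator, signature)` tuples,
/// destructuring each map entry as `(i, (s, sig))` like the updated closure.
fn to_statements(
    valid: BTreeMap<ValidatorIndex, (ValidKind, Signature)>,
) -> (usize, Vec<(DisputeStatement, ValidatorIndex, Signature)>) {
    let statements = valid
        .into_iter()
        .map(|(i, (s, sig))| (DisputeStatement::Valid(s), i, sig));
    // The length is known without consuming the iterator (ExactSizeIterator).
    let count = statements.len();
    (count, statements.collect())
}

fn main() {
    let mut valid = BTreeMap::new();
    // Inserted out of order on purpose; iteration still yields index 3 first.
    valid.insert(7, (ValidKind::ApprovalChecking, "sig-7"));
    valid.insert(3, (ValidKind::BackingValid, "sig-3"));
    let (count, statements) = to_statements(valid);
    println!("{count} statements: {statements:?}");
}
```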
@@ -571,8 +571,8 @@ mod select_disputes {
         let mut res = Vec::new();
         let v = CandidateVotes {
             candidate_receipt: test_helpers::dummy_candidate_receipt(leaf.hash.clone()),
-            valid: vec![],
-            invalid: vec![],
+            valid: BTreeMap::new(),
+            invalid: BTreeMap::new(),
         };
         for r in disputes.iter() {
             res.push((r.0, r.1, v.clone()));