Consolidate subsystem spans so they are all children of the leaf-activated root span (#6458)

* Pass the PerLeafSpan as mutable reference to handle_new_head function

* cargo +nightly fmt --all

* Add mock span for test

* cargo +nightly fmt --all

* add new-blocks-hashes to span

* ref span in match statement, set span to disabled if not passed

* remove second match clause, make handle_new_head_span mutable

* cargo +nightly fmt --all

* improve tag on error and warning

* add imported blocks and info span

* cargo +nightly fmt --all

* Improve error for imported_blocks_and_info trace

* format tags on get_header_span

* add lost-to-finality tag

* add missing bracket

* - Add bitfield child span
- Add block db insertion span

* - fix update-bitfield span tag

* - Fix type conversion to u64
- Add missing argument

* - Cargo fmt

* - Test add_follows_from

* - Revert, as the relationship between spans is not working correctly

* - use drop to test if parent-child relationship can be re-established

* - remove bitfield span, check if parent-child relationship can be reestablished

* - Remove dangling bitfield span which is not used, to see if parent-child relationship can be re-established

* Another dangling bitfield span

* cargo fmt

* - add imported blocks and info span
- add candidate span per candidate

* add tags before moving block_header to push scope

* - Add db-insertion span

* cargo fmt

* fix types

* * Pass mutable reference to span in handle_new_head
* Change get-header-span tags in handle_new_head
* Create cache-session-info span in handle_new_head
* Create optional argument in determine_new_blocks
* Pass mutable reference to handle_new_head_span in determine_new_blocks in handle_new_head function
* Add candidate-hash, candidate-number, lost-to-finality tags to candidate_span in handle_new_head function
* Manually drop db_insertion_span and remove superfluous tags  to it, only keeping approved-bitfields tag
* Add ApprovalVoting stage in jaeger

* * Pass mutable reference to jaeger::Span instead of PerLeafSpan
* Add block-import span

* *Pass optional_span (optional argument) to determine_new_blocks util function

* * Add num-candidates int tag to block_import_span

* * Add head tag to cache_session_span

* * Create PerLeafSpan in handle_from_overseer (this is required to establish parent-child relationship between approval-voting span, and leaf-activated root span)

* * Add candidate-import-span as child of block-import-span
* Add candidate-hash and num-approval tags to candidate-import-span

* * Fix num-candidate tag to bitvec-len tag in candidate-import-span

* *Fix imported_blocks_and_info span to create new-block-span as not dealing with candidates

* Consider the future::select! block

* Use HashMap<Hash, jaeger::PerLeafSpan>

* Remove Stage 9

* Add missing spans

* cargo +nightly fmt --all

* Remove optional span argument for determine_new_blocks

* * Remove no-longer needed default PerLeafSpan implementation
* Remove no-longer necessary mock span given re-factoring of handle_new_head() no longer needing mutable span
* Split validation-result and request-data (availability and validation code) spans into two by dropping request_validation_data_spans
* Remove drop statements for cache_session_info_span

* Remove unnecessary span

* Remove another excessively spammy span

* Add missing spans from State in import tests

* Use functional approach to get spans

* - Add functional approach for the approval-voting span
- Add doc on block_numbers given labelling ambiguity
- Add span pruning logic
- Use .add_para_id on validation_result_span

* Replace for hash_set in hash_set_iter with map closure

* cargo +nightly fmt --all

* Change from unconsumed `map` to `.for_each`

* cargo +nightly fmt --all

* Refactor add_para_id to validation_result_span

* cargo +nightly fmt --all

* Remove duplicate tag

* Add missing tag to handle-approved-ancestor span

* Refactor span pruning to only invoke retain once

* Typo in span name

* - Replace unwrap_or with unwrap_or_else due to lazy evaluation of trace-identifier in polkadot_node_jaeger
- Remove some redundant spans

* Add approval-distribution spans

* - Add unwrap_or_else on note-approved-in-chain-selection
- Use child_with_trace_id to add traceID string tag on span (note this does not change the traceID, but just adds a tag)

* cargo +nightly fmt --all

* - Add traceID tags where necessary in approval-voting and availability-distribution
- Always use block-hash tag instead of relay-parent tag in approval-distribution

* Remove schedule-wakeup span as it will duplicate spans on existing wakeups (which should be a no-op)

* Remove a couple of warnings related to mutability

* Fix failing tests in availability distribution

* Add traceID tag to launch-approval and validation-result

* Reshuffle the validation and validation result spans to where more appropriate and add block-hash tag

* - Add tranche and should-trigger tag to process-wakeup span
- Add candidate-hash and traceID to check-and-import-approval span

* cargo fmt

* - Adjustments after PR comments

* Move span pruning after other pruning logic

* Remove DerefMut - no longer needed

* Relabel request-chunk spans

* - Fix typo in span label
- Add docs for drops

* Add new approval-voting span pruning logic

* Undo removal of !

* cargo fmt
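
A minimal sketch of the difference behind the unwrap_or to unwrap_or_else change above, assuming a hypothetical `MockSpan` type in place of the real polkadot_node_jaeger span (all names below are illustrative only). Rust evaluates the argument of `unwrap_or` before the call, so the fallback root span is built even when a per-leaf span is found; `unwrap_or_else` only runs its closure on a miss. The counter exists purely to make that visible.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical stand-in for a tracing span; NOT the polkadot_node_jaeger type.
#[derive(Debug, Clone, PartialEq)]
struct MockSpan {
    name: String,
    parent: Option<String>,
}

impl MockSpan {
    /// Builds a root span and bumps `created` so constructions can be counted.
    fn new_root(name: &str, created: &Cell<u32>) -> Self {
        created.set(created.get() + 1);
        MockSpan { name: name.to_string(), parent: None }
    }

    /// Builds a child span that remembers its parent's name.
    fn child(&self, name: &str) -> Self {
        MockSpan { name: name.to_string(), parent: Some(self.name.clone()) }
    }
}

/// Eager variant: the fallback root span is constructed on every call,
/// even when a per-leaf span exists and the fallback is thrown away.
fn span_eager(spans: &HashMap<u64, MockSpan>, leaf: u64, created: &Cell<u32>) -> MockSpan {
    spans
        .get(&leaf)
        .map(|span| span.child("update-fetching-heads"))
        .unwrap_or(MockSpan::new_root("update-fetching-heads", created))
}

/// Lazy variant: the closure runs only when no per-leaf span is found.
fn span_lazy(spans: &HashMap<u64, MockSpan>, leaf: u64, created: &Cell<u32>) -> MockSpan {
    spans
        .get(&leaf)
        .map(|span| span.child("update-fetching-heads"))
        .unwrap_or_else(|| MockSpan::new_root("update-fetching-heads", created))
}

fn main() {
    let setup = Cell::new(0);
    let mut spans = HashMap::new();
    spans.insert(1u64, MockSpan::new_root("leaf-activated", &setup));

    let eager_count = Cell::new(0);
    let lazy_count = Cell::new(0);

    // Leaf 1 has a per-leaf span, so both variants return the same child span.
    let a = span_eager(&spans, 1, &eager_count);
    let b = span_lazy(&spans, 1, &lazy_count);
    assert_eq!(a, b);

    // Only the eager variant built a fallback root span it never used.
    assert_eq!(eager_count.get(), 1);
    assert_eq!(lazy_count.get(), 0);
    println!(
        "eager built {} fallback span(s), lazy built {}",
        eager_count.get(),
        lazy_count.get()
    );
}
```

In a tracing setting the discarded fallback is more than wasted work if constructing a root span has side effects, which is presumably why the lazy form is preferred here.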
Mattia L.V. Bradascio
2023-03-31 16:54:19 +01:00
committed by GitHub
parent 9fe528d5c7
commit 713f6625fa
12 changed files with 349 additions and 90 deletions
@@ -140,7 +140,18 @@ impl FetchTaskConfig {
sender: mpsc::Sender<FromFetchTask>,
metrics: Metrics,
session_info: &SessionInfo,
+span: jaeger::Span,
) -> Self {
+let span = span
+.child("fetch-task-config")
+.with_trace_id(core.candidate_hash)
+.with_string_tag("leaf", format!("{:?}", leaf))
+.with_validator_index(session_info.our_index)
+.with_uint_tag("group-index", core.group_responsible.0 as u64)
+.with_relay_parent(core.candidate_descriptor.relay_parent)
+.with_string_tag("pov-hash", format!("{:?}", core.candidate_descriptor.pov_hash))
+.with_stage(jaeger::Stage::AvailabilityDistribution);
let live_in = vec![leaf].into_iter().collect();
// Don't run tasks for our backing group:
@@ -148,9 +159,6 @@ impl FetchTaskConfig {
return FetchTaskConfig { live_in, prepared_running: None }
}
-let span = jaeger::Span::new(core.candidate_hash, "availability-distribution")
-.with_stage(jaeger::Stage::AvailabilityDistribution);
let prepared_running = RunningTask {
session_index: session_info.session_index,
group_index: core.group_responsible,
@@ -251,20 +259,18 @@ impl RunningTask {
let mut bad_validators = Vec::new();
let mut succeeded = false;
let mut count: u32 = 0;
-let mut _span = self
-.span
-.child("fetch-task")
-.with_chunk_index(self.request.index.0)
-.with_relay_parent(self.relay_parent);
+let mut span = self.span.child("run-fetch-chunk-task").with_relay_parent(self.relay_parent);
// Try validators in reverse order:
while let Some(validator) = self.group.pop() {
-let _try_span = _span.child("try");
// Report retries:
if count > 0 {
self.metrics.on_retry();
}
count += 1;
+let _chunk_fetch_span = span
+.child("fetch-chunk-request")
+.with_chunk_index(self.request.index.0)
+.with_stage(jaeger::Stage::AvailabilityDistribution);
// Send request:
let resp = match self.do_request(&validator).await {
Ok(resp) => resp,
@@ -281,6 +287,12 @@ impl RunningTask {
continue
},
};
+// We drop the span here, so that the span is not active while we recombine the chunk.
+drop(_chunk_fetch_span);
+let _chunk_recombine_span = span
+.child("recombine-chunk")
+.with_chunk_index(self.request.index.0)
+.with_stage(jaeger::Stage::AvailabilityDistribution);
let chunk = match resp {
ChunkFetchingResponse::Chunk(resp) => resp.recombine_into_chunk(&self.request),
ChunkFetchingResponse::NoSuchChunk => {
@@ -298,6 +310,12 @@ impl RunningTask {
continue
},
};
+// We drop the span so that the span is not active whilst we validate and store the chunk.
+drop(_chunk_recombine_span);
+let _chunk_validate_and_store_span = span
+.child("validate-and-store-chunk")
+.with_chunk_index(self.request.index.0)
+.with_stage(jaeger::Stage::AvailabilityDistribution);
// Data genuine?
if !self.validate_chunk(&validator, &chunk) {
@@ -308,10 +326,9 @@ impl RunningTask {
// Ok, let's store it and be happy:
self.store_chunk(chunk).await;
succeeded = true;
-_span.add_string_tag("success", "true");
break
}
-_span.add_int_tag("tries", count as _);
+span.add_int_tag("tries", count as _);
if succeeded {
self.metrics.on_fetch(SUCCEEDED);
self.conclude(bad_validators).await;
@@ -33,6 +33,7 @@ use futures::{
};
use polkadot_node_subsystem::{
+jaeger,
messages::{ChainApiMessage, RuntimeApiMessage},
overseer, ActivatedLeaf, ActiveLeavesUpdate, LeafStatus,
};
@@ -100,14 +101,22 @@ impl Requester {
ctx: &mut Context,
runtime: &mut RuntimeInfo,
update: ActiveLeavesUpdate,
+spans: &HashMap<Hash, jaeger::PerLeafSpan>,
) -> Result<()> {
gum::trace!(target: LOG_TARGET, ?update, "Update fetching heads");
let ActiveLeavesUpdate { activated, deactivated } = update;
// Stale leaves happen after a reversion - we don't want to re-run availability there.
if let Some(leaf) = activated.filter(|leaf| leaf.status == LeafStatus::Fresh) {
+let span = spans
+.get(&leaf.hash)
+.map(|span| span.child("update-fetching-heads"))
+.unwrap_or_else(|| jaeger::Span::new(&leaf.hash, "update-fetching-heads"))
+.with_string_tag("leaf", format!("{:?}", leaf.hash))
+.with_stage(jaeger::Stage::AvailabilityDistribution);
// Order important! We need to handle activated, prior to deactivated, otherwise we might
// cancel still needed jobs.
-self.start_requesting_chunks(ctx, runtime, leaf).await?;
+self.start_requesting_chunks(ctx, runtime, leaf, &span).await?;
}
self.stop_requesting_chunks(deactivated.into_iter());
@@ -123,7 +132,13 @@ impl Requester {
ctx: &mut Context,
runtime: &mut RuntimeInfo,
new_head: ActivatedLeaf,
+span: &jaeger::Span,
) -> Result<()> {
+let mut span = span
+.child("request-chunks-new-head")
+.with_string_tag("leaf", format!("{:?}", new_head.hash))
+.with_stage(jaeger::Stage::AvailabilityDistribution);
let sender = &mut ctx.sender().clone();
let ActivatedLeaf { hash: leaf, .. } = new_head;
let (leaf_session_index, ancestors_in_session) = get_block_ancestors_in_same_session(
@@ -133,8 +148,15 @@ impl Requester {
Self::LEAF_ANCESTRY_LEN_WITHIN_SESSION,
)
.await?;
+span.add_uint_tag("ancestors-in-session", ancestors_in_session.len() as u64);
// Also spawn or bump tasks for candidates in ancestry in the same session.
for hash in std::iter::once(leaf).chain(ancestors_in_session) {
+let span = span
+.child("request-chunks-ancestor")
+.with_string_tag("leaf", format!("{:?}", hash.clone()))
+.with_stage(jaeger::Stage::AvailabilityDistribution);
let cores = get_occupied_cores(sender, hash).await?;
gum::trace!(
target: LOG_TARGET,
@@ -148,7 +170,7 @@ impl Requester {
// The next time the subsystem receives leaf update, some of spawned task will be bumped
// to be live in fresh relay parent, while some might get dropped due to the current leaf
// being deactivated.
-self.add_cores(ctx, runtime, leaf, leaf_session_index, cores).await?;
+self.add_cores(ctx, runtime, leaf, leaf_session_index, cores, span).await?;
}
Ok(())
@@ -178,15 +200,24 @@ impl Requester {
leaf: Hash,
leaf_session_index: SessionIndex,
cores: impl IntoIterator<Item = OccupiedCore>,
+span: jaeger::Span,
) -> Result<()> {
for core in cores {
+let mut span = span
+.child("check-fetch-candidate")
+.with_trace_id(core.candidate_hash)
+.with_string_tag("leaf", format!("{:?}", leaf))
+.with_candidate(core.candidate_hash)
+.with_stage(jaeger::Stage::AvailabilityDistribution);
match self.fetches.entry(core.candidate_hash) {
Entry::Occupied(mut e) =>
// Just book keeping - we are already requesting that chunk:
{
+span.add_string_tag("already-requested-chunk", "true");
e.get_mut().add_leaf(leaf);
},
Entry::Vacant(e) => {
+span.add_string_tag("already-requested-chunk", "false");
let tx = self.tx.clone();
let metrics = self.metrics.clone();
@@ -201,7 +232,7 @@ impl Requester {
// be fetchable by the state trie.
leaf,
leaf_session_index,
-|info| FetchTaskConfig::new(leaf, &core, tx, metrics, info),
+|info| FetchTaskConfig::new(leaf, &core, tx, metrics, info, span),
)
.await
.map_err(|err| {
@@ -14,6 +14,8 @@
// You should have received a copy of the GNU General Public License
// along with Polkadot. If not, see <http://www.gnu.org/licenses/>.
+use std::collections::HashMap;
use std::{future::Future, sync::Arc};
use futures::FutureExt;
@@ -196,7 +198,7 @@ fn check_ancestry_lookup_in_same_session() {
test_harness(test_state.clone(), |mut ctx| async move {
let chain = &test_state.relay_chain;
+let spans: HashMap<Hash, jaeger::PerLeafSpan> = HashMap::new();
let block_number = 1;
let update = ActiveLeavesUpdate {
activated: Some(ActivatedLeaf {
@@ -209,7 +211,7 @@ fn check_ancestry_lookup_in_same_session() {
};
requester
-.update_fetching_heads(&mut ctx, &mut runtime, update)
+.update_fetching_heads(&mut ctx, &mut runtime, update, &spans)
.await
.expect("Leaf processing failed");
let fetch_tasks = &requester.fetches;
@@ -229,7 +231,7 @@ fn check_ancestry_lookup_in_same_session() {
};
requester
-.update_fetching_heads(&mut ctx, &mut runtime, update)
+.update_fetching_heads(&mut ctx, &mut runtime, update, &spans)
.await
.expect("Leaf processing failed");
let fetch_tasks = &requester.fetches;
@@ -255,7 +257,7 @@ fn check_ancestry_lookup_in_same_session() {
deactivated: vec![chain[1], chain[2]].into(),
};
requester
-.update_fetching_heads(&mut ctx, &mut runtime, update)
+.update_fetching_heads(&mut ctx, &mut runtime, update, &spans)
.await
.expect("Leaf processing failed");
let fetch_tasks = &requester.fetches;
@@ -283,7 +285,7 @@ fn check_ancestry_lookup_in_different_sessions() {
test_harness(test_state.clone(), |mut ctx| async move {
let chain = &test_state.relay_chain;
+let spans: HashMap<Hash, jaeger::PerLeafSpan> = HashMap::new();
let block_number = 3;
let update = ActiveLeavesUpdate {
activated: Some(ActivatedLeaf {
@@ -296,7 +298,7 @@ fn check_ancestry_lookup_in_different_sessions() {
};
requester
-.update_fetching_heads(&mut ctx, &mut runtime, update)
+.update_fetching_heads(&mut ctx, &mut runtime, update, &spans)
.await
.expect("Leaf processing failed");
let fetch_tasks = &requester.fetches;
@@ -314,7 +316,7 @@ fn check_ancestry_lookup_in_different_sessions() {
};
requester
-.update_fetching_heads(&mut ctx, &mut runtime, update)
+.update_fetching_heads(&mut ctx, &mut runtime, update, &spans)
.await
.expect("Leaf processing failed");
let fetch_tasks = &requester.fetches;
@@ -332,7 +334,7 @@ fn check_ancestry_lookup_in_different_sessions() {
};
requester
-.update_fetching_heads(&mut ctx, &mut runtime, update)
+.update_fetching_heads(&mut ctx, &mut runtime, update, &spans)
.await
.expect("Leaf processing failed");
let fetch_tasks = &requester.fetches;