* crate skeleton and type definitions
* add ChainSelectionMessage
* add error type
* run loop
* fix overseer
* simplify determine_new_blocks API
* write an overlay struct and fetch new blocks
* add new function to overlay
* more flow
* add leaves to overlay and add a strong type around leaves-set
* add is_parent_viable
* implement block import, ignoring reversions
* add stagnant-at to overlay
* add stagnant
* add revert consensus log
* flow for reversions
* extract and import block reversions
* recursively update viability
* remove redundant parameter from WriteBlockEntry
* do some removal of viable leaves
* address grumbles
* refactor
* address grumbles
* add comment about non-monotonicity
* extract backend to submodule
* begin the hunt for viable leaves
* viability pivots for updating the active leaves
* remove LeafSearchFrontier
* partially -> explicitly viable and untwist some booleans
* extract tree to submodule
* implement block finality update
* Implement block approval routine
* implement stagnant detection
* ensure blocks pruned on finality are removed from the active leaves set
* write down some planned test cases
* floww
* leaf loading
* implement best_leaf_containing
* write down a few more tests to do
* remove dependence of tree on header
* guide: ChainApiMessage::BlockWeight
* node: BlockWeight ChainAPI
* fix compile issue
* note a few TODOs for the future
* fetch block weight using new BlockWeight ChainAPI
* implement unimplemented
* sort leaves by block number after weight
* remove warnings and add more TODOs
* create test module
* storage for test backend
* wrap inner in mutex
* add write waker query to test backend
* Add OverseerSignal -> FromOverseer conversion
* add test harnes
* add no-op test
* add some more test helpers
* the first test
* more progress on tests
* test two subtrees
* determine-new-blocks: cleaner genesis avoidance and tighter ancestry requests
* don't make ancestry requests when asking for one block
* add a couple more tests
* add to AllMessages in guide
* remove bad spaces from bridge
* compact iterator
* test import with gaps
* more reversion tests
* test finalization pruning subtrees
* fixups
* test clobbering and fix bug in overlay
* exhaustive backend state after finalizaiton tested
* more finality tests
* leaf tests
* test approval
* test ChainSelectionMessage::Leaves thoroughly
* remove TODO
* avoid Ordering::is_ne so CI can build
* comment algorithmic complexity
* Update node/core/chain-selection/src/lib.rs
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* guide: reversion safety
* guide: manage reversion safety in subsystems
* add leaf status to ActivatedLeaf
* add an LRU-cache to overseer for staleness detection
* update ActivatedLeaf usages in tests to contain status field
* add variant where missed accidentally
* add some helpers to LeafStatus
* address grumbles
* overseer: pass messages directly between subsystems
* test that message is held on to
* Update node/overseer/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* give every subsystem an unbounded sender too
* remove metered_channel::name
1. we don't provide good names
2. these names are never used anywhere
* unused mut
* remove unnecessary &mut
* subsystem unbounded_send
* remove unused MaybeTimer
We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.
* remove comment
* split up senders and receivers
* update metrics
* fix tests
* fix test subsystem context
* fix flaky test
* fix docs
* doc
* use select_biased to favor signals
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
* add number to `ActivatedLeavesUpdate`
* update subsystem util and overseer
* use new ActivatedLeaf everywhere
* sort view
* sorted and limited view in network bridge
* use live block hash only if it's newer
* grumples
* Add one Jaeger span per relay parent
This adds one Jaeger span per relay parent, instead of always creating
new spans per relay parent. This should improve the UI view, because
subsystems are now grouped below one common span.
* Fix doc tests
* Replace `PerLeaveSpan` to `PerLeafSpan`
* More renaming
* Moare
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Skip the spans
* Increase `spec_version`
Co-authored-by: Andronik Ordian <write@reusable.software>
* refactor View to include finalized_number
* guide: update the NetworkBridge on BlockFinalized
* av-store: fix the tests
* actually fix tests
* grumbles
* ignore macro doctest
* use Hash::repeat_bytes more consistently
* broadcast empty leaves updates as well
* fix issuing view updates on empty leaves updates
* remove low information density error doc comments
* another round of error dancing
* fix compilation
* remove stale `None` argument
* adjust test, minor slip in command
* only add AvailabilityError for full node features
* another None where none shuld be
* Some code cleanup in overseer
- Switches to select! in the overseer run loop to be more fair about
message processing between the different sources.
- Added a check to only send `ActiveLeaves` if the update actually
contains any data.
* Move the check
* Restore old behavior
* Simplify message sending and signal sending to subsystems
* Update node/subsystem/src/lib.rs
* drop in tracing to replace log
* add structured logging to trace messages
* add structured logging to debug messages
* add structured logging to info messages
* add structured logging to warn messages
* add structured logging to error messages
* normalize spacing and Display vs Debug
* add instrumentation to the various 'fn run'
* use explicit tracing module throughout
* fix availability distribution test
* don't double-print errors
* remove further redundancy from logs
* fix test errors
* fix more test errors
* remove unused kv_log_macro
* fix unused variable
* add tracing spans to collation generation
* add tracing spans to av-store
* add tracing spans to backing
* add tracing spans to bitfield-signing
* add tracing spans to candidate-selection
* add tracing spans to candidate-validation
* add tracing spans to chain-api
* add tracing spans to provisioner
* add tracing spans to runtime-api
* add tracing spans to availability-distribution
* add tracing spans to bitfield-distribution
* add tracing spans to network-bridge
* add tracing spans to collator-protocol
* add tracing spans to pov-distribution
* add tracing spans to statement-distribution
* add tracing spans to overseer
* cleanup
* start working on building the real overseer
Unfortunately, this fails to compile right now due to an upstream
failure to compile which is probably brought on by a recent upgrade
to rustc v1.47.
* fill in AllSubsystems internal constructors
* replace fn make_metrics with Metrics::attempt_to_register
* update to account for #1740
* remove Metrics::register, rename Metrics::attempt_to_register
* add 'static bounds to real_overseer type params
* pass authority_discovery and network_service to real_overseer
It's not straightforwardly obvious that this is the best way to handle
the case when there is no authority discovery service, but it seems
to be the best option available at the moment.
* select a proper database configuration for the availability store db
* use subdirectory for av-store database path
* apply Basti's patch which avoids needing to parameterize everything on Block
* simplify path extraction
* get all tests to compile
* Fix Prometheus double-registry error
for debugging purposes, added this to node/subsystem-util/src/lib.rs:472-476:
```rust
Some(registry) => Self::try_register(registry).map_err(|err| {
eprintln!("PrometheusError calling {}::register: {:?}", std::any::type_name::<Self>(), err);
err
}),
```
That pointed out where the registration was failing, which led to
this fix. The test still doesn't pass, but it now fails in a new
and different way!
* authorities must have authority discovery, but not necessarily overseer handlers
* fix broken SpawnedSubsystem impls
detailed logging determined that using the `Box::new` style of
future generation, the `self.run` method was never being called,
leading to dropped receivers / closed senders for those subsystems,
causing the overseer to shut down immediately.
This is not the final fix needed to get things working properly,
but it's a good start.
* use prometheus properly
Prometheus lets us register simple counters, which aren't very
interesting. It also allows us to register CounterVecs, which are.
With a CounterVec, you can provide a set of labels, which can
later be used to filter the counts.
We were using them wrong, though. This pattern was repeated in a
variety of places in the code:
```rust
// panics with an cardinality mismatch
let my_counter = register(CounterVec::new(opts, &["succeeded", "failed"])?, registry)?;
my_counter.with_label_values(&["succeeded"]).inc()
```
The problem is that the labels provided in the constructor are not
the set of legal values which can be annotated, but a set of individual
label names which can have individual, arbitrary values.
This commit fixes that.
* get av-store subsystem to actually run properly and not die on first signal
* typo fix: incomming -> incoming
* don't disable authority discovery in test nodes
* Fix rococo-v1 missing session keys
* Update node/core/av-store/Cargo.toml
* try dummying out av-store on non-full-nodes
* overseer and subsystems are required only for full nodes
* Reduce the amount of warnings on browser target
* Fix two more warnings
* InclusionInherent should actually have an Inherent module on rococo
* Ancestry: don't return genesis' parent hash
* Update Cargo.lock
* fix broken test
* update test script: specify chainspec as script argument
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Update node/service/src/lib.rs
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* node/service/src/lib: Return error via ? operator
* post-merge blues
* add is_collator flag
* prevent occasional av-store test panic
* simplify fix; expand application
* run authority_discovery in Role::Discover when collating
* distinguish between proposer closed channel errors
* add IsCollator enum, remove is_collator CLI flag
* improve formatting
* remove nop loop
* Fix some stuff
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: Robert Habermeier <robert@Roberts-MBP.lan1>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* Make `AllSubsystems` usage easier in tests
This makes the usage of `AllSubsystems` easier in tests by introducing
new methods.
- `dummy` initializes `AllSubsystems` with all systems set to dummy
- `replace_*` to replace any subsystem
Besides that this pr adds a `ForwardSubsystem` that is also useful for
tests. This subsystem will forward all incoming messages to the given channel.
* Update node/overseer/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Move ForwardSubsystem and add a test
* Break some lines
Co-authored-by: Andronik Ordian <write@reusable.software>
* service-new: cosmetic changes
* overseer: draft of prometheus metrics
* metrics: update active_leaves metrics
* metrics: extract into functions
* metrics: resolve XXX
* metrics: it's ugly, but it works
* Bump Substrate
* metrics: move a bunch of code around
* Bumb substrate again
* metrics: fix a warning
* fix a warning in runtime
* metrics: statements signed
* metrics: statements impl RegisterMetrics
* metrics: refactor Metrics trait
* metrics: add Metrics assoc type to JobTrait
* metrics: move Metrics trait to util
* metrics: fix overseer
* metrics: fix backing
* metrics: fix candidate validation
* metrics: derive Default
* metrics: docs
* metrics: add stubs for other subsystems
* metrics: add more stubs and fix compilation
* metrics: fix doctest
* metrics: move to subsystem
* metrics: fix candidate validation
* metrics: bitfield signing
* metrics: av store
* metrics: chain API
* metrics: runtime API
* metrics: stub for avad
* metrics: candidates seconded
* metrics: ok I gave up
* metrics: provisioner
* metrics: remove a clone by requiring Metrics: Sync
* metrics: YAGNI
* metrics: remove another TODO
* metrics: for later
* metrics: add parachain_ prefix
* metrics: s/signed_statement/signed_statements
* utils: add a comment for job metrics
* metrics: address review comments
* metrics: oops
* metrics: make sure to save files before commit 😅
* use _total suffix for requests metrics
Co-authored-by: Max Inden <mail@max-inden.de>
* metrics: add tests for overseer
* update Cargo.lock
* overseer: add a test for CollationGeneration
* collation-generation: impl metrics
* collation-generation: use kebab-case for name
* collation-generation: add a constructor
Co-authored-by: Gav Wood <gavin@parity.io>
Co-authored-by: Ashley Ruglys <ashley.ruglys@gmail.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* Initial commit
* WIP
* Make atomic transactions
* Remove pruning code
* Fix build and add a Nop to bridge
* Fixes from review
* Move config struct around for clarity
* Rename constructor and warn on missing docs
* Fix a test and rename a message
* Fix some more reviews
* Obviously failed to rebase cleanly
* add ActiveLeavesUpdate, remove StartWork, StopWork
* replace StartWork, StopWork in subsystem crate tests
* mechanically update OverseerSignal in other modules
* convert overseer to take advantage of new multi-hash update abilities
Note: this does not yet convert the tests; some of the tests now freeze:
test tests::overseer_start_stop_works ... test tests::overseer_start_stop_works has been running for over 60 seconds
test tests::overseer_finalize_works ... test tests::overseer_finalize_works has been running for over 60 seconds
* fix broken overseer tests
* manually impl PartialEq for ActiveLeavesUpdate, rm trait Equivalent
This cleans up the code a bit and makes it easier in the future to
do the right thing when comparing ALUs.
* use target in all network bridge logging
* reduce spamming of and
* get conclude signal working properly; don't allocate a vector
* wip: add test suite / example / explanation for using utility subsystem
Unfortunately, the test fails right now for reasons which seem
very odd. Just have to keep poking at it.
* explicitly import everything
* fix subsystem-util test
The root problem here was two-fold:
- there was a circular dependency from subsystem -> test-helpers/subsystem ->
subsystem
- cfg(test) doesn't propagate between crates
The solution: move the subsystem test helpers into a sub-module
within subsystem. Publicly export them from the previous location
so no other code breaks.
Doing this has an additional benefit: it ensures that no production
code can ever accidentally use the subsystem helpers, as they are compile-
gated on cfg(test).
* fully commit to moving test helpers into a subsystem module
* add some more tests
* get rid of log tests in favor of real error forwarding
It's not obvious whether we'll ever really want to chase down
these errors outside a testing context, but having the capability
won't hurt.
* fix issue which caused test to hang on osx
* only require that job errors are PartialEq when testing
also fix polkadot-node-core-backing tests
* get rid of any notion of partialeq
* rethink testing
Combine tests of starting and stopping job: leaving a test executor
with a job running was pretty clearly the cause of the sometimes-hang.
Also, add a timeout so tests _can't_ hang anymore; they just fail
after a while.
* rename fwd_errors -> forward_errors
* warn on error propagation failure
* fix unused import leftover from merge
* derive eq for subsystemerror
* Add subsystem-util crate.
Start by moving the JobCanceler here.
* copy utility functions for requesting runtime data; generalize
* convert subsystem-util from crate to module in subsystem
The point of making a sub-crate is to ensure that only the necessary
parts of a program get compiled; if a dependent package needed only
subsystem-util, and not subsystem, then subsystem wouldn't need to
be compiled.
However, that will never happen: subsystem-util depends on
subsystem::messages, so subsystem will always be compiled.
Therefore, it makes more sense to add it as a module in the existing
crate than as a new and distinct crate.
* make runtime request sender type generic
* candidate backing subsystem uses util for api requests
* add struct Validator representing the local validator
This struct can be constructed when the local node is a validator;
the constructor fails otherwise. It stores a bit of local data, and
provides some utility methods.
* add alternate constructor for better efficiency
* refactor candidate backing to use utility methods
* fix test breakage caused by reordering tests
* restore test which accidentally got deleted during merge
* start extracting jobs management into helper traits + structs
* use util::{JobHandle, Jobs} in CandidateBackingSubsystem
* implement generic job-manager subsystem impl
This means that the work of implementing a subsystem boils down
to implementing the job, and then writing an appropriate
type definition, i.e.
pub type CandidateBackingSubsystem<Spawner, Context> =
util::JobManager<Spawner, Context, CandidateBackingJob>;
* add hash-extraction helper to messages
* fix errors caused by improper rebase
* doc improvement
* simplify conversion from overseer communication to job message
* document fn hash for all messages
* rename fn hash() -> fn relay_parent
* gracefully shut down running futures on Conclude
* ensure we're validating with the proper validator index
* rename: handle_unhashed_msg -> handle_orphan_msg
* impl Stream for Jobs<Spawner, Job>
This turns out to be relatively complicated and requires some
unsafe code, so we'll want either detailed review, or to choose
to revert this commit.
* add missing documentation for public items
* use pin-project to eliminate unsafe code from this codebase
* rename SenderMessage -> FromJob
* reenvision the subsystem requests as an extension trait
This works within `util.rs`, but fails in `core/backing/src/lib.rs`,
because we don't actually create the struct soon enough. Continuing
down this path would imply substantial rewriting.
* Revert "reenvision the subsystem requests as an extension trait"
This reverts commit a5639e36017a72656b478caddcaa30e2d4e6112a.
The fact is, the new API is more complicated to no real benefit.
* apply suggested futuresunordered join_all impl
* CandidateValidationMessage variants have no top-level relay parents
* rename handle_orphan_msg -> handle_unanchored_msg
* make most node-core-backing types private
Now the only public types exposed in that module are
CandidateBackingSubsystem and ToJob. While ideally we could reduce
the public interface to only the former type, that doesn't work
because ToJob appears in the public interface of CandidateBackingSubsystem.
This also involves changing the definition of CandidateBackingSubsystem;
it is no longer a typedef, but a struct wrapping the job manager.
* create a v1 primitives module
* Improve guide on availability types
* punctuate
* new parachains runtime uses new primitives
* tests of new runtime now use new primitives
* add ErasureChunk to guide
* export erasure chunk from v1 primitives
* subsystem crate uses v1 primitives
* node-primitives uses new v1 primitives
* port overseer to new primitives
* new-proposer uses v1 primitives (no ParachainHost anymore)
* fix no-std compilation for primitives
* service-new uses v1 primitives
* network-bridge uses new primitives
* statement distribution uses v1 primitives
* PoV distribution uses v1 primitives; add PoV::hash fn
* move parachain to v0
* remove inclusion_inherent module and place into v1
* remove everything from primitives crate root
* remove some unused old types from v0 primitives
* point everything else at primitives::v0
* squanch some warns up
* add RuntimeDebug import to no-std as well
* port over statement-table and validation
* fix final errors in validation and node-primitives
* add dummy Ord impl to committed candidate receipt
* guide: update CandidateValidationMessage
* add primitive for validationoutputs
* expand CandidateValidationMessage further
* bikeshed
* add some impls to omitted-validation-data and available-data
* expand CandidateValidationMessage
* make erasure-coding generic over v1/v0
* update usages of erasure-coding
* implement commitments.hash()
* use Arc<Pov> for CandidateValidation
* improve new erasure-coding method names
* fix up candidate backing
* update docs a bit
* fix most tests and add short-circuiting to make_pov_available
* fix remainder of candidate backing tests
* squanching warns
* squanch it up
* some fallout
* overseer fallout
* free from polkadot-test-service hell
* overseer: introduce a utility typemap
* it's ugly but it compiles
* move DummySubsystem to subsystem crate
* fix tests fallout
* use a struct for all subsystems
* more tests fallout
* add missing pov_distribution subsystem
* remove unused imports and bounds
* fix minimal-example
* network bridge skeleton
* move some primitives around and add debug impls
* protocol registration glue & abstract network interface
* add send_msgs to subsystemctx
* select logic
* transform different events into actions and handle
* implement remaining network bridge state machine
* start test skeleton
* make network methods asynchronous
* extract subsystem out to subsystem crate
* port over overseer to subsystem context trait
* fix minimal example
* fix overseer doc test
* update network-bridge crate
* write a subsystem test-helpers crate
* write a network test helper for network-bridge
* set up (broken) view test
* Revamp network to be more async-friendly and not require Sync
* fix spacing
* fix test compilation
* insert side-channel for actions
* Add some more message types to AllMessages
* introduce a test harness
* add some tests
* ensure service compiles and passes tests
* fix typo
* fix service-new compilation
* Subsystem test helpers send messages synchronously
* remove smelly action inspector
* remove superfluous let binding
* fix warnings
* Update node/network/bridge/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* fix compilation
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>