* start working on building the real overseer
Unfortunately, this fails to compile right now due to an upstream
failure to compile which is probably brought on by a recent upgrade
to rustc v1.47.
* fill in AllSubsystems internal constructors
* replace fn make_metrics with Metrics::attempt_to_register
* update to account for #1740
* remove Metrics::register, rename Metrics::attempt_to_register
* add 'static bounds to real_overseer type params
* pass authority_discovery and network_service to real_overseer
It's not straightforwardly obvious that this is the best way to handle
the case when there is no authority discovery service, but it seems
to be the best option available at the moment.
* select a proper database configuration for the availability store db
* use subdirectory for av-store database path
* apply Basti's patch which avoids needing to parameterize everything on Block
* simplify path extraction
* get all tests to compile
* Fix Prometheus double-registry error
for debugging purposes, added this to node/subsystem-util/src/lib.rs:472-476:
```rust
Some(registry) => Self::try_register(registry).map_err(|err| {
eprintln!("PrometheusError calling {}::register: {:?}", std::any::type_name::<Self>(), err);
err
}),
```
That pointed out where the registration was failing, which led to
this fix. The test still doesn't pass, but it now fails in a new
and different way!
* authorities must have authority discovery, but not necessarily overseer handlers
* fix broken SpawnedSubsystem impls
detailed logging determined that using the `Box::new` style of
future generation, the `self.run` method was never being called,
leading to dropped receivers / closed senders for those subsystems,
causing the overseer to shut down immediately.
This is not the final fix needed to get things working properly,
but it's a good start.
* use prometheus properly
Prometheus lets us register simple counters, which aren't very
interesting. It also allows us to register CounterVecs, which are.
With a CounterVec, you can provide a set of labels, which can
later be used to filter the counts.
We were using them wrong, though. This pattern was repeated in a
variety of places in the code:
```rust
// panics with an cardinality mismatch
let my_counter = register(CounterVec::new(opts, &["succeeded", "failed"])?, registry)?;
my_counter.with_label_values(&["succeeded"]).inc()
```
The problem is that the labels provided in the constructor are not
the set of legal values which can be annotated, but a set of individual
label names which can have individual, arbitrary values.
This commit fixes that.
* get av-store subsystem to actually run properly and not die on first signal
* typo fix: incomming -> incoming
* don't disable authority discovery in test nodes
* Fix rococo-v1 missing session keys
* Update node/core/av-store/Cargo.toml
* try dummying out av-store on non-full-nodes
* overseer and subsystems are required only for full nodes
* Reduce the amount of warnings on browser target
* Fix two more warnings
* InclusionInherent should actually have an Inherent module on rococo
* Ancestry: don't return genesis' parent hash
* Update Cargo.lock
* fix broken test
* update test script: specify chainspec as script argument
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Update node/service/src/lib.rs
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* node/service/src/lib: Return error via ? operator
* post-merge blues
* add is_collator flag
* prevent occasional av-store test panic
* simplify fix; expand application
* run authority_discovery in Role::Discover when collating
* distinguish between proposer closed channel errors
* add IsCollator enum, remove is_collator CLI flag
* improve formatting
* remove nop loop
* Fix some stuff
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: Robert Habermeier <robert@Roberts-MBP.lan1>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* update primitives
* correct parent_head field
* make hrmp field pub
* refactor validation data: runtime
* refactor validation data: messages
* add arguments to full_validation_data runtime API
* port runtime API
* mostly port over candidate validation
* remove some parameters from ValidationParams
* guide: update candidate validation
* update candidate outputs
* update ValidationOutputs in primitives
* port over candidate validation
* add a new test for no-transient behavior
* update util runtime API wrappers
* candidate backing
* fix missing imports
* change some fields of validation data around
* runtime API impl
* update candidate validation
* fix backing tests
* grumbles from review
* fix av-store tests
* fix some more crates
* fix provisioner tests
* fix availability distribution tests
* port collation-generation to new validation data
* fix overseer tests
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* service-new: cosmetic changes
* overseer: draft of prometheus metrics
* metrics: update active_leaves metrics
* metrics: extract into functions
* metrics: resolve XXX
* metrics: it's ugly, but it works
* Bump Substrate
* metrics: move a bunch of code around
* Bumb substrate again
* metrics: fix a warning
* fix a warning in runtime
* metrics: statements signed
* metrics: statements impl RegisterMetrics
* metrics: refactor Metrics trait
* metrics: add Metrics assoc type to JobTrait
* metrics: move Metrics trait to util
* metrics: fix overseer
* metrics: fix backing
* metrics: fix candidate validation
* metrics: derive Default
* metrics: docs
* metrics: add stubs for other subsystems
* metrics: add more stubs and fix compilation
* metrics: fix doctest
* metrics: move to subsystem
* metrics: fix candidate validation
* metrics: bitfield signing
* metrics: av store
* metrics: chain API
* metrics: runtime API
* metrics: stub for avad
* metrics: candidates seconded
* metrics: ok I gave up
* metrics: provisioner
* metrics: remove a clone by requiring Metrics: Sync
* metrics: YAGNI
* metrics: remove another TODO
* metrics: for later
* metrics: add parachain_ prefix
* metrics: s/signed_statement/signed_statements
* utils: add a comment for job metrics
* metrics: address review comments
* metrics: oops
* metrics: make sure to save files before commit 😅
* use _total suffix for requests metrics
Co-authored-by: Max Inden <mail@max-inden.de>
* metrics: add tests for overseer
* update Cargo.lock
* overseer: add a test for CollationGeneration
* collation-generation: impl metrics
* collation-generation: use kebab-case for name
* collation-generation: add a constructor
Co-authored-by: Gav Wood <gavin@parity.io>
Co-authored-by: Ashley Ruglys <ashley.ruglys@gmail.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* sketch out provisioner basics
* handle provisionable data
* stub out select_inherent_data
* split runtime APIs into sub-chapters to improve linkability
* explain SignedAvailabilityBitfield semantics
* add internal link to further documentation
* some more work figuring out how the provisioner can do its thing
* fix broken link
* don't import enum variants where it's one layer deep
* make request_availability_cores a free fn in util
* document more precisely what should happen on block production
* finish first-draft implementation of provisioner
* start working on the full and proper backed candidate selection rule
* Pass number of block under construction via RequestInherentData
* Revert "Pass number of block under construction via RequestInherentData"
This reverts commit 850fe62cc0dfb04252580c21a985962000e693c8.
That initially looked like the better approach--it spent the time
budget for fetching the block number in the proposer, instead of
the provisioner, and that felt more appropriate--but it turns out
not to be obvious how to get the block number of the block under
construction from within the proposer. The Chain API may be less
ideal, but it should be easier to implement.
* wip: get the block under production from the Chain API
* add ChainApiMessage to AllMessages
* don't break the run loop if a provisionable data channel closes
* clone only those backed candidates which are coherent
* propagate chain_api subsystem through various locations
* add delegated_subsystem! macro to ease delegating subsystems
Unfortunately, it doesn't work right:
```
error[E0446]: private type `CandidateBackingJob` in public interface
--> node/core/backing/src/lib.rs:775:1
|
86 | struct CandidateBackingJob {
| - `CandidateBackingJob` declared as private
...
775 | delegated_subsystem!(CandidateBackingJob as CandidateBackingSubsystem);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't leak private type
```
I'm not sure precisely what's going wrong, here; I suspect the problem is
the use of `$job as JobTrait>::RunArgs` and `::ToJob`; the failure would be
that it's not reifying the types to verify that the actual types are public,
but instead referring to them via `CandidateBackingJob`, which is in fact private;
that privacy is the point.
Going to see if I can generic my way out of this, but we may be headed for a
quick revert here.
* fix delegated_subsystem
The invocation is a bit more verbose than I'd prefer, but it's also
more explicit about what types need to be public. I'll take it as a win.
* add provisioning subsystem; reduce public interface of provisioner
* deny missing docs in provisioner
* refactor core selection per code review suggestion
This is twice as much code when measured by line, but IMO it is
in fact somewhat clearer to read, so overall a win.
Also adds an improved rule for selecting availability bitfields,
which (unlike the previous implementation) guarantees that the
appropriate postconditions hold there.
* fix bad merge double-declaration
* update guide with (hopefully) complete provisioner candidate selection procedure
* clarify candidate selection algorithm
* Revert "clarify candidate selection algorithm"
This reverts commit c68a02ac9cf42b3a4a28eb197d38633a40d0e3e6.
* clarify candidate selection algorithm
* update provisioner to implement candidate selection per the guide
* add test that no more than one bitfield is selected per validator
* add test that each selected bitfield corresponds to an occupied core
* add test that more set bits win conflicts
* add macro for specializing runtime requests; specailize all runtime requests
* add tests harness for select_candidates tests
* add first real select_candidates test, fix test_harness
* add mock overseer and test that success is possible
* add test that the candidate selection algorithm picks the right ones
* make candidate selection test somewhat more stringent