Commit Graph

161 Commits

Author SHA1 Message Date
Robert Habermeier 1f48725c54 Backing fixes (#1897)
* Commit my changes

* some backing fixes

* indentation

* fix backing tests

* tweak includability rules

* comment

* Update node/core/backing/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/backing/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/backing/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/backing/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2020-11-02 18:37:50 +00:00
Peter Goodspeed-Niklaus 1a25c41277 start working on building the real overseer (#1795)
* start working on building the real overseer

Unfortunately, this fails to compile right now due to an upstream
failure to compile which is probably brought on by a recent upgrade
to rustc v1.47.

* fill in AllSubsystems internal constructors

* replace fn make_metrics with Metrics::attempt_to_register

* update to account for #1740

* remove Metrics::register, rename Metrics::attempt_to_register

* add 'static bounds to real_overseer type params

* pass authority_discovery and network_service to real_overseer

It's not straightforwardly obvious that this is the best way to handle
the case when there is no authority discovery service, but it seems
to be the best option available at the moment.

* select a proper database configuration for the availability store db

* use subdirectory for av-store database path

* apply Basti's patch which avoids needing to parameterize everything on Block

* simplify path extraction

* get all tests to compile

* Fix Prometheus double-registry error

for debugging purposes, added this to node/subsystem-util/src/lib.rs:472-476:

```rust
Some(registry) => Self::try_register(registry).map_err(|err| {
	eprintln!("PrometheusError calling {}::register: {:?}", std::any::type_name::<Self>(), err);
	err
}),
```

That pointed out where the registration was failing, which led to
this fix. The test still doesn't pass, but it now fails in a new
and different way!

* authorities must have authority discovery, but not necessarily overseer handlers

* fix broken SpawnedSubsystem impls

detailed logging determined that using the `Box::new` style of
future generation, the `self.run` method was never being called,
leading to dropped receivers / closed senders for those subsystems,
causing the overseer to shut down immediately.

This is not the final fix needed to get things working properly,
but it's a good start.

* use prometheus properly

Prometheus lets us register simple counters, which aren't very
interesting. It also allows us to register CounterVecs, which are.
With a CounterVec, you can provide a set of labels, which can
later be used to filter the counts.

We were using them wrong, though. This pattern was repeated in a
variety of places in the code:

```rust
// panics with an cardinality mismatch
let my_counter = register(CounterVec::new(opts, &["succeeded", "failed"])?, registry)?;
my_counter.with_label_values(&["succeeded"]).inc()
```

The problem is that the labels provided in the constructor are not
the set of legal values which can be annotated, but a set of individual
label names which can have individual, arbitrary values.

This commit fixes that.

* get av-store subsystem to actually run properly and not die on first signal

* typo fix: incomming -> incoming

* don't disable authority discovery in test nodes

* Fix rococo-v1 missing session keys

* Update node/core/av-store/Cargo.toml

* try dummying out av-store on non-full-nodes

* overseer and subsystems are required only for full nodes

* Reduce the amount of warnings on browser target

* Fix two more warnings

* InclusionInherent should actually have an Inherent module on rococo

* Ancestry: don't return genesis' parent hash

* Update Cargo.lock

* fix broken test

* update test script: specify chainspec as script argument

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/service/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* node/service/src/lib: Return error via ? operator

* post-merge blues

* add is_collator flag

* prevent occasional av-store test panic

* simplify fix; expand application

* run authority_discovery in Role::Discover when collating

* distinguish between proposer closed channel errors

* add IsCollator enum, remove is_collator CLI flag

* improve formatting

* remove nop loop

* Fix some stuff

Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: Robert Habermeier <robert@Roberts-MBP.lan1>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
2020-10-28 10:26:50 +00:00
Bernhard Schuster f345123748 introduce errors with info (#1834) 2020-10-27 08:10:03 +01:00
Rakan Alhneiti bd75a4ce18 Update to work with async keystore – Companion PR for #7000 (#1740)
* Fix keystore types

* Use SyncCryptoStorePtr

* Borrow keystore

* Fix unused imports

* Fix polkadot service

* Fix bitfield-distribution tests

* Fix indentation

* Fix backing tests

* Fix tests

* Fix provisioner tests

* Removed SyncCryptoStorePtr

* Fix services

* Address PR feedback

* Address PR feedback - 2

* Update CryptoStorePtr imports to be from sp_keystore

* Typo

* Fix CryptoStore import

* Document the reason behind using filesystem keystore

* Remove VALIDATORS

* Fix duplicate dependency

* Mark sp-keystore as optional

* Fix availability distribution

* Fix call to sign_with

* Fix keystore usage

* Remove tokio and fix parachains Cargo config

* Typos

* Fix keystore dereferencing

* Fix CryptoStore import

* Fix provisioner

* Fix node backing

* Update services

* Cleanup dependencies

* Use sync_keystore

* Fix node service

* Fix node service - 2

* Fix node service - 3

* Rename CryptoStorePtr to SyncCryptoStorePtr

* "Update Substrate"

* Apply suggestions from code review

* Update node/core/backing/Cargo.toml

* Update primitives/src/v0.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Fix wasm build

* Update Cargo.lock

Co-authored-by: parity-processbot <>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2020-10-09 10:54:03 +00:00
Andronik Ordian 579614d127 implement remaining subsystem metrics (#1770)
* overseer metrics: messages relayed

* provisioner metrics: cosmetic changes

* candidate selection metrics: cosmetic changes

* availability bitfields metrics

* availability distribution metrics

* PoV distribution metrics

* statement-distribution: small simplification

* statement-distribution: extract log target into a const

* statement-distribution: metrics

* address review nits
2020-10-01 12:08:03 +02:00
Andronik Ordian de05bec4d6 move Metrics to utils (#1765) 2020-09-29 11:42:20 +00:00
Andronik Ordian f9b15f654b provisioner tests: remove tokio from dev-dependencies (#1745)
* provisioner: remove tokio from dev-dependencies

* provisioner: use futures_timer instead
2020-09-23 20:21:03 +02:00
Robert Habermeier 262574fc49 Implement validation data refactor (#1585)
* update primitives

* correct parent_head field

* make hrmp field pub

* refactor validation data: runtime

* refactor validation data: messages

* add arguments to full_validation_data runtime API

* port runtime API

* mostly port over candidate validation

* remove some parameters from ValidationParams

* guide: update candidate validation

* update candidate outputs

* update ValidationOutputs in primitives

* port over candidate validation

* add a new test for no-transient behavior

* update util runtime API wrappers

* candidate backing

* fix missing imports

* change some fields of validation data around

* runtime API impl

* update candidate validation

* fix backing tests

* grumbles from review

* fix av-store tests

* fix some more crates

* fix provisioner tests

* fix availability distribution tests

* port collation-generation to new validation data

* fix overseer tests

* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-08-18 14:41:40 +02:00
Andronik Ordian e7ead40255 initial prometheus metrics (#1536)
* service-new: cosmetic changes

* overseer: draft of prometheus metrics

* metrics: update active_leaves metrics

* metrics: extract into functions

* metrics: resolve XXX

* metrics: it's ugly, but it works

* Bump Substrate

* metrics: move a bunch of code around

* Bumb substrate again

* metrics: fix a warning

* fix a warning in runtime

* metrics: statements signed

* metrics: statements impl RegisterMetrics

* metrics: refactor Metrics trait

* metrics: add Metrics assoc type to JobTrait

* metrics: move Metrics trait to util

* metrics: fix overseer

* metrics: fix backing

* metrics: fix candidate validation

* metrics: derive Default

* metrics: docs

* metrics: add stubs for other subsystems

* metrics: add more stubs and fix compilation

* metrics: fix doctest

* metrics: move to subsystem

* metrics: fix candidate validation

* metrics: bitfield signing

* metrics: av store

* metrics: chain API

* metrics: runtime API

* metrics: stub for avad

* metrics: candidates seconded

* metrics: ok I gave up

* metrics: provisioner

* metrics: remove a clone by requiring Metrics: Sync

* metrics: YAGNI

* metrics: remove another TODO

* metrics: for later

* metrics: add parachain_ prefix

* metrics: s/signed_statement/signed_statements

* utils: add a comment for job metrics

* metrics: address review comments

* metrics: oops

* metrics: make sure to save files before commit 😅

* use _total suffix for requests metrics

Co-authored-by: Max Inden <mail@max-inden.de>

* metrics: add tests for overseer

* update Cargo.lock

* overseer: add a test for CollationGeneration

* collation-generation: impl metrics

* collation-generation: use kebab-case for name

* collation-generation: add a constructor

Co-authored-by: Gav Wood <gavin@parity.io>
Co-authored-by: Ashley Ruglys <ashley.ruglys@gmail.com>
Co-authored-by: Max Inden <mail@max-inden.de>
2020-08-18 11:18:54 +02:00
Peter Goodspeed-Niklaus 9fda6cb416 break out subsystem-util and subsystem-test-helpers into individual crates (#1553)
* break out subsystem-util and subsystem-test-helpers into individual crates

* cause all packages to check successfully
2020-08-07 16:51:36 +02:00
Peter Goodspeed-Niklaus 21cec309a4 implement provisioner (#1473)
* sketch out provisioner basics

* handle provisionable data

* stub out select_inherent_data

* split runtime APIs into sub-chapters to improve linkability

* explain SignedAvailabilityBitfield semantics

* add internal link to further documentation

* some more work figuring out how the provisioner can do its thing

* fix broken link

* don't import enum variants where it's one layer deep

* make request_availability_cores a free fn in util

* document more precisely what should happen on block production

* finish first-draft implementation of provisioner

* start working on the full and proper backed candidate selection rule

* Pass number of block under construction via RequestInherentData

* Revert "Pass number of block under construction via RequestInherentData"

This reverts commit 850fe62cc0dfb04252580c21a985962000e693c8.

That initially looked like the better approach--it spent the time
budget for fetching the block number in the proposer, instead of
the provisioner, and that felt more appropriate--but it turns out
not to be obvious how to get the block number of the block under
construction from within the proposer. The Chain API may be less
ideal, but it should be easier to implement.

* wip: get the block under production from the Chain API

* add ChainApiMessage to AllMessages

* don't break the run loop if a provisionable data channel closes

* clone only those backed candidates which are coherent

* propagate chain_api subsystem through various locations

* add delegated_subsystem! macro to ease delegating subsystems

Unfortunately, it doesn't work right:

```
error[E0446]: private type `CandidateBackingJob` in public interface
   --> node/core/backing/src/lib.rs:775:1
    |
86  | struct CandidateBackingJob {
    | - `CandidateBackingJob` declared as private
...
775 | delegated_subsystem!(CandidateBackingJob as CandidateBackingSubsystem);
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't leak private type
```

I'm not sure precisely what's going wrong, here; I suspect the problem is
the use of `$job as JobTrait>::RunArgs` and `::ToJob`; the failure would be
that it's not reifying the types to verify that the actual types are public,
but instead referring to them via `CandidateBackingJob`, which is in fact private;
that privacy is the point.

Going to see if I can generic my way out of this, but we may be headed for a
quick revert here.

* fix delegated_subsystem

The invocation is a bit more verbose than I'd prefer, but it's also
more explicit about what types need to be public. I'll take it as a win.

* add provisioning subsystem; reduce public interface of provisioner

* deny missing docs in provisioner

* refactor core selection per code review suggestion

This is twice as much code when measured by line, but IMO it is
in fact somewhat clearer to read, so overall a win.

Also adds an improved rule for selecting availability bitfields,
which (unlike the previous implementation) guarantees that the
appropriate postconditions hold there.

* fix bad merge double-declaration

* update guide with (hopefully) complete provisioner candidate selection procedure

* clarify candidate selection algorithm

* Revert "clarify candidate selection algorithm"

This reverts commit c68a02ac9cf42b3a4a28eb197d38633a40d0e3e6.

* clarify candidate selection algorithm

* update provisioner to implement candidate selection per the guide

* add test that no more than one bitfield is selected per validator

* add test that each selected bitfield corresponds to an occupied core

* add test that more set bits win conflicts

* add macro for specializing runtime requests; specailize all runtime requests

* add tests harness for select_candidates tests

* add first real select_candidates test, fix test_harness

* add mock overseer and test that success is possible

* add test that the candidate selection algorithm picks the right ones

* make candidate selection test somewhat more stringent
2020-08-06 18:12:15 +02:00