* Move NetworkBridgeEvent to subsystem::messages.
It is not protocol related at all, it is in fact only part of the
subsystem communication as it gets wrapped into messages of each
subsystem.
* Request/response infrastructure is taking shape.
WIP: Does not compile.
* Multiplexer variant not supported by Rusts type system.
* request_response::request type checks.
* Cleanup.
* Minor fixes for request_response.
* Implement request sending + move multiplexer.
Request multiplexer is moved to bridge, as there the implementation is
more straight forward as we can specialize on `AllMessages` for the
multiplexing target.
Sending of requests is mostly complete, apart from a few `From`
instances. Receiving is also almost done, initializtion needs to be
fixed and the multiplexer needs to be invoked.
* Remove obsolete multiplexer.
* Initialize bridge with multiplexer.
* Finish generic request sending/receiving.
Subsystems are now able to receive and send requests and responses via
the overseer.
* Doc update.
* Fixes.
* Link issue for not yet implemented code.
* Fixes suggested by @ordian - thanks!
- start encoding at 0
- don't crash on zero protocols
- don't panic on not yet implemented request handling
* Update node/network/protocol/src/request_response/v1.rs
Use index 0 instead of 1.
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update node/network/protocol/src/request_response.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Fix existing tests.
* Better avoidance of division by zoro errors.
* Doc fixes.
* send_request -> start_request.
* Fix missing renamings.
* Update substrate.
* Pass TryConnect instead of true.
* Actually import `IfDisconnected`.
* Fix wrong import.
* Update node/network/bridge/src/lib.rs
typo
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
* Update node/network/bridge/src/multiplexer.rs
Remove redundant import.
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
* Stop doing tracing from within `From` instance.
Thanks for the catch @tomaka!
* Get rid of redundant import.
* Formatting cleanup.
* Fix tests.
* Add link to issue.
* Clarify comments some more.
* Fix tests.
* Formatting fix.
* tabs
* Fix link
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Use map_err.
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Improvements inspired by suggestions by @drahnr.
- Channel size is now determined by function.
- Explicitely scope NetworkService::start_request.
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* PVD: `block_number`->`relay_parent_number`
* ValidationParams: `relay_chain_height`->`relay_parent_number`
* Expose DMQ MQC hash as a well-known-key
This way the relay storage merkle proofs will be able to obtain the DMQ
MQC hash and we will be able to remove the it from the
PersistedValidationData struct.
* PersistedValidationData: Remove HRMP MQC heads
* PersistedValidationData: Remove `dmq_mqc_head`
* Expose the HRMP ingress channel index as a well-known-key
This way a parachain (PVF and collator) can find all the parachains that
have an outbound channel to the given one. That allows in turn to find
all the inbound channels for the given para.
Having access to that allows the parachain to get the same information
as the hrmp_mqc_heads now provide.
* Rename `relay_storage_root` to `relay_parent_storage_root`
* Do not send empty view updates to peers
It happened that we send empty view updates to our peers, because we
only updated our finalized block. This could lead to situations where we
overwhelmed sub systems with too many messages. On Rococo this lead to
constant restarts of our nodes, because some node apparently was
finalizing a lot of blocks.
To prevent this, the pr is doing the following:
1. If a peer sends us an empty view, we report this peer and decrease it
reputation.
2. We ensure that we only send a view update when the `heads` changed
and not only the `finalized_number`.
3. We do not send empty `ActiveLeavesUpdates` from the overseer, as this
makes no sense to send these empty updates. If some subsystem is relying
on the finalized block, it needs to listen for the overseer signal.
* Update node/network/bridge/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Don't work if they're are no added heads
* Fix test
* Ahhh
* More fixes
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Store all chunks and in a single transaction
* Adds chunks LRU to store
* Add pruning records metrics
* Use honest cache instead of LRU
* Remove unnecessary optional cache
* Fix review nits that are fixable
* Add one Jaeger span per relay parent
This adds one Jaeger span per relay parent, instead of always creating
new spans per relay parent. This should improve the UI view, because
subsystems are now grouped below one common span.
* Fix doc tests
* Replace `PerLeaveSpan` to `PerLeafSpan`
* More renaming
* Moare
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Skip the spans
* Increase `spec_version`
Co-authored-by: Andronik Ordian <write@reusable.software>
* Cont.: Implement the state root obtaining during inclusion
During inclusion now we obtain the storage root by passing it through
the inclusion_inherent.
* Fix tests
* Bump rococo spec version
* Reorder the parent header into the end
of the inclusion inherent.
When the parent header is in the beginning, it shifts the other two
fields, so that a previous version won't be able to decode that. If
we put the parent header in the end, the other two fields will stay
at their positions, thus make it possible to decode with the previous
version.
That allows us to perform upgrade of rococo runtime without needing of
simultanuous upgrade of nodes and runtime, or restart of the network.
* Squash a stray tab
* guide: add candidate information to OccupiedCore
* add descriptor and hash to occupied core type
* guide: add candidate hash to inclusion
* runtime: return candidate info in core state
* bitfield signing: stop querying runtime as much
* minimize going to runtime in availability distribution
* fix availability distribution tests
* guide: remove para ID from Occupied core
* get all crates compiling
* Fix bug and further optimizations in availability distribution
- There was a bug that resulted in only getting one candidate per block
as the candidates were put into the hashmap with the relay block hash as
key. The solution for this is to use the candidate hash and the relay
block hash as key.
- We stored received/sent messages with the candidate hash and chunk
index as key. The candidate hash wasn't required in this case, as the
messages are already stored per candidate.
* Update node/core/bitfield-signing/src/lib.rs
Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
* Remove the reverse map
* major refactor of receipts & query_live
* finish refactoring
remove ancestory mapping,
improve relay-parent cleanup & receipts-cache cleanup,
add descriptor to `PerCandidate`
* rename and rewrite query_pending_availability
* add a bunch of consistency tests
* Add some last changes
* xy
* fz
* Make it compile again
* Fix one test
* Fix logging
* Remove some buggy code
* Make tests work again
* Move stuff around
* Remove dbg
* Remove state from test_harness
* More refactor and new test
* New test and fixes
* Move metric
* Remove "duplicated code"
* Fix tests
* New test
* Change break to continue
* Update node/core/av-store/src/lib.rs
* Update node/core/av-store/src/lib.rs
* Update node/core/bitfield-signing/src/lib.rs
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
* update guide to match live_candidates changes
* add comment
* fix bitfield signing
Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
* refactor View to include finalized_number
* guide: update the NetworkBridge on BlockFinalized
* av-store: fix the tests
* actually fix tests
* grumbles
* ignore macro doctest
* use Hash::repeat_bytes more consistently
* broadcast empty leaves updates as well
* fix issuing view updates on empty leaves updates
* use snake_case for log targets
* remove unused continue
* validator_discovery: when disconnecting, use all addresses
* validator_discovery: simplify request revokation
* fix a typo
* reexport prometheus-super for ease of use of other subsystems
* add some prometheus timers for collation generation subsystem
* add timing metrics to av-store
* add metrics to candidate backing
* add timing metric to bitfield signing
* add timing metrics to candidate selection
* add timing metrics to candidate-validation
* add timing metrics to chain-api
* add timing metrics to provisioner
* add timing metrics to runtime-api
* add timing metrics to availability-distribution
* add timing metrics to bitfield-distribution
* add timing metrics to collator protocol: collator side
* add timing metrics to collator protocol: validator side
* fix candidate validation test failures
* add timing metrics to pov distribution
* add timing metrics to statement-distribution
* use substrate_prometheus_endpoint prometheus reexport instead of prometheus_super
* don't include JOB_DELAY in bitfield-signing metrics
* give adder-collator ability to easily export its genesis-state and validation code
* wip: adder-collator pushbutton script
* don't attempt to register the adder-collator automatically
Instead, get these values with
```sh
target/release/adder-collator export-genesis-state
target/release/adder-collator export-genesis-wasm
```
And then register the parachain on https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A9944#/explorer
To collect prometheus data, after running the script, create `prometheus.yml` per the instructions
at https://www.notion.so/paritytechnologies/Setting-up-Prometheus-locally-835cb3a9df7541a781c381006252b5ff
and then run:
```sh
docker run -v `pwd`/prometheus.yml:/etc/prometheus/prometheus.yml:z --network host prom/prometheus
```
Demonstrates that data makes it across to prometheus, though it is likely to be useful in the future
to tweak the buckets.
* Update parachain/test-parachains/adder/collator/src/cli.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* use the grandpa-pause parameter
* skip metrics in tracing instrumentation
* remove unnecessary grandpa_pause cli param
Co-authored-by: Andronik Ordian <write@reusable.software>
* drop in tracing to replace log
* add structured logging to trace messages
* add structured logging to debug messages
* add structured logging to info messages
* add structured logging to warn messages
* add structured logging to error messages
* normalize spacing and Display vs Debug
* add instrumentation to the various 'fn run'
* use explicit tracing module throughout
* fix availability distribution test
* don't double-print errors
* remove further redundancy from logs
* fix test errors
* fix more test errors
* remove unused kv_log_macro
* fix unused variable
* add tracing spans to collation generation
* add tracing spans to av-store
* add tracing spans to backing
* add tracing spans to bitfield-signing
* add tracing spans to candidate-selection
* add tracing spans to candidate-validation
* add tracing spans to chain-api
* add tracing spans to provisioner
* add tracing spans to runtime-api
* add tracing spans to availability-distribution
* add tracing spans to bitfield-distribution
* add tracing spans to network-bridge
* add tracing spans to collator-protocol
* add tracing spans to pov-distribution
* add tracing spans to statement-distribution
* add tracing spans to overseer
* cleanup
We need to distribute the PoV after we have seconded it. Other nodes
that will receive our `Secondded` statement and want to validate the
candidate another time will request this PoV from us.
* Make `CandidateHash` a real type
This pr adds a new type `CandidateHash` that is used instead of the
opaque `Hash` type. This helps to ensure on the type system level that
we are passing the correct types.
This pr also fixes wrong usage of `relay_parent` as `candidate_hash`
when communicating with the av storage.
* Update core-primitives/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Wrap the lines
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* DMP: data structures and plumbing
* DMP: Implement DMP logic in the router module
DMP: Integrate DMP parts into the inclusion module
* DMP: Introduce the max size limit for the size of a downward message
* DMP: Runtime API for accessing inbound messages
* OCD
Small clean ups
* DMP: fix the naming of the error
* DMP: add caution about a non-existent recipient
* start working on building the real overseer
Unfortunately, this fails to compile right now due to an upstream
failure to compile which is probably brought on by a recent upgrade
to rustc v1.47.
* fill in AllSubsystems internal constructors
* replace fn make_metrics with Metrics::attempt_to_register
* update to account for #1740
* remove Metrics::register, rename Metrics::attempt_to_register
* add 'static bounds to real_overseer type params
* pass authority_discovery and network_service to real_overseer
It's not straightforwardly obvious that this is the best way to handle
the case when there is no authority discovery service, but it seems
to be the best option available at the moment.
* select a proper database configuration for the availability store db
* use subdirectory for av-store database path
* apply Basti's patch which avoids needing to parameterize everything on Block
* simplify path extraction
* get all tests to compile
* Fix Prometheus double-registry error
for debugging purposes, added this to node/subsystem-util/src/lib.rs:472-476:
```rust
Some(registry) => Self::try_register(registry).map_err(|err| {
eprintln!("PrometheusError calling {}::register: {:?}", std::any::type_name::<Self>(), err);
err
}),
```
That pointed out where the registration was failing, which led to
this fix. The test still doesn't pass, but it now fails in a new
and different way!
* authorities must have authority discovery, but not necessarily overseer handlers
* fix broken SpawnedSubsystem impls
detailed logging determined that using the `Box::new` style of
future generation, the `self.run` method was never being called,
leading to dropped receivers / closed senders for those subsystems,
causing the overseer to shut down immediately.
This is not the final fix needed to get things working properly,
but it's a good start.
* use prometheus properly
Prometheus lets us register simple counters, which aren't very
interesting. It also allows us to register CounterVecs, which are.
With a CounterVec, you can provide a set of labels, which can
later be used to filter the counts.
We were using them wrong, though. This pattern was repeated in a
variety of places in the code:
```rust
// panics with an cardinality mismatch
let my_counter = register(CounterVec::new(opts, &["succeeded", "failed"])?, registry)?;
my_counter.with_label_values(&["succeeded"]).inc()
```
The problem is that the labels provided in the constructor are not
the set of legal values which can be annotated, but a set of individual
label names which can have individual, arbitrary values.
This commit fixes that.
* get av-store subsystem to actually run properly and not die on first signal
* typo fix: incomming -> incoming
* don't disable authority discovery in test nodes
* Fix rococo-v1 missing session keys
* Update node/core/av-store/Cargo.toml
* try dummying out av-store on non-full-nodes
* overseer and subsystems are required only for full nodes
* Reduce the amount of warnings on browser target
* Fix two more warnings
* InclusionInherent should actually have an Inherent module on rococo
* Ancestry: don't return genesis' parent hash
* Update Cargo.lock
* fix broken test
* update test script: specify chainspec as script argument
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Update node/service/src/lib.rs
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* node/service/src/lib: Return error via ? operator
* post-merge blues
* add is_collator flag
* prevent occasional av-store test panic
* simplify fix; expand application
* run authority_discovery in Role::Discover when collating
* distinguish between proposer closed channel errors
* add IsCollator enum, remove is_collator CLI flag
* improve formatting
* remove nop loop
* Fix some stuff
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: Robert Habermeier <robert@Roberts-MBP.lan1>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* Initial commit
* Move tests to separate module
* Move timestamps to the newtype
* Change idx name
* Use Duration for consts and update chunk records
* Ordering::Equal
* Update node/core/av-store/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* put_ methods do the array sorting
* Fix get_block_number
* Change StoreChunk message type and relay parent method
* Add chunk tests
* Fix block number computation for StoreChunk
* Duration instead of u64 everywhere
* Add a clarifying comment
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* WIP
* The initial implementation of the collator side.
* Improve comments
* Multiple collation requests
* Add more tests and comments to validator side
* Add comments, remove dead code
* Apply suggestions from code review
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Fix build after suggested changes
* Also connect to the next validator group
* Remove a Future impl and move TimeoutExt to util
* Minor nits
* Fix build
* Change FetchCollations back to FetchCollation
* Try this
* Final fixes
* Fix build
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* update primitives
* correct parent_head field
* make hrmp field pub
* refactor validation data: runtime
* refactor validation data: messages
* add arguments to full_validation_data runtime API
* port runtime API
* mostly port over candidate validation
* remove some parameters from ValidationParams
* guide: update candidate validation
* update candidate outputs
* update ValidationOutputs in primitives
* port over candidate validation
* add a new test for no-transient behavior
* update util runtime API wrappers
* candidate backing
* fix missing imports
* change some fields of validation data around
* runtime API impl
* update candidate validation
* fix backing tests
* grumbles from review
* fix av-store tests
* fix some more crates
* fix provisioner tests
* fix availability distribution tests
* port collation-generation to new validation data
* fix overseer tests
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* service-new: cosmetic changes
* overseer: draft of prometheus metrics
* metrics: update active_leaves metrics
* metrics: extract into functions
* metrics: resolve XXX
* metrics: it's ugly, but it works
* Bump Substrate
* metrics: move a bunch of code around
* Bumb substrate again
* metrics: fix a warning
* fix a warning in runtime
* metrics: statements signed
* metrics: statements impl RegisterMetrics
* metrics: refactor Metrics trait
* metrics: add Metrics assoc type to JobTrait
* metrics: move Metrics trait to util
* metrics: fix overseer
* metrics: fix backing
* metrics: fix candidate validation
* metrics: derive Default
* metrics: docs
* metrics: add stubs for other subsystems
* metrics: add more stubs and fix compilation
* metrics: fix doctest
* metrics: move to subsystem
* metrics: fix candidate validation
* metrics: bitfield signing
* metrics: av store
* metrics: chain API
* metrics: runtime API
* metrics: stub for avad
* metrics: candidates seconded
* metrics: ok I gave up
* metrics: provisioner
* metrics: remove a clone by requiring Metrics: Sync
* metrics: YAGNI
* metrics: remove another TODO
* metrics: for later
* metrics: add parachain_ prefix
* metrics: s/signed_statement/signed_statements
* utils: add a comment for job metrics
* metrics: address review comments
* metrics: oops
* metrics: make sure to save files before commit 😅
* use _total suffix for requests metrics
Co-authored-by: Max Inden <mail@max-inden.de>
* metrics: add tests for overseer
* update Cargo.lock
* overseer: add a test for CollationGeneration
* collation-generation: impl metrics
* collation-generation: use kebab-case for name
* collation-generation: add a constructor
Co-authored-by: Gav Wood <gavin@parity.io>
Co-authored-by: Ashley Ruglys <ashley.ruglys@gmail.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* update networking types
* port over overseer-protocol message types
* Add the collation protocol to network bridge
* message sending
* stub for ConnectToValidators
* add some helper traits and methods to protocol types
* add collator protocol message
* leaves-updating
* peer connection and disconnection
* add utilities for dispatching multiple events
* implement message handling
* add an observedrole enum with equality and no sentry nodes
* derive partial-eq on network bridge event
* add PartialEq impls for network message types
* add Into implementation for observedrole
* port over existing network bridge tests
* add some more tests
* port bitfield distribution
* port over bitfield distribution tests
* add codec indices
* port PoV distribution
* port over PoV distribution tests
* port over statement distribution
* port over statement distribution tests
* update overseer and service-new
* address review comments
* port availability distribution
* port over availability distribution tests