* Add one Jaeger span per relay parent
This adds a single Jaeger span per relay parent, instead of always
creating new spans for each relay parent. This should improve the UI
view, because subsystems are now grouped under one common span.
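A minimal sketch of the grouping idea, not the actual Jaeger integration: `Span`, `LeafSpans` and `child` are hypothetical names, and a span is reduced to an id plus a parent id. It only illustrates that the root span is created once per relay parent and that every subsystem span hangs below it.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified span: an id, an optional parent id and a name.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    id: u64,
    parent: Option<u64>,
    name: String,
}

/// Hypothetical registry that keeps exactly one root span per relay parent.
#[derive(Default)]
struct LeafSpans {
    next_id: u64,
    roots: HashMap<[u8; 32], Span>,
}

impl LeafSpans {
    fn fresh_id(&mut self) -> u64 {
        self.next_id += 1;
        self.next_id
    }

    /// Returns a child span for `subsystem`. The root span for
    /// `relay_parent` is created only the first time it is seen.
    fn child(&mut self, relay_parent: [u8; 32], subsystem: &str) -> Span {
        if !self.roots.contains_key(&relay_parent) {
            let id = self.fresh_id();
            let root = Span { id, parent: None, name: "relay-parent".to_string() };
            self.roots.insert(relay_parent, root);
        }
        let parent = self.roots[&relay_parent].id;
        let id = self.fresh_id();
        Span { id, parent: Some(parent), name: subsystem.to_string() }
    }
}
```

Two subsystems asking for a span on the same relay parent get children of the same root, which is what lets a tracing UI group them together.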
* Fix doc tests
* Replace `PerLeaveSpan` with `PerLeafSpan`
* More renaming
* Moare
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Skip the spans
* Increase `spec_version`
Co-authored-by: Andronik Ordian <write@reusable.software>
* refactor View to include finalized_number
* guide: update the NetworkBridge on BlockFinalized
* av-store: fix the tests
* actually fix tests
* grumbles
* ignore macro doctest
* use Hash::repeat_bytes more consistently
* broadcast empty leaves updates as well
* fix issuing view updates on empty leaves updates
* Rework `ConnectionsRequests`
Instead of implementing the `Stream` trait, this struct now provides a
function `next()`. This enables us to encode into the type system that
it will always return a value or block indefinitely.
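A rough sketch of the difference, under simplifying assumptions: `Requests` is a hypothetical stand-in backed by a plain queue, not the real struct, and `poll_once` is a helper added here only so the behaviour can be observed without an executor. The point is the signature: `next()` yields `T` rather than `Option<T>`, so callers have no "stream ended" case to handle.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the reworked struct, backed by a simple queue.
struct Requests<T> {
    queue: VecDeque<T>,
}

impl<T> Requests<T> {
    fn new() -> Self {
        Requests { queue: VecDeque::new() }
    }

    fn push(&mut self, item: T) {
        self.queue.push_back(item);
    }

    /// Resolves to the next item, or never resolves if there is none.
    /// The return type is `T`, not `Option<T>`.
    async fn next(&mut self) -> T {
        match self.queue.pop_front() {
            Some(item) => item,
            None => std::future::pending().await,
        }
    }
}

/// Waker that does nothing; enough to poll a future a single time.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once and reports whether it was ready.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}
```

With one item queued, `poll_once(requests.next())` is `Poll::Ready(item)`; on an empty queue it stays `Poll::Pending` instead of producing a `None`.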
* Review feedback
* guide: fix formatting for SessionInfo module
* primitives: SessionInfo type
* punt on approval keys
* ah, revert the type alias
* session info runtime module skeleton
* update the guide
* runtime/configuration: sync with the guide
* runtime/configuration: setters for newly added fields
* runtime/configuration: set codec indexes
* runtime/configuration: update test
* primitives: fix SessionInfo definition
* runtime/session_info: initial impl
* runtime/session_info: use initializer for session handling (wip)
* runtime/session_info: mock authority discovery trait
* guide: update the initializer's order
* runtime/session_info: tests skeleton
* runtime/session_info: store n_delay_tranches in Configuration
* runtime/session_info: punt on approval keys
* runtime/session_info: add some basic tests
* Update primitives/src/v1.rs
* small fixes
* remove codec index annotation on structs
* fix off-by-one error
* validator_discovery: accept a session index
* runtime: replace validator_discovery api with session_info
* Update runtime/parachains/src/session_info.rs
Co-authored-by: Sergei Shulepov <sergei@parity.io>
* runtime/session_info: add a comment about missing entries
* runtime/session_info: define the keys
* util: expose connect_to_past_session_validators
* util: allow session_info requests for jobs
* runtime-api: add mock test for session_info
* collator-protocol: add session_index to test state
* util: fix error message for runtime error
* fix compilation
* fix tests after merge with master
Co-authored-by: Sergei Shulepov <sergei@parity.io>
* Initial commit
* Remove unnecessary struct
* Some review nits
* Update node/network/pov-distribution/src/lib.rs
* Update parachain/test-parachains/adder/collator/tests/integration.rs
* Review nits
* notify_all_we_are_awaiting
* Both ways of peers connections should work the same
* Add mod-level docs to error.rs
* Avoid multiple connection requests at same parent
* Don't bail on errors
* FusedStream for ConnectionRequests
* Fix build after merge
* Improve error handling
* Remove whitespace formatting
* reexport prometheus-super for ease of use of other subsystems
* add some prometheus timers for collation generation subsystem
* add timing metrics to av-store
* add metrics to candidate backing
* add timing metric to bitfield signing
* add timing metrics to candidate selection
* add timing metrics to candidate-validation
* add timing metrics to chain-api
* add timing metrics to provisioner
* add timing metrics to runtime-api
* add timing metrics to availability-distribution
* add timing metrics to bitfield-distribution
* add timing metrics to collator protocol: collator side
* add timing metrics to collator protocol: validator side
* fix candidate validation test failures
* add timing metrics to pov distribution
* add timing metrics to statement-distribution
* use substrate_prometheus_endpoint prometheus reexport instead of prometheus_super
* don't include JOB_DELAY in bitfield-signing metrics
* give adder-collator ability to easily export its genesis-state and validation code
* wip: adder-collator pushbutton script
* don't attempt to register the adder-collator automatically
Instead, get these values with
```sh
target/release/adder-collator export-genesis-state
target/release/adder-collator export-genesis-wasm
```
And then register the parachain on https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A9944#/explorer
To collect prometheus data, after running the script, create `prometheus.yml` per the instructions
at https://www.notion.so/paritytechnologies/Setting-up-Prometheus-locally-835cb3a9df7541a781c381006252b5ff
and then run:
```sh
docker run -v `pwd`/prometheus.yml:/etc/prometheus/prometheus.yml:z --network host prom/prometheus
```
This demonstrates that data makes it across to Prometheus, though it will
likely be useful to tweak the buckets in the future.
* Update parachain/test-parachains/adder/collator/src/cli.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* use the grandpa-pause parameter
* skip metrics in tracing instrumentation
* remove unnecessary grandpa_pause cli param
Co-authored-by: Andronik Ordian <write@reusable.software>
* drop in tracing to replace log
* add structured logging to trace messages
* add structured logging to debug messages
* add structured logging to info messages
* add structured logging to warn messages
* add structured logging to error messages
* normalize spacing and Display vs Debug
* add instrumentation to the various 'fn run'
* use explicit tracing module throughout
* fix availability distribution test
* don't double-print errors
* remove further redundancy from logs
* fix test errors
* fix more test errors
* remove unused kv_log_macro
* fix unused variable
* add tracing spans to collation generation
* add tracing spans to av-store
* add tracing spans to backing
* add tracing spans to bitfield-signing
* add tracing spans to candidate-selection
* add tracing spans to candidate-validation
* add tracing spans to chain-api
* add tracing spans to provisioner
* add tracing spans to runtime-api
* add tracing spans to availability-distribution
* add tracing spans to bitfield-distribution
* add tracing spans to network-bridge
* add tracing spans to collator-protocol
* add tracing spans to pov-distribution
* add tracing spans to statement-distribution
* add tracing spans to overseer
* cleanup
We need to distribute the PoV after we have seconded it. Other nodes
that will receive our `Seconded` statement and want to validate the
candidate another time will request this PoV from us.
* backing: extract log target
* bitfield-signing: extract log target
* utils: fix a typo
* provisioner: extract log target
* candidate selection: remove unused error variant
* bitfield-distribution: change the return type of run
* pov-distribution: extract log target
* collator-protocol: simplify runtime request
* collation-generation: do not exit early on error
* collation-generation: do not exit on double init
* collator-protocol: do not exit on errors and rename LOG_TARGET
* collator-protocol: a workaround for unused imports warning
* Update node/network/bitfield-distribution/src/lib.rs
* collation-generation: elevate warn! to error!
* collator-protocol: fix imports
* post merge fix
* fix compilation
* service-new: cosmetic changes
* overseer: draft of prometheus metrics
* metrics: update active_leaves metrics
* metrics: extract into functions
* metrics: resolve XXX
* metrics: it's ugly, but it works
* Bump Substrate
* metrics: move a bunch of code around
* Bump Substrate again
* metrics: fix a warning
* fix a warning in runtime
* metrics: statements signed
* metrics: statements impl RegisterMetrics
* metrics: refactor Metrics trait
* metrics: add Metrics assoc type to JobTrait
* metrics: move Metrics trait to util
* metrics: fix overseer
* metrics: fix backing
* metrics: fix candidate validation
* metrics: derive Default
* metrics: docs
* metrics: add stubs for other subsystems
* metrics: add more stubs and fix compilation
* metrics: fix doctest
* metrics: move to subsystem
* metrics: fix candidate validation
* metrics: bitfield signing
* metrics: av store
* metrics: chain API
* metrics: runtime API
* metrics: stub for avad
* metrics: candidates seconded
* metrics: ok I gave up
* metrics: provisioner
* metrics: remove a clone by requiring Metrics: Sync
* metrics: YAGNI
* metrics: remove another TODO
* metrics: for later
* metrics: add parachain_ prefix
* metrics: s/signed_statement/signed_statements
* utils: add a comment for job metrics
* metrics: address review comments
* metrics: oops
* metrics: make sure to save files before commit 😅
* use _total suffix for requests metrics
Co-authored-by: Max Inden <mail@max-inden.de>
* metrics: add tests for overseer
* update Cargo.lock
* overseer: add a test for CollationGeneration
* collation-generation: impl metrics
* collation-generation: use kebab-case for name
* collation-generation: add a constructor
Co-authored-by: Gav Wood <gavin@parity.io>
Co-authored-by: Ashley Ruglys <ashley.ruglys@gmail.com>
Co-authored-by: Max Inden <mail@max-inden.de>
* update networking types
* port over overseer-protocol message types
* Add the collation protocol to network bridge
* message sending
* stub for ConnectToValidators
* add some helper traits and methods to protocol types
* add collator protocol message
* leaves-updating
* peer connection and disconnection
* add utilities for dispatching multiple events
* implement message handling
* add an observedrole enum with equality and no sentry nodes
* derive partial-eq on network bridge event
* add PartialEq impls for network message types
* add Into implementation for observedrole
* port over existing network bridge tests
* add some more tests
* port bitfield distribution
* port over bitfield distribution tests
* add codec indices
* port PoV distribution
* port over PoV distribution tests
* port over statement distribution
* port over statement distribution tests
* update overseer and service-new
* address review comments
* port availability distribution
* port over availability distribution tests
* polkadot-subsystem: update runtime API message types
* update all networking subsystems to use fallible runtime APIs
* fix bitfield-signing and make it use new runtime APIs
* port candidate-backing to handle runtime API errors and new types
* remove old runtime API messages
* remove unused imports
* fix grumbles
* fix backing tests
* Initial commit
* WIP
* Make atomic transactions
* Remove pruning code
* Fix build and add a Nop to bridge
* Fixes from review
* Move config struct around for clarity
* Rename constructor and warn on missing docs
* Fix a test and rename a message
* Fix some more reviews
* Obviously failed to rebase cleanly
* add ActiveLeavesUpdate, remove StartWork, StopWork
* replace StartWork, StopWork in subsystem crate tests
* mechanically update OverseerSignal in other modules
* convert overseer to take advantage of new multi-hash update abilities
Note: this does not yet convert the tests; some of the tests now freeze:
test tests::overseer_start_stop_works ... test tests::overseer_start_stop_works has been running for over 60 seconds
test tests::overseer_finalize_works ... test tests::overseer_finalize_works has been running for over 60 seconds
* fix broken overseer tests
* manually impl PartialEq for ActiveLeavesUpdate, rm trait Equivalent
This cleans up the code a bit and makes it easier in the future to
do the right thing when comparing ALUs.
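One way such a manual impl could look, as a sketch only: the field names and the choice to compare `activated` and `deactivated` as sets (ignoring order and duplicates) are assumptions made for illustration, and `Hash` is a placeholder alias.

```rust
use std::collections::HashSet;

/// Placeholder for the real block hash type.
type Hash = [u8; 32];

#[derive(Debug, Clone, Default)]
struct ActiveLeavesUpdate {
    activated: Vec<Hash>,
    deactivated: Vec<Hash>,
}

/// Collects a slice of hashes into a set, dropping order and duplicates.
fn as_set(hashes: &[Hash]) -> HashSet<&Hash> {
    hashes.iter().collect()
}

impl PartialEq for ActiveLeavesUpdate {
    /// Assumed semantics: two updates are equal when they activate and
    /// deactivate the same leaves, regardless of ordering.
    fn eq(&self, other: &Self) -> bool {
        as_set(&self.activated) == as_set(&other.activated)
            && as_set(&self.deactivated) == as_set(&other.deactivated)
    }
}
```

Keeping the comparison in one `PartialEq` impl means tests can use plain `assert_eq!` without a separate equivalence trait.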
* use target in all network bridge logging
* reduce spamming of and
* get conclude signal working properly; don't allocate a vector
* wip: add test suite / example / explanation for using utility subsystem
Unfortunately, the test fails right now for reasons which seem
very odd. Just have to keep poking at it.
* explicitly import everything
* fix subsystem-util test
The root problem here was two-fold:
- there was a circular dependency from subsystem -> test-helpers/subsystem ->
subsystem
- cfg(test) doesn't propagate between crates
The solution: move the subsystem test helpers into a sub-module
within subsystem. Publicly export them from the previous location
so no other code breaks.
Doing this has an additional benefit: it ensures that no production
code can ever accidentally use the subsystem helpers, as they are compile-
gated on cfg(test).
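The gating described above can be sketched roughly as follows. The function and module names (`double`, `test_helpers`, `sample_input`) are made up for illustration; only the `#[cfg(test)]` mechanism itself is the point.

```rust
/// Production code: always compiled.
pub fn double(x: u32) -> u32 {
    x * 2
}

/// Test helpers: compiled only when this crate itself is built with
/// `cfg(test)`. A dependent crate's test build does not turn this on.
#[cfg(test)]
pub mod test_helpers {
    /// Hypothetical helper that provides a canned input.
    pub fn sample_input() -> u32 {
        21
    }
}

#[cfg(test)]
mod tests {
    use super::{double, test_helpers::sample_input};

    #[test]
    fn doubles_sample_input() {
        assert_eq!(double(sample_input()), 42);
    }
}
```

In a normal build the `test_helpers` module does not exist at all, so production code that tries to reference it fails to compile.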
* fully commit to moving test helpers into a subsystem module
* add some more tests
* get rid of log tests in favor of real error forwarding
It's not obvious whether we'll ever really want to chase down
these errors outside a testing context, but having the capability
won't hurt.
* fix issue which caused test to hang on osx
* only require that job errors are PartialEq when testing
also fix polkadot-node-core-backing tests
* get rid of any notion of partialeq
* rethink testing
Combine tests of starting and stopping job: leaving a test executor
with a job running was pretty clearly the cause of the sometimes-hang.
Also, add a timeout so tests _can't_ hang anymore; they just fail
after a while.
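The timeout idea can be illustrated with a thread-based sketch. This swaps in `std::thread` and a channel rather than whatever executor and timer the tests actually use, and `run_with_timeout` and `Outcome` are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// What happened to the test body.
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Finished(T),
    TimedOut,
    Panicked,
}

/// Runs `body` on a separate thread and waits at most `timeout` for it.
/// A body that hangs produces `TimedOut` instead of blocking forever.
fn run_with_timeout<T, F>(timeout: Duration, body: F) -> Outcome<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(body());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Outcome::Finished(value),
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
        // The sender was dropped without sending: the body panicked.
        Err(RecvTimeoutError::Disconnected) => Outcome::Panicked,
    }
}
```

A test would then assert on the returned `Outcome`, so a stuck job shows up as a failure after the timeout rather than as a hung test run.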
* rename fwd_errors -> forward_errors
* warn on error propagation failure
* fix unused import leftover from merge
* derive eq for subsystemerror
* create a README on Runtime APIs
* add ParaId type
* write up runtime APIs
* more preamble
* rename
* rejig runtime APIs
* add occupied_since to `BlockNumber`
* skeleton crate for runtime API subsystem
* improve group_for_core
* improve docs on availability cores runtime API
* guide: freed -> free
* add primitives for runtime APIs
* create a v1 ParachainHost API trait
* guide: make validation code return `Option`al.
* skeleton runtime API helpers
* make parachain-host runtime-generic
* skeleton for most runtime API implementation functions
* guide: add runtime API helper methods
* implement new helpers of the inclusion module
* guide: remove retries check, as it is unneeded
* implement helpers for scheduler module for Runtime APIs
* clean up `validator_groups` implementation
* implement next_rotation_at and last_rotation_at
* guide: more helpers on GroupRotationInfo
* almost finish implementing runtime APIs
* add explicit block parameter to runtime API fns
* guide: generalize number parameter
* guide: add group_responsible to occupied-core
* update primitives due to guide changes
* finishing touches on runtime API implementation; squash warnings
* break out runtime API impl to separate file
* add tests for next_up logic
* test group rotation info
* point to filed TODO
* remove unused TODO [now]
* indentation
* guide: para -> para_id
* rename para field to para_id for core meta
* remove reference to outdated AvailabilityCores type
* add an event in `inclusion` for candidates being included or timing out
* guide: candidate events
* guide: adjust language
* Candidate events type from guide and adjust inclusion event
* implement `candidate_events` runtime API
* fix runtime test compilation
* max -> min
* fix typos
* guide: add `RuntimeAPIRequest::CandidateEvents`
* create a v1 primitives module
* Improve guide on availability types
* punctuate
* new parachains runtime uses new primitives
* tests of new runtime now use new primitives
* add ErasureChunk to guide
* export erasure chunk from v1 primitives
* subsystem crate uses v1 primitives
* node-primitives uses new v1 primitives
* port overseer to new primitives
* new-proposer uses v1 primitives (no ParachainHost anymore)
* fix no-std compilation for primitives
* service-new uses v1 primitives
* network-bridge uses new primitives
* statement distribution uses v1 primitives
* PoV distribution uses v1 primitives; add PoV::hash fn
* move parachain to v0
* remove inclusion_inherent module and place into v1
* remove everything from primitives crate root
* remove some unused old types from v0 primitives
* point everything else at primitives::v0
* squanch some warns up
* add RuntimeDebug import to no-std as well
* port over statement-table and validation
* fix final errors in validation and node-primitives
* add dummy Ord impl to committed candidate receipt
* guide: update CandidateValidationMessage
* add primitive for validationoutputs
* expand CandidateValidationMessage further
* bikeshed
* add some impls to omitted-validation-data and available-data
* expand CandidateValidationMessage
* make erasure-coding generic over v1/v0
* update usages of erasure-coding
* implement commitments.hash()
* use Arc<PoV> for CandidateValidation
* improve new erasure-coding method names
* fix up candidate backing
* update docs a bit
* fix most tests and add short-circuiting to make_pov_available
* fix remainder of candidate backing tests
* squanching warns
* squanch it up
* some fallout
* overseer fallout
* free from polkadot-test-service hell
* introduce candidatedescriptor type
* add PoVDistribution message type
* loosen bound on PoV Distribution to account for equivocations
* re-export some types from the messages module
* begin PoV Distribution subsystem
* remove redundant index from PoV distribution
* define state machine for pov distribution
* handle overseer signals
* set up control flow
* remove `ValidatorStatement` section
* implement PoV fetching
* implement distribution logic
* add missing `
* implement some network bridge event handlers
* stub for message processing, handle our view change
* control flow for handling messages
* handle `awaiting` message
* handle any incoming PoVs and redistribute
* actually provide a subsystem implementation
* remove set-builder notation
* begin testing PoV distribution
* test that we send awaiting messages only to peers with same view
* ensure we distribute awaited PoVs to peers on view changes
* test that peers can complete fetch and are rewarded
* test some reporting logic
* ensure peer is reported for flooding
* test punishing peers diverging from awaited protocol
* test that we eagerly complete peers' awaited PoVs based on what we receive
* test that we prune the awaited set after receiving
* expand pov-distribution in guide to match a change I made
* remove unneeded import