* subsystems have an unbounded channel to the overseer
* Update node/overseer/src/lib.rs
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* bump Cargo.lock
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Do not send empty view updates to peers
It happened that we sent empty view updates to our peers, because we
had only updated our finalized block. This could overwhelm subsystems
with too many messages. On Rococo this led to constant restarts of our
nodes, because some node apparently was finalizing a lot of blocks.
To prevent this, the PR does the following (see the sketch after this list):
1. If a peer sends us an empty view, we report this peer and decrease its
reputation.
2. We ensure that we only send a view update when the `heads` changed,
and not only the `finalized_number`.
3. We do not send empty `ActiveLeavesUpdate`s from the overseer, as it
makes no sense to send these empty updates. If a subsystem relies on the
finalized block, it needs to listen for the overseer signal.
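A minimal sketch of these three guards, using illustrative stand-ins (`View`, `report_peer`, `COST_EMPTY_VIEW`, `send_view_update`) rather than the real network-bridge types:
```rust
// Illustrative stand-ins, not the actual network-bridge types.
type Hash = [u8; 32];
type BlockNumber = u32;
type PeerId = u64;

const COST_EMPTY_VIEW: i32 = -500; // assumed reputation cost

#[derive(Clone, PartialEq)]
struct View {
    heads: Vec<Hash>,
    finalized_number: BlockNumber,
}

fn report_peer(_peer: PeerId, _cost: i32) { /* lower the peer's reputation */ }
fn send_view_update(_view: &View) { /* broadcast our view to peers */ }

// Point 1: an empty view from a peer costs it reputation.
fn handle_peer_view(peer: PeerId, view: &View) {
    if view.heads.is_empty() {
        report_peer(peer, COST_EMPTY_VIEW);
        return;
    }
    // ... process the view ...
}

// Point 2: only send an update when `heads` changed, not when only
// `finalized_number` advanced.
fn maybe_send_view_update(old: &View, new: &View) {
    if old.heads != new.heads {
        send_view_update(new);
    }
}
```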
* Update node/network/bridge/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Don't work if there are no added heads
* Fix test
* Ahhh
* More fixes
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* Fuse receive stream in Context
* Revert "Fuse receive stream in Context"
This reverts commit ddd26fa98f0ca1afbc22064e93010e4193a058b2.
* Exit on node shutdown from av-store loop
* Filter only context error
* Store all chunks and in a single transaction
* Adds chunks LRU to store
* Add pruning records metrics
* Use honest cache instead of LRU
* Remove unnecessary optional cache
* Fix review nits that are fixable
* Companion PR for refactoring priority groups
* Fix non-reserved node
* Try fix tests
* Missing import
* Fix warning
* Change protocols order
* Fix test
* Renames
* Update syn dependency to make it compile again after merging master
* "Update Substrate"
Co-authored-by: parity-processbot <>
* add timing setup to OverseerSubsystemContext
* figure out how to initialize the rng
* attach a timer to a portion of the messages traveling to the Overseer
This timer only exists / logs for a fraction of messages (configurable
by `MESSAGE_TIMER_METRIC_CAPTURE_RATE`). When it exists, it tracks
the span between the `OverseerSubsystemContext` receiving the message
and its receipt in `Overseer::run`.
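A rough sketch of that sampled timing, with illustrative names throughout; the sampling decision itself is factored out, since the commits below iterate on exactly that decision:
```rust
use std::time::Instant;

// Whether to sample is factored out; the commits below iterate on
// exactly this decision (RNG, then 1-of-N, then `oorandom`).
fn should_sample() -> bool {
    true // placeholder
}

// A message headed for the overseer, with an optional start time.
struct TimedMessage<M> {
    msg: M,
    started: Option<Instant>, // `Some` only for the sampled fraction
}

// On the sending side, in the subsystem context.
fn wrap<M>(msg: M) -> TimedMessage<M> {
    TimedMessage { msg, started: should_sample().then(Instant::now) }
}

// On receipt in `Overseer::run`.
fn on_receipt<M>(m: TimedMessage<M>) -> M {
    if let Some(started) = m.started {
        let _elapsed = started.elapsed(); // observe into the timing metric
    }
    m.msg
}
```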
* propagate message timing to the start of route_message
This should be more accurate; it ensures that the timer runs
at least as long as that function. As `route_message` is async,
it may not actually run for some time after it is called (or ever).
* fix failing test
* rand_chacha apparently implicitly enables the getrandom feature
* change rng initialization
The previous impl using `from_entropy` depends on the `getrandom`
crate, which uses the system entropy source, and which does not
work on `wasm32-unknown-unknown` because it wants to fall back to
a JS implementation which we can't assume exists.
This impl depends only on `rand::thread_rng`, which has no documentation
stating that it's similarly limited.
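For reference, a seeding sketch along these lines, assuming rand 0.7-era APIs; `ChaCha20Rng` stands in for whichever ChaCha variant the code actually uses:
```rust
// Assumes rand 0.7-era APIs; `ChaCha20Rng` is an illustrative choice.
use rand::SeedableRng;

fn make_rng() -> rand_chacha::ChaCha20Rng {
    // `ChaCha20Rng::from_entropy()` would pull in `getrandom`;
    // seeding from `thread_rng` sidesteps that dependency.
    rand_chacha::ChaCha20Rng::from_rng(rand::thread_rng())
        .expect("seeding from thread_rng does not fail in practice")
}
```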
* remove randomness in favor of a simpler 1 of N procedure
This deserves a bit of explanation, as the motivating issue explicitly
requested randomness. In short, it's hard to get randomness to compile
for `wasm32-unknown-unknown` because that is explicitly intended to be
as deterministic as practical. Additionally, even though it would never
be used for consensus purposes, it still felt off-putting to intentionally
introduce randomness into a node's operations. Except, it wasn't really
random, either: it was a deterministic PRNG varying only in its state,
and getting the state to work right for that target would have required
initializing from a constant.
Given that it was a deterministic sequence anyway, it seemed much simpler
and more explicit to simply select one out of every N messages instead of
attempting any kind of realistic randomness.
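A minimal sketch of such a 1-of-N selector; `Sampler` and its fields are illustrative names, not the actual implementation:
```rust
// Illustrative names; fully deterministic, so it compiles for
// `wasm32-unknown-unknown` without any entropy source.
struct Sampler {
    counter: usize,
    n: usize, // capture one out of every `n` messages
}

impl Sampler {
    fn should_sample(&mut self) -> bool {
        self.counter = (self.counter + 1) % self.n;
        self.counter == 0
    }
}
```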
* reinstate randomness for better statistical properties
This partially reverts commit 0ab8594c328b3f9ce1f696fe405556d4000630e9.
`oorandom` is much lighter than the previous `rand`-based implementation,
which makes this easier to work with.
This implementation gives each subsystem and each child RNG a distinct
increment, which should ensure they produce distinct streams of values.
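A sketch of distinct streams with `oorandom` (a small PCG implementation); the seed/stream scheme here is an assumption for illustration:
```rust
// Same seed with different increments yields independent PCG streams.
fn subsystem_rng(seed: u64, stream: u64) -> oorandom::Rand32 {
    // Each distinct increment selects a distinct stream of values.
    oorandom::Rand32::new_inc(seed, stream)
}

fn should_capture(rng: &mut oorandom::Rand32, rate: f32) -> bool {
    // Capture this message's timing with probability `rate`.
    rng.rand_float() < rate
}
```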
* Add one Jaeger span per relay parent
This adds one shared Jaeger span per relay parent, instead of each
subsystem creating its own span per relay parent. This should improve
the UI view, because subsystems are now grouped below one common span.
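A conceptual sketch of the grouping, not the actual `PerLeafSpan` API; `Span` is a stand-in for a Jaeger span handle:
```rust
// Conceptual only; the real `PerLeafSpan` wraps actual Jaeger spans.
use std::sync::Arc;

struct Span; // stand-in for a Jaeger span handle

impl Span {
    fn child(&self, _name: &str) -> Span {
        Span
    }
}

// One span is created per relay parent (leaf); every subsystem then
// hangs its own child span off that shared parent.
struct PerLeafSpan {
    _leaf_span: Arc<Span>, // the one shared span for this relay parent
    _span: Span,           // this subsystem's child span
}

impl PerLeafSpan {
    fn new(leaf_span: Arc<Span>, subsystem: &str) -> Self {
        let span = leaf_span.child(subsystem);
        Self { _leaf_span: leaf_span, _span: span }
    }
}
```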
* Fix doc tests
* Rename `PerLeaveSpan` to `PerLeafSpan`
* More renaming
* More
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Skip the spans
* Increase `spec_version`
Co-authored-by: Andronik Ordian <write@reusable.software>
* Cont.: Implement obtaining the state root during inclusion
During inclusion we now obtain the storage root by passing it through
the `inclusion_inherent`.
* Fix tests
* Bump rococo spec version
* Reorder the parent header to the end of the inclusion inherent
When the parent header is at the beginning, it shifts the other two
fields, so that a previous version won't be able to decode them. If
we put the parent header at the end, the other two fields stay at
their positions, thus making it possible to decode with the previous
version.
That allows us to upgrade the Rococo runtime without requiring a
simultaneous upgrade of nodes and runtime, or a restart of the network.
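The trick relies on SCALE decoding being purely positional; a minimal illustration with `parity-scale-codec`, using simplified stand-in fields:
```rust
// SCALE has no field tags: decoding is purely positional. An old decoder
// that reads only the first two fields keeps working when a new field is
// appended at the end. Field types here are simplified stand-ins.
use parity_scale_codec::{Decode, Encode};

#[derive(Encode)]
struct NewInherent {
    bitfields: Vec<u8>,
    backed_candidates: Vec<u8>,
    parent_header: Vec<u8>, // new field, appended at the end
}

#[derive(Decode)]
struct OldInherent {
    bitfields: Vec<u8>,
    backed_candidates: Vec<u8>,
}

fn main() {
    let encoded = NewInherent {
        bitfields: vec![1, 2],
        backed_candidates: vec![3],
        parent_header: vec![4, 5, 6],
    }
    .encode();

    // The old decoder consumes the two leading fields and leaves the
    // trailing (unknown) bytes unread, so decoding still succeeds.
    let old = OldInherent::decode(&mut &encoded[..]).unwrap();
    assert_eq!(old.bitfields, vec![1, 2]);
    assert_eq!(old.backed_candidates, vec![3]);
}
```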
* Squash a stray tab
This PR adds support for changing the session length of a Rococo chain
at genesis. This is rather useful because Rococo has a session length of
1 hour, while on rococo-local you will now get 1 minute. This improves
the dev experience, because a parachain only goes live at the start of
a new session.
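A hedged sketch of the idea, with illustrative names rather than the actual Rococo genesis items:
```rust
// Illustrative names only, not the actual Rococo runtime/chain-spec items.
const ONE_HOUR: u32 = 600;  // session length in blocks, at 6 s per block
const ONE_MINUTE: u32 = 10; // the rococo-local dev setting

struct GenesisConfig {
    session_length_in_blocks: u32,
}

fn rococo_genesis(dev: bool) -> GenesisConfig {
    GenesisConfig {
        // Chosen at genesis instead of being hard-coded in the runtime.
        session_length_in_blocks: if dev { ONE_MINUTE } else { ONE_HOUR },
    }
}
```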
* add candidate hash statement circulation span
* add relay-parent to hash-span
* Some typos and misspellings in docs I found during my studies. (#2144)
* Fix stale link to overseer docs
* Some typos and misspellings in docs/comments
I found while studying how Polkadot works.
* Rococo V1 (#2141)
* Update to latest master and use 30 minutes sessions
* add bootnodes to chainspec
* Update Substrate
* Update chain-spec
* Update Cargo.lock
* GENESIS
* Change session length to one hour
* Bump spec_version to avoid breaking anything ;)
Co-authored-by: Erin Grasmick <erin@parity.io>
* avoid creating duplicate unbacked spans when we see extra statements (#2145)
* improve jaeger spans for statement distribution
* tweak and add failing test for repropagation
* make a change that gets the test passing
* guide: clarify
* remove semicolon
Co-authored-by: Robert Klotzner <eskimor@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Erin Grasmick <erin@parity.io>
* guide: add candidate information to OccupiedCore
* add descriptor and hash to occupied core type
* guide: add candidate hash to inclusion
* runtime: return candidate info in core state
* bitfield signing: stop querying runtime as much
* minimize going to runtime in availability distribution
* fix availability distribution tests
* guide: remove para ID from Occupied core
* get all crates compiling
* Fix bug and further optimizations in availability distribution
- There was a bug that resulted in only getting one candidate per block,
because candidates were put into the hashmap with the relay block hash
as the key. The fix is to key by both the candidate hash and the relay
block hash (see the sketch after this list).
- We stored received/sent messages keyed by candidate hash and chunk
index. The candidate hash wasn't required here, as the messages are
already stored per candidate.
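The first fix in a nutshell, as a sketch with simplified stand-in types:
```rust
// Simplified stand-in types; the point is the composite key.
use std::collections::HashMap;

type Hash = [u8; 32];
type CandidateHash = [u8; 32];
struct Candidate;

fn insert(
    map: &mut HashMap<(Hash, CandidateHash), Candidate>,
    relay_parent: Hash,
    candidate_hash: CandidateHash,
    candidate: Candidate,
) {
    // Previously keyed by `relay_parent` alone, which silently kept only
    // one candidate per relay block; the composite key keeps them all.
    map.insert((relay_parent, candidate_hash), candidate);
}
```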
* Update node/core/bitfield-signing/src/lib.rs
Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
* Remove the reverse map
* major refactor of receipts & query_live
* finish refactoring
remove ancestry mapping,
improve relay-parent cleanup & receipts-cache cleanup,
add descriptor to `PerCandidate`
* rename and rewrite query_pending_availability
* add a bunch of consistency tests
* Add some last changes
* xy
* fz
* Make it compile again
* Fix one test
* Fix logging
* Remove some buggy code
* Make tests work again
* Move stuff around
* Remove dbg
* Remove state from test_harness
* More refactor and new test
* New test and fixes
* Move metric
* Remove "duplicated code"
* Fix tests
* New test
* Change break to continue
* Update node/core/av-store/src/lib.rs
* Update node/core/av-store/src/lib.rs
* Update node/core/bitfield-signing/src/lib.rs
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
* update guide to match live_candidates changes
* add comment
* fix bitfield signing
Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
* refactor View to include finalized_number
* guide: update the NetworkBridge on BlockFinalized
* av-store: fix the tests
* actually fix tests
* grumbles
* ignore macro doctest
* use Hash::repeat_bytes more consistently
* broadcast empty leaves updates as well
* fix issuing view updates on empty leaves updates
Right now, if collation is not happening, one has to sprinkle log
statements and then recompile the code. It's doubly annoying if that
happens when working with Cumulus: it means one has to resort to
.cargo/config's `paths` or `diener`, neither of which is ideal.
This just adds some verbose logging to save investigators some time
when looking into why collations are not happening.
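The kind of logging this adds, sketched with `tracing`; the target and message here are illustrative, not the exact log lines:
```rust
// Illustrative target and message, not the exact log lines added.
fn note_missing_collation(relay_parent: &str, has_collation: bool) {
    if !has_collation {
        tracing::trace!(
            target: "collator-protocol",
            ?relay_parent,
            "no collation was produced for this relay parent"
        );
    }
}
```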