We are awaiting the oneshot anyway, so we still have back pressure.
Using the unbounded channel makes log messages like the following less
likely (due to higher priority):
2022-05-30 13:46:38
2022-05-30 11:46:38.565 WARN tokio-runtime-worker parachain::provisioner: failed to assemble or send inherent data err=CanceledBackedCandidates(Canceled)
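The back-pressure argument above can be sketched with the standard library alone. This is only an illustration, not the subsystem code: `Request`, `request_with_back_pressure` and `bounded_rejects_when_full` are hypothetical names, `std::sync::mpsc` stands in for the async channels, and a plain `Sender<u32>` stands in for the oneshot.

```rust
use std::sync::mpsc::{channel, sync_channel, Sender, TrySendError};
use std::thread;

/// Hypothetical request: a payload plus a reply channel standing in for the oneshot.
struct Request {
    payload: u32,
    reply: Sender<u32>,
}

/// Sends `count` requests over an unbounded channel, waiting for each reply
/// before sending the next one. Returns the replies in order.
fn request_with_back_pressure(count: u32) -> Vec<u32> {
    // Unbounded: `send` never blocks on a full buffer.
    let (tx, rx) = channel::<Request>();

    // Worker that answers every request by doubling the payload.
    let worker = thread::spawn(move || {
        for req in rx {
            // Ignore the error if the requester has gone away.
            let _ = req.reply.send(req.payload * 2);
        }
    });

    let mut replies = Vec::new();
    for payload in 0..count {
        let (reply_tx, reply_rx) = channel();
        tx.send(Request { payload, reply: reply_tx })
            .expect("worker is alive");
        // Waiting for the reply is the back pressure: this sender never has
        // more than one request queued at a time.
        replies.push(reply_rx.recv().expect("worker replies"));
    }

    drop(tx); // close the channel so the worker loop ends
    worker.join().expect("worker does not panic");
    replies
}

/// Fills a bounded channel and reports whether the next send is rejected.
/// With an awaiting send the caller would be suspended instead.
fn bounded_rejects_when_full(capacity: usize) -> bool {
    let (tx, _rx) = sync_channel::<u32>(capacity);
    for i in 0..capacity as u32 {
        tx.try_send(i).expect("channel has room");
    }
    matches!(tx.try_send(0), Err(TrySendError::Full(_)))
}
```

Because each requester waits for its reply, the unbounded queue cannot grow beyond the number of concurrent requesters, which is why dropping the bound is safe in this pattern.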
* Add some meaningful logging to the force approval to understand why it fails
* Add original block to the log to simplify log inspection
* Update node/core/approval-voting/src/import.rs
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* Revert approval-voting subsystem
* Approval voting revert encapsulated within 'ops' module
* use 'get_stored_blocks' to get lower block height
* Fix error message
* Optionally shrink/delete stored blocks range
* range end number is last block number plus 1
* Apply code review suggestions
* Use tristate enum for block range in backend overlay
* Add clarification comment
* Add comments to private struct
* Increase message channel size to 2048
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
* Use unbounded channel for reading data
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
* split metrics from collation generation
* move metrics to separate file out of backing
* split bitfield signing metrics
* split candidate validation metrics
* split chain api metrics
* split metrics from runtime API
* util is not used in backed metrics mod
* fmt
* missing types
* sure
* gossip-support: be explicit about dimensions
* some guide updates
* update network-bridge to distinguish x and y dimensions
* get everything to compile
* beginnings
* some TODOs
* polkadot runtime: use relevant_authorities
* make gossip topologies per-session
* better formatting
* gossip support: use current session validators
* expand in comment
* adjust tests and fix index bug
* add past/present/future connection test and clean up code
* fmt
* network bridge: updated types
* update protocols to new gossip topology message
* guide updates
* add session to BlockApprovalMeta
* add session to block info
* refactor knowledge and remove most unify logic
* start replacing gossip_peers with new SessionTopologies
* add routing information to message state
* add some utilities to SessionTopology
* implement new gossip topology logic
* re-implement unify_with_peer
* distribute assignments according to topology
* finish grid topology implementation
* refactor network bridge slightly
* issue connection requests on all past/present/future
* fmt
* address grumbles
* tighten invariants in unify_with_peer
* implement random propagation
* refactor: extract required routing adjustment logic
* some block-age logic
* aggressively propagate messages when finality is slow
* overhaul aggression system to have 3 levels
* add aggression metrics
* remove aggression L3
* reduce random circulation
* remove PeerData
* get approval tests compiling
* use btree_map in known_by to make deterministic
* Revert "use btree_map in known_by to make deterministic"
This reverts commit 330d65343a7bb6fe4dd0f24bd8dbc15c0cbdbd9d.
* test XY grid propagation
* remove stray println
* test unshared dimension propagation
* add random gossip check
* test unify_with_peer better
* test sending after getting gossip topology
* test L1 aggression on originator
* test L1 aggression for non-originators
* test non-originator aggression L2
* fmt
* ~spellcheck
* fix statement-distribution tests
* fix flaky test
* fix metrics typo
* re-send periodically
* test resending
* typo
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* add more metrics about apd messages
* add back unify_with_peer logs
* make Resend an enum
* be more explicit when resending
* fmt
* fix error
* add a TODO for refactoring
* remove debug metrics
* add some guide stuff
* fmt
* update runtime API in test-runtime
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Move `trait ParachainHost` to a separate version independent module
`trait ParachainHost` is no longer part of a specific primitives
version. Instead there is a single trait for stable and staging api
versions. The trait contains stable AND staging methods. The latter are
explicitly marked as unstable.
* Fix `use` primitives
`polkadot_primitives::v2` becomes `polkadot_primitives::runtime_api`
* Staging API declaration and stubs
Introduces the concept of 'staging functions' in the runtime API. These
functions are still in testing and are meant to be used only
within test networks (Westend).
They coexist with the stable calls for technical reasons: maintaining
different runtime APIs for different networks is hard to implement.
Check the doc comments in the source files for more details on how the
staging API should be used.
* Add new staging method - get_session_disputes()
Add `staging_get_session_disputes` to `ParachainHost` as the first
method of the staging API.
* Hide vstaging runtime api implementations behind feature flag
* Fix test runtime
* fn staging_get_session_disputes() is renamed to fn staging_get_disputes()
The PVF host is designed to avoid spawning tasks, to minimize its knowledge
of outer code. Using `async_std::task::spawn` (or Tokio's counterpart) was
deemed unacceptable, and `SpawnNamed` undesirable. Instead, only one
task is returned, and it is spawned by the candidate-validation subsystem.
The tasks from the sub-components are polled by that root task.
However, the way the tasks were bundled was incorrect. There was a giant
select that was polling those tasks. In particular, as soon as
one of the arms of that select awaits, those sub-tasks stop
getting polled. This is a recipe for a deadlock, which indeed happened
here.
Specifically, the deadlock happened during sending messages to the
execute queue by calling
[`send_execute`](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L601).
When the channel to the queue reaches capacity, the control flow is
suspended until the queue handles those messages. Since this code is
essentially reached from [one of the select
arms](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L371),
the queue is never given control, and thus no further progress can be
made.
This problem is solved by bundling the tasks one level higher instead,
by `selecting` over those long-running tasks.
We also stop treating a return from those long-running tasks as an error
condition, since that can happen during a legitimate shutdown.
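The difference between the two bundling strategies can be modelled without an async runtime. The following is a simplified, synchronous sketch and not the host code: `nested_polling` and `sibling_polling` are hypothetical names, `std::sync::mpsc::sync_channel` stands in for the bounded async channel, and a failed `try_send` marks the point where an awaiting send would be suspended.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Old arrangement: the queue only gets a turn after the host arm has sent
/// its whole batch. Returns `None` if the host arm gets stuck on a full
/// channel (the deadlock), otherwise the number of messages processed.
fn nested_polling(batch: u32, capacity: usize) -> Option<u32> {
    let (to_queue, queue_rx) = sync_channel::<u32>(capacity);
    for msg in 0..batch {
        if let Err(TrySendError::Full(_)) = to_queue.try_send(msg) {
            // An awaiting send would be suspended here. Nothing else is
            // polled while this arm is suspended, so the queue never drains.
            return None;
        }
    }
    // Close the channel so that draining it below terminates.
    drop(to_queue);
    Some(queue_rx.iter().count() as u32)
}

/// Fixed arrangement: host and queue are polled as siblings, so the queue
/// keeps draining even while the host is waiting for room in the channel.
fn sibling_polling(batch: u32, capacity: usize) -> u32 {
    assert!(capacity > 0, "this model needs a buffered channel");
    let (to_queue, queue_rx) = sync_channel::<u32>(capacity);
    let mut next = 0;
    let mut processed = 0;
    while processed < batch {
        // Host task: make progress if there is room, otherwise yield.
        if next < batch && to_queue.try_send(next).is_ok() {
            next += 1;
        }
        // Queue task: polled on every round regardless of the host's state.
        if queue_rx.try_recv().is_ok() {
            processed += 1;
        }
    }
    processed
}
```

With a batch of 5 and a capacity of 2, the nested model gets stuck on the third message, while the sibling model delivers all five.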
* Rename to BagError
* Additional parameter for 'revert' command
* Set aux revert param to None
* Align to changes in how the WASM executor is configured in `substrate`
* update lockfile for {"substrate"}
* Update substrate
Co-authored-by: Keith Yeung <kungfukeith11@gmail.com>
Co-authored-by: Davide Galassi <davxy@datawok.net>
Co-authored-by: Shawn Tabrizi <shawntabrizi@gmail.com>
Co-authored-by: parity-processbot <>
* Don't wait for dispute coordinator
in backing and approval-voting - we are single threaded there, so this
is blocking everything.
* Add missing import.
* Don't warn on dropped receiver.