This PR should supersede
https://github.com/paritytech/polkadot-sdk/pull/2814 and accomplish the
same with less changes. It's needed to run sync strategies in parallel,
like running `ChainSync` and `GapSync` as independent strategies, and
running `ChainSync` and Sync 2.0 alongside each other.
The difference with https://github.com/paritytech/polkadot-sdk/pull/2814
is that we allow simultaneous requests to remote peers initiated by
different strategies, as this is not tracked on the remote node in any
way. Therefore, `PeerPool` is not needed.
CC @skunert
---------
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Changes (partial https://github.com/paritytech/polkadot-sdk/issues/994):
- Set log to `0.4.20` everywhere
- Lift `log` to the workspace
Starting with a simpler one after seeing
https://github.com/paritytech/polkadot-sdk/pull/2065 from @jsdw.
This sets the `default-features` to `false` in the root and then
overwrites that in each create to its original value. This is necessary
since otherwise the `default` features are additive and its impossible
to disable them in the crate again once they are enabled in the
workspace.
I am using a tool to do this, so its mostly a test to see that it works
as expected.
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
This is a rather big change in jsonrpsee, the major things in this bump
are:
- Server backpressure (the subscription impls are modified to deal with
that)
- Allow custom error types / return types (remove jsonrpsee::core::Error
and jsonrpee::core::CallError)
- Bug fixes (graceful shutdown in particular not used by substrate
anyway)
- Less dependencies for the clients in particular
- Return type requires Clone in method call responses
- Moved to tokio channels
- Async subscription API (not used in this PR)
Major changes in this PR:
- The subscriptions are now bounded and if subscription can't keep up
with the server it is dropped
- CLI: add parameter to configure the jsonrpc server bounded message
buffer (default is 64)
- Add our own subscription helper to deal with the unbounded streams in
substrate
The most important things in this PR to review is the added helpers
functions in `substrate/client/rpc/src/utils.rs` and the rest is pretty
much chore.
Regarding the "bounded buffer limit" it may cause the server to handle
the JSON-RPC calls
slower than before.
The message size limit is bounded by "--rpc-response-size" thus "by
default 10MB * 64 = 640MB"
but the subscription message size is not covered by this limit and could
be capped as well.
Hopefully the last release prior to 1.0, sorry in advance for a big PR
Previous attempt: https://github.com/paritytech/substrate/pull/13992
Resolves https://github.com/paritytech/polkadot-sdk/issues/748, resolves
https://github.com/paritytech/polkadot-sdk/issues/627
Extract `WarpSync` (and `StateSync` as part of warp sync) from
`ChainSync` as independent syncing strategy called by `SyncingEngine`.
Introduce `SyncingStrategy` enum as a proxy between `SyncingEngine` and
specific syncing strategies.
## Limitations
Gap sync is kept in `ChainSync` for now because it shares the same set
of peers as block syncing implementation in `ChainSync`. Extraction of a
common context responsible for peer management in syncing strategies
able to run in parallel is planned for a follow-up PR.
## Further improvements
A possibility of conversion of `SyncingStartegy` into a trait should be
evaluated. The main stopper for this is that different strategies need
to communicate different actions to `SyncingEngine` and respond to
different events / provide different APIs (e.g., requesting
justifications is only possible via `ChainSync` and not through
`WarpSync`; `SendWarpProofRequest` action is only relevant to
`WarpSync`, etc.)
---------
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
Previously, it was only possible to retry the same request on a
different protocol name that had the exact same binary payloads.
Introduce a way of trying a different request on a different protocol if
the first one fails with Unsupported protocol.
This helps with adding new req-response versions in polkadot while
preserving compatibility with unupgraded nodes.
The way req-response protocols were bumped previously was that they were
bundled with some other notifications protocol upgrade, like for async
backing (but that is more complicated, especially if the feature does
not require any changes to a notifications protocol). Will be needed for
implementing https://github.com/polkadot-fellows/RFCs/pull/47
TODO:
- [x] add tests
- [x] add guidance docs in polkadot about req-response protocol
versioning
We currently use a bit of a hack in `.cargo/config` to make sure that
clippy isn't too annoying by specifying the list of lints.
There is now a stable way to define lints for a workspace. The only down
side is that every crate seems to have to opt into this so there's a
*few* files modified in this PR.
Dependencies:
- [x] PR that upgrades CI to use rust 1.74 is merged.
---------
Co-authored-by: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Co-authored-by: Branislav Kontur <bkontur@gmail.com>
Co-authored-by: Liam Aharon <liam.aharon@hotmail.com>
# Description
Trivial change that resolves
https://github.com/paritytech/polkadot-sdk/issues/2185.
Since there was a mix of `who` and `peer_id` argument names nearby I
changed them all to `peer_id`.
# Checklist
- [x] My PR includes a detailed description as outlined in the
"Description" section above
- [x] My PR follows the [labeling requirements](CONTRIBUTING.md#Process)
of this project (at minimum one label for `T`
required)
- [x] I have made corresponding changes to the documentation (if
applicable)
- [ ] I have added tests that prove my fix is effective or that my
feature works (if applicable)
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
Using taplo, fixes all our broken and inconsistent toml formatting and
adds CI to keep them tidy.
If people want we can customise the format rules as described here
https://taplo.tamasfe.dev/configuration/formatter-options.html
@ggwpez, I suggest zepter is used only for checking features are
propagated, and leave formatting for taplo to avoid duplicate work and
conflicts.
TODO
- [x] Use `exclude = [...]` syntax in taplo file to ignore zombienet
tests instead of deleting the dir
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
This commit introduces a new concept called `NotificationService` which
allows Polkadot protocols to communicate with the underlying
notification protocol implementation directly, without routing events
through `NetworkWorker`. This implies that each protocol has its own
service which it uses to communicate with remote peers and that each
`NotificationService` is unique with respect to the underlying
notification protocol, meaning `NotificationService` for the transaction
protocol can only be used to send and receive transaction-related
notifications.
The `NotificationService` concept introduces two additional benefits:
* allow protocols to start using custom handshakes
* allow protocols to accept/reject inbound peers
Previously the validation of inbound connections was solely the
responsibility of `ProtocolController`. This caused issues with light
peers and `SyncingEngine` as `ProtocolController` would accept more
peers than `SyncingEngine` could accept which caused peers to have
differing views of their own states. `SyncingEngine` would reject excess
peers but these rejections were not properly communicated to those peers
causing them to assume that they were accepted.
With `NotificationService`, the local handshake is not sent to remote
peer if peer is rejected which allows it to detect that it was rejected.
This commit also deprecates the use of `NetworkEventStream` for all
notification-related events and going forward only DHT events are
provided through `NetworkEventStream`. If protocols wish to follow each
other's events, they must introduce additional abtractions, as is done
for GRANDPA and transactions protocols by following the syncing protocol
through `SyncEventStream`.
Fixes https://github.com/paritytech/polkadot-sdk/issues/512
Fixes https://github.com/paritytech/polkadot-sdk/issues/514
Fixes https://github.com/paritytech/polkadot-sdk/issues/515
Fixes https://github.com/paritytech/polkadot-sdk/issues/554
Fixes https://github.com/paritytech/polkadot-sdk/issues/556
---
These changes are transferred from
https://github.com/paritytech/substrate/pull/14197 but there are no
functional changes compared to that PR
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Get rid of public `ChainSync::..._requests()` functions and return all
requests as actions.
---------
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
All `ChainSync` actions that `SyncingEngine` should perform are unified
under one `ChainSyncAction`. Processing of these actions put into a
single place after `select!` in `SyncingEngine::run` instead of multiple
places where calling `ChainSync` methods.
The `BlockBuilderProvider` was a trait that was defined in
`sc-block-builder`. The trait was implemented for `Client`. This
basically meant that you needed to import `sc-block-builder` any way to
have access to the block builder. So, this trait was not providing any
real value. This pull request is removing the said trait. Instead of the
trait it introduces a builder for creating a `BlockBuilder`. The builder
currently has the quite fabulous name `BlockBuilderBuilder` (I'm open to
any better name 😅). The rest of the pull request is about
replacing the old trait with the new builder.
# Downstream code changes
If you used `new_block` or `new_block_at` before you now need to switch
it over to the new `BlockBuilderBuilder` pattern:
```rust
// `new` requires a type that implements `CallApiAt`.
let mut block_builder = BlockBuilderBuilder::new(client)
// Then you need to specify the hash of the parent block the block will be build on top of
.on_parent_block(at)
// The block builder also needs the block number of the parent block.
// Here it is fetched from the given `client` using the `HeaderBackend`
// However, there also exists `with_parent_block_number` for directly passing the number
.fetch_parent_block_number(client)
.unwrap()
// Enable proof recording if required. This call is optional.
.enable_proof_recording()
// Pass the digests. This call is optional.
.with_inherent_digests(digests)
.build()
.expect("Creates new block builder");
```
---------
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Co-authored-by: command-bot <>
This PR moves syncing-related code from `sc-network-common` to
`sc-network-sync`.
Unfortunately, some parts are tightly integrated with networking, so
they were left in `sc-network-common` for now:
1. `SyncMode` in `common/src/sync.rs` (used in `NetworkConfiguration`).
2. `BlockAnnouncesHandshake`, `BlockRequest`, `BlockResponse`, etc. in
`common/src/sync/message.rs` (used in `src/protocol.rs` and
`src/protocol/message.rs`).
More substantial refactoring is needed to decouple syncing and
networking completely, including getting rid of the hardcoded sync
protocol.
## Release notes
Move syncing-related code from `sc-network-common` to `sc-network-sync`.
Delete `ChainSync` trait as it's never used (the only implementation is
accessed directly from `SyncingEngine` and exposes a lot of public
methods that are not part of the trait). Some new trait(s) for syncing
will likely be introduced as part of Sync 2.0 refactoring to represent
syncing strategies.
The change adds a test to show the failure scenario that caused #1812 to
be rolled back (more context:
https://github.com/paritytech/polkadot-sdk/issues/493#issuecomment-1772009924)
Summary of the scenario:
1. Node has finished downloading up to block 1000 from the peers, from
the canonical chain.
2. Peers are undergoing re-org around this time. One of the peers has
switched to a non-canonical chain, announces block 1001 from that chain
3. Node downloads 1001 from the peer, and tries to import which would
fail (as we don't have the parent block 1000 from the other chain)
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
When retrieving the ready blocks, verify that the parent of the first
ready block is on chain. If the parent is not on chain, we are
downloading from a fork. In this case, keep downloading until we have a
parent on chain (common ancestor).
Resolves https://github.com/paritytech/polkadot-sdk/issues/493.
---------
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
Submit the outstanding PRs from the old repos(these were already
reviewed and approved before the repo rorg, but not yet submitted):
Main PR: https://github.com/paritytech/substrate/pull/14014
Companion PRs: https://github.com/paritytech/polkadot/pull/7134,
https://github.com/paritytech/cumulus/pull/2489
The changes in the PR:
1. ChainSync currently calls into the block request handler directly.
Instead, move the block request handler behind a trait. This allows new
protocols to be plugged into ChainSync.
2. BuildNetworkParams is changed so that custom relay protocol
implementations can be (optionally) passed in during network creation
time. If custom protocol is not specified, it defaults to the existing
block handler
3. BlockServer and BlockDownloader traits are introduced for the
protocol implementation. The existing block handler has been changed to
implement these traits
4. Other changes:
[X] Make TxHash serializable. This is needed for exchanging the
serialized hash in the relay protocol messages
[X] Clean up types no longer used(OpaqueBlockRequest,
OpaqueBlockResponse)
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: command-bot <>
* Make peer evictions less aggressive
The original implementation of peer eviction prioritized aliveness over
connection stability which made the peer count unstable for some users.
As this may cause discomfort or infrastructure alerts if stability is
tracked, adjust the eviction to be less aggressive by only evicting
peers when the node has fully stalled. This causes the node to have some
peers who are inactive and won't send any block announcements.
These nodes are removed if the local node is able to receive at least
one block announcement from one of its peers as the inactivity of the
substream is detected when a notification is sent.
If the node won't send or receive any block annoucements for 30 seconds,
it's considered stalled and it will evict all peers,
causing `ProtocolController` to accept and establish connections from new
peers.
* Update client/network/sync/src/engine.rs
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Track last send and received notification simultaneously
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: parity-processbot <>
* update libp2p to 0.52.0
* proto name now must implement `AsRef<str>`
* update libp2p version everywhere
* ToSwarm, FromBehaviour, ToBehaviour
also LocalProtocolsChange and RemoteProtocolsChange
* new NetworkBehaviour invariants
* replace `Vec<u8>` with `StreamProtocol`
* rename ConnectionHandlerEvent::Custom to NotifyBehaviour
* remove DialError & ListenError invariants
also fix pending_events
* use connection_limits::Behaviour
See https://github.com/libp2p/rust-libp2p/pull/3885
* impl `void::Void` for `BehaviourOut`
also use `Behaviour::with_codec`
* KademliaHandler no longer public
* fix StreamProtocol construction
* update libp2p-identify to 0.2.0
* remove non-existing methods from PollParameters
rename ConnectionHandlerUpgrErr to StreamUpgradeError
* `P2p` now contains `PeerId`, not `Multihash`
* use multihash-codetable crate
* update Cargo.lock
* reformat text
* comment out tests for now
* remove `.into()` from P2p
* confirm observed addr manually
See https://github.com/libp2p/rust-libp2p/blob/master/protocols/identify/CHANGELOG.md#0430
* remove SwarmEvent::Banned
since we're not using `ban_peer_id`, this can be safely removed.
we may want to introduce `libp2p::allow_block_list` module in the future.
* fix imports
* replace `libp2p` with smaller deps in network-gossip
* bring back tests
* finish rewriting tests
* uncomment handler tests
* Revert "uncomment handler tests"
This reverts commit 720a06815887f4e10767c62b58864a7ec3a48e50.
* add a fixme
* update Cargo.lock
* remove extra From
* make void uninhabited
* fix discovery test
* use autonat protocols
confirming external addresses manually is unsafe in open networks
* fix SyncNotificationsClogged invariant
* only set server mode manually in tests
doubt that we need to set it on node since we're adding public addresses
* address @dmitry-markin comments
* remove autonat
* removed unused var
* fix EOL
* update smallvec and sha2
in attempt to compile polkadot
* bump k256
in attempt to build cumulus
---------
Co-authored-by: parity-processbot <>
* Accept only `--in-peers` many inbound full nodes in `SyncingEngine`
Due to full and light nodes being stored in the same set, it's possible
that `SyncingEngine` accepts more than `--in-peers` many inbound full
nodes which leaves some of its outbound slots unoccupied.
`ProtocolController` still tries to occupy these slots by opening
outbound substreams. As these substreams are accepted by the remote peer,
the connection is relayed to `SyncingEngine` which rejects the node
because it's already full. This in turn results in the substream being
inactive and the peer getting evicted.
Fixing this properly would require relocating the light peer slot
allocation away from `ProtocolController` or alternatively moving entire
the substream validation there, both of which are epic refactorings and
not necessarily in line with other goals. As a temporary measure, verify
in `SyncingEngine` that it doesn't accept more than the specified amount
of inbound full peers.
* Fix tests
* Apply review comments
* Don't start evicting peers right after `SyncingEngine` is started
Parachain collators may need to wait to receive a relaychain block before
they can start producing blocks which can cause `SyncingEngine` to
incorrectly evict them.
When `SyncingEngine` is started, wait 2 minutes before the eviction is
activated to give collators a chance to produce a block.
* fix doc
* Use `continue` instead of `break`
* Trigger CI
---------
Co-authored-by: parity-processbot <>