# Description
Trivial change that resolves
https://github.com/paritytech/polkadot-sdk/issues/2185.
Since there was a mix of `who` and `peer_id` argument names nearby I
changed them all to `peer_id`.
# Checklist
- [x] My PR includes a detailed description as outlined in the
"Description" section above
- [x] My PR follows the [labeling requirements](CONTRIBUTING.md#Process)
of this project (at minimum one label for `T`
required)
- [x] I have made corresponding changes to the documentation (if
applicable)
- [ ] I have added tests that prove my fix is effective or that my
feature works (if applicable)
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
This commit introduces a new concept called `NotificationService` which
allows Polkadot protocols to communicate with the underlying
notification protocol implementation directly, without routing events
through `NetworkWorker`. This implies that each protocol has its own
service which it uses to communicate with remote peers and that each
`NotificationService` is unique with respect to the underlying
notification protocol, meaning `NotificationService` for the transaction
protocol can only be used to send and receive transaction-related
notifications.
The `NotificationService` concept introduces two additional benefits:
* allow protocols to start using custom handshakes
* allow protocols to accept/reject inbound peers
Previously the validation of inbound connections was solely the
responsibility of `ProtocolController`. This caused issues with light
peers and `SyncingEngine` as `ProtocolController` would accept more
peers than `SyncingEngine` could accept which caused peers to have
differing views of their own states. `SyncingEngine` would reject excess
peers but these rejections were not properly communicated to those peers
causing them to assume that they were accepted.
With `NotificationService`, the local handshake is not sent to the remote
peer if the peer is rejected, which allows the remote to detect that it was rejected.
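
A minimal sketch of the accept/reject idea, assuming a protocol that owns its own peer slots. `ValidationResult`, `Protocol`, `validate_inbound` and `handshake_for` are hypothetical names used only for illustration and are not the actual `NotificationService` API:

```rust
use std::collections::HashSet;

/// Outcome of validating an inbound substream (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum ValidationResult {
    /// The peer gets a slot and the local handshake is sent back.
    Accept,
    /// The peer gets no slot and no handshake is sent.
    Reject,
}

/// Stand-in for a protocol that decides on its own inbound peers.
struct Protocol {
    max_peers: usize,
    peers: HashSet<u64>,
}

impl Protocol {
    fn new(max_peers: usize) -> Self {
        Self { max_peers, peers: HashSet::new() }
    }

    /// Accept the peer only if it is not yet known and a slot is free.
    fn validate_inbound(&mut self, peer_id: u64) -> ValidationResult {
        if self.peers.contains(&peer_id) || self.peers.len() >= self.max_peers {
            return ValidationResult::Reject;
        }
        self.peers.insert(peer_id);
        ValidationResult::Accept
    }

    /// Returns the handshake to send, or `None` when the peer is rejected.
    /// The missing handshake is what lets the remote notice the rejection.
    fn handshake_for(&mut self, peer_id: u64) -> Option<Vec<u8>> {
        match self.validate_inbound(peer_id) {
            ValidationResult::Accept => Some(b"handshake".to_vec()),
            ValidationResult::Reject => None,
        }
    }
}
```

With `Protocol::new(1)`, the first inbound peer receives a handshake and every later one receives `None`.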
This commit also deprecates the use of `NetworkEventStream` for all
notification-related events and going forward only DHT events are
provided through `NetworkEventStream`. If protocols wish to follow each
other's events, they must introduce additional abstractions, as is done
for GRANDPA and transactions protocols by following the syncing protocol
through `SyncEventStream`.
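
As a rough illustration of what such an additional abstraction can look like, here is a small event fan-out built on `std::sync::mpsc`. `SyncEvent`, `SyncEvents`, `event_stream` and `emit` are assumptions made for this sketch; the real `SyncEventStream` is asynchronous and its exact signature is not shown here:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Events another protocol may want to follow (hypothetical variants).
#[derive(Clone, Debug, PartialEq, Eq)]
enum SyncEvent {
    PeerConnected(u64),
    PeerDisconnected(u64),
}

/// Keeps one sender per subscriber and fans every event out to all of them.
#[derive(Default)]
struct SyncEvents {
    subscribers: Vec<Sender<SyncEvent>>,
}

impl SyncEvents {
    /// Register a new subscriber and hand back its receiving end.
    fn event_stream(&mut self) -> Receiver<SyncEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Deliver the event to every live subscriber and forget the ones
    /// whose receiver has been dropped.
    fn emit(&mut self, event: SyncEvent) {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

Each follower (for example GRANDPA or the transactions protocol) would hold its own receiver and never touch the networking event stream.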
Fixes https://github.com/paritytech/polkadot-sdk/issues/512
Fixes https://github.com/paritytech/polkadot-sdk/issues/514
Fixes https://github.com/paritytech/polkadot-sdk/issues/515
Fixes https://github.com/paritytech/polkadot-sdk/issues/554
Fixes https://github.com/paritytech/polkadot-sdk/issues/556
---
These changes are transferred from
https://github.com/paritytech/substrate/pull/14197 but there are no
functional changes compared to that PR
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Get rid of public `ChainSync::..._requests()` functions and return all
requests as actions.
---------
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
All `ChainSync` actions that `SyncingEngine` should perform are unified
under one `ChainSyncAction`. Processing of these actions is done in a
single place after `select!` in `SyncingEngine::run` instead of in the multiple
places where `ChainSync` methods are called.
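
The pattern can be sketched as follows. The variants of the real `ChainSyncAction` are not listed in this description, so `SyncAction`, `SyncState` and `Engine` below are hypothetical stand-ins that only show the shape of "collect actions, process them in one place":

```rust
/// Hypothetical action set produced by the syncing state machine.
#[derive(Debug, PartialEq, Eq)]
enum SyncAction {
    SendBlockRequest { peer_id: u64, from_block: u64 },
    DropPeer { peer_id: u64 },
}

/// The state machine only records what should happen.
#[derive(Default)]
struct SyncState {
    actions: Vec<SyncAction>,
}

impl SyncState {
    fn on_block_announce(&mut self, peer_id: u64, number: u64) {
        self.actions.push(SyncAction::SendBlockRequest { peer_id, from_block: number });
    }

    fn on_bad_response(&mut self, peer_id: u64) {
        self.actions.push(SyncAction::DropPeer { peer_id });
    }

    /// Hand all queued actions to the caller and leave the queue empty.
    fn take_actions(&mut self) -> Vec<SyncAction> {
        std::mem::take(&mut self.actions)
    }
}

/// The engine performs the side effects, in exactly one place.
#[derive(Default)]
struct Engine {
    sync: SyncState,
    sent_requests: Vec<(u64, u64)>,
    dropped_peers: Vec<u64>,
}

impl Engine {
    /// Would be called once per loop iteration, after the event `select!`.
    fn process_actions(&mut self) {
        for action in self.sync.take_actions() {
            match action {
                SyncAction::SendBlockRequest { peer_id, from_block } =>
                    self.sent_requests.push((peer_id, from_block)),
                SyncAction::DropPeer { peer_id } => self.dropped_peers.push(peer_id),
            }
        }
    }
}
```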
The `BlockBuilderProvider` was a trait that was defined in
`sc-block-builder`. The trait was implemented for `Client`. This
basically meant that you needed to import `sc-block-builder` anyway to
have access to the block builder. So, this trait was not providing any
real value. This pull request is removing the said trait. Instead of the
trait it introduces a builder for creating a `BlockBuilder`. The builder
currently has the quite fabulous name `BlockBuilderBuilder` (I'm open to
any better name 😅). The rest of the pull request is about
replacing the old trait with the new builder.
# Downstream code changes
If you used `new_block` or `new_block_at` before you now need to switch
it over to the new `BlockBuilderBuilder` pattern:
```rust
use sc_block_builder::BlockBuilderBuilder;

// `new` requires a type that implements `CallApiAt`.
let mut block_builder = BlockBuilderBuilder::new(client)
    // Then you need to specify the hash of the parent block the block will be built on top of.
    .on_parent_block(at)
    // The block builder also needs the block number of the parent block.
    // Here it is fetched from the given `client` using the `HeaderBackend`.
    // However, there also exists `with_parent_block_number` for directly passing the number.
    .fetch_parent_block_number(client)
    .unwrap()
    // Enable proof recording if required. This call is optional.
    .enable_proof_recording()
    // Pass the digests. This call is optional.
    .with_inherent_digests(digests)
    .build()
    .expect("Creates new block builder");
```
---------
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Co-authored-by: command-bot <>
This PR moves syncing-related code from `sc-network-common` to
`sc-network-sync`.
Unfortunately, some parts are tightly integrated with networking, so
they were left in `sc-network-common` for now:
1. `SyncMode` in `common/src/sync.rs` (used in `NetworkConfiguration`).
2. `BlockAnnouncesHandshake`, `BlockRequest`, `BlockResponse`, etc. in
`common/src/sync/message.rs` (used in `src/protocol.rs` and
`src/protocol/message.rs`).
More substantial refactoring is needed to decouple syncing and
networking completely, including getting rid of the hardcoded sync
protocol.
## Release notes
Move syncing-related code from `sc-network-common` to `sc-network-sync`.
Delete `ChainSync` trait as it's never used (the only implementation is
accessed directly from `SyncingEngine` and exposes a lot of public
methods that are not part of the trait). Some new trait(s) for syncing
will likely be introduced as part of Sync 2.0 refactoring to represent
syncing strategies.
The change adds a test to show the failure scenario that caused #1812 to
be rolled back (more context:
https://github.com/paritytech/polkadot-sdk/issues/493#issuecomment-1772009924)
Summary of the scenario:
1. Node has finished downloading up to block 1000 from the peers, from
the canonical chain.
2. Peers are undergoing a re-org around this time. One of the peers has
switched to a non-canonical chain and announces block 1001 from that chain.
3. Node downloads 1001 from the peer and tries to import it, which would
fail (as we don't have the parent block 1000 from the other chain).
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
When retrieving the ready blocks, verify that the parent of the first
ready block is on chain. If the parent is not on chain, we are
downloading from a fork. In this case, keep downloading until we have a
parent on chain (common ancestor).
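
A minimal sketch of that check, assuming blocks are identified by plain integer hashes and the set of on-chain hashes is available locally. `Block` and `ready_blocks` are simplified, hypothetical versions of the real types:

```rust
use std::collections::HashSet;

/// Simplified block header (hypothetical, integer hashes for brevity).
#[derive(Clone, Debug, PartialEq, Eq)]
struct Block {
    hash: u64,
    parent_hash: u64,
    number: u64,
}

/// Returns the downloaded range only if the parent of its first block is
/// already on chain. Otherwise nothing is ready yet and the caller keeps
/// downloading towards the common ancestor.
fn ready_blocks(downloaded: &[Block], on_chain: &HashSet<u64>) -> Vec<Block> {
    match downloaded.first() {
        Some(first) if on_chain.contains(&first.parent_hash) => downloaded.to_vec(),
        _ => Vec::new(),
    }
}
```

In the fork scenario, a lone block 1001 whose parent is the fork's block 1000 is held back until that parent has been downloaded as well.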
Resolves https://github.com/paritytech/polkadot-sdk/issues/493.
---------
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
Submit the outstanding PRs from the old repos (these were already
reviewed and approved before the repo reorg, but not yet submitted):
Main PR: https://github.com/paritytech/substrate/pull/14014
Companion PRs: https://github.com/paritytech/polkadot/pull/7134,
https://github.com/paritytech/cumulus/pull/2489
The changes in the PR:
1. ChainSync currently calls into the block request handler directly.
Instead, move the block request handler behind a trait. This allows new
protocols to be plugged into ChainSync.
2. BuildNetworkParams is changed so that custom relay protocol
implementations can be (optionally) passed in during network creation
time. If no custom protocol is specified, it defaults to the existing
block handler
3. BlockServer and BlockDownloader traits are introduced for the
protocol implementation. The existing block handler has been changed to
implement these traits
4. Other changes:
[X] Make TxHash serializable. This is needed for exchanging the
serialized hash in the relay protocol messages
[X] Clean up types no longer used (OpaqueBlockRequest,
OpaqueBlockResponse)
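
The pluggable design in items 1 to 3 can be sketched roughly as below. This is a synchronous simplification with made-up method names and protocol names; the actual `BlockDownloader` trait in the PR may look quite different:

```rust
/// Hypothetical, simplified version of a block download abstraction.
trait BlockDownloader {
    /// Name of the protocol used to fetch blocks (illustrative values only).
    fn protocol_name(&self) -> &'static str;
    /// Fetch `count` block numbers starting at `from`.
    fn download(&self, from: u64, count: u64) -> Vec<u64>;
}

/// Stand-in for the existing block request handler.
struct DefaultBlockHandler;

impl BlockDownloader for DefaultBlockHandler {
    fn protocol_name(&self) -> &'static str {
        "default-block-handler"
    }
    fn download(&self, from: u64, count: u64) -> Vec<u64> {
        (from..from + count).collect()
    }
}

/// Stand-in for a custom relay protocol supplied by the node builder.
struct RelayProtocol;

impl BlockDownloader for RelayProtocol {
    fn protocol_name(&self) -> &'static str {
        "custom-relay"
    }
    fn download(&self, from: u64, count: u64) -> Vec<u64> {
        (from..from + count).collect()
    }
}

/// Syncing only talks to the trait object, never to a concrete handler.
struct Syncing {
    downloader: Box<dyn BlockDownloader>,
}

/// Use the custom implementation if one is given, else the default handler.
fn build_syncing(custom: Option<Box<dyn BlockDownloader>>) -> Syncing {
    let downloader: Box<dyn BlockDownloader> = match custom {
        Some(downloader) => downloader,
        None => Box::new(DefaultBlockHandler),
    };
    Syncing { downloader }
}
```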
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: command-bot <>
* Make peer evictions less aggressive
The original implementation of peer eviction prioritized aliveness over
connection stability which made the peer count unstable for some users.
As this may cause discomfort or infrastructure alerts if stability is
tracked, adjust the eviction to be less aggressive by only evicting
peers when the node has fully stalled. This causes the node to have some
peers who are inactive and won't send any block announcements.
These nodes are removed if the local node is able to receive at least
one block announcement from one of its peers as the inactivity of the
substream is detected when a notification is sent.
If the node doesn't send or receive any block announcements for 30 seconds,
it's considered stalled and it will evict all peers,
causing `ProtocolController` to accept and establish connections from new
peers.
* Update client/network/sync/src/engine.rs
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Track last sent and received notification simultaneously
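
The stall rule from the first commit above can be sketched like this. The 30 second threshold comes from the description; `PeerSet`, `note_announcement` and `evict_if_stalled` are hypothetical names, and time is passed in explicitly so the logic is easy to test:

```rust
use std::time::{Duration, Instant};

/// Threshold after which the node is considered stalled.
const STALL_TIMEOUT: Duration = Duration::from_secs(30);

/// Peers plus the time of the last sent or received block announcement.
struct PeerSet {
    peers: Vec<u64>,
    last_announcement: Instant,
}

impl PeerSet {
    /// Record that an announcement was sent or received at `now`.
    fn note_announcement(&mut self, now: Instant) {
        self.last_announcement = now;
    }

    /// Evict every peer if nothing was sent or received for `STALL_TIMEOUT`.
    /// Returns the evicted peers, or an empty list if the node is not stalled.
    fn evict_if_stalled(&mut self, now: Instant) -> Vec<u64> {
        if now.duration_since(self.last_announcement) < STALL_TIMEOUT {
            return Vec::new();
        }
        // Restart the timer so the fresh peers get a full window.
        self.last_announcement = now;
        std::mem::take(&mut self.peers)
    }
}
```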
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: parity-processbot <>
* Accept only `--in-peers` many inbound full nodes in `SyncingEngine`
Due to full and light nodes being stored in the same set, it's possible
that `SyncingEngine` accepts more than `--in-peers` many inbound full
nodes which leaves some of its outbound slots unoccupied.
`ProtocolController` still tries to occupy these slots by opening
outbound substreams. As these substreams are accepted by the remote peer,
the connection is relayed to `SyncingEngine` which rejects the node
because it's already full. This in turn results in the substream being
inactive and the peer getting evicted.
Fixing this properly would require relocating the light peer slot
allocation away from `ProtocolController` or alternatively moving the
entire substream validation there, both of which are epic refactorings and
not necessarily in line with other goals. As a temporary measure, verify
in `SyncingEngine` that it doesn't accept more than the specified amount
of inbound full peers.
* Fix tests
* Apply review comments
* Don't start evicting peers right after `SyncingEngine` is started
Parachain collators may need to wait to receive a relaychain block before
they can start producing blocks which can cause `SyncingEngine` to
incorrectly evict them.
When `SyncingEngine` is started, wait 2 minutes before the eviction is
activated to give collators a chance to produce a block.
* fix doc
* Use `continue` instead of `break`
* Trigger CI
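
The temporary inbound full node limit from the first commit in this list boils down to a counter check. A sketch, assuming the limit is given by `--in-peers`; `Role`, `Direction` and `InboundSlots` are illustrative names and the sketch ignores the separate limits that apply to light and outbound peers:

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum Role {
    Full,
    Light,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum Direction {
    Inbound,
    Outbound,
}

/// Counts inbound full peers against the configured limit.
struct InboundSlots {
    max_in_full: usize,
    num_in_full: usize,
}

impl InboundSlots {
    fn new(max_in_full: usize) -> Self {
        Self { max_in_full, num_in_full: 0 }
    }

    /// Only inbound full nodes are limited here; other peers pass through.
    fn try_accept(&mut self, role: Role, direction: Direction) -> bool {
        if role == Role::Full && direction == Direction::Inbound {
            if self.num_in_full >= self.max_in_full {
                return false;
            }
            self.num_in_full += 1;
        }
        true
    }
}
```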
---------
Co-authored-by: parity-processbot <>
* Prepare `sc-network` for `ProtocolController`/`NotificationService`
The upcoming notification protocol refactoring requires that protocols
are able to communicate with `sc-network` over unique and direct links.
This means that `sc-network` side of the link has to be created before
`sc-network` is initialized and that it is allowed to consume the object
as the receiver half of the link may not implement `Clone`.
Remove request-response and notification protocols from `NetworkConfiguration`
and create a new object that contains the configurations of these protocols
and which is consumable by `sc-network`. This is needed because, e.g.,
the receiver half of `NotificationService` is not clonable so `sc-network`
must consume it when it's initializing the protocols in `Notifications`.
A similar principle applies to `PeerStore`/`ProtocolController`: as per current
design, protocols are created before the network so `Protocol` cannot be
the one creating the `PeerStore` object. `FullNetworkConfiguration` will be
used to store the objects that `sc-network` will use to communicate with
protocols and it will also allow protocols to allocate handles so they
can directly communicate with `sc-network`.
* Fixes
* Update client/service/src/builder.rs
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Updates
* Doc updates + cargo-fmt
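
Why the configuration has to be consumed can be shown with a small sketch. `std::sync::mpsc::Receiver` is not `Clone`, much like the receiver half mentioned in the first commit above, so the network has to take ownership of the whole configuration object. `ProtocolLink`, `FullConfig` and `Network` are hypothetical simplifications, not the real `FullNetworkConfiguration`:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Network side of a protocol link. `Receiver` cannot be cloned.
struct ProtocolLink {
    name: String,
    receiver: Receiver<Vec<u8>>,
}

/// Collected before the network exists; protocols allocate handles from it.
#[derive(Default)]
struct FullConfig {
    links: Vec<ProtocolLink>,
}

impl FullConfig {
    /// Create the link now and return the protocol's handle to it.
    fn add_protocol(&mut self, name: &str) -> Sender<Vec<u8>> {
        let (sender, receiver) = channel();
        self.links.push(ProtocolLink { name: name.to_string(), receiver });
        sender
    }
}

struct Network {
    links: Vec<ProtocolLink>,
}

impl Network {
    /// Takes `config` by value: the receivers are moved, not copied.
    fn new(config: FullConfig) -> Self {
        Self { links: config.links }
    }

    /// Collect everything the named protocol has sent so far.
    fn drain(&self, name: &str) -> Vec<Vec<u8>> {
        self.links
            .iter()
            .filter(|link| link.name == name)
            .flat_map(|link| link.receiver.try_iter())
            .collect()
    }
}
```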
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Pin ci-linux image for rust 1.69
* Update ui tests for rust 1.69
* Address new rust 1.69 clippy lints
* `derive_hash_xor_eq` has been renamed to `derived_hash_with_manual_eq`
* The new `extra-unused-type-parameters` complains about a bunch of
callsites where extraneous type parameters are used for consistency
with other functions.
Previously, no warning was printed for `VerificationFailed` when no peer was assigned to
the error message. However, if an error happened while importing the state, no error was printed
either. This PR improves the situation by also printing an error in this case. Besides that, there are
some other cleanups.
* Evict inactive peers from `SyncingEngine`
If both halves of the block announce notification stream have been
inactive for 2 minutes, report the peer and disconnect it, allowing
`SyncingEngine` to free up a slot for some other peer that hopefully
is more active.
This needs to be done because the node may falsely believe it has open
connections to peers: the inbound substream can be closed without
any notification, and a closed outbound substream is noticed only when the node
attempts to write to it, which may not happen if the node has nothing to
send.
* zzz
* wip
* Evict peers only when timeout expires
* Use `debug!()`
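
The per-peer inactivity rule from the first commit above can be sketched as a pure function over timestamps. The 2 minute timeout is taken from the description; `PeerActivity` and `inactive_peers` are hypothetical names:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Both halves must be silent for this long before a peer is evicted.
const INACTIVITY_TIMEOUT: Duration = Duration::from_secs(2 * 60);

/// Last activity seen on each half of the block announce stream.
struct PeerActivity {
    last_sent: Instant,
    last_received: Instant,
}

/// Returns the sorted ids of peers whose outbound and inbound halves have
/// both been inactive for at least `INACTIVITY_TIMEOUT`.
fn inactive_peers(peers: &HashMap<u64, PeerActivity>, now: Instant) -> Vec<u64> {
    let mut inactive: Vec<u64> = peers
        .iter()
        .filter(|(_, activity)| {
            now.duration_since(activity.last_sent) >= INACTIVITY_TIMEOUT
                && now.duration_since(activity.last_received) >= INACTIVITY_TIMEOUT
        })
        .map(|(peer_id, _)| *peer_id)
        .collect();
    inactive.sort_unstable();
    inactive
}
```

A peer with recent traffic in only one direction stays connected, since only one of the two conditions holds.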
---------
Co-authored-by: parity-processbot <>
* Keep track of the pending response for each peer individually
When a peer disconnects or the syncing is restarted, remove the pending
response so syncing won't start sending duplicate requests/receive stale
responses from disconnected peers.
Before this commit pending responses were stored in `FuturesUnordered`
which made it hard to keep track of pending responses for each individual
peer.
* Update client/network/sync/src/lib.rs
Co-authored-by: Bastian Köcher <git@kchr.de>
* ".git/.scripts/commands/fmt/fmt.sh"
* Apply suggestions from code review
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
* Update client/network/sync/src/lib.rs
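
A sketch of the per-peer bookkeeping described in the first commit above, using a plain map keyed by peer instead of futures. `PendingRequest`, `PendingResponses` and the integer request ids are assumptions made for this illustration:

```rust
use std::collections::HashMap;

/// What is currently in flight for one peer (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
struct PendingRequest {
    request_id: u32,
}

/// At most one pending response per peer, addressable by peer id.
#[derive(Default)]
struct PendingResponses {
    by_peer: HashMap<u64, PendingRequest>,
}

impl PendingResponses {
    /// Returns `false` if the peer already has a request in flight,
    /// which prevents duplicate requests.
    fn insert(&mut self, peer_id: u64, request_id: u32) -> bool {
        if self.by_peer.contains_key(&peer_id) {
            return false;
        }
        self.by_peer.insert(peer_id, PendingRequest { request_id });
        true
    }

    /// Drop whatever was pending for a disconnected peer.
    fn on_peer_disconnected(&mut self, peer_id: u64) -> Option<PendingRequest> {
        self.by_peer.remove(&peer_id)
    }

    /// Drop everything when syncing is restarted.
    fn on_restart(&mut self) {
        self.by_peer.clear();
    }

    /// A response is only accepted if it matches the request still pending
    /// for that peer, so stale responses are ignored.
    fn accepts_response(&self, peer_id: u64, request_id: u32) -> bool {
        self.by_peer
            .get(&peer_id)
            .map_or(false, |pending| pending.request_id == request_id)
    }
}
```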
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: command-bot <>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
* Attempt to relieve pressure on `mpsc_network_worker`
`SyncingEngine` interacting with `NetworkWorker` can put a lot of strain
on the channel if the number of inbound connections is high. This is
because `SyncingEngine` is notified of each inbound substream which it
then can either accept or reject and this causes a lot of message
exchange on the already busy channel.
Use a direct channel pair between `Protocol` and `SyncingEngine`
to exchange notification events. It is a temporary change to alleviate
the problems caused by syncing being an independent protocol and the
fix will be removed once `NotificationService` is implemented.
* Apply review comments
* fixes
* trigger ci
* Fix tests
Verify that both peers have a connection now that the validation goes
through `SyncingEngine`. Depending on how the tasks are scheduled,
one of them might not have the peer registered in `SyncingEngine` at which
point the test won't make any progress because a block announcement received
from an unknown peer is discarded.
Move polling of `ChainSync` to the end of the function so that if a block
announcement causes a block request to be sent, that can be sent in the
same call to `SyncingEngine::poll()`.
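
The direct channel pair from the first commit in this list can be pictured with a standard library channel. The real code uses asynchronous channels, so this is only a sketch of the idea; `NotificationEvent`, `ProtocolSide`, `SyncSide` and `direct_link` are hypothetical names:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Events that would otherwise travel over the shared, busy channel.
#[derive(Debug, PartialEq, Eq)]
enum NotificationEvent {
    SubstreamOpened { peer_id: u64 },
    NotificationReceived { peer_id: u64, bytes: Vec<u8> },
}

/// Sending half, held by the component that observes substreams.
struct ProtocolSide {
    tx: Sender<NotificationEvent>,
}

/// Receiving half, held by the syncing component.
struct SyncSide {
    rx: Receiver<NotificationEvent>,
}

/// Create a dedicated link used by exactly these two components.
fn direct_link() -> (ProtocolSide, SyncSide) {
    let (tx, rx) = channel();
    (ProtocolSide { tx }, SyncSide { rx })
}

impl ProtocolSide {
    /// Returns `false` once the syncing side has gone away.
    fn report(&self, event: NotificationEvent) -> bool {
        self.tx.send(event).is_ok()
    }
}

impl SyncSide {
    /// Take every event that is currently queued, without blocking.
    fn drain(&self) -> Vec<NotificationEvent> {
        self.rx.try_iter().collect()
    }
}
```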
---------
Co-authored-by: parity-processbot <>
`Protocol` is not a reliable source of information about connected
peers: it doesn't have real-time information about the actual
connectivity state, as it's not responsible for accepting/rejecting
connections and gets that information with a delay from `SyncingEngine`.
* Move service tests to `client/network/tests`
These tests depend on `sc-network` and `sc-network-sync` so they should
live outside the crate.
* Move some configs from `sc-network-common` to `sc-network`
* Move `NetworkService` traits to `sc-network`
* Move request-responses to `sc-network`
* Remove more stuff
* Move rest of configs from `sc-network-common` to `sc-network`
* Remove more stuff
* Fix warnings
* Update client/network/src/request_responses.rs
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Fix cargo doc
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Move import queue out of `sc-network`
Add supplementary asynchronous API for the import queue which means
it can be run as an independent task and communicated with through
the `ImportQueueService`.
This commit removes block and justification imports from
`sc-network` and provides `ChainSync` with a handle to the import queue so
it can import blocks and justifications. Polling of the import queue is
moved completely out of `sc-network` and `sc_consensus::Link` is
implemented for `ChainSyncInterfaceHandled` so the import queue
can still influence the syncing process.
* Move stuff to SyncingEngine
* Move `ChainSync` instantiation to `SyncingEngine`
Some of the tests have to be rewritten
* Move peer hashmap to `SyncingEngine`
* Let `SyncingEngine` implement `ChainSyncInterface`
* Introduce `SyncStatusProvider`
* Move `sync_peer_(connected|disconnected)` to `SyncingEngine`
* Implement `SyncEventStream`
Remove `SyncConnected`/`SyncDisconnected` events from
`NetworkEventStream` and provide those events through
`ChainSyncInterface` instead.
Modify BEEFY/GRANDPA/transactions protocol and `NetworkGossip` to take
`SyncEventStream` object which they listen to for incoming sync peer
events.
* Introduce `ChainSyncInterface`
This interface provides a set of miscellaneous functions that other
subsystems can use to query, for example, the syncing status.
* Move event stream polling to `SyncingEngine`
Subscribe to `NetworkStreamEvent` and poll the incoming notifications
and substream events from `SyncingEngine`.
The code needs refactoring.
* Make `SyncingEngine` into an asynchronous runner
This commit removes the last hard dependency of syncing from
`sc-network` meaning the protocol now lives completely outside of
`sc-network`, ignoring the hardcoded peerset entry which will be
addressed in the future.
Code needs a lot of refactoring.
* Fix warnings
* Code refactoring
* Use `SyncingService` for BEEFY
* Use `SyncingService` for GRANDPA
* Remove call delegation from `NetworkService`
* Remove `ChainSyncService`
* Remove `ChainSync` service tests
They were written for the sole purpose of verifying that `NetworkWorker`
continues to function while the calls are being dispatched to
`ChainSync`.
* Refactor code
* Refactor code
* Update client/finality-grandpa/src/communication/tests.rs
Co-authored-by: Anton <anton.kalyaev@gmail.com>
* Fix warnings
* Apply review comments
* Fix docs
* Fix test
* cargo-fmt
* Update client/network/sync/src/engine.rs
Co-authored-by: Anton <anton.kalyaev@gmail.com>
* Update client/network/sync/src/engine.rs
Co-authored-by: Anton <anton.kalyaev@gmail.com>
* Add missing docs
* Refactor code
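
The idea from the first commit in this list, an import queue that runs as its own task and is driven through a service handle, can be sketched with a thread and two channels. The real implementation is asynchronous; `ImportQueueHandle`, `spawn_import_queue` and the use of block numbers as payload are assumptions of this sketch. The `link` sender plays the role of the feedback path through which the queue still influences syncing:

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Handle that syncing keeps in order to submit blocks for import.
struct ImportQueueHandle {
    tx: Sender<Vec<u64>>,
}

impl ImportQueueHandle {
    /// Queue a batch of blocks. Returns `false` if the queue task has stopped.
    fn import_blocks(&self, blocks: Vec<u64>) -> bool {
        self.tx.send(blocks).is_ok()
    }
}

/// Start the queue as an independent task. Every imported block number is
/// reported back over `link`. The task ends once all handles are dropped.
fn spawn_import_queue(link: Sender<u64>) -> (ImportQueueHandle, JoinHandle<()>) {
    let (tx, rx) = channel::<Vec<u64>>();
    let task = thread::spawn(move || {
        for blocks in rx {
            for number in blocks {
                // "Importing" is simulated by reporting the block back.
                if link.send(number).is_err() {
                    return;
                }
            }
        }
    });
    (ImportQueueHandle { tx }, task)
}
```

Networking never polls the queue in this arrangement; it only exists as a task plus a handle owned by syncing.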
---------
Co-authored-by: Anton <anton.kalyaev@gmail.com>
* improve error message
* removed unused argument
* docs: disconnect_peer_inner no longer accepts `ban`
* remove redundant trace message
```
sync: Too many full nodes, rejecting 12D3KooWSQAP2fh4qBkLXBW4mvCtbAiK8sqMnExWHHTZtVAxZ8bQ
sync: 12D3KooWSQAP2fh4qBkLXBW4mvCtbAiK8sqMnExWHHTZtVAxZ8bQ disconnected
```
is enough to understand that we've refused to connect to the given peer
* Revert "removed unused argument"
This reverts commit c87f755b1fd03494fb446b604fe25c2418da7c87.
* ban peer for 10s after disconnect
* do not accept incoming conns if peer was banned
* Revert "do not accept incoming conns if peer was banned"
This reverts commit 7e59d05975765f2547468e9dcfd1361516c41e06.
* Revert "ban peer for 10s after disconnect"
This reverts commit 3859201ced42a5b2d18c0600e29efd20962a7289.
* Revert "Revert "removed unused argument""
This reverts commit f1dc623646dc5a69e1822c35f428e90dffe34d95.
* format code
* Revert "remove redundant trace message"
This reverts commit a87e65f08553dbe69027e9aa4f7ca4779ccaa7f2.
* Change copyright year to 2023 from 2022
* Fix incorrect update of copyright year
* Remove years from copyright header
* Fix remaining files
* Fix typo in a header and remove update-copyright.sh
* `BlockId` removal: `BlockBuilderProvider::new_block_at`
It changes the arguments of `BlockBuilderProvider::new_block_at` from:
`BlockId<Block>` to: `Block::Hash`
* fmt
* fix
* more fixes
* Convert `NetworkWorker::poll()` into async `next_action()`
* Use `NetworkWorker::next_action` instead of `poll` in `sc-network-test`
* Revert "Use `NetworkWorker::next_action` instead of `poll` in `sc-network-test`"
This reverts commit 4b5d851ec864f78f9d083a18a618fbe117c896d2.
* Fix `sc-network-test` to poll `NetworkWorker::next_action`
* Fix `sc_network::service` tests to poll `NetworkWorker::next_action`
* Fix docs
* kick CI
* Factor out `next_worker_message()` & `next_swarm_event()`
* Error handling: replace `futures::pending!()` with `expect()`
* Simplify stream polling in `select!`
* Replace `NetworkWorker::next_action()` with `run()`
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <git@kchr.de>
* minor: comment
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <git@kchr.de>
* Print debug log when network future is shut down
* Evaluate `NetworkWorker::run()` future once before the loop
* Fix client code to match new `NetworkService` interfaces
* Make clippy happy
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <git@kchr.de>
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <git@kchr.de>
* Revert "Apply suggestions from code review"
This reverts commit 9fa646d0ed613e5f8623d3d37d1d59ec0a535850.
* Make `NetworkWorker::run()` consume `self`
* Terminate system RPC future if RPC rx stream has terminated.
* Rewrite with let-else
* Fix comments
* Get `best_seen_block` and call `on_block_finalized` via `ChainSync` instead of `NetworkService`
* rustfmt
* make clippy happy
* Tests: schedule wake if `next_action()` returned true
* minor: comment
* minor: fix `NetworkWorker` rustdoc
* minor: amend the rustdoc
* Fix bug that caused `on_demand_beefy_justification_sync` test to hang
* rustfmt
* Apply review suggestions
---------
Co-authored-by: Bastian Köcher <git@kchr.de>