This PR brings the fix
https://github.com/paritytech/substrate/pull/13396 to polkadot-sdk.
In the past, due to insufficient inbound slot count on polkadot &
kusama, this fix led to low peer count. The situation has improved since
then after changing the default ratio between `--in-peers` &
`--out-peers`.
Nevertheless, it's expected that the reported total peer count with this
fix is going to be lower than without it. This should be seen as the
correct number of working connections reported, as opposed to also
reporting already closed connections, and not as lower count of working
connections with peers.
This PR also removes the peer eviction mechanism, as closed substream
detection is a more granular way of detecting peers that stopped syncing
with us.
The burn-in has been already performed as part of testing these changes
in https://github.com/paritytech/polkadot-sdk/pull/3426.
---------
Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
This is a tiny PR to increase the time a peer remains banned.
A peer is banned when the reputation drops below a threshold.
With every second, the peer reputation is exponentially decayed towards
zero.
For the previous setup:
- decaying to zero from (i32::MAX or i32::MIN) would take 948 seconds
(15mins 48seconds)
- from i32::MIN to escaping the banned threshold would take 10 seconds
This means we are decaying reputation a bit too aggressive and
misbehaving peers can misbehave again in 10 seconds.
Another side effect of this is that we have encountered multiple
warnings caused by a few misbehaving peers.
In the new setup:
- decaying to zero from (i32::MAX or i32::MIN) would take 3544 seconds
(59 minutes)
- from i32::MIN to escaping the banned threshold would take ~69 seconds
This is a followup of:
- https://github.com/paritytech/polkadot-sdk/pull/4000.
### Testing Done
- Created a misbehaving client with
[subp2p-explorer](https://github.com/lexnv/subp2p-explorer), the client
is banned for approx 69seconds until it is allowed to connect again.
cc @paritytech/networking
---------
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
[litep2p](https://github.com/altonen/litep2p) is a libp2p-compatible P2P
networking library. It supports all of the features of `rust-libp2p`
that are currently being utilized by Polkadot SDK.
Compared to `rust-libp2p`, `litep2p` has a quite different architecture
which is why the new `litep2p` network backend is only able to use a
little of the existing code in `sc-network`. The design has been mainly
influenced by how we'd wish to structure our networking-related code in
Polkadot SDK: independent higher-levels protocols directly communicating
with the network over links that support bidirectional backpressure. A
good example would be `NotificationHandle`/`RequestResponseHandle`
abstractions which allow, e.g., `SyncingEngine` to directly communicate
with peers to announce/request blocks.
I've tried running `polkadot --network-backend litep2p` with a few
different peer configurations and there is a noticeable reduction in
networking CPU usage. For high load (`--out-peers 200`), networking CPU
usage goes down from ~110% to ~30% (80 pp) and for normal load
(`--out-peers 40`), the usage goes down from ~55% to ~18% (37 pp).
These should not be taken as final numbers because:
a) there are still some low-hanging optimization fruits, such as
enabling [receive window
auto-tuning](https://github.com/libp2p/rust-yamux/pull/176), integrating
`Peerset` more closely with `litep2p` or improving memory usage of the
WebSocket transport
b) fixing bugs/instabilities that incorrectly cause `litep2p` to do less
work will increase the networking CPU usage
c) verification in a more diverse set of tests/conditions is needed
Nevertheless, these numbers should give an early estimate for CPU usage
of the new networking backend.
This PR consists of three separate changes:
* introduce a generic `PeerId` (wrapper around `Multihash`) so that we
don't have use `NetworkService::PeerId` in every part of the code that
uses a `PeerId`
* introduce `NetworkBackend` trait, implement it for the libp2p network
stack and make Polkadot SDK generic over `NetworkBackend`
* implement `NetworkBackend` for litep2p
The new library should be considered experimental which is why
`rust-libp2p` will remain as the default option for the time being. This
PR currently depends on the master branch of `litep2p` but I'll cut a
new release for the library once all review comments have been
addresses.
---------
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: Alexandru Vasile <alexandru.vasile@parity.io>
Currently, all protocols use the same metric name for
`mpsc-notification-to-protocol` this is bad because we can't actually
tell which protocol might cause problems.
This patch proposes we derive the name of the metric from the protocol
name, so that we have separate metrics for each protocol and properly
detect which one is having problem processing its messages.
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
**Update:** Pushed additional changes based on the review comments.
**This pull request fixes various spelling mistakes in this
repository.**
Most of the changes are contained in the first **3** commits:
- `Fix spelling mistakes in comments and docs`
- `Fix spelling mistakes in test names`
- `Fix spelling mistakes in error messages, panic messages, logs and
tracing`
Other source code spelling mistakes are separated into individual
commits for easier reviewing:
- `Fix the spelling of 'authority'`
- `Fix the spelling of 'REASONABLE_HEADERS_IN_JUSTIFICATION_ANCESTRY'`
- `Fix the spelling of 'prev_enqueud_messages'`
- `Fix the spelling of 'endpoint'`
- `Fix the spelling of 'children'`
- `Fix the spelling of 'PenpalSiblingSovereignAccount'`
- `Fix the spelling of 'PenpalSudoAccount'`
- `Fix the spelling of 'insufficient'`
- `Fix the spelling of 'PalletXcmExtrinsicsBenchmark'`
- `Fix the spelling of 'subtracted'`
- `Fix the spelling of 'CandidatePendingAvailability'`
- `Fix the spelling of 'exclusive'`
- `Fix the spelling of 'until'`
- `Fix the spelling of 'discriminator'`
- `Fix the spelling of 'nonexistent'`
- `Fix the spelling of 'subsystem'`
- `Fix the spelling of 'indices'`
- `Fix the spelling of 'committed'`
- `Fix the spelling of 'topology'`
- `Fix the spelling of 'response'`
- `Fix the spelling of 'beneficiary'`
- `Fix the spelling of 'formatted'`
- `Fix the spelling of 'UNKNOWN_PROOF_REQUEST'`
- `Fix the spelling of 'succeeded'`
- `Fix the spelling of 'reopened'`
- `Fix the spelling of 'proposer'`
- `Fix the spelling of 'InstantiationNonce'`
- `Fix the spelling of 'depositor'`
- `Fix the spelling of 'expiration'`
- `Fix the spelling of 'phantom'`
- `Fix the spelling of 'AggregatedKeyValue'`
- `Fix the spelling of 'randomness'`
- `Fix the spelling of 'defendant'`
- `Fix the spelling of 'AquaticMammal'`
- `Fix the spelling of 'transactions'`
- `Fix the spelling of 'PassingTracingSubscriber'`
- `Fix the spelling of 'TxSignaturePayload'`
- `Fix the spelling of 'versioning'`
- `Fix the spelling of 'descendant'`
- `Fix the spelling of 'overridden'`
- `Fix the spelling of 'network'`
Let me know if this structure is adequate.
**Note:** The usage of the words `Merkle`, `Merkelize`, `Merklization`,
`Merkelization`, `Merkleization`, is somewhat inconsistent but I left it
as it is.
~~**Note:** In some places the term `Receival` is used to refer to
message reception, IMO `Reception` is the correct word here, but I left
it as it is.~~
~~**Note:** In some places the term `Overlayed` is used instead of the
more acceptable version `Overlaid` but I also left it as it is.~~
~~**Note:** In some places the term `Applyable` is used instead of the
correct version `Applicable` but I also left it as it is.~~
**Note:** Some usage of British vs American english e.g. `judgement` vs
`judgment`, `initialise` vs `initialize`, `optimise` vs `optimize` etc.
are both present in different places, but I suppose that's
understandable given the number of contributors.
~~**Note:** There is a spelling mistake in `.github/CODEOWNERS` but it
triggers errors in CI when I make changes to it, so I left it as it
is.~~
Previously, it was only possible to retry the same request on a
different protocol name that had the exact same binary payloads.
Introduce a way of trying a different request on a different protocol if
the first one fails with Unsupported protocol.
This helps with adding new req-response versions in polkadot while
preserving compatibility with unupgraded nodes.
The way req-response protocols were bumped previously was that they were
bundled with some other notifications protocol upgrade, like for async
backing (but that is more complicated, especially if the feature does
not require any changes to a notifications protocol). Will be needed for
implementing https://github.com/polkadot-fellows/RFCs/pull/47
TODO:
- [x] add tests
- [x] add guidance docs in polkadot about req-response protocol
versioning
# Description
Trivial change that resolves
https://github.com/paritytech/polkadot-sdk/issues/2185.
Since there was a mix of `who` and `peer_id` argument names nearby I
changed them all to `peer_id`.
# Checklist
- [x] My PR includes a detailed description as outlined in the
"Description" section above
- [x] My PR follows the [labeling requirements](CONTRIBUTING.md#Process)
of this project (at minimum one label for `T`
required)
- [x] I have made corresponding changes to the documentation (if
applicable)
- [ ] I have added tests that prove my fix is effective or that my
feature works (if applicable)
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
This commit introduces a new concept called `NotificationService` which
allows Polkadot protocols to communicate with the underlying
notification protocol implementation directly, without routing events
through `NetworkWorker`. This implies that each protocol has its own
service which it uses to communicate with remote peers and that each
`NotificationService` is unique with respect to the underlying
notification protocol, meaning `NotificationService` for the transaction
protocol can only be used to send and receive transaction-related
notifications.
The `NotificationService` concept introduces two additional benefits:
* allow protocols to start using custom handshakes
* allow protocols to accept/reject inbound peers
Previously the validation of inbound connections was solely the
responsibility of `ProtocolController`. This caused issues with light
peers and `SyncingEngine` as `ProtocolController` would accept more
peers than `SyncingEngine` could accept which caused peers to have
differing views of their own states. `SyncingEngine` would reject excess
peers but these rejections were not properly communicated to those peers
causing them to assume that they were accepted.
With `NotificationService`, the local handshake is not sent to remote
peer if peer is rejected which allows it to detect that it was rejected.
This commit also deprecates the use of `NetworkEventStream` for all
notification-related events and going forward only DHT events are
provided through `NetworkEventStream`. If protocols wish to follow each
other's events, they must introduce additional abtractions, as is done
for GRANDPA and transactions protocols by following the syncing protocol
through `SyncEventStream`.
Fixes https://github.com/paritytech/polkadot-sdk/issues/512
Fixes https://github.com/paritytech/polkadot-sdk/issues/514
Fixes https://github.com/paritytech/polkadot-sdk/issues/515
Fixes https://github.com/paritytech/polkadot-sdk/issues/554
Fixes https://github.com/paritytech/polkadot-sdk/issues/556
---
These changes are transferred from
https://github.com/paritytech/substrate/pull/14197 but there are no
functional changes compared to that PR
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
This PR moves syncing-related code from `sc-network-common` to
`sc-network-sync`.
Unfortunately, some parts are tightly integrated with networking, so
they were left in `sc-network-common` for now:
1. `SyncMode` in `common/src/sync.rs` (used in `NetworkConfiguration`).
2. `BlockAnnouncesHandshake`, `BlockRequest`, `BlockResponse`, etc. in
`common/src/sync/message.rs` (used in `src/protocol.rs` and
`src/protocol/message.rs`).
More substantial refactoring is needed to decouple syncing and
networking completely, including getting rid of the hardcoded sync
protocol.
## Release notes
Move syncing-related code from `sc-network-common` to `sc-network-sync`.
Delete `ChainSync` trait as it's never used (the only implementation is
accessed directly from `SyncingEngine` and exposes a lot of public
methods that are not part of the trait). Some new trait(s) for syncing
will likely be introduced as part of Sync 2.0 refactoring to represent
syncing strategies.
See #1345, <https://github.com/paritytech/substrate/pull/14207>.
This adds all the necessary mixnet components, and puts them together in
the "kitchen-sink" node/runtime. The components added are:
- A pallet (`frame/mixnet`). This is responsible for determining the
current mixnet session and phase, and the mixnodes to use in each
session. It provides a function that validators can call to register a
mixnode for the next session. The logic of this pallet is very similar
to that of the `im-online` pallet.
- A service (`client/mixnet`). This implements the core mixnet logic,
building on the `mixnet` crate. The service communicates with other
nodes using notifications sent over the "mixnet" protocol.
- An RPC interface. This currently only supports sending transactions
over the mixnet.
---------
Co-authored-by: David Emett <dave@sp4m.net>
Co-authored-by: Javier Viola <javier@parity.io>
* substrate: peer_store: log error on disconnecting because of reputation
Disconnecting and banning a peer because of negative reputation is
usually an indicative of one of two things:
1. We've got a bug that forces disconnects.
2. We've got malicious peers that try to attack us.
We both cases I don't think we should hide this behind a trace log
and we should log errors, so that things are easy to notice and
debug/mitigated.
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
* Move from error to warn
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
* update libp2p to 0.52.0
* proto name now must implement `AsRef<str>`
* update libp2p version everywhere
* ToSwarm, FromBehaviour, ToBehaviour
also LocalProtocolsChange and RemoteProtocolsChange
* new NetworkBehaviour invariants
* replace `Vec<u8>` with `StreamProtocol`
* rename ConnectionHandlerEvent::Custom to NotifyBehaviour
* remove DialError & ListenError invariants
also fix pending_events
* use connection_limits::Behaviour
See https://github.com/libp2p/rust-libp2p/pull/3885
* impl `void::Void` for `BehaviourOut`
also use `Behaviour::with_codec`
* KademliaHandler no longer public
* fix StreamProtocol construction
* update libp2p-identify to 0.2.0
* remove non-existing methods from PollParameters
rename ConnectionHandlerUpgrErr to StreamUpgradeError
* `P2p` now contains `PeerId`, not `Multihash`
* use multihash-codetable crate
* update Cargo.lock
* reformat text
* comment out tests for now
* remove `.into()` from P2p
* confirm observed addr manually
See https://github.com/libp2p/rust-libp2p/blob/master/protocols/identify/CHANGELOG.md#0430
* remove SwarmEvent::Banned
since we're not using `ban_peer_id`, this can be safely removed.
we may want to introduce `libp2p::allow_block_list` module in the future.
* fix imports
* replace `libp2p` with smaller deps in network-gossip
* bring back tests
* finish rewriting tests
* uncomment handler tests
* Revert "uncomment handler tests"
This reverts commit 720a06815887f4e10767c62b58864a7ec3a48e50.
* add a fixme
* update Cargo.lock
* remove extra From
* make void uninhabited
* fix discovery test
* use autonat protocols
confirming external addresses manually is unsafe in open networks
* fix SyncNotificationsClogged invariant
* only set server mode manually in tests
doubt that we need to set it on node since we're adding public addresses
* address @dmitry-markin comments
* remove autonat
* removed unused var
* fix EOL
* update smallvec and sha2
in attempt to compile polkadot
* bump k256
in attempt to build cumulus
---------
Co-authored-by: parity-processbot <>
* Accept only `--in-peers` many inbound full nodes in `SyncingEngine`
Due to full and light nodes being stored in the same set, it's possible
that `SyncingEngine` accepts more than `--in-peers` many inbound full
nodes which leaves some of its outbound slots unoccupied.
`ProtocolController` still tries to occupy these slots by opening
outbound substreams. As these substreams are accepted by the remote peer,
the connection is relayed to `SyncingEngine` which rejects the node
because it's already full. This in turn results in the substream being
inactive and the peer getting evicted.
Fixing this properly would require relocating the light peer slot
allocation away from `ProtocolController` or alternatively moving entire
the substream validation there, both of which are epic refactorings and
not necessarily in line with other goals. As a temporary measure, verify
in `SyncingEngine` that it doesn't accept more than the specified amount
of inbound full peers.
* Fix tests
* Apply review comments
* frame-benchmarking-cli: Remove native dispatch requirement
No need for this, we can just use the `WasmExecutor` directly.
* Fixes
* Pass benchmarking host functions
* Ensure we can pass custom host functions
This improves the reporting of invalid boot nodes. First, it will only report each boot node once
as invalid and not every time we try to connect to the node. Second, the node will only report for
addresses that we added as startup and not for addresses of the boot node that the node learned from
other nodes.
Closes: https://github.com/paritytech/substrate/issues/13584
Closes: https://github.com/paritytech/polkadot/issues/7385
* expose kademlia replication factor through node CLI
* set default CLI flag value for kademlia_replication_factor
Co-authored-by: Bastian Köcher <git@kchr.de>
* wrap CLI value as Option
* make kademlia replication factor non-optional
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
* client/network: upgrade to libp2p 0.51.0
* make discovery.rs compile
* make peer_info.rs compile
* changes to notifications and request-response proto
* make service.rs compile
* towards making request_responses.rs compile
* make request_responses.rs compile
* make request_responses.rs compile
* fix notifications/behaviour.rs tests
* fix warnings
* remove old code
* allow deprecated code (temporary)
* upgrade to libp2p 0.51.1
* add TODO for behaviour tests
* return empty vec if peer_id is absent
https://github.com/paritytech/substrate/pull/13587#discussion_r1141695167
fyi: I don't really know what the old behaviour was.
* update comment to reflect new defaults
Closes#13338
* Revert "update comment to reflect new defaults"
This reverts commit 7a981abd69308e9d522ec94905f181439a1b1dba.
* remove config.rs (from wrong merge)
* upgrade to libp2p 0.51.2
* fix formatting
* use handle_pending_outbound_connection in networt_state RPC
* update deps
* use re-exports when we use other libp2p packages
* Apply suggestions from code review
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* format code
* handle potential errors in network_state RPC
* only update libp2p crate
* update libp2p-core
* fix docs
* use libp2p-identity instead of libp2p
where it's possible. libp2p-identity is much smaller, hence makes sense
to use it instead of larger libp2p crate.
* Update client/network/src/discovery.rs
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
* update Cargo.lock
* add comment for per_connection_event_buffer_size
current value is somewhat arbitrary and needs to be tweaked depending on
memory usage and network worker sleep stats.
* fix link format
* update Cargo.lock
* upgrade to libp2p 0.51.3
* deprecate mplex
* Revert "deprecate mplex"
This reverts commit 9e25820e706e464a0e962a8604861fcb2a7641eb.
* Revert "upgrade to libp2p 0.51.3"
This reverts commit 6544dd4138e2f89517bd7c7281fc78a638ec7040.
* use new libp2p version in `statement` crate
* pin version temporarily
* libp2p 0.51.3
* deprecate mplex
* deprecate legacy noise handshake
* fix build error
* update libp2p-identity
* enable libp2p-identity:ed25519 feature in sc-consensus
* enable ed25519 for peerset as well
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com>
Co-authored-by: parity-processbot <>
* Prepare `sc-network` for `ProtocolController`/`NotificationService`
The upcoming notification protocol refactoring requires that protocols
are able to communicate with `sc-network` over unique and direct links.
This means that `sc-network` side of the link has to be created before
`sc-network` is initialized and that it is allowed to consume the object
as the receiver half of the link may not implement `Clone`.
Remove request-response and notification protocols from `NetworkConfiguration`
and create a new object that contains the configurations of these protocols
and which is consumable by `sc-network`. This is needed needed because, e.g.,
the receiver half of `NotificationService` is not clonable so `sc-network`
must consume it when it's initializing the protocols in `Notifications`.
Similar principe applies to `PeerStore`/`ProtocolController`: as per current
design, protocols are created before the network so `Protocol` cannot be
the one creating the `PeerStore` object. `FullNetworkConfiguration` will be
used to store the objects that `sc-network` will use to communicate with
protocols and it will also allow protocols to allocate handles so they
can directly communicate with `sc-network`.
* Fixes
* Update client/service/src/builder.rs
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Updates
* Doc updates + cargo-fmt
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Poll the substream validation before polling `Notifications`
In tests, it can happen that `Notifications` doesn't produce any events
which causes `poll()` to return `Poll::Pending` and the substream
validation futures won't get polled.
Poll the futures before calling `Notifications` so results for substream
validations are received even if `Notifications` is not producing any
events.
* Remove `pending_messages`
* Remove unused import
* Attempt to relieve pressure on `mpsc_network_worker`
`SyncingEngine` interacting with `NetworkWorker` can put a lot of strain
on the channel if the number of inbound connections is high. This is
because `SyncingEngine` is notified of each inbound substream which it
then can either accept or reject and this causes a lot of message
exchange on the already busy channel.
Use a direct channel pair between `Protocol` and `SyncingEngine`
to exchange notification events. It is a temporary change to alleviate
the problems caused by syncing being an independent protocol and the
fix will be removed once `NotificationService` is implemented.
* Apply review comments
* fixes
* trigger ci
* Fix tests
Verify that both peers have a connection now that the validation goes
through `SyncingEngine`. Depending on how the tasks are scheduled,
one of them might not have the peer registered in `SyncingEngine` at which
point the test won't make any progress because block announcement received
from an unknown peer is discarded.
Move polling of `ChainSync` at the end of the function so that if a block
announcement causes a block request to be sent, that can be sent in the
same call to `SyncingEngine::poll()`.
---------
Co-authored-by: parity-processbot <>
* Move service tests to `client/network/tests`
These tests depend on `sc-network` and `sc-network-sync` so they should
live outside the crate.
* Move some configs from `sc-network-common` to `sc-network`
* Move `NetworkService` traits to `sc-network`
* Move request-responses to `sc-network`
* Remove more stuff
* Remove rest of configs from `sc-network-common` to `sc-network`
* Remove more stuff
* Fix warnings
* Update client/network/src/request_responses.rs
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Fix cargo doc
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>