The authority-discovery mechanism has implemented a few exponential
timers for:
- publishing the authority records
- goes from 2 seconds (when freshly booted) to 1 hour if the node is
long-running
- set to 1 hour after successfully publishing the authority record
- discovering other authority records
- goes from 2 seconds (when freshly booted) to 10 minutes if the node is
long-running
This PR resets the exponential publishing and discovery interval to
defaults ensuring that long-running nodes:
- will retry publishing the authority records as aggressively as freshly
booted nodes
- Currently, if a long-running node fails to publish the DHT record when
the keys change (ie DhtEvent::ValuePutFailed), it will only retry after
1 hour
- will rediscover other authorities faster (since there is a chance that
other authority keys changed)
The subp2p-explorer has difficulties discovering the authorities when
the authority set changes in the first few hours. This might be entirely
due to the recursive nature of the DHT and the needed time to propagate
the records. However, there is a small chance that the authority
publishing failed and is only retried in 1h.
Let me know if this makes sense 🙏
cc @paritytech/networking
---------
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
* Change copyright year to 2023 from 2022
* Fix incorrect update of copyright year
* Remove years from copy right header
* Fix remaining files
* Fix typo in a header and remove update-copyright.sh
* Run cargo fmt on the whole code base
* Second run
* Add CI check
* Fix compilation
* More unnecessary braces
* Handle weights
* Use --all
* Use correct attributes...
* Fix UI tests
* AHHHHHHHHH
* 🤦
* Docs
* Fix compilation
* 🤷
* Please stop
* 🤦 x 2
* More
* make rustfmt.toml consistent with polkadot
Co-authored-by: André Silva <andrerfosilva@gmail.com>
* client/authority-discovery: Publish and query on exponential interval
When a node starts up publishing and querying might fail due to various
reasons, for example due to being not yet fully bootstrapped on the DHT.
Thus one should retry rather sooner than later. On the other hand, a
long running node is likely well connected and thus timely retries are
not needed. For this reasoning use an exponentially increasing interval
for `publish_interval`, `query_interval` and
`priority_group_set_interval` instead of a constant interval.
* client/authority-discovery/src/interval.rs: Add license header
* .maintain/gitlab: Ensure adder collator tests are run on CI