Original PR https://github.com/paritytech/substrate/pull/14746 --- ## Fixing stall ### Introduction I experienced an apparent stall downloading state from `https://rococo-try-runtime-node.parity-chains.parity.io:443` which was having networking difficulties only responding to my JSONRPC requests with 50-200KB/s of bandwidth. This PR fixes the issue causing the stall, and generally improves performance remote-ext when it downloads state by greatly reducing the chances of a timeout occuring. ### Description Introduces a new `REQUEST_DURATION_TARGET` constant and modifies `get_storage_data_dynamic_batch_size` to - Increase or decrease the batch size of the next request depending on whether the elapsed time of the last request was gt or lt the target - Reset the batch size to 1 if the request times out This fixes an issue on slow connections that can otherwise cause multiple timeouts and a stalled download when: 1. The batch size increases rapidly as remote-ext downloads keys with small associated storage values 2. remote-ext tries to process a large series of subsequent keys all with extremely large associated storage values (Rococo has a series of keys 1-5MB large) 3. The huge storage values download for 5 minutes until the request times out 4. The partially downloaded keys are thrown out and remote-ext tries again with a smaller batch size, but the batch size is still far too large and takes 5 minutes to be reduced again 5. The download will be essentially stalled for many hours while the above step cycles After this PR, the request size will - Not grow as large to begin with, as it is regulated downwards as the request duration exceeds the target - Drop immediately to 1 if the request times out. A timeout indicates the keys next in line to download have extremely large storage values compared to previously downloaded keys, and we need to reset the batch size to figure out what our new ideal batch size is. By not resetting down to 1, we risk the next request timing out again. ## Reducing memory As suggested by @bkchr, I adjusted `get_storage_data_dynamic_batch_size` from being recursive to a loop which allows removing a bunch of clones that were chewing through a lot of memory. I noticed actually it was using up to 50GB swap previously when downloading Polkadot keys on a slow connection, because it needed to recurse and clone a lot. After this change it uses only ~1.5GB memory.
Substrate
Substrate is a next-generation framework for blockchain innovation 🚀.
Getting Started
Head to docs.substrate.io and follow the installation
instructions. Then try out one of the tutorials. Refer to the Docker
instructions to quickly run Substrate, Substrate Node Template, Subkey, or to build a chain spec.
Community & Support
Join the highly active and supportive community on the Substrate Stack Exchange to ask questions about use and problems you run into using this software. Please do report bugs and issues here for anything you suspect requires action in the source.
Contributions & Code of Conduct
Please follow the contributions guidelines as outlined in
docs/CONTRIBUTING.md. In all
communications and contributions, this project follows the Contributor Covenant Code of
Conduct.
Security
The security policy and procedures can be found in
docs/SECURITY.md.
License
- Substrate Primitives (
sp-*), Frame (frame-*) and the pallets (pallets-*), binaries (/bin) and all other utilities are licensed under Apache 2.0. - Substrate Client (/client/*/sc-*) is licensed under GPL v3.0 with a classpath linking exception.
The reason for the split-licensing is to ensure that for the vast majority of teams using Substrate to create feature-chains, then all changes can be made entirely in Apache2-licensed code, allowing teams full freedom over what and how they release and giving licensing clarity to commercial teams.
In the interests of the community, we require any deeper improvements made to Substrate's core logic (e.g. Substrate's internal consensus, crypto or database code) to be contributed back so everyone can benefit.
