feat: initialize Kurdistan SDK - independent fork of Polkadot SDK

2025-12-13 15:44:15 +03:00
commit e4778b4576
6838 changed files with 1847450 additions and 0 deletions
@@ -0,0 +1,315 @@
+# Subsystem benchmark client
+
+Run teyrchain consensus stress and performance tests on your development machine or in CI.
+
+## Motivation
+
+The teyrchain consensus node implementation spans across many modules which we call subsystems. Each subsystem is
+responsible for a small part of logic of the teyrchain consensus pipeline, but in general the most load and
+performance issues are localized in just a few core subsystems like `availability-recovery`, `approval-voting` or
+`dispute-coordinator`. In the absence of such a tool, we would run large test nets to load/stress test these parts of
+the system. Setting up and making sense of the amount of data produced by such a large test is very expensive, hard
+to orchestrate and is a huge development time sink.
+
+This tool aims to solve the problem by making it easy to:
+
+- set up and run core subsystem load tests locally on your development machine
+- iterate and conclude faster when benchmarking new optimizations or comparing implementations
+- automate and keep track of performance regressions in CI runs
+- simulate various networking topologies, bandwidth and connectivity issues
+
+## Test environment setup
+
+`cargo build --profile=testnet --bin subsystem-bench -p pezkuwi-subsystem-bench`
+
+The output binary will be placed in `target/testnet/subsystem-bench`.
+
+### Test metrics
+
+Subsystem, CPU usage and network metrics are exposed via a prometheus endpoint during the test execution.
+A small subset of these collected metrics are displayed in the CLI, but for an in depth analysis of the test results,
+a local Grafana/Prometheus stack is needed.
+
+### Run Prometheus, Pyroscope and Graphana in Docker
+
+If docker is not usable, then follow the next sections to manually install Prometheus, Pyroscope and Graphana
+on your machine.
+
+```bash
+cd pezkuwi/node/subsystem-bench/docker
+docker compose up
+```
+
+### Install Prometheus
+
+Please follow the [official installation guide](https://prometheus.io/docs/prometheus/latest/installation/) for your
+platform/OS.
+
+After successfully installing and starting up Prometheus, we need to alter it's configuration such that it
+will scrape the benchmark prometheus endpoint `127.0.0.1:9999`. Please check the prometheus official documentation
+regarding the location of `prometheus.yml`. On MacOS for example the full path `/opt/homebrew/etc/prometheus.yml`
+
+prometheus.yml:
+
+```
+global:
+  scrape_interval: 5s
+
+scrape_configs:
+  - job_name: "prometheus"
+    static_configs:
+    - targets: ["localhost:9090"]
+  - job_name: "subsystem-bench"
+    scrape_interval: 0s500ms
+    static_configs:
+    - targets: ['localhost:9999']
+```
+
+To complete this step restart Prometheus server such that it picks up the new configuration.
+
+### Install Pyroscope
+
+To collect CPU profiling data, you must be running the Pyroscope server.
+Follow the [installation guide](https://grafana.com/docs/pyroscope/latest/get-started/)
+relevant to your operating system.
+
+### Install Grafana
+
+Follow the [installation guide](https://grafana.com/docs/grafana/latest/setup-grafana/installation/) relevant
+to your operating system.
+
+### Setup Grafana
+
+Once you have the installation up and running, configure the local Prometheus and Pyroscope (if needed)
+as data sources by following these guides:
+
+- [Prometheus](https://grafana.com/docs/grafana/latest/datasources/prometheus/configure-prometheus-data-source/)
+- [Pyroscope](https://grafana.com/docs/grafana/latest/datasources/grafana-pyroscope/)
+
+If you are running the servers in Docker, use the following URLs:
+
+- Prometheus `http://prometheus:9090/`
+- Pyroscope `http://pyroscope:4040/`
+
+#### Import dashboards
+
+Follow [this guide](https://grafana.com/docs/grafana/latest/dashboards/manage-dashboards/#export-and-import-dashboards)
+to import the dashboards from the repository `grafana` folder.
+
+### Standard test options
+
+```
+$ subsystem-bench --help
+Usage: subsystem-bench [OPTIONS] <PATH>
+
+Arguments:
+  <PATH>  Path to the test sequence configuration file
+
+Options:
+      --profile                                        Enable CPU Profiling with Pyroscope
+      --pyroscope-url <PYROSCOPE_URL>                  Pyroscope Server URL [default: http://localhost:4040]
+      --pyroscope-sample-rate <PYROSCOPE_SAMPLE_RATE>  Pyroscope Sample Rate [default: 113]
+      --cache-misses                                   Enable Cache Misses Profiling with Valgrind. Linux only, Valgrind must be in the PATH
+  -h, --help                                           Print help
+```
+
+## How to run a test
+
+To run a test, you need to use a path to a test objective:
+
+```
+target/testnet/subsystem-bench pezkuwi/node/subsystem-bench/examples/availability_read.yaml
+```
+
+Note: test objectives may be wrapped up into a test sequence.
+It is typically used to run a suite of tests like in this [example](examples/availability_read.yaml).
+
+### Understanding the test configuration
+
+A single test configuration `TestConfiguration` struct applies to a single run of a certain test objective.
+
+The configuration describes the following important parameters that influence the test duration and resource
+usage:
+
+- how many validators are on the emulated network (`n_validators`)
+- how many cores per block the subsystem will have to do work on (`n_cores`)
+- for how many blocks the test should run (`num_blocks`)
+
+From the perspective of the subsystem under test, this means that it will receive an `ActiveLeavesUpdate` signal
+followed by an arbitrary amount of messages. This process repeats itself for `num_blocks`. The messages are generally
+test payloads pre-generated before the test run, or constructed on pre-generated payloads. For example the
+`AvailabilityRecoveryMessage::RecoverAvailableData` message includes a `CandidateReceipt` which is generated before
+the test is started.
+
+### Example run
+
+Let's run an availability read test which will recover availability for 200 cores with max PoV size on a 1000
+node validator network.
+
+<!-- markdownlint-disable line-length -->
+
+```
+target/testnet/subsystem-bench pezkuwi/node/subsystem-bench/examples/availability_write.yaml
+[2024-02-19T14:10:32.981Z INFO  subsystem_bench] Sequence contains 1 step(s)
+[2024-02-19T14:10:32.981Z INFO  subsystem-bench::cli] Step 1/1
+[2024-02-19T14:10:32.981Z INFO  subsystem-bench::cli] [objective = DataAvailabilityWrite] n_validators = 1000, n_cores = 200, pov_size = 5120 - 5120, connectivity = 75, latency = Some(PeerLatency { mean_latency_ms: 30, std_dev: 2.0 })
+[2024-02-19T14:10:32.982Z INFO  subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
+[2024-02-19T14:10:33.106Z INFO  subsystem-bench::availability] Created test environment.
+[2024-02-19T14:10:33.106Z INFO  subsystem-bench::availability] Pre-generating 600 candidates.
+[2024-02-19T14:10:34.096Z INFO  subsystem-bench::network] Initializing emulation for a 1000 peer network.
+[2024-02-19T14:10:34.096Z INFO  subsystem-bench::network] connectivity 75%, latency Some(PeerLatency { mean_latency_ms: 30, std_dev: 2.0 })
+[2024-02-19T14:10:34.098Z INFO  subsystem-bench::network] Network created, connected validator count 749
+[2024-02-19T14:10:34.099Z INFO  subsystem-bench::availability] Seeding availability store with candidates ...
+[2024-02-19T14:10:34.100Z INFO  substrate_prometheus_endpoint] 〽️ Prometheus exporter started at 127.0.0.1:9999
+[2024-02-19T14:10:34.387Z INFO  subsystem-bench::availability] Done
+[2024-02-19T14:10:34.387Z INFO  subsystem-bench::availability] Current block #1
+[2024-02-19T14:10:34.389Z INFO  subsystem-bench::availability] Waiting for all emulated peers to receive their chunk from us ...
+[2024-02-19T14:10:34.625Z INFO  subsystem-bench::availability] All chunks received in 237ms
+[2024-02-19T14:10:34.626Z INFO  pezkuwi_subsystem_bench::availability] Waiting for 749 bitfields to be received and processed
+[2024-02-19T14:10:35.710Z INFO  subsystem-bench::availability] All bitfields processed
+[2024-02-19T14:10:35.710Z INFO  subsystem-bench::availability] All work for block completed in 1322ms
+[2024-02-19T14:10:35.710Z INFO  subsystem-bench::availability] Current block #2
+[2024-02-19T14:10:35.712Z INFO  subsystem-bench::availability] Waiting for all emulated peers to receive their chunk from us ...
+[2024-02-19T14:10:35.947Z INFO  subsystem-bench::availability] All chunks received in 236ms
+[2024-02-19T14:10:35.947Z INFO  pezkuwi_subsystem_bench::availability] Waiting for 749 bitfields to be received and processed
+[2024-02-19T14:10:37.038Z INFO  subsystem-bench::availability] All bitfields processed
+[2024-02-19T14:10:37.038Z INFO  subsystem-bench::availability] All work for block completed in 1328ms
+[2024-02-19T14:10:37.039Z INFO  subsystem-bench::availability] Current block #3
+[2024-02-19T14:10:37.040Z INFO  subsystem-bench::availability] Waiting for all emulated peers to receive their chunk from us ...
+[2024-02-19T14:10:37.276Z INFO  subsystem-bench::availability] All chunks received in 237ms
+[2024-02-19T14:10:37.276Z INFO  pezkuwi_subsystem_bench::availability] Waiting for 749 bitfields to be received and processed
+[2024-02-19T14:10:38.362Z INFO  subsystem-bench::availability] All bitfields processed
+[2024-02-19T14:10:38.362Z INFO  subsystem-bench::availability] All work for block completed in 1323ms
+[2024-02-19T14:10:38.362Z INFO  subsystem-bench::availability] All blocks processed in 3974ms
+[2024-02-19T14:10:38.362Z INFO  subsystem-bench::availability] Avg block time: 1324 ms
+[2024-02-19T14:10:38.362Z INFO  teyrchain::availability-store] received `Conclude` signal, exiting
+[2024-02-19T14:10:38.362Z INFO  teyrchain::bitfield-distribution] Conclude
+[2024-02-19T14:10:38.362Z INFO  subsystem-bench::network] Downlink channel closed, network interface task exiting
+
+pezkuwi/node/subsystem-bench/examples/availability_write.yaml #1 DataAvailabilityWrite
+
+Network usage, KiB                     total   per block
+Received from peers                12922.000    4307.333
+Sent to peers                      47705.000   15901.667
+
+CPU usage, seconds                     total   per block
+availability-distribution              0.045       0.015
+bitfield-distribution                  0.104       0.035
+availability-store                     0.304       0.101
+Test environment                       3.213       1.071
+```
+
+`Block time` in the current context has a different meaning. It measures the amount of time it
+took the subsystem to finish processing all of the messages sent in the context of the current test block.
+
+### Test logs
+
+You can select log target, subtarget and verbosity just like with PezkuwiChain node CLI, simply setting
+`RUST_LOOG="teyrchain=debug"` turns on debug logs for all teyrchain consensus subsystems in the test.
+
+### View test metrics
+
+Assuming the Grafana/Prometheus stack installation steps completed successfully, you should be able to
+view the test progress in real time by accessing [this link](http://localhost:3000/goto/SM5B8pNSR?orgId=1).
+
+Now run
+`target/testnet/subsystem-bench test-sequence --path pezkuwi/node/subsystem-bench/examples/availability_read.yaml`
+and view the metrics in real time and spot differences between different `n_validators` values.
+
+### Profiling cache misses
+
+Cache misses are profiled using Cachegrind, part of Valgrind. Cachegrind runs slowly, and its cache simulation is basic
+and unlikely to reflect the behavior of a modern machine. However, it still represents the general situation with cache
+usage, and more importantly it doesn't require a bare-metal machine to run on, which means it could be run in CI or in
+a remote virtual installation.
+
+To profile cache misses use the `--cache-misses` flag. Cache simulation of current runs tuned for Intel Ice Lake CPU.
+Since the execution will be very slow, it's recommended not to run it together with other profiling and not to take
+benchmark results into account. A report is saved in a file `cachegrind_report.txt`.
+
+Example run results:
+
+```
+$ target/testnet/subsystem-bench --cache-misses cache-misses-data-availability-read.yaml
+$ cat cachegrind_report.txt
+I refs:        64,622,081,485
+I1  misses:         3,018,168
+LLi misses:           437,654
+I1  miss rate:           0.00%
+LLi miss rate:           0.00%
+
+D refs:        12,161,833,115  (9,868,356,364 rd   + 2,293,476,751 wr)
+D1  misses:       167,940,701  (   71,060,073 rd   +    96,880,628 wr)
+LLd misses:        33,550,018  (   16,685,853 rd   +    16,864,165 wr)
+D1  miss rate:            1.4% (          0.7%     +           4.2%  )
+LLd miss rate:            0.3% (          0.2%     +           0.7%  )
+
+LL refs:          170,958,869  (   74,078,241 rd   +    96,880,628 wr)
+LL misses:         33,987,672  (   17,123,507 rd   +    16,864,165 wr)
+LL miss rate:             0.0% (          0.0%     +           0.7%  )
+```
+
+The results show that 1.4% of the L1 data cache missed, but the last level cache only missed 0.3% of the time.
+Instruction data of the L1 has 0.00%.
+
+Cachegrind writes line-by-line cache profiling information to a file named `cachegrind.out.<pid>`.
+This file is best interpreted with `cg_annotate --auto=yes cachegrind.out.<pid>`. For more information see the
+[cachegrind manual](https://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-40/Nice/RuleRefinement/bin/valgrind-3.2.0/docs/html/cg-manual.html).
+
+For finer profiling of cache misses, better use `perf` on a bare-metal machine.
+
+### Profile memory usage using jemalloc
+
+Bellow you can find instructions how to setup and run profiling with jemalloc, this is complementary
+with using other memory profiling tools like: <https://github.com/koute/bytehound?tab=readme-ov-file#basic-usage>.
+
+#### Prerequisites
+
+Install tooling with:
+
+```
+sudo apt install libjemalloc-dev graphviz
+```
+
+#### Generate memory usage snapshots
+
+Memory usage can be profiled by running any subsystem benchmark with `--features memprofile`, e.g:
+
+```
+RUSTFLAGS=-g cargo run -p pezkuwi-subsystem-bench --release --features memprofile -- pezkuwi/node/subsystem-bench/examples/approvals_throughput.yaml
+```
+
+#### Interpret the results
+
+After the benchmark ran the memory usage snapshots can be found in `/tmp/subsystem-bench*`, to extract the information
+from a snapshot you can use `jeprof` like this:
+
+```
+jeprof --text PATH_TO_EXECUTABLE_WITH_DEBUG_SYMBOLS /tmp/subsystem-bench.1222895.199.i199.heap > statistics.txt
+```
+
+Useful links:
+
+- Tutorial: <https://www.magiroux.com/rust-jemalloc-profiling/>
+- Jemalloc configuration options: <https://jemalloc.net/jemalloc.3.html>
+
+## Create new test objectives
+
+This tool is intended to make it easy to write new test objectives that focus individual subsystems,
+or even multiple subsystems (for example `approval-distribution` and `approval-voting`).
+
+A special kind of test objectives are performance regression tests for the CI pipeline. These should be sequences
+of tests that check the performance characteristics (such as CPU usage, speed) of the subsystem under test in both
+happy and negative scenarios (low bandwidth, network errors and low connectivity).
+
+### Reusable test components
+
+To faster write a new test objective you need to use some higher level wrappers and logic: `TestEnvironment`,
+`TestConfiguration`, `TestAuthorities`, `NetworkEmulator`. To create the `TestEnvironment` you will
+need to also build an `Overseer`, but that should be easy using the mockups for subsystems in `mock`.
+
+### Mocking
+
+Ideally we want to have a single mock implementation for subsystems that can be minimally configured to
+be used in different tests. A good example is `runtime-api` which currently only responds to session information
+requests based on static data. It can be easily extended to service other requests.