bkchr
2024-01-09 00:53:13 +00:00
parent 4619ac005f
commit 86c13db5b4
4 changed files with 60 additions and 124 deletions
+29 -61
@@ -3839,8 +3839,7 @@ modularized_registry.sort(|a, b| {
</tbody></table>
</div>
<h2 id="summary-15"><a class="header" href="#summary-15">Summary</a></h2>
<p>Propose a way of permuting the availability chunk indices assigned to validators for a given core and relay
chain block, in the context of
<p>Propose a way of permuting the availability chunk indices assigned to validators, in the context of
<a href="https://github.com/paritytech/polkadot-sdk/issues/598">recovering available data from systematic chunks</a>, with the
purpose of fairly distributing network bandwidth usage.</p>
<h2 id="motivation-15"><a class="header" href="#motivation-15">Motivation</a></h2>
@@ -3860,40 +3859,11 @@ resulting code.
<a href="https://github.com/paritytech/reed-solomon-novelpoly">The implementation of the erasure coding algorithm used for polkadot's availability data</a> is systematic.
Roughly speaking, the first N_VALIDATORS/3 chunks of data can be cheaply concatenated to retrieve the original data,
without running the resource-intensive and time-consuming reconstruction algorithm.</p>
<p>Here's the concatenation procedure of systematic chunks for polkadot's erasure coding algorithm
(minus error handling, for briefness):</p>
<pre><pre class="playground"><code class="language-rust"><span class="boring">#![allow(unused)]
</span><span class="boring">fn main() {
</span>pub fn reconstruct_from_systematic&lt;T: Decode&gt;(
n_validators: usize,
chunks: Vec&lt;&amp;[u8]&gt;,
) -&gt; T {
let threshold = systematic_threshold(n_validators);
let shard_len = chunks.iter().next().unwrap().len();
let mut systematic_bytes = Vec::with_capacity(shard_len * threshold);
for i in (0..shard_len).step_by(2) {
for chunk in chunks.iter().take(threshold) {
systematic_bytes.push(chunk[i]);
systematic_bytes.push(chunk[i + 1]);
}
}
Decode::decode(&amp;mut &amp;systematic_bytes[..]).unwrap()
}
fn systematic_threshold(n_validators: usize) -&gt; usize {
let mut threshold = (n_validators - 1) / 3;
if !is_power_of_two(threshold) {
threshold = next_lower_power_of_2(threshold);
}
threshold
}
<span class="boring">}</span></code></pre></pre>
<p>You can find the concatenation procedure of systematic chunks for polkadot's erasure coding algorithm
<a href="https://github.com/paritytech/reed-solomon-novelpoly/blob/be3751093e60adc20c19967f5443158552829011/reed-solomon-novelpoly/src/novel_poly_basis/mod.rs#L247">here</a></p>
<p>In a nutshell, it performs a column-wise concatenation with 2-byte chunks.
The output could be zero-padded at the end, so scale decoding must be aware of the expected length in bytes and ignore
trailing zeros.</p>
trailing zeros (this assertion is already being made for regular reconstruction).</p>
<h3 id="availability-recovery-at-present"><a class="header" href="#availability-recovery-at-present">Availability recovery at present</a></h3>
<p>According to the <a href="https://spec.polkadot.network/chapter-anv#sect-candidate-recovery">polkadot protocol spec</a>:</p>
<blockquote>
@@ -3924,19 +3894,16 @@ can be used as a backup to retrieve a couple of missing systematic chunks, befor
chunks.</p>
<h3 id="chunk-assignment-function"><a class="header" href="#chunk-assignment-function">Chunk assignment function</a></h3>
<h4 id="properties"><a class="header" href="#properties">Properties</a></h4>
<p>The function that decides the chunk index for a validator should be parameterized by at least
<code>(validator_index, block_number, core_index)</code>
<p>The function that decides the chunk index for a validator will be parameterized by at least
<code>(validator_index, core_index)</code>
and have the following properties:</p>
<ol>
<li>deterministic</li>
<li>relatively quick to compute and resource-efficient.</li>
<li>when considering the other params besides <code>validator_index</code> as fixed, the function should describe a permutation
of the chunk indices</li>
<li>considering <code>block_number</code> as a fixed argument, the validators that map to the first N/3 chunk indices should
have as little overlap as possible for different paras scheduled on that relay parent.</li>
<li>when considering a fixed <code>core_index</code>, the function should describe a permutation of the chunk indices</li>
<li>the validators that map to the first N/3 chunk indices should have as little overlap as possible for different cores.</li>
</ol>
<p>In other words, we want a uniformly distributed, deterministic mapping from <code>ValidatorIndex</code> to <code>ChunkIndex</code> per block
per core.</p>
<p>In other words, we want a uniformly distributed, deterministic mapping from <code>ValidatorIndex</code> to <code>ChunkIndex</code> per core.</p>
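<p>The properties above can be checked with a minimal sketch. It assumes plain <code>usize</code> indices instead of the real
<code>ValidatorIndex</code>, <code>CoreIndex</code> and <code>ChunkIndex</code> types, a non-zero validator count, and a threshold
computed as the largest power of two not exceeding <code>(n_validators - 1) / 3</code>. The names <code>chunk_index_for</code>,
<code>is_permutation</code> and <code>systematic_holders</code> are hypothetical and only serve the illustration.</p>

```rust
/// Assumed threshold rule: largest power of two <= (n_validators - 1) / 3.
fn systematic_threshold(n_validators: usize) -> usize {
    let t = n_validators.saturating_sub(1) / 3;
    if t == 0 {
        return 0;
    }
    // Keep only the highest set bit of `t`.
    1usize << (usize::BITS - 1 - t.leading_zeros())
}

/// Hypothetical name. Maps a validator to a chunk index for a given core.
/// Assumes `n_validators > 0`.
fn chunk_index_for(n_validators: usize, validator_index: usize, core_index: usize) -> usize {
    let threshold = systematic_threshold(n_validators);
    let core_start_pos = core_index * threshold;
    (core_start_pos + validator_index) % n_validators
}

/// Property 3: for a fixed core, every chunk index is hit exactly once.
fn is_permutation(n_validators: usize, core_index: usize) -> bool {
    let mut seen = vec![false; n_validators];
    for v in 0..n_validators {
        let c = chunk_index_for(n_validators, v, core_index);
        if seen[c] {
            return false;
        }
        seen[c] = true;
    }
    true
}

/// Property 4: the validators holding the systematic chunks for a core.
fn systematic_holders(n_validators: usize, core_index: usize) -> Vec<usize> {
    let threshold = systematic_threshold(n_validators);
    (0..n_validators)
        .filter(|&v| chunk_index_for(n_validators, v, core_index) < threshold)
        .collect()
}
```

<p>With 10 validators this sketch yields a threshold of 2, so the systematic chunks of core 0 are held by validators 0 and 1,
while those of core 1 are held by validators 8 and 9. Different cores therefore load different validators.</p>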
<p>It's desirable to not embed this function in the runtime, for performance and complexity reasons.
However, this means that the function needs to be kept very simple and with minimal or no external dependencies.
Any change to this function could result in parachains being stalled and needs to be coordinated via a runtime upgrade
@@ -3952,7 +3919,7 @@ or governance call.</p>
core_index: CoreIndex
) -&gt; ChunkIndex {
let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
let core_start_pos = abs(core_index - block_number) * threshold;
let core_start_pos = core_index * threshold;
(core_start_pos + validator_index) % n_validators
}
@@ -4002,20 +3969,21 @@ pub struct ChunkResponse {
}
<span class="boring">}</span></code></pre></pre>
<p>An important thing to note is that in version 1, the <code>ValidatorIndex</code> value is always equal to the <code>ChunkIndex</code>.
Until the feature is enabled, this will also be true for version 2. However, after the feature is enabled,
this will generally not be true.</p>
Until the chunk rotation feature is enabled, this will also be true for version 2. However, after the feature is
enabled, this will generally not be true.</p>
<p>The requester will send the request to validator with index <code>V</code>. The responder will map the <code>V</code> validator index to the
<code>C</code> chunk index and respond with the <code>C</code>-th chunk.</p>
<code>C</code> chunk index and respond with the <code>C</code>-th chunk. This mapping can be seamless, by having each validator store their
chunk by <code>ValidatorIndex</code> (just as before).</p>
<p>The protocol implementation MAY check the returned <code>ChunkIndex</code> against the expected mapping to ensure that
it received the right chunk.
In practice, this is desirable during availability-distribution and systematic chunk recovery. However, regular
recovery may not check this index, which is particularly useful when participating in disputes that don't allow
for easy access to the validator-&gt;chunk mapping. See <a href="proposed/0047-assignment-of-availability-chunks.html#appendix-a">Appendix A</a> for more details.</p>
<p>In any case, the requester MUST verify the chunk's proof using the provided index.</p>
<p>During availability-recovery, given that the requester may not know (if the mapping is not available) whether the received chunk corresponds to
the requested validator index, it has to keep track of received chunk indices and ignore duplicates. Such duplicates
should be considered the same as an invalid/garbage response (drop it and move on to the next validator - we can't
punish via reputation changes, because we don't know which validator misbehaved).</p>
<p>During availability-recovery, given that the requester may not know (if the mapping is not available) whether the
received chunk corresponds to the requested validator index, it has to keep track of received chunk indices and ignore
duplicates. Such duplicates should be considered the same as an invalid/garbage response (drop it and move on to the
next validator - we can't punish via reputation changes, because we don't know which validator misbehaved).</p>
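<p>The duplicate handling described above could be sketched as follows. This is an illustration only: the
<code>ReceivedChunks</code> tracker and the <code>Outcome</code> enum are hypothetical names, chunk indices are modelled as plain
<code>u32</code> values, and proof verification is assumed to have happened before a chunk is recorded.</p>

```rust
use std::collections::HashSet;

/// What the requester does with a (proof-verified) chunk response.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// First time this chunk index is seen: keep the chunk.
    Accepted,
    /// Chunk index already received: drop it and try the next validator.
    Duplicate,
}

/// Hypothetical tracker of the chunk indices received during one recovery.
struct ReceivedChunks {
    seen: HashSet<u32>,
}

impl ReceivedChunks {
    fn new() -> Self {
        ReceivedChunks { seen: HashSet::new() }
    }

    /// Records a chunk index; `HashSet::insert` returns false if it was already present.
    fn record(&mut self, chunk_index: u32) -> Outcome {
        if self.seen.insert(chunk_index) {
            Outcome::Accepted
        } else {
            Outcome::Duplicate
        }
    }
}
```

<p>A duplicate is treated like any other unusable response. No reputation change is applied, because the requester cannot
tell which of the two responders misbehaved.</p>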
<h3 id="upgrade-path"><a class="header" href="#upgrade-path">Upgrade path</a></h3>
<h4 id="step-1-enabling-new-network-protocol"><a class="header" href="#step-1-enabling-new-network-protocol">Step 1: Enabling new network protocol</a></h4>
<p>In the beginning, both <code>/req_chunk/1</code> and <code>/req_chunk/2</code> will be supported, until all validators and
@@ -4030,10 +3998,10 @@ functionality-wise.
It needs to be explicitly stated that after the governance enactment, validators that run older client versions that
don't support this mapping will not be able to participate in parachain consensus.</p>
<p>Additionally, an error will be logged when starting a validator with an older version, after the feature was enabled.</p>
<p>On the other hand, collators will not be required to upgrade in this step, as regular chunk recovery will work as before,
granted that version 1 of the networking protocol has been removed. Note that collators only perform
availability-recovery in rare, adversarial scenarios, so it is fine to not optimise for this case and let them upgrade
at their own pace.</p>
<p>On the other hand, collators will not be required to upgrade in this step (but are still required to upgrade for step 1),
as regular chunk recovery will work as before, granted that version 1 of the networking protocol has been removed.
Note that collators only perform availability-recovery in rare, adversarial scenarios, so it is fine to not optimise for
this case and let them upgrade at their own pace.</p>
<p>To support enabling this feature via the runtime, we will use the <code>NodeFeatures</code> bitfield of the <code>HostConfiguration</code>
struct (added in <code>https://github.com/paritytech/polkadot-sdk/pull/2177</code>). Adding and enabling a feature
with this scheme does not require a runtime upgrade, but only a referendum that issues a
@@ -4045,9 +4013,9 @@ validator-&gt;chunk mapping ceases to be a 1:1 mapping and systematic recovery m
very complicated (See <a href="proposed/0047-assignment-of-availability-chunks.html#appendix-a">appendix A</a>). This RFC assumes that availability-recovery processes initiated during
disputes will only use regular recovery, as before. This is acceptable since disputes are rare occurrences in practice
and is something that can be optimised later, if need be. Adding the <code>core_index</code> to the <code>CandidateReceipt</code> would
mitigate this problem and will likely be needed in the future for CoreJam.
<a href="https://forum.polkadot.network/t/pre-rfc-discussion-candidate-receipt-format-v2/3738">Related discussion about <code>CandidateReceipt</code></a></li>
<li>It's a breaking change that requires all validators and collators to upgrade their node version.</li>
mitigate this problem and will likely be needed in the future for CoreJam and/or Elastic scaling.
<a href="https://forum.polkadot.network/t/pre-rfc-discussion-candidate-receipt-format-v2/3738">Related discussion about updating <code>CandidateReceipt</code></a></li>
<li>It's a breaking change that requires all validators and collators to upgrade their node version at least once.</li>
</ul>
<h2 id="testing-security-and-privacy-14"><a class="header" href="#testing-security-and-privacy-14">Testing, Security, and Privacy</a></h2>
<p>Extensive testing will be conducted - both automated and manual.
@@ -4063,15 +4031,13 @@ halved and total POV recovery time decrease by 80% for large POVs. See more
<p>Not applicable.</p>
<h3 id="compatibility-10"><a class="header" href="#compatibility-10">Compatibility</a></h3>
<p>This is a breaking change. See <a href="proposed/0047-assignment-of-availability-chunks.html#upgrade-path">upgrade path</a> section above.
All validators need to have upgraded their node versions before the feature will be enabled via a runtime upgrade
All validators and collators need to have upgraded their node versions before the feature will be enabled via a
governance call.</p>
<h2 id="prior-art-and-references-14"><a class="header" href="#prior-art-and-references-14">Prior Art and References</a></h2>
<p>See comments on the <a href="https://github.com/paritytech/polkadot-sdk/issues/598">tracking issue</a> and the
<a href="https://github.com/paritytech/polkadot-sdk/pull/1644">in-progress PR</a></p>
<h2 id="unresolved-questions-13"><a class="header" href="#unresolved-questions-13">Unresolved Questions</a></h2>
<ul>
<li>Is there a better upgrade path that would preserve backwards compatibility?</li>
</ul>
<p>Not applicable.</p>
<h2 id="future-directions-and-related-material-11"><a class="header" href="#future-directions-and-related-material-11">Future Directions and Related Material</a></h2>
<p>This enables future optimisations for the performance of availability recovery, such as retrieving batched systematic
chunks from backers/approval-checkers.</p>
@@ -4114,6 +4080,8 @@ the reported core_index was indeed the one occupied by the candidate at the resp
<p>Another attempt could be to include in the message the relay block hash where the candidate was included.
This information would be used in order to query the runtime API and retrieve the core index that the candidate was
occupying. However, considering it's part of an unimported fork, the validator cannot call a runtime API on that block.</p>
<p>Adding the <code>core_index</code> to the <code>CandidateReceipt</code> would solve this problem and would enable systematic recovery for all
dispute scenarios.</p>
<div style="break-before: page; page-break-before: always;"></div><p><a href="https://github.com/polkadot-fellows/RFCs/pull/59">(source)</a></p>
<p><strong>Table of Contents</strong></p>
<ul>
@@ -216,8 +216,7 @@
</tbody></table>
</div>
<h2 id="summary"><a class="header" href="#summary">Summary</a></h2>
<p>Propose a way of permuting the availability chunk indices assigned to validators for a given core and relay
chain block, in the context of
<p>Propose a way of permuting the availability chunk indices assigned to validators, in the context of
<a href="https://github.com/paritytech/polkadot-sdk/issues/598">recovering available data from systematic chunks</a>, with the
purpose of fairly distributing network bandwidth usage.</p>
<h2 id="motivation"><a class="header" href="#motivation">Motivation</a></h2>
@@ -237,40 +236,11 @@ resulting code.
<a href="https://github.com/paritytech/reed-solomon-novelpoly">The implementation of the erasure coding algorithm used for polkadot's availability data</a> is systematic.
Roughly speaking, the first N_VALIDATORS/3 chunks of data can be cheaply concatenated to retrieve the original data,
without running the resource-intensive and time-consuming reconstruction algorithm.</p>
<p>Here's the concatenation procedure of systematic chunks for polkadot's erasure coding algorithm
(minus error handling, for briefness):</p>
<pre><pre class="playground"><code class="language-rust"><span class="boring">#![allow(unused)]
</span><span class="boring">fn main() {
</span>pub fn reconstruct_from_systematic&lt;T: Decode&gt;(
n_validators: usize,
chunks: Vec&lt;&amp;[u8]&gt;,
) -&gt; T {
let threshold = systematic_threshold(n_validators);
let shard_len = chunks.iter().next().unwrap().len();
let mut systematic_bytes = Vec::with_capacity(shard_len * threshold);
for i in (0..shard_len).step_by(2) {
for chunk in chunks.iter().take(threshold) {
systematic_bytes.push(chunk[i]);
systematic_bytes.push(chunk[i + 1]);
}
}
Decode::decode(&amp;mut &amp;systematic_bytes[..]).unwrap()
}
fn systematic_threshold(n_validators: usize) -&gt; usize {
let mut threshold = (n_validators - 1) / 3;
if !is_power_of_two(threshold) {
threshold = next_lower_power_of_2(threshold);
}
threshold
}
<span class="boring">}</span></code></pre></pre>
<p>You can find the concatenation procedure of systematic chunks for polkadot's erasure coding algorithm
<a href="https://github.com/paritytech/reed-solomon-novelpoly/blob/be3751093e60adc20c19967f5443158552829011/reed-solomon-novelpoly/src/novel_poly_basis/mod.rs#L247">here</a></p>
<p>In a nutshell, it performs a column-wise concatenation with 2-byte chunks.
The output could be zero-padded at the end, so scale decoding must be aware of the expected length in bytes and ignore
trailing zeros.</p>
trailing zeros (this assertion is already being made for regular reconstruction).</p>
<h3 id="availability-recovery-at-present"><a class="header" href="#availability-recovery-at-present">Availability recovery at present</a></h3>
<p>According to the <a href="https://spec.polkadot.network/chapter-anv#sect-candidate-recovery">polkadot protocol spec</a>:</p>
<blockquote>
@@ -301,19 +271,16 @@ can be used as a backup to retrieve a couple of missing systematic chunks, befor
chunks.</p>
<h3 id="chunk-assignment-function"><a class="header" href="#chunk-assignment-function">Chunk assignment function</a></h3>
<h4 id="properties"><a class="header" href="#properties">Properties</a></h4>
<p>The function that decides the chunk index for a validator should be parameterized by at least
<code>(validator_index, block_number, core_index)</code>
<p>The function that decides the chunk index for a validator will be parameterized by at least
<code>(validator_index, core_index)</code>
and have the following properties:</p>
<ol>
<li>deterministic</li>
<li>relatively quick to compute and resource-efficient.</li>
<li>when considering the other params besides <code>validator_index</code> as fixed, the function should describe a permutation
of the chunk indices</li>
<li>considering <code>block_number</code> as a fixed argument, the validators that map to the first N/3 chunk indices should
have as little overlap as possible for different paras scheduled on that relay parent.</li>
<li>when considering a fixed <code>core_index</code>, the function should describe a permutation of the chunk indices</li>
<li>the validators that map to the first N/3 chunk indices should have as little overlap as possible for different cores.</li>
</ol>
<p>In other words, we want a uniformly distributed, deterministic mapping from <code>ValidatorIndex</code> to <code>ChunkIndex</code> per block
per core.</p>
<p>In other words, we want a uniformly distributed, deterministic mapping from <code>ValidatorIndex</code> to <code>ChunkIndex</code> per core.</p>
<p>It's desirable to not embed this function in the runtime, for performance and complexity reasons.
However, this means that the function needs to be kept very simple and with minimal or no external dependencies.
Any change to this function could result in parachains being stalled and needs to be coordinated via a runtime upgrade
@@ -329,7 +296,7 @@ or governance call.</p>
core_index: CoreIndex
) -&gt; ChunkIndex {
let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
let core_start_pos = abs(core_index - block_number) * threshold;
let core_start_pos = core_index * threshold;
(core_start_pos + validator_index) % n_validators
}
@@ -379,20 +346,21 @@ pub struct ChunkResponse {
}
<span class="boring">}</span></code></pre></pre>
<p>An important thing to note is that in version 1, the <code>ValidatorIndex</code> value is always equal to the <code>ChunkIndex</code>.
Until the feature is enabled, this will also be true for version 2. However, after the feature is enabled,
this will generally not be true.</p>
Until the chunk rotation feature is enabled, this will also be true for version 2. However, after the feature is
enabled, this will generally not be true.</p>
<p>The requester will send the request to validator with index <code>V</code>. The responder will map the <code>V</code> validator index to the
<code>C</code> chunk index and respond with the <code>C</code>-th chunk.</p>
<code>C</code> chunk index and respond with the <code>C</code>-th chunk. This mapping can be seamless, by having each validator store their
chunk by <code>ValidatorIndex</code> (just as before).</p>
<p>The protocol implementation MAY check the returned <code>ChunkIndex</code> against the expected mapping to ensure that
it received the right chunk.
In practice, this is desirable during availability-distribution and systematic chunk recovery. However, regular
recovery may not check this index, which is particularly useful when participating in disputes that don't allow
for easy access to the validator-&gt;chunk mapping. See <a href="#appendix-a">Appendix A</a> for more details.</p>
<p>In any case, the requester MUST verify the chunk's proof using the provided index.</p>
<p>During availability-recovery, given that the requester may not know (if the mapping is not available) whether the received chunk corresponds to
the requested validator index, it has to keep track of received chunk indices and ignore duplicates. Such duplicates
should be considered the same as an invalid/garbage response (drop it and move on to the next validator - we can't
punish via reputation changes, because we don't know which validator misbehaved).</p>
<p>During availability-recovery, given that the requester may not know (if the mapping is not available) whether the
received chunk corresponds to the requested validator index, it has to keep track of received chunk indices and ignore
duplicates. Such duplicates should be considered the same as an invalid/garbage response (drop it and move on to the
next validator - we can't punish via reputation changes, because we don't know which validator misbehaved).</p>
<h3 id="upgrade-path"><a class="header" href="#upgrade-path">Upgrade path</a></h3>
<h4 id="step-1-enabling-new-network-protocol"><a class="header" href="#step-1-enabling-new-network-protocol">Step 1: Enabling new network protocol</a></h4>
<p>In the beginning, both <code>/req_chunk/1</code> and <code>/req_chunk/2</code> will be supported, until all validators and
@@ -407,10 +375,10 @@ functionality-wise.
It needs to be explicitly stated that after the governance enactment, validators that run older client versions that
don't support this mapping will not be able to participate in parachain consensus.</p>
<p>Additionally, an error will be logged when starting a validator with an older version, after the feature was enabled.</p>
<p>On the other hand, collators will not be required to upgrade in this step, as regular chunk recovery will work as before,
granted that version 1 of the networking protocol has been removed. Note that collators only perform
availability-recovery in rare, adversarial scenarios, so it is fine to not optimise for this case and let them upgrade
at their own pace.</p>
<p>On the other hand, collators will not be required to upgrade in this step (but are still required to upgrade for step 1),
as regular chunk recovery will work as before, granted that version 1 of the networking protocol has been removed.
Note that collators only perform availability-recovery in rare, adversarial scenarios, so it is fine to not optimise for
this case and let them upgrade at their own pace.</p>
<p>To support enabling this feature via the runtime, we will use the <code>NodeFeatures</code> bitfield of the <code>HostConfiguration</code>
struct (added in <code>https://github.com/paritytech/polkadot-sdk/pull/2177</code>). Adding and enabling a feature
with this scheme does not require a runtime upgrade, but only a referendum that issues a
@@ -422,9 +390,9 @@ validator-&gt;chunk mapping ceases to be a 1:1 mapping and systematic recovery m
very complicated (See <a href="#appendix-a">appendix A</a>). This RFC assumes that availability-recovery processes initiated during
disputes will only use regular recovery, as before. This is acceptable since disputes are rare occurrences in practice
and is something that can be optimised later, if need be. Adding the <code>core_index</code> to the <code>CandidateReceipt</code> would
mitigate this problem and will likely be needed in the future for CoreJam.
<a href="https://forum.polkadot.network/t/pre-rfc-discussion-candidate-receipt-format-v2/3738">Related discussion about <code>CandidateReceipt</code></a></li>
<li>It's a breaking change that requires all validators and collators to upgrade their node version.</li>
mitigate this problem and will likely be needed in the future for CoreJam and/or Elastic scaling.
<a href="https://forum.polkadot.network/t/pre-rfc-discussion-candidate-receipt-format-v2/3738">Related discussion about updating <code>CandidateReceipt</code></a></li>
<li>It's a breaking change that requires all validators and collators to upgrade their node version at least once.</li>
</ul>
<h2 id="testing-security-and-privacy"><a class="header" href="#testing-security-and-privacy">Testing, Security, and Privacy</a></h2>
<p>Extensive testing will be conducted - both automated and manual.
@@ -440,15 +408,13 @@ halved and total POV recovery time decrease by 80% for large POVs. See more
<p>Not applicable.</p>
<h3 id="compatibility"><a class="header" href="#compatibility">Compatibility</a></h3>
<p>This is a breaking change. See <a href="#upgrade-path">upgrade path</a> section above.
All validators need to have upgraded their node versions before the feature will be enabled via a runtime upgrade
All validators and collators need to have upgraded their node versions before the feature will be enabled via a
governance call.</p>
<h2 id="prior-art-and-references"><a class="header" href="#prior-art-and-references">Prior Art and References</a></h2>
<p>See comments on the <a href="https://github.com/paritytech/polkadot-sdk/issues/598">tracking issue</a> and the
<a href="https://github.com/paritytech/polkadot-sdk/pull/1644">in-progress PR</a></p>
<h2 id="unresolved-questions"><a class="header" href="#unresolved-questions">Unresolved Questions</a></h2>
<ul>
<li>Is there a better upgrade path that would preserve backwards compatibility?</li>
</ul>
<p>Not applicable.</p>
<h2 id="future-directions-and-related-material"><a class="header" href="#future-directions-and-related-material">Future Directions and Related Material</a></h2>
<p>This enables future optimisations for the performance of availability recovery, such as retrieving batched systematic
chunks from backers/approval-checkers.</p>
@@ -491,6 +457,8 @@ the reported core_index was indeed the one occupied by the candidate at the resp
<p>Another attempt could be to include in the message the relay block hash where the candidate was included.
This information would be used in order to query the runtime API and retrieve the core index that the candidate was
occupying. However, considering it's part of an unimported fork, the validator cannot call a runtime API on that block.</p>
<p>Adding the <code>core_index</code> to the <code>CandidateReceipt</code> would solve this problem and would enable systematic recovery for all
dispute scenarios.</p>
</main>
+1 -1
File diff suppressed because one or more lines are too long
+1 -1
File diff suppressed because one or more lines are too long