RFC-0027: CoreJam

Start Date: 11 September 2023
Description: Parallelised, decentralised, permissionless state-machine based on a multistage Collect-Refine-Join-Accumulate model.
Authors: Gavin Wood, Robert Habermeier, Bastian Köcher

Summary

This is a proposal to fundamentally alter the workload done on the Polkadot Relay-chain, both in terms of that which is done "on-chain", i.e. by all Relay Chain Validators (Validators) as well as that which is done "in-core", i.e. distributed among subsets of the Validators (Validator Groups). The target is to create a model which closely matches the underlying technical architecture and is both generic and permissionlessly extensible.

In the proposed model, code is stored on-chain with two entry-points. Workloads are collated and processed in-core (and thus parallelized) using one entry-point, whereas the refined outputs of this processing are gathered together and an on-chain state-machine progressed according to the other.

While somewhat reminiscent of the Map-Reduce paradigm, a comprehensive analogy cannot be taken: the in-core processing code does not transform a set of inputs, but is rather used to refine entirely arbitrary input data collected by some third-party. Instead, and in accordance, we term it Collect-Refine-Join-Accumulate.

Motivation

Polkadot was originally designed as a means of validating state transitions of WebAssembly-defined state machines known as Parachain Validation Functions. These state machines were envisioned to be long-lived (of the order of years) and transitioning continuously, at the "full capacity" of modern single-threaded hardware held in consensus over the internet, and in isolation from any other such state machines.

Having actually built Polkadot, it became clear that the flexibility of the machinery implementing it allowed for a more diverse set of usage patterns and models. Parathreads, which came to be known as On-Demand Parachains (ODP), are one such model. This was underlined by other proposals to allow for a more decentralised administration of how the underlying Polkadot Core resource is procured, in particular Agile Coretime.

More recently, the idea of having small to medium size programs executing without their own surrounding blockchain, using only Relay-chain resources, has been discussed in detail, primarily around the Coreplay proposal. It therefore seems short-sighted to assume other models could not exist for utilizing the Relay-chain's "Core" resource. In much the same way that Agile Coretime originally strove to provide the most general model of procuring the Relay-chain's Core resource, it seems sensible to strive to find a similarly general model for utilizing this resource, one minimizing the difference between the valuable function of the Validators and the service offered by Polkadot.

Beyond delivering additional value through the increased potential for use-cases that this flexibility allows, our motivation extends to gaining stability: a future-proof platform allowing teams to build on it without fear of high maintenance burden, continuous bitrot or a technological rug-pull at some later date. Secondly, we are motivated by reducing barriers for new teams, allowing the Polkadot platform to harness the power of the crowd which permissionless systems uniquely enable.

Being extensible, the Relay-chain becomes far more open to experimentation within this paradigm than the classical Parachain Proof-of-Validity and Validation Function as is the case at present. Being permissionless opens Polkadot experimentation to individuals and teams beyond those core developers.

Requirements

In order of importance:

  1. The proposal must be compatible, in principle, with the preexisting parachain model.
  2. The proposal must facilitate the implementation of Coreplay.
  3. The proposal must be compatible with Agile Coretime, as detailed in RFC#0001.
  4. Implementation of the proposal should need minimal changes to all production logic.
  5. Utilization of Coretime must be accessible.
  6. Utilization of Coretime must be permissionless.
  7. The nature of Coretime should closely match the nature of resources generated by Polkadot.
  8. Minimal opinionation should be introduced over the format, nature and usage of Coretime.

Stakeholders

  1. Anyone with exposure to the DOT token economy.
  2. Anyone wanting to create decentralised/unstoppable/resilient applications.
  3. Teams already building on Polkadot.

Explanation

CoreJam is a general model for utilization of Polkadot Cores. It is a mechanism by which Work Packages are communicated, authorized, computed and verified, and their results gathered, combined and accumulated into particular parts of the Relay-chain's state.

The idea of Proof-of-Validity and Parachain Validation Function as first-class concepts in the Polkadot protocol is removed. These are now specializations of more general concepts.

We introduce a number of new interrelated concepts: Work Package, Work Class, Work Item, Work Package Output, Work Package Result, Work Package Report (also known as a Candidate), Work Package Attestation and Work Class Trie.

mod v0 {
    const PROGRESS_WEIGHT_PER_PACKAGE: Weight = MAX_BLOCK_WEIGHT * 3 / 4;
    type WorkClass = u32;
    type WorkPayload = Vec<u8>;
    struct WorkItem {
        class: WorkClass,
        payload: WorkPayload,
    }
    type MaxWorkItemsInPackage = ConstU32<4>;
    type MaxWorkPackagePrerequisites = ConstU32<4>;
    enum Authorization {
        Instantaneous(InstantaneousAuth),
        Bulk(Vec<u8>),
    }
    type HeaderHash = [u8; 32];
    /// Just a Blake2-256 hash of an EncodedWorkPackage.
    type WorkPackageHash = [u8; 32];
    type Prerequisites = BoundedVec<WorkPackageHash, MaxWorkPackagePrerequisites>;
    struct Context {
        header_hash: HeaderHash,
        prerequisites: Prerequisites,
    }
    struct WorkPackage {
        authorization: Authorization,
        context: Context,
        items: BoundedVec<WorkItem, MaxWorkItemsInPackage>,
    }
}
type MaxWorkPackageSize = ConstU32<5 * 1024 * 1024>;
struct EncodedWorkPackage {
    version: u32,
    encoded: BoundedVec<u8, MaxWorkPackageSize>,
}
impl TryFrom<EncodedWorkPackage> for v0::WorkPackage {
    type Error = ();
    fn try_from(e: EncodedWorkPackage) -> Result<Self, ()> {
        match e.version {
            0 => Self::decode(&mut &e.encoded[..]).map_err(|_| ()),
            _ => Err(()),
        }
    }
}

A Work Package is an Authorization together with a series of Work Items and a context, limited in plurality, versioned and with a maximum encoded size.

Work Items are a pair of class and payload, where the class identifies the Class of Work to be done in this item (Work Class).

Though this process happens entirely in consensus, there are two main consensus environments at play, in-core and on-chain. We therefore partition the progress into two pairs of stages: Collect & Refine and Join & Accumulate.

Collect-Refine

The first two stages of the CoreJam process are Collect and Refine. Collect refers to the collection and authorization of Work Packages, which are collections of items together with an authorization to utilize a Polkadot Core. Refine refers to the performance of computation according to the Work Packages in order to yield a Work Result. Finally, each Validator Group member attests to a Work Package yielding a set of Work Results, and these attestations form the basis for inclusion on-chain and integration into the Relay-chain's state (in the following stages).

Collection and is_authorized

Collection is the means of a Validator Group member attaining a Work Package which is authorized to be performed on their assigned Core at the current time. Authorization is a prerequisite for a Work Package to be included on-chain. Computation of Work Packages which are not Authorized is not rewarded. Incorrectly attesting that a Work Package is authorized is a disputable offence and can result in substantial punishment.

There are two kinds of Authorization corresponding to the two kinds of Coretime which are sold by Polkadot (see RFC#0001). An Authorization for usage of Instantaneous Coretime consists of a self-contained Signature of an account which owns enough Instantaneous Coretime Credit to purchase a block of Coretime at the current rate, signing a payload of the Work Package hash.

Relay-chain Validator Groups (RCVGs) run the risk of a credit owner not having the credit at the point of inclusion, in which case the RCVG will not be rewarded. Credit may never be withdrawn, therefore RCVGs can safely accept a block if and only if the Credit account contains a balance of at least the product of the number of Cores assigned to Instantaneous Coretime (IC), the price per IC core per block and the number of blocks behind the head of the finalized chain which the RCVG currently may be.
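The acceptance rule above is a simple product comparison. A minimal sketch follows, assuming balances and prices fit in `u128`; the function and parameter names are hypothetical and not part of the proposal.

```rust
/// Hypothetical helper: returns true when `balance` covers the worst-case spend
/// described in the text. None of these names appear in the proposal itself.
fn can_accept_instantaneous(
    balance: u128,
    ic_cores: u32,
    price_per_core_per_block: u128,
    blocks_behind_finalized: u32,
) -> bool {
    // Worst case: every IC core is bought with this credit in every block that
    // may have elapsed beyond the validator's view of the finalized head.
    let required = (ic_cores as u128)
        .checked_mul(price_per_core_per_block)
        .and_then(|v| v.checked_mul(blocks_behind_finalized as u128));
    match required {
        Some(required) => balance >= required,
        // Overflow means the requirement exceeds any representable balance.
        None => false,
    }
}
```

For instance, with 10 IC cores, a price of 10 per core per block and a possible lag of 10 blocks, a balance of 1,000 is the smallest that is safe to accept.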

An Authorization for usage of Bulk Coretime is more sophisticated. We introduce the concept of an Authorizer procedure, which is a piece of logic stored on-chain to which Bulk Coretime may be assigned. Assigning some Bulk Coretime to an Authorizer implies allowing any Work Package which passes that authorization process to utilize that Bulk Coretime in order to be submitted on-chain. It controls the circumstances under which RCVGs may be rewarded for evaluation and submission of Work Packages (and thus what Work Packages become valid to submit onto Polkadot). Authorization logic is entirely arbitrary and need not be restricted to authorizing a single collator, Work Package builder, parachain or even a single Work Class.

An Authorizer is a parameterized procedure:

type CodeHash = [u8; 32];
type AuthParamSize = ConstU32<1024>;
type AuthParam = BoundedVec<u8, AuthParamSize>;
struct Authorizer {
    code_hash: CodeHash,
    param: AuthParam,
}

The code_hash of the Authorizer is assumed to be the hash of some code accessible in the Relay-chain's Storage pallet. The procedure itself is called the Authorization Procedure (AuthProcedure) and is expressed in this code (which must be capable of in-core VM execution). Its entry-point prototype is:

fn is_authorized(param: &AuthParam, package: &WorkPackage, core_index: CoreIndex) -> bool;

If the is_authorized function overruns the system-wide limit or panics on some input, it is considered equivalent to returning false. While it is mostly stateless (e.g. isolated from any Relay-chain state), it is provided with a context parameter in order to give information about a recent Relay-chain block. This allows it to be provided with a concise proof over some recent Relay-chain state.
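The failure rule for is_authorized can be sketched as a total mapping from the way the VM invocation ended to a boolean verdict. The `AuthOutcome` type below is an assumption made for illustration; the proposal does not define how the VM reports its exit.

```rust
/// Hypothetical outcome of running `is_authorized` inside the in-core VM.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthOutcome {
    /// The entry-point returned normally with this verdict.
    Returned(bool),
    /// The code panicked on the given input.
    Panicked,
    /// The system-wide execution limit was overrun.
    LimitExceeded,
}

/// Collapse every failure mode into `false`, as the text requires.
fn authorization_verdict(outcome: AuthOutcome) -> bool {
    match outcome {
        AuthOutcome::Returned(verdict) => verdict,
        AuthOutcome::Panicked | AuthOutcome::LimitExceeded => false,
    }
}
```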

A single Authorizer value is associated with the index of the Core at a particular Relay-chain block and limits in some way what Work Packages may be legally processed by that Core.

Since encoded Authorizer values may be fairly large (up to 1,038 bytes here), they may not be a drop-in replacement for the ParaId/TaskId used at present in the Agile Coretime interface. Because of this, we provide a lookup mechanism allowing a much shorter AuthId to be used within the Coretime scheduling messaging. Conveniently, this is precisely the same datatype size (32-bit) as a ParaId/TaskId.

There is an Authorizations Pallet which stores the association. Adding a new piece of code is permissionless but requires a deposit commensurate with its size.

type AuthId = u32;
type Authorizers = StorageMap<AuthId, Authorizer>;

An Authorization is simply a blob which helps the Authorizer recognize a properly authorized Work Package. No constraints are placed on Authorizers over how they may interpret this blob. Expected authorization content includes signatures, Merkle-proofs and more exotic succinct zero-knowledge proofs.

(Note: depending on future Relay-chain Coretime scheduling implementation concerns, a window of Relay-chain blocks).

The need of validators to be rewarded for doing work they might reasonably expect to be useful competes with that of the Coretime procurers to be certain to get work done which is useful to them. In Polkadot 1.0, validators only get rewarded for PoVs ("work packages") which do not panic or overrun. This ensures that validators are well-incentivized to ensure that their computation is useful for the assigned parachain. This incentive model works adequately where all PVF code is of high quality and collators are few and static.

However, with this proposal (and even the advent of on-demand parachains), validators have little ability to identify a high-quality Work Package builder and the permissionless design means a greater expectation of flawed code executing in-core. Because of this, we take a slightly modified approach: Work Packages must have a valid Authorization, i.e. the Coretime-assigned is_authorized returns true when provided with the Work Package. However, Validators get rewarded for any such authorized Work Package, even one which ultimately panics or overruns on its evaluation.

This ensures that Validators do a strictly limited amount of work before knowing whether they will be rewarded and are able to discontinue and attempt other candidates earlier than would otherwise be the case. There is the possibility of wasting Coretime by processing Work Packages which result in error, but well-written authorization procedures can mitigate this risk by making a prior validation of the Work Items.

Refine

The refine function is implemented as an entry-point inside a code blob which is stored on-chain and whose hash is associated with the Work Class.

type ClassCodeHash = StorageMap<ClassId, CodeHash>;
type WorkOutputLen = ConstU32<1024>;
type WorkOutput = BoundedVec<u8, WorkOutputLen>;
fn refine(
    payload: WorkPayload,
    authorization: Authorization,
    auth_id: Option<AuthId>,
    context: Context,
    package_hash: WorkPackageHash,
) -> WorkOutput;

Both refine and is_authorized are only ever executed in-core. Within this environment, we need to ensure that we can interrupt computation not long after some well-specified limit and deterministically determine when an invocation of the VM exhausts this limit. Since the exact point at which computation is interrupted need not be deterministic, it is expected to be executed by a streaming JIT transpiler with a means of approximate and overshooting interruption coupled with deterministic metering.

Several host functions (largely in line with the host functions available to Parachain Validation Function code) are supplied. Two additional ones include:

/// Determine the preimage of `hash` utilizing the Relay-chain Storage pallet.
fn lookup(hash: [u8; 32]) -> Vec<u8>;
/// Determine the state root of the block at given `height`.
fn state_root(height: u32) -> Option<[u8; 32]>;

Other host functions will allow for the possibility of executing a WebAssembly payload (for example, a Parachain Validation Function) or instantiating and entering a subordinate RISC-V VM (for example for Actor Progressions).

When applying refine from the client code, we must allow for the possibility that the VM exits unexpectedly or does not end. Validators are always rewarded for computing properly authorized Work Packages, including those which include such broken Work Items. But they must be able to report their broken state into the Relay-chain in order to collect their reward. Thus we define a type WorkResult:

enum WorkError {
    Timeout,
    Panic,
}
struct WorkResult {
    class: WorkClass,
    item_hash: [u8; 32],
    result: Result<WorkOutput, WorkError>,
    weight: Weight,
}
fn apply_refine(item: WorkItem) -> WorkResult;

The amount of weight used in executing the refine function is noted in the WorkResult value, and this is used later in order to help apportion on-chain weight (for the Join-Accumulate process) to the Work Classes whose items appear in the Work Packages.
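How a VM exit might be folded into a WorkResult can be sketched as follows. This is an illustration only: `VmExit` is a hypothetical type, Weight is simplified to `u64`, the item hash is passed in rather than computed, and charging the full limit on a timeout is an assumption rather than something the proposal states.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum WorkError {
    Timeout,
    Panic,
}

/// Hypothetical description of how a metered `refine` invocation ended.
enum VmExit {
    Finished { output: Vec<u8>, weight_used: u64 },
    Trapped { weight_used: u64 },
    OutOfWeight { weight_limit: u64 },
}

#[derive(Debug, PartialEq, Eq)]
struct WorkResult {
    class: u32,
    item_hash: [u8; 32],
    result: Result<Vec<u8>, WorkError>,
    weight: u64,
}

/// Build the reportable result for one Work Item from the VM's exit state.
fn to_work_result(class: u32, item_hash: [u8; 32], exit: VmExit) -> WorkResult {
    let (result, weight) = match exit {
        VmExit::Finished { output, weight_used } => (Ok(output), weight_used),
        VmExit::Trapped { weight_used } => (Err(WorkError::Panic), weight_used),
        // Assumption: an overrun is charged at the full limit.
        VmExit::OutOfWeight { weight_limit } => (Err(WorkError::Timeout), weight_limit),
    };
    WorkResult { class, item_hash, result, weight }
}
```

Broken items still produce a WorkResult, which is what lets Validators report them on-chain and collect their reward.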

struct WorkReport {
    /// The hash of the underlying WorkPackage.
    hash: WorkPackageHash,
    /// The context of the underlying WorkPackage.
    context: Context,
    /// The core index of the attesting RCVG.
    core_index: CoreIndex,
    /// The results of the evaluation of the Items in the underlying Work Package.
    results: BoundedVec<WorkResult, MaxWorkItemsInPackage>,
}
struct Attestation {
    report: WorkReport,
    validator: AccountId,
    signature: Signature,
}
/// Since all RCVG members should be attesting to the same few Work Reports, it may
/// make sense to send Attestations without the full underlying WorkReport, but only
/// its hash.
struct BareAttestation {
    report_hash: WorkReportHash,
    validator: AccountId,
    signature: Signature,
}

Each Relay-chain block, every Validator Group representing a Core which is assigned work provides a series of Work Results coherent with an authorized Work Package. Validators are rewarded when they take part in their Group and process such a Work Package. Thus, together with some information concerning their execution context, they sign a Report concerning the work done and the results of it. This is also known as a Candidate. This signed Report is called an Attestation, and is provided to the Relay-chain block author. If no such Attestation is provided (or if the Relay-chain block author refuses to include it), then that Validator Group is not rewarded for that block.

The process continues once the Attestations arrive at the Relay-chain Block Author.

Join-Accumulate

Join-Accumulate is a second major stage of computation and is independent from Collect-Refine. Unlike with the computation in Collect-Refine which happens contemporaneously within one of many isolated cores, the computation of Join-Accumulate is both entirely synchronous with all other computation of its stage and operates within (and has access to) the same shared state-machine.

Being on-chain (rather than in-core as with Collect-Refine), information and computation done in the Join-Accumulate stage is carried out by the block-author and the resultant block evaluated by all validators and full-nodes. Because of this, and unlike in-core computation, it has full access to the Relay-chain's state.

The Join-Accumulate stage may be seen as a synchronized counterpart to the parallelised Collect-Refine stage. It may be used to integrate the work done from the context of an isolated VM into a self-consistent singleton world model. In concrete terms this means ensuring that the independent work components, which cannot have been aware of each other during the Collect-Refine stage, do not conflict in some way. Less dramatically, this stage may be used to enforce ordering or provide a synchronisation point (e.g. for combining entropy in a sharded RNG). Finally, this stage may be a sensible place to manage asynchronous interactions between subcomponents of a Work Class or even different Work Classes and oversee message queue transitions.

Initial Validation

There are a number of initial validation requirements which the Relay-chain Block Author (RCBA) must check in order to ensure no time is wasted on further, possibly costly, computation.

Firstly, any given Work Report must have enough attestation signatures to be considered for inclusion on-chain. Only one Work Report may be considered for inclusion from each RCVG per block.

Secondly, any Work Reports introduced by the RCBA must be Recent, defined as having a context.header_hash which is an ancestor of the RCBA head and whose height is less than RECENT_BLOCKS from the block which the RCBA is now authoring.

const RECENT_BLOCKS: u32 = 16;
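The height part of the Recent test can be sketched as below. Ancestry of context.header_hash is assumed to have been verified separately, and the function name and parameters are hypothetical.

```rust
const RECENT_BLOCKS: u32 = 16;

/// Hypothetical check on block heights only; whether the context block is an
/// ancestor of the RCBA's head is assumed to be established elsewhere.
fn is_recent(context_height: u32, authoring_height: u32) -> bool {
    // An ancestor is always strictly older than the block being authored.
    context_height < authoring_height
        // It must also be fewer than RECENT_BLOCKS behind that block.
        && authoring_height - context_height < RECENT_BLOCKS
}
```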

Thirdly, the RCBA may not include multiple Work Reports for the same Work Package. Since Work Reports become inherently invalid once they are no longer Recent, then this check may be simplified to ensuring that there are no Work Reports of the same Work Package within any Recent blocks.

Fourthly, the RCBA may not include Work Reports whose prerequisites are not themselves included in Recent blocks.

In order to ensure all of the above tests are honoured by the RCBA, a block which contains Work Reports which fail any of these tests shall panic on import. The Relay-chain's on-chain logic will thus include these checks in order to ensure that they are honoured by the RCBA. We therefore introduce the Recent Inclusions storage item, which retains all Work Package hashes which were included in the Recent blocks:

const MAX_CORES: u32 = 512;
/// Must be ordered.
type InclusionSet = BoundedVec<WorkPackageHash, ConstU32<MAX_CORES>>;
type RecentInclusions = StorageValue<BoundedVec<InclusionSet, ConstU32<RECENT_BLOCKS>>>;

The RCBA must keep an up to date set of which Work Packages have already been included in order to avoid accidentally attempting to introduce a duplicate Work Package or one whose prerequisites have not been fulfilled. Since the currently authored block is considered Recent, Work Reports introduced earlier in the same block do satisfy prerequisites of Work Packages introduced later.
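The duplicate and prerequisite checks might be sketched with in-memory sets standing in for the Recent Inclusions storage item. The `try_include` function and `InclusionError` type are hypothetical; the last set in `recent` represents the block currently being authored, so earlier inclusions in the same block satisfy later prerequisites.

```rust
use std::collections::BTreeSet;

type WorkPackageHash = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
enum InclusionError {
    Duplicate,
    MissingPrerequisite,
}

/// `recent` holds one ordered set per Recent block; the last entry is the
/// block under construction.
fn try_include(
    recent: &mut Vec<BTreeSet<WorkPackageHash>>,
    package: WorkPackageHash,
    prerequisites: &[WorkPackageHash],
) -> Result<(), InclusionError> {
    let included = |h: &WorkPackageHash| recent.iter().any(|set| set.contains(h));
    if included(&package) {
        return Err(InclusionError::Duplicate);
    }
    if !prerequisites.iter().all(|p| included(p)) {
        return Err(InclusionError::MissingPrerequisite);
    }
    // Record the package in the block under construction.
    match recent.last_mut() {
        Some(current) => {
            current.insert(package);
        }
        None => recent.push(BTreeSet::from([package])),
    }
    Ok(())
}
```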

While it will generally be the case that RCVGs know precisely which Work Reports will have been introduced at the point that their Attestation arrives with the RCBA by keeping the head of the Relay-chain in sync, it will not always be possible. Therefore, RCVGs will never be punished for providing an Attestation which fails any of these tests; the Attestation will simply be kept until either:

  1. it stops being Recent;
  2. it becomes included on-chain; or
  3. some other Attestation of the same Work Package becomes included on-chain.

Metering

Join-Accumulate is, as the name suggests, comprised of two subordinate stages. Both stages involve executing code inside a VM on-chain. Thus code must be executed in a metered format, meaning it must be able to be executed in a sandboxed and deterministic fashion but also with a means of providing an upper limit on the amount of weight it may consume and a guarantee that this limit will never be breached.

Practically speaking, we may allow a VM execution metering system similar to that for the refine execution, whereby we do not require a strictly deterministic means of interrupting, but do require deterministic metering and only approximate interruption. This would mean that full-nodes and Relay-chain validators could be made to execute some additional margin worth of computation without payment, though any attack could easily be mitigated by attaching a fixed cost (either economically or in weight terms) to a VM invocation.

Each Work Class defines some requirements it has regarding the provision of on-chain weight. Since the on-chain weight requirements of all processed Work Packages must be respected, it is important that each Work Report does not imply using more weight than its fair portion of the total available, and in doing so provides enough weight to its constituent items to meet their requirements.

struct WorkItemWeightRequirements {
    prune: Weight,
    accumulate: Weight,
}
type WeightRequirements = StorageMap<WorkClass, WorkItemWeightRequirements>;

Each Work Class has two weight requirements associated with it, corresponding to the two pieces of permissionless on-chain Work Class logic. They represent the amount of weight allotted for each Work Item of this class included in a Work Package assigned to a Core.

The total amount of weight utilizable by each Work Package (weight_per_package) is specified as:

weight_per_package := relay_block_weight * safety_margin / max_cores

safety_margin ensures that other Relay-chain system processes can happen and important transactions can be processed and is likely to be around 75%.
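As a worked example of the formula, the sketch below treats Weight as a single `u64` and the safety margin as a whole-number percentage. The function name is hypothetical and the figures used with it are illustrative, not proposed values.

```rust
/// Hypothetical helper computing
/// relay_block_weight * safety_margin / max_cores with one-dimensional weight.
/// Returns `None` when `max_cores` is zero.
fn weight_per_package(
    relay_block_weight: u64,
    safety_margin_percent: u64,
    max_cores: u64,
) -> Option<u64> {
    if max_cores == 0 {
        return None;
    }
    // Multiply before dividing to limit rounding loss; u128 avoids overflow.
    let scaled = relay_block_weight as u128 * safety_margin_percent as u128 / 100;
    Some((scaled / max_cores as u128) as u64)
}
```

With an illustrative block weight of 2,000,000,000,000, a 75% margin and 512 cores, each package would be allotted 2,929,687,500 units of weight.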

A Work Report is only valid if all weight liabilities of all included Work Items fit within this limit:

let total_weight_requirement: Weight = work_statement
    .items
    .iter()
    .map(|item| &weight_requirements[&item.class])
    .map(|requirements| requirements.prune + requirements.accumulate)
    .sum();
total_weight_requirement <= weight_per_package

Because of this, Work Report builders must be aware of any upcoming alterations to max_cores and build Statements which are in accordance with it not only at present but also in the near future when it may have changed.
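The validity condition can be made concrete with a self-contained sketch. It assumes one-dimensional `u64` weights and a `HashMap` in place of the on-chain storage map; the function names are hypothetical.

```rust
use std::collections::HashMap;

struct WorkItemWeightRequirements {
    prune: u64,
    accumulate: u64,
}

/// Sum the `prune` and `accumulate` liabilities of every item, given each
/// item's class. Returns `None` for an unknown class or on overflow.
fn total_weight_requirement(
    item_classes: &[u32],
    requirements: &HashMap<u32, WorkItemWeightRequirements>,
) -> Option<u64> {
    item_classes.iter().try_fold(0u64, |total, class| {
        let r = requirements.get(class)?;
        total.checked_add(r.prune.checked_add(r.accumulate)?)
    })
}

/// A report is only valid if its total liability fits the per-package allotment.
fn fits_in_package(
    item_classes: &[u32],
    requirements: &HashMap<u32, WorkItemWeightRequirements>,
    weight_per_package: u64,
) -> bool {
    matches!(
        total_weight_requirement(item_classes, requirements),
        Some(total) if total <= weight_per_package
    )
}
```

A builder anticipating a rise in max_cores could run the same check against the smaller weight_per_package that the new value implies.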

Join and prune

For consideration: Place a hard limit on total weight able to be used by prune in any Work Package since it is normally computed twice and an attacker can force it to be computed a third time.

The main difference between code called in the Join stage and that in the Accumulate stage is that in the former, code is required to exit successfully and within the weight limit, and may not mutate any state.

The Join stage involves the Relay-chain Block Author gathering together all Work Packages backed by a Validator Group in a manner similar to the current system of PoV candidates. The Work Results are grouped according to their Work Class, and the untrusted Work Class function prune is called once for each such group. This returns a Vec<usize> of invalid indices. Using this result, invalid Work Packages may be pruned and the resultant set retried with confidence that the result will be an empty Vec.

fn prune(outputs: Vec<WorkOutput>) -> Vec<usize>;

The call to the prune function is allocated weight equal to the length of outputs multiplied by the prune field of the Work Class's weight requirements.

The prune function is used by the Relay-chain Block Author in order to ensure that all Work Packages which make it through the Join stage are non-conflicting and valid for the present Relay-chain state. All Work Packages are rechecked using the same procedure on-chain and the block is considered malformed (i.e. it panics) if the result is not an empty Vec.
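The block author's prune-then-retry step might look like the sketch below. The `join_stage` helper is hypothetical, and returning `None` when the retry still flags something is an assumption about how an author could treat a misbehaving Work Class. `flag_duplicates` is a toy stand-in for a Work Class's prune logic.

```rust
type WorkOutput = Vec<u8>;

/// Apply `prune`, drop the flagged outputs, then retry once to confirm that
/// the remaining set is accepted (i.e. `prune` returns an empty Vec).
fn join_stage<F>(mut outputs: Vec<WorkOutput>, prune: F) -> Option<Vec<WorkOutput>>
where
    F: Fn(&[WorkOutput]) -> Vec<usize>,
{
    let invalid = prune(&outputs);
    if !invalid.is_empty() {
        let mut index = 0;
        // `retain` visits elements in order, so `index` tracks the position.
        outputs.retain(|_| {
            let keep = !invalid.contains(&index);
            index += 1;
            keep
        });
        if !prune(&outputs).is_empty() {
            return None;
        }
    }
    Some(outputs)
}

/// Toy prune: flags any output identical to an earlier one in the group.
fn flag_duplicates(outputs: &[WorkOutput]) -> Vec<usize> {
    (0..outputs.len())
        .filter(|&i| outputs[..i].contains(&outputs[i]))
        .collect()
}
```

On-chain, the same prune call is expected to return an empty Vec for the surviving set; a non-empty result there makes the block malformed.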

The prune function has immutable access to the Work Class's child trie state, as well as regular read-only storage access to the Relay-chain's wider state.

fn get_work_storage(key: &[u8]) -> Result<Vec<u8>>;
fn get_work_storage_len(key: &[u8]);

The amount of weight prune is allowed to use is a fixed multiple of the number of outputs passed to it, as described above.

Accumulate

The second stage is that of Accumulate. The function signature to the accumulate entry-point in the Work Class's code blob is:

fn accumulate(results: Vec<(Authorization, Vec<(ItemHash, WorkResult)>)>);
type ItemHash = [u8; 32];

The logic in accumulate may need to know how the various Work Items arrived into a processed Work Package. Since a Work Package could have multiple Work Items of the same Work Class, it makes sense to have a separate inner Vec for Work Items sharing the Authorization (by virtue of being in the same Work Package).

As stated, there is an amount of weight which it is allowed to use before being forcibly terminated and any non-committed state changes lost. The lowest amount of weight provided to accumulate is defined as the number of WorkResult values passed in results to accumulate multiplied by the accumulate field of the Work Class's weight requirements.

However, the actual amount of weight may be substantially more. Each Work Package is allotted a specific amount of weight for all on-chain activity (weight_per_package above) and has a weight liability defined by the weight requirements of all Work Items it contains (total_weight_requirement above). Any weight remaining after the liability (i.e. weight_per_package - total_weight_requirement) may be apportioned to the Work Classes of Items within the Report on a pro-rata basis according to the amount of weight they utilized during refine. Any weight unutilized by classes within one package may be carried over to the next package and utilized there.
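One possible reading of the pro-rata apportionment is sketched here, assuming `u64` weights, one aggregated refine-weight entry per Work Class and rounding down (any remainder is simply left unallocated). The function name is hypothetical.

```rust
/// Split `surplus` (weight_per_package - total_weight_requirement) among Work
/// Classes in proportion to the weight each used during `refine`.
/// Returns (class, share) pairs in input order.
fn apportion_surplus(surplus: u64, refine_weights: &[(u32, u64)]) -> Vec<(u32, u64)> {
    let total: u128 = refine_weights.iter().map(|&(_, w)| w as u128).sum();
    refine_weights
        .iter()
        .map(|&(class, w)| {
            let share = if total == 0 {
                0
            } else {
                // Each share is at most `surplus`, so it fits back into u64.
                (surplus as u128 * w as u128 / total) as u64
            };
            (class, share)
        })
        .collect()
}
```

For example, a surplus of 1,000 split between classes that used 30 and 10 units in refine yields shares of 750 and 250.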

Work Items are identified by their hash (ItemHash). We provide both the authorization of the package and the item identifiers and their results in order to allow the accumulate logic to take appropriate action in the case that an invalid Work Item was issued.

(Note for later: We may wish to provide a more light-client friendly Work Item identifier than a simple hash; perhaps a Merkle root of equal-size segments.)

fn get_work_storage(key: &[u8]) -> Result<Vec<u8>>;
fn get_work_storage_len(key: &[u8]);
fn checkpoint() -> Weight;
fn weight_remaining() -> Weight;
fn set_work_storage(key: &[u8], value: &[u8]) -> Result<(), ()>;
fn remove_work_storage(key: &[u8]);

Read-access to the entire Relay-chain state is allowed. No direct write access may be provided since accumulate is untrusted code. set_work_storage may fail if an insufficient deposit is held under the Work Class's account.

Full access to a child trie specific to the Work Class is provided through the work_storage host functions. Since accumulate is permissionless and untrusted code, we must ensure that its child trie does not grow to degrade the Relay-chain's overall performance or place untenable requirements on the storage of full-nodes. To this end, we require an account sovereign to the Work Class to be holding an amount of funds proportional to the overall storage footprint of its Child Trie. set_work_storage may return an error should the balance requirement not be met.

Host functions are provided allowing any state changes to be committed at fail-safe checkpoints to provide resilience in case of weight overrun (or even buggy code which panics). The amount of weight remaining may also be queried without setting a checkpoint. Weight is expressed in a regular fashion for a solo-chain (i.e. one-dimensional).
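The checkpoint behaviour can be modelled with a committed map plus an overlay of pending writes. This is a toy model under stated assumptions: `WorkStorage` and `abort` are hypothetical names, deposits are ignored, and the Weight value returned by the real checkpoint host function is omitted.

```rust
use std::collections::HashMap;

/// Toy model of a Work Class's child trie with fail-safe checkpoints.
#[derive(Default)]
struct WorkStorage {
    committed: HashMap<Vec<u8>, Vec<u8>>,
    /// Writes since the last checkpoint; `None` marks a removal.
    pending: HashMap<Vec<u8>, Option<Vec<u8>>>,
}

impl WorkStorage {
    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        match self.pending.get(key) {
            Some(change) => change.as_ref(),
            None => self.committed.get(key),
        }
    }

    fn set(&mut self, key: &[u8], value: &[u8]) {
        self.pending.insert(key.to_vec(), Some(value.to_vec()));
    }

    fn remove(&mut self, key: &[u8]) {
        self.pending.insert(key.to_vec(), None);
    }

    /// Commit all pending writes so that they survive a later failure.
    fn checkpoint(&mut self) {
        for (key, change) in self.pending.drain() {
            match change {
                Some(value) => {
                    self.committed.insert(key, value);
                }
                None => {
                    self.committed.remove(&key);
                }
            }
        }
    }

    /// On weight overrun or panic: discard writes made since the last checkpoint.
    fn abort(&mut self) {
        self.pending.clear();
    }
}
```

Code that checkpoints after each self-contained unit of work loses at most the unit in progress when it is forcibly terminated.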

Other host functions, including some to access Relay-chain hosted services such as the Balances and Storage Pallet may also be provided commensurate with this executing on-chain.

(Note for discussion: Should we be considering light-client proof size at all here?)

We can already imagine three kinds of Work Class: Parachain Validation (as per Polkadot 1.0), Actor Progression (as per Coreplay in a yet-to-be-proposed RFC) and Simple Ordering (placements of elements into a namespaced Merkle trie). Given how abstract the model is, one might reasonably expect many more.

Relay-chain Storage Pallet

There is a general need to be able to reference large, immutable and long-term data payloads both on-chain and in-core. This is both the case for fixed-function logic such as fetching the VM code for refine and accumulate as well as from within Work Packages themselves.

Owing to the potential for forks and disputes to happen beyond the scope of initial validation, there are certain quite subtle requirements over what data held on-chain may be utilized in-core. Because of this, it makes sense to have a general solution which is known to be safe to use in all circumstances. We call this solution the Storage Pallet.

The Storage Pallet provides a simple API, accessible to untrusted code through host functions & extrinsics and to trusted Relay-chain code via a trait interface.

trait Storage {
    /// Immutable function to attempt to determine the preimage for the given `hash`.
    fn lookup(hash: &[u8; 32]) -> Option<Vec<u8>>;

    /// Allow a particular preimage to be `provide`d.
    /// Once provided, this will be available through `lookup` until
    /// `unrequest` is called.
    fn request(hash: &[u8; 32], len: usize) -> bool;
    /// Remove request that some data be made available. If the data was never
    /// available or the data will remain available due to another request,
    /// then `false` is returned and `expunge` may be called immediately.
    /// Otherwise, `true` is returned and `expunge` may be called in
    /// 24 hours.
    fn unrequest(hash: &[u8; 32]) -> bool;

    // Functions used by implementations of untrusted functions; such as
    // extrinsics or host functions.

    /// Place a deposit in order to allow a particular preimage to be `provide`d.
    /// Once provided, this will be available through `lookup` until
    /// `unrequest_untrusted` is called.
    fn request_untrusted(depositor: &AccountId, hash: &[u8; 32], len: usize);
    /// Remove request that some data be made available. If the data was never
    /// available or the data will remain available due to another request,
    /// then `false` is returned and `expunge_untrusted` may be called immediately.
    /// Otherwise, `true` is returned and `expunge_untrusted` may be called in
    /// 24 hours.
    fn unrequest_untrusted(depositor: &AccountId, hash: &[u8; 32]) -> bool;

    // Permissionless items utilizable directly by an extrinsic or task.

    /// Provide the preimage of some requested hash. Returns `Some` if its hash
    /// was requested; `None` otherwise.
    ///
    /// Usually utilized by an extrinsic and is free if `Some` is returned.
    fn provide(preimage: &[u8]) -> Option<[u8; 32]>;
    /// Potentially remove the preimage of `hash` from the chain when it was
    /// unrequested using `unrequest`. `Ok` is returned iff the operation is
    /// valid.
    ///
    /// Usually utilized by a task and is free if it returns `Ok`.
    fn expunge(hash: &[u8; 32]) -> Result<(), ()>;
    /// Return the deposit associated with the removal of the request by
    /// `depositor` using `unrequest_untrusted`. Potentially
    /// remove the preimage of `hash` from the chain also.  `Ok` is returned
    /// iff the operation is valid.
    ///
    /// Usually utilized by a task and is free if it returns `Ok`.
    fn expunge_untrusted(depositor: &AccountId, hash: &[u8; 32]) -> Result<(), ()>;

    /// Equivalent to `request` followed immediately by `provide`.
    fn store(data: &[u8]) -> [u8; 32];
}

Internally, data is stored with a reference count so that two separate usages of store need not be concerned about the other.

Every piece of data stored for an untrusted caller requires a sizeable deposit. When used by untrusted code via a host function, the depositor would be set to an account controlled by the executing code (e.g. the Work Class's sovereign account).

Removing data happens in a two-phase procedure. First the data is unrequested, signalling that calling lookup on its hash may no longer work (it may still work if there are other requests active). 24 hours following this, the data is expunged with a second call, which actually removes the data from the chain assuming no other requests for it are active.

Only once expunge is called successfully is the deposit returned. If the data was never provided, or if additional requests are still active, then expunge may be called immediately after a successful unrequest.
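The reference counting and two-phase removal described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ToyStore`, `Entry` and `EXPUNGE_DELAY` are hypothetical names, deposits are omitted, time is passed in explicitly as seconds, and the caller supplies the hash rather than the store computing it from the preimage as the real `provide` would. The sketch also takes the strict reading that `lookup` fails as soon as no request is active.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

/// Assumed delay between the final `unrequest` and `expunge`: 24 hours, in seconds.
const EXPUNGE_DELAY: u64 = 24 * 60 * 60;

#[derive(Default)]
struct Entry {
    /// Number of currently active requests (the reference count).
    requests: u32,
    /// The preimage, once provided.
    data: Option<Vec<u8>>,
    /// For each `unrequest` not yet expunged: the time from which `expunge` is valid.
    pending: Vec<u64>,
}

#[derive(Default)]
struct ToyStore {
    entries: HashMap<Hash, Entry>,
}

impl ToyStore {
    /// Register interest in `hash`; may be called by several independent users.
    fn request(&mut self, hash: Hash) {
        self.entries.entry(hash).or_default().requests += 1;
    }

    /// Supply the data for a requested hash. Hash verification is omitted (assumption).
    fn provide(&mut self, hash: Hash, data: &[u8]) -> bool {
        match self.entries.get_mut(&hash) {
            Some(e) if e.requests > 0 => {
                e.data = Some(data.to_vec());
                true
            }
            _ => false,
        }
    }

    /// Only succeeds while at least one request is active.
    fn lookup(&self, hash: &Hash) -> Option<&[u8]> {
        self.entries
            .get(hash)
            .filter(|e| e.requests > 0)
            .and_then(|e| e.data.as_deref())
    }

    /// Phase one. Returns `true` if the caller must wait before calling `expunge`.
    fn unrequest(&mut self, hash: &Hash, now: u64) -> bool {
        let e = match self.entries.get_mut(hash) {
            Some(e) if e.requests > 0 => e,
            _ => return false,
        };
        e.requests -= 1;
        // Waiting is only needed if this was the last request for data that exists.
        let must_wait = e.data.is_some() && e.requests == 0;
        e.pending.push(if must_wait { now + EXPUNGE_DELAY } else { now });
        must_wait
    }

    /// Phase two. Drops the entry once nothing references it any more.
    fn expunge(&mut self, hash: &Hash, now: u64) -> Result<(), ()> {
        let e = self.entries.get_mut(hash).ok_or(())?;
        let idx = e.pending.iter().position(|&t| t <= now).ok_or(())?;
        e.pending.swap_remove(idx);
        let unused = e.requests == 0 && e.pending.is_empty();
        if unused {
            self.entries.remove(hash);
        }
        Ok(())
    }
}
```

With two active requests for the same hash, the first `unrequest` returns `false` and can be expunged at once, while the second returns `true` and its `expunge` only succeeds after the delay has elapsed.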

Notes on Agile Coretime

Crucially, a Task is no longer a first-class concept. Thus the Agile Coretime model, which in large part allows Coretime to be assigned to a Task Identifier from the Coretime chain, would need to be modified to avoid a hard dependency on this.

In this proposal, we replace the concept of a Task with a more general ticketing system; Coretime is instead assigned to an Authorizer, a parameterized function. This would allow a succinct Authorization (i.e. a small blob of data) to be included in the Work Package which, when fed into the relevant Authorizer function, could verify that some Work Package is indeed allowed to utilize that Core at (roughly) that time. A simple proof system would be a regular PKI signature. More complex proof systems could include more exotic cryptography (e.g. multisignatures or zk-SNARKs).
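As a rough sketch of the shape this could take, the following models an Authorizer as a trait whose single function receives the parameter fixed at assignment time and the Authorization carried by the Work Package. All names here (`WorkPackage`, `Authorizer`, `BearerToken`, `Assignment`, `may_use_core`) are hypothetical, and the bearer-token comparison is a deliberately insecure stand-in for a real proof system such as a signature check.

```rust
/// Hypothetical minimal view of a Work Package for this sketch.
#[allow(dead_code)]
struct WorkPackage {
    /// Succinct Authorization blob carried by the package.
    authorization: Vec<u8>,
    /// The work itself; not inspected by this toy authorizer.
    payload: Vec<u8>,
}

/// A parameterized function deciding whether a package may use a core.
trait Authorizer {
    fn is_authorized(&self, param: &[u8], package: &WorkPackage, core: u32) -> bool;
}

/// Toy authorizer: the Authorization must equal the parameter.
/// A real one would verify a signature or another proof (assumption).
struct BearerToken;

impl Authorizer for BearerToken {
    fn is_authorized(&self, param: &[u8], package: &WorkPackage, _core: u32) -> bool {
        !param.is_empty() && package.authorization.as_slice() == param
    }
}

/// Coretime on `core` assigned to an (authorizer, parameter) pair rather than a Task.
struct Assignment<'a> {
    core: u32,
    authorizer: &'a dyn Authorizer,
    param: Vec<u8>,
}

/// Check a package against the assignment for the core it wants to run on.
fn may_use_core(a: &Assignment<'_>, package: &WorkPackage, core: u32) -> bool {
    a.core == core && a.authorizer.is_authorized(&a.param, package, core)
}
```

The point of the design is that the Coretime assignment names only the authorizer and its parameter, so who may submit work can change without touching the assignment itself.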

In this model, we would expect any authorized Work Packages which panic or overrun to result in a punishment to the specific author by the logic of the Work Class.

Notes for migrating from a Parachain-centric model

All Parachain-specific data held on the Relay-chain including the means of tracking the Head Data and Code would be held in the Parachains Work Class (Child) Trie. The Work Package would be essentially equivalent to the current PoV blob, though prefixed by the Work Class. refine would prove the validity of the parachain transition described in the PoV which is the Work Package. The Parachains Work Output would provide the basis for the input of what is currently termed the Paras Inherent. accumulate would identify and resolve any colliding transitions and manage message queue heads, much the same as the current hard-coded logic of the Relay-chain.

We should consider utilizing the Storage Pallet for Parachain Code and store only a hash in the Parachains Work Class Trie.

Notes for implementing the Actor Progression model

Actor code is stored in the Storage Pallet. Actor-specific data including code hash, VM memory hash and sequence number is stored in the Actor Work Class Trie under that Actor's identifier. The Work Package would include pre-transition VM memories of actors to be progressed whose hash matches the VM memory hash stored on-chain and any additional data required for execution by the actors (including, perhaps, swappable memory pages). The refine function would initiate the relevant VMs and make entries into those VMs in line with the Work Package's manifest. The Work Output would provide a vector of actor progressions made including their identifier, pre- and post-VM memory hashes and sequence numbers. The accumulate function would identify and resolve any conflicting progressions and update the Actor Work Class Trie with the progressed actors' new states. More detailed information is given in the Coreplay RFC.
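One possible reading of the accumulate step is sketched below. It is not the Coreplay design: `ActorState`, `Progression` and this `accumulate` signature are hypothetical, the Actor Work Class Trie is stood in for by a `HashMap`, and a single `seq` field is assumed to carry the pre-transition sequence number. A progression is applied only if it starts from the state currently recorded for the actor, so the first of two conflicting progressions wins and the other is dropped.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];
type ActorId = u32;

/// Per-actor record held in the (stand-in) Actor Work Class Trie.
#[derive(Clone, Copy, PartialEq, Debug)]
struct ActorState {
    code: Hash,
    memory: Hash,
    seq: u64,
}

/// One entry of a Work Output: what refine claims to have done in-core.
#[derive(Clone, Copy)]
struct Progression {
    actor: ActorId,
    code: Hash,
    pre_memory: Hash,
    post_memory: Hash,
    /// Sequence number of the actor before this progression (assumption).
    seq: u64,
}

/// Apply progressions in order, one inner Vec per core. Returns how many were applied.
fn accumulate(trie: &mut HashMap<ActorId, ActorState>, outputs: &[Vec<Progression>]) -> usize {
    let mut applied = 0;
    for p in outputs.iter().flatten() {
        if let Some(state) = trie.get_mut(&p.actor) {
            // Accept only a progression that extends the current recorded state.
            let coherent =
                p.code == state.code && p.pre_memory == state.memory && p.seq == state.seq;
            if coherent {
                state.memory = p.post_memory;
                state.seq += 1;
                applied += 1;
            }
            // Otherwise the progression conflicts or is stale and is dropped.
        }
    }
    applied
}
```

Because each accepted progression moves the recorded state forward, an actor can still make several chained progressions in one block as long as they arrive in order.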

Notes on Implementation Order

In order to ease the migration process from the current Polkadot on- and off-chain logic to this proposal, we can envision a partial implementation, or refactoring, which would facilitate the eventual proposal whilst remaining compatible with the pre-existing usage and avoiding substantial changes to existing code.

We therefore envision an initial version of this proposal with minimal modifications to current code:

  1. Remain with Webassembly rather than RISC-V, both for Work Class logic and the subordinate environments which can be set up from Work Class logic. The introduction of Work Classes is a permissioned action requiring governance intervention. Work Packages will otherwise execute as per the proposal. Minor changes to the status quo.
  2. Attested Work Packages must finish running in time and not panic. Therefore WorkResult must have an Infallible error type. If an Attestation is posted for a Work Package which panics or times out, then this is a slashable offence. No change to the status quo.
  3. There should be full generalization over Work Package contents, as per the proposal. Introduction of Authorizers, refine, prune and accumulate. Additional code to the status quo.

Later implementation steps would revisit (1) to replace Webassembly with RISC-V (with backwards compatibility) and revisit (2) to support posting receipts of timed-out/failed Work Packages on-chain for RISC-V Work Classes.
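The effect of step (2) on the types can be sketched as follows. The `WorkResult` alias and its `Vec<u8>` success type are assumptions made for illustration; only the use of an `Infallible` error type comes from the text above. With that error type, the failure arm cannot be constructed, so on-chain code handling an attested result has no error path to deal with.

```rust
use std::convert::Infallible;

/// Assumed shape of a Work Result in the initial version: it cannot carry an error.
type WorkResult = Result<Vec<u8>, Infallible>;

/// Extract the output without any possibility of panicking.
fn into_output(result: WorkResult) -> Vec<u8> {
    match result {
        Ok(output) => output,
        // `Infallible` has no values, so this arm can never run.
        Err(never) => match never {},
    }
}
```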

Performance, Ergonomics and Compatibility

This system must be broadly compatible with our existing Parachain Validation Function/Proof-of-Validity model; however, a specific feasibility study into transitioning/migration has yet to be completed.

To aid swift deployment, the Relay-chain may retain its existing parachain-specific logic "hardcoded", with the CoreJam logic added separately and Work Class "zero" being special-cased to mean "the hard-coded Parachain logic".

Testing, Security and Privacy

Standard Polkadot testing and security auditing applies.

The proposal introduces no new privacy concerns.

None at present.

Drawbacks, Alternatives and Unknowns

Important considerations include:

  1. In the case of composite Work Packages, allowing synchronous (and therefore causal) interactions between the Work Items. If this were to be the case, then some sort of synchronisation sentinel would be needed to ensure that, should one subpackage end without the expected effects on its Work Class State (by virtue of the accumulate outcome for that subpackage), the accumulate of any causally entangled subpackages takes appropriate account of this (i.e. by dropping it and not effecting any changes from it).

  2. Work Items may need some degree of coordination to be useful to the accumulate function of their Work Class. To a large extent this is outside of the scope of this proposal's computation model by design. Through the authorization framework we assert that it is the concern of the Work Class and not of the Relay-chain validators themselves. However, we must ensure that certain requirements of the parachains use-case are practically fulfillable in some way. Within the legacy parachain model, PoVs:

    1. shouldn't be replayable;
    2. shouldn't require unbounded buffering in accumulate if things are submitted out-of-order;
    3. should be possible to evaluate for ordering by validators making a best-effort.

Prior Art and References

None.

Chat

for this we need a pallet on the RC to allow arbitrary data to be stored for a deposit, with a safeguard that it would remain in RC state for at least 24 hours (in case of dispute); and a host function to allow the PoV to reference it.
the issue is that for fast-changing data, you'd need to store all of the different images for 24 hours each.
this would quickly get prohibitive.
Yeah, but then we can not give these tasks that much memory. Maybe around 0.5 MiB to 1 MiB (depending on how well they compress) 
yeah
tasks which expect to execute alone could get ~2MB.
Gav
an alternative would be to require the author-network to compute the precise output itself and send it to the storage chain separately.
and if we do this, then up to around 4.5MB.
it's not much by today's desktop standards, but it still beats the shit out of smart contracts.
Gav
and if we do this, then up to around 4.5MB.
Why can we then double it? What do I miss?
5MB PoV limit.
best case you need to provide the full pre-state (with which to initialize the VM) and a hash of the post-state (to verify computation)
however, it's not clear where you get the next block's pre-state from.
one solution is to provide both pre- and post-state into the PoV and piggy-back on Polkadot's 24-hour availability system
(as long as you build the next block at most 24 hours from the last)
You can get the next state by re executing? 
Or keep it in your local cache or whatever 
but you don't necessarily have other tasks at that time.
or the RC state.
Ahh I see
or, indeed, the pre-state.
PoV disappears after 24 hours.
We can not recalculate all the state 
no
this means an average of 5MB / 2 memory bandwidth per block.
minus a bit for smaller tasks to coordinate with gives 2 - 2.5MB.
But if we put the post state into availability system it should work as well? 
but we assume the PoV accounts for All of the RCV's resources.
if it doesn't then we should just increase the PoV size.
Yeah fine 
i.e. if the RCV can handle 5MB PoV (availability) plus 5MB additional availability, then isn't it just easier to say 10MB PoV?
I think the limit currently is just from the networking side
Aka we could not even really handle these big PoVs 
But with async backing it should now be possible 
maybe i'm wrong here - if the limit of 5MB PoV is less about availability and more about transport from the collator, then sure, we can split it into a PoV limit and an availability limit
and have 5MB PoV and 10MB availability.
but i assumed that the bottleneck was just availability.
definitely an assumption to test
Yeah we should find out. Maybe I'm neglecting the erasure coding 
since there is a difference between data from some other guy (pre-state) and data which you generate yourself (post-state)
assuming i'm right about the bottleneck, then the best we could do without some form of paging (which should be easy enough to implement with jan's help) is having the post-state be computed on the author-network and placed in the storage chain.
jan says that paging is pretty trivial to customise on his RISCV VM.
just a segfault everytime a new page is required and we can either suspend + snapshot; or fetch + assign the page and continue.
Gav
just a segfault everytime a new page is required and we can either suspend + snapshot; or fetch + assign the page and continue.
Yeah, that was also my idea.
But yeah, for the beginning we can probably start with 0.5MiB of memory 
we have 16KB pages currently, and we'll already likely be doing something similar for running multiple stacks
Basti.await
But yeah, for the beginning we can probably start with 0.5MiB of memory 
yup
definitely enough for a PoC.
but we will still need a new PoV format.
i.e. where we can:
a) request arbitrary data from the RCTaskStorage pallet (which guarantees data placed on it cannot be removed for 24 hours after last read).
b) progress a particular task (in whatever order) with weight spec
c) provide data into a particular task (in whatever order) with weight spec
we might also have a more abstract PoV protocol which allows for different classes of task.
and specifies in Wasm or whatever exactly how to interpret a PoV
then we reformulate the Parachain PoV into this "smart contract".
and we would create a new Task PoV format as just another instance within this overarching protocol.
e.g. this protocol could define UMP, DMP, XCMP and the initial version of the Actors-PoV format might reasonably not include this logic.
but an upgrade later could allow actors to use this stuff.
the main difference with just "hardcoding" it into the RC logic directly is that it could be permissionless - other teams could come up with their own PoV formats and protocols to determine validity.
ok so i've thought about it a bit.
i think there's a route forward for what i referenced up there as "generic PoVs".
in fact they cease to be "PoVs" at this point.
they're just "Work Packages" since you're not proving anything, you're just using the RCVs for compute.
we would define the concept of a Work Class.
all WPs must have a WT. the WT defines what it means to interpret/execute the WP.
we'd initially have two WTs: Parachains and Tasks.
WTs would be permissionless and would come with two main blobs; one (which I'll call map) which can still be in Wasm (since it runs off-chain at a level higher than the PVF) and one (called reduce) which must be strictly metered (so RISCV or very slow Wasm).
as the name suggests, the first represents map and the second represents reduce of a map-reduce pattern.
the first takes the WP as an argument and can inspect any data held in the RCTaskStore.
it then either panics (invalid) or returns a fixed-size (or rather known-maximum-length) blob.
all such blobs and panics are batched up into a Vec and fed into a WT-defined reduce function, with its max_weight = to the block_weight multiplied by the ratio of cores used for this WT compared to all cores.
all WTs have their own child trie.
only this reduce function may alter data in that trie.
for the Parachain WT, this child trie would include all the parachain-specific data (code and state); the ParachainWP output type would basically be the per-chain paras-inherent arguments, and so the PWT reduce function would basically be the paras-inherent logic.
for the Actors WT, this child trie would include Actor specific stuff like codehash (the actual code would be stored in the RCTaskStore) and RISCVVM memory hash, as well as sequence number.
The Actor output would include enough information on which actor-combinations got (maybe) progressed to allow the proper progressions to be recorded in the Actor Child Trie by the reduce function. essentially just the logic i already define in the RFC.
So the Actor map function would interpret the WorkPackage as a manifest and fetch all actor code, initialise each Actor's VM and start them with the entry points according to the manifest.
with this model, anyone could add their own Work Classes.
So if RobK/Arkadiy/Dave can implement the spec, we can focus on writing up the Actor-based map and reduce function. They need not be concerned with Actors at all.
sound sensible?
Good question :P 
So the map would have access to fetch the code from the relay chain?
And the map would be trusted code?
That is the same for every WT? 
neither map nor reduce are trusted
Gav
neither map nor reduce are trusted
Ahh yeah, you said it
map would only be able to read the part of RC state which is guaranteed to be available for 24 hours; this basically just means the RCTaskStorage.
But the code executed in reduce is defined by the output of map?
yes
Gav
map would only be able to read the part of RC state which is guaranteed to be available for 24 hours; this basically just means the RCTaskStorage.
The code to execute will be referenced by the code hash and this code hash needs to be "somewhere". Currently we store in the relay chain state
yeah, the code would need to be in the "RCTaskStore" (should be the RCWorkStore, i guess)
there is the question about the WorkType Child Trie
This is the child trie you mentioned? 
(The RCTaskStore) 
RCWorkStore is the on-chain paid data repository which guarantees data remains available for 24 hours after removal
there is also the WorkTypeChildTrie.
i guess this would need to guarantee some sort of dispute-availability also.
Gav
there is also the WorkTypeChildTrie.
This is stored in the RC state?
yes
it's a child trie.
yeah, just wanted to be sure :P 
Gav
i guess this would need to guarantee some sort of dispute-availability also.
Yeah for sure. If we need the data as input of the validation
we might need to limit this.
yeah, i think we don't need any of this data.
And map to reduce would be something like Vec<Work { code_hash, function, params, max_weight: Option<Weight> }>?
map output would basically be Vec<ProvisionalProgression>
where struct ProvisionalProgression { actor: ActorId, code: [u8;32], prestate: [u8;32], poststate: [u8;32] }
then reduce would take a Vec<Vec<ProvisionalProgression>>
reduce would check there are no collisions (same actor progressing from same poststate), that the code executed is the expected code hash of the actor and that the prestate fits into a coherent progression of the actor (which might not be the old poststate since the actor could be doing multiple progressions in a single RC block).
So you want to execute the actor code from within reduce?
no
actor code gets executed off-chain in map.
output of map informs reduce what it has done.
it's up to reduce to figure out how to combine all of this into a coherent RC state update.
But when is reduce executed? As part of the RC block execution/building?
yeah
reduce is on-chain,
it accepts a Vec<WorkTypeOutput>
WorkTypeOutput is an output specific to WorkType, i.e. WorkTypeOutput in fn map(WorkPackage) -> Result<WorkTypeOutput, WorkPackageError>
Okay, I see where you want to go. 
Gav
then reduce would take a Vec<Vec<ProvisionalProgression>>
Why this, because there can be multiple WorkPackages being progressed in one RC?
yes, one for every core.
those ProvisionalProgressions appearing together in an inner Vec have been co-scheduled.
it might possibly make a difference inside of reduce to know what progressions have happened together on the same core.
Okay. So reduce is getting all the WorkOutputs of all cores for one WT?
precisely.
one WT would be the whole of parachains, UMP, DMP, XCMP.
Yeah
another WT would be the whole actor environment.
Actors and parachains will never be able to talk to each other? 
yeah, we can imagine actor-v2 WT which includes the possibility of XCMP/DMP/UMP.
but i would get a basic version of actors done first.
Basti.await
Actors and parachains will never be able to talk to each other? 
I more meant "co-scheduled"
just to show it's possible.
Basti.await
I more meant "co-scheduled"
perhaps for actors-v3 which would allow WPs to include both parachain and actor progressions
but we should be able to get most of the benefits leaving it as XCMP
if actors work as well as we expect, then the need for chains will dramatically reduce
Gav
perhaps for actors-v3 which would allow WPs to include both parachain and actor progressions
Okay, my only comment on this would be that we then need to ensure that parachains and actors are not scheduled in different groups of "WTs"
But maybe some simple "was updated in RC block X" should be enough 
But yeah, what you propose sounds reasonable 
Basti.await
Okay, my only comment on this would be that we then need to ensure that parachains and actors are not scheduled in different groups of "WTs"
yeah, i think we would need to provide some sort of hard-coded-function on the RC to migrate between WTs in this case.
parachains couldn't be part of two WTs at once
Okay 
but again, i don't see too much benefit in synchronous composability between actors and chains
Yeah
Just wanted to ask
and it complicates things massively
And I think chains are an old concept anyway with coreplay :P 
quite.
the level of experimentation this would provide is pretty immense
Yes 
This would also quite help for stuff like "elastic scaling" etc. 
basically, just turn polkadot into a global, secure map-reduce computer. 
We could just experiment 
As a new WT
yup