consensus/random/randao_analysis.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Randao Analysis using Markov Chains\n",
    "\n",
    "This Sagemath Jupyter notebook is for tha nalaysis of Randao randomness, under the assumption that honest block producers always produce blocks in their slot that always get into the chain and can never be reverted. This assumption is not necessarily reasonable.\n",
    "\n",
    "The randomness is sampled at a particular slot, e.g. the end of an epoch or just before it is used by a smart contract. The adversary cannot predict the randomness at the last slot up to this slot that has an honest block producer, because their contribution is random and unknown. However each adversarial blocm producer between this last honest slot and the sampled slot has a choice of whether to produce a block or not. Thus if the adversary controls m slots in a row up to the sampling slot, then they have 2^m choices for the randonness.\n",
    "\n",
    "The randomness sampled at the end of epoch n is used to determine the block producers in epoch n+2. We can imagine that the adversary wants to maximise the numnber of adversarial slots at the end of epoch n+2 to get control over the randonness in epoch n+4 etc. If they choose to do this, we can construct a Markov chain, where each state is the number of adversarial blocks at the end of this epoch. The next state is the number of adversarial blocks at the end of the epoch after next. (So odd and even numbered epochs are mostly indepedent.)\n",
    "\n",
    "If the adversray controls 1/3 of the validator set and controls m slots at the end of the current epoch then under this attack, the distribution of the number of slots they control at the end of the next epoch is the maximum of 2^m geometric distributions (the kind that start at 0) with parameter 2/3.\n",
    "\n",
    "The stationary distribution of this Markov chain is the distribution of the number of adversarial slots at the end of the peoch under continous attack where the adversary tries to maximise this.\n",
    "\n",
    "We want to consider sampling randomness 4 epochs after some trigger happens. Now an attacker could wasit until the current epoch has many adversarial validators at the end, before causing the trigger to happen. Then 4 epochs, later, the current epoch may still be somewhate biasable, allowing the adversary to have a more than usual chance to get many adversarial blocks before the trigger block.\n",
    "\n",
    "To analyse this, we first need a conservative estimate of how many slots at the end of the current  that adversary can feasibly wait to occur, under the coninuous attack or not. Then we can consider two transitions of the Markov chain from this event as being the distribution of the number of adversarial slots before the randomness is sampled. Now we can compute the expected number of options for the sampled randomness from this distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The cumulative distribution function and probability mass functiom\n",
    "# at m for the maximum of t geometric distributions with parameter p\n",
    "def cdfmaxgeo(p,t,m):\n",
    "    return (1-(1-p)^(m+1))^t\n",
    "def pmfmaxgeo(p,t,m):\n",
    "    return cdfmaxgeo(p,t,m)-cdfmaxgeo(p,t,m-1)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we build the transition matrix.\n",
    "# Each column is the maximum of 2^j geometric distributions with parameter 2/3.\n",
    "# For the last row, corresponding to m=63, \n",
    "# we take probability of being at least 63 so the probabilities add up to 1.\n",
    "def nextstateprob(j,i):\n",
    "    if i ==63:\n",
    "        return 1-cdfmaxgeo(2/3+0.0,2^j,62)\n",
    "    return pmfmaxgeo(2/3+0.0,2^j,i)\n",
    "tm=matrix([[nextstateprob(j,i) for j in range(64)] for i in range(64)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.999999999952366\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(0.476563095838547, 0.290008277069172, 0.139917968194352, 0.0586770548078090, 0.0224471788438172, 0.00810421240432098, 0.00282576354047732, 0.000965620671507453, 0.000326252792938701, 0.000109544499335094, 0.0000366568819313766, 0.0000122441918654699, 4.08585723181877e-6, 1.36273835423470e-6, 4.54384351862505e-7, 1.51485734113910e-7, 5.04995072826173e-8, 1.68339168055658e-8, 5.61143674448838e-9, 1.87050189995125e-9, 6.23504803604958e-10, 2.07835570232040e-10, 6.92786571866131e-11, 2.30930253742276e-11, 7.69755676003532e-12, 2.56585251964193e-12, 8.55525570183441e-13, 2.84933826330784e-13, 9.52193095509939e-14, 3.14984028202337e-14, 1.04994676112577e-14, 3.62050607336612e-15, 1.08615182206194e-15, 3.62050607359248e-16, 3.62050607361851e-16, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Next we use the power method to approximate the stationary distribution stat\n",
    "pm=tm\n",
    "for _ in range(20):\n",
    " pm = pm^2\n",
    "stat=pm*vector({1:1,63:0})\n",
    "# The probabilities should add up to 1. \n",
    "# Too many iterations of the power method will blow up the rounding error, \n",
    "# so it's better to check that it is not too far from 1.\n",
    "print(sum(stat))\n",
    "stat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.666666666666667, 0.222222222222222, 0.0740740740740741, 0.0246913580246914, 0.00823045267489719, 0.00274348422496573, 0.000914494741655170, 0.000304831580551723, 0.000101610526850648, 0.0000338701756168458, 0.0000112900585389486, 3.76335284635321e-6, 1.25445094878440e-6, 4.18150316261467e-7, 1.39383438679808e-7, 4.64611462636100e-8, 1.54870487545367e-8, 5.16234954783812e-9, 1.72078318261271e-9, 5.73594394204235e-10, 1.91198168408846e-10, 6.37326857955145e-11, 2.12442285985048e-11, 7.08144654026910e-12, 2.36044517265555e-12, 7.86815057551848e-13, 2.62345700718924e-13, 8.73745520379998e-14, 2.91988655476416e-14, 9.65894031423886e-15, 3.21964677141295e-15, 1.11022302462516e-15, 3.33066907387547e-16, 1.11022302462516e-16, 1.11022302462516e-16, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000, 0.000000000000000)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# unbiased is the distribution after one transition, which should be just geometric with parameter 2/3\n",
    "unbiased=tm*vector({0:1,63:0})\n",
    "unbiased"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.64611462636100e-8 4.64611462636100e-8 1.51485734113910e-7\n"
     ]
    }
   ],
   "source": [
    "# Just some sanity checking\n",
    "print(pmfmaxgeo(2/3+0.0,1,15),unbiased[15],stat[15])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a single transition from ubiased gives expected tail length  0.500000000000001  and expected options  1.99999968601354\n",
      "the steady state distribution has expected tail length  0.904072888428179  and expected options  3.26106196081950\n"
     ]
    }
   ],
   "source": [
    "# Next we look at the expected number of options the adversary has under these distributions\n",
    "def expectedoptions(dist):\n",
    "    return sum([2^i*x for (i,x) in zip(range(64),dist)])\n",
    "def expectedtail(dist):\n",
    "    return sum([i*x for (i,x) in zip(range(64),dist)])\n",
    "print(\"a single transition from ubiased gives expected tail length \",expectedtail(unbiased),\" and expected options \",expectedoptions(unbiased))\n",
    "print(\"the steady state distribution has expected tail length \", expectedtail(stat), \" and expected options \",expectedoptions(stat))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.21765601217656e-8"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The number of epochs in a century\n",
    "epochsinacentury=(5*60*24*365*1000/32)\n",
    "1/epochsinacentury+0.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tail length 15 occurs every 8.03809031457012 years in expectation under the stationary distribution\n",
      "tail length 15 occurs every 26.2080493078638 years in expectation under the single transition from unbiased distribution\n",
      "tail length 16 occurs every 24.1122354988936 years in expectation under the stationary distribution\n",
      "tail length 16 occurs every 78.6241479235913 years in expectation under the single transition from unbiased distribution\n",
      "tail length 17 occurs every 72.3334935202939 years in expectation under the stationary distribution\n",
      "tail length 17 occurs every 235.872445461677 years in expectation under the single transition from unbiased distribution\n"
     ]
    }
   ],
   "source": [
    "#So what tail length can freasibly occur if the adversary waits long enough, \n",
    "# both when they trying to macimise the tail length, which gives the distribution stat\n",
    "# and when they just wait (i.e. using unbiased)\n",
    "print(\"tail length 15 occurs every\",100/(stat[15]*epochsinacentury),\"years in expectation under the stationary distribution\")\n",
    "print(\"tail length 15 occurs every\",100/(unbiased[15]*epochsinacentury),\"years in expectation under the single transition from unbiased distribution\")\n",
    "print(\"tail length 16 occurs every\",100/(stat[16]*epochsinacentury),\"years in expectation under the stationary distribution\")\n",
    "print(\"tail length 16 occurs every\",100/(unbiased[16]*epochsinacentury),\"years in expectation under the single transition from unbiased distribution\")\n",
    "print(\"tail length 17 occurs every\",100/(stat[17]*epochsinacentury),\"years in expectation under the stationary distribution\")\n",
    "print(\"tail length 17 occurs every\",100/(unbiased[17]*epochsinacentury),\"years in expectation under the single transition from unbiased distribution\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "172.837461872421"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# If we do two transitions from 16\n",
    "# corresponding to taking a sample 4 epochs after epoch that the adversary timed an atteck for\n",
    "expectedoptions(tm*tm*vector({16:1,63:0}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6.41125360736917"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# and how long is the expected tail then?\n",
    "expectedtail(tm*tm*vector({16:1,63:0}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1901.01954391692"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# What about if we ait 2 epochs instead of 4?\n",
    "expectedoptions(tm*vector({16:1,63:0}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "36.3762992608834"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Or 6 epochs?\n",
    "expectedoptions(tm*tm*tm*vector({16:1,63:0}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What if randomness were sampled at a block number not a slot number?\n",
    "\n",
    "As I know how to query the block number and not the slot number from the EVM, this makes smart contracts getting randomness from RanDAO even trickier. So one might have say entrants to some lottery or the inout to some interactive proof that happens at block n and want to sample RanDAO using the prevRandDAO precompile at block n+128. The problem is that thia need not occur 128 slots later. In slots where the attacker controls the block producer, even when they cannot influence the randomness directky because of a later honest slot, they can still skip producing a block and push the sampling point to a later slot. In this way they can hope to sample in the middle of a strng of adversarial slots.\n",
    "\n",
    "However, there is not much point them doing this until 2 epochs before, because then they don't know which slot to aim for until they know the block producers. Just how many choices of slots do they have, starting at 64 blocks before? If all block producers produce blocks, then the can be a minium of 64 slots. The maximum number of slots before 64 blocks is 64 plus the number of adversarial slots before there are 64 honest slots. If the slots assignments were random, and of course they can be biased as above, this is a sample from a negative binomial distribution with parameters r=64. p=2/3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def negbintail(r,p,t):\n",
    "    tail=0.0\n",
    "    for k in range(ceil(t),2*ceil(t)+r):\n",
    "        tail += binomial(k+r-1,k)*p^r*(1-p)^k\n",
    "    return tail"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7774.15709413154"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The expectation of this is 33.5, but it takes almost twice that for the tail probability to become small.\n",
    "1/negbintail(67,2/3,64)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "SageMath 10.1",
   "language": "sage",
   "name": "sagemath"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}