Fat blockchain

December 30, 2011
By Amir Taaki (genjix)

Bitcoin’s story is the blockchain. A story of all the money flows since the system’s inception. Every spend is stored on every node’s hard-drive in this database as a chain of blocks.

Each person running bitcoin, has a copy of this global database / ledger. It is what enables your bitcoin software to verify that every spend is accounted for, and that no rules are being broken. You consent to the mathematical rules of the bitcoin system when you join, and likewise you enforce compliance from others. For this the software needs the books for doing the accounting.

This database is append only. The database can only be extended- never changed. When a new piece is created, it is joined to the end. Then subsequent new additions will build on the end of this piece. Each piece is colloquially termed a block. This chain of joined blocks is termed the blockchain.


A 4 block long blockchain

Each block contains a bunch of new transactions. If the first block contains a transaction saying “A sent 10 BTC to B” and the third block contains a transaction saying “B sent 2 BTC to C”, we know that B has (10 – 2) = 8 BTC and C has 2 BTC. We know this from examining the blockchain.

Blocks birthed from math

Creating new blocks is not trivial. A block is born from the solution to a mathematical riddle. This riddle is chosen collaboratively by the network according to a mathematical algorithm. When bitcoin nodes solve this riddle, they mint a new block, appending it to the end of the blockchain.

Nodes solving this riddle are called miners or generators, although the terms are misnomers; mining conjures visions of money being invented from nowhere; generators does not convey anything. A better term is validators. Each newly created block contains the new set of transactions that are accepted by the network. The creators of this block- the validators, gather up these transactions and validate they are correct in anticipation of the next block they are about to create.

When you download the blockchain, you must validate correctness of the blocks and the transactions within them:

  • Was this transaction already spent?
  • Does this spend have enough of a balance to cover it?
  • Are there any anomalies?

Actually blockchain is another misnomer! Blocktree would be more accurate as validators are able to build off earlier blocks (if they wish). However the chain which was hardest to compute (nearly always the longest chain) is the currently accepted history of events in the network.

In practice, the blockchain rarely branches and only ever 1 block deep. Although the longest branch was 6 blocks deep! This is why we casually call it a blockchain- it is a long chain with short stubby branches.

Computing a block involves solving the mathematical riddle. Solving this riddle means your computer is processing numbers. Processing numbers means you are using electricity. Electricity costs money.

As a reward for this valuable service of validating transactions, these validators get a subsidy of 50 BTC. This type of transaction is called a coinbase transaction.

Theory says that if:

50 * BTC exchange rate – cost of electricity > 0

Then you will make a profit.

There are other externalities like initial hardware costs and your time but we will ignore them here.

As more people try to join the network and solve these riddles and generate new blocks, the rate of new blocks appended to the blockchain will increase. The network reconvenes and decides to raise the difficulty for creating new blocks. By raising the difficulty, a validator has to expend more electricity to create a new block which lowers their profit. If blocks are being computed too slowly, then the network will lower the difficulty. Difficulty changes are done every 2016 blocks trying to aim for an ideal of a new block ever 10 minutes. 2016 * 10 minutes is around once every 2 weeks.

Every 2 weeks, the network reconvenes to decide on the new difficulty adjustment. Difficulty is adjusted to either slow or hasten generation of new blocks towards a goal of 10 minutes.

Rewriting history

Making a new block is not an easy task. That is why it is impossible to rewrite the blockchain. If I wish to reverse block 10 in a blockchain 20 blocks long, not only do I have to reverse block 10, but I must reverse block 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 expending all the power and money that went into creating those blocks. As it stands now, the total amount of computation in the bitcoin blockchain is immense and unlikely to ever be reversed.

Occasionally though, 2 validators will create a new block at the same time. We say they are in a race. It is now a gamble depending on which block the next block is built off. Which block is built off is down to mostly luck and externalities (like whether the right people discover the block at the right time).

The next block is critical. Mathematics shows us that the probability of a block being able to be reversed drops exponentially with each new block added to the end. If appending a new block decreases the chance of reversal to 50% then adding 2 blocks will decrease the chance to 50%*50% = 25%, 3 blocks by 50%*50%*50% = 12% and it very quickly tends towards 0 but never quite reaching it.

With the odds against him, if he doesn’t make a lucky lunge forward early on, his chances become vanishingly small as he falls further behind.
~Satoshi

For more information see Satoshi’s paper where he models new blocks using the gamblers risk of ruin with a Poisson distribution.

Linear vs polynomial

Blocks take time to be checked for correctness when we download then. That is why the blockchain is slow. Not because downloading blocks takes time. Downloading the entire blockchain can be done in a matter of minutes on a good internet connection. It is the checking that takes time. Checking to ensure we have not been cheated.

Transactions in new blocks are checked to ensure they connect up with an old transaction. This connecting of transactions is the biggest slowdown of all (the ConnectInputs(...) function).

When bitcoin tries to connect a transaction up with its predecessor, it must search for it in the blockchain. As the blockchain increases from 100 blocks, to 1000 blocks, so too does the number of potential blocks increase from 100 to 1000 that the predecessor transaction may exist in.

This is why the blockchain is slow. Searching the entire blockchain for a transaction only becomes worse as the blockchain grows. Luckily we have rising hardware specifications and programmer ingenuity to overcome this in the future.

With our simple model, we could say that for every new block added to the blockchain, the time to check that block increases linearly. As the time for checking one block increases linearly, the time for checking the entire blockchain increases polynomially.

9 Responses to Fat blockchain

  1. Hawkix on December 30, 2011 at 6:07 pm

    You are wrong about exponential time needed to check the entire blockchain. Actually, it is “just” quadratic in the number of blocks.

    • Amir Taaki (genjix) on December 30, 2011 at 7:05 pm

      You’re right that it isn’t exponential, but is polynomial growth for checking. I’ll correct that. Thanks.

      • Stefan Thomas (justmoon) on December 31, 2011 at 9:02 am

        Actually, I believe it’s linear growth, not polynomial – assuming the average block size stays the same. To check a transaction you have to do one or two hash table lookups (fetch the prior output, check if it’s spent) and hash table lookups are constant time O(1). Therefore verifying an input should be constant time as well and verifying a block should depend only on the number of inputs.*

        In practice: Yes, when your hash table no longer fits in memory, lookups will get slower and when it no longer fits on a single machine they will get slower again, but beyond that it’ll be constant. So if you have a 200 MB block and you want to verify it against a 1 petabyte chain that will take the same amount of time as verifying the same block against a 2 petabyte chain. The difference will be that you’ll need twice as many hard drives and servers in the second case, which is why block chain size still matters.

        * One caveat: hashForSignature is not constant time – it gets slower the larger the transaction gets. However we will have *more* transactions as we scale up not *larger* ones, so this doesn’t come into play.

        • Tiago on January 3, 2012 at 1:46 pm

          I don’t know how it’s implemented, but I have the impression that it is the indexing that should be blamed for the delay, not the look ups.

          I have the impression that the client tries to update the index on disk at each new block that comes. Inserting in an index is a costly operation, specially if done on disk.
          Perhaps if the database inserts during blockchain download could be made on large bulks, that would speed up things significantly. Easier said than done, of course.

          Can someone who knows the client better confirm if what I say is true?

          • mkrohn5 on January 6, 2012 at 9:25 pm

            Quote: “I thought people would be excited to hear that I finally got around to really investigating why the initial chain sync-up was so slow (at least for some users) and that I found an easily fixed problem.”

            Sounds interesting.

  2. Mark Oates on January 1, 2012 at 9:44 am

    Amir, you say “Luckily we have rising hardware specifications and programmer ingenuity to overcome this in the future.” but we also have rising bitcoin adoption and a rising number of transactions on top of that. In my view, the blockchain should be more of a concern.

  3. mkrohn5 on January 3, 2012 at 12:29 am

    Good article about the blockchain, thanks!

    * Link to Satoshi’s paper should be “bitcoin.pdf”
    * The subsection “Linear vs exponential” still includes the “exponential” reference

  4. cloudswrest on January 4, 2012 at 4:21 am

    Why not have, after every 10,000 blocks or so, a “totalization” block that sums or totalizes the balance of every bitcoin address that has a non-zero balance (discard addresses that total to zero balance). This block can then be timestamped and hash signed by miners. New regular blocks then continue on after, and are linked to, the totalization block. After some number, perhaps 100 or so, of new blocks the totalization block can be considered validated and all block chain info prior to the totalization block can be discarded. I’ll call it “regenesis”! :) A metaphor to think of is rocket staging on a trip to orbit. The dead weight of older stages are discarded. For block chain length competiton you could accept the chain with the earliest totalization block with the longest chain of following blocks (but not more than 10,000). I’m sure there are issues to work out, but it seems sound.

    • Etienne on January 7, 2012 at 5:19 am

      It does seem sound, and I’m sure many people have thought of this.
      Removing old transactions form the BC makes perfect sense. Besides, they won’t be lost.
      They simply won’t be carried around anymore.

Leave a Reply

Your email address will not be published. Required fields are marked *

*