Given that the block tree database (chain.dat) and the active chain
database (coins.dat) are entirely separate now, it becomes legal to
swap one with another instance without affecting the other.
This commit introduces a check in the startup code that detects the
presence of a better chain in chain.dat that has not been activated
yet, and does so efficiently (in batch, while reusing the blk???.dat
files).
To prevent excessive copying of CCoins in and out of the CCoinsView
implementations, introduce a GetCoins() function in CCoinsViewCache
with returns a direct reference. The block validation and connection
logic is updated to require caching CCoinsViews, and exploits the
GetCoins() function heavily.
Use CBlock's vMerkleTree to cache transaction hashes, and pass them
along as argument in more function calls. During initial block download,
this results in every transaction's hash to be only computed once.
During the initial block download (or -loadblock), delay connection
of new blocks a bit, and perform them in a single action. This reduces
the load on the database engine, as subsequent blocks often update an
earlier block's transaction already.
This switches bitcoin's transaction/block verification logic to use a
"coin database", which contains all unredeemed transaction output scripts,
amounts and heights.
The name ultraprune comes from the fact that instead of a full transaction
index, we only (need to) keep an index with unspent outputs. For now, the
blocks themselves are kept as usual, although they are only necessary for
serving, rescanning and reorganizing.
The basic datastructures are CCoins (representing the coins of a single
transaction), and CCoinsView (representing a state of the coins database).
There are several implementations for CCoinsView. A dummy, one backed by
the coins database (coins.dat), one backed by the memory pool, and one
that adds a cache on top of it. FetchInputs, ConnectInputs, ConnectBlock,
DisconnectBlock, ... now operate on a generic CCoinsView.
The block switching logic now builds a single cached CCoinsView with
changes to be committed to the database before any changes are made.
This means no uncommitted changes are ever read from the database, and
should ease the transition to another database layer which does not
support transactions (but does support atomic writes), like LevelDB.
For the getrawtransaction() RPC call, access to a txid-to-disk index
would be preferable. As this index is not necessary or even useful
for any other part of the implementation, it is not provided. Instead,
getrawtransaction() uses the coin database to find the block height,
and then scans that block to find the requested transaction. This is
slow, but should suffice for debug purposes.
Introduce a AllocateFileRange() function in util, which wipes or
at least allocates a given range of a file. It can be overriden
by more efficient OS-dependent versions if necessary.
Block and undo files are now allocated in chunks of 16 and 1 MiB,
respectively.
Change the block storage layer again, this time with multiple files
per block, but tracked by txindex.dat database entries. The file
format is exactly the same as the earlier blk00001.dat, but with
smaller files (128 MiB for now).
The database entries track how many bytes each block file already
uses, how many blocks are in it, which range of heights is present
and which range of dates.
The CTxUndo class encapsulates data necessary to undo the effects of
a transaction on the txout set, namely the previous outputs consumed
by it (script + amount), and potentially transaction meta-data when
it is spent entirely.
The CCoins class represents a pruned set of transaction outputs from
a given transaction. It only retains information about its height in
the block chain, whether it was a coinbase transaction, and its
unspent outputs (script + amount).
It has a custom serializer that has very low redundancy.
Special serializer/deserializer for amount values. It is optimized for
values which have few non-zero digits in decimal representation. Most
amounts currently in the txout set take only 1 or 2 bytes to
represent.
Special serializers for script which detect common cases and encode
them much more efficiently. 3 special cases are defined:
* Pay to pubkey hash (encoded as 21 bytes)
* Pay to script hash (encoded as 21 bytes)
* Pay to pubkey starting with 0x02, 0x03 or 0x04 (encoded as 33 bytes)
Other scripts up to 121 bytes require 1 byte + script length. Above
that, scripts up to 16505 bytes require 2 bytes + script length.
Variable-length integers: bytes are a MSB base-128 encoding of the number.
The high bit in each byte signifies whether another digit follows. To make
the encoding is one-to-one, one is subtracted from all but the last digit.
Thus, the byte sequence a[] with length len, where all but the last byte
has bit 128 set, encodes the number:
(a[len-1] & 0x7F) + sum(i=1..len-1, 128^i*((a[len-i-1] & 0x7F)+1))
Properties:
* Very small (0-127: 1 byte, 128-16511: 2 bytes, 16512-2113663: 3 bytes)
* Every integer has exactly one encoding
* Encoding does not depend on size of original integer type
This reverts commit 199d88cf90, reversing
changes made to 65bc1573e7.
License is worse instead of better. Will only accept public domain and
MIT-licensed icons from now on.