Many descriptions of blockchain technology relate it to well-known data storage techniques. One of the more popular descriptions states that a blockchain is essentially a distributed ledger of transactions. This description is somewhat true but overly simplified. A blockchain does store transactions like a ledger and is distributed, but it contains far more interesting information. If a blockchain only stored transactions it wouldn’t be very interesting because it would be little more than a distributed spreadsheet.
A blockchain is far more than just a distributed spreadsheet — it contains an indelible record of the data’s current state (current values) and a complete historical record of how the data came to the current state. Traditional data repositories generally store only the final state of data. As you make changes, those changes overwrite any previous values. More sophisticated data repositories maintain audit records, which are generally external notes that record changes to data values.
Additionally, blockchain apps can create logging entries that document events that occur as smart contracts execute. The ability to record activities can provide a view of how data changes, not just the fact that it did change. And finally, each block in a blockchain stores information about the functions — and input parameters — in a smart contract that an application calls.
In this chapter, you learn about the different types of data available to you in a blockchain environment and how you can identify data that might be useful for analysis.
Exploring Blockchain Data
In this section, you discover what data gets stored in blocks on a blockchain. Although each blockchain implementation differs in its low-level details, the concepts are generally consistent across blockchain types.
Because the purpose of this book is to introduce you to the most important concepts of blockchain analytics, I won’t cover the specific technical details of every blockchain. Instead, you learn about the specific features of the most popular public blockchain implementation, Ethereum. If you don't use Ethereum, don’t worry — the concepts you learn here will apply easily to any other blockchain implementation.
The main difference between the most popular types of blockchain is the way in which they handle transactions. Bitcoin uses the Unspent Transaction Output (UTXO) model, in which each transaction spends output that was left over from previous transactions and then creates new output, including any remaining (unspent) balance. The other main approach to handling transactions is the Account/Balance model, which Ethereum uses. In the Account/Balance model, each account has a recorded balance, and transactions add to, or subtract from, that balance. The Account/Balance model is similar to a traditional ledger. I focus on Ethereum and the Account/Balance model in this book.
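To make the difference between the two models concrete, here is a minimal sketch in Python. It is a toy model, not how Bitcoin or Ethereum is actually implemented: the names Output, transfer_account, transfer_utxo, and balance_utxo are made up for illustration, and transaction fees and signatures are ignored.

```python
from dataclasses import dataclass


# --- Account/Balance model (the approach Ethereum uses) ---
def transfer_account(balances: dict, sender: str, receiver: str, amount: int) -> None:
    """Subtract from one recorded balance and add to another."""
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[receiver] = balances.get(receiver, 0) + amount


# --- UTXO model (the approach Bitcoin uses) ---
@dataclass(frozen=True)
class Output:
    """Hypothetical unspent output: an amount that one owner can spend."""
    owner: str
    amount: int


def transfer_utxo(unspent: list, sender: str, receiver: str, amount: int) -> list:
    """Consume whole outputs owned by sender, then create new outputs."""
    kept, total = [], 0
    for out in unspent:
        if out.owner == sender and total < amount:
            total += out.amount          # this output is spent in full
        else:
            kept.append(out)             # this output stays unspent
    if total < amount:
        raise ValueError("insufficient unspent outputs")
    kept.append(Output(receiver, amount))            # the payment
    if total > amount:
        kept.append(Output(sender, total - amount))  # the leftover (change)
    return kept


def balance_utxo(unspent: list, owner: str) -> int:
    """In the UTXO model, a balance is derived by adding up unspent outputs."""
    return sum(out.amount for out in unspent if out.owner == owner)
```

Notice that the Account/Balance version updates a stored number, whereas the UTXO version never stores a balance at all. You have to calculate it from the outputs that haven't been spent yet.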
Understanding what's stored in blockchain blocks
As mentioned in Chapter 2, a blockchain is just a specially constructed group of blocks that are linked, or chained, to one another. Each block header contains the hash of the previous block, forming the link that creates the chain. From many descriptions, it sounds like the blocks in a blockchain are pretty much the same as the data in a database or other data repository. However, that view isn’t accurate. A blockchain stores a lot more than just the values of data items, which is why blockchain analytics is so interesting. A lot of information is in a blockchain, but you need to know how to get to it.
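The following Python sketch shows how storing the previous block's hash in each header links blocks into a chain. It is a simplification under stated assumptions: it uses SHA-256 over JSON text, whereas Ethereum uses Keccak-256 and its own encoding, and the field names (number, previous_hash, tx_digest) and helper functions are invented for this example.

```python
import hashlib
import json


def digest(value) -> str:
    """Hash any JSON-serializable value deterministically (assumed scheme)."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(number: int, previous_hash: str, transactions: list) -> dict:
    """Build a block whose header points back to the previous block."""
    header = {
        "number": number,
        "previous_hash": previous_hash,
        "tx_digest": digest(transactions),  # ties the header to its transactions
    }
    return {"header": header, "transactions": transactions}


def chain_is_valid(chain: list) -> bool:
    """Check every block's transactions and its link to the prior block."""
    for i, block in enumerate(chain):
        if block["header"]["tx_digest"] != digest(block["transactions"]):
            return False
        if i > 0:
            expected = digest(chain[i - 1]["header"])
            if block["header"]["previous_hash"] != expected:
                return False
    return True


# Build a three-block chain; the first block has no predecessor.
genesis = make_block(0, "0" * 64, [])
block1 = make_block(1, digest(genesis["header"]), [{"from": "a", "to": "b", "value": 5}])
block2 = make_block(2, digest(block1["header"]), [{"from": "b", "to": "c", "value": 2}])
chain = [genesis, block1, block2]
```

If anyone changes a transaction or a header in an earlier block, the hashes no longer match and chain_is_valid returns False. That property is what makes the historical record so hard to alter.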
Each block consists of some header information and a collection of transactions. In most blockchain implementations, miners select the transactions they want to include in blocks. In Ethereum, miners select transactions based on the potential payoff, and the first miner to mine a block gets to add it, along with its chosen transactions, to the chain.
Other blockchain implementations use different methods to create blocks. Hyperledger Fabric, for example, uses orderer nodes instead of miners. Because Hyperledger Fabric uses a different consensus mechanism, it doesn’t rely on competing miners to create valid blocks. Hyperledger Fabric is built on a modular design that makes replacing components, including the consensus mechanism, easy. Hyperledger Fabric uses a consensus mechanism called Kafka by default, but you can change that if desired. Kafka depends on current nodes electing a leader, and that leader has the authority to build blocks of transactions.
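Selecting transactions by potential payoff can be pictured as a simple greedy choice. The Python sketch below is only an illustration, not the algorithm any real Ethereum client uses. It assumes that the payoff is the gas price a sender offers and that a block has a fixed gas limit. PendingTx and select_transactions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PendingTx:
    """Hypothetical pending transaction waiting to be included in a block."""
    tx_id: str
    gas_used: int    # assumed amount of gas the transaction consumes
    gas_price: int   # price the sender offers per unit of gas


def select_transactions(pending: list, block_gas_limit: int) -> list:
    """Pick the best-paying transactions that still fit in the block."""
    chosen, gas_total = [], 0
    # Consider the highest offered gas price first.
    for tx in sorted(pending, key=lambda t: t.gas_price, reverse=True):
        if gas_total + tx.gas_used <= block_gas_limit:
            chosen.append(tx)
            gas_total += tx.gas_used
    return chosen


pending = [
    PendingTx("a", gas_used=21_000, gas_price=5),
    PendingTx("b", gas_used=21_000, gas_price=50),
    PendingTx("c", gas_used=100_000, gas_price=20),
]
selected = select_transactions(pending, block_gas_limit=130_000)
```

In this made-up example, the lowest-paying transaction is left out because the block runs out of room before the miner gets to it.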
Recording transaction data
Regardless of the approach used to create new blocks, blocks generally contain transactions or smart contract code. Because blockchain technology was introduced to manage cryptocurrency, it stands to reason that transaction data focuses on transferring ownership from one address to another. In this section, you look at a block to see its header information and a list of transactions.
Etherscan is a popular website that allows you to examine the live Ethereum network, mainnet. Figure 3-1 shows a portion of Etherscan’s block header view. The block we will examine is block number 8976776. Note that this block contains 95 transactions.
FIGURE 3-1: Viewing block header information in Etherscan.
To find block 8976776 in Etherscan, go to https://etherscan.io/ and enter the block number in the All Filters field. Then click or tap the search icon (magnifying glass).
Etherscan does much more than provide a way to peek at data on Ethereum’s mainnet. You can examine and retrieve data from mainnet; popular testnets including Ropsten, Kovan, Rinkeby, and Goerli; and the Energy Web Foundation (EWF) chain. If you create an account and request a free API key, you can use the key to extract blockchain data.
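As a rough idea of what extracting data with an API key might look like, the following Python sketch builds a request for block 8976776. Treat the details as assumptions: the base URL and the parameter names (module, action, tag, boolean, apikey) follow the Etherscan proxy-style interface as I understand it, and newer versions of the API may need a different URL or extra parameters. The key YourApiKeyToken is a placeholder, and block_request_url and fetch_block are names invented for this example. Check the Etherscan API documentation before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm it against the current Etherscan documentation.
API_BASE = "https://api.etherscan.io/api"


def block_request_url(block_number: int, api_key: str) -> str:
    """Build the URL that asks for one block, including its transactions."""
    params = {
        "module": "proxy",
        "action": "eth_getBlockByNumber",
        "tag": hex(block_number),   # the block number is sent in hexadecimal
        "boolean": "true",          # ask for full transaction objects
        "apikey": api_key,
    }
    return f"{API_BASE}?{urlencode(params)}"


def fetch_block(block_number: int, api_key: str) -> dict:
    """Send the request and return the block data (requires network access)."""
    with urlopen(block_request_url(block_number, api_key), timeout=30) as response:
        return json.load(response)["result"]


url = block_request_url(8976776, "YourApiKeyToken")
```

Calling fetch_block(8976776, your_key) with a real key should return the same block you see in Figure 3-1, but as data that your own code can analyze.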
To see a list of transactions in block 8976776, click or tap the 95 Transactions link. Figure 3-2 shows the first five transactions in block 8976776. You can see that each transaction has a From account, a To account, and an amount. In simplest terms, each transaction records the amount shown in the Value column being transferred from one Ethereum account to another.
FIGURE 3-2: Listing transactions in a block in Etherscan.
Click or tap the fourth transaction in Figure 3-2 to open the Etherscan transaction details page shown in Figure 3-3. This initial page shows general information about the Ethereum transaction. The To field shows that the target address is Contract, which means that this transaction is the result of a call to a smart contract.
FIGURE 3-3: Examining a transaction in Etherscan.
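When a transaction calls a smart contract, the transaction's input data identifies the function being called and carries its input parameters. Under Ethereum's standard encoding, the first 4 bytes select the function, and simple parameters follow as 32-byte words. The Python sketch below splits input data along those lines. The sample input is made up for illustration (it is not taken from the transaction in Figure 3-3), split_input_data is a hypothetical helper, and the sketch handles only fixed-size parameters.

```python
def split_input_data(input_hex: str):
    """Split input data into a 4-byte function selector and 32-byte words."""
    data = input_hex[2:] if input_hex.startswith("0x") else input_hex
    selector = "0x" + data[:8]     # 4 bytes = 8 hexadecimal characters
    body = data[8:]
    words = [body[i:i + 64] for i in range(0, len(body), 64)]  # 32 bytes each
    return selector, words


# Made-up input: a selector, an address padded to 32 bytes, and an amount.
sample_input = (
    "0xa9059cbb"             # selector commonly seen for transfer(address,uint256)
    + "0" * 24 + "ab" * 20   # 20-byte address, left-padded with zeros
    + format(1000, "064x")   # the number 1000 as a 32-byte word
)

selector, words = split_input_data(sample_input)
recipient = "0x" + words[0][-40:]   # the last 20 bytes hold the address
amount = int(words[1], 16)
```

A plain transfer between two accounts has empty input data, so the list of words comes back empty.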
In Ethereum, the only way you can access data stored in the blockchain is through a smart contract. You use the smart contract’s address (where the smart contract code is stored in the blockchain) to run, or invoke, one of its functions. Smart contract functions contain the instructions for accessing blockchain data. Click