Video - Bitcoin 101 - Merkle Roots and Merkle Trees - Bitcoin Coding and Software - The Block Header

Most people on earth have never even heard of Merkle roots. But bitcoin programmers deal with them every day. This is old school technology in terms of software, but still very important in terms of security and data management. In this video James will walk you into the world of Merkle roots and Ralph Merkle, who invented this technology in the late 1980s. If you like seeing actual coding examples, your in luck. James starts incorporating actual python examples...after all, would you want to listen to Car Talk and not want to hear about a muffler?

TRANSCRIPT

Hello this is James D'Angelo. And welcome to the Bitcoin 101 Blackboard Series. Today we're taking a look at this doohickey right here. And how it ends up making this doohickey right here. And how in every block in Bitcoin you'll see this doohickey right here. And this thing right here is called the Merkle roots. So, we're going kind of technical, little bit low level. And we're going to even look at some programming examples, some quick Python code to try and understand what exactly is a Merkle root. A little bit about why it's used? And even how to generate the exact Merkle root for a specific block, okay. And so, you might have already understood that this right here is a picture of a Merkle tree. And these are considered the leaves. And all the way down here are the branches and this right here is the Merkle root. And this number is one of the things that protects the integrity of Bitcoin transactions.

Okay so, let's rewind for a second and understand better what we're looking at. Right here we're on blockexplorer.com which is a great site to see all the details there inside of every block on the blockchain. So, every 10 minutes a new block with all the transactions of the previous 10 minutes gets packed together and stuffed on top of the blockchain. And each particular block has a number. This particular block is 286,819. And what that means is there have been 286,819 blocks before this back to when Satoshi Nakamoto first released Bitcoin in January of 2009. And he started mining probably all by himself when he started mining block one. Okay so, we're looking at the block header on blockexplorer. And you can type in the number for any block and actually see all the transactions that happen inside it. So, if you went to blockexplorer and just type the number one. It would show you the first transactions ever made with Bitcoin which is really just a Coinbase transaction to whoever was mining most likely Satoshi himself. Now, let's just sort of run through the rest of this header. So, we can kind of understand what we're dealing with. And then we'll go further into this whole idea of Merkle roots okay.

So, here we have the hash. This is the hash of the block that expresses all the difficulty it took to mine it. And this is going to be something like 16 or 17 zeros. And then just sort of some random numbers afterwards. And this hash is like a signature for this particular block. In the next block 286,820 will have to include this number inside to prove that it actually came after the previous block. And this is how a timestamp server works. It uses unique numbers that have to be included in the following block, okay. And so, here we've actually got the previous block. So, this is the hash of block 286,818, okay. And though this number wasn't known at the time. Here is actually the hash of the following block, okay. So, if you look inside them actual details the raw block which you can actually see right here. Just by clicking this you'll see the JSON interpretation of the block. It won't have the next block included in it. And so, this is the hash of block 286,818 okay. And this color is not great. So, let's move to another one. And this is the time okay. So, this is the time approximately when this block was made. So, that was February 20th, 2014 at 4:57 in the morning.

And this next number is the difficulty. This is this number that tells you how much computing power you're going to need to generate this particular block in approximately 10 minutes. And right now, you need something like thirty peta hashes. And will actually compute the difficulty from these number and other videos and all that. We're going to actually look at all these items in different videos. Here's a number of transactions of this particular block. So, in these 10 minutes in this approximately 10-minute time there were 99 transactions okay. And we're going to look at the whole list of the transactions. So, in a second, during those 10 minutes 3,925 Bitcoins were transferred. At the current rate of around $600, it's pretty good amount of money that's flying around okay. The size, and this is very important to people who are storing the entire blockchain. The size of this particular block is 152 kilobytes. Okay so, it sounds really small. It's actually smaller than most photos file size are on Facebook. But every 10 minutes adding this kind of size accumulates.

And right now, after five years of generating blocks. We've got a blockchain which is now 17 gigabytes of information. So, it's getting pretty beefy and there's a lot of talk about how to prune the blockchain. And they'll eventually succeed right. Most computers will be able to just hold unspent outputs and that will drop the amount really low. And then if you have unspent outputs that you can spend. So, you're not including all the one Satoshi transactions. You could actually probably drop a very useful blockchain down in the order of megabytes probably maybe 50 megabytes or less. But that's something for the future. Okay right now if you want to control and look at the blockchain. You're going to be talking 17 gigs of information. And then we get to our friend the Merkle group. And this is to most people the most confusing one of all. Because it's got this really weird name. But we're going to jump past it really quickly and just kind of understand the rest of this block.

The next number is the nonce. The nonce we'll also get a lot of attention when we go into the details of mining. But here's the nonce. This is a particular number that the miners are generating. So that when they calculate the SHA256 so this block header that they'll end up with this SHA256 right here. So, they're just cranking through Nonces until they get the right key to solve this block. And so, the nonce is basically that lucky combination that allowed this miner to end up with the 25 bitcoins associated with this block. And what the particular miner did was probably start at some other number. I don't know, 85 million or something like that. And just started counting up until he reached this number. And if you do it faster than anyone else. You can win the bitcoins. So, raw block is not part of the block header. This is just a -- a thing you can click on blockexplorer. So, let's take a look at that.

Okay so here we are at the live page for this particular block on blockexplorer.com. It's just a great site for looking at really really sort of gritty details of Bitcoin. So, here's all the transactions. Okay all 99 transactions. And each of these transactions it turns out has a transaction ID. And that transaction ID is right here. So, if I click on this particular transaction ID, for example, I'm going to get the information on this specific transaction. So, what address it came from? What address it went to? Okay so here's a, "To address" here's another, "To address". So, one of these is the change address. One of them is the real recipient. And here's the, "From address" that's one transaction. So, remember this particular block had 99. It shows the date and time of this particular transaction. Everything you could possibly want to know except for the people who actually received and sent the money. Remember Bitcoin provides a level of anonymity, okay.

And just for fun let's jump back and take a look at another transaction. And here we go. And this transaction has this transaction hash very important for when we're talking about Merkle roots. And inside the transaction, we've got the "From address", the "To address" and the amounts being spent. Okay and so here we're talking something underneath a Bitcoin. So, likely what was sent was this amount here. And the person who had the Bitcoins this amount sent that back to themselves in the form of change. But again, it's really important to realize that every transaction has a hash. So, they take all the transaction info right in the raw form. And they actually take it in the binary form. But it looks you know something like this a lot of code. They hash that without obviously the hash number. At top they hash all of this in binary form. And end up with this it's a SHA256 hash which we'll talk about in another video. Each transaction every single transaction in Bitcoin will have a unique transaction ID hash. And so, this particular one has 4668, okay. And every single one will have some other bizarre transaction ID. Okay and here's another transaction. It's got the hash of the transaction right here BADC. And it's taking inputs for many different places that are all sign able by the person who's signing off with this transaction and sending them as 42 Bitcoins to this one particular address on the specific time okay. So, everything can be tracked.

Okay so, now let's get a little more busy with the idea of this particular Merkle root. And we're going to try and actually calculate this particular one by using the information from all these transactions. So let's get a little background on exactly what we're doing by kind of drawing this up on the blackboard. Okay so, we're talking about how to generate the Merkle root and how to build a Merkle tree. Okay and as we saw every single transaction had a particular ID a transaction hash. And so, what a Merkle tree is created out of those transaction ID's those hashes. So, say you have very simple, an eight-transaction block okay. And so, each of those transactions are going to have their transaction ID. And Bitcoin references just those ID's as it's making the Merkle tree. So, here's transaction ID one. Here's transaction ID two, three, four, five, six, seven, eight. Now, a Merkle tree what it does is it takes those two hashes of transaction one and transaction ID two and it hashes them together. So, it just concatenates the two long 64-character hex transaction ID's right that we saw right here okay. So, it takes the 64-character hex thing adds to it the next hash right over here. And then takes a hash of both of them together. So, it ends up with let's just call this hash one okay.

And then it does the same thing with the next two. So, let's call this hash two. And it does it with five and six. And you're getting the idea hash three and seven and eight, hash four. And it calls these things at the top. These top transactions are referred to as Merkle tree leaves okay. Because the leaves are at the very end the branches. And the root is at the very base okay. So, then what you do is you go down one-level. And then you take the first-two from the next layer of branches and it hashes those together. And I don't know, let's call this B hash one and hashing the next two together let's call this B hash two. And you get sort of another level of your branches, okay. Here you have two branches. And here you kind of are getting down to one. Then all you have to do is hash those two together. And you end up with what's known as the Merkle root.

And really that's it. That's the whole idea behind Merkles. Of course, there's some big endian issues. Little endian issues and when you actually go to program it. Gets kind of fun and complex and we're going to take a look at some of that in a second. But what a lot of people are wondering is probably where did this freaky name come from?

Well, that's pretty easy. Here we got a guy named Ralph Merkle, okay. Born February 2nd, 1952 and he invented this way of storing data in a way that's sort of provable and secure okay. And here he is on his own home page right here's Ralph Merkle. And so, Ralph came up with this idea in 1987 in a paper called a digital signature based on a conventional encryption function. And he comes up with the idea of a Merkle tree. Now in his paper because he's introducing the idea. You won't find any reference to a Merkle tree anywhere okay. You'll just find his name because he wrote the paper. Well it became such an important thing and they didn't have a very convenient name for it. So, they decided to call it a Merkle tree and use these things called Merkle roots. Merkle trees and Merkle roots are used in a variety of applications. And again, they're used in every block in Bitcoin. And they actually are a savings of computational power. And the amount that needs to be sent out by mining pools. Because instead of projecting or sending all the transactions or rehashing all the transactions every time you hash the particular block you can just hash the block header. So, you're just using that one tiny little Merkle string.

So, it turns out to be a huge computational savings for Bitcoin. It is other ways in which it saves computational power especially as we look to decrease the size the blockchain as we're dealing and looking for individual transactions. Now, a lot of this will get clear if we start to look into how a Merkle tree is actually encoded? So, here we've got a piece of simple software. And it was actually written and posted by a guy named Ken Shirriff who's got this amazing blog. Okay this is a guy who's just recently jumped into Bitcoin in a big way. And as some of the most beautiful posts for understanding Bitcoin, okay. So, please take a look at his blog. And at the very bottom of this most recent post okay.

So, this is from February 23rd. If you go to the very bottom. At the very bottom of his post in the notes and references section. He includes some code about how to generate a Merkle tree okay. And you can just click view raw and you can take his wonderful little piece of code. And just dump that into a simple Python interpreter okay. And that's exactly what we did. And so, here's our code over here by Ken Shiriff. And these down here are all 99 transaction IDs. So, these are all the individual hashes that we were seeing in the previous blockexplorer page and just to hammer that home. Let's go back to the blockexplorer. And we see that the first transaction ID is 00BAF6626. And if we look at Ken's code over here. We see 00BAF266. The next one is 91C75. So, this is taking all the transaction IDs. And just putting them in one little file. And then with just these few lines of Python code. He's going to generate the Merkle root. So, we can just take this right here. And if you use Idle or Python 2.76 you can just hit your run key. And bam you end up with this number right here, which turns out to be the actual Merkle root of this block.

Okay well, that seems to work pretty good. But let's actually dive into this code. And understand a little better what's going on okay. Like I said we've got all these transaction ID's. And the amazing thing with how SHA256 works is that if I run this code as many times as I want. It's always going to calculate the same fingerprint. So, let's run it again. Boom, exact same. Ok run it again. Boom the exact same 64 characters of hex which make the Merkle root for this block. But if I take any one of these. And just change one letter anywhere. So, let's take the seven right here. And turn it into a nine, okay. Then I hit save and now I run this, end up with a completely different hash okay. And that right there is a really good example of how amazing SHA is, okay. This is a unique identifier that can only be generated by this collection of transactions. And even changing just one letter will end up in this particular fingerprint. Those one that starts with a 2FA. And now every time if I run it. It will generate the exact same 2FA fingerprint, okay.

So, hopefully I can undo the edit I made and get back to the original version. Yep there's my seven, okay. And now I save again. And just by changing one letter. I'll get back to my original hash that I was looking for. So, let's take a better look at this code. And see how it's putting these things together okay. And some of the key lines here that'll make sense or you see this hash Lib SHA256 of two different items okay. So, it's doing what we talked about on the Merkle tree. This particular line is taking any two of these and dropping them into one hash value. So, that same piece of code will work over here. Or will work even down here, okay. So, this piece of code is actually the key code for generating the SHA256. And in fact, it's the only point in the code where you see SHA256 mentioned. So, just quickly running down the code. We see if the length of this hash list okay. And this is right here which is 99 when you start. If it's equal to one. Just return the first value hash lists zero okay. So, if it's the only transaction and the thing you just actually return the hash of that particular transaction. If not let's create this new string of hashes okay.

Now what it's going to do and if you're a little familiar with Python. It says, "for I in the range of zero. So, starting at the very first transaction. To the entire length of this hash list which in this case is 99 alternating by every two grab the first two and now hash them together. Grab the next two and hash them together. So, you'll see that right here. Append to this new list. The hash, this function right here. Which is actually this function down here which does our hashing. Stick these two guys together and hash them together. And so what you're going to do is go through this entire list of transactions and hash together the pairs just like we talked about. So, here's the first two. Here's the second two. Here's the third all that. And they'll generate because there's 99 transactions. Some of them generate a much more leafy and much taller tree. And we're going to run through this code really quickly. And then we're going to add some instructions to this code to sort of tease out some of the more interesting aspects of it, okay.

Now in this line right here. Basically, says if there's one transaction left over okay. So, if you have an odd number. And in this case, we have 99. Then you take the last transaction. Transaction number 99 and you hash it to itself okay. Because Merkle trees need a new hash for every level. You can't just keep passing along the transaction ID that's stringing out at the end. You need new hashes okay. And then when you're done. When you've taken those 99 and chop them down to what 50 you run the whole thing over again with the 50 that you just generated okay. And so you end up with round two until 50 goes to 25. 25 goes to I don't know 13 because of the odd number goes to 13. 13 goes to 7. And you end up with a bunch of recursions, okay. So, to better understand it. I've added a number of print statements in the middle of Ken's code that will tease out some of the ways this hashing works okay. And we're just going to run this with a much smaller set of transactions. We're going to actually rename our transactions to really tiny things. So, you'll see the very beginning ones just be small. And then you'll see the SHA256 coming in.

And for the purposes of this demonstration. We don't even need those transactions okay. So, our transaction list is right here. And what is that? There's 10 of them there and so we're going to run the code with a much simpler set of transactions. So, let's say go. Okay and you see a lot of my print statements are generating some information. And so at the bottom, we get a Merkle root which is not the Merkle root of the blog we were looking at. But it's actually the Merkle root of these 10 transactions with really tiny transaction ID names that I just kind of put in okay. So, the bottom is our Merkle root. Which is always going to be 64 hex characters. Because we're doing a SHA256 Hash okay. SHA256 will always be 64 characters. No more no less and they will always be hex characters okay. So, it generates 256 bits of information okay and hex is obviously much shorter. So, you get 64 characters okay.

But what we see is that we basically get a number of rounds or a number of levels on our tree okay. And if we go to the top. We see that our tree is much bigger. And then as we go down each section our tree gets smaller. And each time it gets smaller and smaller. Until we end up with one Merkle root. And so really what we're doing is exactly what we did in this drawing. Where at the beginning we're saying this is aa. This is bb, cc, dd, ee. And then I did 11; 22; 33 and it continued over here. So. if we look at our output over here. Branch one is aa. So, it's the first transaction that we're dealing with that's the first well it's truly a leaf in this case because it's at the beginning but we'll just call it a branch for now. Branch two is bb. Hashing A and B together okay. so, if you just took the letters aa no space bb and stuff that into a SHA256 hash. You would end up with this. Of course, there is some little-endian and big-endian stuff. So, you have to reverse those bytes in there. But you end up with this. And so if you're a coder you can see some of this little-endian and big-endian dealing with that right here okay. These are basically just flipping around the bite order okay. So, Bitcoin has a lot of reversals of big-endian and little-endian kind of a pain in the butt. But that's the first hash. So, this hash right here corresponds to this hash right here. And that's exactly which one it is okay.

Then branch three and branch four ends up with this hash. So, this one 18AC would actually be right here 18AC blah-blah-blah. Because here's three. Here's four cc dd get hash down into here okay. And so, what it does is it goes through all these transactions right here and ends up with a new list. And the new list starts with this hash. Then this hash then this hash. This one and this one. So, it ends up with one, two, three, four, five. The next round it takes those particular hashes and hashes them. So, it says branch one is 3392 which we get right from here. And branch two is 18AC and we get that from here and hashes those two together. And it gets the next level of branching. And of course, this starts to get pretty obvious.

And so hopefully with this you begin to understand how Merkle roots are generated inside of the block header okay. And they've got lots of great uses. And there's a lot of great reasons for them being there. There's even lots of discussion of adding more levels of Merkle roots and Merkle trees inside the transactions themselves. That's likely to happen. So, it's really important to understand how these things work. And not be afraid of them because they've got some guys name Ralph Merkle who wrote some paper in 1987, okay.

So, that's it for now. Just a taste for Merkle roots. We'll be talking much more about this stuff and many other videos. Hope you enjoy. Please remember to like, comment, subscribe. Do all those things you do. And we'll catch you at the next video.

Written by James DeAngelo on March 1, 2014.