Let me start this post with one of the first things I said when I created the Cryptography and Math category: "The subject (cryptography) is interesting to me but I am not a cryptographer and I don't have the time and computing power to be one. Having said that I have decided to put some of my ideas and screwball calculations on this blog"
The main goal for me here is to learn Visual Basic for Excel and because I am interested in math and cryptography this seems like a good way to do it. I want to restate that because I don't want to get a bunch of comments and nasty emails from people pointing out that "You don't really know what you are doing" and the most obvious one "Why design your own when there are so many free open source crypto-systems out there??" - Answer: I'm doing this for fun and to learn more about tricky Excel trickery.
The main goal for me here is to learn Visual Basic for Excel and because I am interested in math and cryptography this seems like a good way to do it. I want to restate that because I don't want to get a bunch of comments and nasty emails from people pointing out that "You don't really know what you are doing" and the most obvious one "Why design your own when there are so many free open source crypto-systems out there??" - Answer: I'm doing this for fun and to learn more about tricky Excel trickery.
OK now that I got that out of the way....
Sawtooth is possibly a component of an algorithm and the idea behind it is only to provide Cryptographic Diffusion. Click on that link for a nice Wikipedia page describing what that is if you are interested. The design goals that I set out for this and later modified slightly (are now also being modified again) are:
f) There should be (ideally) a 1 to 1 mapping from any input string of bytes to a unique output string of bytes. In other words two different inputs shouldn't produce the same output.
a) If you change any one of the Input bytes by even 1 bit, all the Output bytes should change
b) In addition to all the Output bytes changing (with a 1 bit input change) about half of the 64 Output bits should change
c) Ideally about half of the bits in each Output byte should change (4 bits of the 8 bits in each byte)
d) If all the Input bytes are 0's then the Output bytes should not all be 0's
e) Keep it really simple because I want to do all the tricky stuff in Microsoft Excelf) There should be (ideally) a 1 to 1 mapping from any input string of bytes to a unique output string of bytes. In other words two different inputs shouldn't produce the same output.
g) It would be nice if the input string size could be varied to any length, not set at 64 bits (8 bytes)
Another good thing for me to mention is that I have no intention of trying to figure out an inverse of whatever I come up with here. If I decide to pursue this and use it in some kind of crypto-system it would be implemented in something like a Feistel type algorithm. Click on that link for another nice Wikipedia page for more info if you are interested. I may or may not bother to do that depending on how much fun it is.
So back to what I was talking about... Over the last month or so I have been messing around with lots and lots of spreadsheets and different ideas on how to accomplish the above stated design goals, and modifying the goals as I feel like it. What I have come up with is what I think I'm going to call the "final" design for this until at which time I decide to revise it.
For the Sawtooth Forward and Sawtooth Reverse functions I was using some basic math like addition and multiplication and I experimented with various combination's of things. Overall I wanted to keep these calculations as simple as possible and so what I finally decided to do was use an idea based on the 'dreaded' Linear Congruential Generator or LCG for short (again a link to Wikipedia for more info about LCG's). I say 'dreaded' because LCG's have been used over the years for generating pseudo-random numbers and they tend not to do that very well. As a side note I wonder why anyone would think a function with the word 'linear' in it's name would be good at generating random numbers... but that is off topic.
I'll add a little explanation of why I choose the numbers that I did because any time you use a numeric constant someone always asks "Why did you use that specific number and not some other?" There are several kinds of LCG's but the most straightforward and simple is something called the Multiplicative Linear Congruential Generator (or MLCG) and is the recurrence of this:
Xn = A * Xn-1 mod M
Where:
Xn is the new number
Xn-1 is the previous number in the sequence
A is a multiplier
M is the Modulus
Without going into too much detail because there is tons of detail on the web like THIS (click the link for more than you ever wanted to know) The M number is best if it is a prime number and the A number is a 'generator number'. That is not a requirement, but for what I am doing I want M to be a prime number. One drawback of using a prime number is it's slower for computers to use than a even power of 2 number. I tried a bunch of those and they didn't work as good as a prime number seems to. I'm doing this for fun and prime numbers are fun so that is what I'll use! I decided to pick a prime M number that was as close to 16 bits in length as possible because 16 is a nice computer bus width friendly number and that number is: 65521. So M = 65521 for what I am doing.
For any M number there are lot's of possible A numbers and each has it's own pro's and con's. I decided to use 2 different numbers for A. One for the Saw Forward function and the other for Saw Reverse. The reason I chose the numbers that I did was because one is good at some random number tests that the other one isn't. Without going on any more about this, here is the A value for Saw Forward = 47104 and A for Saw Reverse = 32236.
I should make a point here that I'm not using the recursive form of the formula above so these numbers don't matter that much and more importantly I'm not trying to generate a sequence of random numbers. I just wanted numbers that won't 'lock up' and form patterns too easily when running through all the calculations. I picked the numbers that I did because random or pseudo-random numbers are not a bad thing for what I am doing and the numbers I choose have been studied and found to be good numbers to use. Even if I'm not using them in a traditional LCG formula. LCG's don't work by themselves for cryptographic purposes but as a (small) part of one they are OK.
I'll add a little explanation of why I choose the numbers that I did because any time you use a numeric constant someone always asks "Why did you use that specific number and not some other?" There are several kinds of LCG's but the most straightforward and simple is something called the Multiplicative Linear Congruential Generator (or MLCG) and is the recurrence of this:
Xn = A * Xn-1 mod M
Where:
Xn is the new number
Xn-1 is the previous number in the sequence
A is a multiplier
M is the Modulus
Without going into too much detail because there is tons of detail on the web like THIS (click the link for more than you ever wanted to know) The M number is best if it is a prime number and the A number is a 'generator number'. That is not a requirement, but for what I am doing I want M to be a prime number. One drawback of using a prime number is it's slower for computers to use than a even power of 2 number. I tried a bunch of those and they didn't work as good as a prime number seems to. I'm doing this for fun and prime numbers are fun so that is what I'll use! I decided to pick a prime M number that was as close to 16 bits in length as possible because 16 is a nice computer bus width friendly number and that number is: 65521. So M = 65521 for what I am doing.
For any M number there are lot's of possible A numbers and each has it's own pro's and con's. I decided to use 2 different numbers for A. One for the Saw Forward function and the other for Saw Reverse. The reason I chose the numbers that I did was because one is good at some random number tests that the other one isn't. Without going on any more about this, here is the A value for Saw Forward = 47104 and A for Saw Reverse = 32236.
I should make a point here that I'm not using the recursive form of the formula above so these numbers don't matter that much and more importantly I'm not trying to generate a sequence of random numbers. I just wanted numbers that won't 'lock up' and form patterns too easily when running through all the calculations. I picked the numbers that I did because random or pseudo-random numbers are not a bad thing for what I am doing and the numbers I choose have been studied and found to be good numbers to use. Even if I'm not using them in a traditional LCG formula. LCG's don't work by themselves for cryptographic purposes but as a (small) part of one they are OK.
As in the last iteration of the design I decided to use a Pseudo-Hadamard Transform or PHT for short. Click on that link for another nice Wikipedia page describing what PHT is if you are interested. It is a pretty easy way to mix two numbers together. Again I won't go into any in depth explanation of why but here is specifically what I used:
O' = X + Y mod 256
O" = 2*X + Y mod 256
So if X and Y are the two numbers you want to mix to get the two output numbers (O' and O") then you just do the above math. Because the Saw Forward and Saw Reverse function is 16 bit (mod 65521) and I want 8 bit outputs I am doing the PHT mod 256. That will give me two 'mixed numbers' between the values 0 and 255.
Last but not least in keeping with the cryptographic theme of all this I decided to introduce some key data into the mix. I think it's cool and nifty if the Cryptographic Diffusion is based not only on the input data you are encrypting but also some of the secret key. One way and the most often used would be to Exclusive Or 8 bytes of the key with the 8 bytes of either input or output data. I thought about that but decided that if I want the design goal (restated here):
"g) It would be nice if the input string size could be varied to any length, not set a 64 bits (8 bytes)"
to hold true I would have to have really long keys. Each time the input data string got longer the key would have to get longer too and that is a pain. So at the beginning of the Saw Forward function I decided to add in a key byte and at the beginning of the Saw Reverse function do the same. Please note that for all the calculations that I have done thus far the key values have been 0. I didn't want to add another variable into all this so I just ignored the key and changed the 8 input bytes.
OK at this point I think it's time for a picture. You can click on it to get a bigger and clearer view. (In the legend above the picture the blue and orange horizontal arrows are reversed....)
O' = X + Y mod 256
O" = 2*X + Y mod 256
So if X and Y are the two numbers you want to mix to get the two output numbers (O' and O") then you just do the above math. Because the Saw Forward and Saw Reverse function is 16 bit (mod 65521) and I want 8 bit outputs I am doing the PHT mod 256. That will give me two 'mixed numbers' between the values 0 and 255.
Last but not least in keeping with the cryptographic theme of all this I decided to introduce some key data into the mix. I think it's cool and nifty if the Cryptographic Diffusion is based not only on the input data you are encrypting but also some of the secret key. One way and the most often used would be to Exclusive Or 8 bytes of the key with the 8 bytes of either input or output data. I thought about that but decided that if I want the design goal (restated here):
"g) It would be nice if the input string size could be varied to any length, not set a 64 bits (8 bytes)"
to hold true I would have to have really long keys. Each time the input data string got longer the key would have to get longer too and that is a pain. So at the beginning of the Saw Forward function I decided to add in a key byte and at the beginning of the Saw Reverse function do the same. Please note that for all the calculations that I have done thus far the key values have been 0. I didn't want to add another variable into all this so I just ignored the key and changed the 8 input bytes.
OK at this point I think it's time for a picture. You can click on it to get a bigger and clearer view. (In the legend above the picture the blue and orange horizontal arrows are reversed....)
In the next paragraph I'm going to explain in detail what is going on in the picture. If you are not that interested skip the next few paragraphs (or this entire blog post for that matter!!)
In the picture there are a bunch of colored arrows and what looks like three different Excel ranges. In reality it's all one range and I just copied it three times for the picture to make it a bit clearer and easier to explain. This is essentially the same 'structure' that I had originally come up with just done with different calculations (and fewer of them) than before.
So in the top picture there is a Key 1 on the left with a blue arrow pointing up to the first blue Input cell with a value of 0 in it. Look at the top legend and it tells you what the blue arrows are mathematically (look at the orange one because I goofed in the picture and don't want to fix it). So in that first case the Key 1 is added to 0 (that's the X+Y part) then then multiplied by one of the generator numbers, the answer is put below it, 0 in this case. Then it repeats until it hits the 1 in the input and (0 + 1) * 47104 = 47104 so that is what is in the cell below the 1. It goes on like that to the right hand side of the range and all 8 input values have been assimilated.
On the far right second row there is a value of 10580. That value is added to Key 2 (set to 0 for this example) and then multiplied by 32236 mod 65521 to get 20075. Once that has been done the same pattern as before goes from right to left as shown in the middle range in the picture. Once all that has been done the PHT part is done to each pair of values as shown in the bottom range in the picture. Boy Oh Boy I give you a lot of credit if you read all that.
As mentioned before in the previous posts I built a spreadsheet that has two of the above ranges built into it. In the blue input areas I checked for the difference between the two inputs by taking the Exclusive Or of each cell and representing it in binary form with the Excel =DEC2BIN() function. The Exclusive Or was something I had to add in Visual Basic for Excel and the code for that is at the bottom of this post if you are interested. So each of the 8 pairs on input cells were checked with this formula:
=DEC2BIN(EXCLUSIVEOR(G16,G25),8)
So that would give me the 8 bit binary difference between the inputs (cells G16 and G25 in this case). I could then count the number of 1's in each cell to give me the binary difference or the Hamming Weight in computer vernacular with this Excel code:
=LEN(G5)-LEN(SUBSTITUTE(G5,"1",""))
At the same time I did the same thing with the output values of each range, Exclusive Or them, convert them to binary and count the 1's to give the difference (Hamming Weight). Click HERE of the original Sawtooth post for a picture of what this looks like and a visual of the "walking 1" that I am going to mention a bunch of times in the next paragraph. Walking a 1 is just multiplying by factors of 2.
Once I had all that working I wrote some more Visual Basic code to get 8 byte strings of random numbers and automatically copy and paste them into each of the input ranges. After each copy and paste of random numbers, the code then "walked a 1" through one of the input ranges and copied the binary output differences between the two into an empty area of the spread sheet. I set the code to loop through 82 iterations then let it rip. After that was done I filled one input range with all 0' and "walked a 1" and then 255 and "walked a 1".
This gave me a total of 87 different inputs, each with 64 "walking 1" differences for a total of 5568 different outputs. This spreadsheet suddenly got really really big (I can't understate that last comment).
Once I had all that data in the spreadsheet I then compiled it into some statistics. On thing I did is I looked at how many bits changed in each 8 byte output by counting the 1's of the Exclusive Or's with the above =LEN code and divided by 64 * 100 to get a percentage of bits changed. As mentioned above in design goal (restated here):
"b) In addition to all the Output bytes changing (with a 1 bit input change) about half of the 64 Output bits should change"
What I found was on Average 51.091% of the bits changed. The Median value was 51.563% with a Mode of 50.000%. The Minimum value was 28.125%; Maximum was 76.563% and a Standard Deviation of 5.183%. I guess what all that means is there is a relatively balanced bell curve with good grouping around the Average (standard deviation is low).
So far so good. Then I looked at the total number of output byte, bit differences by byte location. (that's a confusing sentence). In other words one might ask the question: "How many bytes in the first byte location (Byte 1) had 8 bits change?" Look below for the answer!!! (Hint it's 27, upper left corner)
Totals By Byte Location | # Bits Diff | Totals | |||||||
27 | 22 | 28 | 25 | 21 | 27 | 19 | 26 | 8 | 195 |
191 | 183 | 208 | 188 | 188 | 184 | 186 | 202 | 7 | 1530 |
622 | 637 | 603 | 630 | 632 | 648 | 567 | 690 | 6 | 5029 |
1380 | 1333 | 1242 | 1329 | 1311 | 1199 | 1125 | 1233 | 5 | 10152 |
1577 | 1593 | 1571 | 1701 | 1569 | 1512 | 1444 | 1490 | 4 | 12457 |
1206 | 1129 | 1166 | 1183 | 1166 | 1275 | 1307 | 1131 | 3 | 9563 |
486 | 535 | 546 | 486 | 557 | 599 | 713 | 538 | 2 | 4460 |
79 | 135 | 204 | 26 | 124 | 124 | 207 | 258 | 1 | 1157 |
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 |
If you have read the previous Sawtooth posts you are sick and tired of looking at the above text chart so I took the trouble of making some cool graphs below. But to understand the graphs you need to see the data and there it is.
In the orange column the descending numbers from 8 to 0 are the number of bits that changed. The Byte 1 through Byte 8 along the bottom show the bytes and the numbers above each show how many times each changed by 8 bits, 7 bits, 6 bits etc... As you can see from the data the numbers are bigger in the 4 Bits Diff row and get smaller as you go down to 0 Bits Diff and up toward 8 Bits Diff. This is not an even distribution centered around 4 bits as there are many more 8 bit differences than 0 bit differences. In fact there was only 1 byte that had 0 bit's different in the entire test. (Byte 2). This satisfies design goal (restated here):
"c) Ideally about half of the bits in each Output byte should change (4 bits of the 8 bits in each byte)"
and it almost satisfies design goal (restated here):
"a) If you change any one of the Input bytes by even 1 bit, all the Output bytes should change"
I say it almost satisfied design goal a) because Byte 2 didn't change that one time. 0 Bits Diff row with a value of 1 as mentioned above.
This is much better than all the last Sawtooth attempts and it's good enough by the following equation that has a divide by zero error: = (1 / FUN) When fun approaches and finally reaches zero I stop messing with this spreadsheet. Fun now = 0 so I'm calling a single 0 byte difference in 5568 different output differences good enough.
As promised to summarize the above confusing and boring text chart I have created a couple of full color graphs that are as exciting as they are colorful. (click on it for a bigger clearer view)
There are three graphs in the picture. The upper left one with the red bars is a graph of the red Totals column data from the text chart above. That graph is showing the totals for all the byte locations and the biggest red bar is in the middle representing that most of the byte changes were 4 bit/byte changes. As can be seen the chart is skewed slightly to the left where the higher bit/byte changes are.
The next graph on the top right is the remaining data from the above text chart. Similar to the first graph the number of bit changes is across the bottom but in this one each byte location is shown with a separate colored bar.
The last graph on the bottom of the picture is the actual output values, NOT the Exclusive Or differences. of the outputs as in the other two charts. That is the actual values that came out of the Sawtooth function as the "walking 1" changed the random numbers that were entered as inputs. In that case I ran only 2288 sample iterations to get the data. I did that because I wanted to be sure that the values that came out of the Sawtooth function spanned the full 0 to 255 possible output values for each byte location and they did. I also wanted to get an idea of how evenly distributed the output values would be.
That bottom chart is a histogram of each value with the frequency of occurrence in the Y axis and the byte values 0 to 255 in the X value. They are pretty evenly distributed with a numeric Average of the actual values = 127.764; Median of 128; Mode of 221 and a standard deviation of 74.192. Not too bad considering that this data was created with 1 bit differences of random numbers. So the initial numbers were random but each set of random numbers changes 64 times in a very linear way.
This leaves me with one last design goal (restated here):
"f) There should be (ideally) a 1 to 1 mapping from any input string of bytes to a unique output string of bytes. In other words two different inputs shouldn't produce the same output."
This is a tricky one and I have not thought of an easy way to check this one. I'm going to write some code to test the data that I have already generated to look for this not happening but I'm not sure there is an easy way to test to see if it will happen. As long as FUN not equal to zero I'll keep looking into that but for now I'll just think about it.
I think the next thing I am going to do with this is run some tests on keeping the input data fixed and just change the key values and see what happens. As I mentioned up to this point the key values were 0 to keep things simple. Knowing how the key values change the output with the input data held constant would be a good thing to know.
Below is the Exclusive Or code that I added in Visual Basic to get that function in Excel. Below that is the Visual Basic Code I wrote to generate all the data for everything I talked about here. The Exclusive Or code has pretty colors to help understand it. If you paste it into the Visual Basic Developer window in Excel the function =EXCLUSIVEOR( ) will be added to the list of things you can type into any cell. It will give you the Exclusive Or value of the two numbers (8 bits or less) that you reference. It works like that because it's a Function.
The other code doesn't have pretty colors because I have to add the color manually and I don't feel like doing it. If you copy and paste it into Excel it will recognize all the code words and color it for you, making it easier to follow. That code will run as a Macro because it is defined as a Sub and you will see it in the list of macro's with the name "Walk_Diff_Multiple1"
Email me or leave a comment in the blog and I'll detail how it works or clean it up with colors or something to make it easier to read.
Enjoy!
.........................................
Exclusive Or Code:
Function EXCLUSIVEOR(Byte1 As Integer, Byte2 As Integer)
'
' EXCLUSIVEOR (Byte1 , Byte2)
' XOR Exclusive Or of Two 8 bit Byte's
' November 2009 http://ottobelden.blogspot.com
'
EXCLUSIVEOR = Byte1 Xor Byte2
End Function
...........................................
Sub Walk_Diff_Multiple1()
'
' This code will get rnd numbers generated by Excel from one range in sheet
' then copy values only to three locations, Top Input To Change,
' Bottom Input To Change and where current output row is for Reference
' Once Rnd values have been copied, "walk a one" thru all Input to Change values
' copying output value differences to output rows and Top Raw Outputs
' then go back and do it again
' November 2009 http://ottobelden.blogspot.com
'
' Rnd Numbers range C31, D31, E31, F31, G31, H31, I31, J31
' Input to Change C9, D9, E9, F9, G9, H9, I9, J9
' Input Diff to Copy C2, D2, E2, F2, G2, H2, I2, J2
' Output Diff to Copy C5, D5, E5, F5, G5, H5, I5, J5
' Top Raw Output Valus C12, D12, E12, F12, G12, H12, I12, J12
'
Dim Multi As Long ' This is the walking 1 valut to Xor
Dim Orig As Long ' This will be the Origional value of cell before Xor
Dim Row As Long ' This keeps track of what row the output goes to
Dim Col As Long ' This keeps track of the column that Multi will change
Row = 73 ' Row to start pasting 73 is in unused rows range
For SampleSize = 1 To 42 ' Number of samples
Col = 10 ' Start in Column "J" 10th letter
Multi = 1 ' Value to start Xor'ing with
Orig = 0 ' Set to 0 in case I have to initialize variables
' Select Rnd values to copy
Range("C31:J31").Select
Selection.Copy
' Paste Rnd values for Reference
Range(Cells(Row, 3), Cells(Row, 10)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
' Select Rnd values to copy from Reference because Rnds have changed
Range(Cells(Row, 3), Cells(Row, 10)).Select
Selection.Copy
' Paste Rnd values to Top Input
Range(Cells(9, 3), Cells(9, 10)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
' Select Rnd values to copy from Reference because Rnds have changed
Range(Cells(Row, 3), Cells(Row, 10)).Select
Selection.Copy
' Paste Rnd values to Bottom Input
Range(Cells(18, 3), Cells(18, 10)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
' Start walking a one thru the input to change values
For ColCount = 1 To 8
For Counter = 1 To 8
'Copy and Paste Input Values Only to Output range
Orig = Cells(9, Col) ' Set the current value of the cell to Orig
Cells(9, Col) = Cells(9, Col) Xor Multi ' Xor Cells(ROW=9, COL) with Multi
Range("C2:J2").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 12), Cells(Row, 19)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'Copy and Paste Output Values Only to Output range
Range("C5:J5").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 21), Cells(Row, 28)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'Copy and paste Raw output values
Range("C12:J12").Select ' Select Raw output range to copy
Selection.Copy
Range(Cells(Row, 52), Cells(Row, 59)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Row = Row + 1
Multi = Multi * 2
Cells(9, Col) = Orig ' Set the cell that just got Xor'd back to it's Orig value
Next Counter
Multi = 1 ' Reset Multi for next input byte
Cells(9, Col) = Orig ' Set current input cell to Col
Col = Col - 1 ' Move back one column
Next ColCount
Next SampleSize
End Sub