Let me start this post with one of the first things I said when I created the Cryptography and Math category: "The subject (cryptography) is interesting to me but I am not a cryptographer and I don't have the time and computing power to be one. Having said that I have decided to put some of my ideas and screwball calculations on this blog"
The main goal for me here is to learn Visual Basic for Excel and because I am interested in math and cryptography this seems like a good way to do it. I want to restate that because I don't want to get a bunch of comments and nasty emails from people pointing out that "You don't really know what you are doing" and the most obvious one "Why design your own when there are so many free open source crypto-systems out there??" - Answer: I'm doing this for fun and to learn more about tricky Excel trickery.
OK now that I got that out of the way....
g) It would be nice if the input string size could be varied to any length, not set at 64 bits (8 bytes)
So back to what I was talking about... Over the last month or so I have been messing around with lots and lots of spreadsheets and different ideas on how to accomplish the above stated design goals, and modifying the goals as I feel like it. What I have come up with is what I think I'm going to call the "final" design for this until at which time I decide to revise it.
In the next paragraph I'm going to explain in detail what is going on in the picture. If you are not that interested skip the next few paragraphs (or this entire blog post for that matter!!)
In the picture there are a bunch of colored arrows and what looks like three different Excel ranges. In reality it's all one range and I just copied it three times for the picture to make it a bit clearer and easier to explain. This is essentially the same 'structure' that I had originally come up with just done with different calculations (and fewer of them) than before.
So in the top picture there is a Key 1 on the left with a blue arrow pointing up to the first blue Input cell with a value of 0 in it. Look at the top legend and it tells you what the blue arrows are mathematically (look at the orange one because I goofed in the picture and don't want to fix it). So in that first case the Key 1 is added to 0 (that's the X+Y part) then then multiplied by one of the generator numbers, the answer is put below it, 0 in this case. Then it repeats until it hits the 1 in the input and (0 + 1) * 47104 = 47104 so that is what is in the cell below the 1. It goes on like that to the right hand side of the range and all 8 input values have been assimilated.
On the far right second row there is a value of 10580. That value is added to Key 2 (set to 0 for this example) and then multiplied by 32236 mod 65521 to get 20075. Once that has been done the same pattern as before goes from right to left as shown in the middle range in the picture. Once all that has been done the PHT part is done to each pair of values as shown in the bottom range in the picture. Boy Oh Boy I give you a lot of credit if you read all that.
As mentioned before in the previous posts I built a spreadsheet that has two of the above ranges built into it. In the blue input areas I checked for the difference between the two inputs by taking the Exclusive Or of each cell and representing it in binary form with the Excel =DEC2BIN() function. The Exclusive Or was something I had to add in Visual Basic for Excel and the code for that is at the bottom of this post if you are interested. So each of the 8 pairs on input cells were checked with this formula:
=DEC2BIN(EXCLUSIVEOR(G16,G25),8)
So that would give me the 8 bit binary difference between the inputs (cells G16 and G25 in this case). I could then count the number of 1's in each cell to give me the binary difference or the
Hamming Weight in computer vernacular with this Excel code:
=LEN(G5)-LEN(SUBSTITUTE(G5,"1",""))
At the same time I did the same thing with the output values of each range, Exclusive Or them, convert them to binary and count the 1's to give the difference (Hamming Weight). Click
HERE of the original Sawtooth post for a picture of what this looks like and a visual of the "walking 1" that I am going to mention a bunch of times in the next paragraph. Walking a 1 is just multiplying by factors of 2.
Once I had all that working I wrote some more Visual Basic code to get 8 byte strings of random numbers and automatically copy and paste them into each of the input ranges. After each copy and paste of random numbers, the code then "walked a 1" through one of the input ranges and copied the binary output differences between the two into an empty area of the spread sheet. I set the code to loop through 82 iterations then let it rip. After that was done I filled one input range with all 0' and "walked a 1" and then 255 and "walked a 1".
This gave me a total of 87 different inputs, each with 64 "walking 1" differences for a total of 5568 different outputs. This spreadsheet suddenly got really really big (I can't understate that last comment).
Once I had all that data in the spreadsheet I then compiled it into some statistics. On thing I did is I looked at how many bits changed in each 8 byte output by counting the 1's of the Exclusive Or's with the above =LEN code and divided by 64 * 100 to get a percentage of bits changed. As mentioned above in design goal (restated here):
"b) In addition to all the Output bytes changing (with a 1 bit input change) about half of the 64 Output bits should change"
What I found was on Average 51.091% of the bits changed. The Median value was 51.563% with a Mode of 50.000%. The Minimum value was 28.125%; Maximum was 76.563% and a Standard Deviation of 5.183%. I guess what all that means is there is a relatively balanced bell curve with good grouping around the Average (standard deviation is low).
So far so good. Then I looked at the total number of output byte, bit differences by byte location. (that's a confusing sentence). In other words one might ask the question: "How many bytes in the first byte location (Byte 1) had 8 bits change?" Look below for the answer!!! (Hint it's 27, upper left corner)
|
|
| Totals By Byte Location
|
|
|
|
| # Bits Diff
| Totals
|
27
| 22
| 28
| 25
| 21
| 27
| 19
| 26
| 8
| 195
|
191
| 183
| 208
| 188
| 188
| 184
| 186
| 202
| 7
| 1530
|
622
| 637
| 603
| 630
| 632
| 648
| 567
| 690
| 6
| 5029
|
1380
| 1333
| 1242
| 1329
| 1311
| 1199
| 1125
| 1233
| 5
| 10152
|
1577
| 1593
| 1571
| 1701
| 1569
| 1512
| 1444
| 1490
| 4
| 12457
|
1206
| 1129
| 1166
| 1183
| 1166
| 1275
| 1307
| 1131
| 3
| 9563
|
486
| 535
| 546
| 486
| 557
| 599
| 713
| 538
| 2
| 4460
|
79
| 135
| 204
| 26
| 124
| 124
| 207
| 258
| 1
| 1157
|
0
| 1
| 0
| 0
| 0
| 0
| 0
| 0
| 0
| 1
|
Byte 1
| Byte 2
| Byte 3
| Byte 4
| Byte 5
| Byte 6
| Byte 7
| Byte 8
|
|
|
If you have read the previous Sawtooth posts you are sick and tired of looking at the above text chart so I took the trouble of making some cool graphs below. But to understand the graphs you need to see the data and there it is.
In the orange column the descending numbers from 8 to 0 are the number of bits that changed. The Byte 1 through Byte 8 along the bottom show the bytes and the numbers above each show how many times each changed by 8 bits, 7 bits, 6 bits etc... As you can see from the data the numbers are bigger in the 4 Bits Diff row and get smaller as you go down to 0 Bits Diff and up toward 8 Bits Diff. This is not an even distribution centered around 4 bits as there are many more 8 bit differences than 0 bit differences. In fact there was only 1 byte that had 0 bit's different in the entire test. (Byte 2). This satisfies design goal (restated here):
"c) Ideally about half of the bits in each Output byte should change (4 bits of the 8 bits in each byte)"
and it almost satisfies design goal (restated here):
"a) If you change any one of the Input bytes by even 1 bit, all the Output bytes should change"
I say it almost satisfied design goal a) because Byte 2 didn't change that one time. 0 Bits Diff row with a value of 1 as mentioned above.
This is much better than all the last Sawtooth attempts and it's good enough by the following equation that has a divide by zero error: = (1 / FUN) When fun approaches and finally reaches zero I stop messing with this spreadsheet. Fun now = 0 so I'm calling a single 0 byte difference in 5568 different output differences good enough.
As promised to summarize the above confusing and boring text chart I have created a couple of full color graphs that are as exciting as they are colorful. (click on it for a bigger clearer view)
There are three graphs in the picture. The upper left one with the red bars is a graph of the red Totals column data from the text chart above. That graph is showing the totals for all the byte locations and the biggest red bar is in the middle representing that most of the byte changes were 4 bit/byte changes. As can be seen the chart is skewed slightly to the left where the higher bit/byte changes are.
The next graph on the top right is the remaining data from the above text chart. Similar to the first graph the number of bit changes is across the bottom but in this one each byte location is shown with a separate colored bar.
The last graph on the bottom of the picture is the actual output values, NOT the Exclusive Or differences. of the outputs as in the other two charts. That is the actual values that came out of the Sawtooth function as the "walking 1" changed the random numbers that were entered as inputs. In that case I ran only 2288 sample iterations to get the data. I did that because I wanted to be sure that the values that came out of the Sawtooth function spanned the full 0 to 255 possible output values for each byte location and they did. I also wanted to get an idea of how evenly distributed the output values would be.
That bottom chart is a histogram of each value with the frequency of occurrence in the Y axis and the byte values 0 to 255 in the X value. They are pretty evenly distributed with a numeric Average of the actual values = 127.764; Median of 128; Mode of 221 and a standard deviation of 74.192. Not too bad considering that this data was created with 1 bit differences of random numbers. So the initial numbers were random but each set of random numbers changes 64 times in a very linear way.
This leaves me with one last design goal (restated here):
"f) There should be (ideally) a 1 to 1 mapping from any input string of bytes to a unique output string of bytes. In other words two different inputs shouldn't produce the same output."
This is a tricky one and I have not thought of an easy way to check this one. I'm going to write some code to test the data that I have already generated to look for this not happening but I'm not sure there is an easy way to test to see if it will happen. As long as FUN not equal to zero I'll keep looking into that but for now I'll just think about it.
I think the next thing I am going to do with this is run some tests on keeping the input data fixed and just change the key values and see what happens. As I mentioned up to this point the key values were 0 to keep things simple. Knowing how the key values change the output with the input data held constant would be a good thing to know.
Below is the Exclusive Or code that I added in Visual Basic to get that function in Excel. Below that is the Visual Basic Code I wrote to generate all the data for everything I talked about here. The Exclusive Or code has pretty colors to help understand it. If you paste it into the Visual Basic Developer window in Excel the function =EXCLUSIVEOR( ) will be added to the list of things you can type into any cell. It will give you the Exclusive Or value of the two numbers (8 bits or less) that you reference. It works like that because it's a Function.
The other code doesn't have pretty colors because I have to add the color manually and I don't feel like doing it. If you copy and paste it into Excel it will recognize all the code words and color it for you, making it easier to follow. That code will run as a Macro because it is defined as a Sub and you will see it in the list of macro's with the name "Walk_Diff_Multiple1"
Email me or leave a comment in the blog and I'll detail how it works or clean it up with colors or something to make it easier to read.
Enjoy!
.........................................
Exclusive Or Code:
Function EXCLUSIVEOR(Byte1
As Integer, Byte2
As Integer)
'
' EXCLUSIVEOR (Byte1 , Byte2)
' XOR Exclusive Or of Two 8 bit Byte's
' November 2009 http://ottobelden.blogspot.com
'
EXCLUSIVEOR = Byte1 Xor Byte2
End Function
...........................................
Sub Walk_Diff_Multiple1()
'
' This code will get rnd numbers generated by Excel from one range in sheet
' then copy values only to three locations, Top Input To Change,
' Bottom Input To Change and where current output row is for Reference
' Once Rnd values have been copied, "walk a one" thru all Input to Change values
' copying output value differences to output rows and Top Raw Outputs
' then go back and do it again
' November 2009 http://ottobelden.blogspot.com
'
' Rnd Numbers range C31, D31, E31, F31, G31, H31, I31, J31
' Input to Change C9, D9, E9, F9, G9, H9, I9, J9
' Input Diff to Copy C2, D2, E2, F2, G2, H2, I2, J2
' Output Diff to Copy C5, D5, E5, F5, G5, H5, I5, J5
' Top Raw Output Valus C12, D12, E12, F12, G12, H12, I12, J12
'
Dim Multi As Long ' This is the walking 1 valut to Xor
Dim Orig As Long ' This will be the Origional value of cell before Xor
Dim Row As Long ' This keeps track of what row the output goes to
Dim Col As Long ' This keeps track of the column that Multi will change
Row = 73 ' Row to start pasting 73 is in unused rows range
For SampleSize = 1 To 42 ' Number of samples
Col = 10 ' Start in Column "J" 10th letter
Multi = 1 ' Value to start Xor'ing with
Orig = 0 ' Set to 0 in case I have to initialize variables
' Select Rnd values to copy
Range("C31:J31").Select
Selection.Copy
' Paste Rnd values for Reference
Range(Cells(Row, 3), Cells(Row, 10)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
' Select Rnd values to copy from Reference because Rnds have changed
Range(Cells(Row, 3), Cells(Row, 10)).Select
Selection.Copy
' Paste Rnd values to Top Input
Range(Cells(9, 3), Cells(9, 10)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
' Select Rnd values to copy from Reference because Rnds have changed
Range(Cells(Row, 3), Cells(Row, 10)).Select
Selection.Copy
' Paste Rnd values to Bottom Input
Range(Cells(18, 3), Cells(18, 10)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
' Start walking a one thru the input to change values
For ColCount = 1 To 8
For Counter = 1 To 8
'Copy and Paste Input Values Only to Output range
Orig = Cells(9, Col) ' Set the current value of the cell to Orig
Cells(9, Col) = Cells(9, Col) Xor Multi ' Xor Cells(ROW=9, COL) with Multi
Range("C2:J2").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 12), Cells(Row, 19)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'Copy and Paste Output Values Only to Output range
Range("C5:J5").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 21), Cells(Row, 28)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'Copy and paste Raw output values
Range("C12:J12").Select ' Select Raw output range to copy
Selection.Copy
Range(Cells(Row, 52), Cells(Row, 59)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Row = Row + 1
Multi = Multi * 2
Cells(9, Col) = Orig ' Set the cell that just got Xor'd back to it's Orig value
Next Counter
Multi = 1 ' Reset Multi for next input byte
Cells(9, Col) = Orig ' Set current input cell to Col
Col = Col - 1 ' Move back one column
Next ColCount
Next SampleSize
End Sub