I have been interested in cryptography and the tricky math and such that is used in cryptography for a long time. The subject is interesting to me but I am not a cryptographer and I don't have the time and computing power to be one. Having said that I have decided to put some of my ideas and screwball calculations on this blog. I have looked around the web and have seen other peoples blog's where they have put up their own ideas and I have read all the nasty comments people leave on their sites. To try and mitigate some of that let me say that I am not offering anything that I post here as an actual algorithm and I'm not trying to necessarily create an actual working cryptographic function. I do this as a hobby, pastime and just for me to play around with and learn about programming, math and fun!
OK having said all that let me get to the point of this post. In reading about different cryptographic functions one thing that seems to come up is 'mixing functions' and/or 'avalanche effect'. Similar concepts were brought up in the 'Wide Trail Design Strategy' with branch numbers and all that. While thinking about this I thought it might be fun to mess around with something that is mathematically simple and see what the results turn out ot be. I'm not a math wizard and I don't pretend to understand all the concepts so this is just a guess on a Saturday afternoon.
Again this is not an actual algorithm but might be useful as a component to something more complicated. The idea is to start with a block of numbers and mess with them to see what happens. I picked 8 bytes as the block size to use because that is a common block size (64 bits) that is used in various 'real' crypto-systems. So start with 8 Input bytes, do something to them, get 8 new Output bytes! (yes this is a boring Saturday). The 'mess with them' part is the interesting thing and I decided to set some Design Goals for what the 'mess with them' part should do:
a) If you change any one of the Input bytes by even 1 bit, all the Output bytes should change
b) In addition to all the Output bytes changing about half of the 64 Output bits should change
c) In addition to that ideally about half of the bits in each Output byte should change (4 bits of the 8 bits in each byte)
d) If all the Input bytes are 0's then the Output bytes should not all be 0's
e) Keep it really simple because I want to do all the tricky stuff in Microsoft Excel
The reasoning behind this is that if the input to a cipher changes just a little bit (like by 1 bit) the Output should change a lot and the change should be spread out as evenly as possibly throughout the entire output. As I have mentioned a few times what I am describing here is not a cipher by itself and isn't supposed to be. So here is my first go around with the above stated goals.
Now it's time for a picture! This is a screen shot from Excel. As I mentioned I wanted to use Excel because it's easy to change stuff and get statistical info without having to write a bunch of code and compile it.
So what is going on in the picture you ask? The blue row at the top is the bytes of input data in decimal and the purple row at the bottom is the output data in decimal. The blue arrows show the data being added together with a 1 added as well. The addition is being done mod 257 because 257 is a prime number and prime numbers are cool! So what is happening is the numbers are being added together from left to right as the blue arrows indicate. When I got to the right side (where the number 14 is) I copied the number 14 down (green arrow) and then added mod 275 back from right to left. When I got to the left side where the number 73 is I copied the 73 down and then continued adding left to right. The arrows look like a Sawtooth waveform so I called this 'Sawtooth' just to have a cool sounding name for it. The last step is to take the last row of numbers and convert them to values that are 8 bits long by taking the number mod 256. More on that below.
So a quick summary and explanation. I added the numbers from left to right then back from right to left so that if any one number changes is changes all the numbers that come after it. See 'a' above in the design goals. I decided to go back and forth from left to right 3 times because by experimentation any less than that doesn't cause a change to 'trickle down' to the output numbers and any more times doesn't increase the 'trickle down'. 3 times back and forth seems to be enough for a change in one number to change all of the later numbers nicely. I decided to have each addition include a +1 in it just in case all the input numbers are 0's. With the +1 in there the all 0's case doesn't result in all 0's on the output. See 'd' above in the design goals. Lastly I used a modulus of 257 not only because it's a cool prime but because if i used 256 (the obvious choice) then an input of all 128's causes the output to be all 128's. Even with the +1 in there an input of 128, 127, 126 etc.. causes the output to be all 128's. And now the last of the lastly in the explanation is the conversion with the mod 256. I had to do this because if I didn't the numbers would be in the range from 0 to 256 and 256 is bigger than 8 bits. The conversion with the mod 256 makes all the numbers 0 to 255.
OK now it's time once again for another picture!!! Once I got the Excel thing figured out and working I copied it so that I have two identical copies. This is important so that I can see how changes in the inputs differ or in other words how differences in the inputs compare to each other.
In the picture you can see both copies of the identical regions of the spreadsheet. Above those there are two boxes that are labeled with I' * I" and O' * O" where the * is the Exclusive Or symbol circle with a + in it. The I' * I" is the binary Exclusive Or of the two blue input parts of the spreadsheet and the O' * O" is the binary Exclusive Or of the two purple outputs. The numbers under the 1's and 0's is the number of bits that are different which is the number of 1's after the Exclusive Or. To get the Exclusive Or to work in Excel I had to create a custom function in Microsoft Visual Basic for Excel like this:
Function EXCLUSIVEOR(Byte1 As Integer, Byte2 As Integer)
'
' EXCLUSIVEOR (Byte1 , Byte2)
' XOR Exclusive Or of Two 8 bit Byte's
EXCLUSIVEOR = Byte1 Xor Byte2
End Function
'
' EXCLUSIVEOR (Byte1 , Byte2)
' XOR Exclusive Or of Two 8 bit Byte's
EXCLUSIVEOR = Byte1 Xor Byte2
End Function
Once I had that EXCLUSIVEOR function created I could type in a cell in the spreadsheet:
=DEC2BIN(EXCLUSIVEOR(E16,E25),8)
and I would get in the example above the binary Exclusive Or of cells E16 and E25 in an 8 bit binary value.
To figure out how many 1's that there are in a cell I used the command string:
=LEN(E2)-LEN(SUBSTITUTE(E2,"1",""))
and that will count the number of 1's and return how many it finds. I don't know exactly why it does that but it does and that is all I care about. Boy this is fun!!!! This, by the way I believe is called the Hamming Weight or Hamming Distance. This is named after a guy named Richard Hamming - too bad Francis Bacon didn't invent it.
So you can see in the above picture that in the top region the 4th byte from the left is a 6 and in the 4th byte from the left on the bottom region it's 0. The difference in binary between the two is shown above in the I'*I" row as 00000110 which is two 1's. In the bottom O'*O" row you can see the Exclusive Or difference between the outputs and the number of 1's in each. So in other words with an Input Difference of 00000110 in the fourth byte from the left the Output difference is shown in the O'*O" row.
At this point I can type any value I want into either of the blue input rows and see what the Input Difference is and see what the corresponding Output differences are. I can also see how many bits in each Input and Output byte are different. This is all getting to the answers to 'a through c' in the Design Goals above.
OK now that I can change the Inputs, look at the outputs and see the differences it's time to change all the input bits one at a time and see how the outputs change. One way to do this would be to change each bit by hand in the correct Input cell and copy and paste the I*I and O*O parts to someplace else on the spreadsheet for later review. But that wouldn't be too much fun so what I did was write another function in Visual Basic for Excel to do it for me. Here is the function:
Sub Walk1Key2()
'
' Set up a loop then enter values in Input range
' After setting values in Input range, paste values to Output range from Formula range
' Go back and change Input range and paste new Output values into NEXT row of Output range
' http://ottobelden.blogspot.com/
' Input to Change C9, D9, E9, F9, G9, H9, I9, J9
' Input Diff to Copy C2, D2, E2, F2, G2, H2, I2, J2
' Output Diff to Copy C5, D5, E5, F5, G5, H5, I5, J5
'
' Set up a loop then enter values in Input range
' After setting values in Input range, paste values to Output range from Formula range
' Go back and change Input range and paste new Output values into NEXT row of Output range
' http://ottobelden.blogspot.com/
' Input to Change C9, D9, E9, F9, G9, H9, I9, J9
' Input Diff to Copy C2, D2, E2, F2, G2, H2, I2, J2
' Output Diff to Copy C5, D5, E5, F5, G5, H5, I5, J5
Dim Multi As Long ' This is the walking 1 var
Dim Row As Long ' This keeps track of what row the output goes to
Dim Col As Long ' This keeps track of the column that Multi will change
Col = 10 ' Start in Column "J" 10th letter
Multi = 1
Row = 9
Dim Row As Long ' This keeps track of what row the output goes to
Dim Col As Long ' This keeps track of the column that Multi will change
Col = 10 ' Start in Column "J" 10th letter
Multi = 1
Row = 9
For ColCount = 1 To 8
For Counter = 1 To 8
'Copy and Paste Input Values Only to Output range
Cells(9, Col) = Multi ' Set Input cell to multi Cells(ROW=9, COL)
Range("C2:J2").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 12), Cells(Row, 19)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
For Counter = 1 To 8
'Copy and Paste Input Values Only to Output range
Cells(9, Col) = Multi ' Set Input cell to multi Cells(ROW=9, COL)
Range("C2:J2").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 12), Cells(Row, 19)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'Copy and Paste Output Values Only to Output range
Range("C5:J5").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 21), Cells(Row, 28)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Row = Row + 1
Multi = Multi * 2
Range("C5:J5").Select ' Select Input Diff range to copy
Selection.Copy
Range(Cells(Row, 21), Cells(Row, 28)).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Row = Row + 1
Multi = Multi * 2
Next Counter
Multi = 1 ' Reset Multi for next input byte
Cells(9, Col) = 0 ' Set current input cell to 0
Col = Col - 1 ' Move back one column
Next ColCount
End Sub
Multi = 1 ' Reset Multi for next input byte
Cells(9, Col) = 0 ' Set current input cell to 0
Col = Col - 1 ' Move back one column
Next ColCount
End Sub
Boy oh boy talk about cryptic cryptography!?!? Figuring out that piece of code and getting it to work right took most of the afternoon!! Anyway what it does is a 'walking 1' in binary across all the blue input bytes of Row 9 in the pictures. 'Walking a 1' is nothing more that setting values of 1, 2, 4, 8, 16, 32, 64 and 128 as the Input byte in each byte of the blue input row. After changing the value in the Input byte the code copies and pastes the I*I and the O*O values to the right in the spreadsheet.
After all that it's time once again for yet another picture!!!!! Click on it for a bigger view - it's worth it!!!!
In the picture above you can see the familiar copied spreadsheet regions with the infamous I*I and O*O above them. To the right and going down the sheet there is a big region of binary numbers under the heading I*I and you can see the "walking 1" pattern in it. To the right of that and under the O*O heading is the output differences that correspond to the input differences. That is just a portion of the data, it went on for a total of 64 rows to get all the input differences.
Now that I had this interesting spreadsheet with all these numbers on it I could go address the Design Goals set forth above. Here is what I found:
1) Did every output byte change in every case? Every output byte did in fact change with every input difference change of 1 bit. So at least in this test a bit difference of 1 in every possible input bit location caused every byte of the output to change. No byte was left unchanged!!!
2) Did approx half of the 64 output bits change in each case? Ideally exactly half of the 64 total output bits would change with a input difference of just 1 bit. The minimum number of bits that changed was 25 bits or 39.063%. The maximum number of bits that changed was 38 or 59.375% . The mathematical average was 48.438%. with 11 instances being exactly 32 bits or 50%. Not too bad but not exactly half of them in every case. As mentioned above the bit changes were spread out over all 8 output bytes in every case.
3) Did on average 4 bits in each byte of the output change with each 1 bit input difference? This is trickier because of the statistics above not every byte had exactly 4 bits change. To simplify this I did a histogram of each number so:
# bits # times
8 1
7 15
6 54
5 92
4 138
3 138
2 47
1 27
0 0
So from the above chart there was only 1 time that a byte had all 8 bits change. There were 15 times a byte had 7 bits change and so on. As can be seen the majority of bytes that changed had 2, 3, 4 and 5 bits change.
What does all this mean??? This means without a doubt that I have spent an entire afternoon messing around with numbers and Excel!!! Yeah! Big Blog High 5!!!! As far as 'Sawtooth' being useful as an element in a cryptographic algorithm I think I'm leaning toward a qualified yes. As a small part of something bigger it might be a useful method of creating an avalanche effect in input data and diffuse small input changes over larger chucks of data. I will mess around with it some more to get a better idea of it's characteristics and see how it works in different input ranges. Things I am considering investigating:
A) Does every possible output state exist for every possible input state? Or are there some output values that it won't reach no matter what the input is making some different inputs have the same output? In other words is it a mathematical group?
B) Figure out exactly how much avalanche this created over a broad range of input differences. I'm not sure I need to do this one as the 'walking 1' should be enough but I need to prove that to myself.