Monday 18 January 2010

One Small Leap

First, I would like to explain the concept of the leap second. At first the definition of a second was 1/86400 of a solar day. This was not a good enough standard, so the definition was later changed to the somewhat less comprehensible length of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom. This gave some new problems since the solar day became approximately 1.7ms longer every century, the reason being that the earth's rotation is slowing down. To counter this sometimes an extra second is added June 30 or December 31. This added second is called a leap second. Worth mentioning is that the decreasing rotation varies based on unpredictable factors like the motion of mass in the center of the earth, which means we have no formula to calculate when the next leap second needs to be added. Instead the leap second is added when the difference between solar day time and the measured time gets too large (0.9).

This gives us some strange effects, especially when trying to calculate the difference between two points in time, where one or more leap seconds have been inserted. Imagine that a leap second was added on December 31 at 11.59.60p.m. The last minute of the year is made one second longer. This means that if were to count the seconds between 8 a.m. December 31 and 8 a.m January 1, we would get a count of 86401. If we were to count the hours instead though, the result would be 24 which, if we translate it into seconds, is 86400 and we just lost a second in the process. (This makes for some strange effects when dealing with date differences in computer systems, but that is not the topic here).

How can this be? Well, it all comes down to the difference between elapsed time and our daily measured time. The reason for the added second is to get our clocks synchronized with the solar time. The elapsed time would be 86401s = 1440m1s = 24h0m1s = 1d0h0m1s, and our clocks are calibrated so that we remove one second from the measurement of time. So what we actually do with the leap second is not to add a second, but instead hide it, delaying the coming of the next day.

We sweep one second under the rug, a little ashamed that we cannot cope with it any other way. Life goes on but the seconds we hide away will always be there in the past as a testimony that not everything can be exact.

Tuesday 12 January 2010

Pseudo-Uniqueness

Pseudo-Uniqueness is a topic that came up during a project I took part in last year. Even though I already brought it up on that project's blog, it is a subject that I think would fit well into the spirit of the Reflection 42 blog, so here it is. Enjoy!

In the project, we were faced with the need to dynamically create unique user identifiers that did not follow a naming pattern (i.e. they needed to be very hard to guess).

So what did we do? Well, I wrote a utility class that creates hashes from input strings, then at user creation we passed the user's data to that class which returned an identifier. Using the MD5 algorithm for hashing, we would get a 32-digit hexadecimal number which would be virtually unique. Now I can hear some of you think "Virtually unique? How can you be sure it will not collide with an already existing ID?". Well, that is a good question and the short answer is that I cannot!

If you are interested, the long answer now follows.

If I generate a 32-digit hexadecimal number, it consists of 128 bits. 128 bits can be set in 2^128 different combinations, which means that the chance of getting the same hash for another user is 1 in 340282366920938463463374607431768211456 (39 digits). Lets say we have an unrealistically busy site, generating a million new users each day. Then after a thousand years we will have generated less than 4*10^11 different hashes. This means that we have a chance of 4*10^11 in 2^128 of calculating an already existing hash at this point with a probabiltiy in the magnitude of 10^-28.

Maybe our site is really successful and continues until no life can exist on earth (8Gyears from now). Well I am not going to bore you with the calculations for this one, but at that point we would have generated 3.2*10^21 different hashes. This means we would have the overwhelming probability of below 10^-18 of hitting one of the hashes we already used.

This is why I say virtually unique. The chance of getting the same ID is so small that it will probably never happen, and if it does the probability is so small that it probably did not. So I now leave it up to you to decide if this is unique enough for you!

By the way, I went with an SHA algorithm generating 160 bits instead, but that's just me!