Random Number Generator data and analysis

Posted by Ea! on 10/10

I've done some testing for the randomness of our random number generator. This message is fairly technical in nature; if you get lost reading it, the bottom line is that our random number generator is more than adequate for the mud.

There are three methods available for us to use in generating random numbers. The first, which we use, is the rand() function. It's got a lot of advantages -- it's fast, it's widely available (code portability is one of the long term goals of the coding department), and it's easy to use.

The second method is to read numbers from /dev/random. /dev/random is a source of random numbers that exists under Linux (and a few other operating systems) and is quite good at generating really random numbers. Its main disadvantage is that it's slow. Really slow. Too slow for use with the mud, most likely. It's also not as widely available.

The third method is to read random numbers from /dev/urandom -- a pseudo-random number generating source that's one step better than rand() and faster than /dev/random.

I ran a file of random numbers generated by each of the three sources (the calls to rand() were done in a similar way to how the code uses it, the others were just generated directly) through a program that's designed to determine the "randomness" of the data. Since truly random numbers can show sequences, what it's really determining is the entropy in the data.

First, rand():

>Entropy = 7.999977 bits per byte.
>
>Optimum compression would reduce the size
>of this 7649861 byte file by 0 percent.
>
>Chi square distribution for 7649861 samples is 238.63, and randomly
>would exceed this value 75.00 percent of the times.
>
>Arithmetic mean value of data bytes is 127.4981 (127.5 = random).
>Monte Carlo value for Pi is 3.140184600 (error 0.04 percent).
>Serial correlation coefficient is 0.000308 (totally uncorrelated = 0.0).

And /dev/urandom:

>Entropy = 7.999976 bits per byte.
>
>Optimum compression would reduce the size
>of this 7475200 byte file by 0 percent.
>
>Chi square distribution for 7475200 samples is 247.97, and randomly
>would exceed this value 50.00 percent of the times.
>
>Arithmetic mean value of data bytes is 127.4452 (127.5 = random).
>Monte Carlo value for Pi is 3.145153652 (error 0.11 percent).
>Serial correlation coefficient is -0.000082 (totally uncorrelated = 0.0).

And /dev/random (with a smaller file -- the larger the file, the more accurate the numbers are likely to be):

>Entropy = 7.999884 bits per byte.
>
>Optimum compression would reduce the size
>of this 1638925 byte file by 0 percent.
>
>Chi square distribution for 1638925 samples is 263.72, and randomly
>would exceed this value 50.00 percent of the times.
>
>Arithmetic mean value of data bytes is 127.6365 (127.5 = random).
>Monte Carlo value for Pi is 3.140514142 (error 0.03 percent).
>Serial correlation coefficient is 0.001704 (totally uncorrelated = 0.0).

The best of the tests is the Chi square distribution: 50% is "random", 0% or 100% is totally non-random. Anything between 10% and 90% is basically random.

I'd say that rand() is good enough for what we're doing -- the extra speed is more helpful to the mud than the extra little bit of randomness that we would get by switching to either of the other methods. Unless we were doing cryptography that's sensitive to bad random numbers, I couldn't see any real advantage to switching.

-Ea!
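
For anyone who wants to see roughly where figures like the ones above come from, here is a minimal Python sketch of four of them: entropy, the chi-square statistic, the mean byte value, and a lag-1 serial correlation. It is not the program used for the post (which isn't named), the function names are made up for illustration, and it reads its sample from os.urandom rather than from the mud's rand() calls. It also stops at the raw chi-square statistic; turning that into the "would exceed this value N percent of the times" figure needs the chi-square distribution with 255 degrees of freedom, which is left out here.

```python
import math
import os
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram; 8.0 is the maximum."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_square(data: bytes) -> float:
    """Chi-square statistic against a uniform spread over the 256 byte values."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def mean_byte(data: bytes) -> float:
    """Arithmetic mean of the byte values; 127.5 for perfectly uniform data."""
    return sum(data) / len(data)


def serial_correlation(data: bytes) -> float:
    """Pearson correlation between each byte and the one after it.

    Needs at least two bytes. This is a plain lag-1 correlation and an
    assumption about what the original tool measures, not a copy of it.
    """
    x, y = data[:-1], data[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # constant data: correlation is undefined
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    sample = os.urandom(200_000)  # stand-in source; swap in any byte string
    print(f"Entropy            = {entropy_bits_per_byte(sample):.6f} bits per byte")
    print(f"Chi square         = {chi_square(sample):.2f}")
    print(f"Arithmetic mean    = {mean_byte(sample):.4f}")
    print(f"Serial correlation = {serial_correlation(sample):.6f}")
```

A chi-square statistic with 255 degrees of freedom has an expected value of 255, which is why the three results quoted above (238.63, 247.97 and 263.72) all count as unremarkable.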

From: Mac Friday, October 06 2000, 06:10AM
I did my own lil test, using about 50 pretyped scan/cointosses to try to duplicate the constant delay between random calls that you'd get in pk. I got runs of 5, 7 and 8. I don't know if that's unusual in the real world, but I had noticed runs like this in the past; I would even deliberately wait for my skill delay to pass before executing the same command again if the previous attempt failed.

From: Danar Friday, October 06 2000, 03:08PM
The runs you found are statistically unimportant, Mac.
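
One way to put a number on how unusual runs of 5, 7 or 8 are in about 50 attempts is to simulate it. The Python sketch below assumes each attempt is an independent 50/50 coin toss, which may not match the actual success chance of the skills being tested, and the function names are made up for illustration. It estimates how often a session of 50 tosses contains a streak at least as long as a given length.

```python
import random
from itertools import groupby


def longest_run(outcomes) -> int:
    """Length of the longest streak of identical consecutive outcomes."""
    return max((sum(1 for _ in group) for _, group in groupby(outcomes)), default=0)


def chance_of_run(min_run: int, tosses: int = 50, trials: int = 20_000, seed: int = 1) -> float:
    """Estimate how often `tosses` fair flips contain a streak of at least `min_run`.

    Assumes independent 50/50 outcomes; the seed is fixed only so that
    repeated calls give the same estimate.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(tosses)]
        if longest_run(flips) >= min_run:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (5, 7, 8):
        print(f"P(longest run >= {k} in 50 tosses) ~ {chance_of_run(k):.3f}")
```

If the estimate for a given streak length is not small, a single session showing that streak says little about the generator either way.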

From: Infidel Friday, October 06 2000, 03:27PM
WTF? Statistically unimportant? This isn't being done for fun but to locate a problem that's been around for ages. Of course when you run a random number generator over time it'll even out, but these runs are significant and, as far as I know, haven't been checked for.

From: Mac Friday, October 06 2000, 04:31PM
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm
If you can't reproduce runs of 7 and 8 within 50 or so attempts as I did, try another runs test at the same intervals as the MUD's skill lag. It sure don't seem right to me.

From: Ea! Friday, October 06 2000, 05:41PM
The program I used to verify the randomness does, in fact, check for runs (runs are a lack of entropy). Just to give numbers, the data set I was looking at was about equivalent to slightly over 7 million coin tosses. Frankly, we pretty much directly call the standard random function in Linux. If there's a problem, I'd say that it's over 95% likely that it's a bug in the Linux libraries and not in the mud code. The tests I ran are pretty convincing mathematically.
-Ea!

From: Ea! Friday, October 06 2000, 05:54PM
One other thing to mention -- the random number generator is called so frequently within the code that any runs that you're seeing aren't going to be proper runs -- for each die roll, there are tons of calls in between. However, the way that the random number generator works, there's a chance that the random numbers would be in a sequence -- when the list of random numbers got used up, it would start over again. Now, the odds of this coinciding with ticks, pulses, or any other measurement within the code must be astronomically low. However, as there was a theoretical chance, I made an adjustment to the code to have it use a different set of random numbers when the random number list is used up. That's the change to the RNG that went in earlier today. It now never re-uses the same random seed. (That's a really simplistic and cut-down description of some of the internals of rand(), quite probably not even the implementation used on modern computers, but since it was a chance, I figured it was worth changing.)
-Ea!
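
The "list of random numbers gets used up and starts over" idea can be illustrated with a toy linear congruential generator. The Python sketch below is only an illustration of the concept: the parameters (a=5, c=3, m=16) are chosen so that the cycle is short enough to print, and nothing here is the real rand() implementation or the mud's code.

```python
from itertools import islice


def lcg(seed: int, a: int = 5, c: int = 3, m: int = 16):
    """Toy generator: x -> (a*x + c) mod m. Parameters are illustrative only."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x


def period(seed: int, a: int = 5, c: int = 3, m: int = 16) -> int:
    """Number of steps between two visits to the same state."""
    seen = {}
    for i, x in enumerate(lcg(seed, a, c, m)):
        if x in seen:
            return i - seen[x]
        seen[x] = i


if __name__ == "__main__":
    # After `period` outputs the same sequence starts over again.
    print("period:", period(0))
    print("seed 0:", list(islice(lcg(0), 20)))
    # A different seed enters the same cycle at a different point.
    print("seed 7:", list(islice(lcg(7), 20)))
```

A real generator has a far longer cycle than this toy, which is why a repeat lining up with ticks or pulses would be so unlikely.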

From: Craven Friday, October 06 2000, 09:03PM
Well, I don't know if there's a problem with the generator or not, but from the few times I've helped test things on test muds or played really early in the morning etc., the fewer people on, the more things tend to be -normal-. So maybe it's like you say: since it's accessed so often, by the time you roll your 2nd round you're back to the same place you were last round and you get the same number. I don't know, but it sure seems like the fewer people accessing it, the more random it is.

From: Mac Saturday, October 07 2000, 02:09AM
Well, the thing I'm suspicious of is that the random function isn't based on previous calls but on the computer's clock alone, meaning it wouldn't matter how many times it's called in between, and that if you call it at specific intervals the seed used may have certain characteristics that can produce results similar to those at the last or even recent intervals.

From: Ea! Saturday, October 07 2000, 04:51PM
rand() is not based solely on the clock of the computer.
-Ea!
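
The difference between a clock-driven and a state-driven generator can be shown with a short Python sketch. It uses Python's random.Random as a stand-in for a seeded generator, so it says nothing about how the mud seeds rand(); the helper name and its parameters are made up for illustration. With a seeded generator, the time between calls does not change the numbers, while extra calls in between do.

```python
import random
import time


def draws(seed: int, count: int, calls_between: int = 0, pause: float = 0.0) -> list:
    """Return `count` rolls of 1-100 from a generator seeded with `seed`.

    `calls_between` extra values are drawn and thrown away after each roll,
    mimicking other code that uses the same generator; `pause` sleeps between
    rolls to mimic a skill delay.
    """
    rng = random.Random(seed)
    rolls = []
    for _ in range(count):
        rolls.append(rng.randint(1, 100))
        for _ in range(calls_between):
            rng.random()  # someone else consuming the generator
        time.sleep(pause)
    return rolls


if __name__ == "__main__":
    print("no delay:        ", draws(42, 5))
    print("with delay:      ", draws(42, 5, pause=0.01))       # same rolls
    print("calls in between:", draws(42, 5, calls_between=3))  # different rolls
```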

From: Christopher Tuesday, October 10 2000, 04:34AM
Frankly, I wouldn't care if the random numbers were generated by Kaige picking a number from 1-10. I've not a problem with number gens. I've personally seen one of my characters knock down an opponent every headbutt and the next opponent (same mob type) makes me miss every headbutt.