alvinalexander.com | career | drupal | java | mac | mysql | perl | scala | uml | unix  
<tr BGCOLOR="#EEEEFF"> <tr> <tr> <tr> <tr> <tr> <tr> <tr> <tr> </table> </p> <p> So for most simulation problems, the better generators like <a href="../apidocs/org/apache/commons/math3/random/Well19937c.html">Well19937c</a> and are probably very good choices. </p> <p> Note that <em>none of these generators are suitable for cryptography. They are devoted to simulation, and to generate very long series with strong properties on the series as a whole (equidistribution, no correlation ...). They do not attempt to create small series but with very strong properties of unpredictability as needed in cryptography. </p> <p> Examples: <dl> <dt>Create a RandomGenerator based on RngPack's Mersenne Twister <dd>To create a RandomGenerator using the RngPack Mersenne Twister PRNG as the source of randomness, extend <code>AbstractRandomGenerator overriding the derived methods that the RngPack implementation provides: <source> import edu.cornell.lassp.houle.RngPack.RanMT; /** * AbstractRandomGenerator based on RngPack RanMT generator. */ public class RngPackGenerator extends AbstractRandomGenerator { private RanMT random = new RanMT(); public void setSeed(long seed) { random = new RanMT(seed); } public double nextDouble() { return random.raw(); } public double nextGaussian() { return random.gaussian(); } public int nextInt(int n) { return random.choose(n); } public boolean nextBoolean() { return random.coin(); } } </source> </dd> <dt>Use the Mersenne Twister RandomGenerator in place of <code>java.util.Random in RandomData <dd> <source> RandomData randomData = new RandomDataImpl(new RngPackGenerator()); </source> </dd> <dt>Create an adaptor instance based on the Mersenne Twister generator that can be used in place of a <code>Random <dd> <source> RandomGenerator generator = new RngPackGenerator(); Random random = RandomAdaptor.createAdaptor(generator); // random can now be used in place of a Random instance, data generation // calls will be delegated to the wrapped Mersenne Twister </source> </dd> </dl> </p> </subsection> </section> </body> </document>

Other Java examples (source code examples)

Here is a short list of links related to this Java random.xml source code file:

Java example source code file (random.xml)

This example Java source code file (random.xml) is included in the alvinalexander.com "Java Source Code Warehouse" project. The intent of this project is to help you "Learn Java by Example" TM.

Learn more about this Java project at its project page.

Java - Java tags/keywords

for, jdk, license, mersenne, mersennetwister, monte-carlo, prng, randomdatagenerator, randomgenerator, the, this, well, well19937c

The random.xml Java example source code

<?xml version="1.0"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
  -->
  
<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<document url="random.html">

<properties>
    <title>The Commons Math User Guide - Data Generation
</properties>

<body>

<section name="2 Data Generation">

<subsection name="2.1 Overview" href="overview">
    <p>
    The Commons Math random package includes utilities for
    <ul>
        <li>generating random numbers
        <li>generating random vectors
        <li>generating random strings
        <li>generating cryptographically secure sequences of random numbers or
         strings</li>
        <li>generating random samples and permutations
        <li>analyzing distributions of values in an input file and generating
         values "like" the values in the file</li>
        <li>generating data for grouped frequency distributions or
         histograms</li>
    </ul>

<p> The source of random data used by the data generation utilities is pluggable. By default, the JDK-supplied PseudoRandom Number Generator (PRNG) is used, but alternative generators can be "plugged in" using an adaptor framework, which provides a generic facility for replacing <code>java.util.Random with an alternative PRNG. Other very good PRNG suitable for Monte-Carlo analysis (but <strong>not for cryptography) provided by the library are the Mersenne twister from Makoto Matsumoto and Takuji Nishimura and the more recent WELL generators (Well Equidistributed Long-period Linear) from François Panneton, Pierre L'Ecuyer and Makoto Matsumoto. </p> <p> Sections 2.2-2.6 below show how to use the commons math API to generate different kinds of random data. The examples all use the default JDK-supplied PRNG. PRNG pluggability is covered in 2.7. The only modification required to the examples to use alternative PRNGs is to replace the argumentless constructor calls with invocations including a <code>RandomGenerator instance as a parameter. </p> </subsection> <subsection name="2.2 Random numbers" href="deviates"> <p> The <a href="../apidocs/org/apache/commons/math3/random/RandomDataGenerator.html"> RandomDataGenerator</a> class implements methods for generating random sequences of numbers. The API contracts of these methods use the following concepts: <dl> <dt>Random sequence of numbers from a probability distribution <dd>There is no such thing as a single "random number." What can be generated are <i>sequences of numbers that appear to be random. When using the built-in JDK function <code>Math.random(), sequences of values generated follow the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm"> Uniform Distribution</a>, which means that the values are evenly spread over the interval between 0 and 1, with no sub-interval having a greater probability of containing generated values than any other interval of the same length. The mathematical concept of a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm"> probability distribution</a> basically amounts to asserting that different ranges in the set of possible values of a random variable have different probabilities of containing the value. Commons Math supports generating random sequences from each of the distributions in the <a href="../apidocs/org/apache/commons/math3/distribution/package-summary.html"> distributions</a> package. The javadoc for the <code>nextXxx methods in <a href="../apidocs/org/apache/commons/math3/random/RandomDataGenerator.html"> RandomDataGenerator</a> describes the algorithms used to generate random deviates. </dd> <dt>Cryptographically secure random sequences <dd>It is possible for a sequence of numbers to appear random, but nonetheless to be predictable based on the algorithm used to generate the sequence. If in addition to randomness, strong unpredictability is required, it is best to use a <a href="http://www.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator"> secure random number generator</a> to generate values (or strings). The nextSecureXxx methods implemented by <code>RandomDataGenerator use the JDK <code>SecureRandom PRNG to generate cryptographically secure sequences. The <code>setSecureAlgorithm method allows you to change the underlying PRNG. These methods are <strong>much slower than the corresponding "non-secure" versions, so they should only be used when cryptographic security is required. The <code>ISAACRandom class implements a fast cryptographically secure pseudorandom number generator.</dd> <dt>Seeding pseudo-random number generators <dd>By default, the implementation provided in RandomDataGenerator uses the JDK-provided PRNG. Like most other PRNGs, the JDK generator generates sequences of random numbers based on an initial "seed value." For the non-secure methods, starting with the same seed always produces the same sequence of values. Secure sequences started with the same seeds will diverge. When a new <code>RandomDataGenerator is created, the underlying random number generators are <strong>not initialized. The first call to a data generation method, or to a <code>reSeed() method initializes the appropriate generator. If you do not explicitly seed the generator, it is by default seeded with the current time in milliseconds. Therefore, to generate sequences of random data values, you should always instantiate <strong>one RandomDataGenerator and use it repeatedly instead of creating new instances for subsequent values in the sequence. For example, the following will generate a random sequence of 50 long integers between 1 and 1,000,000, using the current time in milliseconds as the seed for the JDK PRNG: <source> RandomDataGenerator randomData = new RandomDataGenerator(); for (int i = 0; i < 1000; i++) { value = randomData.nextLong(1, 1000000); } </source> The following will not in general produce a good random sequence, since the PRNG is reseeded each time through the loop with the current time in milliseconds: <source> for (int i = 0; i < 1000; i++) { RandomDataGenerator randomData = new RandomDataGenerator(); value = randomData.nextLong(1, 1000000); } </source> The following will produce the same random sequence each time it is executed: <source> RandomDataGenerator randomData = new RandomDataGenerator(); randomData.reSeed(1000); for (int i = 0; i = 1000; i++) { value = randomData.nextLong(1, 1000000); } </source> The following will produce a different random sequence each time it is executed. <source> RandomDataGenerator randomData = new RandomDataGenerator(); randomData.reSeedSecure(1000); for (int i = 0; i < 1000; i++) { value = randomData.nextSecureLong(1, 1000000); } </source> </dd> </p> </subsection> <subsection name="2.3 Random Vectors" href="vectors"> <p> Some algorithms require random vectors instead of random scalars. When the components of these vectors are uncorrelated, they may be generated simply one at a time and packed together in the vector. The <a href="../apidocs/org/apache/commons/math3/random/UncorrelatedRandomVectorGenerator.html"> UncorrelatedRandomVectorGenerator</a> class simplifies this process by setting the mean and deviation of each component once and generating complete vectors. When the components are correlated however, generating them is much more difficult. The <a href="../apidocs/org/apache/commons/math3/random/CorrelatedRandomVectorGenerator.html"> CorrelatedRandomVectorGenerator</a> class provides this service. In this case, the user must set up a complete covariance matrix instead of a simple standard deviations vector. This matrix gathers both the variance and the correlation information of the probability law. </p> <p> The main use for correlated random vector generation is for Monte-Carlo simulation of physical problems with several variables, for example to generate error vectors to be added to a nominal vector. A particularly common case is when the generated vector should be drawn from a <a href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution"> Multivariate Normal Distribution</a>. </p> <p>
<dt>Generating random vectors from a bivariate normal distribution
<source> // Create and seed a RandomGenerator (could use any of the generators in the random package here) RandomGenerator rg = new JDKRandomGenerator(); rg.setSeed(17399225432l); // Fixed seed means same results every time // Create a GassianRandomGenerator using rg as its source of randomness GaussianRandomGenerator rawGenerator = new GaussianRandomGenerator(rg); // Create a CorrelatedRandomVectorGenerator using rawGenerator for the components CorrelatedRandomVectorGenerator generator = new CorrelatedRandomVectorGenerator(mean, covariance, 1.0e-12 * covariance.getNorm(), rawGenerator); // Use the generator to generate correlated vectors double[] randomVector = generator.nextVector(); ... </source> The <code>mean argument is a double[] array holding the means of the random vector components. In the bivariate case, it must have length 2. The <code>covariance argument is a RealMatrix, which needs to be 2 x 2. The main diagonal elements are the variances of the vector components and the off-diagonal elements are the covariances. For example, if the means are 1 and 2 respectively, and the desired standard deviations are 3 and 4, respectively, then we need to use <source> double[] mean = {1, 2}; double[][] cov = {{9, c}, {c, 16}}; RealMatrix covariance = MatrixUtils.createRealMatrix(cov); </source> where c is the desired covariance. If you are starting with a desired correlation, you need to translate this to a covariance by multiplying it by the product of the standard deviations. For example, if you want to generate data that will give Pearson's R of 0.5, you would use c = 3 * 4 * .5 = 6. </dd>

<p> In addition to multivariate normal distributions, correlated vectors from multivariate uniform distributions can be generated by creating a <a href="../apidocs/org/apache/commons/math3/random/UniformRandomGenerator.html">UniformRandomGenerator in place of the <code>GaussianRandomGenerator above. More generally, any <a href="../apidocs/org/apache/commons/math3/random/NormalizedRandomGenerator.html">NormalizedRandomGenerator may be used. </p> <p>
<dt>Low discrepancy sequences <dd> There exist several quasi-random sequences with the property that for all values of N, the subsequence x<sub>1, ..., xN has low discrepancy, which results in equi-distributed samples. While their quasi-randomness makes them unsuitable for most applications (i.e. the sequence of values is completely deterministic), their unique properties give them an important advantage for quasi-Monte Carlo simulations.<br/> Currently, the following low-discrepancy sequences are supported: <ul> <li>Sobol sequence (pre-configured up to dimension 1000) <li>Halton sequence (pre-configured up to dimension 40) </ul> <source> // Create a Sobol sequence generator for 2-dimensional vectors RandomVectorGenerator generator = new SobolSequence(2); // Use the generator to generate vectors double[] randomVector = generator.nextVector(); ... </source> The figure below illustrates the unique properties of low-discrepancy sequences when generating N samples in the interval [0, 1]. Roughly speaking, such sequences "fill" the respective space more evenly which leads to faster convergence in quasi-Monte Carlo simulations.<br/> <img src="../images/userguide/low_discrepancy_sequences.png" alt="Comparison of low-discrepancy sequences"/> </dd>
</p> <p>

</subsection> <subsection name="2.4 Random Strings" href="strings"> <p> The methods <code>nextHexString and nextSecureHexString can be used to generate random strings of hexadecimal characters. Both of these methods produce sequences of strings with good dispersion properties. The difference between the two methods is that the second is cryptographically secure. Specifically, the implementation of <code>nextHexString(n) in RandomDataGenerator uses the following simple algorithm to generate a string of <code>n hex digits: <ol> <li>n/2+1 binary bytes are generated using the underlying RandomGenerator <li>Each binary byte is translated into 2 hex digits The <code>RandomDataGenerator implementation of the "secure" version, <code>nextSecureHexString generates hex characters in 40-byte "chunks" using a 3-step process: <ol> <li>20 random bytes are generated using the underlying <code>SecureRandom. <li>SHA-1 hash is applied to yield a 20-byte binary digest. <li>Each byte of the binary digest is converted to 2 hex digits Similarly to the secure random number generation methods, <code>nextSecureHexString is much slower than the non-secure version. It should be used only for applications such as generating unique session or transaction ids where predictability of subsequent ids based on observation of previous values is a security concern. If all that is needed is an even distribution of hex characters in the generated strings, the non-secure method should be used. </p> </subsection> <subsection name="2.5 Random permutations, combinations, sampling" href="combinatorics"> <p> To select a random sample of objects in a collection, you can use the <code>nextSample method implemented by RandomDataGenerator. Specifically, if <code>c is a collection containing at least <code>k objects, and randomData is a <code>RandomDataGenerator instance randomData.nextSample(c, k) will return an <code>object[] array of length k consisting of elements randomly selected from the collection. If <code>c contains duplicate references, there may be duplicate references in the returned array; otherwise returned elements will be unique -- i.e., the sampling is without replacement among the object references in the collection. </p> <p> If <code>randomData is a RandomDataGenerator instance, and <code>n and k are integers with <code> k <= n, then <code>randomData.nextPermutation(n, k) returns an int[] array of length <code>k whose whose entries are selected randomly, without repetition, from the integers <code>0 through <code>n-1 (inclusive), i.e., <code>randomData.nextPermutation(n, k) returns a random permutation of <code>n taken k at a time. </p> </subsection> <subsection name="2.6 Generating data 'like' an input file" href="empirical"> <p> Using the <code>ValueServer class, you can generate data based on the values in an input file in one of two ways: <dl> <dt>Replay Mode <dd> The following code will read data from url (a <code>java.net.URL instance), cycling through the values in the file in sequence, reopening and starting at the beginning again when all values have been read. <source> ValueServer vs = new ValueServer(); vs.setValuesFileURL(url); vs.setMode(ValueServer.REPLAY_MODE); vs.resetReplayFile(); double value = vs.getNext(); // ...Generate and use more values... vs.closeReplayFile(); </source> The values in the file are not stored in memory, so it does not matter how large the file is, but you do need to explicitly close the file as above. The expected file format is \n -delimited (i.e. one per line) strings representing valid floating point numbers. </dd> <dt>Digest Mode <dd>When used in Digest Mode, the ValueServer reads the entire input file and estimates a probability density function based on data from the file. The estimation method is essentially the <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html"> Variable Kernel Method</a> with Gaussian smoothing. Once the density has been estimated, <code>getNext() returns random values whose probability distribution matches the empirical distribution -- i.e., if you generate a large number of such values, their distribution should "look like" the distribution of the values in the input file. The values are not stored in memory in this case either, so there is no limit to the size of the input file. Here is an example: <source> ValueServer vs = new ValueServer(); vs.setValuesFileURL(url); vs.setMode(ValueServer.DIGEST_MODE); vs.computeDistribution(500); //Read file and estimate distribution using 500 bins double value = vs.getNext(); // ...Generate and use more values... </source> See the javadoc for <code>ValueServer and <code>EmpiricalDistribution for more details. Note that <code>computeDistribution() opens and closes the input file by itself. </dd> </dl> </p> </subsection> <subsection name="2.7 PRNG Pluggability" href="pluggability"> <p> To enable alternative PRNGs to be "plugged in" to the commons-math data generation utilities and to provide a generic means to replace <code>java.util.Random in applications, a random generator adaptor framework has been added to commons-math. The <a href="../apidocs/org/apache/commons/math3/random/RandomGenerator.html"> RandomGenerator</a> interface abstracts the public interface of <code>java.util.Random and any implementation of this interface can be used as the source of random data for the commons-math data generation classes. An abstract base class, <a href="../apidocs/org/apache/commons/math3/random/AbstractRandomGenerator.html"> AbstractRandomGenerator</a> is provided to make implementation easier. This class provides default implementations of "derived" data generation methods based on the primitive, <code>nextDouble(). To support generic replacement of <code>java.util.Random, the <a href="../apidocs/org/apache/commons/math3/random/RandomAdaptor.html"> RandomAdaptor</a> class is provided, which extends <code>java.util.Random and wraps and delegates calls to a <code>RandomGenerator instance. </p> <p>Commons-math provides by itself several implementations of the JDKRandomGenerator that extends the JDK provided generator</li> <li> AbstractRandomGenerator</a> as a helper for users generators <li> BitStreamGenerator</a> which is an abstract class for several generators and which in turn is extended by: <ul> <li>MersenneTwister <li>Well512a <li>Well1024a <li>Well19937a <li>Well19937c <li>Well44497a <li>Well44497b </ul> </li> </ul> </p> <p> The JDK provided generator is a simple one that can be used only for very simple needs. The Mersenne Twister is a fast generator with very good properties well suited for Monte-Carlo simulation. It is equidistributed for generating vectors up to dimension 623 and has a huge period: 2<sup>19937 - 1 (which is a Mersenne prime). This generator is described in a paper by Makoto Matsumoto and Takuji Nishimura in 1998: <a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/mt.pdf">Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator</a>, ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp 3--30. The WELL generators are a family of generators with period ranging from 2<sup>512 - 1 to 2<sup>44497 - 1 (this last one is also a Mersenne prime) with even better properties than Mersenne Twister. These generators are described in a paper by François Panneton, Pierre L'Ecuyer and Makoto Matsumoto <a href="http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf">Improved Long-Period Generators Based on Linear Recurrences Modulo 2</a> ACM Transactions on Mathematical Software, 32, 1 (2006). The errata for the paper are in <a href="http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng-errata.txt">wellrng-errata.txt</a>. </p> <p> For simple sampling, any of these generators is sufficient. For Monte-Carlo simulations the JDK generator does not have any of the good mathematical properties of the other generators, so it should be avoided. The Mersenne twister and WELL generators have equidistribution properties proven according to their bits pool size which is directly linked to their period (all of them have maximal period, i.e. a generator with size n pool has a period 2<sup>n-1). They also have equidistribution properties for 32 bits blocks up to s/32 dimension where s is their pool size. So WELL19937c for exemple is equidistributed up to dimension 623 (19937/32). This means a Monte-Carlo simulation generating a vector of n variables at each iteration has some guarantees on the properties of the vector as long as its dimension does not exceed the limit. However, since we use bits from two successive 32 bits generated integers to create one double, this limit is smaller when the variables are of type double. so for Monte-Carlo simulation where less the 16 doubles are generated at each round, WELL1024 may be sufficient. If a larger number of doubles are needed a generator with a larger pool would be useful. </p> <p> The WELL generators are more modern then MersenneTwister (the paper describing than has been published in 2006 instead of 1998) and fix some of its (few) drawbacks. If initialization array contains many zero bits, MersenneTwister may take a very long time (several hundreds of thousands of iterations to reach a steady state with a balanced number of zero and one in its bits pool). So the WELL generators are better to <i>escape zeroland as explained by the WELL generators creators. The Well19937a and Well44497a generator are not maximally equidistributed (i.e. there are some dimensions or bits blocks size for which they are not equidistributed). The Well512a, Well1024a, Well19937c and Well44497b are maximally equidistributed for blocks size up to 32 bits (they should behave correctly also for double based on more than 32 bits blocks, but equidistribution is not proven at these blocks sizes). </p> <p> The MersenneTwister generator uses a 624 elements integer array, so it consumes less than 2.5 kilobytes. The WELL generators use 6 integer arrays with a size equal to the pool size, so for example the WELL44497b generator uses about 33 kilobytes. This may be important if a very large number of generator instances were used at the same time. </p> <p> All generators are quite fast. As an example, here are some comparisons, obtained on a 64 bits JVM on a linux computer with a 2008 processor (AMD phenom Quad 9550 at 2.2 GHz). The generation rate for MersenneTwister was between 25 and 27 millions doubles per second (remember we generate two 32 bits integers for each double). Generation rates for other PRNG, relative to MersenneTwister: </p> <p> <table border="1" align="center"> <tr BGCOLOR="#CCCCFF">
Example of performances
Namegeneration rate (relative to MersenneTwister)
MersenneTwister1
JDKRandomGeneratorbetween 0.96 and 1.16
Well512abetween 0.85 and 0.88
Well1024abetween 0.63 and 0.73
Well19937abetween 0.70 and 0.71
Well19937cbetween 0.57 and 0.71
Well44497abetween 0.69 and 0.71
Well44497bbetween 0.65 and 0.71
... this post is sponsored by my books ...

#1 New Release!

FP Best Seller

 

new blog posts

 

Copyright 1998-2024 Alvin Alexander, alvinalexander.com
All Rights Reserved.

A percentage of advertising revenue from
pages under the /java/jwarehouse URI on this website is
paid back to open source projects.