alvinalexander.com | career | drupal | java | mac | mysql | perl | scala | uml | unix  

Commons Math example source code file (random.xml)

This example Commons Math source code file (random.xml) is included in the DevDaily.com "Java Source Code Warehouse" project. The intent of this project is to help you "Learn Java by Example" TM.

Java - Commons Math tags/keywords

if, jdk, license, license, mersenne, prng, prng, random, randomdataimpl, randomdataimpl, the, the, valueserver, when

The Commons Math random.xml source code

<?xml version="1.0"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
  -->
  
<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<!-- $Revision: 796546 $ $Date: 2009-07-21 17:38:55 -0400 (Tue, 21 Jul 2009) $ -->
<document url="random.html">

<properties>
    <title>The Commons Math User Guide - Data Generation
</properties>

<body>

<section name="2 Data Generation">

<subsection name="2.1 Overview" href="overview">
    <p>
    The Commons Math random package includes utilities for
    <ul>
        <li>generating random numbers
        <li>generating random strings
        <li>generating cryptographically secure sequences of random numbers or
         strings</li>
        <li>generating random samples and permutations
        <li>analyzing distributions of values in an input file and generating
         values "like" the values in the file</li>
        <li>generating data for grouped frequency distributions or
         histograms</li>
    </ul>

<p> The source of random data used by the data generation utilities is pluggable. By default, the JDK-supplied PseudoRandom Number Generator (PRNG) is used, but alternative generators can be "plugged in" using an adaptor framework, which provides a generic facility for replacing <code>java.util.Random with an alternative PRNG. Another very good PRNG suitable for Monte-Carlo analysis (but <strong>not for cryptography) provided by the library is the Mersenne twister from Makoto Matsumoto and Takuji Nishimura </p> <p> Sections 2.2-2.6 below show how to use the commons math API to generate different kinds of random data. The examples all use the default JDK-supplied PRNG. PRNG pluggability is covered in 2.7. The only modification required to the examples to use alternative PRNGs is to replace the argumentless constructor calls with invocations including a <code>RandomGenerator instance as a parameter. </p> </subsection> <subsection name="2.2 Random numbers" href="deviates"> <p> The <a href="../apidocs/org/apache/commons/math/random/RandomData.html"> org.apache.commons.math.RandomData</a> interface defines methods for generating random sequences of numbers. The API contracts of these methods use the following concepts: <dl> <dt>Random sequence of numbers from a probability distribution <dd>There is no such thing as a single "random number." What can be generated are <i>sequences of numbers that appear to be random. When using the built-in JDK function <code>Math.random(), sequences of values generated follow the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm"> Uniform Distribution</a>, which means that the values are evenly spread over the interval between 0 and 1, with no sub-interval having a greater probability of containing generated values than any other interval of the same length. The mathematical concept of a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm"> probability distribution</a> basically amounts to asserting that different ranges in the set of possible values of a random variable have different probabilities of containing the value. Commons Math supports generating random sequences from the following probability distributions. The javadoc for the <code>nextXxx methods in <code>RandomDataImpl describes the algorithms used to generate random deviates from each of these distributions. <ul> <li> uniform distribution</a> <li> exponential distribution</a> <li> poisson distribution</a> <li> Gaussian distribution</a> </ul> </dd> <dt>Cryptographically secure random sequences <dd>It is possible for a sequence of numbers to appear random, but nonetheless to be predictable based on the algorithm used to generate the sequence. If in addition to randomness, strong unpredictability is required, it is best to use a <a href="http://www.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator"> secure random number generator</a> to generate values (or strings). The nextSecureXxx methods in the <code>RandomDataImpl implementation of the <code>RandomData interface use the JDK SecureRandom PRNG to generate cryptographically secure sequences. The <code>setSecureAlgorithm method allows you to change the underlying PRNG. These methods are <strong>much slower than the corresponding "non-secure" versions, so they should only be used when cryptographic security is required.</dd> <dt>Seeding pseudo-random number generators <dd>By default, the implementation provided in RandomDataImpl uses the JDK-provided PRNG. Like most other PRNGs, the JDK generator generates sequences of random numbers based on an initial "seed value". For the non-secure methods, starting with the same seed always produces the same sequence of values. Secure sequences started with the same seeds will diverge. When a new <code>RandomDataImpl is created, the underlying random number generators are <strong>not initialized. The first call to a data generation method, or to a <code>reSeed() method initializes the appropriate generator. If you do not explicitly seed the generator, it is by default seeded with the current time in milliseconds. Therefore, to generate sequences of random data values, you should always instantiate <strong>one RandomDataImpl and use it repeatedly instead of creating new instances for subsequent values in the sequence. For example, the following will generate a random sequence of 50 long integers between 1 and 1,000,000, using the current time in milliseconds as the seed for the JDK PRNG: <source> RandomData randomData = new RandomDataImpl(); for (int i = 0; i < 1000; i++) { value = randomData.nextLong(1, 1000000); } </source> The following will not in general produce a good random sequence, since the PRNG is reseeded each time through the loop with the current time in milliseconds: <source> for (int i = 0; i < 1000; i++) { RandomDataImpl randomData = new RandomDataImpl(); value = randomData.nextLong(1, 1000000); } </source> The following will produce the same random sequence each time it is executed: <source> RandomData randomData = new RandomDataImpl(); randomData.reSeed(1000); for (int i = 0; i = 1000; i++) { value = randomData.nextLong(1, 1000000); } </source> The following will produce a different random sequence each time it is executed. <source> RandomData randomData = new RandomDataImpl(); randomData.reSeedSecure(1000); for (int i = 0; i < 1000; i++) { value = randomData.nextSecureLong(1, 1000000); } </source> </dd> </p> </subsection> <subsection name="2.3 Random Vectors" href="vectors"> <p> Some algorithm requires random vectors instead of random scalars. When the components of these vectors are uncorrelated, they may be generated simply one at a time and packed together in the vector. The <a href="../apidocs/org/apache/commons/math/random/UncorrelatedRandomVectorGenerator.html"> org.apache.commons.math.UncorrelatedRandomVectorGenerator</a> class does however simplify this process by setting the mean and deviation of each component once and generating complete vectors. When the components are correlated however, generating them is much more difficult. The <a href="../apidocs/org/apache/commons/math/random/CorrelatedRandomVectorGenerator.html"> org.apache.commons.math.CorrelatedRandomVectorGenerator</a> class provides this service. In this case, the user must set up a complete covariance matrix instead of a simple standard deviations vector. This matrix gathers both the variance and the correlation information of the probability law. </p> <p> The main use for correlated random vector generation is for Monte-Carlo simulation of physical problems with several variables, for example to generate error vectors to be added to a nominal vector. A particularly common case is when the generated vector should be drawn from a <a href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution"> Multivariate Normal Distribution</a>. </p> </subsection> <subsection name="2.4 Random Strings" href="strings"> <p> The methods <code>nextHexString and nextSecureHexString can be used to generate random strings of hexadecimal characters. Both of these methods produce sequences of strings with good dispersion properties. The difference between the two methods is that the second is cryptographically secure. Specifically, the implementation of <code>nextHexString(n) in RandomDataImpl uses the following simple algorithm to generate a string of <code>n hex digits: <ol> <li>n/2+1 binary bytes are generated using the underlying Random <li>Each binary byte is translated into 2 hex digits The <code>RandomDataImpl implementation of the "secure" version, <code>nextSecureHexString generates hex characters in 40-byte "chunks" using a 3-step process: <ol> <li>20 random bytes are generated using the underlying <code>SecureRandom. <li>SHA-1 hash is applied to yield a 20-byte binary digest. <li>Each byte of the binary digest is converted to 2 hex digits Similarly to the secure random number generation methods, <code>nextSecureHexString is much slower than the non-secure version. It should be used only for applications such as generating unique session or transaction ids where predictability of subsequent ids based on observation of previous values is a security concern. If all that is needed is an even distribution of hex characters in the generated strings, the non-secure method should be used. </p> </subsection> <subsection name="2.5 Random permutations, combinations, sampling" href="combinatorics"> <p> To select a random sample of objects in a collection, you can use the <code>nextSample method in the RandomData interface. Specifically, if <code>c is a collection containing at least <code>k objects, and randomData is a <code>RandomData instance randomData.nextSample(c, k) will return an <code>object[] array of length k consisting of elements randomly selected from the collection. If <code>c contains duplicate references, there may be duplicate references in the returned array; otherwise returned elements will be unique -- i.e., the sampling is without replacement among the object references in the collection. </p> <p> If <code>randomData is a RandomData instance, and <code>n and k are integers with <code> k <= n, then <code>randomData.nextPermutation(n, k) returns an int[] array of length <code>k whose whose entries are selected randomly, without repetition, from the integers <code>0 through <code>n-1 (inclusive), i.e., <code>randomData.nextPermutation(n, k) returns a random permutation of <code>n taken k at a time. </p> </subsection> <subsection name="2.6 Generating data 'like' an input file" href="empirical"> <p> Using the <code>ValueServer class, you can generate data based on the values in an input file in one of two ways: <dl> <dt>Replay Mode <dd> The following code will read data from url (a <code>java.net.URL instance), cycling through the values in the file in sequence, reopening and starting at the beginning again when all values have been read. <source> ValueServer vs = new ValueServer(); vs.setValuesFileURL(url); vs.setMode(ValueServer.REPLAY_MODE); vs.resetReplayFile(); double value = vs.getNext(); // ...Generate and use more values... vs.closeReplayFile(); </source> The values in the file are not stored in memory, so it does not matter how large the file is, but you do need to explicitly close the file as above. The expected file format is \n -delimited (i.e. one per line) strings representing valid floating point numbers. </dd> <dt>Digest Mode <dd>When used in Digest Mode, the ValueServer reads the entire input file and estimates a probability density function based on data from the file. The estimation method is essentially the <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html"> Variable Kernel Method</a> with Gaussian smoothing. Once the density has been estimated, <code>getNext() returns random values whose probability distribution matches the empirical distribution -- i.e., if you generate a large number of such values, their distribution should "look like" the distribution of the values in the input file. The values are not stored in memory in this case either, so there is no limit to the size of the input file. Here is an example: <source> ValueServer vs = new ValueServer(); vs.setValuesFileURL(url); vs.setMode(ValueServer.DIGEST_MODE); vs.computeDistribution(500); //Read file and estimate distribution using 500 bins double value = vs.getNext(); // ...Generate and use more values... </source> See the javadoc for <code>ValueServer and <code>EmpiricalDistribution for more details. Note that <code>computeDistribution() opens and closes the input file by itself. </dd> </dl> </p> </subsection> <subsection name="2.7 PRNG Pluggability" href="pluggability"> <p> To enable alternative PRNGs to be "plugged in" to the commons-math data generation utilities and to provide a generic means to replace <code>java.util.Random in applications, a random generator adaptor framework has been added to commons-math. The <a href="../apidocs/org/apache/commons/math/random/RandomGenerator.html"> org.apache.commons.math.RandomGenerator</a> interface abstracts the public interface of <code>java.util.Random and any implementation of this interface can be used as the source of random data for the commons-math data generation classes. An abstract base class, <a href="../apidocs/org/apache/commons/math/random/AbstractRandomGenerator.html"> org.apache.commons.math.AbstractRandomGenerator</a> is provided to make implementation easier. This class provides default implementations of "derived" data generation methods based on the primitive, <code>nextDouble(). To support generic replacement of <code>java.util.Random, the <a href="../apidocs/org/apache/commons/math/random/RandomAdaptor.html"> org.apache.commons.math.RandomAdaptor</a> class is provided, which extends <code>java.util.Random and wraps and delegates calls to a <code>RandomGenerator instance. </p> <p> Examples: <dl> <dt>Create a RandomGenerator based on RngPack's Mersenne Twister <dd>To create a RandomGenerator using the RngPack Mersenne Twister PRNG as the source of randomness, extend <code>AbstractRandomGenerator overriding the derived methods that the RngPack implementation provides: <source> import edu.cornell.lassp.houle.RngPack.RanMT; /** * AbstractRandomGenerator based on RngPack RanMT generator. */ public class RngPackGenerator extends AbstractRandomGenerator { private RanMT random = new RanMT(); public void setSeed(long seed) { random = new RanMT(seed); } public double nextDouble() { return random.raw(); } public double nextGaussian() { return random.gaussian(); } public int nextInt(int n) { return random.choose(n); } public boolean nextBoolean() { return random.coin(); } } </source> </dd> <dt>Use the Mersenne Twister RandomGenerator in place of <code>java.util.Random in RandomData <dd> <source> RandomData randomData = new RandomDataImpl(new RngPackGenerator()); </source> </dd> <dt>Create an adaptor instance based on the Mersenne Twister generator that can be used in place of a <code>Random <dd> <source> RandomGenerator generator = new RngPackGenerator(); Random random = RandomAdaptor.createAdaptor(generator); // random can now be used in place of a Random instance, data generation // calls will be delegated to the wrapped Mersenne Twister </source> </dd> </dl> </p> </subsection> </section> </body> </document>
... this post is sponsored by my books ...

#1 New Release!

FP Best Seller

 

new blog posts

 

Copyright 1998-2024 Alvin Alexander, alvinalexander.com
All Rights Reserved.

A percentage of advertising revenue from
pages under the /java/jwarehouse URI on this website is
paid back to open source projects.