This example Java source code file (GTest.java) is included in the alvinalexander.com "Java Source Code Warehouse" project.
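The `g` method in the source below computes the G (log-likelihood ratio) statistic, and the javadoc that follows it describes the p-value as the tail probability beyond G in a chi-squared distribution with one degree of freedom fewer than the number of categories. Below is a minimal standalone sketch of both steps. It uses only `java.lang.Math` and is not the Commons Math API: `GTestSketch`, `gTest`, `chiSquaredUpperTail`, `regularizedGammaQ` and `lnGammaHalfInteger` are hypothetical names introduced here. The incomplete gamma function is evaluated with the common series / continued-fraction split (as in Numerical Recipes), which is an assumption and may differ from what the library does internally. The sketch always rescales the expected counts and explicitly skips zero observed counts, using the usual convention 0 · ln 0 = 0.

```java
/**
 * Hypothetical standalone sketch (not the Commons Math API): the G statistic for
 * goodness of fit and its chi-squared p-value, using only java.lang.Math.
 */
final class GTestSketch {

    /** G = 2 * sum(o_i * ln(o_i / e_i)) after scaling expected to the observed total. */
    static double g(double[] expected, long[] observed) {
        if (expected.length < 2 || expected.length != observed.length) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] <= 0.0 || observed[i] < 0) {
                throw new IllegalArgumentException("need expected > 0 and observed >= 0");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Always rescale so that both arrays have the same total.
        double ratio = sumObserved / sumExpected;
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] == 0) {
                continue; // convention: 0 * ln(0) = 0
            }
            sum += observed[i] * Math.log(observed[i] / (ratio * expected[i]));
        }
        return 2.0 * sum;
    }

    /** p-value: upper tail of chi-squared with (k - 1) degrees of freedom beyond G. */
    static double gTest(double[] expected, long[] observed) {
        return chiSquaredUpperTail(g(expected, observed), observed.length - 1);
    }

    /** P(X > x) for X ~ chi-squared(df), which equals Q(df / 2, x / 2). */
    static double chiSquaredUpperTail(double x, int df) {
        if (x <= 0.0) {
            return 1.0;
        }
        return regularizedGammaQ(df / 2.0, x / 2.0);
    }

    /** ln Gamma(a) for a in {1/2, 1, 3/2, 2, ...}: Gamma(1/2) = sqrt(pi), Gamma(1) = 1, Gamma(t + 1) = t Gamma(t). */
    static double lnGammaHalfInteger(double a) {
        boolean whole = a == Math.floor(a);
        double value = whole ? 0.0 : 0.5 * Math.log(Math.PI);
        for (double t = whole ? 1.0 : 0.5; t < a; t += 1.0) {
            value += Math.log(t);
        }
        return value;
    }

    /** Upper regularized incomplete gamma Q(a, x): series for x < a + 1, else continued fraction. */
    static double regularizedGammaQ(double a, double x) {
        double prefactor = Math.exp(-x + a * Math.log(x) - lnGammaHalfInteger(a));
        if (x < a + 1.0) {
            // Series for the lower function P(a, x); then Q = 1 - P.
            double term = 1.0 / a;
            double sum = term;
            for (int n = 1; n < 10_000 && Math.abs(term) > Math.abs(sum) * 1e-16; n++) {
                term *= x / (a + n);
                sum += term;
            }
            return 1.0 - prefactor * sum;
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        final double tiny = 1e-300;
        double b = x + 1.0 - a;
        double c = 1.0 / tiny;
        double d = 1.0 / b;
        double h = d;
        for (int i = 1; i < 10_000; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < tiny) {
                d = tiny;
            }
            c = b + an / c;
            if (Math.abs(c) < tiny) {
                c = tiny;
            }
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-15) {
                break;
            }
        }
        return prefactor * h;
    }
}
```

For example, `GTestSketch.gTest(new double[] {1, 1}, new long[] {10, 30})` compares 10 versus 30 observed counts against an even split. The expected counts are rescaled to 20 and 20 before G is formed, and the p-value is taken from a chi-squared distribution with one degree of freedom.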


The GTest.java Java example source code

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.commons.math3.stat.inference;

import org.apache.commons.math3.distribution.ChiSquaredDistribution;
import org.apache.commons.math3.exception.DimensionMismatchException;
import org.apache.commons.math3.exception.MaxCountExceededException;
import org.apache.commons.math3.exception.NotPositiveException;
import org.apache.commons.math3.exception.NotStrictlyPositiveException;
import org.apache.commons.math3.exception.OutOfRangeException;
import org.apache.commons.math3.exception.ZeroException;
import org.apache.commons.math3.exception.util.LocalizedFormats;
import org.apache.commons.math3.util.FastMath;
import org.apache.commons.math3.util.MathArrays;

/**
 * Implements <a href="http://en.wikipedia.org/wiki/G-test">G Test</a>
 * statistics.
 *
 * <p>This is known in statistical genetics as the McDonald-Kreitman test.
 * The implementation handles both known and unknown distributions.</p>
 *
 * <p>Two-sample tests can be used when the distribution is unknown a priori
 * but provided by one sample, or when the hypothesis under test is that the two
 * samples come from the same underlying distribution.</p>
 *
 * @since 3.1
 */
public class GTest {

    /**
     * Computes the <a href="http://en.wikipedia.org/wiki/G-test">G statistic
     * for Goodness of Fit</a> comparing {@code observed} and {@code expected}
     * frequency counts.
     *
     * <p>This statistic can be used to perform a G test (Log-Likelihood Ratio
     * Test) evaluating the null hypothesis that the observed counts follow the
     * expected distribution.</p>
     *
     * <p>Preconditions:</p>
     * <ul>
     * <li>Expected counts must all be positive.</li>
     * <li>Observed counts must all be &ge; 0.</li>
     * <li>The observed and expected arrays must have the same length and their
     * common length must be at least 2.</li>
     * </ul>
     *
     * <p>If any of the preconditions are not met, a
     * {@code MathIllegalArgumentException} is thrown.</p>
     *
     * <p>Note: This implementation rescales the
     * {@code expected} array if necessary to ensure that the sums of the
     * expected and observed counts are equal.</p>
     *
     * @param observed array of observed frequency counts
     * @param expected array of expected frequency counts
     * @return G-Test statistic
     * @throws NotPositiveException if {@code observed} has negative entries
     * @throws NotStrictlyPositiveException if {@code expected} has entries that
     * are not strictly positive
     * @throws DimensionMismatchException if the array lengths do not match or
     * are less than 2.
     */
    public double g(final double[] expected, final long[] observed)
            throws NotPositiveException, NotStrictlyPositiveException,
            DimensionMismatchException {

        if (expected.length < 2) {
            throw new DimensionMismatchException(expected.length, 2);
        }
        if (expected.length != observed.length) {
            throw new DimensionMismatchException(expected.length, observed.length);
        }
        MathArrays.checkPositive(expected);
        MathArrays.checkNonNegative(observed);

        // Totals of both arrays, used to decide whether expected must be rescaled.
        double sumExpected = 0d;
        double sumObserved = 0d;
        for (int i = 0; i < observed.length; i++) {
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Rescale only if the totals differ by more than 10E-6 (that is, 1.0e-5).
        double ratio = 1d;
        boolean rescale = false;
        if (FastMath.abs(sumExpected - sumObserved) > 10E-6) {
            ratio = sumObserved / sumExpected;
            rescale = true;
        }
        // G = 2 * sum(observed[i] * ln(observed[i] / (ratio * expected[i])))
        double sum = 0d;
        for (int i = 0; i < observed.length; i++) {
            final double dev = rescale ?
                    FastMath.log((double) observed[i] / (ratio * expected[i])) :
                    FastMath.log((double) observed[i] / expected[i]);
            sum += ((double) observed[i]) * dev;
        }
        return 2d * sum;
    }

    /**
     * Returns the <i>observed significance level</i>, or p-value,
     * associated with a G-Test for goodness of fit comparing the
     * {@code observed} frequency counts to those in the {@code expected} array.
     *
     * <p>The number returned is the smallest significance level at which one
     * can reject the null hypothesis that the observed counts conform to the
     * frequency distribution described by the expected counts.</p>
     *
     * <p>The probability returned is the tail probability beyond
     * {@link #g(double[], long[]) g(expected, observed)}
     * in the ChiSquare distribution with degrees of freedom one less than the
     * common length of {@code expected} and {@code observed}.</p>
     *
     * <p>Preconditions:

