<th>"Rolling" capability? </table> </p>
<p> <code>SummaryStatistics</code> can be aggregated using <a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html"> AggregateSummaryStatistics.</a> This class can be used to gather statistics concurrently for multiple datasets as well as for a combined sample that includes all of the data. </p>
<p> <code>MultivariateSummaryStatistics</code> is similar to <code>SummaryStatistics</code> but handles n-tuple values instead of scalar values. It can also compute the full covariance matrix for the input data. </p>
<p> Neither <code>DescriptiveStatistics</code> nor <code>SummaryStatistics</code> is thread-safe. <a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedDescriptiveStatistics.html"> SynchronizedDescriptiveStatistics</a> and <a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedSummaryStatistics.html"> SynchronizedSummaryStatistics</a>, respectively, provide thread-safe versions for applications that require concurrent access to statistical aggregates by multiple threads. <a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedMultiVariateSummaryStatistics.html"> SynchronizedMultivariateSummaryStatistics</a> provides a thread-safe version of <code>MultivariateSummaryStatistics</code>. </p>
<p> There is also a utility class, <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html"> StatUtils</a>, that provides static methods for computing statistics directly from <code>double[]</code> arrays. </p>
<p> Here are some examples showing how to compute descriptive statistics. <dl> <dt>Compute summary statistics for a list of double values</dt> <br/>
<dd>Using the <code>DescriptiveStatistics</code> aggregate (values are stored in memory):
<source>
// Get a DescriptiveStatistics instance
DescriptiveStatistics stats = new DescriptiveStatistics();

// Add the data from the array
for (int i = 0; i < inputArray.length; i++) {
    stats.addValue(inputArray[i]);
}

// Compute some statistics
double mean = stats.getMean();
double std = stats.getStandardDeviation();
double median = stats.getPercentile(50); // the median is the 50th percentile
</source>
</dd>
<dd>Using the <code>SummaryStatistics</code> aggregate (values are <strong>not</strong> stored in memory):
<source>
// Get a SummaryStatistics instance
SummaryStatistics stats = new SummaryStatistics();

// Read data from an input stream (in is assumed to be a BufferedReader),
// adding values and updating sums, counters, etc.
String line = in.readLine();
while (line != null) {
    stats.addValue(Double.parseDouble(line.trim()));
    line = in.readLine();
}
in.close();

// Compute the statistics
double mean = stats.getMean();
double std = stats.getStandardDeviation();
//double median = stats.getPercentile(50); <-- NOT AVAILABLE
</source>
</dd>
<dd>Using the <code>StatUtils</code> utility class:
<source>
// Compute statistics directly from the array
// assume values is a double[] array
double mean = StatUtils.mean(values);
double variance = StatUtils.variance(values);
double std = Math.sqrt(variance);
double median = StatUtils.percentile(values, 50);

// Compute the mean of the first three values in the array
mean = StatUtils.mean(values, 0, 3);
</source>
</dd>
<dt>Maintain a "rolling mean" of the most recent 100 values from an input stream</dt> <br/>
<dd>Use a <code>DescriptiveStatistics</code> instance with the window size set to 100
<source>
// Create a DescriptiveStatistics instance and set the window size to 100
DescriptiveStatistics stats = new DescriptiveStatistics();
stats.setWindowSize(100);

// Read data from an input stream (in is assumed to be a BufferedReader),
// displaying the mean of the most recent 100 observations
// after every 100 observations
long nLines = 0;
String line = in.readLine();
while (line != null) {
    stats.addValue(Double.parseDouble(line.trim()));
    nLines++;
    if (nLines == 100) {
        nLines = 0;
        System.out.println(stats.getMean());
    }
    line = in.readLine();
}
in.close();
</source>
</dd>
<dt>Compute statistics in a thread-safe manner</dt> <br/>
<dd>Use a <code>SynchronizedDescriptiveStatistics</code> instance
<source>
// Create a SynchronizedDescriptiveStatistics instance and
// use it as any other DescriptiveStatistics instance
DescriptiveStatistics stats = new SynchronizedDescriptiveStatistics();
</source>
</dd>
<dt>Compute statistics for multiple samples and overall statistics concurrently</dt> <br/>
<dd>There are two ways to do this using <code>AggregateSummaryStatistics</code>. The first is to use an <code>AggregateSummaryStatistics</code> instance to accumulate overall statistics contributed by <code>SummaryStatistics</code> instances created using <a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html#createContributingStatistics()"> AggregateSummaryStatistics.createContributingStatistics()</a>:
<source>
// Create an AggregateSummaryStatistics instance to accumulate the overall statistics
// and contributing SummaryStatistics instances for the subsamples
AggregateSummaryStatistics aggregate = new AggregateSummaryStatistics();
SummaryStatistics setOneStats = aggregate.createContributingStatistics();
SummaryStatistics setTwoStats = aggregate.createContributingStatistics();

// Add values to the subsample aggregates
setOneStats.addValue(2);
setOneStats.addValue(3);
setTwoStats.addValue(2);
setTwoStats.addValue(4);
...
// Full sample data is reported by the aggregate
double totalSampleSum = aggregate.getSum();
</source>
The above approach has the disadvantages that the <code>addValue</code> calls must be synchronized on the <code>SummaryStatistics</code> instance maintained by the aggregate, and that each value addition updates the aggregate as well as the subsample. For applications that can wait to do the aggregation until all values have been added, a static <a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html#aggregate(java.util.Collection)"> aggregate</a> method is available, as shown in the following example. This method should be used when aggregation needs to be done across threads.
<source>
// Create SummaryStatistics instances for the subsample data
SummaryStatistics setOneStats = new SummaryStatistics();
SummaryStatistics setTwoStats = new SummaryStatistics();

// Add values to the subsample SummaryStatistics instances
setOneStats.addValue(2);
setOneStats.addValue(3);
setTwoStats.addValue(2);
setTwoStats.addValue(4);
...

// Aggregate the subsample statistics
Collection<SummaryStatistics> aggregate = new ArrayList<SummaryStatistics>();
aggregate.add(setOneStats);
aggregate.add(setTwoStats);
StatisticalSummary aggregatedStats = AggregateSummaryStatistics.aggregate(aggregate);

// Full sample data is reported by aggregatedStats
double totalSampleSum = aggregatedStats.getSum();
</source>
</dd> </dl> </p> </subsection> <subsection name="1.3 Frequency distributions">
<p> <a href="../apidocs/org/apache/commons/math/stat/Frequency.html"> org.apache.commons.math.stat.Frequency</a> provides a simple interface for maintaining counts and percentages of discrete values. </p>
<p> Strings, integers, longs and chars are all supported as value types, as well as instances of any class that implements <code>Comparable</code>.
The ordering of values used in computing cumulative frequencies is by default the <i>natural</i> ordering, but this can be overridden by supplying a <code>Comparator</code> to the constructor. Adding values that are not comparable to those that have already been added results in an <code>IllegalArgumentException</code>. </p>
<p> Here are some examples. <dl> <dt>Compute a frequency distribution based on integer values</dt> <br/>
<dd>Mixing integers, longs, Integers and Longs:
<source>
Frequency f = new Frequency();
f.addValue(1);
f.addValue(Integer.valueOf(1));
f.addValue(Long.valueOf(1));
f.addValue(2);
f.addValue(Integer.valueOf(-1));
System.out.println(f.getCount(1));                // displays 3
System.out.println(f.getCumPct(0));               // displays 0.2
System.out.println(f.getPct(Integer.valueOf(1))); // displays 0.6
System.out.println(f.getCumPct(-2));              // displays 0.0
System.out.println(f.getCumPct(10));              // displays 1.0
</source>
</dd>
<dt>Count string frequencies</dt> <br/>
<dd>Using case-sensitive comparison, alpha sort order (natural comparator):
<source>
Frequency f = new Frequency();
f.addValue("one");
f.addValue("One");
f.addValue("oNe");
f.addValue("Z");
System.out.println(f.getCount("one")); // displays 1
System.out.println(f.getCumPct("Z"));  // displays 0.5
System.out.println(f.getCumPct("Ot")); // displays 0.25
</source>
</dd>
<dd>Using a case-insensitive comparator:
<source>
Frequency f = new Frequency(String.CASE_INSENSITIVE_ORDER);
f.addValue("one");
f.addValue("One");
f.addValue("oNe");
f.addValue("Z");
System.out.println(f.getCount("one")); // displays 3
System.out.println(f.getCumPct("z"));  // displays 1.0
</source>
</dd> </dl> </p> </subsection> <subsection name="1.4 Simple regression">
<p> <a href="../apidocs/org/apache/commons/math/stat/regression/SimpleRegression.html"> org.apache.commons.math.stat.regression.SimpleRegression</a> provides ordinary least squares regression with one independent variable, estimating the linear model: </p>
<p> <code> y = intercept + slope * x </code> </p>
<p> Standard errors for <code>intercept</code> and <code>slope</code> are available, as well as ANOVA, r-square and Pearson's r statistics. </p>
<p> Observations (x,y pairs) can be added to the model one at a time, or they can be provided in a 2-dimensional array. The observations are not stored in memory, so there is no limit to the number of observations that can be added to the model. </p>
<p> <strong>Usage Notes:</strong>

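The SimpleRegression description above says observations are not stored and can be added singly or as a 2-dimensional array. As a rough illustration of how that can work, here is a plain-Java sketch that keeps only running means and centred sums of squares and cross-products (Welford-style updates), then derives slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x), and Pearson's r. The class `OnlineLinearFit` is hypothetical; its method names merely echo SimpleRegression's for readability, it is not the library's source, and it omits standard errors and ANOVA statistics.

```java
/**
 * Hypothetical storeless ordinary least squares fit of
 * y = intercept + slope * x (NOT the Commons Math SimpleRegression source).
 * Only running means and centred sums are kept; observations are not stored.
 */
class OnlineLinearFit {
    private long n;               // number of (x, y) observations added
    private double xBar, yBar;    // running means of x and y
    private double sxx, syy, sxy; // centred sums of squares and cross-products

    /** Adds one (x, y) observation using Welford-style updates. */
    void addData(double x, double y) {
        n++;
        double dx = x - xBar; // deviation from the previous mean of x
        double dy = y - yBar; // deviation from the previous mean of y
        xBar += dx / n;
        yBar += dy / n;
        sxx += dx * (x - xBar);
        syy += dy * (y - yBar);
        sxy += dx * (y - yBar);
    }

    /** Adds every row of a 2-D array, where data[i][0] is x and data[i][1] is y. */
    void addData(double[][] data) {
        for (double[] row : data) {
            addData(row[0], row[1]);
        }
    }

    /** slope = Sxy / Sxx; NaN until at least two distinct x values are seen. */
    double getSlope() {
        return (n < 2 || sxx == 0.0) ? Double.NaN : sxy / sxx;
    }

    /** intercept = mean(y) - slope * mean(x). */
    double getIntercept() {
        return yBar - getSlope() * xBar;
    }

    /** Pearson's r = Sxy / sqrt(Sxx * Syy). */
    double getR() {
        return (n < 2 || sxx == 0.0 || syy == 0.0)
                ? Double.NaN
                : sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        OnlineLinearFit fit = new OnlineLinearFit();
        fit.addData(new double[][] {{1, 3}, {2, 5}, {3, 7}});
        fit.addData(4, 9); // observations can also be added one at a time
        System.out.println("slope = " + fit.getSlope());         // prints slope = 2.0
        System.out.println("intercept = " + fit.getIntercept()); // prints intercept = 1.0
        System.out.println("r = " + fit.getR());                 // prints r = 1.0
    }
}
```

The centred updates are used instead of raw sums such as sum(x * x) because subtracting large raw sums can lose precision when the data have a large mean.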
Commons Math example source code file (stat.xml)


The Commons Math stat.xml source code

<?xml version="1.0"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
  -->

<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<!-- $Revision: 925895 $ $Date: 2010-03-21 17:05:20 -0400 (Sun, 21 Mar 2010) $ -->
<document url="stat.html">
  <properties>
    <title>The Commons Math User Guide - Statistics</title>
  </properties>
  <body>
    <section name="1 Statistics">
      <subsection name="1.1 Overview">
        <p>
          The statistics package provides frameworks and implementations for
          basic descriptive statistics, frequency distributions, bivariate regression,
          and t-, chi-square and ANOVA test statistics.
        </p>
        <p>
         <a href="#a1.2_Descriptive_statistics">Descriptive statistics</a><br/>
         <a href="#a1.3_Frequency_distributions">Frequency distributions</a><br/>
         <a href="#a1.4_Simple_regression">Simple regression</a><br/>
         <a href="#a1.5_Multiple_linear_regression">Multiple regression</a><br/>
         <a href="#a1.6_Rank_transformations">Rank transformations</a><br/>
         <a href="#a1.7_Covariance_and_correlation">Covariance and correlation</a><br/>
         <a href="#a1.8_Statistical_tests">Statistical tests</a><br/>
</p> </subsection> <subsection name="1.2 Descriptive statistics">
<p> The stat package includes a framework and default implementations for the following descriptive statistics:
<ul>
<li>arithmetic and geometric means</li>
<li>variance and standard deviation</li>
<li>sum, product, log sum, sum of squared values</li>
<li>minimum, maximum, median, and percentiles</li>
<li>skewness and kurtosis</li>
<li>first, second, third and fourth moments</li>
</ul> </p>
<p> With the exception of percentiles and the median, all of these statistics can be computed without maintaining the full list of input data values in memory. The stat package provides interfaces and implementations that do not require value storage as well as implementations that operate on arrays of stored values. </p>
<p> The top level interface is <a href="../apidocs/org/apache/commons/math/stat/descriptive/UnivariateStatistic.html"> org.apache.commons.math.stat.descriptive.UnivariateStatistic.</a> This interface, implemented by all statistics, consists of <code>evaluate()</code> methods that take <code>double[]</code> arrays as arguments and return the value of the statistic. This interface is extended by <a href="../apidocs/org/apache/commons/math/stat/descriptive/StorelessUnivariateStatistic.html"> StorelessUnivariateStatistic</a>, which adds <code>increment()</code>, <code>getResult()</code> and associated methods to support "storageless" implementations that maintain counters, sums or other state information as values are added using the <code>increment()</code> method. </p>
<p> Abstract implementations of the top level interfaces are provided in <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractUnivariateStatistic.html"> AbstractUnivariateStatistic</a> and <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractStorelessUnivariateStatistic.html"> AbstractStorelessUnivariateStatistic</a>, respectively.
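To make the storeless idea concrete, here is a minimal plain-Java sketch of a statistic updated one value at a time in the spirit of the <code>increment()</code> pattern described above. It is not Commons Math code: the class `RunningMeanVariance` and its methods are hypothetical, and it uses Welford's online update, which is one common technique and not necessarily what the library's moment classes do internally. The static `median` helper shows the contrast: it needs the whole array, which is why percentiles and the median cannot be computed storelessly.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not a Commons Math class) of a storeless
 * statistic: only the count, the running mean and the running sum of
 * squared deviations are kept in memory, never the values themselves.
 */
class RunningMeanVariance {
    private long n;      // number of values added so far
    private double mean; // running arithmetic mean
    private double m2;   // running sum of squared deviations from the mean

    /** Welford's update: folds one new value into the running state. */
    void increment(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Bias-corrected sample variance, m2 / (n - 1); 0 for a single value. */
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0.0 : m2 / (n - 1);
    }

    /**
     * By contrast, the median needs every value: this helper sorts a copy of
     * the full array and averages the two middle elements when n is even.
     */
    static double median(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = Arrays.copyOf(values, values.length);
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        RunningMeanVariance stats = new RunningMeanVariance();
        for (double x : data) {
            stats.increment(x); // the accumulator keeps no copy of data
        }
        System.out.println("mean = " + stats.getMean());
        System.out.println("variance = " + stats.getVariance());
        System.out.println("median = " + median(data)); // needs the stored array
    }
}
```

Memory use of the accumulator stays constant however many values are added, while `median` must hold and sort all of them.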
</p>
<p> Each statistic is implemented as a separate class in one of the subpackages (moment, rank, summary), and each extends one of the abstract classes above (depending on whether or not value storage is required to compute the statistic). There are several ways to instantiate and use statistics. Statistics can be instantiated and used directly, but it is generally more convenient (and efficient) to access them using the provided aggregates, <a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html"> DescriptiveStatistics</a> and <a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html"> SummaryStatistics.</a> </p>
<p> <code>DescriptiveStatistics</code> maintains the input data in memory and has the capability of producing "rolling" statistics computed from a "window" consisting of the most recently added values. </p>
<p> <code>SummaryStatistics</code> does not store the input data values in memory, so the statistics included in this aggregate are limited to those that can be computed in one pass through the data without access to the full array of values. </p>
<p> <table> <tr>
<th>Aggregate</th> <th>Statistics Included</th> <th>Values stored?</th> <th>"Rolling" capability?</th> </tr>
<tr> <td><a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html"> DescriptiveStatistics</a></td> <td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance, percentiles, skewness, kurtosis, median</td> <td>Yes</td> <td>Yes</td> </tr>
<tr> <td><a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html"> SummaryStatistics</a></td> <td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance</td> <td>No</td> <td>No</td> </tr>

Copyright 1998-2024 Alvin Alexander, alvinalexander.com
All Rights Reserved.
