<th>"Rolling" capability?</th> </table>

<code>SummaryStatistics</code> can be aggregated using
<a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html">
AggregateSummaryStatistics.</a> This class can be used to concurrently gather statistics for
multiple datasets as well as for a combined sample including all of the data.

<code>MultivariateSummaryStatistics</code> is similar to <code>SummaryStatistics</code> but handles
n-tuple values instead of scalar values. It can also compute the full covariance matrix for the
input data.

Neither <code>DescriptiveStatistics</code> nor <code>SummaryStatistics</code> is thread-safe.
<a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedDescriptiveStatistics.html">
SynchronizedDescriptiveStatistics</a> and
<a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedSummaryStatistics.html">
SynchronizedSummaryStatistics</a>, respectively, provide thread-safe versions for applications
that require concurrent access to statistical aggregates by multiple threads.
<a href="../apidocs/org/apache/commons/math/stat/descriptive/SynchronizedMultiVariateSummaryStatistics.html">
SynchronizedMultivariateSummaryStatistics</a> provides a thread-safe
<code>MultivariateSummaryStatistics</code>.

There is also a utility class,
<a href="../apidocs/org/apache/commons/math/stat/StatUtils.html">
StatUtils</a>, that provides static methods for computing statistics directly from
<code>double[]</code> arrays.

Here are some examples showing how to compute descriptive statistics.
<dl>
<dt>Compute summary statistics for a list of double values</dt>
<dd>Using the <code>DescriptiveStatistics</code> aggregate (values are stored in memory):
<source>
import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;

// Get a DescriptiveStatistics instance
DescriptiveStatistics stats = new DescriptiveStatistics();

// Add the data from the array
for (int i = 0; i < inputArray.length; i++) {
    stats.addValue(inputArray[i]);
}

// Compute some statistics
double mean = stats.getMean();
double std = stats.getStandardDeviation();
double median = stats.getPercentile(50); // the median is the 50th percentile
</source>
</dd>
<dd>Using the <code>SummaryStatistics</code> aggregate (values are not stored in memory):
<source>
import org.apache.commons.math.stat.descriptive.SummaryStatistics;

// Get a SummaryStatistics instance
SummaryStatistics stats = new SummaryStatistics();

// Read data from an input stream (in is assumed to be a BufferedReader),
// adding values and updating sums, counters, etc.
String line;
while ((line = in.readLine()) != null) {
    stats.addValue(Double.parseDouble(line.trim()));
}
in.close();

// Compute the statistics
double mean = stats.getMean();
double std = stats.getStandardDeviation();
// double median = stats.getMedian(); <-- NOT AVAILABLE
</source>
</dd>
<dd>Using the <code>StatUtils</code> utility class:
<source>
import org.apache.commons.math.stat.StatUtils;

// Compute statistics directly from the array
// assume values is a double[] array
double mean = StatUtils.mean(values);
double std = Math.sqrt(StatUtils.variance(values)); // standard deviation is the square root of the variance
double median = StatUtils.percentile(values, 50);

// Compute the mean of the first three values in the array
mean = StatUtils.mean(values, 0, 3);
</source>
</dd>
<dt>Maintain a "rolling mean" of the most recent 100 values from an input stream</dt>
<dd>Use a <code>DescriptiveStatistics</code> instance with window size set to 100
<source>
// Create a DescriptiveStatistics instance and set the window size to 100
DescriptiveStatistics stats = new DescriptiveStatistics();
stats.setWindowSize(100);

// Read data from an input stream (in is assumed to be a BufferedReader),
// displaying the mean of the most recent 100 observations
// after every 100 observations
long nLines = 0;
String line;
while ((line = in.readLine()) != null) {
    stats.addValue(Double.parseDouble(line.trim()));
    nLines++;
    if (nLines == 100) {
        nLines = 0;
        System.out.println(stats.getMean());
    }
}
in.close();
</source>
</dd>
<dt>Compute statistics in a thread-safe manner</dt>
<dd>Use a <code>SynchronizedDescriptiveStatistics</code> instance
<source>
// Create a SynchronizedDescriptiveStatistics instance and
// use as any other DescriptiveStatistics instance
DescriptiveStatistics stats = new SynchronizedDescriptiveStatistics();
</source>
</dd>
<dt>Compute statistics for multiple samples and overall statistics concurrently</dt>
<dd>There are two ways to do this using <code>AggregateSummaryStatistics</code>.
The first is to use an <code>AggregateSummaryStatistics</code> instance to accumulate
overall statistics contributed by <code>SummaryStatistics</code> instances created using
<a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html#createContributingStatistics()">
AggregateSummaryStatistics.createContributingStatistics()</a>:
<source>
// Create an AggregateSummaryStatistics instance to accumulate the overall statistics
// and contributing SummaryStatistics instances for the subsamples
AggregateSummaryStatistics aggregate = new AggregateSummaryStatistics();
SummaryStatistics setOneStats = aggregate.createContributingStatistics();
SummaryStatistics setTwoStats = aggregate.createContributingStatistics();

// Add values to the subsample aggregates
setOneStats.addValue(2);
setOneStats.addValue(3);
setTwoStats.addValue(2);
setTwoStats.addValue(4);
...
// Full sample data is reported by the aggregate
double totalSampleSum = aggregate.getSum();
</source>
The above approach has two disadvantages: the <code>addValue</code> calls must be
synchronized on the <code>SummaryStatistics</code> instance maintained by the aggregate,
and each value addition updates the aggregate as well as the subsample. For applications
that can wait to do the aggregation until all values have been added, a static
<a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html#aggregate(java.util.Collection)">
aggregate</a> method is available, as shown in the following example. This method should
be used when aggregation needs to be done across threads.
<source>
// Create SummaryStatistics instances for the subsample data
SummaryStatistics setOneStats = new SummaryStatistics();
SummaryStatistics setTwoStats = new SummaryStatistics();

// Add values to the subsample SummaryStatistics instances
setOneStats.addValue(2);
setOneStats.addValue(3);
setTwoStats.addValue(2);
setTwoStats.addValue(4);
...

// Aggregate the subsample statistics
Collection<SummaryStatistics> aggregate = new ArrayList<SummaryStatistics>();
aggregate.add(setOneStats);
aggregate.add(setTwoStats);
StatisticalSummary aggregatedStats = AggregateSummaryStatistics.aggregate(aggregate);

// Full sample data is reported by aggregatedStats
double totalSampleSum = aggregatedStats.getSum();
</source>
</dd>
</dl>
</subsection>
<subsection name="1.3 Frequency distributions">
<a href="../apidocs/org/apache/commons/math/stat/Frequency.html">
org.apache.commons.math.stat.Frequency</a> provides a simple interface for maintaining
counts and percentages of discrete values. Strings, integers, longs and chars are all
supported as value types, as well as instances of any class that implements
<code>Comparable</code>.
The ordering of values used in computing cumulative frequencies is by default the natural
ordering, but this can be overridden by supplying a <code>Comparator</code> to the
constructor. Adding values that are not comparable to those that have already been added
results in an <code>IllegalArgumentException</code>.

Here are some examples.
<dl>
<dt>Compute a frequency distribution based on integer values</dt>
<dd>Mixing integers, longs, Integers and Longs:
<source>
Frequency f = new Frequency();
f.addValue(1);
f.addValue(new Integer(1));
f.addValue(new Long(1));
f.addValue(2);
f.addValue(new Integer(-1));
System.out.println(f.getCount(1));             // displays 3
System.out.println(f.getCumPct(0));            // displays 0.2
System.out.println(f.getPct(new Integer(1)));  // displays 0.6
System.out.println(f.getCumPct(-2));           // displays 0
System.out.println(f.getCumPct(10));           // displays 1
</source>
</dd>
<dt>Count string frequencies</dt>
<dd>Using case-sensitive comparison, alpha sort order (natural comparator):
<source>
Frequency f = new Frequency();
f.addValue("one");
f.addValue("One");
f.addValue("oNe");
f.addValue("Z");
System.out.println(f.getCount("one"));  // displays 1
System.out.println(f.getCumPct("Z"));   // displays 0.5
System.out.println(f.getCumPct("Ot"));  // displays 0.25
</source>
</dd>
<dd>Using case-insensitive comparator:
<source>
Frequency f = new Frequency(String.CASE_INSENSITIVE_ORDER);
f.addValue("one");
f.addValue("One");
f.addValue("oNe");
f.addValue("Z");
System.out.println(f.getCount("one"));  // displays 3
System.out.println(f.getCumPct("z"));   // displays 1
</source>
</dd>
</dl>
</subsection>
<subsection name="1.4 Simple regression">
<a href="../apidocs/org/apache/commons/math/stat/regression/SimpleRegression.html">
org.apache.commons.math.stat.regression.SimpleRegression</a> provides ordinary least
squares regression with one independent variable, estimating the linear model:

<code> y = intercept + slope * x </code>

Standard errors for <code>intercept</code> and <code>slope</code> are available as well as
ANOVA, r-square and Pearson's r statistics.

Observations (x,y pairs) can be added to the model one at a time or they can be provided
in a 2-dimensional array. The observations are not stored in memory, so there is no limit
to the number of observations that can be added to the model.

Usage Notes:

regressand

[n,k]

k

k-vector

regression parameters

u

n-vector

residuals

addData(double[] y, double[][] x, double[][] covariance)

MultipleLinearRegression

Ties strategy</a> determines how ties in the source data are handled by the ranking <li>

NaN strategy</a> determines how NaN values in the source data are handled. </ul>
Examples:
<source>
NaturalRanking ranking = new NaturalRanking(NaNStrategy.MINIMAL, TiesStrategy.MAXIMUM);
double[] exampleData = { 20, 17, 30, 42.3, 17, 50, Double.NaN, Double.NEGATIVE_INFINITY, 17 };
double[] ranks = ranking.rank(exampleData);
</source>
results in <code>ranks</code> containing <code>{6, 5, 7, 8, 5, 9, 2, 2, 5}</code>.
<source>
new NaturalRanking(NaNStrategy.REMOVED, TiesStrategy.SEQUENTIAL).rank(exampleData);
</source>
returns <code>{5, 2, 6, 7, 3, 8, 1, 4}</code>.

The default <code>NaNStrategy</code> is <code>NaNStrategy.MAXIMAL</code>. This makes NaN
values larger than any other value (including <code>Double.POSITIVE_INFINITY</code>). The
default <code>TiesStrategy</code> is <code>TiesStrategy.AVERAGE</code>, which assigns tied
values the average of the ranks applicable to the sequence of ties. See the
<a href="../apidocs/org/apache/commons/math/stat/ranking/NaturalRanking.html">
NaturalRanking</a> for more examples and

TiesStrategy</a> and

NaNStrategy
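
The average-of-ranks rule for ties described above can be illustrated with a small, self-contained sketch in plain Java. This is not the Commons Math implementation: the class name AverageRanks and its rank method are hypothetical, and the sketch assumes the input contains no NaN values, so no NaN strategy is modelled.

```java
import java.util.Arrays;

// Hypothetical illustration of average-rank tie handling; not the NaturalRanking source.
final class AverageRanks {

    /** Returns 1-based ranks; tied values all receive the mean of the ranks they span. */
    static double[] rank(double[] data) {
        int n = data.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indices by the values they point at (assumes no NaN in data).
        Arrays.sort(order, (a, b) -> Double.compare(data[a], data[b]));

        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            // Find the end of the run of values equal to the one at position i.
            int j = i;
            while (j + 1 < n && data[order[j + 1]] == data[order[i]]) {
                j++;
            }
            // Positions i..j hold ranks i+1..j+1; assign their average to every member.
            double average = ((i + 1) + (j + 1)) / 2.0;
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = average;
            }
            i = j + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] data = { 20, 17, 30, 42.3, 17, 50, 17 };
        System.out.println(Arrays.toString(rank(data)));
    }
}
```

In this input the three values equal to 17 occupy ranks 1, 2 and 3, so each of them is assigned rank 2.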

X

E(Y)

n - 1.

E(Y)

X

Y

s(Y)

Covariance of 2 arrays

y

Covariance matrix

data

Pearson's correlation of 2 arrays

y

Pearson's correlation matrix

data

Pearson's correlation significance and standard errors

RealMatrix.

^1/2

_n-2

_ij

^1/2

_ij

Spearman's rank correlation coefficient

y

t-

One-Way ANOVA

One-sample t tests

mu.

Two-Sample t-tests

Example 1:

sample2

Example 2:

SummaryStatistics

Note:

SummaryStatistics

Chi-square tests

double[]

expected

observed

alpha


          0 < alpha < 1 </code> use:
          <source>
TestUtils.chiSquareTest(expected, observed, alpha);
          </source>
          The boolean value returned will be <code>true

j

alpha

One-Way Anova tests

TestUtils


Commons Math example source code file (stat.xml)

This example Commons Math source code file (stat.xml) is included in the DevDaily.com "Java Source Code Warehouse" project. The intent of this project is to help you "Learn Java by Example" ^TM.


The Commons Math stat.xml source code

<?xml version="1.0"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
  -->

<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<!-- $Revision: 925895 $ $Date: 2010-03-21 17:05:20 -0400 (Sun, 21 Mar 2010) $ -->
<document url="stat.html">
  <properties>
    <title>The Commons Math User Guide - Statistics</title>
  </properties>
  <body>
    <section name="1 Statistics">
      <subsection name="1.1 Overview">
        <p>
          The statistics package provides frameworks and implementations for
          basic descriptive statistics, frequency distributions, bivariate regression,
          and t-, chi-square and ANOVA test statistics.
        </p>
        <p>
         <ul>
           <li><a href="#a1.2_Descriptive_statistics">Descriptive statistics</a></li>
           <li><a href="#a1.3_Frequency_distributions">Frequency distributions</a></li>
           <li><a href="#a1.4_Simple_regression">Simple Regression</a></li>
           <li><a href="#a1.5_Multiple_linear_regression">Multiple Regression</a></li>
           <li><a href="#a1.6_Rank_transformations">Rank transformations</a></li>
           <li><a href="#a1.7_Covariance_and_correlation">Covariance and correlation</a></li>
           <li><a href="#a1.8_Statistical_tests">Statistical Tests</a></li>
         </ul>
        </p>
      </subsection>
      <subsection name="1.2 Descriptive statistics">
        <p>
          The stat package includes a framework and default implementations for
          the following descriptive statistics:
          <ul>
            <li>arithmetic and geometric means</li>
            <li>variance and standard deviation</li>
            <li>sum, product, log sum, sum of squared values</li>
            <li>minimum, maximum, median, and percentiles</li>
            <li>skewness and kurtosis</li>
            <li>first, second, third and fourth moments</li>
          </ul>
        </p>
        <p>
          With the exception of percentiles and the median, all of these
          statistics can be computed without maintaining the full list of input
          data values in memory.  The stat package provides interfaces and
          implementations that do not require value storage as well as
          implementations that operate on arrays of stored values.
        </p>
        <p>
          The top level interface is
          <a href="../apidocs/org/apache/commons/math/stat/descriptive/UnivariateStatistic.html">
          org.apache.commons.math.stat.descriptive.UnivariateStatistic.</a>
          This interface, implemented by all statistics, consists of
          <code>evaluate()</code> methods that take <code>double[]</code> arrays as arguments
          and return the value of the statistic.   This interface is extended by
          <a href="../apidocs/org/apache/commons/math/stat/descriptive/StorelessUnivariateStatistic.html">
          StorelessUnivariateStatistic</a>, which adds <code>increment()</code>,
          <code>getResult()</code> and associated methods to support
          "storageless" implementations that maintain counters, sums or other
          state information as values are added using the <code>increment()</code>
          method.
        </p>
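
To make the "storageless" idea concrete, here is a minimal sketch in plain Java of a statistic that keeps only a counter and two running sums and updates them on each increment. It is an illustration under stated assumptions, not the Commons Math code: the class RunningMoments and its accessors are hypothetical names, and the update rule used is Welford's one-pass method, which the source text does not name.

```java
// Hypothetical sketch of a storeless statistic; not a Commons Math class.
final class RunningMoments {
    private long n;       // number of values seen so far
    private double mean;  // running arithmetic mean
    private double m2;    // running sum of squared deviations from the mean

    /** Updates the internal state with one new value; the value itself is not stored. */
    void increment(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() {
        return n;
    }

    /** Mean of the values added so far, or NaN if nothing has been added. */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Bias-corrected sample variance (divides by n - 1); NaN if nothing has been added. */
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0.0 : m2 / (n - 1);
    }

    public static void main(String[] args) {
        RunningMoments stats = new RunningMoments();
        for (double x : new double[] { 2, 4, 4, 4, 5, 5, 7, 9 }) {
            stats.increment(x);
        }
        System.out.println("mean = " + stats.getMean());
        System.out.println("variance = " + stats.getVariance());
    }
}
```

Memory use stays constant no matter how many values are added, which is the property that separates this style from array-based evaluate() methods.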
        <p>
          Abstract implementations of the top level interfaces are provided in
          <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractUnivariateStatistic.html">
          AbstractUnivariateStatistic</a> and
          <a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractStorelessUnivariateStatistic.html">
          AbstractStorelessUnivariateStatistic</a> respectively.
        </p>
        <p>
          Each statistic is implemented as a separate class, in one of the
          subpackages (moment, rank, summary) and each extends one of the abstract
          classes above (depending on whether or not value storage is required to
          compute the statistic). There are several ways to instantiate and use statistics.
          Statistics can be instantiated and used directly,  but it is generally more convenient
          (and efficient) to access them using the provided aggregates,
          <a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html">
           DescriptiveStatistics</a> and
           <a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html">
           SummaryStatistics.</a>
        </p>
        <p>
           <code>DescriptiveStatistics</code> maintains the input data in memory
           and has the capability of producing "rolling" statistics computed from a
           "window" consisting of the most recently added values.
        </p>
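
The "window" behaviour can be sketched without the library as a fixed-size queue of the most recent values. The following is a simplified illustration only: RollingMean is a hypothetical class, it tracks just the mean, and it says nothing about how DescriptiveStatistics stores its window internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a rolling mean over the most recent values; not a Commons Math class.
final class RollingMean {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    RollingMean(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds a value and, once the window is full, discards the oldest one. */
    void addValue(double x) {
        window.addLast(x);
        sum += x;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
    }

    /** Mean of the values currently in the window, or NaN if the window is empty. */
    double getMean() {
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }

    public static void main(String[] args) {
        RollingMean rolling = new RollingMean(3);
        for (double x : new double[] { 1, 2, 3, 4, 5 }) {
            rolling.addValue(x);
        }
        System.out.println(rolling.getMean()); // mean of 3, 4 and 5
    }
}
```

Keeping a running sum makes each update constant-time, at the cost of possible floating-point drift over very long streams; recomputing the sum from the window contents avoids that drift.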
        <p>
           <code>SummaryStatistics</code> does not store the input data values
           in memory, so the statistics included in this aggregate are limited to those
           that can be computed in one pass through the data without access to
           the full array of values.
        </p>
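
As an illustration of which quantities a single pass can deliver, the sketch below accumulates a count, sum, sum of squares, log sum, minimum and maximum, and derives the arithmetic and geometric means from them. OnePassSummary is a hypothetical class written for this example and is not the SummaryStatistics implementation; a median is deliberately absent because it would need the stored values.

```java
// Hypothetical one-pass accumulator; not the Commons Math SummaryStatistics class.
final class OnePassSummary {
    private long n;
    private double sum;
    private double sumSq;
    private double sumLog;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one value into the running totals; the value itself is discarded. */
    void addValue(double x) {
        n++;
        sum += x;
        sumSq += x * x;
        sumLog += Math.log(x);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long getN() { return n; }
    double getSum() { return sum; }
    double getSumsq() { return sumSq; }
    double getMin() { return n == 0 ? Double.NaN : min; }
    double getMax() { return n == 0 ? Double.NaN : max; }
    double getMean() { return n == 0 ? Double.NaN : sum / n; }

    /** Geometric mean computed from the log sum; meaningful for positive values only. */
    double getGeometricMean() {
        return n == 0 ? Double.NaN : Math.exp(sumLog / n);
    }

    public static void main(String[] args) {
        OnePassSummary summary = new OnePassSummary();
        for (double x : new double[] { 1, 2, 4 }) {
            summary.addValue(x);
        }
        System.out.println("mean = " + summary.getMean());
        System.out.println("geometric mean = " + summary.getGeometricMean());
    }
}
```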
        <p>
          <table>
            <tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th><th>"Rolling" capability?</th></tr>
            <tr><td><a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html">
            DescriptiveStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance, percentiles, skewness, kurtosis, median</td><td>Yes</td><td>Yes</td></tr>
            <tr><td><a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html">
            SummaryStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance</td><td>No</td><td>No</td></tr>
          </table>
        </p>

Copyright 1998-2024 Alvin Alexander, alvinalexander.com
All Rights Reserved.

