The Commons Math stat.xml source code
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<!-- $Revision: 925895 $ $Date: 2010-03-21 17:05:20 -0400 (Sun, 21 Mar 2010) $ -->
<document url="stat.html">
<properties>
<title>The Commons Math User Guide - Statistics</title>
</properties>
<body>
<section name="1 Statistics">
<subsection name="1.1 Overview">
<p>
The statistics package provides frameworks and implementations for
basic descriptive statistics, frequency distributions, bivariate regression,
and t-, chi-square, and ANOVA test statistics.
</p>
<p>
<a href="#a1.2_Descriptive_statistics">Descriptive statistics</a><br/>
<a href="#a1.3_Frequency_distributions">Frequency distributions</a><br/>
<a href="#a1.4_Simple_regression">Simple regression</a><br/>
<a href="#a1.5_Multiple_linear_regression">Multiple linear regression</a><br/>
<a href="#a1.6_Rank_transformations">Rank transformations</a><br/>
<a href="#a1.7_Covariance_and_correlation">Covariance and correlation</a><br/>
<a href="#a1.8_Statistical_tests">Statistical tests</a><br/>
</p>
</subsection>
<subsection name="1.2 Descriptive statistics">
<p>
The stat package includes a framework and default implementations for
the following descriptive statistics:
<ul>
<li>arithmetic and geometric means</li>
<li>variance and standard deviation</li>
<li>sum, product, log sum, sum of squared values</li>
<li>minimum, maximum, median, and percentiles</li>
<li>skewness and kurtosis</li>
<li>first, second, third and fourth moments</li>
</ul>
</p>
<p>
With the exception of percentiles and the median, all of these
statistics can be computed without maintaining the full list of input
data values in memory. The stat package provides interfaces and
implementations that do not require value storage as well as
implementations that operate on arrays of stored values.
</p>
<p>
The top level interface is
<a href="../apidocs/org/apache/commons/math/stat/descriptive/UnivariateStatistic.html">
org.apache.commons.math.stat.descriptive.UnivariateStatistic.</a>
This interface, implemented by all statistics, consists of
<code>evaluate()</code> methods that take <code>double[]</code> arrays as arguments
and return the value of the statistic. This interface is extended by
<a href="../apidocs/org/apache/commons/math/stat/descriptive/StorelessUnivariateStatistic.html">
StorelessUnivariateStatistic</a>, which adds <code>increment()</code>,
<code>getResult()</code> and associated methods to support
"storageless" implementations that maintain counters, sums or other
state information as values are added using the <code>increment()</code>
method.
</p>
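<p>
The two usage styles described above can be illustrated with a minimal,
self-contained sketch. The class <code>SketchMean</code> is hypothetical and is
not the library's implementation; only the method names
<code>evaluate()</code>, <code>increment()</code> and <code>getResult()</code>
are taken from the text. The sketch also assumes that <code>evaluate()</code>
leaves the running state untouched, which is a choice made here for clarity.
</p>

```java
/** Hypothetical stand-in for a storeless statistic; not the library's class. */
class SketchMean {
    private long n = 0;        // number of values added so far
    private double mean = 0.0; // running mean of those values

    /** Array style: returns the mean of the given values without changing this object's state. */
    double evaluate(double[] values) {
        SketchMean scratch = new SketchMean();
        for (double v : values) {
            scratch.increment(v);
        }
        return scratch.getResult();
    }

    /** Storeless style: folds one value into the running mean; the value itself is not kept. */
    void increment(double value) {
        n++;
        mean += (value - mean) / n;
    }

    /** Returns the running mean, or NaN if no values have been added. */
    double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    public static void main(String[] args) {
        SketchMean m = new SketchMean();
        m.increment(1.0);
        m.increment(2.0);
        m.increment(3.0);
        System.out.println("running mean: " + m.getResult());
        System.out.println("array mean:   " + m.evaluate(new double[] {2.0, 4.0}));
    }
}
```

<p>
Only a counter and the current mean are kept between calls, which is what
allows such a statistic to be updated one value at a time.
</p>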
<p>
Abstract implementations of the top level interfaces are provided in
<a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractUnivariateStatistic.html">
AbstractUnivariateStatistic</a> and
<a href="../apidocs/org/apache/commons/math/stat/descriptive/AbstractStorelessUnivariateStatistic.html">
AbstractStorelessUnivariateStatistic</a> respectively.
</p>
<p>
Each statistic is implemented as a separate class, in one of the
subpackages (moment, rank, summary) and each extends one of the abstract
classes above (depending on whether or not value storage is required to
compute the statistic). There are several ways to instantiate and use statistics.
Statistics can be instantiated and used directly, but it is generally more convenient
(and efficient) to access them using the provided aggregates,
<a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html">
DescriptiveStatistics</a> and
<a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html">
SummaryStatistics.</a>
</p>
<p>
<code>DescriptiveStatistics</code> maintains the input data in memory
and has the capability of producing "rolling" statistics computed from a
"window" consisting of the most recently added values.
</p>
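<p>
The idea of a "rolling" window can be sketched with a fixed-size circular
buffer. The class <code>RollingWindowSketch</code> and its methods are
hypothetical names chosen for illustration; this is not how
<code>DescriptiveStatistics</code> is implemented, only a small model of the
behavior described above. Because the values are stored, order statistics
such as the median remain available.
</p>

```java
/** Hypothetical illustration of a rolling window; not the library's class. */
class RollingWindowSketch {
    private final double[] window; // circular buffer of the most recent values
    private int count = 0;         // number of filled slots, at most window.length
    private int next = 0;          // index that the next value will overwrite

    RollingWindowSketch(int windowSize) {
        if (!(windowSize > 0)) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        window = new double[windowSize];
    }

    /** Adds a value; once the window is full, the oldest value is overwritten. */
    void addValue(double v) {
        window[next] = v;
        next = (next + 1) % window.length;
        count = Math.min(count + 1, window.length);
    }

    /** Mean of the values currently in the window, or NaN if it is empty. */
    double getMean() {
        if (count == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : java.util.Arrays.copyOf(window, count)) {
            sum += v;
        }
        return sum / count;
    }

    /** Median of the values currently in the window; this requires the stored values. */
    double getMedian() {
        if (count == 0) {
            return Double.NaN;
        }
        double[] sorted = java.util.Arrays.copyOf(window, count);
        java.util.Arrays.sort(sorted);
        int mid = count / 2;
        return (count % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        RollingWindowSketch stats = new RollingWindowSketch(3);
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) {
            stats.addValue(v);
        }
        // Only the three most recent values contribute at this point.
        System.out.println("mean:   " + stats.getMean());
        System.out.println("median: " + stats.getMedian());
    }
}
```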
<p>
<code>SummaryStatistics</code> does not store the input data values
in memory, so the statistics included in this aggregate are limited to those
that can be computed in one pass through the data without access to
the full array of values.
</p>
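<p>
A one-pass computation of this kind can be sketched as follows. The class
<code>OnePassSummarySketch</code> is hypothetical, and the variance update
uses Welford's method, which is an assumption of this sketch and not
necessarily the algorithm used by <code>SummaryStatistics</code>. Returning 0
for the variance of a single value is likewise a choice made here.
</p>

```java
/** Hypothetical one-pass summary; not the library's SummaryStatistics. */
class OnePassSummarySketch {
    private long n = 0;              // number of values seen
    private double min = Double.NaN; // smallest value seen
    private double max = Double.NaN; // largest value seen
    private double mean = 0.0;       // running mean
    private double m2 = 0.0;         // running sum of squared deviations from the mean

    /** Updates every statistic from a single value; the value is then discarded. */
    void addValue(double v) {
        n++;
        min = (n == 1) ? v : Math.min(min, v);
        max = (n == 1) ? v : Math.max(max, v);
        double delta = v - mean;
        mean += delta / n;
        m2 += delta * (v - mean); // Welford update
    }

    long getN() { return n; }
    double getMin() { return min; }
    double getMax() { return max; }
    double getMean() { return n == 0 ? Double.NaN : mean; }

    /** Bias-corrected sample variance: NaN for no data, 0 for a single value. */
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0.0 : m2 / (n - 1);
    }

    double getStandardDeviation() { return Math.sqrt(getVariance()); }

    public static void main(String[] args) {
        OnePassSummarySketch stats = new OnePassSummarySketch();
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) {
            stats.addValue(v);
        }
        System.out.println("n=" + stats.getN() + " mean=" + stats.getMean()
                + " variance=" + stats.getVariance());
    }
}
```

<p>
No array of past values appears anywhere in the class, which is why
percentiles and the median cannot be offered by an aggregate of this kind.
</p>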
<p>
<table>
<tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th>
<th>"Rolling" capability?</th></tr>
<tr><td>
<a href="../apidocs/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html">
DescriptiveStatistics</a></td><td>min, max, mean, geometric mean, n,
sum, sum of squares, standard deviation, variance, percentiles, skewness,
kurtosis, median</td><td>Yes</td><td>Yes</td></tr>
<tr><td>
<a href="../apidocs/org/apache/commons/math/stat/descriptive/SummaryStatistics.html">
SummaryStatistics</a></td><td>min, max, mean, geometric mean, n,
sum, sum of squares, standard deviation, variance</td><td>No</td><td>No</td></tr>
</table>
</p>
<p>
<code>SummaryStatistics</code> can be aggregated using
<a href="../apidocs/org/apache/commons/math/stat/descriptive/AggregateSummaryStatistics.html">
AggregateSummaryStatistics.</a> This class can be used to concurrently gather statistics for multiple
datasets as well as for a combined sample including all of the data.
</p>
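<p>
One way to picture this kind of aggregation is a set of per-dataset
accumulators that forward each value to a shared total. The sketch below is
an assumption about the general pattern, not a description of how
<code>AggregateSummaryStatistics</code> works internally; the class
<code>AggregateSketch</code>, the nested <code>Running</code> class and the
method <code>newContributor()</code> are hypothetical names. Only the count
and mean are tracked, to keep the example short.
</p>

```java
/** Hypothetical aggregation pattern; not the library's AggregateSummaryStatistics. */
class AggregateSketch {

    /** Minimal running count and mean that optionally forwards values to a parent. */
    static class Running {
        private final Running parent; // combined accumulator, or null for the total itself
        private long n = 0;
        private double mean = 0.0;

        Running(Running parent) {
            this.parent = parent;
        }

        /** Updates this accumulator and then the combined one, if any. */
        synchronized void addValue(double v) {
            n++;
            mean += (v - mean) / n;
            if (parent != null) {
                parent.addValue(v);
            }
        }

        synchronized long getN() { return n; }
        synchronized double getMean() { return n == 0 ? Double.NaN : mean; }
    }

    private final Running total = new Running(null);

    /** Creates an accumulator for one dataset that also feeds the combined sample. */
    Running newContributor() {
        return new Running(total);
    }

    long getN() { return total.getN(); }
    double getMean() { return total.getMean(); }

    public static void main(String[] args) {
        AggregateSketch aggregate = new AggregateSketch();
        Running first = aggregate.newContributor();
        Running second = aggregate.newContributor();
        first.addValue(1.0);
        first.addValue(3.0);
        second.addValue(5.0);
        System.out.println("first mean:    " + first.getMean());
        System.out.println("second mean:   " + second.getMean());
        System.out.println("combined mean: " + aggregate.getMean());
    }
}
```

<p>
The methods are synchronized so that contributors fed from different threads
can update the shared total safely; each dataset keeps its own statistics
while the combined sample is maintained alongside them.
</p>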
<p>
<code>MultivariateSummaryStatistics</code> is similar to