|
Java example source code file (Quantiles.java)
The Quantiles.java Java example source code/* * Copyright (C) 2014 The Guava Authors * * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except * in compliance with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software distributed under the License * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express * or implied. See the License for the specific language governing permissions and limitations under * the License. */ package com.google.common.math; import static com.google.common.base.Preconditions.checkArgument; import static java.lang.Double.NEGATIVE_INFINITY; import static java.lang.Double.NaN; import static java.lang.Double.POSITIVE_INFINITY; import static java.util.Arrays.sort; import static java.util.Collections.unmodifiableMap; import com.google.common.annotations.Beta; import com.google.common.annotations.GwtIncompatible; import com.google.common.primitives.Doubles; import com.google.common.primitives.Ints; import java.math.RoundingMode; import java.util.Collection; import java.util.HashMap; import java.util.Map; /** * Provides a fluent API for calculating * <a href="http://en.wikipedia.org/wiki/Quantile">quantiles. * * <h3>Examples * * <p>To compute the median: * <pre> {@code * * double myMedian = median().compute(myDataset);}</pre> * * where {@link #median()} has been statically imported. * * <p>To compute the 99th percentile: * <pre> {@code * * double myPercentile99 = percentiles().index(99).compute(myDataset);}</pre> * * where {@link #percentiles()} has been statically imported. * * <p>To compute median and the 90th and 99th percentiles: * <pre> {@code * * Map<Integer, Double> myPercentiles = * percentiles().indexes(50, 90, 99).compute(myDataset);}</pre> * * where {@link #percentiles()} has been statically imported: {@code myPercentiles} maps the keys * 50, 90, and 99, to their corresponding quantile values. * * <p>To compute quartiles, use {@link #quartiles()} instead of {@link #percentiles()}. To compute * arbitrary q-quantiles, use {@link #scale scale(q)}. * * <p>These examples all take a copy of your dataset. If you have a double array, you are okay with * it being arbitrarily reordered, and you want to avoid that copy, you can use * {@code computeInPlace} instead of {@code compute}. * * <h3>Definition and notes on interpolation * * <p>The definition of the kth q-quantile of N values is as follows: define x = k * (N - 1) / q; if * x is an integer, the result is the value which would appear at index x in the sorted dataset * (unless there are {@link Double#NaN NaN} values, see below); otherwise, the result is the average * of the values which would appear at the indexes floor(x) and ceil(x) weighted by (1-frac(x)) and * frac(x) respectively. This is the same definition as used by Excel and by S, it is the Type 7 * definition in * <a href="http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html">R, and it is * described by * <a href="http://en.wikipedia.org/wiki/Quantile#Estimating_the_quantiles_of_a_population"> * wikipedia</a> as providing "Linear interpolation of the modes for the order statistics for the * uniform distribution on [0,1]." * * <h3>Handling of non-finite values * * <p>If any values in the input are {@link Double#NaN NaN} then all values returned are * {@link Double#NaN NaN}. (This is the one occasion when the behaviour is not the same as you'd get * from sorting with {@link java.util.Arrays#sort(double[]) Arrays.sort(double[])} or * {@link java.util.Collections#sort(java.util.List) Collections.sort(List<Double>)} and * selecting the required value(s). Those methods would sort {@link Double#NaN NaN} as if it is * greater than any other value and place them at the end of the dataset, even after * {@link Double#POSITIVE_INFINITY POSITIVE_INFINITY}.) * * <p>Otherwise, {@link Double#NEGATIVE_INFINITY NEGATIVE_INFINITY} and * {@link Double#POSITIVE_INFINITY POSITIVE_INFINITY} sort to the beginning and the end of the * dataset, as you would expect. * * <p>If required to do a weighted average between an infinity and a finite value, or between an * infinite value and itself, the infinite value is returned. If required to do a weighted average * between {@link Double#NEGATIVE_INFINITY NEGATIVE_INFINITY} and {@link Double#POSITIVE_INFINITY * POSITIVE_INFINITY}, {@link Double#NaN NaN} is returned (note that this will only happen if the * dataset contains no finite values). * * <h3>Performance * * <p>The average time complexity of the computation is O(N) in the size of the dataset. There is a * worst case time complexity of O(N^2). You are extremely unlikely to hit this quadratic case on * randomly ordered data (the probability decreases faster than exponentially in N), but if you are * passing in unsanitized user data then a malicious user could force it. A light shuffle of the * data using an unpredictable seed should normally be enough to thwart this attack. * * <p>The time taken to compute multiple quantiles on the same dataset using {@link Scale#indexes * indexes} is generally less than the total time taken to compute each of them separately, and * sometimes much less. For example, on a large enough dataset, computing the 90th and 99th * percentiles together takes about 55% as long as computing them separately. * * <p>When calling {@link ScaleAndIndex#compute} (in {@linkplain ScaleAndIndexes#compute either * form}), the memory requirement is 8*N bytes for the copy of the dataset plus an overhead which is * independent of N (but depends on the quantiles being computed). When calling * {@link ScaleAndIndex#computeInPlace computeInPlace} (in * {@linkplain ScaleAndIndexes#computeInPlace either form}), only the overhead is required. The * number of object allocations is independent of N in both cases. * * @author Pete Gillin * @since 20.0 */ @Beta @GwtIncompatible public final class Quantiles { /** * Specifies the computation of a median (i.e. the 1st 2-quantile). */ public static ScaleAndIndex median() { return scale(2).index(1); } /** * Specifies the computation of quartiles (i.e. 4-quantiles). */ public static Scale quartiles() { return scale(4); } /** * Specifies the computation of percentiles (i.e. 100-quantiles). */ public static Scale percentiles() { return scale(100); } /** * Specifies the computation of q-quantiles. * * @param scale the scale for the quantiles to be calculated, i.e. the q of the q-quantiles, which * must be positive */ public static Scale scale(int scale) { return new Scale(scale); } /** * Describes the point in a fluent API chain where only the scale (i.e. the q in q-quantiles) has * been specified. */ public static final class Scale { private final int scale; private Scale(int scale) { checkArgument(scale > 0, "Quantile scale must be positive"); this.scale = scale; } /** * Specifies a single quantile index to be calculated, i.e. the k in the kth q-quantile. * * @param index the quantile index, which must be in the inclusive range [0, q] for q-quantiles */ public ScaleAndIndex index(int index) { return new ScaleAndIndex(scale, index); } /** * Specifies multiple quantile indexes to be calculated, each index being the k in the kth * q-quantile. * * @param indexes the quantile indexes, each of which must be in the inclusive range [0, q] for * q-quantiles; the order of the indexes is unimportant, duplicates will be ignored, and the * set will be snapshotted when this method is called */ public ScaleAndIndexes indexes(int... indexes) { return new ScaleAndIndexes(scale, indexes.clone()); } /** * Specifies multiple quantile indexes to be calculated, each index being the k in the kth * q-quantile. * * @param indexes the quantile indexes, each of which must be in the inclusive range [0, q] for * q-quantiles; the order of the indexes is unimportant, duplicates will be ignored, and the * set will be snapshotted when this method is called */ public ScaleAndIndexes indexes(Collection<Integer> indexes) { return new ScaleAndIndexes(scale, Ints.toArray(indexes)); } } /** * Describes the point in a fluent API chain where the scale and a single quantile index (i.e. the * q and the k in the kth q-quantile) have been specified. */ public static final class ScaleAndIndex { private final int scale; private final int index; private ScaleAndIndex(int scale, int index) { checkIndex(index, scale); this.scale = scale; this.index = index; } /** * Computes the quantile value of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will be * cast to doubles (with any associated lost of precision), and which will not be mutated by * this call (it is copied instead) * @return the quantile value */ public double compute(Collection<? extends Number> dataset) { return computeInPlace(Doubles.toArray(dataset)); } /** * Computes the quantile value of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will not * be mutated by this call (it is copied instead) * @return the quantile value */ public double compute(double... dataset) { return computeInPlace(dataset.clone()); } /** * Computes the quantile value of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will be * cast to doubles (with any associated lost of precision), and which will not be mutated by * this call (it is copied instead) * @return the quantile value */ public double compute(long... dataset) { return computeInPlace(longsToDoubles(dataset)); } /** * Computes the quantile value of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will be * cast to doubles, and which will not be mutated by this call (it is copied instead) * @return the quantile value */ public double compute(int... dataset) { return computeInPlace(intsToDoubles(dataset)); } /** * Computes the quantile value of the given dataset, performing the computation in-place. * * @param dataset the dataset to do the calculation on, which must be non-empty, and which will * be arbitrarily reordered by this method call * @return an unmodifiable map of results: the keys will be the specified quantile indexes, and * the values the corresponding quantile values */ public double computeInPlace(double... dataset) { checkArgument(dataset.length > 0, "Cannot calculate quantiles of an empty dataset"); if (containsNaN(dataset)) { return NaN; } // Calculate the quotient and remainder in the integer division x = k * (N-1) / q, i.e. // index * (dataset.length - 1) / scale. If there is no remainder, we can just find the value // whose index in the sorted dataset equals the quotient; if there is a remainder, we // interpolate between that and the next value. // Since index and (dataset.length - 1) are non-negative ints, their product can be expressed // as a long, without risk of overflow: long numerator = (long) index * (dataset.length - 1); // Since scale is a positive int, index is in [0, scale], and (dataset.length - 1) is a // non-negative int, we can do long-arithmetic on index * (dataset.length - 1) / scale to get // a rounded ratio and a remainder which can be expressed as ints, without risk of overflow: int quotient = (int) LongMath.divide(numerator, scale, RoundingMode.DOWN); int remainder = (int) (numerator - (long) quotient * scale); selectInPlace(quotient, dataset, 0, dataset.length - 1); if (remainder == 0) { return dataset[quotient]; } else { selectInPlace(quotient + 1, dataset, quotient + 1, dataset.length - 1); return interpolate(dataset[quotient], dataset[quotient + 1], remainder, scale); } } } /** * Describes the point in a fluent API chain where the scale and a multiple quantile indexes (i.e. * the q and a set of values for the k in the kth q-quantile) have been specified. */ public static final class ScaleAndIndexes { private final int scale; private final int[] indexes; private ScaleAndIndexes(int scale, int[] indexes) { for (int index : indexes) { checkIndex(index, scale); } this.scale = scale; this.indexes = indexes; } /** * Computes the quantile values of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will be * cast to doubles (with any associated lost of precision), and which will not be mutated by * this call (it is copied instead) * @return an unmodifiable map of results: the keys will be the specified quantile indexes, and * the values the corresponding quantile values */ public Map<Integer, Double> compute(Collection dataset) { return computeInPlace(Doubles.toArray(dataset)); } /** * Computes the quantile values of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will not * be mutated by this call (it is copied instead) * @return an unmodifiable map of results: the keys will be the specified quantile indexes, and * the values the corresponding quantile values */ public Map<Integer, Double> compute(double... dataset) { return computeInPlace(dataset.clone()); } /** * Computes the quantile values of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will be * cast to doubles (with any associated lost of precision), and which will not be mutated by * this call (it is copied instead) * @return an unmodifiable map of results: the keys will be the specified quantile indexes, and * the values the corresponding quantile values */ public Map<Integer, Double> compute(long... dataset) { return computeInPlace(longsToDoubles(dataset)); } /** * Computes the quantile values of the given dataset. * * @param dataset the dataset to do the calculation on, which must be non-empty, which will be * cast to doubles, and which will not be mutated by this call (it is copied instead) * @return an unmodifiable map of results: the keys will be the specified quantile indexes, and * the values the corresponding quantile values */ public Map<Integer, Double> compute(int... dataset) { return computeInPlace(intsToDoubles(dataset)); } /** * Computes the quantile values of the given dataset, performing the computation in-place. * * @param dataset the dataset to do the calculation on, which must be non-empty, and which will * be arbitrarily reordered by this method call * @return an unmodifiable map of results: the keys will be the specified quantile indexes, and * the values the corresponding quantile values */ public Map<Integer, Double> computeInPlace(double... dataset) { checkArgument(dataset.length > 0, "Cannot calculate quantiles of an empty dataset"); if (containsNaN(dataset)) { Map<Integer, Double> nanMap = new HashMap Other Java examples (source code examples)Here is a short list of links related to this Java Quantiles.java source code file: |
... this post is sponsored by my books ... | |
#1 New Release! |
FP Best Seller |
Copyright 1998-2021 Alvin Alexander, alvinalexander.com
All Rights Reserved.
A percentage of advertising revenue from
pages under the /java/jwarehouse
URI on this website is
paid back to open source projects.