Aggregations
Aggregations create a new value by summarizing a Column. For
example, Mean, when applied to a column containing Number
data, returns a single decimal.Decimal value which is the average of
all values in that column.
Aggregations are applied to individual columns of a table with the Table.aggregate()
method. The result is a single value if one aggregation was applied, or
a tuple of values if a sequence of aggregations was applied.
Aggregations can be applied to instances of TableSet using the
TableSet.aggregate() method. The result is a new Table
with a column for each aggregation and a row for each table in the set.
- Aggregation: Aggregations create a new value by summarizing a Column.
- Summary: Apply an arbitrary function to a column.
Basic aggregations
- All: Check if all values in a column pass a test.
- Any: Check if any value in a column passes a test.
- Count: Count occurrences of a value or values.
- HasNulls: Check if the column contains null values.
- Min: Find the minimum value in a column.
- Max: Find the maximum value in a column.
- MaxPrecision: Find the most decimal places present for any value in this column.
Statistical aggregations
- Deciles: Calculate the deciles of a column based on its percentiles.
- IQR: Calculate the interquartile range of a column.
- MAD: Calculate the median absolute deviation of a column.
- Mean: Calculate the mean of a column.
- Median: Calculate the median of a column.
- Mode: Calculate the mode of a column.
- Percentiles: Divide a column into 100 equal-size groups using the "CDF" method.
- PopulationStDev: Calculate the population standard deviation of a column.
- PopulationVariance: Calculate the population variance of a column.
- Quartiles: Calculate the quartiles of a column based on its percentiles.
- Quintiles: Calculate the quintiles of a column based on its percentiles.
- StDev: Calculate the sample standard deviation of a column.
- Sum: Calculate the sum of a column.
- Variance: Calculate the sample variance of a column.
Text aggregations
- MaxLength: Find the length of the longest string in a column.
Detailed list
- class agate.Aggregation
Bases: object
Aggregations create a new value by summarizing a Column.
Aggregations are applied with Table.aggregate() and TableSet.aggregate().
When creating a custom aggregation, ensure that the values returned by Aggregation.run() are of the type specified by Aggregation.get_aggregate_data_type(). This can be ensured by using the DataType.cast() method. See Summary for an example.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.All(column_name, test)
Bases: Aggregation
Check if all values in a column pass a test.
- Parameters:
column_name – The name of the column to check.
test – Either a single value that all values in the column are compared against (for equality) or a function that takes a column value and returns True or False.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- class agate.Any(column_name, test)
Bases: Aggregation
Check if any value in a column passes a test.
- Parameters:
column_name – The name of the column to check.
test – Either a single value that each value in the column is compared against (for equality) or a function that takes a column value and returns True or False.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Count(column_name=None, value=<object object>)
Bases: Aggregation
Count occurrences of a value or values.
This aggregation can be used in three ways:
If no arguments are specified, then it will count the number of rows in the table.
If only column_name is specified, then it will count the number of non-null values in that column.
If both column_name and value are specified, then it will count occurrences of a specific value.
- Parameters:
column_name – The column containing the values to be counted.
value – Any value to be counted, including None.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Deciles(column_name)
Bases: Aggregation
Calculate the deciles of a column based on its percentiles.
Deciles will be equivalent to the 10th, 20th … 90th percentiles.
“Zeroth” (min value) and “Tenth” (max value) deciles are included for reference and intuitive indexing.
See Percentiles for implementation details.
This aggregation cannot be applied to a TableSet.
- Parameters:
column_name – The name of a column containing Number data.
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- class agate.HasNulls(column_name)
Bases: Aggregation
Check if the column contains null values.
- Parameters:
column_name – The name of the column to check.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.IQR(column_name)
Bases: Aggregation
Calculate the interquartile range of a column.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.MAD(column_name)
Bases: Aggregation
Calculate the median absolute deviation of a column.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
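The statistic itself is the median of each value's absolute distance from the median. Below is a plain-Python sketch using the standard statistics module; it is not agate's implementation. It assumes the unscaled definition (no consistency constant such as 1.4826), and skipping None is the sketch's own choice. median_absolute_deviation is a hypothetical helper name.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of |x - median(x)| over the non-null values, or None if there are none."""
    data = [v for v in values if v is not None]
    if not data:
        return None
    center = median(data)
    return median(abs(v - center) for v in data)


print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]))
```

For the sample data the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4, 7, and their median is 1.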
- class agate.Min(column_name)
Bases: Aggregation
Find the minimum value in a column.
This aggregation can be applied to columns containing Date, DateTime, or Number data.
- Parameters:
column_name – The name of the column to be searched.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Max(column_name)
Bases: Aggregation
Find the maximum value in a column.
This aggregation can be applied to columns containing Date, DateTime, or Number data.
- Parameters:
column_name – The name of the column to be searched.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.MaxLength(column_name)
Bases: Aggregation
Find the length of the longest string in a column.
Note: On Python 2.7 this function may miscalculate the length of Unicode strings that contain “wide characters”. For details see this Stack Overflow answer: https://stackoverflow.com/a/35462951
- Parameters:
column_name – The name of a column containing Text data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- class agate.MaxPrecision(column_name)
Bases: Aggregation
Find the most decimal places present for any value in this column.
- Parameters:
column_name – The name of the column to be searched.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
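Counting decimal places can be sketched with the standard decimal module, since a Decimal's exponent records where the decimal point sits. This plain-Python sketch is not agate's code, and max_decimal_places is a hypothetical helper. Whether agate counts trailing zeros (treating 1.50 as two places) is not stated here; the sketch assumes it does not, by normalizing each value first.

```python
from decimal import Decimal


def max_decimal_places(values):
    """Largest count of digits after the decimal point among non-null Decimals."""
    places = 0
    for value in values:
        if value is None:
            continue
        # normalize() strips trailing zeros (assumption): Decimal('1.50') counts as 1 place.
        exponent = value.normalize().as_tuple().exponent
        # A positive exponent (e.g. 100 -> 1E+2) means no decimal places at all.
        places = max(places, -exponent)
    return places


print(max_decimal_places([Decimal('1.5'), Decimal('3.14159'), None, Decimal('100')]))
```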
- class agate.Mean(column_name)
Bases: Aggregation
Calculate the mean of a column.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Median(column_name)
Bases: Aggregation
Calculate the median of a column.
Median is equivalent to the 50th percentile. See Percentiles for implementation details.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Mode(column_name)
Bases: Aggregation
Calculate the mode of a column.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Percentiles(column_name)
Bases: Aggregation
Divide a column into 100 equal-size groups using the “CDF” method.
See this explanation of the various methods for computing percentiles.
“Zeroth” (min value) and “Hundredth” (max value) percentiles are included for reference and intuitive indexing.
A reference implementation was provided by pycalcstats.
This aggregation cannot be applied to a TableSet.
- Parameters:
column_name – The name of a column containing Number data.
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
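To make the "CDF" rule concrete, here is a plain-Python sketch of it as we understand it, not agate's source. With n sorted values and k = n × p / 100, the p-th percentile is the value at rank ⌈k⌉ when k is fractional, and the average of the values at ranks k and k + 1 when k is a whole number. The function name cdf_percentiles is hypothetical; like the entries above, it returns 101 values so that index p holds the p-th percentile, with the minimum at 0 and the maximum at 100.

```python
import math


def cdf_percentiles(values):
    """Return 101 values where index p is the p-th percentile (0 = min, 100 = max)."""
    data = sorted(v for v in values if v is not None)
    if not data:
        return [None] * 101
    n = len(data)
    result = [data[0]]  # "zeroth" percentile: the minimum
    for p in range(1, 100):
        k = n * p / 100  # n * p is an exact integer before the division
        if k.is_integer():
            # k sits on the boundary between ranks k and k + 1: average them.
            i = int(k)
            result.append((data[i - 1] + data[i]) / 2)
        else:
            # Otherwise take the value at rank ceil(k).
            result.append(data[math.ceil(k) - 1])
    result.append(data[-1])  # "hundredth" percentile: the maximum
    return result


percentiles = cdf_percentiles([4, 1, 3, 2])
print(percentiles[25], percentiles[50], percentiles[75])
```

With four values, k is a whole number at p = 25, 50 and 75, so each of those percentiles averages two neighbouring values: 1.5, 2.5 and 3.5. Index 50 is the ordinary median for both odd and even n.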
- class agate.PopulationStDev(column_name)
Bases: StDev
Calculate the population standard deviation of a column.
For the sample standard deviation, see StDev.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.PopulationVariance(column_name)
Bases: Variance
Calculate the population variance of a column.
For the sample variance, see Variance.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Quartiles(column_name)
Bases: Aggregation
Calculate the quartiles of a column based on its percentiles.
Quartiles will be equivalent to the 25th, 50th and 75th percentiles.
“Zeroth” (min value) and “Fourth” (max value) quartiles are included for reference and intuitive indexing.
See Percentiles for implementation details.
This aggregation cannot be applied to a TableSet.
- Parameters:
column_name – The name of a column containing Number data.
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- class agate.Quintiles(column_name)
Bases: Aggregation
Calculate the quintiles of a column based on its percentiles.
Quintiles will be equivalent to the 20th, 40th, 60th and 80th percentiles.
“Zeroth” (min value) and “Fifth” (max value) quintiles are included for reference and intuitive indexing.
See Percentiles for implementation details.
This aggregation cannot be applied to a TableSet.
- Parameters:
column_name – The name of a column containing Number data.
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- class agate.StDev(column_name)
Bases: Aggregation
Calculate the sample standard deviation of a column.
For the population standard deviation, see PopulationStDev.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Sum(column_name)
Bases: Aggregation
Calculate the sum of a column.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Summary(column_name, data_type, func, cast=True)
Bases: Aggregation
Apply an arbitrary function to a column.
- Parameters:
column_name – The name of a column to be summarized.
data_type – The return type of this aggregation.
func – A function which will be passed the column for processing.
cast – If True, each return value will be cast to the specified data_type to ensure it is valid. Only disable this if you are certain your summary always returns the correct type.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- run(table)
Execute this aggregation on a given column and return the result.
- class agate.Variance(column_name)
Bases: Aggregation
Calculate the sample variance of a column.
For the population variance, see PopulationVariance.
- Parameters:
column_name – The name of a column containing Number data.
- get_aggregate_data_type(table)
Get the data type that should be used when using this aggregation with a TableSet to produce a new column.
Should raise UnsupportedAggregationError if this column does not support aggregation into a TableSet. (For example, if it does not return a single value.)
- validate(table)
Perform any checks necessary to verify this aggregation can run on the provided table without errors. This is called by Table.aggregate() before run().
- run(table)
Execute this aggregation on a given column and return the result.
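The sample and population variants (Variance versus PopulationVariance, StDev versus PopulationStDev) differ only in the denominator: n − 1 for a sample (Bessel's correction) and n for a whole population. A plain-Python sketch of both formulas, not agate's code, with the standard library's statistics functions printed alongside for comparison. The helper names are hypothetical, and skipping None is the sketch's own choice.

```python
import math
import statistics


def _squared_deviations(values):
    """Squared distance of each non-null value from the mean."""
    data = [v for v in values if v is not None]
    if not data:
        return []
    mean = sum(data) / len(data)
    return [(v - mean) ** 2 for v in data]


def sample_variance(values):
    squares = _squared_deviations(values)
    if len(squares) < 2:
        return None  # undefined for fewer than two values
    return sum(squares) / (len(squares) - 1)  # divide by n - 1


def population_variance(values):
    squares = _squared_deviations(values)
    if not squares:
        return None
    return sum(squares) / len(squares)  # divide by n


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
print(statistics.pvariance(data), statistics.variance(data))  # standard-library equivalents
```

The squared deviations of the sample data from its mean of 5 sum to 32, so the population variance is 32 / 8 = 4 (standard deviation 2), while the sample variance is 32 / 7.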