datascience.tables.Table.group
- Table.group(column_or_label, collect=None)
Group rows by unique values in a column; count or aggregate others.
- Args:
column_or_label: values to group (column label or index, or array)
collect: a function applied to values in other columns for each group
- Returns:
A Table with each row corresponding to a unique value in column_or_label, where the first column contains the unique values from column_or_label, and the second contains counts for each of the unique values. If collect is provided, a Table is returned with all original columns, each containing values calculated by first grouping rows according to column_or_label, then applying collect to each set of grouped values in the other columns.
- Note:
The grouped column will appear first in the result table. If collect does not accept arguments with one of the column types, that column will be empty in the resulting table.
>>> from datascience import Table, make_array
>>> marbles = Table().with_columns(
...    "Color", make_array("Red", "Green", "Blue", "Red", "Green", "Green"),
...    "Shape", make_array("Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"),
...    "Amount", make_array(4, 6, 12, 7, 9, 2),
...    "Price", make_array(1.30, 1.30, 2.00, 1.75, 1.40, 1.00))
>>> marbles
Color | Shape       | Amount | Price
Red   | Round       | 4      | 1.3
Green | Rectangular | 6      | 1.3
Blue  | Rectangular | 12     | 2
Red   | Round       | 7      | 1.75
Green | Rectangular | 9      | 1.4
Green | Round       | 2      | 1
>>> marbles.group("Color") # just gives counts
Color | count
Blue  | 1
Green | 3
Red   | 2
>>> marbles.group("Color", max) # takes the max of each grouping, in each column
Color | Shape max   | Amount max | Price max
Blue  | Rectangular | 12         | 2
Green | Round       | 9          | 1.4
Red   | Round       | 7          | 1.75
>>> marbles.group("Shape", sum) # sum doesn't make sense for strings
Shape       | Color sum | Amount sum | Price sum
Rectangular |           | 27         | 4.7
Round       |           | 13         | 4.05
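The grouping behavior described above can be approximated in plain Python. The following is only a rough sketch of the semantics, not the datascience implementation: the helper name group_sketch, the dict-of-lists table representation, the use of an empty string for a value that collect cannot compute, and catching only TypeError are all assumptions made for illustration. Sorting the unique values mirrors the ordering visible in the doctest output.

```python
from collections import defaultdict


def group_sketch(columns, label, collect=None):
    """Hypothetical stand-in for Table.group (not the library's code).

    columns: dict mapping column label -> list of values, all the same length.
    label:   the column to group by.
    collect: optional function taking a list of values and returning one value.
    """
    # Bucket row indices by the value found in the grouping column.
    buckets = defaultdict(list)
    for i, key in enumerate(columns[label]):
        buckets[key].append(i)
    groups = sorted(buckets)  # unique values, ordered as in the doctest output

    # Without collect: the grouped column plus a count per unique value.
    if collect is None:
        return {label: groups, "count": [len(buckets[g]) for g in groups]}

    # With collect: the grouped column first, then every other column aggregated.
    result = {label: groups}
    for other, values in columns.items():
        if other == label:
            continue
        aggregated = []
        for g in groups:
            try:
                aggregated.append(collect([values[i] for i in buckets[g]]))
            except TypeError:
                # Assumption: an aggregate that cannot be computed is left empty.
                aggregated.append("")
        result[f"{other} {collect.__name__}"] = aggregated
    return result


# Same data as the marbles table in the doctest.
marbles = {
    "Color": ["Red", "Green", "Blue", "Red", "Green", "Green"],
    "Shape": ["Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"],
    "Amount": [4, 6, 12, 7, 9, 2],
    "Price": [1.30, 1.30, 2.00, 1.75, 1.40, 1.00],
}

counts = group_sketch(marbles, "Color")       # counts per color
maxima = group_sketch(marbles, "Color", max)  # max of every other column
sums = group_sketch(marbles, "Shape", sum)    # sum fails for the string column
```

In this sketch, collect receives all values of one column for one group and returns a single value, which is why max works on both strings and numbers while sum leaves the "Color sum" column empty.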