datascience.tables.Table.pivot_bin

Table.pivot_bin(pivot_columns, value_column, bins=None, **vargs)

Form a table with one column per unique tuple of values in pivot_columns, containing counts per bin of the value_column values associated with that tuple.

By default, bins are chosen to contain all values in the value_column. The following named arguments from numpy.histogram can be applied to specialize bin widths:

Args:
bins (int or sequence of scalars): If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

range ((float, float)): The lower and upper range of the bins. If not provided, the range spans all values in the value_column. Values outside the range are ignored.

normed (bool): If False, the result will contain the number of samples in each bin. If True, the result is normalized such that the integral over the range is 1.
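
As a rough illustration of how an integer bins value and a range translate into bin edges, the sketch below computes equal-width edges in plain Python. The helper name equal_width_edges is hypothetical and not part of the datascience API; it only approximates what numpy.histogram does internally, and floating-point results may differ in the last digits.

```python
def equal_width_edges(lo, hi, n_bins=10):
    """Return n_bins + 1 equally spaced edges from lo to hi (inclusive).

    Hypothetical helper for illustration only; not a library function.
    """
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Five equal-width bins over the range (30, 60) need six edges.
print(equal_width_edges(30, 60, n_bins=5))
# [30.0, 36.0, 42.0, 48.0, 54.0, 60.0]
```

Note that n bins always produce n + 1 edges, which is why the example outputs contain one more row than the number of bins.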

Returns:

New pivot table with one column per unique row of the specified pivot_columns, populated with counts of the values from value_column that fall into each of the specified bins and range.

Examples:

>>> t = Table.from_records([
...   {
...    'column1':'data1',
...    'column2':86,
...    'column3':'b',
...    'column4':5,
...   },
...   {
...    'column1':'data2',
...    'column2':51,
...    'column3':'c',
...    'column4':3,
...   },
...   {
...    'column1':'data3',
...    'column2':32,
...    'column3':'a',
...    'column4':6,
...   }
... ])
>>> t
column1 | column2 | column3 | column4
data1   | 86      | b       | 5
data2   | 51      | c       | 3
data3   | 32      | a       | 6
>>> t.pivot_bin(pivot_columns='column1',value_column='column2')
bin  | data1 | data2 | data3
32   | 0     | 0     | 1
37.4 | 0     | 0     | 0
42.8 | 0     | 0     | 0
48.2 | 0     | 1     | 0
53.6 | 0     | 0     | 0
59   | 0     | 0     | 0
64.4 | 0     | 0     | 0
69.8 | 0     | 0     | 0
75.2 | 0     | 0     | 0
80.6 | 1     | 0     | 0
... (1 rows omitted)
>>> t.pivot_bin(pivot_columns=['column1','column2'],value_column='column4')
bin  | data1-86 | data2-51 | data3-32
3    | 0        | 1        | 0
3.3  | 0        | 0        | 0
3.6  | 0        | 0        | 0
3.9  | 0        | 0        | 0
4.2  | 0        | 0        | 0
4.5  | 0        | 0        | 0
4.8  | 1        | 0        | 0
5.1  | 0        | 0        | 0
5.4  | 0        | 0        | 0
5.7  | 0        | 0        | 1
... (1 rows omitted)
>>> t.pivot_bin(pivot_columns='column1',value_column='column2',bins=[20,45,100])
bin  | data1 | data2 | data3
20   | 0     | 0     | 1
45   | 1     | 1     | 0
100  | 0     | 0     | 0
>>> t.pivot_bin(pivot_columns='column1',value_column='column2',bins=5,range=[30,60])
bin  | data1 | data2 | data3
30   | 0     | 0     | 1
36   | 0     | 0     | 0
42   | 0     | 0     | 0
48   | 0     | 1     | 0
54   | 0     | 0     | 0
60   | 0     | 0     | 0
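
The bin rows in the outputs above can be reproduced with a small plain-Python sketch. It assumes the half-open bin convention of numpy.histogram (every bin is [left, right) except the last, which also includes its right edge) and assumes that the final row, labeled with the rightmost edge, is padded with 0, as the outputs above suggest. The names bin_counts and pivot_bin_sketch are hypothetical helpers, not part of the library, and the sketch handles only a single pivot column with explicit edges.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin for sorted edges.

    Bins are [left, right) except the last one, which is closed on the right.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the range: ignored
        # Clamp so that v == edges[-1] lands in the last bin.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def pivot_bin_sketch(rows, pivot_key, value_key, edges):
    """Build a dict of columns: one 'bin' column plus one column per pivot value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[pivot_key], []).append(row[value_key])
    table = {'bin': list(edges)}
    for label in sorted(groups):
        # Assumption: pad with 0 so the rightmost edge gets its own row.
        table[label] = bin_counts(groups[label], edges) + [0]
    return table


rows = [
    {'column1': 'data1', 'column2': 86},
    {'column1': 'data2', 'column2': 51},
    {'column1': 'data3', 'column2': 32},
]
print(pivot_bin_sketch(rows, 'column1', 'column2', [20, 45, 100]))
# {'bin': [20, 45, 100], 'data1': [0, 1, 0], 'data2': [0, 1, 0], 'data3': [1, 0, 0]}
```

Under these assumptions, the row for the rightmost edge is always 0, and a value such as 86 is dropped entirely when the edges stop at 60.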