datascience.tables.Table.hist_of_counts

Table.hist_of_counts(*columns, overlay=True, bins=None, bin_column=None, group=None, side_by_side=False, width=6, height=4, **vargs)

Plots one count-based histogram for each column in columns. The height of each bar represents a count, and all bins must have equal width.

If no column is specified, all columns are plotted.
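
As a rough illustration of what a count-based histogram computes, the sketch below tallies values into equal-width bins in plain Python. It is not the library's implementation: the helper name `count_in_equal_bins` and its arguments are hypothetical, and it assumes the bin convention used by NumPy and matplotlib, where each bin is half-open and the last bin also includes its right edge.

```python
def count_in_equal_bins(values, low, high, n_bins):
    """Tally values into n_bins equal-width bins spanning [low, high].

    Hypothetical helper for illustration only; not part of datascience.
    """
    if n_bins < 1 or high <= low:
        raise ValueError("need n_bins >= 1 and high > low")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v < low or v > high:
            continue  # values outside the range are ignored
        # Bins are [left, right), except the last, which also includes `high`.
        index = min(int((v - low) // width), n_bins - 1)
        counts[index] += 1
    return counts


# The values 1, 2, 2, 10 match the 'points' column used in the examples.
print(count_in_equal_bins([1, 2, 2, 10], low=1, high=10, n_bins=3))  # [3, 0, 1]
```

Each bar height here is a raw count, so the heights sum to the number of values inside the range. That reading only works when every bin has the same width, which is why the method requires equal-size bins.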

Kwargs:
overlay (bool): If True, plots one chart with all the histograms overlaid on top of each other (instead of one histogram for each column in the table). Also adds a legend that matches each bar color to its column. Note that if the histograms are not overlaid, they are not forced to the same scale.

bins (array or int): Lower bound for each bin in the histogram or number of bins. If None, bins will be chosen automatically.

bin_column (column name or index): A column of bin lower bounds. All other columns are treated as counts of these bins. If None, each value in each row is assigned a count of 1.

group (column name or index): A column of categories. The rows are grouped by the values in this column, and a separate histogram is generated for each group. The histograms are overlaid or plotted separately depending on the overlay argument. If None, no such grouping is done.

side_by_side (bool): Whether histogram bins should be plotted side by side (instead of directly overlaid). Makes sense only when plotting multiple histograms, either by passing several columns or by using the group option.

vargs: Additional arguments that get passed into plt.hist. See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist for the arguments that can be passed in vargs. These include range, cumulative, and orientation, to name a few.
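
To make the bin_column description more concrete, here is a minimal plain-Python sketch of the idea that each row contributes its count to the bar for its bin value, rather than a count of 1. It does not call the library, the variable names are illustrative, and the way datascience derives bin edges from the column is not modeled, so actual bar boundaries may differ.

```python
from collections import Counter

# Data from the bin_column example: each value paired with its count.
values = [101, 102, 103]
counts = [5, 10, 5]

# With a bin column, every row adds its count to the tally for its value.
weighted = Counter()
for value, count in zip(values, counts):
    weighted[value] += count

# Equivalent unweighted data: repeat each value `count` times, then
# give every repeated item a count of 1.
expanded = [value for value, count in zip(values, counts) for _ in range(count)]
unweighted = Counter(expanded)

print(weighted == unweighted)    # True
print(sorted(weighted.items()))  # [(101, 5), (102, 10), (103, 5)]
```

Seen this way, a table of pre-tallied counts and a table with one row per observation describe the same histogram.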

>>> t = Table().with_columns(
...     'count',  make_array(9, 3, 3, 1),
...     'points', make_array(1, 2, 2, 10))
>>> t
count | points
9     | 1
3     | 2
3     | 2
1     | 10
>>> t.hist_of_counts() 
<histogram of values in count with counts on y-axis>
<histogram of values in points with counts on y-axis>
>>> t = Table().with_columns(
...     'value', make_array(101, 102, 103),
...     'count', make_array(5, 10, 5))
>>> t.hist_of_counts(bin_column='value') 
<histogram of values weighted by corresponding counts>
>>> t = Table().with_columns(
...     'value',    make_array(1,   2,   3,   2,   5  ),
...     'category', make_array('a', 'a', 'a', 'b', 'b'))
>>> t.hist_of_counts('value', group='category')
<two overlaid histograms of the data [1, 2, 3] and [2, 5]>
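
The grouping in the last example can be pictured with the following plain-Python sketch, which splits the values by category before any histogram is drawn. It assumes that grouping simply partitions the rows by the group column; the helper name `split_by_group` is hypothetical and not part of the library.

```python
from collections import defaultdict


def split_by_group(values, categories):
    """Partition values by their category label (illustrative helper)."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)
    return dict(groups)


# Same data as the 'value' and 'category' columns in the example.
values = [1, 2, 3, 2, 5]
categories = ['a', 'a', 'a', 'b', 'b']
print(split_by_group(values, categories))  # {'a': [1, 2, 3], 'b': [2, 5]}
```

Each resulting list would then get its own histogram, overlaid or separate depending on the overlay argument.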