Utility Functions (datascience.util)
Utility functions

datascience.util.make_array(*elements)
Returns an array containing all the arguments passed to this function. A simple way to make an array with a few elements.
As with any array, all arguments should have the same type.
>>> make_array(0)
array([0])
>>> make_array(2, 3, 4)
array([2, 3, 4])
>>> make_array("foo", "bar")
array(['foo', 'bar'], dtype='<U3')
>>> make_array()
array([], dtype=float64)
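A minimal sketch, assuming make_array simply passes its collected arguments to NumPy, shows where the dtypes in the examples come from. The name make_array_sketch is hypothetical and not part of datascience. The last line illustrates why arguments should share a type: NumPy silently coerces mixed types instead of rejecting them.

```python
import numpy as np

def make_array_sketch(*elements):
    # Hypothetical helper: *elements collects the positional arguments
    # into a tuple, and np.array turns that tuple into a 1-D array.
    return np.array(elements)

ints = make_array_sketch(2, 3, 4)
words = make_array_sketch("foo", "bar")
empty = make_array_sketch()          # an empty tuple gives a float64 array
mixed = make_array_sketch(1, 2.5)    # NumPy upcasts the int to float64
```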

datascience.util.percentile(p, arr=None)
Returns the p-th percentile of the input array (the value that is at least as great as p% of the values in the array).
If arr is not provided, percentile returns itself curried with p, i.e. a function that takes arr and returns its p-th percentile.
>>> percentile(74.9, [1, 3, 5, 9])
5
>>> percentile(75, [1, 3, 5, 9])
5
>>> percentile(75.1, [1, 3, 5, 9])
9
>>> f = percentile(75)
>>> f([1, 3, 5, 9])
5
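The examples above are consistent with a rank-based definition: sort the values and return the element at 1-based rank ceil(p/100 * n), where n is the number of values. The sketch below implements that rule as a hypothetical percentile_sketch helper. It is an illustration under that assumption, not the library's source; returning the minimum for p == 0 is also an assumption.

```python
import math

def percentile_sketch(p, arr=None):
    # Hypothetical helper illustrating the rank-based percentile rule.
    if arr is None:
        # Curried form: fix p now, supply the values later.
        return lambda values: percentile_sketch(p, values)
    if p == 0:
        # Assumption: the 0th percentile is the smallest value.
        return min(arr)
    if not 0 < p <= 100:
        raise ValueError("p must be between 0 and 100")
    # 1-based rank of the smallest value that is >= p% of the values.
    # Multiplying before dividing keeps integer p exact in floating point.
    rank = math.ceil(p * len(arr) / 100)
    return sorted(arr)[rank - 1]
```

With n = 4, p = 75 gives rank ceil(3.0) = 3 (the value 5), while p = 75.1 gives rank ceil(3.004) = 4 (the value 9), which is why the result jumps between those two calls.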

datascience.util.plot_cdf_area(rbound=None, lbound=None, mean=0, sd=1)
Plots a normal curve with the specified parameters and shades the area below the curve between lbound and rbound.
Args:
    rbound (numeric): right boundary of shaded region
    lbound (numeric): left boundary of shaded region; by default is negative infinity
    mean (numeric): mean/expectation of normal distribution
    sd (numeric): standard deviation of normal distribution

datascience.util.plot_normal_cdf(rbound=None, lbound=None, mean=0, sd=1)
Plots a normal curve with the specified parameters and shades the area below the curve between lbound and rbound.
Args:
    rbound (numeric): right boundary of shaded region
    lbound (numeric): left boundary of shaded region; by default is negative infinity
    mean (numeric): mean/expectation of normal distribution
    sd (numeric): standard deviation of normal distribution
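The shaded area is a difference of normal CDF values, Φ((rbound - mean)/sd) - Φ((lbound - mean)/sd), so it can also be computed in closed form with math.erf. In this sketch, normal_cdf and shaded_normal_area are hypothetical helpers; treating a missing rbound as positive infinity (mirroring the documented default for lbound) is an assumption.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    # Phi((x - mean) / sd), written in terms of the error function.
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def shaded_normal_area(rbound=None, lbound=None, mean=0.0, sd=1.0):
    # Hypothetical helper. A missing lbound means negative infinity
    # (CDF value 0); a missing rbound is assumed to mean positive
    # infinity (CDF value 1).
    upper = 1.0 if rbound is None else normal_cdf(rbound, mean, sd)
    lower = 0.0 if lbound is None else normal_cdf(lbound, mean, sd)
    return upper - lower

left_half = shaded_normal_area(rbound=0)          # area left of the mean
middle_95 = shaded_normal_area(1.96, -1.96)       # roughly 0.95
one_sd = shaded_normal_area(rbound=12, lbound=8, mean=10, sd=2)
```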

datascience.util.table_apply(table, func, subset=None)
Applies a function to each column and returns a Table.
Uses pandas apply under the hood, then converts the result back to a Table.
Args:
    table : instance of Table
        The table to apply your function to.
    func : function
        Any function that will work with DataFrame.apply.
    subset : list or None
        A list of columns to apply the function to. If None, the function will be applied to all columns in table.
Returns:
    tab : instance of Table
        A table with the given function applied. Its shape will be either the same as that of table, or (1, number of columns in table).
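The two possible output shapes come from how DataFrame.apply behaves: a function that reduces each column to one value yields a Series, while a function that returns a whole column yields a DataFrame of the original shape. The sketch below illustrates this with pandas alone. apply_to_columns is a hypothetical helper working on a DataFrame rather than a Table, and it does not model the subset argument.

```python
import pandas as pd

def apply_to_columns(df, func):
    # Hypothetical helper; func receives one column (a Series) at a time.
    result = df.apply(func)
    if isinstance(result, pd.Series):
        # A reducing func gives one value per column: reshape the Series
        # into a single-row frame with the original column labels.
        result = result.to_frame().T
    return result

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
means = apply_to_columns(df, lambda col: col.mean())   # one row
doubled = apply_to_columns(df, lambda col: col * 2)    # same shape as df
```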

datascience.util.proportions_from_distribution(table, label, sample_size, column_name='Random Sample')
Adds a column named column_name containing the proportions of a random draw using the distribution in label.
This method uses np.random.multinomial to draw sample_size samples from the distribution in table.column(label), then divides by sample_size to create the resulting column of proportions.
Returns a new Table and does not modify table.
Args:
    table: An instance of Table.
    label: Label of column in table. This column must contain a distribution (the values must sum to 1).
    sample_size: The size of the sample to draw from the distribution.
    column_name: The name of the new column that contains the sampled proportions. Defaults to 'Random Sample'.
Returns:
    A copy of table with a column column_name containing the sampled proportions. The proportions will sum to 1.
Throws:
    ValueError: If the label is not in the table, or if table.column(label) does not sum to 1.
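The sampling step described above can be sketched directly with NumPy: draw multinomial counts, then divide by the sample size. sample_proportions_sketch is a hypothetical helper that takes a plain sequence of probabilities instead of a Table column, and testing "sums to 1" with np.isclose is an assumption about how the check is done.

```python
import numpy as np

def sample_proportions_sketch(distribution, sample_size):
    # Hypothetical helper mirroring the documented steps.
    distribution = np.asarray(distribution, dtype=float)
    if not np.isclose(distribution.sum(), 1):
        raise ValueError("distribution must sum to 1")
    # Counts of how many of the sample_size draws land in each category.
    counts = np.random.multinomial(sample_size, distribution)
    # Convert counts to proportions; these sum to 1 up to rounding.
    return counts / sample_size

props = sample_proportions_sketch([0.2, 0.3, 0.5], 1000)
```

Because each count is a whole number, every proportion is a multiple of 1/sample_size; larger samples give proportions that tend to land closer to the underlying distribution.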

datascience.util.minimize(f, start=None, smooth=False, log=None, array=False, **vargs)
Minimize a function f of one or more arguments.
Args:
    f: A function that takes numbers and returns a number.
    start: A starting value or list of starting values.
    smooth: Whether to assume that f is smooth and use first-order information.
    log: Logging function called on the result of optimization (e.g. print).
    vargs: Other named arguments passed to scipy.optimize.minimize.
Returns either:
    the minimizing argument of a one-argument function, or
    an array of minimizing arguments of a multi-argument function.
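scipy.optimize.minimize expects an objective that takes a single 1-D array, so a wrapper for functions of several scalar arguments has to unpack that array before calling f. The sketch below shows one way to do this. minimize_sketch is hypothetical; its defaults (a start of 0 for every parameter of f, and the derivative-free Powell method when smooth is False) are assumptions, not documented behavior of datascience.util.minimize, and it omits the array parameter.

```python
import inspect

import numpy as np
from scipy import optimize

def minimize_sketch(f, start=None, smooth=False, log=None, **vargs):
    # Hypothetical wrapper around scipy.optimize.minimize.
    if start is None:
        # Assumption: start every positional parameter of f at 0.
        start = [0.0] * len(inspect.signature(f).parameters)
    start = np.atleast_1d(np.asarray(start, dtype=float))

    def objective(args):
        # scipy passes one array; f expects separate scalar arguments.
        return f(*args)

    if not smooth:
        # Assumption: without smoothness, prefer a derivative-free method.
        vargs.setdefault("method", "Powell")
    result = optimize.minimize(objective, start, **vargs)
    if log is not None:
        log(result)
    # One argument gives a plain number; several give an array.
    return result.x.item(0) if start.size == 1 else result.x

best_x = minimize_sketch(lambda x: (x - 3) ** 2)
best_xy = minimize_sketch(lambda x, y: (x - 2) ** 2 + (y + 1) ** 2)
```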