Utility Functions (datascience.util)
Utility functions
- datascience.util.is_non_string_iterable(value)
 Returns a boolean indicating whether a value is iterable and is not a string.
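Python strings are themselves iterable, so a check of this kind has to exclude them explicitly. The following is a minimal sketch of that idea, assuming `str` is the only excluded type. The helper name `looks_like_non_string_iterable` is hypothetical, and this is not the library's source.

```python
def looks_like_non_string_iterable(value):
    """Hypothetical stand-in: True for iterables other than str."""
    if isinstance(value, str):
        return False  # strings iterate over characters, but are excluded
    try:
        iter(value)  # raises TypeError for non-iterable values
    except TypeError:
        return False
    return True


print(looks_like_non_string_iterable([1, 2, 3]))  # True
print(looks_like_non_string_iterable("abc"))      # False
print(looks_like_non_string_iterable(42))         # False
```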
- datascience.util.make_array(*elements)
 Returns an array containing all the arguments passed to this function. A simple way to make an array with a few elements.
As with any array, all arguments should have the same type.
- Args:
 elements (variadic): elements
- Returns:
 A NumPy array of the same length as the provided variadic argument elements
>>> make_array(0)
array([0])
>>> make_array(2, 3, 4)
array([2, 3, 4])
>>> make_array("foo", "bar")
array(['foo', 'bar'], dtype='<U3')
>>> make_array(True, False)
array([ True, False], dtype=bool)
>>> make_array()
array([], dtype=float64)
- datascience.util.minimize(f, start=None, smooth=False, log=None, array=False, **vargs)
 Minimize a function f of one or more arguments.
- Args:
 f: A function that takes numbers and returns a number
start: A starting value or list of starting values
smooth: Whether to assume that f is smooth and use first-order info
log: Logging function called on the result of optimization (e.g. print)
vargs: Other named arguments passed to scipy.optimize.minimize
- Returns either:
 the minimizing argument of a one-argument function
an array of minimizing arguments of a multi-argument function
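The return convention (a single number for a one-argument function, a sequence for a multi-argument one) can be illustrated without SciPy. The sketch below is not how datascience.util.minimize works internally: it swaps in a toy compass search for scipy.optimize.minimize, returns a plain list rather than a NumPy array, and requires explicit starting values. The name `toy_minimize` and its `step`, `tol`, and `max_iter` parameters are hypothetical.

```python
def toy_minimize(f, start, step=1.0, tol=1e-8, max_iter=10000):
    """Toy compass search; f takes its arguments positionally, like f(x, y)."""
    x = [float(v) for v in (start if hasattr(start, "__len__") else [start])]
    best = f(*x)
    while step > tol and max_iter > 0:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                value = f(*trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2  # no neighbor is better: search on a finer grid
        max_iter -= 1
    # Mirror the documented convention: scalar for one argument, sequence otherwise.
    return x[0] if len(x) == 1 else x


print(toy_minimize(lambda x: (x - 3) ** 2, 0))
print(toy_minimize(lambda x, y: (x - 1) ** 2 + (y + 2) ** 2, [0, 0]))
```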
- datascience.util.percentile(p, arr=None)
 Returns the pth percentile of the input array (the value that is at least as great as p% of the values in the array).
If arr is not provided, percentile returns itself curried with p.
>>> percentile(74.9, [1, 3, 5, 9])
5
>>> percentile(75, [1, 3, 5, 9])
5
>>> percentile(75.1, [1, 3, 5, 9])
9
>>> f = percentile(75)
>>> f([1, 3, 5, 9])
5
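The definition above ("the value that is at least as great as p% of the values") can be sketched in plain Python. The version below assumes a smallest-rank rule (take the element at position ceil(p/100 * n) in sorted order), which reproduces the documented outputs. `percentile_sketch` is a hypothetical name, not the library function.

```python
import math


def percentile_sketch(p, values=None):
    """Hypothetical re-creation of the documented behaviour."""
    if values is None:
        # Curried form: remember p and wait for the data.
        return lambda vals: percentile_sketch(p, vals)
    if p == 0:
        return min(values)
    if not 0 < p <= 100:
        raise ValueError("p must be between 0 and 100")
    rank = math.ceil(p / 100 * len(values))  # 1-based position in sorted order
    return sorted(values)[rank - 1]


print(percentile_sketch(74.9, [1, 3, 5, 9]))  # 5
print(percentile_sketch(75.1, [1, 3, 5, 9]))  # 9
upper_quartile = percentile_sketch(75)
print(upper_quartile([1, 3, 5, 9]))           # 5
```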
- datascience.util.plot_cdf_area(rbound=None, lbound=None, mean=0, sd=1)
 Plots a normal curve with specified parameters and area below curve shaded between lbound and rbound.
- Args:
 rbound (numeric): right boundary of shaded region
 lbound (numeric): left boundary of shaded region; by default is negative infinity
 mean (numeric): mean/expectation of normal distribution
 sd (numeric): standard deviation of normal distribution
- datascience.util.plot_normal_cdf(rbound=None, lbound=None, mean=0, sd=1)
 Plots a normal curve with specified parameters and area below curve shaded between lbound and rbound.
- Args:
 rbound (numeric): right boundary of shaded region
 lbound (numeric): left boundary of shaded region; by default is negative infinity
 mean (numeric): mean/expectation of normal distribution
 sd (numeric): standard deviation of normal distribution
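The shaded region corresponds to a normal probability, which can be computed directly. The sketch below uses only the standard library and does no plotting. `shaded_area` is a hypothetical helper, and treating a missing rbound as positive infinity is an assumption (the documentation only states the default for lbound).

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def shaded_area(rbound=None, lbound=None, mean=0.0, sd=1.0):
    """Area under the normal curve between lbound and rbound."""
    upper = 1.0 if rbound is None else normal_cdf(rbound, mean, sd)  # assumed: +infinity
    lower = 0.0 if lbound is None else normal_cdf(lbound, mean, sd)  # default: -infinity
    return upper - lower


print(shaded_area(rbound=0))                          # 0.5
print(round(shaded_area(rbound=1.96, lbound=-1.96), 3))  # 0.95
```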
- datascience.util.proportions_from_distribution(table, label, sample_size, column_name='Random Sample', seed=None)
 Adds a column named column_name containing the proportions of a random draw using the distribution in label.
This method uses np.random.Generator.multinomial to draw sample_size samples from the distribution in table.column(label), then divides by sample_size to create the resulting column of proportions.
- Args:
 table: An instance of Table.
 label: Label of column in table. This column must contain a distribution (the values must sum to 1).
 sample_size: The size of the sample to draw from the distribution.
 column_name: The name of the new column that contains the sampled proportions. Defaults to 'Random Sample'.
 seed: Optional seed for reproducibility. If None, results will be random.
- Returns:
 A copy of table with a column column_name containing the sampled proportions. The proportions will sum to 1.
- Throws:
 ValueError: If the label is not in the table, or if table.column(label) does not sum to 1.
- datascience.util.sample_proportions(sample_size: int, probabilities, seed=None)
 Return the proportion of random draws for each outcome in a distribution.
This function is similar to np.random.Generator.multinomial, but returns proportions instead of counts.
- Args:
 sample_size: The size of the sample to draw from the distribution.
 probabilities: An array of probabilities that forms a distribution.
 seed: Optional seed for reproducibility. If None, results will be random.
- Returns:
 An array with the same length as probabilities that sums to 1.
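The "counts divided by sample size" idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: it uses Python's `random` module instead of a NumPy generator, so the numbers it produces for a given seed will differ from the library's, and it returns a list. `sample_proportions_sketch` is a hypothetical name.

```python
import random
from collections import Counter


def sample_proportions_sketch(sample_size, probabilities, seed=None):
    """Draw sample_size outcomes, then report the share landing on each one."""
    rng = random.Random(seed)  # seed=None gives a different sample each run
    outcomes = range(len(probabilities))
    draws = rng.choices(outcomes, weights=probabilities, k=sample_size)
    counts = Counter(draws)
    # Outcomes that were never drawn get a proportion of 0.
    return [counts[i] / sample_size for i in outcomes]


proportions = sample_proportions_sketch(1000, [0.2, 0.3, 0.5], seed=42)
print(proportions)
print(sum(proportions))
```

With a large sample_size, the proportions should land close to the input probabilities.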
- datascience.util.table_apply(table, func, subset=None)
 Applies a function to each column and returns a Table.
- Args:
 table: The table to apply your function to.
 func: The function to apply to each column.
 subset: A list of columns to apply the function to; if None, the function will be applied to all columns in table.
- Returns:
 A table with the given function applied. Its shape will be either the same as the shape of table, or (1, table.shape[1]).
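The two possible result shapes presumably correspond to a function that transforms a column value by value (same shape as the input) and a function that reduces a column to a single value (one row). The sketch below illustrates that distinction with a plain dict of lists standing in for a Table. `apply_to_columns` is a hypothetical helper, and the subset argument is omitted.

```python
def apply_to_columns(columns, func):
    """columns maps each label to a list of values (a stand-in for a Table)."""
    result = {}
    for label, values in columns.items():
        out = func(values)
        # Wrap a single summary value so every column is still a list.
        result[label] = list(out) if isinstance(out, (list, tuple)) else [out]
    return result


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(apply_to_columns(data, lambda col: [2 * v for v in col]))  # same shape as data
print(apply_to_columns(data, sum))                               # one row per column
```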