# Utility Functions (`datascience.util`)

Utility functions

datascience.util.is_non_string_iterable(value)

Whether a value is iterable but not a string.
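A minimal sketch of this check, assuming "non-string iterable" means any object that supports iteration except `str`. The name `is_non_string_iterable_sketch` is hypothetical, and the library's own implementation may test iterability differently.

```python
from collections.abc import Iterable

def is_non_string_iterable_sketch(value):
    """Hypothetical re-implementation: True for iterables other than str."""
    # Strings are iterable (character by character) but are usually
    # treated as single values, so exclude them explicitly.
    if isinstance(value, str):
        return False
    # Iterable matches objects that define __iter__ (lists, tuples, sets,
    # dicts, generators, NumPy arrays, ...).
    return isinstance(value, Iterable)

print(is_non_string_iterable_sketch([1, 2, 3]))  # True
print(is_non_string_iterable_sketch("abc"))      # False
print(is_non_string_iterable_sketch(42))         # False
```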

datascience.util.make_array(*elements)

Returns an array containing all the arguments passed to this function. A simple way to make an array with a few elements.

As with any array, all arguments should have the same type.

```
>>> make_array(0)
array([0])
>>> make_array(2, 3, 4)
array([2, 3, 4])
>>> make_array("foo", "bar")
array(['foo', 'bar'],
      dtype='<U3')
>>> make_array()
array([], dtype=float64)
```

datascience.util.minimize(f, start=None, smooth=False, log=None, array=False, **vargs)

Minimize a function f of one or more arguments.

Args:

`f`: A function that takes numbers and returns a number.

`start`: A starting value or list of starting values.

`smooth`: Whether to assume that `f` is smooth and use first-order information.

`log`: Logging function called on the result of optimization (e.g. `print`).

`vargs`: Other named arguments passed to `scipy.optimize.minimize`.

Returns either:

1. the minimizing argument of a one-argument function, or
2. an array of minimizing arguments of a multi-argument function.
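The argument conventions described above can be illustrated with a self-contained sketch. It is not the library's code: it swaps in a simple compass (pattern) search in place of `scipy.optimize.minimize`, so it has no `smooth`, `log`, or `vargs` handling, and the name `minimize_sketch` and its `step`, `tol`, and `max_iter` parameters are hypothetical. It also assumes that an omitted `start` means one starting value of 0 per parameter of `f`. What it does show is the calling convention: `f` receives separate numeric arguments, a one-argument function yields a single number, and a multi-argument function yields a list of values (standing in for an array).

```python
import inspect

def minimize_sketch(f, start=None, step=1.0, tol=1e-8, max_iter=100_000):
    """Hypothetical stand-in for minimize(): derivative-free compass search."""
    if start is None:
        # Assumption: one starting value of 0 per positional parameter of f.
        start = [0.0] * len(inspect.signature(f).parameters)
    single = not hasattr(start, "__len__")
    x = [float(start)] if single else [float(v) for v in start]

    # f takes separate arguments, so unpack the point when evaluating it.
    best = f(*x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        # Try moving each coordinate up or down by the current step.
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                value = f(*trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break
        if not improved:
            step /= 2  # no neighbor is better: refine the search grid
    # One-argument functions yield a number; otherwise a list of arguments.
    return x[0] if len(x) == 1 else x

print(minimize_sketch(lambda x: (x - 3) ** 2))                    # 3.0
print(minimize_sketch(lambda a, b: (a - 1) ** 2 + (b + 2) ** 2))  # [1.0, -2.0]
```

A compass search was chosen only because it needs no third-party packages; for real work the SciPy-backed `minimize` is the better tool.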

datascience.util.percentile(p, arr=None)

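Returns the `p`th percentile of the input array: the smallest value in the array that is at least as great as `p`% of the values in the array.

If `arr` is not provided, `percentile` returns itself curried with `p`, that is, a function that takes an array and returns its `p`th percentile.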

```
>>> percentile(74.9, [1, 3, 5, 9])
5
>>> percentile(75, [1, 3, 5, 9])
5
>>> percentile(75.1, [1, 3, 5, 9])
9
>>> f = percentile(75)
>>> f([1, 3, 5, 9])
5
```
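The ceiling-based rule behind the examples above can be sketched in a few lines: sort the values, compute the rank `ceil(p / 100 * n)`, and return the element at that 1-based rank. This is a hypothetical re-implementation (`percentile_sketch`) written to reproduce the documented outputs; it assumes `0 < p <= 100` and does not cover any other inputs the library may accept.

```python
import math

def percentile_sketch(p, arr=None):
    """Hypothetical re-implementation of the documented percentile rule."""
    if arr is None:
        # Curried form: remember p and wait for the data.
        return lambda values: percentile_sketch(p, values)
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(arr)
    # Smallest 1-based rank k with k / n >= p / 100, i.e. the smallest
    # value that is at least as great as p% of the values.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

print(percentile_sketch(74.9, [1, 3, 5, 9]))  # 5
print(percentile_sketch(75, [1, 3, 5, 9]))    # 5
print(percentile_sketch(75.1, [1, 3, 5, 9]))  # 9
print(percentile_sketch(75)([1, 3, 5, 9]))    # 5
```

Because the result is always an element of the input, this definition never interpolates between values, which is why 75.1 jumps straight from 5 to 9.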
datascience.util.plot_cdf_area(rbound=None, lbound=None, mean=0, sd=1)

Plots a normal curve with the specified parameters and shades the area under the curve between `lbound` and `rbound`.

Args:

`rbound` (numeric): right boundary of the shaded region

`lbound` (numeric): left boundary of the shaded region; defaults to negative infinity

`mean` (numeric): mean/expectation of the normal distribution

`sd` (numeric): standard deviation of the normal distribution

datascience.util.plot_normal_cdf(rbound=None, lbound=None, mean=0, sd=1)

Plots a normal curve with the specified parameters and shades the area under the curve between `lbound` and `rbound`.

Args:

`rbound` (numeric): right boundary of the shaded region

`lbound` (numeric): left boundary of the shaded region; defaults to negative infinity

`mean` (numeric): mean/expectation of the normal distribution

`sd` (numeric): standard deviation of the normal distribution
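The shaded region corresponds to a probability. For a normal distribution with mean `mean` and standard deviation `sd`, the area between `lbound` and `rbound` is Phi((rbound - mean) / sd) - Phi((lbound - mean) / sd), where Phi is the standard normal CDF. The sketch below computes that area with the standard library's `math.erf`; it draws nothing, and the helper names `normal_cdf` and `normal_area` are hypothetical. Treating a missing `rbound` as positive infinity is an assumption that mirrors the documented default for `lbound`.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of a normal distribution, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_area(rbound=None, lbound=None, mean=0.0, sd=1.0):
    """Hypothetical helper: area under the normal curve between the bounds."""
    # Missing bounds are treated as infinite (lbound = -inf as documented;
    # rbound = +inf is an assumption made for this sketch).
    upper = 1.0 if rbound is None else normal_cdf(rbound, mean, sd)
    lower = 0.0 if lbound is None else normal_cdf(lbound, mean, sd)
    return upper - lower

print(round(normal_area(rbound=0), 4))                     # 0.5
print(round(normal_area(rbound=1, lbound=-1), 4))          # 0.6827
print(round(normal_area(rbound=110, mean=100, sd=10), 4))  # 0.8413
```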

datascience.util.proportions_from_distribution(table, label, sample_size, column_name='Random Sample')

Adds a column named `column_name` containing the proportions of a random draw from the distribution in the column `label`.

This method uses `np.random.multinomial` to draw `sample_size` samples from the distribution in `table.column(label)`, then divides by `sample_size` to create the resulting column of proportions.

Args:

`table`: An instance of `Table`.

`label`: Label of column in `table`. This column must contain a distribution (the values must sum to 1).

`sample_size`: The size of the sample to draw from the distribution.

`column_name`: The name of the new column that contains the sampled proportions. Defaults to `'Random Sample'`.

Returns:

A copy of `table` with a column `column_name` containing the sampled proportions. The proportions will sum to 1.

Throws:

`ValueError`: If the `label` is not in the table, or if `table.column(label)` does not sum to 1.
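A minimal sketch of the documented behavior, using a plain `dict` of column lists as a stand-in for `Table` (an assumption so the example runs without the `datascience` package). It validates `label`, draws `sample_size` outcomes with `random.choices` in place of `np.random.multinomial` (counting independent categorical draws also yields multinomial counts), and returns a copy with the new column. The function name, the `rng` parameter, and the `1e-9` tolerance on the sum are hypothetical.

```python
import random
from collections import Counter

def proportions_from_distribution_sketch(table, label, sample_size,
                                         column_name="Random Sample",
                                         rng=random):
    """Hypothetical stand-in: `table` is a dict mapping labels to lists."""
    if label not in table:
        raise ValueError(f"no column labeled {label!r}")
    distribution = table[label]
    if abs(sum(distribution) - 1) > 1e-9:
        raise ValueError(f"column {label!r} does not sum to 1")

    # Draw sample_size outcomes (row indices) and count each one.
    rows = range(len(distribution))
    counts = Counter(rng.choices(rows, weights=distribution, k=sample_size))

    # Build a copy so the caller's table is left unchanged.
    result = {name: list(values) for name, values in table.items()}
    result[column_name] = [counts[i] / sample_size for i in rows]
    return result

dice = {"Face": [1, 2, 3, 4, 5, 6], "Chance": [1 / 6] * 6}
sampled = proportions_from_distribution_sketch(dice, "Chance", 600,
                                               rng=random.Random(0))
# Six proportions that sum to 1 (up to floating-point rounding).
print(sampled["Random Sample"])
```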

datascience.util.sample_proportions(sample_size, probabilities)

Return the proportion of random draws for each outcome in a distribution.

This function is similar to `np.random.multinomial`, but returns proportions instead of counts.

Args:

`sample_size`: The size of the sample to draw from the distribution.

`probabilities`: An array of probabilities that forms a distribution.

Returns:

An array with the same length as `probabilities` that sums to 1.
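To make "proportions instead of counts" concrete, here is a pure-Python sketch of what `np.random.multinomial(sample_size, probabilities) / sample_size` computes. It is a hypothetical analogue (`sample_proportions_sketch`, with an assumed `rng` parameter) that samples by inverse CDF: the running totals of `probabilities` split [0, 1) into one interval per outcome, and each uniform draw is assigned to the interval it lands in. It returns a list rather than an array.

```python
import bisect
import itertools
import random

def sample_proportions_sketch(sample_size, probabilities, rng=random):
    """Hypothetical pure-Python analogue of
    np.random.multinomial(sample_size, probabilities) / sample_size."""
    # Running totals split [0, 1) into one interval per outcome,
    # with widths equal to the outcome probabilities.
    cumulative = list(itertools.accumulate(probabilities))
    counts = [0] * len(probabilities)
    for _ in range(sample_size):
        # A uniform draw lands in interval i with probability probabilities[i].
        i = bisect.bisect_right(cumulative, rng.random())
        # Guard against rounding that leaves the final total just below 1.
        counts[min(i, len(counts) - 1)] += 1
    # Dividing the counts by the sample size turns them into proportions.
    return [count / sample_size for count in counts]

props = sample_proportions_sketch(1000, [0.5, 0.3, 0.2], rng=random.Random(1))
print(props)  # three proportions that sum to 1, close to 0.5, 0.3 and 0.2
```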

datascience.util.table_apply(table, func, subset=None)

Applies a function to each column and returns a Table.

Args:

`table`: The table to apply your function to.

`func`: The function to apply to each column.

`subset`: A list of columns to apply the function to; if None, the function will be applied to all columns in `table`.

Returns:

A table with the given function applied. It either has the same shape as `table`, or has a single row (one value per column) when `func` reduces each column to a single value.
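The two possible result shapes can be illustrated with a dict-of-lists stand-in for `Table` (an assumption so the sketch runs on its own; `table_apply_sketch` is hypothetical). If `func` reduces every column to one value, the result has a single row; if it returns a sequence per column, each column keeps its values. This sketch returns only the columns it processed; how the library treats columns outside `subset` is not covered here.

```python
def table_apply_sketch(table, func, subset=None):
    """Hypothetical stand-in: `table` is a dict mapping labels to lists."""
    labels = list(table) if subset is None else list(subset)
    results = {label: func(table[label]) for label in labels}
    # Scalar results (e.g. from sum or max) become a single-row table.
    if all(not hasattr(value, "__len__") for value in results.values()):
        return {label: [value] for label, value in results.items()}
    # Otherwise keep the per-row values returned by func.
    return {label: list(value) for label, value in results.items()}

scores = {"Quiz": [7, 9, 8], "Exam": [81, 92, 77]}
print(table_apply_sketch(scores, max))
# {'Quiz': [9], 'Exam': [92]}
print(table_apply_sketch(scores, lambda col: [x * 2 for x in col], subset=["Quiz"]))
# {'Quiz': [14, 18, 16]}
```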