datascience.tables.Table.sample

Table.sample(k=None, with_replacement=True, weights=None)

Return a new table where k rows are randomly sampled from the original table.

Args:
k – (int) The number of rows to sample from the table.
Defaults to the number of rows in the table.
with_replacement – (bool) Defaults to True.
If True, samples k rows with replacement; if False, samples k rows without replacement.
weights – Array giving the probability that the ith row of the
table is sampled. Defaults to None, which samples each row with equal probability. weights must be a valid probability distribution, i.e. an array whose length equals the number of rows in the table and whose entries sum to 1.
Raises:
ValueError – if the length of weights does not equal the number of rows
in the table, or if weights does not sum to 1.
Returns:
A new instance of Table containing the k sampled rows.
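The effect of k and with_replacement can be sketched with the standard library alone. This is an illustration of the semantics described above, not the library's implementation; sample_row_indices is a hypothetical helper that returns row indices rather than a Table.

```python
import random


def sample_row_indices(num_rows, k=None, with_replacement=True, rng=None):
    """Hypothetical helper: choose row indices following the k and
    with_replacement semantics described above (equal weights only)."""
    rng = rng or random.Random()
    if k is None:
        k = num_rows  # default: draw as many rows as the table has
    if with_replacement:
        # Each draw is independent, so the same index may appear more than once.
        return [rng.randrange(num_rows) for _ in range(k)]
    # Without replacement: k distinct indices, which requires k <= num_rows.
    return rng.sample(range(num_rows), k)


rng = random.Random(0)  # seeded only so that runs are repeatable
with_repl = sample_row_indices(4, rng=rng)
without_repl = sample_row_indices(4, with_replacement=False, rng=rng)
```

In this sketch, leaving k at its default with with_replacement=False yields every index exactly once, in random order, so the result is a shuffle of the rows.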
>>> from datascience import Table, make_array
>>> jobs = Table().with_columns(
...     'job',  make_array('a', 'b', 'c', 'd'),
...     'wage', make_array(10, 20, 15, 8))
>>> jobs
job  | wage
a    | 10
b    | 20
c    | 15
d    | 8
>>> jobs.sample() # doctest: +SKIP
job  | wage
b    | 20
b    | 20
a    | 10
d    | 8
>>> jobs.sample(with_replacement=True) # doctest: +SKIP
job  | wage
d    | 8
b    | 20
c    | 15
a    | 10
>>> jobs.sample(k=2) # doctest: +SKIP
job  | wage
b    | 20
c    | 15
>>> ws = make_array(0.5, 0.5, 0, 0)
>>> jobs.sample(k=2, with_replacement=True, weights=ws) # doctest: +SKIP
job  | wage
a    | 10
a    | 10
>>> jobs.sample(k=2, weights=make_array(1, 0, 1, 0))
Traceback (most recent call last):
    ...
ValueError: probabilities do not sum to 1

>>> jobs.sample(k=2, weights=make_array(1, 0, 0)) # Weights must be length of table.
Traceback (most recent call last):
    ...
ValueError: a and p must have same size
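
Both errors above can be avoided by checking the weights before calling sample. The following is a minimal sketch using only the standard library; check_weights and normalize are hypothetical helpers, and the tolerance used when comparing the sum to 1 is an assumption, not the library's value.

```python
import math


def normalize(relative_weights):
    """Hypothetical helper: scale non-negative relative weights so they sum to 1."""
    total = sum(relative_weights)
    if total <= 0:
        raise ValueError("relative weights must have a positive sum")
    return [w / total for w in relative_weights]


def check_weights(weights, num_rows):
    """Hypothetical pre-check for the two ValueError conditions listed under Raises."""
    if len(weights) != num_rows:
        raise ValueError("weights must have one entry per row")
    # The tolerance is an assumption made for this sketch.
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-8):
        raise ValueError("weights must sum to 1")
    return list(weights)


# Relative weights such as (1, 0, 1, 0) are rejected as-is because they sum to 2;
# normalizing them first gives a valid distribution over the four rows.
ws = check_weights(normalize([1, 0, 1, 0]), num_rows=4)
```

The resulting list can then be converted with make_array and passed as weights.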