datascience.tables.Table.sample
- Table.sample(k=None, with_replacement=True, weights=None)
Return a new table where k rows are randomly sampled from the original table.
- Args:
k – (int) The number of rows to sample from the table. Defaults to the number of rows in the table.
with_replacement – (bool) True by default. If True, samples k rows with replacement from the table; otherwise samples k rows without replacement.
weights – Array specifying the probability that the ith row of the table is sampled. Defaults to None, which samples each row with equal probability. weights must be a valid probability distribution, i.e. an array whose length equals the number of rows and whose entries sum to 1.
- Raises:
- ValueError – if the length of weights is not equal to the number of rows in the table, or if weights does not sum to 1.
- Returns:
A new instance of Table with k rows sampled.
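The parameter rules above can be illustrated with a small standard-library sketch that picks row indices rather than building a table. It is not the library's implementation: `sample_row_indices` and its `rng` parameter are hypothetical names introduced here, and the error messages are made up for the example.

```python
import math
import random


def sample_row_indices(num_rows, k=None, with_replacement=True,
                       weights=None, rng=None):
    """Return k row indices chosen as the parameter descriptions suggest.

    Hypothetical helper for illustration; not part of the datascience API.
    """
    if rng is None:
        rng = random.Random()
    if k is None:
        k = num_rows  # default: as many rows as the table has

    if weights is not None:
        # weights must be a probability distribution over the rows.
        if len(weights) != num_rows:
            raise ValueError("weights must have one entry per row")
        if not math.isclose(sum(weights), 1.0, abs_tol=1e-8):
            raise ValueError("weights must sum to 1")

    indices = list(range(num_rows))
    if with_replacement:
        # Draws are independent, so the same row may appear more than once.
        return rng.choices(indices, weights=weights, k=k)

    if k > num_rows:
        raise ValueError("cannot draw more rows than exist without replacement")
    if weights is None:
        return rng.sample(indices, k)

    # Weighted draw without replacement: pick one index at a time and
    # remove it from the pool so it cannot be drawn again.
    pool, pool_weights, chosen = indices[:], list(weights), []
    for _ in range(k):
        pick = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        chosen.append(pool.pop(pick))
        pool_weights.pop(pick)
    return chosen
```

With `weights=[0.5, 0.5, 0, 0]`, only indices 0 and 1 can ever be returned, which matches the intent of giving a row probability zero.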
>>> jobs = Table().with_columns(
...     'job', make_array('a', 'b', 'c', 'd'),
...     'wage', make_array(10, 20, 15, 8))
>>> jobs
job | wage
a | 10
b | 20
c | 15
d | 8
>>> jobs.sample()
job | wage
b | 20
b | 20
a | 10
d | 8
>>> jobs.sample(with_replacement=True)
job | wage
d | 8
b | 20
c | 15
a | 10
>>> jobs.sample(k = 2)
job | wage
b | 20
c | 15
>>> ws = make_array(0.5, 0.5, 0, 0)
>>> jobs.sample(k=2, with_replacement=True, weights=ws)
job | wage
a | 10
a | 10
>>> jobs.sample(k=2, weights=make_array(1, 0, 1, 0))
Traceback (most recent call last):
    ...
ValueError: probabilities do not sum to 1
>>> jobs.sample(k=2, weights=make_array(1, 0, 0))
# Weights must be length of table.
Traceback (most recent call last):
    ...
ValueError: 'a' and 'p' must have same size
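The `make_array(1, 0, 1, 0)` call above fails because the weights are relative counts rather than probabilities. One way to avoid that error is to rescale raw weights before passing them in. The helper below is a minimal sketch with a hypothetical name (`normalize_weights`); it is not part of the datascience library.

```python
def normalize_weights(raw_weights):
    """Scale non-negative raw weights so that they sum to 1.

    Hypothetical helper for illustration; not part of the datascience API.
    """
    if any(w < 0 for w in raw_weights):
        raise ValueError("weights must be non-negative")
    total = sum(raw_weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw_weights]


probabilities = normalize_weights([1, 0, 1, 0])
print(probabilities)  # [0.5, 0.0, 0.5, 0.0]

# Assumed usage with the table from the examples above:
# jobs.sample(k=2, weights=make_array(*probabilities))
```

The rescaled list still needs one entry per row; normalizing does not fix a length mismatch such as the three-element array in the last example.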