datascience.tables.Table.sample_from_distribution

Table.sample_from_distribution(distribution, k, proportions=False)

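Return a new table with the same number of rows and one new column. The values in the distribution column define a multinomial distribution. The new column, labeled with the distribution column's label followed by "sample", holds the counts from a sample of size k drawn from that distribution, or the sample proportions if proportions is True. The original distribution column is kept unchanged.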

>>> sizes = Table(['size', 'count']).with_rows([
...     ['small', 50],
...     ['medium', 100],
...     ['big', 50],
... ])
>>> np.random.seed(99)
>>> sizes.sample_from_distribution('count', 1000)
size   | count | count sample
small  | 50    | 228
medium | 100   | 508
big    | 50    | 264
>>> sizes.sample_from_distribution('count', 1000, True)
size   | count | count sample
small  | 50    | 0.261
medium | 100   | 0.491
big    | 50    | 0.248
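
The sampling step described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name sample_from_weights and its seed parameter are hypothetical, and the standard-library random module stands in for whatever random source the method uses, so the numbers it draws will not match the output shown above.

```python
import random
from collections import Counter

def sample_from_weights(weights, k, proportions=False, seed=None):
    """Hypothetical helper: draw k items from the multinomial defined by weights.

    Returns one count (or proportion) per weight, in the same order.
    """
    if k <= 0:
        raise ValueError("k must be a positive sample size")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    rng = random.Random(seed)
    # Each draw picks a category index with probability weight / sum(weights);
    # random.choices normalizes the weights itself.
    draws = rng.choices(range(len(weights)), weights=weights, k=k)
    tally = Counter(draws)
    counts = [tally[i] for i in range(len(weights))]

    if proportions:
        return [c / k for c in counts]
    return counts

counts = sample_from_weights([50, 100, 50], 1000, seed=99)
props = sample_from_weights([50, 100, 50], 1000, proportions=True, seed=99)
```

In this sketch the counts always sum to k and the proportions sum to 1, which mirrors the two outputs in the example above (228 + 508 + 264 = 1000 and 0.261 + 0.491 + 0.248 = 1).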