## Resources

### Final Review!

• Comprehensive review of statistical concepts: steps, examples, purpose of topics like hypothesis testing, confidence intervals, correlation, regression, classification, two-sample inference, central limit theorem, Bayes' Rule, and more. Thanks to Francie McQuarrie for this!

### Lab Slides

Lab slides for Nanxi and Katherine's section (MW, 1-3pm)

Midterms:

Finals:

### Staff Solutions

Please note that you will need to be signed into your berkeley.edu email account as your default account to access the Google Drive folders.

### Discussion Video Walkthroughs

We've compiled a list of additional questions for our datasets here if you'd like more practice or want to do your own independent data investigation.

### Table Functions and Methods

In the examples in the left column, `np` refers to the NumPy module, as usual. Everything else is a function, a method, an example of an argument to a function or method, or an example of an object we might call the method on. For example, `tbl` refers to a table, `array` refers to an array, and `num` refers to a number. `array.item(0)` is an example call for the method `item`, and in that example, `array` is the name previously given to some array.
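As a small illustration of these naming conventions, the sketch below builds an array, gives it the name `array`, and calls the `item` method on it. The values are made up for illustration.

```python
import numpy as np

# `array` is simply the name we chose for this array of (made-up) values.
array = np.array([3, 1, 4])

# `item` is a method, so it is called on the array with a dot.
# Index 0 refers to the first element.
num = array.item(0)
print(num)
```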

| Example Function Call | Chapter | Description |
| --- | --- | --- |
| `Table()` | 5 | Creates an empty table, usually to extend with data. |
| `Table().read_table(filename)` | 5 | Creates a table from a data file. |
| `tbl.with_column(name, values)`, `tbl.with_columns(n1, v1, n2, v2, ...)` | 5 | A table with an additional or replaced column or columns. `name` is a string for the name of a column; `values` is an array. |
| `tbl.column(column_name_or_index)` | 5 | The values of a column (an array). |
| `tbl.num_rows` | 5 | The number of rows in a table. |
| `tbl.num_columns` | 5 | The number of columns in a table. |
| `tbl.labels` | 5 | A list of the column labels in a table. |
| `tbl.select(col1, col2, ...)` | 5 | Creates a copy of a table with only the selected columns. Each column is given by its name or index. |
| `tbl.drop(col1, col2, ...)` | 5 | Creates a copy of a table without the selected columns. Each column is given by its name or index. |
| `tbl.relabel(old_label, new_label)` | 5 | Modifies the existing table in place, changing the column heading in the first argument to the second. |
| `tbl.relabeled(old_label, new_label)` | 5 | Returns a new table with the column heading in the first argument changed to the second. |
| `tbl.sort(column_name_or_index)` | 5.1 | Creates a copy of a table sorted by the values in a column. Sorts in ascending order unless the optional argument `descending=True` is included. |
| `tbl.where(column, predicate)` | 5.2 | A table of the rows for which the column satisfies some predicate. See Table.where predicates below. |
| `tbl.take(row_indices)` | 5.2 | A table with only the rows at the given indices. `row_indices` is an array of indices. |
| `tbl.scatter(x_column, y_column)` | 6 | Draws a scatter plot consisting of one point for each row of the table. Note that `x_column` and `y_column` must be strings specifying column names. |
| `tbl.barh(categories)`, `tbl.barh(categories, values)` | 6.1 | Displays a horizontal bar chart with one bar for each category in a column, with length proportional to the corresponding frequency. The `values` argument is unnecessary if the table has only a column of categories and a column of values. |
| `tbl.hist(column, units, bins)` | 6.2 | Generates a histogram of the numerical values in a column. `units` and `bins` are optional arguments, used to label the axes and group the values into intervals (bins), respectively. Bins have the form [a, b). |
| `tbl.apply(function, column)` | 7.1 | Returns an array of values resulting from applying a function to each item in a column. |
| `tbl.group(column_or_columns, func)` | 7.2, 7.3 | Groups rows by unique values, or combinations of values, in one or more columns. Multiple columns must be given as an array or list. Other values are aggregated by count (default) or by the optional argument `func`. |
| `tbl.pivot(col1, col2, vals, collect)`, `tbl.pivot(col1, col2)` | 7.3 | A pivot table where each unique value in `col1` has its own column and each unique value in `col2` has its own row. The cells count rows, or aggregate the values from a third column `vals` using the function `collect`. If `vals` and `collect` are omitted, the cells contain counts. |
| `tblA.join(colA, tblB, colB)`, `tblA.join(colA, tblB)` | 7.4 | Generates a table with the columns of `tblA` and `tblB`, containing rows for all values of a column that appear in both tables. Default `colB` is `colA`. `colA` and `colB` must be strings specifying column names. |
| `tbl.sample(n)`, `tbl.sample(n, with_replacement)` | 9 | A new table where `n` rows are randomly sampled from the original table. The default is sampling with replacement. For sampling without replacement, use the argument `with_replacement=False`. For a non-uniform sample, provide a third argument `weights=distribution`, where `distribution` is an array or list containing the probability of each row. |
| `proportions_from_distribution(tbl, prop_col, n)` | 10.1 | Returns a copy of `tbl` with an additional column `Random Sample` containing the proportions of an `n`-sized random sample, drawn using the proportions in `prop_col`. |
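To make the grouping behavior described for `tbl.group` more concrete, here is a minimal plain-Python sketch of the idea: rows are grouped by the unique values in one column, then either counted (the default) or aggregated with a function. It does not use the `datascience` library. The helper `group_sketch`, the list-of-dictionaries row format, and the flavor and price data are all hypothetical and made up for illustration, and the real method returns a table rather than a dictionary.

```python
from collections import defaultdict


def group_sketch(rows, key, value=None, func=None):
    """Group rows (dicts) by rows[key].

    With no func, return the number of rows in each group.
    With func, apply it to the list of `value` entries in each group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = {}
    for group_value in sorted(groups):
        members = groups[group_value]
        if func is None:
            result[group_value] = len(members)
        else:
            result[group_value] = func([m[value] for m in members])
    return result


# Made-up example rows, one dictionary per row of a table.
rows = [
    {"flavor": "chocolate", "price": 4.75},
    {"flavor": "strawberry", "price": 3.55},
    {"flavor": "chocolate", "price": 5.25},
    {"flavor": "strawberry", "price": 5.25},
    {"flavor": "vanilla", "price": 4.00},
]

counts = group_sketch(rows, "flavor")                   # default: count rows per group
max_price = group_sketch(rows, "flavor", "price", max)  # aggregate with a function
print(counts)
print(max_price)
```

Passing a different function, such as `min` or `sum`, changes only how the values within each group are combined; the groups themselves stay the same.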

### Array Functions and Methods

| Example Function Call | Chapter | Description |
| --- | --- | --- |
| `max(array)` | 3.3 | Returns the maximum value of an array. |
| `min(array)` | 3.3 | Returns the minimum value of an array. |
| `sum(array)` | 3.3 | Returns the sum of the values in an array. |
| `abs(num)`, `np.abs(array)` | 3.3 | Returns the absolute value of a number, or of each number in an array. |
| `round(num)`, `np.round(array)` | 3.3 | Rounds a number, or each number in an array, to the nearest integer. |
| `len(array)` | 3.3 | Returns the length (number of elements) of an array. |
| `make_array(val1, val2, ...)` | 4.4 | Makes a NumPy array with the values passed in. Values must be the same data type. |
| `np.average(array)`, `np.mean(array)` | 4.4 | Returns the average of the values in an array. |
| `np.diff(array)` | 4.4 | Returns a new array of size `len(array)-1` with elements equal to the differences between adjacent elements: val_2 - val_1, val_3 - val_2, etc. |
| `np.sqrt(array)` | 4.4 | Returns an array with the square root of each element. |
| `np.arange(start, stop, step)`, `np.arange(start, stop)`, `np.arange(stop)` | 4.5 | An array of numbers starting with `start`, going up in increments of `step`, and going up to but excluding `stop`. When `start` and/or `step` are left out, default values are used in their place. Default step is 1; default start is 0. |
| `array.item(index)` | 4.6 | Returns the item at position `index` in an array (remember that Python indices start at 0!). |
| `np.random.choice(array, n)`, `np.random.choice(array)` | 8 | An array of `n` items selected at random with replacement from an array. If `n` is not specified, a single item is returned. |
| `np.count_nonzero(array)` | 8 | Counts the number of non-zero (or `True`) elements in an array. |
| `np.append(array, item)` | 8.2 | Returns a copy of the input array with `item` (must be the same type as the other entries in the array) appended to the end. |
| `percentile(percentile, array)` | 11.1 | Returns the item at the corresponding percentile of an array. |
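A few of the NumPy functions above are easier to remember after seeing them run once. The short sketch below uses only NumPy, and the input values are made up for illustration.

```python
import numpy as np

# np.arange: start at 0, step by 2, stop before 10.
evens = np.arange(0, 10, 2)      # array([0, 2, 4, 6, 8])

# With one argument, start defaults to 0 and step defaults to 1.
first_five = np.arange(5)        # array([0, 1, 2, 3, 4])

squares = np.array([1, 4, 9, 16])

# np.diff: differences between adjacent elements, one element shorter.
gaps = np.diff(squares)          # array([3, 5, 7])

# A comparison yields an array of True/False values;
# np.count_nonzero counts the True entries.
num_large = np.count_nonzero(squares > 5)   # 2, since 9 and 16 exceed 5

# np.append returns a new array; `squares` itself is unchanged.
more_squares = np.append(squares, 25)
print(evens, first_five, gaps, num_large, more_squares)
```

Counting the `True` values in a comparison this way is a common pattern for finding how many elements of an array satisfy a condition.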