Resources

Final Review!

A comprehensive review of the course's statistical concepts, covering the steps, examples, and purpose of topics such as hypothesis testing, confidence intervals, correlation, regression, classification, two-sample inference, the central limit theorem, Bayes' Rule, and more. Thanks to Francie McQuarrie for this!

Tutoring Worksheets

Worksheets from this semester's tutoring sections are available for review. To use them, download the zip file you want to your computer and double-click it to expand it into a folder. Then go to JupyterHub, click the "Upload" button in the top right-hand corner, and select the files you want. (Please note that you must upload the files from inside the unzipped folder, not the zip file itself.) Once the files are uploaded to JupyterHub, open the notebook, which is the .ipynb file, and get to work! For technical help with uploading files, email Emma or come to office hours.

Lab Slides

Lab slides for Nanxi and Katherine's section (MW, 1-3pm)

Practice Exams

Midterms:

Summer 2017 midterm and solution
Spring 2017 midterm, solution, and video walkthrough.
Spring 2017 practice midterm, solution, and video walkthrough. The practice midterm is the Spring 2016 midterm, but modified to only include topics covered in Summer 2017.
Fall 2016 midterm and video walkthrough. This exam only includes questions on topics covered in Summer 2017.

Finals:

Spring 2017 final and solution
Spring 2017 practice final, solution, and video walkthrough. The practice final is the Spring 2016 final, but with modified solutions that use the Summer 2017 version of the datascience module.
Fall 2016 final and video walkthrough. All topics were covered in Summer 2017.
Another practice final and solutions

Exam Study Guides

The midterm study guide will be distributed with the midterm exam and the final exam.
The final study guide will be distributed with the final exam.

Staff Solutions

Please note that you will need to be signed into your berkeley.edu email account as your default account to access the Google Drive folders.

Discussion Video Walkthroughs

Additional Dataset Questions

We've compiled a list of additional questions for our datasets here if you'd like more practice or want to do your own independent data investigation.

Table Functions and Methods

In the examples in the left column, np refers to the NumPy module, as usual. Everything else is a function, a method, an example of an argument to a function or method, or an example of an object we might call the method on. For example, tbl refers to a table, array refers to an array, and num refers to a number. array.item(0) is an example call for the method item, and in that example, array is the name previously given to some array.

Example Function Call	Chapter	Description
`Table()`	5	Creates an empty table, usually to extend with data.
`Table.read_table(filename)`	5	Creates a table from a data file.
`tbl.with_column(name, values)` `tbl.with_columns(n1, v1, n2, v2, ...)`	5	A table with an additional or replaced column or columns. `name` is a string for the name of a column, and `values` is an array.
`tbl.column(column_name_or_index)`	5	The values of a column (an array).
`tbl.num_rows`	5	The number of rows in a table.
`tbl.num_columns`	5	The number of columns in a table.
`tbl.labels`	5	A list of the column labels in a table.
`tbl.select(col1, col2, ...)`	5	Creates a copy of a table with only selected columns. Each column is the column name or index.
`tbl.drop(col1, col2, ...)`	5	Creates a copy of a table without selected columns. Each column is the column name or index.
`tbl.relabel(old_label, new_label)`	5	Modifies the existing table in place, changing the column heading in the first argument to the second.
`tbl.relabeled(old_label, new_label)`	5	Returns a new table with the column heading in the first argument changed to the second.
`tbl.sort(column_name_or_index)`	5.1	Creates a copy of a table sorted by the values in a column. Defaults to ascending order unless the optional argument `descending=True` is included.
`tbl.where(column, predicate)`	5.2	A table of the rows for which the column satisfies some predicate. See `Table.where predicates` below.
`tbl.take(row_indices)`	5.2	A table with only the rows at the given indices. `row_indices` is an array of indices.
`tbl.scatter(x_column, y_column)`	6	Draws a scatter plot consisting of one point for each row of the table. Note that `x_column` and `y_column` must be strings specifying column names.
`tbl.barh(categories)` `tbl.barh(categories, values)`	6.1	Displays a horizontal bar chart with one bar for each category in a column, with length proportional to the corresponding frequency. The `values` argument is unnecessary if the table has only a column of categories and a column of values.
`tbl.hist(column, units, bins)`	6.2	Generates a histogram of the numerical values in a column. `units` and `bins` are optional arguments, used to label the axes and group the values into intervals (bins), respectively. Bins have the form [a, b).
`tbl.apply(function, column)`	7.1	Returns an array of values resulting from applying a function to each item in a column.
`tbl.group(column_or_columns, func)`	7.2, 7.3	Groups rows by the unique values in a column, or by the unique combinations of values in several columns. Multiple columns must be passed as an array or list. Other values are aggregated by count (default) or by the optional argument `func`.
`tbl.pivot(col1, col2, vals, collect)` `tbl.pivot(col1, col2)`	7.3	A pivot table where each unique value in `col1` has its own column and each unique value in `col2` has its own row. Cells contain counts by default. To aggregate values from a third column instead, pass that column as `vals` and an aggregating function as `collect`.
`tblA.join(colA, tblB, colB)` `tblA.join(colA, tblB)`	7.4	Generates a table with the columns of `tblA` and `tblB`, containing rows for all values of a column that appear in both tables. Default `colB` is `colA`. `colA` and `colB` must be strings specifying column names.
`tbl.sample(n)` `tbl.sample(n, with_replacement)`	9	A new table where `n` rows are randomly sampled from the original table. Default is with replacement. For sampling without replacement, use argument `with_replacement=False`. For a non-uniform sample, provide a third argument `weights=distribution` where `distribution` is an array or list containing the probability of each row.
`proportions_from_distribution(tbl, prop_col, n)`	10.1	Returns a copy of `tbl` with an additional column `Random Sample` containing the proportions of a random sample of size `n`, drawn using the proportions in `prop_col`.

Array Functions and Methods

Example Function Call	Chapter	Description
`max(array)`	3.3	Returns the maximum value of an array.
`min(array)`	3.3	Returns the minimum value of an array.
`sum(array)`	3.3	Returns the sum of the values in an array.
`abs(num)`, `np.abs(array)`	3.3	Takes the absolute value of a number or of each number in an array.
`round(num)`, `np.round(array)`	3.3	Rounds a number or each number in an array to the nearest integer.
`len(array)`	3.3	Returns the length (number of elements) of an array.
`make_array(val1, val2, ...)`	4.4	Makes a NumPy array with the values passed in. Values must be the same data type.
`np.average(array)`, `np.mean(array)`	4.4	Returns the average of the values in an array.
`np.diff(array)`	4.4	Returns a new array of size `len(array)-1` with elements equal to the difference between adjacent elements; val_2 - val_1, val_3 - val_2, etc.
`np.sqrt(array)`	4.4	Returns an array with the square root of each element.
`np.arange(start, stop, step)` `np.arange(start, stop)` `np.arange(stop)`	4.5	An array of numbers starting with `start`, going up in increments of `step`, and going up to but excluding `stop`. When `start` and/or `step` are left out, default values are used in their place. Default step is 1; default start is 0.
`array.item(index)`	4.6	Returns the item at position `index` in an array (remember, Python indices start at 0!).
`np.random.choice(array, n)` `np.random.choice(array)`	8	An array of items selected at random with replacement from an array. Default number of items is 1 if `n` is not specified.
`np.count_nonzero(array)`	8	Counts the number of non-zero (or `True`) elements in an array.
`np.append(array, item)`	8.2	Returns a copy of the input array with `item` (must be the same type as the other entries in the array) appended to the end.
`percentile(percentile, array)`	11.1	Returns the item at the corresponding percentile of an array.
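
For a quick feel for the array functions above, the following sketch runs several of them on a small made-up array. It uses only NumPy; `np.array` stands in here for `make_array`, which builds the same kind of array in the `datascience` module, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical values, invented for illustration.
values = np.array([3, 8, 15, 24])

steps = np.arange(0, 10, 2)              # 0, 2, 4, 6, 8 (stop is excluded)
gaps = np.diff(values)                   # 8-3, 15-8, 24-15 -> 5, 7, 9
first = values.item(0)                   # index 0 is the first item -> 3
average = np.mean(values)                # (3 + 8 + 15 + 24) / 4 -> 12.5
num_big = np.count_nonzero(values > 10)  # counts the True entries -> 2
extended = np.append(values, 35)         # a copy with 35 added at the end

print(steps, gaps, first, average, num_big, extended)
print(len(values), max(values), min(values), sum(values))
```

Note that `np.append` returns a new array, so `values` still has four elements afterwards.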