datascience module.
Please note that you will need to be signed in to your berkeley.edu email account to access the Google Drive folders.
We've compiled a list of additional questions for our datasets here if you'd like more practice or want to do your own independent data investigation.
In the examples in the Example Function Call column, np refers to the NumPy module, as usual. Everything else is a function, a method, an example of an argument to a function or method, or an example of an object we might call the method on. For example, tbl refers to a table, array refers to an array, and num refers to a number. array.item(0) is an example call for the method item, and in that example, array is the name previously given to some array.
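As a small illustration of these naming conventions, the sketch below uses plain NumPy. The names array and num follow the convention above, but the specific values are made up. It contrasts a method call, where item is called on the array, with function calls, where the array or number is passed in as an argument.

```python
import numpy as np

# "array" stands for some previously created array (values are illustrative).
array = np.array([10, 20, 30])
# "num" stands for some number (also illustrative).
num = -4.5

# Method call: item is called *on* the array, with 0 as its argument.
first = array.item(0)

# Function calls: the array or number is passed *to* the function.
length = len(array)
magnitude = abs(num)

print(first, length, magnitude)  # 10 3 4.5
```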
Example Function Call  Chapter  Description 

Table() 
5  Creates an empty table, usually to extend with data. 
Table().read_table(filename) 
5  Creates a table from a data file. 
tbl.with_column(name, values), tbl.with_columns(n1, v1, n2, v2, ...)
5  A table with an additional or replaced column or columns. name is a string for the name of a column; values is an array.
tbl.column(column_name_or_index)
5  The values of a column (an array).
tbl.num_rows 
5  The number of rows in a table. 
tbl.num_columns 
5  The number of columns in a table. 
tbl.labels
5  A tuple of the column labels in a table.
tbl.select(col1, col2, ...)
5  Creates a copy of a table with only the selected columns. Each argument is a column label or index.
tbl.drop(col1, col2, ...)
5  Creates a copy of a table without the selected columns. Each argument is a column label or index.
tbl.relabel(old_label, new_label) 
5  Modifies the existing table in place, changing the column heading in the first argument to the second. 
tbl.relabeled(old_label, new_label) 
5  Returns a new table with the column heading in the first argument changed to the second. 
tbl.sort(column_name_or_index)
5.1  Creates a copy of a table sorted by the values in a column. Sorts in ascending order unless the optional argument descending=True is included.
tbl.where(column, predicate) 
5.2  A table of the rows for which the column satisfies some predicate. See Table.where predicates below. 
tbl.take(row_indices) 
5.2  A table with only the rows at the given indices. row_indices is an array of indices. 
tbl.scatter(x_column, y_column) 
6  Draws a scatter plot consisting of one point for each row of the table. Note that x_column and y_column must be strings specifying column names. 
tbl.barh(categories), tbl.barh(categories, values)
6.1  Displays a horizontal bar chart with a bar for each category in a column, with length proportional to the corresponding frequency. The values argument is unnecessary if the table has only a column of categories and a column of values.
tbl.hist(column, units, bins)
6.2  Generates a histogram of the numerical values in a column. units and bins are optional arguments, used to label the axes and to group the values into intervals (bins), respectively. Bins have the form [a, b).
tbl.apply(function, column) 
7.1  Returns an array of values resulting from applying a function to each item in a column. 
tbl.group(column_or_columns, func)
7.2, 7.3  Groups rows by the unique values, or unique combinations of values, in one or more columns. Multiple columns must be entered in array or list form. By default, the rows in each group are counted; if the optional argument func is given, it is used to aggregate the values in the other columns.
tbl.pivot(col1, col2, vals, collect), tbl.pivot(col1, col2)
7.3  A pivot table where each unique value in col1 has its own column and each unique value in col2 has its own row. Each cell either counts the rows with that combination of values or aggregates the values from a third column, vals, using the function collect. If vals and collect are omitted, the cells contain counts.
tblA.join(colA, tblB, colB), tblA.join(colA, tblB)
7.4  Generates a table with the columns of tblA and tblB, containing rows for the values of colA in tblA that also appear in colB in tblB. By default, colB is colA. colA and colB must be strings specifying column names.
tbl.sample(n), tbl.sample(n, with_replacement)
9  A new table where n rows are randomly sampled from the original table. Sampling is with replacement by default. For sampling without replacement, use the argument with_replacement=False. For a nonuniform sample, provide a third argument weights=distribution, where distribution is an array or list containing the probability of each row.
proportions_from_distribution(tbl, prop_col, n)
10.1  Returns a copy of tbl with an additional column, Random Sample, containing the proportions in a random sample of size n drawn using the proportions in prop_col.
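The tbl.group entry is perhaps the least obvious one above, because it behaves differently with and without func. The sketch below mimics those two modes in plain Python, using a list of dictionaries as a stand-in for a table. It is not the datascience implementation; the helper name group_rows and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a table: one dict per row.
rows = [
    {"Flavor": "chocolate", "Price": 5},
    {"Flavor": "strawberry", "Price": 7},
    {"Flavor": "chocolate", "Price": 6},
    {"Flavor": "strawberry", "Price": 4},
    {"Flavor": "bubblegum", "Price": 3},
]


def group_rows(rows, column, func=None):
    """Hypothetical plain-Python sketch of the idea behind tbl.group(column, func).

    Without func, map each unique value of `column` to its row count.
    With func, map it to {other_column: func(values_in_that_group)}.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[column]].append(row)

    result = {}
    for key in sorted(buckets):  # list groups in sorted order
        members = buckets[key]
        if func is None:
            result[key] = len(members)
        else:
            other_columns = [c for c in members[0] if c != column]
            result[key] = {c: func([m[c] for m in members]) for c in other_columns}
    return result


print(group_rows(rows, "Flavor"))
# {'bubblegum': 1, 'chocolate': 2, 'strawberry': 2}
print(group_rows(rows, "Flavor", max))
# {'bubblegum': {'Price': 3}, 'chocolate': {'Price': 6}, 'strawberry': {'Price': 7}}
```

With an actual datascience table holding these rows, the analogous calls would be tbl.group("Flavor") and tbl.group("Flavor", max), which return tables rather than dictionaries.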
Example Function Call  Chapter  Description 

max(array) 
3.3  Returns the maximum value of an array. 
min(array) 
3.3  Returns the minimum value of an array. 
sum(array) 
3.3  Returns the sum of the values in an array. 
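abs(num), np.abs(array)
3.3  Takes the absolute value of a number or of each number in an array.
round(num), np.round(array)
3.3  Rounds a number, or each number in an array, to the nearest integer. Ties (values exactly halfway between two integers) round to the nearest even integer.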
len(array) 
3.3  Returns the length (number of elements) of an array. 
make_array(val1, val2, ...)
4.4  Makes a NumPy array with the values passed in. Values must be the same data type.
np.average(array), np.mean(array) 
4.4  Returns the average of the values in an array. 
np.diff(array)
4.4  Returns a new array of size len(array) - 1 with elements equal to the difference between adjacent elements: val_2 - val_1, val_3 - val_2, etc.
np.sqrt(array)
4.4  Returns an array with the square root of each element.
np.arange(start, stop, step), np.arange(start, stop), np.arange(stop)
4.5  An array of numbers starting with start, going up in increments of step, and going up to but excluding stop. When start and/or step are left out, default values are used in their place: the default step is 1 and the default start is 0.
array.item(index)
4.6  Returns the item at position index in an array (remember that Python indices start at 0!).
np.random.choice(array, n), np.random.choice(array)
8  An array of n items selected at random with replacement from an array. If n is not specified, a single item is returned instead of an array.
np.count_nonzero(array)
8  Counts the number of nonzero (or True) elements in an array.
np.append(array, item) 
8.2  Returns a copy of the input array with item (must be the same type as the other entries in the array) appended to the end. 
percentile(percentile, array) 
11.1  Returns the item at the corresponding percentile of an array. 
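Several NumPy entries above have details that are easy to get wrong. The short sketch below, with made-up values, checks a few of them: np.diff returns one fewer element than its input, np.arange excludes stop, np.append leaves the original array unchanged, np.random.choice returns a single item when n is omitted, and ties in round and np.round go to the nearest even integer. make_array and percentile come from the datascience module and are not used here.

```python
import numpy as np

values = np.array([3, 7, 12, 20])  # illustrative values

# np.diff: len(values) - 1 differences between adjacent elements.
gaps = np.diff(values)                   # [4, 5, 8]

# np.arange: start defaults to 0, step to 1, and stop is excluded.
evens = np.arange(0, 10, 2)              # [0, 2, 4, 6, 8]
first_five = np.arange(5)                # [0, 1, 2, 3, 4]

# np.count_nonzero on a comparison counts the True entries.
num_big = np.count_nonzero(values > 5)   # 3

# np.append returns a new array; values itself is unchanged.
longer = np.append(values, 25)           # [3, 7, 12, 20, 25]

# np.random.choice: one item without n, an array of n items with n.
one_draw = np.random.choice(values)
five_draws = np.random.choice(values, 5)

# Ties round to the nearest even integer in both round and np.round.
ties = [round(0.5), round(1.5), round(2.5)]     # [0, 2, 2]
np_ties = np.round(np.array([0.5, 1.5, 2.5]))   # [0., 2., 2.]
```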