Python Reference

Created by Nishant Kheterpal and Jessica Hu

Modified by Yanay Rosen

Table Functions and Methods

In the examples in the left column, `np` refers to the NumPy module, as usual. Everything else is a function, a method, an example of an argument to a function or method, or an example of an object we might call the method on. For example, `tbl` refers to a table, `array` refers to an array, and `num` refers to a number. `array.item(0)` is an example call for the method `item`, and in that example, `array` is the name previously given to some array.

| Name | Chapter | Description | Input | Output |
| --- | --- | --- | --- | --- |
| `Table()` | 6 | Create an empty table, usually to extend with data | None | An empty Table |
| `Table().read_table(filename)` | 6 | Create a table from a data file | string: the name of the file | Table with the contents of the data file |
| `tbl.with_columns(name, values)`, `tbl.with_columns(n1, v1, n2, v2, ...)` | 6 | A table with an additional or replaced column or columns. `name` is a string for the name of a column; `values` is an array | 1. string: the name of the new column; 2. array: the values in that column | Table: a copy of the original Table with the new columns added |
| `tbl.column(column_name_or_index)` | 6 | The values of a column (an array) | string or int: the column name or index | array: the values in that column |
| `tbl.row(i)` | 17.3 | Returns a row object from the table | int: the index of the row to return | row: a row object |
| `tbl.num_rows` | 6 | Compute the number of rows in a table | None | int: the number of rows in the table |
| `tbl.num_columns` | 6 | Compute the number of columns in a table | None | int: the number of columns in the table |
| `tbl.labels` | 6 | Lists the column labels in a table | None | tuple: the names of each column (as strings) in the table |
| `tbl.select(col1, col2, ...)` | 6 | Create a copy of a table with only some of the columns. Each column is the column name or index. | string or int: column name(s) or index(es) | Table with the selected columns |
| `tbl.drop(col1, col2, ...)` | 6 | Create a copy of a table without some of the columns. Each column is the column name or index. | string or int: column name(s) or index(es) | Table without the selected columns |
| `tbl.relabeled(old_label, new_label)` | 6 | Create a copy of a table, changing the column heading in the first argument to the second. The original table is not modified. | 1. string: the old column name; 2. string: the new column name | Table: a copy of the original with the changed label |
| `tbl.show(n)` | 6.1 | Display `n` rows of a table. If no argument is specified, defaults to displaying the entire table. | (Optional) int: number of rows you want to display | None: displays a table with `n` rows |
| `tbl.sort(column_name_or_index)` | 6.1 | Create a copy of a table sorted by the values in a column. Defaults to ascending order unless `descending=True` is included. | 1. string or int: column index or name; 2. (Optional) `descending=True` | Table: a copy of the original with the rows sorted by that column |
| `tbl.where(column, predicate)` | 6.2 | Create a copy of a table with only the rows that match some predicate. See `Table.where` predicates below. | 1. string or int: column name or index; 2. `are.(...)` predicate or array of boolean values corresponding to rows to include | Table: a copy of the original table with only the rows that match the predicate |
| `tbl.take(row_indices)` | 6.2 | A table with only the rows at the given indices. `row_indices` is either an array of indices or an integer corresponding to one index. | array of ints: the indices of the rows to be included in the Table OR int: the index of the row to be included | Table: a copy of the original table with only the rows at the given indices |
| `tbl.scatter(x_column, y_column, group=)` | 7 | Draws a scatter plot consisting of one point for each row of the table. Note that `x_column` and `y_column` must be strings specifying column names. | 1. string: name of the column on the x-axis; 2. string: name of the column on the y-axis; 3. (Optional) string: name of categorical column to color points by | None: draws a scatter plot |
| `tbl.plot(x_column, y_column)` | 7 | Draw a line graph consisting of one point for each row of the table. If only the x-axis column is specified, draws a different colored line for each other numerical column. | 1. string: name of the column on the x-axis; 2. (Optional) string: name of the column on the y-axis | None: draws a line graph |
| `tbl.barh(categories)`, `tbl.barh(categories, values)` | 7.1 | Displays a horizontal bar chart with a bar for each category in a column, with length proportional to the corresponding frequency. If the `values` argument is not specified, bars will be drawn with different colors for each other column in the table. | 1. string: name of the column with categories; 2. (Optional) string: the name of the column with values for corresponding categories | None: draws a bar chart |
| `tbl.hist(column, unit=, bins=, group=)` | 7.2 | Generates a histogram of the numerical values in a column. `unit` and `bins` are optional arguments, used to label the axes and group the values into intervals (bins), respectively. Bins have the form `[a, b)`, where `a` is included in the bin and `b` is not. | 1. string: name of the column with numerical values; 2. (Optional) string: units of x-axis; 3. (Optional) array of ints/floats denoting bin boundaries; 4. (Optional) string: name of categorical column to draw separate overlaid histograms for | None: draws a histogram |
| `tbl.bin(column, bins=)` | 7.2 | Generates a binned table for the numerical values in a column. `bins` is an optional argument, used to group the values into intervals (bins). Bins have the form `[a, b)`, where `a` is included in the bin and `b` is not. | 1. string: name of the column with numerical values; 2. (Optional) array of ints/floats denoting bin boundaries | Table: a table with the columns `bin`, containing the left ends of each bin, and `column count`, containing the counts of rows that had `column` values fall into the corresponding bin |
| `tbl.apply(function, col1, col2, ...)` | 8.1 | Returns an array of values resulting from applying a function to each item in a column. | 1. function: function to apply to column; 2. (Optional) string: name of the column to apply the function to. With multiple columns, each column's values are passed as the corresponding argument to the function. With no column argument, the function is applied to every row in `tbl` | array: contains an element for each value in the original column after applying the function to it |
| `tbl.group(column_or_columns, func)` | 8.2 | Group rows by unique values or combinations of values in a column(s). Multiple columns must be entered in array or list form. Other values are aggregated by count (default) or by the optional argument `func`. | 1. string or array of strings: column(s) on which to group; 2. (Optional) function: function to aggregate values in cells (defaults to count) | Table: a new Table |
| `tbl.pivot(col1, col2, values, collect)`, `tbl.pivot(col1, col2)` | 8.3 | A pivot table where each unique value in `col1` has its own column and each unique value in `col2` has its own row. Cells count the matching rows, or aggregate the values from a third column (`values`) using the function `collect`. By default, cells contain counts. | 1. string: name of column whose unique values will make up columns of pivot table; 2. string: name of column whose unique values will make up rows of pivot table; 3. (Optional) string: name of column that describes the values of cell; 4. (Optional) function: how the values are collected, e.g. `sum` or `np.mean` | Table: a new Table |
| `tblA.join(colA, tblB, colB)`, `tblA.join(colA, tblB)` | 8.4 | Generate a table with the columns of tblA and tblB, containing rows for all values of a column that appear in both tables. Default `colB` is `colA`. `colA` and `colB` must be strings specifying column names. | 1. string: name of column in tblA with values to join on; 2. Table: other Table; 3. (Optional) string: if column names are different between Tables, the name of the shared column in tblB | Table: a new Table |
| `tbl.sample(n)`, `tbl.sample(n, with_replacement)` | 10 | A new table where `n` rows are randomly sampled from the original table; by default, `n=tbl.num_rows`. Default is with replacement. For sampling without replacement, use argument `with_replacement=False`. For a non-uniform sample, provide a third argument `weights=distribution` where `distribution` is an array or list containing the probability of each row. | 1. int: sample size; 2. (Optional) `with_replacement=True` | Table: a new Table with `n` rows |

String Methods

| Name | Chapter | Description |
| --- | --- | --- |
| `str.split(separator)` | N/A | Splits the string (`str`) into a list based on the `separator` that is passed in |
| `str.join(array)` | N/A | Combines each element of `array` into one string, with `str` being in-between each element |
| `str.replace(old_string, new_string)` | 4.2.1 | Replaces each occurrence of `old_string` in `str` with the value of `new_string` |
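
A quick illustration of the three string methods, using a made-up sentence:

```python
sentence = "data science is fun"

# split breaks the string into a list at each separator.
words = sentence.split(" ")          # ['data', 'science', 'is', 'fun']

# join glues the pieces back together, placing the string between elements.
hyphenated = "-".join(words)         # 'data-science-is-fun'

# replace swaps every occurrence, not just the first one.
no_spaces = sentence.replace(" ", "")  # 'datascienceisfun'
```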

Array Functions and Methods

| Name | Chapter | Description |
| --- | --- | --- |
| `max(array)` | 3.3 | Returns the maximum value of an array |
| `min(array)` | 3.3 | Returns the minimum value of an array |
| `sum(array)` | 3.3 | Returns the sum of the values in an array |
| `abs(num), np.abs(array)` | 3.3 | Take the absolute value of a number or of each number in an array. |
| `round(num), np.round(array)` | 3.3 | Round a number or an array of numbers to the nearest integer. |
| `len(array)` | 3.3 | Returns the length (number of elements) of an array |
| `make_array(val1, val2, ...)` | 5 | Makes a NumPy array with the values passed in |
| `np.average(array)`, `np.mean(array)` | 5.1 | Returns the mean value of an array |
| `np.median(array)` | 10.3 | Returns the median ("middle") value of an array |
| `np.std(array)` | 14.2 | Returns the standard deviation of an array |
| `np.diff(array)` | 5.1 | Returns a new array of size `len(array)-1` with elements equal to the differences between adjacent elements: val_2 - val_1, val_3 - val_2, etc. |
| `np.cumsum(array)` | 5.1 | Returns a new array of size `len(array)` with elements equal to the cumulative sums of the elements: val_1, val_1 + val_2, val_1 + val_2 + val_3, etc. |
| `np.sqrt(array)` | 5.1 | Returns an array with the square root of each element |
| `np.arange(start, stop, step)`, `np.arange(start, stop)`, `np.arange(stop)` | 5.2 | An array of numbers starting with `start`, going up in increments of `step`, and going up to but excluding `stop`. When `start` and/or `step` are left out, default values are used in their place. Default step is 1; default start is 0. |
| `array.item(index)` | 5.3 | Returns the item at position `index` in an array (remember Python indices start at 0!) |
| `np.random.choice(array, n)`, `np.random.choice(array)`, `np.random.choice(array, n, replace)` | 9 | Picks one (by default) or some number `n` of items from an array at random. Default is with replacement. For sampling without replacement, use argument `replace=False`. |
| `np.count_nonzero(array)` | 9 | Returns the number of non-zero (or `True`) elements in an array. |
| `np.append(array, item)` | 9.2 | Returns a copy of the input array with `item` (must be the same type as the other entries in the array) appended to the end. |
| `percentile(percentile, array)` | 13.1 | Returns the `percentile`-th percentile of an array. |
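
The NumPy functions above can be tried on a small array. The values below are made up, and the array is built with `np.array` so the example needs only NumPy; `make_array(3, 7, 12, 20)` would give an equivalent array.

```python
import numpy as np

values = np.array([3, 7, 12, 20])

gaps = np.diff(values)             # array([4, 5, 8]): one element shorter
running = np.cumsum(values)        # array([ 3, 10, 22, 42]): same length
evens = np.arange(0, 10, 2)        # array([0, 2, 4, 6, 8]): stop is excluded
first_five = np.arange(5)          # start defaults to 0, step to 1

# A comparison yields an array of booleans; count_nonzero counts the Trues.
big_count = np.count_nonzero(values > 5)   # 3

# append returns a new array and leaves `values` untouched.
extended = np.append(values, 25)

first = values.item(0)             # 3, since indices start at 0
```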

`Table.where` Predicates

Any of these predicates can be negated by adding `not_` in front of them, e.g. `are.not_equal_to(Z)` or `are.not_containing(S)`.

| Predicate | Description |
| --- | --- |
| `are.equal_to(Z)` | Equal to `Z` |
| `are.above(x)` | Greater than `x` |
| `are.above_or_equal_to(x)` | Greater than or equal to `x` |
| `are.below(x)` | Less than `x` |
| `are.below_or_equal_to(x)` | Less than or equal to `x` |
| `are.between(x,y)` | Greater than or equal to `x` and less than `y` |
| `are.between_or_equal_to(x,y)` | Greater than or equal to `x`, and less than or equal to `y` |
| `are.contained_in(A)` | Is a substring of `A` (if `A` is a string) or an element of `A` (if `A` is a list/array) |
| `are.containing(S)` | Contains the string `S` |
| `are.strictly_between(x,y)` | Greater than `x` and less than `y` |
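
The three "between" predicates differ only at their endpoints. The plain NumPy comparisons below spell out the conditions as the descriptions state them; they are a sketch for checking boundary cases, not the `datascience` implementation, and `ages`, `x`, and `y` are made-up values. Since `tbl.where` also accepts an array of booleans as its second argument, a mask like these could be passed in place of a predicate.

```python
import numpy as np

ages = np.array([17, 18, 25, 30, 31])
x, y = 18, 30

# Condition described for are.between(x, y): x included, y excluded.
between = (ages >= x) & (ages < y)

# Condition described for are.between_or_equal_to(x, y): both ends included.
between_or_equal = (ages >= x) & (ages <= y)

# Condition described for are.strictly_between(x, y): both ends excluded.
strictly_between = (ages > x) & (ages < y)

# A not_ predicate selects exactly the rows the original leaves out.
not_between = ~between
```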

Miscellaneous Functions

These are functions in the `datascience` library that are used in the course but don't fall into any of the categories above.

| Name | Chapter | Description | Input | Output |
| --- | --- | --- | --- | --- |
| `sample_proportions(sample_size, model_proportions)` | 11.1 | `sample_size` should be an integer and `model_proportions` an array of probabilities that sum to 1. The function samples `sample_size` objects from the distribution specified by `model_proportions` and returns an array with the same size as `model_proportions`. Each item in the array is the proportion of the `sample_size` draws that fell in the corresponding category. | 1. int: sample size; 2. array: an array of proportions that should sum to 1 | array: each item is the proportion of times the corresponding category of `model_proportions` was sampled in `sample_size` draws; the items sum to 1 |
| `minimize(function)` | 15.4 | Returns an array of argument values that minimize the output of `function` when passed to it in order. | function: name of a function that will be minimized | array: an array in which each element corresponds to an argument that minimizes the output of the function. Values are listed in the order they are passed into the function; the first element in the array is the first argument. |
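
To make the `sample_proportions` description concrete, the sketch below reproduces the computation it describes using only NumPy. The function name `sample_proportions_sketch` is hypothetical, and drawing counts with `np.random.multinomial` is an assumption about an equivalent computation, not a copy of the library's code.

```python
import numpy as np

def sample_proportions_sketch(sample_size, model_proportions):
    """Hypothetical stand-in: draw sample_size items from the categories
    and return the proportion of draws that landed in each one."""
    counts = np.random.multinomial(sample_size, model_proportions)
    return counts / sample_size

# Made-up model: three categories with probabilities summing to 1.
model = np.array([0.25, 0.5, 0.25])
observed = sample_proportions_sketch(100, model)

# `observed` has one entry per category, and its entries sum to 1,
# but the individual values vary from run to run.
```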