datascience.tables.Table.scatter

Table.scatter(column_for_x, select=None, overlay=True, fit_line=False, group=None, labels=None, sizes=None, width=None, height=None, s=20, **vargs)[source]

Creates scatterplots, optionally adding a line of best fit. Redirects to Table#iscatter if interactive plots are enabled with Table#interactive_plots.

args:
column_for_x (str): the column to use for the x-axis values and label of the scatter plots.

kwargs:
overlay (bool): if true, creates a chart with one color per data column; if false, each plot will be displayed separately.

fit_line (bool): draw a line of best fit for each set of points.

vargs: additional arguments that get passed into plt.scatter. See http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter for the arguments that can be passed into vargs. These include marker and norm, to name a couple.

group: a column of categories used to color the dots, one color per category.

labels: a column of text labels to annotate dots.

sizes: a column of values to set the relative areas of dots.

s: size of dots. If sizes is also provided, then dots will be in the range 0 to 2 * s.

colors: (deprecated) A synonym for group. Retained temporarily for backwards compatibility. This argument will be removed in future releases.

show (bool): whether to show the figure if using interactive plots; if false, the figure is returned instead.

Raises:

ValueError – Every column, column_for_x or select, must be numerical
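
As a rough illustration of this requirement, the sketch below checks plain Python lists for non-numerical values before plotting. The helper name non_numerical_columns and the dict-of-lists input are hypothetical stand-ins and are not part of the datascience API; the library performs its own check internally, and its exact rules may differ.

```python
from numbers import Number


def non_numerical_columns(columns):
    """Return the labels of columns that contain any non-numerical value.

    `columns` is a hypothetical dict mapping column labels to lists of
    values. It stands in for a Table and is not the datascience API.
    """
    return [
        label
        for label, values in columns.items()
        if not all(isinstance(value, Number) for value in values)
    ]


columns = {
    'x': [9, 3, 3, 1],
    'y': [1, 2, 2, 10],
    'name': ['a', 'b', 'c', 'd'],  # text column, not plottable as y
}
bad = non_numerical_columns(columns)
```

Here bad lists the columns that would need to be dropped, or excluded through select, before plotting.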

Returns:

Scatter plot of the values in column_for_x plotted against the values of every other column in self (or only the columns designated by select). Each plot uses the values in column_for_x for horizontal positions and the values of one of the remaining columns for vertical positions.
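
The choice of y columns described above can be sketched in plain Python. The helper y_column_labels is hypothetical and only mirrors the description; it assumes select may be either a single label or a list of labels, which this page does not state explicitly.

```python
def y_column_labels(labels, column_for_x, select=None):
    """Return the labels plotted on the y-axis (illustrative helper only).

    Assumption: `select` is None, a single label, or a list of labels.
    """
    if select is None:
        # Every column except the x column becomes a y column.
        return [label for label in labels if label != column_for_x]
    if isinstance(select, str):
        return [select]
    return list(select)


labels = ['x', 'y', 'z']
all_y = y_column_labels(labels, 'x')
only_z = y_column_labels(labels, 'x', select='z')
```

With the three-column table used in the example on this page, all_y contains both 'y' and 'z', while only_z contains just 'z'.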

>>> table = Table().with_columns(
...     'x', make_array(9, 3, 3, 1),
...     'y', make_array(1, 2, 2, 10),
...     'z', make_array(3, 4, 5, 6))
>>> table
x    | y    | z
9    | 1    | 3
3    | 2    | 4
3    | 2    | 5
1    | 10   | 6
>>> table.scatter('x') 
<scatterplot of values in y and z on x>
>>> table.scatter('x', overlay=False) 
<scatterplot of values in y on x>
<scatterplot of values in z on x>
>>> table.scatter('x', fit_line=True) 
<scatterplot of values in y and z on x with lines of best fit>
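
To make the fit_line option more concrete, the sketch below computes a line of best fit for the example columns by hand. It assumes the fit is an ordinary least-squares line of y on x; this page does not state which method the library uses, and best_fit_line is a hypothetical helper, not a datascience function.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Same values as the example table above.
x = [9, 3, 3, 1]
y = [1, 2, 2, 10]
z = [3, 4, 5, 6]

slope_y, intercept_y = best_fit_line(x, y)
slope_z, intercept_z = best_fit_line(x, z)
```

Both slopes come out negative, since y and z tend to decrease as x increases in this table.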