
from datascience import *
import numpy as np

Tables

Giving a table a name (such as nba) in our programming environment lets us refer to it later and perform operations on the whole table.

# From https://github.com/erikgregorywebb/datasets/blob/master/nba-salaries.csv
nba = Table.read_table('nba_salaries.csv')
nba

Manipulating a table creates a new table; the original table is not changed or lost.

nba.select('name')
nba.drop('rank', 'position').sort('name')
nba.sort('name')
'nba'

Because each operation returns a new table, operations can be chained, with each one applied to the result of the previous one.

nba.where('position', 'PG')
nba.where('name', "Shaquille O'Neal")

The result of applying operations is a table, which can also be given a name (e.g., point_guards) in our programming environment.

point_guards = nba.where('position', 'PG').drop('rank', 'position')
point_guards
point_guards.where('season', 2020).sort('name').sort('team')
nba

Tables are sorted in ascending order by default; passing descending=True to sort reverses the order.

point_guards.where('season', 2020).sort('salary').show(10)
point_guards.where('season', 2020).sort('salary', descending=True).show(10)

Numbers

30
30
10 * 3    # int
30
10 / 3    # float
3.3333333333333335
10 / 2
5.0
10 ** 3
1000
10 ** 0.5
3.1622776601683795
1234567 ** 89
139574185988226216352411677967079384654819840770263154148901015711434587555916221871855053939370772085656915237689603767083349991793713346471091572628494106076146419389211087459209170823939332677859093756734470172351895079430772488109105692648444480028380122417140932569432824316430008740958025053716735421344729410520681559715479472541631321425013768978692403188995380725532383634472359562908542469547971146457062422356845701571795811721146126639734809904577967012608961528997012178537266420111446716001493476975934877110973383995456863730247
10 / 3
3.3333333333333335
75892745.215489247589274985712
75892745.21548925
75892745.215489247589274985712 - 75892745.21548925
0.0
(13 ** 0.5) ** 2
12.999999999999998
3.605551275463989 * 3.605551275463989
12.999999999999998
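The result 12.999999999999998 comes from floats being stored with limited precision. The sketch below, using only the standard library, suggests one way to compare such results: check whether two floats are close rather than exactly equal.

```python
import math

root = 13 ** 0.5
squared = root ** 2

# Floats are stored with limited precision, so the round trip is off slightly
print(squared == 13)              # False
# Compare floats with a tolerance instead of exact equality
print(math.isclose(squared, 13))  # True
```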
int(10 / 5)
2
int(10 / 4)
2
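int(10 / 4) returns 2 rather than rounding 2.5 up. The short sketch below suggests that int() simply discards the fractional part (truncation toward zero), which makes a difference for negative numbers; math.floor is shown only for contrast.

```python
import math

# int() drops the fractional part of a float: it truncates toward zero
print(int(10 / 4))          # 2   (10 / 4 is 2.5)
print(int(-10 / 4))         # -2  (not -3)
# math.floor rounds down instead, which differs for negative numbers
print(math.floor(-10 / 4))  # -3
```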
float(3)
3.0
6 / 4
1.5
6 / 4000
0.0015
6 / 400000000000000000000000000000000000000000000000000000000
1.5e-56
400000000000000000000000000000000000000000000000000000000 * 1.5e-56 
6.0
1.5e-56 
1.5e-56
x = 5
x
5
x + 1
6
2x
  Cell In[64], line 1
    2x
    ^
SyntaxError: invalid decimal literal
2 * x
10

Strings

'Flavor'
'Flavor'
flavor = 2
flavor
2
# The line below causes a NameError because no variable named Flavor has been defined
# Flavor
"Flavor"
'Flavor'
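The cells above mix a variable name and string values. The sketch below puts the three cases side by side: a bare name looks up a variable, quotes create text, and names are case-sensitive, so Flavor is a different (undefined) name from flavor.

```python
flavor = 2

print(flavor)     # the name flavor refers to the value 2
print('flavor')   # quotes make a string, so this is just the text flavor

# Names are case-sensitive: Flavor (capital F) was never assigned
caught = None
try:
    Flavor
except NameError as err:
    caught = err
    print('NameError:', err)
```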
'Don't always use single quotes'
  Cell In[72], line 1
    'Don't always use single quotes'
                                   ^
SyntaxError: unterminated string literal (detected at line 1)
"Don't always use single quotes"
"Don't always use single quotes"
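Besides switching to double quotes, an apostrophe inside a single-quoted string can be escaped with a backslash. This sketch checks that both spellings produce the same string.

```python
# Two ways to write a string that contains an apostrophe
double_quoted = "Don't always use single quotes"
escaped = 'Don\'t always use single quotes'   # backslash escapes the quote

# Both produce exactly the same string
print(double_quoted == escaped)  # True
```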
'straw' 'berry' # concatenation
'strawberry'
'straw' + 'berry' # concatenation
'strawberry'
'Chris' + 'Paul' # spaces aren't added for you
'ChrisPaul'
'Chris' + ' ' + 'Paul'
'Chris Paul'
x = 'straw'
y = 'berry'
x + y
'strawberry'
x y
  Cell In[80], line 1
    x y
      ^
SyntaxError: invalid syntax
'ha' * 100
'hahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahaha'
'lo' * 5.5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[82], line 1
----> 1 'lo' * 5.5

TypeError: can't multiply sequence by non-int of type 'float'
'ha' + 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[83], line 1
----> 1 'ha' + 10

TypeError: can only concatenate str (not "int") to str
'ha' + str(10)
'ha10'
'ha' + '10'
'ha10'
'3' + 5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[87], line 1
----> 1 '3' + 5

TypeError: can only concatenate str (not "int") to str
int('3') + 5
8
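The errors above share one cause: an operator received a type it cannot combine. The sketch below collects the fixes, converting one side before the operation so the types match.

```python
# str * int repeats; str * float is a TypeError, so convert the count first
times = 5.5
repeated = 'lo' * int(times)   # int(5.5) truncates to 5
print(repeated)                # lololololo

# + needs two strings or two numbers; convert one side so the types match
as_text = 'ha' + str(10)       # number -> string: 'ha10'
as_number = int('3') + 5       # string -> number: 8
print(as_text, as_number)
```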
int('3.0')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[90], line 1
----> 1 int('3.0')

ValueError: invalid literal for int() with base 10: '3.0'
'3.0'
'3.0'
float('3.0')
3.0
int(3.0)
3
int(float('3.0'))
3
3
3
dot_oh = '.0'
float('3' + dot_oh)
3.0
float('3' + dot_oh) * 7
21.0
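Since int('3.0') fails but int(float('3.0')) works, the two-step conversion can be wrapped in a function. to_int below is a hypothetical helper written for illustration, not part of Python or datascience; like int(), it truncates any fractional part.

```python
def to_int(text):
    """Hypothetical helper: convert text such as '3' or '3.0' to an int."""
    try:
        return int(text)           # works for '3' but raises ValueError for '3.0'
    except ValueError:
        return int(float(text))    # parse as a float first, then truncate

print(to_int('3'))              # 3
print(to_int('3.0'))            # 3
print(to_int('3' + '.0') * 7)   # 21
```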

Types

type(10)
int
a = 10
a
10
type(a)
int
type(4.5)
float
type('abc')
str
type(nba)
datascience.tables.Table
type(Table.read_table('nba_salaries.csv'))
datascience.tables.Table
type(True)
bool
type(abs(-5))
int
type(abs)
builtin_function_or_method
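type() also works on the result of any expression, which helps explain outputs seen earlier (for example, why 10 / 2 printed 5.0). The sketch below adds isinstance(), a standard built-in that asks whether a value has a given type and returns a bool.

```python
# type() reports the kind of value an expression produces
print(type(10 / 2))      # <class 'float'>: / always gives a float
print(type(10 * 3))      # <class 'int'>
print(type(int('3')))    # <class 'int'>
print(type('3'))         # <class 'str'>

# isinstance() asks "is this value of that type?" and returns a bool
print(isinstance(abs(-5), int))      # True
print(isinstance(abs(-5.0), float))  # True
```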

Arrays

first_four = make_array(1, 2, 3, 4)
first_four
array([1, 2, 3, 4])
array([1, 2, 3, 4])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[116], line 1
----> 1 array([1, 2, 3, 4])

NameError: name 'array' is not defined
from numpy import array
array([1, 2, 3, 4])
array([1, 2, 3, 4])
first_four
array([1, 2, 3, 4])
first_four * 2
array([2, 4, 6, 8])
first_four ** 2
array([ 1, 4, 9, 16])
(first_four + 1) ** 2
array([ 4, 9, 16, 25])
first_four # array is unchanged, just like when we call show/select/drop on Table
array([1, 2, 3, 4])
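make_array returns a NumPy array, so the same element-wise arithmetic can be sketched with np.array alone. The example compares the one-line array expression with an explicit per-element computation and confirms the original array is unchanged.

```python
import numpy as np

first_four = np.array([1, 2, 3, 4])   # same contents as make_array(1, 2, 3, 4)

# Arithmetic with a single number is applied to every element
doubled = first_four * 2                  # [2 4 6 8]
shifted_squares = (first_four + 1) ** 2   # [ 4  9 16 25]

# The same computation written out one element at a time
manual = np.array([(x + 1) ** 2 for x in first_four])
print(np.array_equal(shifted_squares, manual))  # True

# The original array is not modified
print(first_four)                         # [1 2 3 4]
```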
next_four = make_array(5, 6, 7, 8)
next_four
array([5, 6, 7, 8])
first_four + next_four
array([ 6, 8, 10, 12])
only_three = make_array(5, 6, 7)
# This line will cause an error 
first_four + only_three
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[129], line 2
      1 # This line will cause an error 
----> 2 first_four + only_three

ValueError: operands could not be broadcast together with shapes (4,) (3,) 
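The error arises because element-wise addition pairs up positions, and a length-4 array has no partner for its last element in a length-3 array. This NumPy-only sketch checks the lengths, catches the same ValueError, and shows that a single number causes no mismatch.

```python
import numpy as np

first_four = np.array([1, 2, 3, 4])
only_three = np.array([5, 6, 7])

# Two arrays can only be combined element by element if their lengths match
print(len(first_four), len(only_three))   # 4 3

error_message = None
try:
    first_four + only_three
except ValueError as err:
    error_message = str(err)              # "operands could not be broadcast ..."
print(error_message)

# A single number has no length mismatch: it is paired with every element
print(first_four + 10)                    # [11 12 13 14]
```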
str_array = make_array('ha', 'he', 'ho')
str_array
array(['ha', 'he', 'ho'], dtype='<U2')
str_array * 4
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[132], line 1
----> 1 str_array * 4

UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U2'), dtype('int64')) -> None
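NumPy's multiply does not repeat strings, but Python's str * int does. One possible workaround, sketched below with NumPy alone, applies str * int to each element and builds a new array from the results.

```python
import numpy as np

str_array = np.array(['ha', 'he', 'ho'])   # same contents as make_array('ha', 'he', 'ho')

# Repeat each string by applying Python's str * int to one element at a time
repeated = np.array([str(s) * 4 for s in str_array])
print(repeated)   # ['hahahaha' 'hehehehe' 'hohohoho']
```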
next_four
array([5, 6, 7, 8])
next_four.item(0)
5
next_four.item(4)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[138], line 1
----> 1 next_four.item(4)

IndexError: index 4 is out of bounds for axis 0 with size 4
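Positions in an array start at 0, so an array of length 4 has valid positions 0 through 3. The sketch below computes the last valid position from len() instead of hard-coding it, then reproduces the IndexError by asking for one position past the end.

```python
import numpy as np

next_four = np.array([5, 6, 7, 8])

# Positions start at 0, so valid positions run from 0 to len(next_four) - 1
first = next_four.item(0)                   # 5
last = next_four.item(len(next_four) - 1)   # 8
print(first, last)

# Asking for position len(next_four) is one past the end
try:
    next_four.item(len(next_four))
except IndexError as err:
    print('IndexError:', err)
```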
sum(next_four)
np.average(next_four)
len(next_four)
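The three calls above are related: the average is the total divided by the number of elements. The sketch below checks that relationship on the same values.

```python
import numpy as np

next_four = np.array([5, 6, 7, 8])

total = sum(next_four)            # 5 + 6 + 7 + 8 = 26
count = len(next_four)            # 4 elements
average = np.average(next_four)   # 26 / 4 = 6.5

# The average is the total divided by the number of elements
print(total, count, average)
print(average == total / count)   # True
```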