# Initialize Otter
import otter
grader = otter.Notebook("hw05.ipynb")
Homework 5: Chance, Empirical Probability Distribution.
Please complete this notebook by filling in the cells provided. Before you begin, execute the previous cell to load the provided tests.
Helpful Resource:
Python Reference: Cheat sheet of helpful array & table methods used in Data 8!
Sampling Methods Guide: Guide for the randomization methods.
Recommended Readings:
Before you begin, execute the cell below to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute this cell again.
For all problems that you must write explanations and sentences for, you must provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure to not re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!
Deadline:
This assignment is due Thursday, 2/26 at 11:00am PT (note that this is one day later than the usual due date of Wednesdays at 11am). Submissions after this time will be accepted for 24 hours and will incur a 20% penalty. Any submissions later than this 24 hour period will not be accepted unless an extension has been granted as per the syllabus page. Turn it in by Wednesday, 2/25 at 11:00am PT for 5 extra credit points.
Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. Refer to the syllabus page to learn more about how to learn cooperatively.
You should start early so that you have time to get help if you’re stuck. Office hours are held Monday through Friday in Warren Hall 101B or online. The office hours schedule appears here.
# Run this cell to set up the notebook, but please don't change it.
# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *
# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
A Nevada roulette wheel has 38 pockets and a small ball that rests on the wheel. When the wheel is spun, the ball comes to rest in one of the 38 pockets. That pocket is declared the winner.
The pockets are labeled 0, 00, 1, 2, 3, 4, ... , 36. Pockets 0 and 00 are green, and the other pockets are alternately red and black. The table wheel is a representation of a Nevada roulette wheel. Note that both columns consist of strings. Below is an example of a roulette wheel!

Run the cell below to load the wheel table.
wheel = Table.read_table('roulette_wheel.csv', dtype=str)
wheel
Betting on Red
If you bet on red, you are betting that the winning pocket will be red. This bet pays 1 to 1. That means if you place a one-dollar bet on red, then:
If the winning pocket is red, you gain 1 dollar. That is, you get your original dollar back, plus one more dollar.
If the winning pocket is not red, you lose your dollar. In other words, you gain -1 dollars.
Let’s see if you can make money by betting on red at roulette.
Question 1.1. Define a function dollar_bet_on_red that takes the name of a color, and returns your gain in dollars if that color won and you had placed a one-dollar bet on red. Remember that the gain can be negative. Make sure your function returns an integer. (5 points)
Note: You can assume that the only colors that will be passed as arguments are red, black, and green. Your function doesn’t have to check that.
def dollar_bet_on_red(...):
    ...
grader.check("q1_1")
Run the cell below to make sure your function is working.
print(dollar_bet_on_red('green'))
print(dollar_bet_on_red('black'))
print(dollar_bet_on_red('red'))
Question 1.2. Add a column labeled Winnings: Red to the table wheel. For each pocket, the column should contain your gain in dollars if that pocket won and you had bet one dollar on red. Your code should use the function dollar_bet_on_red. (5 points)
Hint: You should not need a for loop for this question; instead, try using a table method!
red_winnings = ...
wheel = ...
wheel
grader.check("q1_2")
Simulating 10 Bets on Red
Roulette wheels are set up so that each time they are spun, the winning pocket is equally likely to be any of the 38 pockets regardless of the results of all other spins. Let’s see what would happen if we decided to bet one dollar on red each round.
Question 1.3. Create a table ten_bets by sampling the table wheel to simulate 10 spins of the roulette wheel. Your table should have the same three column labels as in wheel. Once you’ve created that table, set sum_bets to your net gain in all 10 bets, assuming that you bet one dollar on red each time. (5 points)
Note: The Sampling Methods Guide may be helpful!
Hint: It may be helpful to print out ten_bets after you create it!
ten_bets = ...
sum_bets = ...(ten_bets.column('Winnings: Red'))
sum_bets
grader.check("q1_3")
Run the cells above a few times to see how much money you would make if you made 10 one-dollar bets on red. Making a negative amount of money doesn’t feel good, but it is a reality in gambling. Casinos are a business, and they make money when gamblers lose.
Question 1.4. Let’s see what would happen if you made more bets. Define a function net_gain_red that takes in a number of bets and returns the net gain (a single integer representing the total amount won from betting on red that many times). (5 points)
Hint: You should use your wheel table within your function.
def net_gain_red(...):
    ...
grader.check("q1_4")
Run the cell below a few times to make sure that the results are similar to those you observed in the previous exercise.
net_gain_red(10)
Question 1.5. Complete the cell below to simulate the net gain in 200 one-dollar bets on red, repeating the process 10,000 times. After the cell is run, simulated_gains_red should be an array with 10,000 entries, each of which is the net gain in 200 one-dollar bets on red. (5 points)
Hint: Think about which computational tool might be helpful for simulating a process multiple times. Section 9.3 might be helpful!
Note: This cell might take a few seconds to run.
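The repetition pattern that the hint points to can be sketched as follows. This is a minimal illustration on a different, hypothetical experiment (counting heads in 10 tosses of a fair coin), not the solution to this question. The function name heads_in_ten_tosses and the choice of 1,000 repetitions are assumptions made up for the example.

```python
import numpy as np

def heads_in_ten_tosses():
    """One trial of a hypothetical experiment: count heads in 10 fair coin tosses."""
    tosses = np.random.choice(np.array(['heads', 'tails']), 10)
    return np.count_nonzero(tosses == 'heads')

repetitions = 1000          # assumed value for this sketch
results = np.array([])      # start with an empty collection array

# Repeat the one-trial function and append each outcome to the array.
for _ in np.arange(repetitions):
    results = np.append(results, heads_in_ten_tosses())

print(len(results))
```

The general shape is: write a function that carries out the process once, create an empty array, then loop and append one simulated value per repetition.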
num_bets = ...
repetitions = ...
simulated_gains_red = ...
...
len(simulated_gains_red) # Do not change this line! Check that simulated_gains_red is length 10000.
grader.check("q1_5")
Run the cell below to visualize the results of your simulation.
gains = Table().with_columns('Net Gain on Red', simulated_gains_red)
gains.hist(bins = np.arange(-80, 41, 4))
Question 1.6: Using the histogram above, decide whether the following statement is true or false:
If you make 200 one-dollar bets on red, your chance of losing money is more than 50%.
Assign loss_more_than_50 to either True or False depending on your answer to the question. (5 points)
loss_more_than_50 = ...
grader.check("q1_6")
Betting on a Split
If betting on red doesn’t seem like a good idea, maybe a gambler might want to try a different bet. A bet on a split is a bet on two consecutive numbers such as 5 and 6. This bet pays 17 to 1. That means if you place a one-dollar bet on the split 5 and 6, then:
If the winning pocket is either 5 or 6, your gain is 17 dollars.
If any other pocket wins, you lose your dollar, so your gain is -1 dollars.
Question 1.7. Define a function dollar_bet_on_split that takes a pocket number, and returns your gain in dollars if that pocket won and you had bet one dollar on the 5-6 split. (5 points)
Hint: Remember that the pockets are represented as strings.
Hint: Reference Section 9.1!
def dollar_bet_on_split(...):
    ...
grader.check("q1_7")
Run the cell below to check that your function is doing what it should.
print(dollar_bet_on_split('5'))
print(dollar_bet_on_split('6'))
print(dollar_bet_on_split('00'))
print(dollar_bet_on_split('23'))
Question 1.8. Add a column Winnings: Split to the wheel table. For each pocket, the column should contain your gain in dollars if that pocket won and you had bet one dollar on the 5-6 split. (5 points)
split_winnings = ...
wheel = ...
wheel.show(5) # Do not change this line.
grader.check("q1_8")
Question 1.9. Simulate the net gain in 200 one-dollar bets on the 5-6 split, repeating the process 10,000 times and saving your gains in the array simulated_gains_split. (5 points)
Hint: Your code in Questions 1.4 and 1.5 may be helpful here!
simulated_gains_split = ...
...
# Do not change the two lines below
gains = gains.with_columns('Net Gain on Split', simulated_gains_split)
gains.hist(bins = np.arange(-200, 150, 20))
grader.check("q1_9")
Question 1.10. Look carefully at the visualization above, and assign histogram_statements to an array containing the numbers of each statement below that can be correctly inferred from the overlaid histogram. (5 points)
1. If you bet one dollar 200 times on a split, your chance of losing money is more than 50%.
2. If you bet one dollar 200 times in roulette, your chance of making more than 50 dollars is greater if you bet on a split each time than if you bet on red each time.
3. If you bet one dollar 200 times in roulette, your chance of losing more than 50 dollars is greater if you bet on a split each time than if you bet on red each time.
Hint: We’ve already seen one of these statements in a prior question.
histogram_statements = ...
grader.check("q1_10")
If this exercise has put you off playing roulette, it has done its job. If you are still curious about other bets, here they all are, and here is the bad news. The house – that is, the casino – always has an edge over the gambler.
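One way to put a number on the house edge is the average gain per one-dollar bet, weighting each outcome by its chance. The sketch below does this for the two bets in this homework, using the pocket counts and payouts stated above (18 red pockets paying 1 to 1, and 2 pockets in a split paying 17 to 1, out of 38 pockets). The helper name expected_gain is an assumption made up for this illustration.

```python
from fractions import Fraction

def expected_gain(winning_pockets, payout, total_pockets=38):
    """Average gain per one-dollar bet: win `payout` dollars or lose 1 dollar."""
    p_win = Fraction(winning_pockets, total_pockets)
    return p_win * payout + (1 - p_win) * (-1)

red_edge = expected_gain(18, 1)     # bet on red: 18 winning pockets, pays 1 to 1
split_edge = expected_gain(2, 17)   # bet on a split: 2 winning pockets, pays 17 to 1

print(red_edge, split_edge)  # prints: -1/19 -1/19
```

Under these assumptions both bets lose about 5 cents per dollar on average, even though the two histograms of net gains look very different.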
Before you do this exercise, make sure you understand the logic behind all the examples in Section 9.5.
Good ways to approach probability calculations include:
Thinking one trial at a time: What does the first one have to be? Then what does the next one have to be?
Breaking up the event into distinct ways in which it can happen.
Seeing if it is easier to find the chance that the event does not happen.
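As a small sketch of the last approach in the list above, consider a hypothetical example that is not part of this homework: the chance of rolling at least one six in four rolls of a fair six-sided die. It is easier to first find the chance of no sixes at all, one roll at a time, and then subtract from 1.

```python
from fractions import Fraction

# Chance that a single roll of a fair die is not a six.
p_no_six_one_roll = Fraction(5, 6)

# The rolls are independent, so multiply the chance across all four rolls.
p_no_six_four_rolls = p_no_six_one_roll ** 4

# "At least one six" is the event that "no sixes" does not happen.
p_at_least_one_six = 1 - p_no_six_four_rolls

print(p_at_least_one_six)  # prints: 671/1296
```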
Finding Chances
On each spin of a roulette wheel, all 38 pockets are equally likely to be the winner regardless of the results of other spins. Among the 38 pockets, 18 are red, 18 black, and 2 green. In each part below, write an expression that evaluates to the chance of the event described.
Question 2.1. The winning pocket is black on all of the first three spins. (5 points)
first_three_black = ...
grader.check("q2_1")
Question 2.2. The color green never wins in the first 10 spins. (5 points)
no_green = ...
grader.check("q2_2")
Question 2.3. The color green wins at least once on the first 10 spins. (5 points)
at_least_one_green = ...
grader.check("q2_3")
Question 2.4. Two of the three colors never win in the first 10 spins. (6 points)
Hint: What situation(s) lead to two of the three colors never winning in the first 10 spins?
lone_winners = ...
grader.check("q2_4")
Comparing Chances
In each of Questions 2.5-2.7, two events A and B are described. Choose from one of the following three options and set each answer variable to a single integer:
1. Event A is more likely than Event B
2. Event B is more likely than Event A
3. The two events have the same chance.
You should be able to make the choices without calculation. Good ways to approach this exercise include imagining carrying out the chance experiments yourself, one trial at a time, and thinking about the law of averages.
Question 2.5. A child picks four times at random from a box that has four toy animals: a bear, an elephant, a giraffe, and a kangaroo. (5 points)
Event A: all four different animals are picked (assuming the child picks without replacement)
Event B: all four different animals are picked (assuming the child picks with replacement)
toys_option = ...
grader.check("q2_5")
Question 2.6. In a lottery, two numbers are drawn at random with replacement from the integers 1 through 1000. (5 points)
Event A: The number 8 is picked on both draws
Event B: The same number is picked on both draws
lottery_option = ...
grader.check("q2_6")
Question 2.7. A fair coin is tossed repeatedly. (5 points)
Event A: There are 60 or more heads in 100 tosses
Event B: There are 600 or more heads in 1000 tosses
Hint: Think about the law of averages!
coin_option = ...
grader.check("q2_7")
You have learned three ways to draw random samples using Python:
tbl.sample draws a random sample of rows from the table tbl. The output is a table consisting of the sampled rows.
np.random.choice draws a random sample from a population whose elements are in an array. The output is an array consisting of the sampled elements.
sample_proportions draws from a categorical distribution whose proportions are in an array. The output is an array consisting of the sampled proportions in all the categories.
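To make the difference in outputs concrete, here is a minimal sketch on made-up data (the colors and proportions arrays are hypothetical). It uses only NumPy, so tbl.sample is not shown, and np.random.multinomial divided by the sample size is used as an assumed stand-in for sample_proportions rather than the datascience function itself.

```python
import numpy as np

# Hypothetical categories and their proportions in a population.
colors = np.array(['red', 'blue', 'green'])
proportions = np.array([0.5, 0.3, 0.2])

# np.random.choice returns the sampled elements themselves, in the order drawn.
sampled_elements = np.random.choice(colors, 10)

# Stand-in for sample_proportions (an assumption): draw category counts,
# then convert them to the proportion of the sample in each category.
counts = np.random.multinomial(10, proportions)
sampled_proportions = counts / 10

print(sampled_elements)
print(sampled_proportions)
```

Note what each result keeps and what it loses: the first array records every individual draw, while the second records only how the sample was split across categories.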
# Look through this code and run this cell for questions 1 and 2
top_movies = Table.read_table('top_movies_2017.csv').select(0, 1)
top_movies.show(3)
# Look through this code and run this cell for questions 1 and 2
studios_with_counts = top_movies.group('Studio').sort('count', descending=True)
studios_with_counts.show(3)
# Look through this code and run this cell for questions 1 and 2
studios_of_all_movies = top_movies.column('Studio')
distinct_studios = studios_with_counts.column('Studio')
print("studios_of_all_movies:", studios_of_all_movies[:10], "...")
print("\n distinct_studios:", distinct_studios)
# Look through this code and run this cell for questions 1 and 2
studio_counts_only = studios_with_counts.column('count')
studio_proportions_only = studio_counts_only / sum(studio_counts_only)
print("studio_counts_only:", studio_counts_only)
print("\n studio_proportions_only:", studio_proportions_only)
In Questions 3.1 and 3.2, we will present a scenario. For each scenario, we will ask whether the desired result can be achieved by using a given function and the following tables/arrays: top_movies, studios_with_counts, studios_of_all_movies, distinct_studios, studio_counts_only and studio_proportions_only. For studio_counts_only and studio_proportions_only, you can assume that we know which index corresponds with the studio containing that count/proportion.
Question 3.1. Simulate a sample of 10 movies drawn at random with replacement from the 200 movies. Using just this sample, do we have enough information to output True if Paramount appears more often than Warner Brothers among studios that released the sampled movies, and False otherwise?
Select all possible solutions and assign them to the array movie_frequency_answer. (7 points)
1. Yes, by using .sample with top_movies.
2. Yes, by using .sample with studios_with_counts.
3. Yes, by using np.random.choice with distinct_studios.
4. Yes, by using np.random.choice with studios_of_all_movies.
5. Yes, by using sample_proportions with studio_proportions_only.
6. Yes, by using sample_proportions with studio_counts_only.
7. This is not possible with the provided tables/arrays.
Hint: Instead of just looking at the table/array, try and think about what the result would be after calling the sampling function on it!
movie_frequency_answer = make_array(...)
grader.check("q3_1")
Question 3.2. Simulate a sample of 10 movies drawn at random with replacement from the 200 movies. Using just this sample, do we have enough information to output True if the first sampled movie was released by the same studio as the last sampled movie?
Select all possible solutions and assign them to the array same_studio_answer. (7 points)
1. Yes, by using .sample with top_movies.
2. Yes, by using .sample with studios_with_counts.
3. Yes, by using np.random.choice with distinct_studios.
4. Yes, by using np.random.choice with studios_of_all_movies.
5. Yes, by using sample_proportions with studio_proportions_only.
6. Yes, by using sample_proportions with studio_counts_only.
7. This is not possible with the provided tables/arrays.
same_studio_answer = make_array(...)
grader.check("q3_2")
Pets of Data 8
Pepper, Ziggy, Milo, and Maui are proud of you for completing the assignment!

Congrats on finishing Homework 5!
To double-check your work, the cell below will rerun all of the autograder tests.
grader.check_all()
Submission
Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. Please save before exporting!
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)