STAT 94

Foundations of Data Science
Fall 2015

Principal Instructor:
Ani Adhikari
Co-Instructors:
John DeNero
Michael I. Jordan
Tapan Parikh
David Wagner




Course Information and Policies

This introductory course in data science is built on three interrelated perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? How does one collect data to answer questions that one is interested in? Inferential thinking refers to an ability to connect data to underlying phenomena and to the ability to think critically about the conclusions that are drawn from data analysis. Computational thinking refers to the ability to conceive of the abstractions and processes that allow inferential procedures to be embodied in computer programs, and to ensure that such programs are scalable, robust and understandable. In addition to teaching basic skills in computer programming and statistical inference, the course will also involve the hands-on analysis of a variety of real-world datasets, including economic data, document collections, geographical data and social networks, and it will delve into social and legal issues surrounding data analysis, including issues of privacy and data ownership.

Course Format

The course includes several kinds of events: lectures, labs, office hours, and review sessions. The weekly lab and the lectures are typically the most valuable events to attend.

Lecture: The course includes three 50-minute lectures per week.

Lab: The course includes one laboratory section each week. These sections are run by an amazing group of Graduate Student Instructors. Getting to know your GSI is an excellent way to succeed in this course. Participation in lab is tracked but not required.

Office Hours: Attending office hours is another excellent way to succeed in this course. Office hours are held by GSIs and the instructor each week. A schedule appears on the staff page of the course website.

In office hours, you can ask questions about the material, receive guidance on assignments, and work with peers and course staff in a small group setting.

Grading

Grades for this course are assigned using the following sources.

The midterm exam will be held in class on Monday 10/19. The final exam will be held on Monday 12/14 from 8am to 11am, location to be announced.

Materials

The primary text for this course is Computational and Inferential Thinking, an online textbook.

Homework will be distributed each week on paper (typically at lecture on Friday) and posted online. Complete each assignment in the spaces provided on the page where the questions appear, not electronically or on a separate piece of paper. (Having a single format helps us grade assignments and forces you to write concisely.) If you print your own copy, please print it double-sided.

Homework assignments are due in your lab section the week after they are handed out. (For example, the homework handed out on Friday 8/28 is due in lab on Wednesday 9/2 or, for those in the Thursday lab, on Thursday 9/3.)

Computing Resources

All computing assignments in this course will be completed on ds8.berkeley.edu. You can complete all computing labs and projects using any computer (or device) that has a web browser.

The lab room for the course is 105 Cory. If you would like to use a lab computer to work on an assignment, you will have 24-hour access to this room whenever another course is not using it. Enrolling in the course gives you card-key access to the room with your Cal ID card.