STAT 89A Introduction to Matrices and Graphs in Data Science

A Spring 2017 Data Science Connector Course

CCN: 23428 | Michael Mahoney | Monday 1:00-3:00 PM | Evans B6 | Units: 2

Announcements

  • Welcome to Matrices and Graphs in Data Science!

Course Description

This connector will cover introductory topics in the mathematics of data science, focusing on discrete probability and linear algebra and on the connections between them that are useful in modern theory and practice. We will focus on matrices and graphs as popular mathematical structures with which to model data, for example as models for term-document corpora, high-dimensional regression problems, ranking and classification of web data, and adjacency properties of social network data.
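
As a rough illustration of what modeling data with a matrix can look like, the sketch below encodes a small social network as an adjacency matrix. The four-person network and its edge list are made up for this example and are not taken from the course materials; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy friendship network: 4 people, undirected edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: A[i, j] = 1 if i and j are connected, else 0.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph, so the matrix is symmetric

# Row sums give each person's number of connections (the degree).
degrees = A.sum(axis=1)

# Entry (i, j) of A @ A counts the walks of length 2 from i to j,
# which for i != j is the number of common neighbors.
walks2 = A @ A

print(A)
print("degrees:", degrees)
print("length-2 walks from 0 to 3:", walks2[0, 3])
```

Simple matrix operations already answer graph questions here: row sums count connections, and a matrix product counts two-step paths.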


This course connects to the Foundations of Data Science course by providing a unified view of the mathematical methods that underlie the theoretical foundations of data science. Typically, these methods are taught from a statistical perspective, a computer science perspective, or a purely mathematical perspective. This connector course will examine in more detail several key ideas that are used in all of these areas. These methods have rich mathematical underpinnings, and they are also widely useful in practice.


The course will cover some basic mathematics of discrete probability and linear algebra, as well as the ways they interact in data problems. Discrete rather than continuous probability is chosen because it is what is used in practice and because many of the basic results can be illustrated at the freshman level without advanced calculus. Basic insights are developed without getting bogged down in details that matter in traditional numerical presentations of linear algebra but that matter less for data science. Later topics that use both discrete probability and linear algebra include simple geometric properties of high-dimensional spaces, simple random walks, and spectral methods for clustering, classification, and ranking, all of which have interesting mathematics and are widely used in practice.
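
To give a flavor of how discrete probability and linear algebra meet, here is a minimal sketch of a simple random walk on a small graph, written as repeated multiplication by a transition matrix. The graph is a made-up example, not one from the course, and NumPy is assumed. For a connected, non-bipartite undirected graph such as this one, the walk's long-run distribution is proportional to node degree.

```python
import numpy as np

# Hypothetical toy graph: a triangle (0, 1, 2) with one extra node 3
# attached to node 2. Chosen only for illustration.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

degrees = A.sum(axis=1)

# Transition matrix of the simple random walk: from node i, move to a
# uniformly random neighbor. Each row of P sums to 1.
P = A / degrees[:, None]

# Start the walker at node 0 with probability 1.
pi = np.array([1.0, 0.0, 0.0, 0.0])

# One step of the walk updates the distribution by a vector-matrix product.
for _ in range(200):
    pi = pi @ P

print("distribution after 200 steps:", pi)
print("degree / total degree:       ", degrees / degrees.sum())
```

The two printed vectors should agree closely, which is one small instance of a probabilistic question (where does the walker end up?) being answered by a linear-algebraic computation.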


Class Materials

Hello world notebook: here

Introductory notebook: here

Contact Information

Instructor: Michael Mahoney

Email: mmahoney@stat.berkeley.edu

Website: http://statistics.berkeley.edu/people/michael-mahoney

Office: 445 Evans