An Educator’s Guide to Data 8

Data 8 is a “Foundations in Data Science” course taught to first-year students at UC Berkeley. It combines principles and skills from statistics, programming, inference, modeling, hypothesis testing, visualization, and exploration. It gives students a practical introduction to data science and a foundation in the many fields that the term encompasses.

This online resource serves as a “snapshot” of the Data 8 stack and as a guide for others who wish to imitate the Data 8 model. It discusses the pedagogical approach, the concepts and topics covered, and the technical pieces used in the class.

All of the tools that Data 8 uses are open source and available for the community to use (either as broader community-run projects or as Berkeley projects). The Data 8 textbook is free to use at []. If you’d like to modify the textbook’s content for your own course, please shoot us an email!

This book’s structure

See the chapters to the left to navigate this book. Below is a general structure of the material.

Deploying your own Data 8 course from scratch

The first section of this guide walks step by step through deploying your own version of Data 8. It explains how to create a JupyterHub running in the cloud that students can access, how to connect this JupyterHub with the coding environment that Data 8 students use, and how to incorporate course materials into your class. Check it out here:

Technical and pedagogical decisions for Data 8

These sections give an overview of the technical and pedagogical pieces that make up Data 8. They serve as a high-level reference for running your own Data 8 course, or for taking a similar technical approach to teaching data science.

  • Teaching and Pedagogy covers the unique blend of computational, coding, and conceptual information taught in Data 8.
  • The Data 8 Tech Stack describes the technical pieces of Data 8, including hosting student environments, grading, and managing the course.

Other Data 8 information

All Data 8 materials are freely available online. The course material in full can be accessed at the following online resources:

To explore the guide, select a section to the left!

Other resources

Many other resources cover the use of open technology in teaching data analytics. Here are a few notable ones:

  • The UC Berkeley JupyterHubs guide contains information about all of the JupyterHubs at UC Berkeley, and is a good reference for how our teams coordinate technical infrastructure across classes and resources.
  • The Jupyter for Education Handbook is a more general resource for teaching with tools in the Jupyter ecosystem, and includes a number of best practices and tips.