Statistics 506, Fall 2016

Syllabus


LSA Courseguide entry

Contact information

  • Instructor: Kerby Shedden (kshedden@umich.edu)

  • Instructor office: 277 West Hall

  • Instructor office hours: 3-4 Monday, 11:30-12:30 Thursday; course-related emails are welcome

  • GSI: Ketian Yu (yukt@umich.edu)

  • GSI office hours: Thursday 2-5, 1720 Chemistry

Key dates

  • In-class exam: Tuesday, October 25th

  • Project due: TBD

  • Final exam: Tuesday, December 20th, 4-6

Grading information

  • In-class exam: 20%

  • Final exam: 30%

  • Project: 20%

  • ~6 homework sets: 30%

Prerequisites

You should have taken an intermediate applied statistics course such as U-M Stat 500, and you should be comfortable programming in at least one language.

Course description

Statistics 506 covers a variety of topics related to the use of computing in data management and data analysis. We will cover the topics listed below, as well as a number of case studies.

  • Overview of computing languages for working with data

  • Basic use of the Linux shell and utilities

  • Basic Stata

  • Basic computer architecture and networking

  • Basic data structures and algorithms

  • Basic R

  • Data file formats and data containers

  • Indexed data structures

  • Merging data

  • dplyr and other tools for split/apply/combine in R

  • Basic numerical linear algebra and other key numerical algorithms

  • Testing, verification and reproducibility of computing-based data analyses

  • Vectorization

  • R internals

  • Profiling and debugging in R

  • Basic SAS

  • Maintenance, extensibility, and documentation of code; designing code libraries for re-use

  • Working with text data

  • Working with geospatial data

  • SQL

  • Distributed, concurrent, and parallel computing

  • Apache Spark and Hadoop

Computing resources

All the software used in this course is available without charge to U-M students. Some of the software (e.g., R) is open source, so you can download and run it on your personal machine.