Contact information
Instructor: Kerby Shedden (kshedden@umich.edu)
Instructor office: 277 West Hall
Instructor office hours: Monday 3-4, Thursday 11:30-12:30; course-related emails are also welcome
GSI: Ketian Yu (yukt@umich.edu)
GSI office hours: Thursday 2-5, 1720 Chemistry
Key dates
In-class exam: Tuesday, October 25th
Project due: TBD
Final exam: Tuesday, December 20th, 4-6
Grading information:
In-class exam: 20%
Final exam: 30%
Project: 20%
Homework (approximately 6 sets): 30%
Prerequisites:
You should have taken an intermediate applied statistics course such as U-M Stat 500, and you should be comfortable programming in at least one programming language.
Course description:
Statistics 506 covers a variety of topics related to the use of computing in data management and data analysis. We will cover the topics listed below, as well as a number of case studies.
Overview of computing languages for working with data
Basic use of the Linux shell and utilities
Basic Stata
Basic computer architecture and networking
Basic data structures and algorithms
Basic R
Data file formats and data containers
Indexed data structures
Merging data
dplyr and other tools for split/apply/combine in R
Basic numerical linear algebra and other key numerical algorithms
Testing, verification and reproducibility of computing-based data analyses
Vectorization
R internals
Profiling and debugging in R
Basic SAS
Maintenance, extensibility, and documentation of code; designing code libraries for re-use
Working with text data
Working with geospatial data
SQL
Distributed, concurrent, and parallel computing
Apache Spark and Hadoop
Computing resources
All the software used in this course is available free of charge to U-M students. Some of it (e.g., R) is open source, so you can download it and run it on your personal machine.