Writing assignments

The following guidelines apply to all writing assignments unless otherwise noted:

Writing assignments should be prepared in Google docs, with a title that follows the pattern “2020 504 assignment # uniqname”, where # is the number of the assignment, and uniqname is your UM uniqname (i.e. the first part of your email address). For example, if I were submitting for assignment 1, the title would be “2020 504 assignment 1 kshedden”. It is very important to adhere to this pattern or we may not be able to find your assignment.
Share your assignment with me (kshedden@umich.edu), with Octavio (omesner@umich.edu) and with our GSI Jinming Li (lijinmin@umich.edu) before midnight on the day that the assignment is due. Do not edit the assignment after the due date. Note that we can see all changes through the document history.
Your writing should be 1.5-2 pages in length in Google docs, using default margins, default font size (11 point), and 1.5 line spacing (not the default). In general you should write in plain text full paragraphs, without additional subsectioning, bulleted lists, etc.
You may include a few graphs or tables if relevant for your discussion. You can paste these graphs directly into the Google doc. The graphs and tables do not count toward the 1.5-2 page limit. Do not include graphs or tables that are not central to the points you are making in your writing.
Include a short, informative title for all pieces of writing done in this class.

The assignments and due dates are below:

Due Friday October 2: Conduct an analysis of either the COVID testing and mortality data by country, or the WHO global mortality data. Scripts for obtaining and pre-processing the data can be found on the course GitHub site. The core of your analysis should use generalized linear models, since that is what we have been focusing on in class. For this first assignment, you have the option to essentially repeat parts of the analysis that I demonstrated in class, although since the data are different, the conclusions may differ, and not everything discussed in class may make sense with these new data. You are also welcome to pursue other directions, as long as you can do so in a meaningful way with one of these two datasets. See the guidelines above and the “writing tips” section of the course web page for other suggestions about how to approach this assignment.
Due Friday October 16: Your client is a journalist from ProPublica who wants to write an article on machine bias and sex. She believes that women unfairly receive higher violence-specific COMPAS scores than men, due to gender bias. She would like to know if the data back up her thesis. She has hired you as a statistical consultant for 10-15 hours of work to analyze the data as a coauthor.

Keep in mind that your client is a very accomplished journalist but does not have much math or statistics background, so it will be important to explain concepts and findings clearly and concisely, without statistical jargon. She would like you to provide her with a 1.5 to 2 page memo on your findings. She would also like to use one or two data visualization graphics to communicate key concepts to her readers. Graphics do not count toward your page limit. Each graphic should include a one- or two-sentence description for the article. Including a table with a description is optional, but if it communicates a finding more effectively than words, she would like to have it for the article.

The raw data can be found here: https://raw.githubusercontent.com/propublica/compas-analysis/master/cox-violent-parsed.csv

If using R, the data can be read in directly, without first downloading the file, using this command: data <- read.csv(url('https://raw.githubusercontent.com/propublica/compas-analysis/master/cox-violent-parsed.csv'))
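For those working in Python rather than R, the same direct read can be sketched as follows. This is a minimal sketch using only the standard library; the helper names `parse_csv` and `load_compas` are hypothetical, the UTF-8 encoding is an assumption, and no column names are assumed.

```python
import csv
import io
import urllib.request

COMPAS_URL = (
    "https://raw.githubusercontent.com/propublica/"
    "compas-analysis/master/cox-violent-parsed.csv"
)


def parse_csv(text_stream):
    """Parse comma-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream))


def load_compas(url=COMPAS_URL):
    """Read the CSV straight from the URL without saving a local copy."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the file is UTF-8 encoded text.
        text_stream = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return parse_csv(text_stream)
```

Calling `rows = load_compas()` should return one dict per record, with every value stored as a string, so numeric columns need to be converted before modeling.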
Due Sunday October 30: You have the choice to work with either the actigraphy data or the internet data for this assignment. You should choose one of these two datasets and conduct an analysis that is focused on a specific question of your choosing. You should use at least some of the time series methods that we discussed in class. There are many directions to consider for this analysis, so you should go beyond what we have already discussed in class in some way. We are providing additional days of data for the internet study, and additional data about the subjects from the actigraphy study. See the course GitHub site for more information. See this link for access to some of the data.
The Ann Arbor Public Schools Board of Education (AAPS) would like to understand the effect that student absences have on performance in mathematics. As a preliminary analysis, they would like to examine the Student Performance data set from the UCI Machine Learning Repository on student secondary educational achievement. This work will be used to inform further research. AAPS would like you to analyze these data to assess the impact of three or more absences versus fewer than three on the final math grade. Additionally, they would like to identify student attributes that possibly contribute to absences.

The data and data dictionary can be found here under ‘student.zip’.

Because absences occur both before and after the first and second period grades are assigned, these two grades should not be included with the pre-treatment variables.

For this assignment, please include with your write-up a publication-quality table that shows basic statistics for all covariates, similar to the one shown in the lecture notes.

Note: This is a semi-colon delimited file.
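Because the file is semicolon-delimited, the delimiter has to be set explicitly when reading it. The sketch below, in Python with only the standard library, reads such a file, codes the treatment as three or more absences versus fewer than three, and leaves the period grades out of the covariates. The column names `absences`, `G1`, `G2`, and `G3` are assumptions to check against the data dictionary, the helper names are hypothetical, and the two sample rows are made up purely to show the mechanics.

```python
import csv
import io

# Assumed column names; verify them against the data dictionary.
ABSENCES_COL = "absences"
PERIOD_GRADE_COLS = ("G1", "G2")
OUTCOME_COL = "G3"


def read_semicolon_csv(text_stream):
    """Parse semicolon-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream, delimiter=";"))


def prepare_row(row, threshold=3):
    """Split one record into treatment, outcome, and pre-treatment covariates."""
    # The period grades are excluded because absences occur before and after them.
    excluded = {ABSENCES_COL, OUTCOME_COL, *PERIOD_GRADE_COLS}
    return {
        "treated": int(int(row[ABSENCES_COL]) >= threshold),
        "outcome": int(row[OUTCOME_COL]),
        "covariates": {k: v for k, v in row.items() if k not in excluded},
    }


# Made-up sample, not real data.
sample = io.StringIO(
    "school;age;absences;G1;G2;G3\n"
    "A;17;2;10;11;12\n"
    "B;16;5;8;9;7\n"
)
prepared = [prepare_row(r) for r in read_semicolon_csv(sample)]
print(prepared)
```

With the real data, the in-memory sample would be replaced by an opened file, for example `with open(path, newline="") as f: rows = read_semicolon_csv(f)`, where `path` points to the math file extracted from the zip archive.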
Due Wednesday, December 2: Conduct an analysis of the NHANES, RLMS, or CHNS (China Health and Nutrition Study) data using modern regression approaches for data that are multilevel, correlated, and/or have a non-additive functional structure. As with all other writing in this course, you should define a specific question and aim to address it primarily using methods discussed in class. You should not address the same questions that we discussed using these data during the lectures (i.e. blood pressure and anthropometry in NHANES, and sex differences in income in the RLMS). Information about obtaining and basic preparation of the data is available on the course GitHub site.