Lab 6

By the end of this lab you will create formulate and conduct hypothesis tests using randomization.

Getting started

  1. Use these steps to navigate to the STA101 version of RStudio using the Duke Container Manager;

  2. Use these steps to download all of the lab 6 files from our Canvas page, upload them to your RStudio files, and move them into an appropriately named folder (lab-6, for instance);

  3. Once lab-6.qmd is where it needs to be, open it, and verify that you can click the “Render” button in RStudio and get a PDF file. See this answer on Ed if you want more guidance here;

  4. Now proceed to complete the exercises in this lab.

Pointers

  • For all visualizations you create, be sure to include informative titles for the plot, axes, and legend;
  • Respond in complete sentences as much as possible;
  • Be sure to observe good code style:
    • There is a line break after each |> in a pipeline or + in a ggplot;
    • There are spaces around = signs;
    • There is a space after each ,;
    • Code is properly indented;
    • Code doesn’t exceed 80 characters in each line, longer lines of code are spread across multiple lines with appropriately placed line breaks (so in the rendered PDF, your code shouldn’t run off the page);
    • Code chunks are labeled, informatively and without spaces.

Packages

Part 1: Heart transplants - outcome

Today, we will be working with data from The Stanford University Heart Transplant Study (Turnbull, Brown, and Hu 1974):

heart_transplant <- read_csv("data/heart-transplant.csv")
glimpse(heart_transplant)
Rows: 103
Columns: 8
$ id         <dbl> 15, 43, 61, 75, 6, 42, 54, 38, 85, 2, 103, 12, 48, 102, 35,…
$ outcome    <chr> "deceased", "deceased", "deceased", "deceased", "deceased",…
$ transplant <chr> "control", "control", "control", "control", "control", "con…
$ age        <dbl> 53, 43, 52, 52, 54, 36, 47, 41, 47, 51, 39, 53, 56, 40, 43,…
$ survtime   <dbl> 1, 2, 2, 2, 3, 3, 3, 5, 5, 6, 6, 8, 9, 11, 12, 16, 16, 16, …
$ acceptyear <dbl> 68, 70, 71, 72, 68, 70, 71, 70, 73, 68, 67, 68, 71, 74, 70,…
$ prior      <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no",…
$ wait       <dbl> NA, NA, NA, NA, NA, NA, NA, 5, NA, NA, NA, NA, NA, NA, NA, …

The data dictionary is as follows:

Variable Description
id ID number of the patient
outcome Survival status with levels alive and deceased
transplant Transplant group with levels control (did not receive a transplant) and treatment (received a transplant)
age Age of the patient at the beginning of the study
survtime Number of days patients were alive after the date they were determined to be a candidate for a heart transplant until the termination date of the study
acceptyear Year of acceptance as a heart transplant candidate
prior Whether or not the patient had prior surgery with levels yes and no
wait Waiting time for transplant

Exercise 1

IMS - Chapter 11 exercises, #8: Heart transplants.

Exercise 2

Recreate the stacked bar plot in Exercise 1.

Hint

The colors are #569BBD and #114B5F. Use scale_fill_manual() to indicate these are the colors you want to use.

Pause and render

Now that you’ve completed a few exercises, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask for help from your TA if you need. Do not proceed without rendering your document.

Exercise 3

Calculate the point estimate for the difference in proportions of patients who died between the treatment and control groups: \(\hat{p}_{treatment} - \hat{p}_{control}\), where \(\hat{p}\) is the observed probability of dying in each group. Do this in two ways:

  1. Construct a frequency table of outcome by transplant and calculate the difference in proportions “by hand”.
  2. Use specify() and calculate() to calculate the difference in proportions as shown below.
obs_diff_outcome <- heart_transplant |>
  specify(response = outcome, explanatory = transplant, success = "deceased") |>
  calculate(stat = "diff in props", order = c("treatment", "control"))

Then, confirm that you obtained the same results with the two methods.

Exercise 4

Using the code below, generate a null distribution for the hypothesis test for the hypothesis you formulated in Exercise 1. Use 100 resamples (reps). However, change the seed to use a different seed.

set.seed(1234)

null_dist_outcome <- heart_transplant |>
  specify(response = outcome, explanatory = transplant, success = "deceased") |>
  hypothesize(null = "independence") |>
  generate(reps = 100, type = "permute") |>
  calculate(stat = "diff in props", order = c("treatment", "control"))

Then, inspect the object null_dist_outcome using either glimpse() or just printing it to screen. How many rows and how many columns does it have? What does each row represent? What does each variable represent?

Exercise 5

Visualize the distribution of simulated differences in proportions of deceased between treatment and control groups. What is the shape of this distribution? What is the center of this distribution? Are these results expected? Explain your reasoning.

Hint

These simulated values are in the stat column of null_dist_outcome.

Exercise 6

Calculate the p-value using the get_p_value function. Interpret the p-value in context of the data and the research question and comment on the statistical discernibility of this result using a discernibility level of 5%.

Pause and render

Now that you’ve completed a few exercises, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask for help from your TA if you need. Do not proceed without rendering your document.

Part 2: Heart transplants - survtime

Exercise 7

Recreate the side-by–side box plots in Exercise 1. The variable of interest is survtime, which is the number of days patients were alive after the date they were determined to be a candidate for a heart transplant until the termination date of the study.

Exercise 8

Do these data provide convincing evidence of a difference between average survival times of patient who did and did not get a heart transplant?

Write the hypotheses for answering this question and conduct the hypothesis test using randomization. Use 1,000 resamples. In order to avoid name collision, use object names like obs_diff_survtime, null_dist_survtime, and p_value_survtime. Find and interpret the p-value in context of the data and the research question and comment on the statistical discernability of this result using a discernability level of 5%.

Part 3: More IMS Exercises

The exercises in this section do not require code. Make sure to answer the questions in full sentences.

Exercise 9

IMS - Chapter 11 exercises, #2: Identify the parameter, II.

Exercise 10

IMS - Chapter 11 exercises, #6: Identify hypotheses, II.

Lastly

Recommend some music for us to listen to while we grade this.

Wrap up

Submitting

Important

Before you proceed, first, make sure that you have updated the document YAML with your name! Then, render your document one last time, for good measure.

To submit your assignment to Gradescope:

  • Go to your Files pane and check the box next to the PDF output of your document (lab-6.pdf).

  • Then, in the Files pane, go to More > Export. This will download the PDF file to your computer. Save it somewhere you can easily locate, e.g., your Downloads folder or your Desktop.

  • Go to the course Canvas page and click on Gradescope and then click on the assignment. You’ll be prompted to submit it.

  • Mark the pages associated with each exercise. All of the papers of your lab should be associated with at least one question (i.e., should be “checked”).

Warning

If you fail to mark the pages associated with an exercise, that exercise won’t be graded. This means, if you fail to mark the pages for all exercises, you will receive a 0 on the assignment. The TAs can’t mark your pages for you, and for them to be able to grade, you must mark them.

Grading

Exercise Points
Exercise 1 6
Exercise 2 4
Exercise 3 4
Exercise 4 6
Exercise 5 5
Exercise 6 4
Exercise 7 5
Exercise 8 7
Exercise 9 5
Exercise 10 4
Total 50

References

Turnbull, Bruce W., Byron Wm. Brown, and Marie Hu. 1974. “Survivorship Analysis of Heart Transplant Data.” Journal of the American Statistical Association 69 (345): 74–80. https://doi.org/10.1080/01621459.1974.10480130.