Piping

A pipe in R is a way of stitching together many commands to make your code easier for a human to read, and to keep it from running off the pages of the PDF documents you will submit to us.

A very silly example

These two code chunks do exactly the same thing:

sum(1, 2)
[1] 3

1 |>
  sum(2)
[1] 3

So the pipe operator |> passes (or "pipes") the number 1 into the sum function as its first argument, which means 1 |> sum(2) is evaluated as sum(1, 2). In a simple example like this, sum(1, 2) is definitely the way I would write the code, but as things get more elaborate, you will want to pipe.
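To see that the pipe is just a rewriting of an ordinary function call, here is a small sketch. It assumes R 4.1 or later (the version that introduced |>), and the vector x is made up purely for illustration. The same computation is written once nested and once piped, and quote() shows how R rewrites the piped call from the silly example above.

```r
# Hypothetical numbers, chosen only for illustration
x <- c(4, 9, 16)

# Nested: read from the inside out (sqrt first, then sum, then round)
nested <- round(sum(sqrt(x)), 1)

# Piped: read top to bottom; each result becomes the first argument of the next call
piped <- x |>
  sqrt() |>
  sum() |>
  round(1)

identical(nested, piped)
# [1] TRUE

# The pipe is rewritten when the code is parsed: 1 |> sum(2) becomes sum(1, 2)
quote(1 |> sum(2))
# sum(1, 2)
```

The piped version lists the steps in the order they actually happen, which is exactly why it scales better than nesting once you chain more than two or three functions.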

Tallying stuff up in a spreadsheet

Let’s consider the COVID data set from class on 9/3/2024:

library(tidyverse)

delta <- read_csv("delta.csv")

Note: As before, be prepared to adjust the file path in read_csv() to match how you have organized your files and folders.

Each row in this data set is a person. We record whether that person died from/with COVID (the outcome column) and whether they were vaccinated (the vaccine column), so there are four categories in all:

  • unvaccinated and died
  • unvaccinated and survived
  • vaccinated and died
  • vaccinated and survived

The following code creates a nifty lil' table that tallies up the number of people in each group and, within each vaccination group, calculates the proportion of people who died versus survived:

delta |>
  count(vaccine, outcome) |>
  group_by(vaccine) |>
  mutate(prop = n / sum(n))
# A tibble: 4 × 4
# Groups:   vaccine [2]
  vaccine      outcome       n    prop
  <chr>        <chr>     <int>   <dbl>
1 Unvaccinated died        250 0.00166
2 Unvaccinated survived 150802 0.998  
3 Vaccinated   died        477 0.00407
4 Vaccinated   survived 116637 0.996  

So within each vaccination group, the values in the prop column sum to one. This code is equivalent, but it provides a truly horrific reading experience:

mutate(group_by(count(delta, vaccine, outcome), vaccine), prop = n / sum(n))
# A tibble: 4 × 4
# Groups:   vaccine [2]
  vaccine      outcome       n    prop
  <chr>        <chr>     <int>   <dbl>
1 Unvaccinated died        250 0.00166
2 Unvaccinated survived 150802 0.998  
3 Vaccinated   died        477 0.00407
4 Vaccinated   survived 116637 0.996  
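To see what each verb in the pipeline contributes, here is a sketch that runs the same three steps one at a time on a tiny data frame. The toy data is hypothetical and is NOT the delta.csv data; the sketch only assumes the dplyr package, which library(tidyverse) loads. The last lines show why group_by() matters: without it, sum(n) is the grand total, so the proportions sum to one over the whole table rather than within each vaccination group.

```r
library(dplyr)

# Hypothetical toy data (NOT delta.csv): one row per person
toy <- tibble(
  vaccine = c("Unvaccinated", "Unvaccinated", "Unvaccinated", "Unvaccinated",
              "Vaccinated", "Vaccinated"),
  outcome = c("died", "survived", "survived", "survived",
              "died", "survived")
)

# Step 1: one row per vaccine/outcome combination, with the tally in column n
step1 <- toy |> count(vaccine, outcome)

# Step 2: declare vaccine as the grouping variable (the rows themselves don't change)
step2 <- step1 |> group_by(vaccine)

# Step 3: sum(n) is now computed within each vaccine group,
# so prop adds up to 1 inside each group (0.25 + 0.75 and 0.5 + 0.5)
step3 <- step2 |> mutate(prop = n / sum(n))
step3

# For contrast: without group_by(), sum(n) is the grand total (6),
# so prop adds up to 1 over the whole table instead
ungrouped <- step1 |> mutate(prop = n / sum(n))
ungrouped
```

Saving intermediate results like step1 and step2 is a handy way to debug a long pipe: if the final table looks wrong, print each step and find where it first goes astray.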

Now you try

The COVID data contains another variable indicating whether or not the person was older or younger than fifty. Write some code (it will be very similar to the code above) that produces a table that breaks things down by vaccination status, outcome, and age. How many rows should this table contain?