Lab 7 - Inference for proportions and means

Getting started

Use these steps to navigate to the STA101 version of RStudio using the Duke Container Manager;
Use these steps to download all of the lab 7 files from our Canvas page, upload them to your RStudio files, and move them into an appropriately named folder (lab-7, for instance);
Once lab-7.qmd is where it needs to be, open it, and verify that you can click the “Render” button in RStudio and get a PDF file. See this answer on Ed if you want more guidance here;
Now proceed to complete the exercises in this lab.

Pointers

For all visualizations you create, be sure to include informative titles for the plot, axes, and legend;
Respond in complete sentences as much as possible;
Be sure to observe good code style:
- There is a line break after each |> in a pipeline or + in a ggplot;
- There are spaces around = signs;
- There is a space after each ,;
- Code is properly indented;
- Code doesn’t exceed 80 characters per line. Longer lines of code are spread across multiple lines with appropriately placed line breaks (so in the rendered PDF, your code shouldn’t run off the page);
- Code chunks are labeled, informatively and without spaces.
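
As a rough illustration of the style points above, here is a short sketch. It uses the mpg data set that ships with ggplot2, not the lab data, and the chunk label and object name (mileage-by-class, mpg_summary) are made up for this example.

```r
#| label: mileage-by-class

library(tidyverse)

# One pipeline step per line, spaces around = and after commas
mpg_summary <- mpg |>
  filter(year == 2008) |>
  group_by(class) |>
  summarize(mean_hwy = mean(hwy), n = n())

# One ggplot layer per line, with informative title and axis labels
ggplot(mpg_summary, aes(x = class, y = mean_hwy)) +
  geom_col() +
  labs(
    title = "Average highway mileage by vehicle class, 2008",
    x = "Vehicle class",
    y = "Average highway mileage (mpg)"
  )
```

The #| label: line is how a Quarto code chunk is labeled; in a plain R script it is just a comment.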

Packages

library(tidyverse) 
library(tidymodels)
library(openintro)
library(dsbox)

Part 1: General Social Survey

The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. The GSS has been conducted since 1972 by the National Opinion Research Center at the University of Chicago. The GSS sample is designed as a multistage stratified sample.

For the exercises in this part, we’ll use the gss16 data set from the dsbox package. You can find out more about the dataset by inspecting its documentation, which you can access by running ?gss16 in the Console or using the Help menu in RStudio to search for gss16. You can also find this information here.

In 2016, the GSS added a new question on harassment at work. The question is phrased as follows:

Over the past five years, have you been harassed by your superiors or co-workers at your job, for example, have you experienced any bullying, physical or psychological abuse?

Answers to this question are stored in the harass5 variable in the gss16 data set.

Exercise 1

Create a subset of the data that only contains Yes and No answers for the harassment question. How many respondents chose each of these answers?
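
If the filter-then-count pattern is unfamiliar, the sketch below shows it on a small made-up tibble. The names toy, answer, and toy_yes_no, and the response labels, are hypothetical; check the gss16 documentation for the actual levels of harass5.

```r
library(tidyverse)

# Made-up responses for illustration; the real variable may use other labels
toy <- tibble(
  answer = c("Yes", "No", "No", "Does not apply", NA, "No", "Yes")
)

# Keep only the rows whose answer is one of the two values of interest
toy_yes_no <- toy |>
  filter(answer %in% c("Yes", "No"))

# Count how many rows fall into each remaining answer
toy_yes_no |>
  count(answer)
```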

Exercise 2

Describe how bootstrapping can be used to estimate the proportion of all Americans who have been harassed by their superiors or co-workers at their job.

Exercise 3

a. Calculate a 95% bootstrap confidence interval for the proportion of Americans who have been harassed by their superiors or co-workers at their job. Use 1,000 iterations when creating your bootstrap distribution. Set the seed to 1234. Interpret this interval in the context of the data. Make sure to visualize this interval as well.
b. Where was your 95% confidence interval centered? Why does this make sense?
c. Now, calculate a 90% bootstrap confidence interval for the proportion of Americans who have been harassed by their superiors or co-workers at their job. Interpret this interval in the context of the data. Is it wider or narrower than the 95% confidence interval? Additionally, visualize the interval by layering it on the visualization (for the 95% confidence interval) from part (a). You can do this simply by adding another shading layer where the endpoints now come from the new 90% confidence interval. Use a different pair of colors for the color and fill arguments, e.g., darkorange4 and darkorange1.
d. Now, suppose you created a bootstrap distribution with 50,000 simulations instead of 1,000. What would you expect to change (if anything)? The center of your confidence interval? The width of your confidence interval? Do not run this simulation; just think about it conceptually.
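
The layering described above can be sketched with the infer functions loaded by tidymodels. This is only an illustration on made-up data: the tibble toy, its column answer, and the counts in it are hypothetical, and the seed and number of repetitions simply mirror the instructions.

```r
library(tidymodels)

# Made-up data for illustration only: 200 hypothetical yes/no responses
toy <- tibble(answer = c(rep("Yes", 60), rep("No", 140)))

set.seed(1234)
boot_dist <- toy |>
  specify(response = answer, success = "Yes") |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "prop")

# Percentile intervals at two confidence levels
ci_95 <- get_ci(boot_dist, level = 0.95)
ci_90 <- get_ci(boot_dist, level = 0.90)

# Layer both intervals on the same bootstrap distribution
visualize(boot_dist) +
  shade_ci(ci_95) +
  shade_ci(ci_90, color = "darkorange4", fill = "darkorange1") +
  labs(
    title = "Bootstrap distribution of a sample proportion",
    x = "Bootstrap sample proportion",
    y = "Count"
  )
```

Each shade_ci() call adds one shaded interval, so the second call draws the 90% interval on top of the first.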

Pause and render

Now that you’ve completed the first part of the exercises, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask your TA for help if you need it. Do not proceed without rendering your document.

Part 2: Course scores

Students in a large university course that is offered each semester average 32 (out of 50) on the midterm, with a standard deviation of 4. The scores are distributed normally.

Exercise 4

What is the probability a randomly selected student scored less than 28?
What is the probability a randomly selected student scored higher than 35?
What score did only 20 percent of the class exceed?
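
Questions of this kind map onto pnorm() and qnorm() in base R. The sketch below uses a different, hypothetical exam (mean 70, standard deviation 10, with made-up cutoffs) so that it shows the pattern without using the numbers from this exercise.

```r
# Hypothetical exam: scores are normal with mean 70 and SD 10
mu <- 70
sigma <- 10

# P(score < 60): area to the left of 60
p_below <- pnorm(60, mean = mu, sd = sigma)

# P(score > 85): area to the right of 85
p_above <- pnorm(85, mean = mu, sd = sigma, lower.tail = FALSE)

# Score exceeded by only 10% of students: the 90th percentile
cutoff <- qnorm(0.90, mean = mu, sd = sigma)
```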

Exercise 5

I pick 10 students at random from the class.

[Without looking at the next question] How does the standard error of their average score compare to the standard deviation of the population of all STA 101 students? Explain your reasoning.
The standard error of the mean can be calculated as \(SE = \frac{\sigma}{\sqrt{n}}\). What is the probability the average midterm score of these 10 students is less than 28?
What is the probability the average midterm score of these 10 students is higher than 35?
How do the probabilities you calculated in the previous two questions (less than 28 and higher than 35) compare to the probabilities you calculated in Exercise 4? Why?
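
To see how the standard error formula feeds into a probability calculation, here is a sketch with hypothetical numbers (mean 70, standard deviation 10, a sample of 25), not the values from this exercise. It assumes the sample mean is normally distributed, which holds when the individual scores are normal.

```r
# Hypothetical population: mean 70, SD 10; sample of 25 students
mu <- 70
sigma <- 10
n <- 25

# Standard error of the sample mean: SE = sigma / sqrt(n)
se <- sigma / sqrt(n)

# P(sample mean < 66), using the SE in place of the population SD
p_mean_below <- pnorm(66, mean = mu, sd = se)

# For comparison: P(a single score < 66)
p_single_below <- pnorm(66, mean = mu, sd = sigma)
```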

Pause and render

Now that you’ve completed the second part of the exercises, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask your TA for help if you need it. Do not proceed without rendering your document.

Part 3: IMS exercises

The exercises in this section do not require code. Make sure to answer the questions in full sentences.

Exercise 6

IMS - Chapter 16 exercises, #2: Married at 25.

Exercise 7

IMS - Chapter 16 exercises, #5: If I fits, I sits, bootstrap test.

Exercise 8

IMS - Chapter 16 exercises, #6: Legalization of marijuana, bootstrap test.

Exercise 9

IMS - Chapter 16 exercises, #8: Legalization of marijuana, standard errors.

Exercise 10

IMS - Chapter 16 exercises, #22: Legalization of marijuana, mathematical interval.

Lastly

Recommend some music for us to listen to while we grade this.

Wrap up

Submitting

Important

Before you proceed, first, make sure that you have updated the document YAML with your name! Then, render your document one last time, for good measure.

To submit your assignment to Gradescope:

Go to your Files pane and check the box next to the PDF output of your document (lab-7.pdf).
Then, in the Files pane, go to More > Export. This will download the PDF file to your computer. Save it somewhere you can easily locate, e.g., your Downloads folder or your Desktop.
Go to the course Canvas page and click on Gradescope and then click on the assignment. You’ll be prompted to submit it.
Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).

Warning

If you fail to mark the pages associated with an exercise, that exercise won’t be graded. This means that if you fail to mark the pages for all exercises, you will receive a 0 on the assignment. The TAs can’t mark your pages for you; you must mark them for your work to be graded.

Grading

The lab will be graded out of 50 total points.