Commit f3586e3

Adding module 2 and recitation 7 data distributions solutions
1 parent 93f7440 commit f3586e3

File tree

86 files changed

+4476
-529
lines changed

_quarto.yml

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -181,8 +181,8 @@ website:
181181
contents:
182182
- section: Module assignment solutions
183183
contents:
184-
# - href: assignments/modules/module2/module_2_solutions.qmd
185-
# text: Module 2 solutions
184+
- href: assignments/modules/module2/module_2_solutions.qmd
185+
text: Module 2 solutions
186186
# - href: assignments/modules/module3/module_3_solutions.qmd
187187
# text: Module 3 solutions
188188
# - href: assignments/modules/module4/module_4_solutions.qmd
@@ -198,8 +198,8 @@ website:
198198
text: Week 5 - ggplot101 solutions
199199
- href: modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.qmd
200200
text: Week 6 - ggplot102 solutions
201-
# - href: modules/module3/07_distributions/07_distributions_recitation_solutions.qmd
202-
# text: Week 7 - Data distributions solutions
201+
- href: modules/module3/07_distributions/07_distributions_recitation_solutions.qmd
202+
text: Week 7 - Data distributions solutions
203203
# - href: modules/module3/08_correlations/08_correlations_recitation_solutions.qmd
204204
# text: Week 8 - Correlations solutions
205205
# - href: modules/module3/09_add-stats/09_add-stats_recitation_solutions.qmd
Lines changed: 300 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,300 @@
1+
---
2+
title: "Module 2 Assignment Solutions"
3+
author: "Jessica Cooperstone"
4+
format:
5+
html:
6+
toc: true
7+
toc-depth: 4
8+
---
9+
10+
```{r setup, include=FALSE}
11+
knitr::opts_chunk$set(echo = TRUE)
12+
```
13+
14+
## Introduction
15+
This is your assignment for Module 2, focused on the material you learned in the lectures and recitation activities on RMarkdown, wrangling, ggplot101, and ggplot102.
16+
17+
You will submit this assignment by uploading a knitted .html to Carmen. Make sure you include the Code Download button, and please show your code within your knitted .html as well. Customize the YAML and the document so you like how it looks.
18+
19+
Remember there are often many ways to reach the same end product.
20+
21+
> This assignment will be due on Tuesday, October 1, 2024, at 11:59pm.
22+
23+
### Data
24+
The [data](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-19) we will be using is collected by the National Science Foundation about the fields and number of Ph.D. degrees awarded each year.
25+
```{r, message = FALSE, warning = FALSE}
26+
phd_field <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-19/phd_by_field.csv")
27+
```
28+
29+
Take a look at the data collected by NSF on which fields award PhDs each year, and how many are awarded.
30+
31+
## Including the code download button
32+
33+
To get a code download button, you need to ask for one in your YAML by setting `code_download: true`. Below is a YAML header that you can copy and edit.
34+
35+
```{r, eval = FALSE}
36+
---
37+
title: "Module 2 assignment"
38+
author: "Jessica Cooperstone"
39+
date: "October 1, 2024"
40+
output:
41+
html_document: # knit to a .html doc
42+
toc: true # creates a table of contents
43+
toc_float: true # has that TOC float so you can see it even when you scroll
44+
number_sections: true # numbers your sections
45+
theme: flatly # set a global theme, this is what i use for this site
46+
code_download: true # insert the code download button
47+
---
48+
```
49+
50+
## Writing in Markdown 1
51+
52+
Using inline code, write a sentence in markdown that pulls from this data how many total PhDs were awarded in 2017. If you want to make some calculations in a code chunk first, that is ok.
53+
54+
```{r, warning = FALSE, message = FALSE}
55+
library(tidyverse)
56+
library(scales) # for using comma format
57+
```
58+
59+
First, save the total number of Ph.D.s awarded in 2017 as an object.
60+
```{r}
61+
phds_2017 <- phd_field |>
62+
filter(year == 2017) |>
63+
select(n_phds) |>
64+
colSums(na.rm = TRUE) # calculate a column sum
65+
```
66+
67+
You could also do it this way:
68+
```{r}
69+
phd_field |>
70+
filter(year == 2017) |>
71+
summarize(total_phds = sum(n_phds, # sum n_phds
72+
na.rm = TRUE)) # remove missing values
73+
```
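A third option, sketched here on a small made-up tibble rather than the NSF data, is to `pull()` the column into a plain vector and call `sum()` on it. This gives a single unnamed number that can go straight into inline code. The object `toy` and its values are hypothetical and only there to make the example self-contained.

```r
library(dplyr)

# toy stand-in for phd_field (hypothetical values, not the NSF data)
toy <- tibble(
  year   = c(2016, 2017, 2017, 2017),
  n_phds = c(10, 1200, NA, 345)
)

total_2017 <- toy |>
  filter(year == 2017) |>  # keep only the 2017 rows
  pull(n_phds) |>          # extract the column as a numeric vector
  sum(na.rm = TRUE)        # add it up, ignoring missing values

# format with a thousands separator, as you would inside inline code
format(total_2017, big.mark = ",")
```

With the real data you would swap `toy` for `phd_field` and keep the rest of the pipeline the same.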
74+
75+
How to write in Markdown:
76+
77+
In 2017, there were `` `r '\x60r format(phds_2017, scientific = F, big.mark = ",")\x60'` `` Ph.D. degrees awarded in the United States.
78+
79+
In 2017, there were `` `r '\x60r phd_field |> filter(year == 2017) |> select(n_phds) |> colSums(na.rm = TRUE) |> format(scientific = F, big.mark = ",")\x60'` `` Ph.D. degrees awarded in the United States.
80+
81+
How this will look when rendered to html:
82+
83+
In 2017, there were `r format(phds_2017, scientific = F, big.mark = ",")` Ph.D. degrees awarded in the United States.
84+
85+
In 2017, there were `r phd_field |> filter(year == 2017) |> select(n_phds) |> colSums(na.rm = TRUE) |> format(scientific = F, big.mark = ",")` Ph.D. degrees awarded in the United States.
86+
87+
## Visualization 1
88+
Make a chart to visualize the total number of PhDs awarded for each `broad_field` across the total time period of this data. You pick the type of chart that you think is appropriate, and make sure your plot is appropriately labelled and you are happy with how it looks. Hint: to do this you'll probably have to do some data wrangling first.
89+
90+
91+
First I will calculate the total number of Ph.D.s awarded across each `broad_field` for the whole time period.
92+
```{r}
93+
broad_field_sum <- phd_field |>
94+
group_by(broad_field) |>
95+
summarize(broad_field_sum = sum(n_phds, na.rm = TRUE)) |>
96+
arrange(-broad_field_sum)
97+
98+
broad_field_sum
99+
```
100+
101+
This summary helps me know what to expect to see in my plots. Now I can create some plots.
102+
103+
The first one will just show the total number of Ph.D.s across each `broad_field`. I decided to put the `broad_field` on the y-axis so that the labels are easier to read. I've also re-ordered `broad_field` based on the total number of PhDs for each category (i.e., `broad_field_sum`). I chose to use no colors because I didn't feel like it was adding much here.
104+
```{r}
105+
broad_field_sum |>
106+
ggplot(aes(x = broad_field_sum, y = fct_reorder(broad_field, broad_field_sum))) +
107+
geom_col(color = "black", fill = "grey") +
108+
scale_x_continuous(labels = comma) + # add a comma to the x-axis breaks
109+
theme_minimal() +
110+
labs(x = "Total number of Ph.D.s",
111+
y = "", # no label on the y
112+
title = "Number of PhDs awarded across different \nbroad disciplines from 2008-2017",
113+
caption = "Data collected by the National Science Foundation")
114+
```
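To see what the re-ordering does on its own, here is a minimal sketch using made-up fields and totals (the names `field`, `total`, and `reordered` are hypothetical and not part of the assignment data). `fct_reorder()` sorts the factor levels by the second argument in ascending order, and ggplot2 draws the first level at the bottom of a discrete y-axis, so the largest category ends up at the top of the bar chart.

```r
library(forcats)

# hypothetical categories and totals, just for illustration
field <- factor(c("A", "B", "C"))
total <- c(30, 10, 20)

# the default level order is alphabetical
levels(field)

# re-order the levels by total, smallest first
reordered <- fct_reorder(field, total)
levels(reordered)
```

If you would rather have the largest category first, `fct_reorder()` also takes `.desc = TRUE`.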
115+
116+
The following few examples weren't exactly the question (which was to show total PhDs across each `broad_field` across the total time period of this data), but I'll show you here how to make some other stuff.
117+
```{r}
118+
broad_field_sumonly_eachyear <- phd_field |>
119+
group_by(year) |>
120+
summarize(all_the_phds = sum(n_phds, na.rm = TRUE))
121+
122+
broad_field_sumonly_eachyear
123+
```
124+
125+
Then we can plot.
126+
```{r}
127+
broad_field_sumonly_eachyear |>
128+
ggplot(aes(x = year, y = all_the_phds)) +
129+
geom_col(color = "black", fill = "grey") +
130+
scale_y_continuous(labels = comma) + # add a comma to the y-axis breaks
131+
scale_x_continuous(breaks = seq(2008, 2017, 1)) +
132+
theme_minimal() +
133+
labs(x = "Year",
134+
y = "Number of Ph.D.s",
135+
title = "Total number of Ph.D.s awarded in the United States per year from 2008-2017",
136+
caption = "Data collected by the National Science Foundation")
137+
```
138+
139+
This wasn't exactly the question either, but to see how each broad field changes over the time period, we first need a data frame with the total PhDs per broad field per year. We can get this by adding `year` to our `group_by()` statement.
140+
```{r}
141+
broad_field_sum_byyear <- phd_field |>
142+
group_by(broad_field, year) |>
143+
summarize(total_phds = sum(n_phds, na.rm = TRUE))
144+
145+
# what does that look like
146+
head(broad_field_sum_byyear)
147+
```
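One side note on summarizing by two grouping variables: by default `summarize()` drops only the last grouping level, so the result is still grouped by the first variable, and dplyr prints a message saying so. The sketch below uses a made-up tibble (`toy` and its values are hypothetical) to show how `.groups = "drop"` returns an ungrouped data frame instead. It illustrates the option and is not a change to the solution above.

```r
library(dplyr)

# toy stand-in for phd_field (hypothetical values, not the NSF data)
toy <- tibble(
  broad_field = c("A", "A", "B", "B"),
  year        = c(2008, 2008, 2008, 2009),
  n_phds      = c(1, 2, NA, 4)
)

by_field_year <- toy |>
  group_by(broad_field, year) |>
  summarize(total_phds = sum(n_phds, na.rm = TRUE),
            .groups = "drop") # return an ungrouped data frame

by_field_year

# confirm the grouping is gone
is_grouped_df(by_field_year)
```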
148+
149+
Now we can make different types of plots. Let's start simple. Here I've re-ordered the legend to be in the same order as the lines in the chart, so you can see the disciplines that award the most Ph.D.s just by looking at the order of the legend.
150+
151+
```{r}
152+
broad_field_sum_byyear |>
153+
ggplot(aes(x = year, y = total_phds,
154+
# reorder broad_field by descending total_phds
155+
color = reorder(broad_field, -total_phds))) +
156+
geom_line() +
157+
geom_point() +
158+
scale_x_continuous(breaks = seq(2008, 2017, 2)) +
159+
scale_y_continuous(labels = comma) +
160+
scale_color_brewer(palette = "Set1") +
161+
theme_bw() +
162+
labs(x = "Year",
163+
y = "Total number of Ph.D.s",
164+
color = "Broad Field",
165+
title = "Number of Ph.D.s awarded by discipline in the United States",
166+
subtitle = "From 2008-2017",
167+
caption = "Data collected by the National Science Foundation")
168+
```
169+
170+
Could also try facets.
171+
```{r}
172+
# re-setting factors of broad_field so that it is re-levelled by the discipline with
173+
# the most phds
174+
# you could also do this manually
175+
broad_field_sum_byyear$broad_field <- fct_reorder(broad_field_sum_byyear$broad_field,
176+
-broad_field_sum_byyear$total_phds)
177+
178+
broad_field_sum_byyear |>
179+
ggplot(aes(x = year, y = total_phds, color = broad_field)) +
180+
geom_line() +
181+
geom_point() +
182+
scale_x_continuous(breaks = seq(2008, 2017, 2)) +
183+
scale_y_continuous(labels = comma) +
184+
facet_wrap(vars(broad_field)) +
185+
theme_bw() +
186+
theme(legend.position = "none") +
187+
labs(x = "Year",
188+
y = "Total number of Ph.D.s",
189+
color = "Broad Field",
190+
title = "Number of Ph.D.s awarded by discipline in the United States",
191+
subtitle = "From 2008-2017",
192+
caption = "Data collected by the National Science Foundation")
193+
```
194+
195+
We can also try with "free_y" axes. You can see the numbers within each discipline better, but can't as easily compare between the disciplines.
196+
```{r}
197+
broad_field_sum_byyear |>
198+
ggplot(aes(x = year, y = total_phds, color = broad_field)) +
199+
geom_line() +
200+
geom_point() +
201+
scale_x_continuous(breaks = seq(2008, 2017, 2)) +
202+
scale_y_continuous(labels = comma) +
203+
facet_wrap(vars(broad_field), scales = "free_y") +
204+
theme_bw() +
205+
theme(legend.position = "none") +
206+
labs(x = "Year",
207+
y = "Total number of Ph.D.s",
208+
color = "Broad Field",
209+
title = "Number of Ph.D.s awarded by discipline in the United States",
210+
subtitle = "From 2008-2017",
211+
caption = "Data collected by the National Science Foundation")
212+
```
213+
214+
215+
## Visualization 2
216+
217+
Pick the `field` that most closely matches the area of your degree. Make a line graph (with points for each datapoint) that shows how the number of PhDs awarded in your `field` has changed from 2008 to 2017. Make sure your x-axis indicates each year for which you have data, your graph is appropriately labelled, and you think it is aesthetically pleasing.
218+
219+
```{r}
220+
# what are the options for broad_field?
221+
phd_field |>
222+
select(broad_field) |>
223+
unique()
224+
225+
# what are the options for life sciences?
226+
phd_field |>
227+
filter(broad_field == "Life sciences") |>
228+
select(major_field) |>
229+
unique()
230+
231+
# what are the options for Agricultural sciences and natural resources?
232+
phd_field |>
233+
filter(major_field == "Agricultural sciences and natural resources") |>
234+
select(field) |>
235+
unique()
236+
```
237+
238+
I'm picking "Horticulture science" as my field.
239+
```{r}
240+
phd_field |>
241+
filter(field == "Horticulture science") |>
242+
ggplot(aes(x = year, y = n_phds)) +
243+
geom_point() +
244+
geom_line() +
245+
scale_x_continuous(breaks = seq(2008, 2017, 1)) +
246+
theme_minimal() +
247+
labs(x = "Year",
248+
y = "Number of Ph.D. Degrees Awarded",
249+
title = "Number of Ph.D. Degrees Awarded in Horticulture Science in the United States",
250+
subtitle = "Data collected by the National Science Foundation")
251+
```
252+
253+
## Visualization 3
254+
255+
Pick at least 3 additional fields (you can use more if you like) that are adjacent to your Ph.D. field. Make a faceted plot to show the number of degrees awarded in each of these disciplines across the same time period. Make sure you label your plot appropriately and you think it is aesthetic (i.e., if you have squished strip text you want to fix that).
256+
257+
First I am making a vector of the fields in plant science.
258+
```{r}
259+
plant_sci <- c("Agricultural and horticultural plant breeding",
260+
"Agronomy and crop science",
261+
"Horticulture science",
262+
"Plant pathology and phytopathology, agricultural",
263+
"Plant sciences, other")
264+
```
265+
266+
Then I can plot. `\n` inserts a line break in the title, and `label_wrap_gen()` wraps long strip labels.
267+
```{r}
268+
phd_field |>
269+
filter(field %in% plant_sci) |>
270+
ggplot(aes(x = year, y = n_phds)) +
271+
geom_point() +
272+
geom_line() +
273+
scale_x_continuous(breaks = seq(2008, 2017, 1)) +
274+
facet_wrap(vars(field), labeller = label_wrap_gen()) + # wraps labels
275+
theme_classic() +
276+
theme(axis.text.x = element_text(angle = 90)) +
277+
labs(x = "Year",
278+
y = "Number of Ph.D. Degrees Awarded",
279+
title = "Number of Ph.D. Degrees Awarded in Plant Science Subdisciplines \nin the United States",
280+
subtitle = "Data collected by the National Science Foundation")
281+
```
282+
283+
Or I can make the same style of plot but with bars instead of lines and points.
284+
285+
```{r}
286+
phd_field |>
287+
filter(field %in% plant_sci) |>
288+
ggplot(aes(x = year, y = n_phds)) +
289+
geom_col() +
290+
scale_x_continuous(breaks = seq(2008, 2017, 1)) +
291+
facet_wrap(vars(field), labeller = label_wrap_gen()) + # wraps labels
292+
theme_classic() +
293+
theme(axis.text.x = element_text(angle = 90)) +
294+
labs(x = "Year",
295+
y = "Number of Ph.D. Degrees Awarded",
296+
title = "Number of Ph.D. Degrees Awarded in Plant Science Subdisciplines \nin the United States",
297+
subtitle = "Data collected by the National Science Foundation")
298+
```
299+
300+

docs/about.html

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -316,6 +316,10 @@
316316
</a>
317317
<ul class="dropdown-menu" aria-labelledby="nav-menu-solutions">
318318
<li class="dropdown-header">Module assignment solutions</li>
319+
<li>
320+
<a class="dropdown-item" href="./assignments/modules/module2/module_2_solutions.html">
321+
<span class="dropdown-text">Module 2 solutions</span></a>
322+
</li>
319323
<li><hr class="dropdown-divider"></li>
320324
<li class="dropdown-header">Recitation solutions</li>
321325
<li>
@@ -333,6 +337,10 @@
333337
<li>
334338
<a class="dropdown-item" href="./modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.html">
335339
<span class="dropdown-text">Week 6 - ggplot102 solutions</span></a>
340+
</li>
341+
<li>
342+
<a class="dropdown-item" href="./modules/module3/07_distributions/07_distributions_recitation_solutions.html">
343+
<span class="dropdown-text">Week 7 - Data distributions solutions</span></a>
336344
</li>
337345
</ul>
338346
</li>

docs/assignments/capstone.html

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -282,6 +282,10 @@
282282
</a>
283283
<ul class="dropdown-menu" aria-labelledby="nav-menu-solutions">
284284
<li class="dropdown-header">Module assignment solutions</li>
285+
<li>
286+
<a class="dropdown-item" href="../assignments/modules/module2/module_2_solutions.html">
287+
<span class="dropdown-text">Module 2 solutions</span></a>
288+
</li>
285289
<li><hr class="dropdown-divider"></li>
286290
<li class="dropdown-header">Recitation solutions</li>
287291
<li>
@@ -299,6 +303,10 @@
299303
<li>
300304
<a class="dropdown-item" href="../modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.html">
301305
<span class="dropdown-text">Week 6 - ggplot102 solutions</span></a>
306+
</li>
307+
<li>
308+
<a class="dropdown-item" href="../modules/module3/07_distributions/07_distributions_recitation_solutions.html">
309+
<span class="dropdown-text">Week 7 - Data distributions solutions</span></a>
302310
</li>
303311
</ul>
304312
</li>

docs/assignments/modules/module1/module_1_assignment.html

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -282,6 +282,10 @@
282282
</a>
283283
<ul class="dropdown-menu" aria-labelledby="nav-menu-solutions">
284284
<li class="dropdown-header">Module assignment solutions</li>
285+
<li>
286+
<a class="dropdown-item" href="../../../assignments/modules/module2/module_2_solutions.html">
287+
<span class="dropdown-text">Module 2 solutions</span></a>
288+
</li>
285289
<li><hr class="dropdown-divider"></li>
286290
<li class="dropdown-header">Recitation solutions</li>
287291
<li>
@@ -299,6 +303,10 @@
299303
<li>
300304
<a class="dropdown-item" href="../../../modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.html">
301305
<span class="dropdown-text">Week 6 - ggplot102 solutions</span></a>
306+
</li>
307+
<li>
308+
<a class="dropdown-item" href="../../../modules/module3/07_distributions/07_distributions_recitation_solutions.html">
309+
<span class="dropdown-text">Week 7 - Data distributions solutions</span></a>
302310
</li>
303311
</ul>
304312
</li>
