jcooperstone
diff --git a/‎_quarto.yml‎
Lines changed: 4 additions & 4 deletions b/‎_quarto.yml‎
Lines changed: 4 additions & 4 deletions
diff --git a/‎assignments/modules/module2/module_2_solutions.qmd‎
Lines changed: 300 additions & 0 deletions b/‎assignments/modules/module2/module_2_solutions.qmd‎
Lines changed: 300 additions & 0 deletions
diff --git a/‎docs/about.html‎
Lines changed: 8 additions & 0 deletions b/‎docs/about.html‎
Lines changed: 8 additions & 0 deletions
diff --git a/‎docs/assignments/capstone.html‎
Lines changed: 8 additions & 0 deletions b/‎docs/assignments/capstone.html‎
Lines changed: 8 additions & 0 deletions
diff --git a/‎docs/assignments/modules/module1/module_1_assignment.html‎
Lines changed: 8 additions & 0 deletions b/‎docs/assignments/modules/module1/module_1_assignment.html‎
Lines changed: 8 additions & 0 deletions
@@ -181,8 +181,8 @@ website:
       contents:
         - section: Module assignment solutions
           contents:
-            # - href: assignments/modules/module2/module_2_solutions.qmd
-            #   text: Module 2 solutions
+            - href: assignments/modules/module2/module_2_solutions.qmd
+              text: Module 2 solutions
             # - href: assignments/modules/module3/module_3_solutions.qmd
             #   text: Module 3 solutions
             # - href: assignments/modules/module4/module_4_solutions.qmd
@@ -198,8 +198,8 @@ website:
               text: Week 5 - ggplot101 solutions
             - href: modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.qmd
               text: Week 6 - ggplot102 solutions
-            # - href: modules/module3/07_distributions/07_distributions_recitation_solutions.qmd
-            #   text: Week 7 - Data distributions solutions
+            - href: modules/module3/07_distributions/07_distributions_recitation_solutions.qmd
+              text: Week 7 - Data distributions solutions
             # - href: modules/module3/08_correlations/08_correlations_recitation_solutions.qmd
             #   text: Week 8 - Correlations solutions
             # - href: modules/module3/09_add-stats/09_add-stats_recitation_solutions.qmd
 
@@ -0,0 +1,300 @@
+---
+title: "Module 2 Assignment Solutions"
+author: "Jessica Cooperstone"
+format:
+  html:
+    toc: true
+    toc-depth: 4
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+## Introduction
+This is your assignment for Module 2, focused on the material you learned in the lectures and recitation activities on RMarkdown, wrangling, ggplot101, and ggplot102. 
+
+You will submit this assignment by uploading a knitted .html to Carmen. Make sure you include the Code Download button, and please show your code within your knitted .html as well. Customize the YAML and the document so you like how it looks.
+
+Remember there are often many ways to reach the same end product.
+
+> This assignment will be due on Tuesday, October 1, 2024, at 11:59pm.
+
+### Data
+The [data](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-19) we will be using is collected by the National Science Foundation about the fields and number of Ph.D. degrees awarded each year.
+```{r, message = FALSE, warning = FALSE}
+phd_field <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-19/phd_by_field.csv")
+```
+
+Take a look at the data collected by NSF on how which fields give PhDs each year, and how many are awarded.
+
+## Including the code download button
+
+To get a code download button, you need to indicate that you want one in your YAML. You do this by setting `code_download: true` in your YAML. I'm clipping a YAML below which could be copied and edited.
+
+```{r, eval = FALSE}
+---
+title: "Module 2 assignment"
+author: "Jessica Cooperstone"
+date: "October 1, 2024"
+output: 
+  html_document: # knit to a .html doc
+    toc: true # creates a table of contents
+    toc_float: true # has that TOC float so you can see it even when you scroll
+    number_sections: true # numbers your sections
+    theme: flatly # set a global theme, this is what i use for this site
+    code_download: true # insert the code download button
+---
+```
+
+## Writing in Markdown 1
+
+Using coding in text, write a sentence in markdown that pulls from this data how many total PhDs were awarded in 2017. If you want to make some calculations in a code chunk first that is ok.
+
+```{r, warning = FALSE, message = FALSE}
+library(tidyverse)
+library(scales) # for using comma format 
+```
+
+Setting as an object the number for the total Ph.D.s earned in 2017.
+```{r}
+phds_2017 <- phd_field |>
+  filter(year == 2017) |>
+  select(n_phds) |>
+  colSums(na.rm = TRUE) # calculate a column sum
+```
+
+You could also do it this way:
+```{r}
+phd_field |>
+  filter(year == 2017) |>
+  summarize(total_phds = sum(n_phds, # sum n_phds
+                             na.rm = TRUE)) # remove missing values
+```
+
+How to write in Markdown:
+
+In 2017, there were were `` `r '\x60r format(phds_2017, scientific = F, big.mark = ",")\x60'` `` Ph.D. degrees awarded in the United States.
+
+In 2017, there were were `` `r '\x60r phd_field |> filter(year == 2017) |> select(n_phds) |> colSums(na.rm = TRUE) |> format(scientific = F, big.mark = ",")\x60'` `` Ph.D. degrees awarded in the United States.
+
+How this will look when rendered to html:
+
+In 2017, there were `r format(phds_2017, scientific = F, big.mark = ",")` Ph.D. degrees awarded in the United States.
+
+In 2017, there were `r phd_field |> filter(year == 2017) |> select(n_phds) |> colSums(na.rm = TRUE) |> format(scientific = F, big.mark = ",")` Ph.D. degrees awarded in the United States.
+
+## Visualization 1
+Make a chart to visualize of the total number of PhDs awarded for each `broad_field` across the total time period of this data. You pick the type of chart that you think is appropriate, and make sure your plot is appropriately labelled and you are happy with how it looks. Hint, to do this you'll probably have to do some data wrangling first.
+
+
+First I will calculate the total number of Ph.D.s awarded across each `broad_field` for the whole time period.
+```{r}
+broad_field_sum <- phd_field |>
+  group_by(broad_field) |>
+  summarize(broad_field_sum = sum(n_phds, na.rm = TRUE)) |>
+  arrange(-broad_field_sum)
+
+broad_field_sum
+```
+
+This summary helps me know what to expect to see in my plots. Now I can create some plots.
+
+The first one will just show the total number of Ph.D.s across each `broad_field`. I decided to put the `broad_field` on the y-axis so that the labels are easier to read. I've also re-ordered `broad_field` based on the total number of PhDs for each category (i.e., `broad_field_sum`). I chose to use no colors because I didn't feel like it was adding much here.
+```{r}
+broad_field_sum |>
+  ggplot(aes(x = broad_field_sum, y = fct_reorder(broad_field, broad_field_sum))) +
+  geom_col(color = "black", fill = "grey") +
+  scale_x_continuous(labels = comma) + # add a comma to the x-axis breaks
+  theme_minimal() +
+  labs(x = "Total number of Ph.D.s",
+       y = "", # no label on the y
+       title = "Number of PhDs awarded across different \nbroad disciplines from 2008-2017",
+       caption = "Data collected by the National Science Foundation")
+```
+
+The following few examples weren't exactly the question (which was to show total PhDs across each `broad_field` across the total time period of this data), but I'll show you here how to make some other stuff.
+```{r}
+broad_field_sumonly_eachyear <- phd_field |>
+  group_by(year) |>
+  summarize(all_the_phds = sum(n_phds, na.rm = TRUE)) 
+
+broad_field_sumonly_eachyear
+```
+
+Then we can plot.
+```{r}
+broad_field_sumonly_eachyear |>
+  ggplot(aes(x = year, y = all_the_phds)) +
+  geom_col(color = "black", fill = "grey") +
+  scale_y_continuous(labels = comma) + # add a comma to the x-axis breaks
+  scale_x_continuous(breaks = seq(2008, 2017, 1)) +
+  theme_minimal() +
+  labs(x = "Year",
+       y = "Number of Ph.Ds", 
+       title = "Total number of Ph.D.s awarded in the United States per year from 2008-2017",
+       caption = "Data collected by the National Science Foundation")
+```
+
+This wasn't exactly the question, but if we want to see a little bit better this over the time period, first  we need to create a df that has total PhDs per broad field per year. We can do this by adding `year` to our `group_by()` statement.
+```{r}
+broad_field_sum_byyear <- phd_field |>
+  group_by(broad_field, year) |>
+  summarize(total_phds = sum(n_phds, na.rm = TRUE)) 
+
+# what does that look like
+head(broad_field_sum_byyear)
+```
+
+Now we can plot. We can now make different types of plots. Let's start simpler. Here I've re-ordered the legend to be in the same order as the chart so you can see the disciplines that award the most Ph.D.s just by looking at the order of the legend.
+
+```{r}
+broad_field_sum_byyear |>
+  ggplot(aes(x = year, y = total_phds, 
+    # reorder broad_field by descending total_phds         
+             color = reorder(broad_field, -total_phds))) +
+  geom_line() +
+  geom_point() +
+  scale_x_continuous(breaks = seq(2008, 2017, 2)) +
+  scale_y_continuous(labels = comma) +
+  scale_color_brewer(palette = "Set1") +
+  theme_bw() +
+  labs(x = "Year",
+       y = "Total number of Ph.D.s",
+       color = "Broad Field",
+       title = "Number of Ph.D.s awarded by discipline in the United States",
+       subtitle = "From 2008-2017",
+       caption = "Data collected by the National Science Foundation")
+```
+
+Could also try facets.
+```{r}
+# re-setting factors of broad_field so that it is re-levelled by the discipline with
+# the most phds
+# you could also do this manually
+broad_field_sum_byyear$broad_field <- fct_reorder(broad_field_sum_byyear$broad_field,
+                                                  -broad_field_sum_byyear$total_phds)
+
+broad_field_sum_byyear |>
+  ggplot(aes(x = year, y = total_phds, color = broad_field)) +
+  geom_line() +
+  geom_point() +
+  scale_x_continuous(breaks = seq(2008, 2017, 2)) +
+  scale_y_continuous(labels = comma) +
+  facet_wrap(vars(broad_field)) +
+  theme_bw() +
+  theme(legend.position = "none") +
+  labs(x = "Year",
+       y = "Total number of Ph.D.s",
+       color = "Broad Field",
+       title = "Number of Ph.D.s awarded by discipline in the United States",
+       subtitle = "From 2008-2017",
+       caption = "Data collected by the National Science Foundation")
+```
+
+Can also try with "free_y" axes. You can see the number better but can't as easily compare between the disciplines.
+```{r}
+broad_field_sum_byyear |>
+  ggplot(aes(x = year, y = total_phds, color = broad_field)) +
+  geom_line() +
+  geom_point() +
+  scale_x_continuous(breaks = seq(2008, 2017, 2)) +
+  scale_y_continuous(labels = comma) +
+  facet_wrap(vars(broad_field), scales = "free_y") +
+  theme_bw() +
+  theme(legend.position = "none") +
+  labs(x = "Year",
+       y = "Total number of Ph.D.s",
+       color = "Broad Field",
+       title = "Number of Ph.D.s awarded by discipline in the United States",
+       subtitle = "From 2008-2017",
+       caption = "Data collected by the National Science Foundation")
+```
+
+
+## Visualization 2
+
+Pick the `field` that most closely matches the area of your degree. Make a line graph (with points for each datapoint) that shows how the number of PhDs awarded in your `field` has changed from 2008 to 2017. Make sure your x-axis indicates each year for which you have data, your graph is appropriately labelled, and you think it is aesthetically pleasing.
+
+```{r}
+# what are the options for broad_field?
+phd_field |>
+  select(broad_field) |>
+  unique()
+
+# what are the options for life sciences?
+phd_field |>
+  filter(broad_field == "Life sciences") |>
+  select(major_field) |>
+  unique()
+
+# what are the options for Agricultural sciences and natural resources?
+phd_field |>
+  filter(major_field == "Agricultural sciences and natural resources") |>
+  select(field) |>
+  unique()
+```
+
+I'm picking "Horticulture science" as my field.
+```{r}
+phd_field |>
+  filter(field == "Horticulture science") |>
+  ggplot(aes(x = year, y = n_phds)) +
+  geom_point() +
+  geom_line() +
+  scale_x_continuous(breaks = seq(2008, 2017, 1)) +
+  theme_minimal() +
+  labs(x = "Year",
+       y = "Number of Ph.D. Degrees Awarded",
+       title = "Number of Ph.D. Degrees Awarded in Horticulture Science in the United States",
+       subtitle = "Data collected by the National Science Foundation")
+```
+
+## Visualization 3
+
+Pick at least 3 additional fields (you can use more if you like) that are adjacent to your Ph.D. field. Make a faceted plot to show the number of degrees awarded in each of these disciplines across the same time period. Make sure you label your plot appropriately and you think it is aesthetic (i.e., if you have squished strip text you want to fix that).
+
+First I am making a vector of the fields in plant science.
+```{r}
+plant_sci <- c("Agricultural and horticultural plant breeding",
+               "Agronomy and crop science",
+               "Horticulture science",
+               "Plant pathology and phytopathology, agricultural",
+               "Plant sciences, other")
+```
+
+Then I can plot. `\n` adds an automatic line break.
+```{r}
+phd_field |>
+  filter(field %in% plant_sci) |>
+  ggplot(aes(x = year, y = n_phds)) +
+  geom_point() +
+  geom_line() +
+  scale_x_continuous(breaks = seq(2008, 2017, 1)) +
+  facet_wrap(vars(field), labeller = label_wrap_gen()) + # wraps labels
+  theme_classic() +
+  theme(axis.text.x = element_text(angle = 90)) +
+  labs(x = "Year",
+       y = "Number of Ph.D. Degrees Awarded",
+       title = "Number of Ph.D. Degrees Awarded in Plant Science Subdisciplines \nin the United States",
+       subtitle = "Data collected by the National Science Foundation")
+```
+
+Or I can make the same style plot but with bars instead of a lines/points.
+
+```{r}
+phd_field |>
+  filter(field %in% plant_sci) |>
+  ggplot(aes(x = year, y = n_phds)) +
+  geom_col() +
+  scale_x_continuous(breaks = seq(2008, 2017, 1)) +
+  facet_wrap(vars(field), labeller = label_wrap_gen()) + # wraps labels
+  theme_classic() +
+  theme(axis.text.x = element_text(angle = 90)) +
+  labs(x = "Year",
+       y = "Number of Ph.D. Degrees Awarded",
+       title = "Number of Ph.D. Degrees Awarded in Plant Science Subdisciplines \nin the United States",
+       subtitle = "Data collected by the National Science Foundation")
+```
+
+
@@ -316,6 +316,10 @@
     </a>
     <ul class="dropdown-menu" aria-labelledby="nav-menu-solutions">    
         <li class="dropdown-header">Module assignment solutions</li>
+        <li>
+    <a class="dropdown-item" href="./assignments/modules/module2/module_2_solutions.html">
+ <span class="dropdown-text">Module 2 solutions</span></a>
+  </li>  
         <li><hr class="dropdown-divider"></li>
         <li class="dropdown-header">Recitation solutions</li>
         <li>
@@ -333,6 +337,10 @@
         <li>
     <a class="dropdown-item" href="./modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.html">
  <span class="dropdown-text">Week 6 - ggplot102 solutions</span></a>
+  </li>  
+        <li>
+    <a class="dropdown-item" href="./modules/module3/07_distributions/07_distributions_recitation_solutions.html">
+ <span class="dropdown-text">Week 7 - Data distributions solutions</span></a>
   </li>  
     </ul>
   </li>
 
@@ -282,6 +282,10 @@
     </a>
     <ul class="dropdown-menu" aria-labelledby="nav-menu-solutions">    
         <li class="dropdown-header">Module assignment solutions</li>
+        <li>
+    <a class="dropdown-item" href="../assignments/modules/module2/module_2_solutions.html">
+ <span class="dropdown-text">Module 2 solutions</span></a>
+  </li>  
         <li><hr class="dropdown-divider"></li>
         <li class="dropdown-header">Recitation solutions</li>
         <li>
@@ -299,6 +303,10 @@
         <li>
     <a class="dropdown-item" href="../modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.html">
  <span class="dropdown-text">Week 6 - ggplot102 solutions</span></a>
+  </li>  
+        <li>
+    <a class="dropdown-item" href="../modules/module3/07_distributions/07_distributions_recitation_solutions.html">
+ <span class="dropdown-text">Week 7 - Data distributions solutions</span></a>
   </li>  
     </ul>
   </li>
 
@@ -282,6 +282,10 @@
     </a>
     <ul class="dropdown-menu" aria-labelledby="nav-menu-solutions">    
         <li class="dropdown-header">Module assignment solutions</li>
+        <li>
+    <a class="dropdown-item" href="../../../assignments/modules/module2/module_2_solutions.html">
+ <span class="dropdown-text">Module 2 solutions</span></a>
+  </li>  
         <li><hr class="dropdown-divider"></li>
         <li class="dropdown-header">Recitation solutions</li>
         <li>
@@ -299,6 +303,10 @@
         <li>
     <a class="dropdown-item" href="../../../modules/module2/06_ggplot102/06_ggplot102_recitation_solutions.html">
  <span class="dropdown-text">Week 6 - ggplot102 solutions</span></a>
+  </li>  
+        <li>
+    <a class="dropdown-item" href="../../../modules/module3/07_distributions/07_distributions_recitation_solutions.html">
+ <span class="dropdown-text">Week 7 - Data distributions solutions</span></a>
   </li>  
     </ul>
   </li>