From 240760887daf64c1ab2a1eabc5b3fe59bb264800 Mon Sep 17 00:00:00 2001 From: Jia Qi Beh Date: Wed, 26 Mar 2025 11:51:41 +1100 Subject: [PATCH 1/2] modified episode-03 --- .../03-data-cleaning-and-transformation.Rmd | 200 +++++++++++------- 1 file changed, 118 insertions(+), 82 deletions(-) diff --git a/episodes/03-data-cleaning-and-transformation.Rmd b/episodes/03-data-cleaning-and-transformation.Rmd index cb51cb58..77921a65 100644 --- a/episodes/03-data-cleaning-and-transformation.Rmd +++ b/episodes/03-data-cleaning-and-transformation.Rmd @@ -7,26 +7,21 @@ source: Rmd ::::::::::::::::::::::::::::::::::::::: objectives -- Describe the purpose of an R package and the **`dplyr`** and **`tidyr`** packages. -- Select certain columns in a data frame with the **`dplyr`** function `select`. -- Select certain rows in a data frame according to filtering conditions with the **`dplyr`** function `filter`. -- Link the output of one **`dplyr`** function to the input of another function with the 'pipe' operator `%>%`. -- Add new columns to a data frame that are functions of existing columns with `mutate`. +- Describe the functions available in the **`dplyr`** and **`tidyr`** packages. +- Recognise and use the following functions: `select()`, `filter()`, `rename()`, +`recode()`, `mutate()` and `arrange()`. +- Combine one or more functions using the 'pipe' operator `%>%`. - Use the split-apply-combine concept for data analysis. -- Use `summarize`, `group_by`, and `count` to split a data frame into groups of observations, apply a summary statistics for each group, and then combine the results. -- Describe the concept of a wide and a long table format and for which purpose those formats are useful. -- Describe what key-value pairs are. -- Reshape a data frame from long to wide format and back with the `pivot_wider` and `pivot_longer` commands from the **`tidyr`** package. - Export a data frame to a csv file. 
:::::::::::::::::::::::::::::::::::::::::::::::::: :::::::::::::::::::::::::::::::::::::::: questions -- How can I select specific rows and/or columns from a data frame? +- How can I create new columns or remove existing columns from a data frame? +- How can I rename variables or assign new values to a variable? - How can I combine multiple commands into a single command? -- How can create new columns or remove existing columns from a data frame? -- How can I reformat a dataframe to meet my needs? +- How can I export my data frame to a csv file? :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -36,48 +31,7 @@ library(readr) books <- read_csv("./data/books.csv") ``` -### Getting set up - -#### Open your R Project file - -If you have not already done so, open your R Project file (`library_carpentry.Rproj`) created in the `Before We Start` lesson. - -**If you did not complete that step** then do the following: - -- Under the `File` menu, click on `New project`, choose `New directory`, then - `New project` -- Enter the name `library_carpentry` for this new folder (or "directory"). This - will be your **working directory** for the rest of the day. -- Click on `Create project` -- Create a new file where we will type our scripts. Go to File > New File > R - script. Click the save icon on your toolbar and save your script as - "`script.R`". -- Copy and paste the below lines of code to create three new subdirectories and download the data: - -```{r create-dirs, eval=FALSE} -library(fs) # https://fs.r-lib.org/. fs is a cross-platform, uniform interface to file system operations via R. -dir_create("data") -dir_create("data_output") -dir_create("fig_output") -download.file("https://ndownloader.figshare.com/files/22031487", - "data/books.csv", mode = "wb") -``` - -#### Load the `tidyverse` and data frame into your R session - -Load the `tidyverse` - -```{r load-data, purl=FALSE} -library(tidyverse) -``` - -And the `books` data we saved in the previous lesson. 
- -```{r, eval=FALSE, purl=FALSE} -books <- read_csv("data/books.csv") # load the data and assign it to books -``` - -### Transforming data with `dplyr` +## Transforming data with `dplyr` We are now entering the data cleaning and transforming phase. While it is possible to do much of the following using Base R functions (in other words, @@ -105,7 +59,41 @@ We're going to learn some of the most common **`dplyr`** functions: - `arrange()`: sort results - `count()`: count discrete values -### Renaming variables +
+ +::::::::::::::::::::::::::::::::::::::::: prereq + +### Getting set up + +Make sure you have your Rproj and directories set up from the previous episode. + +**If you have not completed that step**, run the following in your Rproj: + +```{r create-dirs, eval=FALSE} +library(fs) # https://fs.r-lib.org/. fs is a cross-platform, uniform interface to file system operations via R. +dir_create("data") +dir_create("data_output") +dir_create("fig_output") +download.file("https://ndownloader.figshare.com/files/22031487", + "data/books.csv", mode = "wb") +``` + +Then load the `tidyverse` package + +```{r load-data, purl=FALSE} +library(tidyverse) +``` + +and the `books` data we saved in the previous lesson. + +```{r, eval=FALSE, purl=FALSE} +books <- read_csv("data/books.csv") # load the data and assign it to books +``` + +::::::::::::::::::::::::::::::::::::::::: + + +## Renaming variables It is often necessary to rename variables to make them more meaningful. If you print the names of the sample `books` dataset you can see that some of the @@ -131,6 +119,15 @@ books <- rename(books, title = X245.ab) ``` +::::::::::::::::::::::::::::::::::::::::: callout + +### Tip: + +When using the `rename()` function, the new variable name comes first. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + ::::::::::::::::::::::::::::::::::::::::: callout ### Side note: @@ -140,9 +137,11 @@ because R variables cannot start with a number, R automatically inserted an X, and because pipes | are not allowed in variable names, R replaced it with a period. - :::::::::::::::::::::::::::::::::::::::::::::::::: + +You can also rename multiple variables at once by running: + ```{r, purl=FALSE} # rename multiple variables at once books <- rename(books, @@ -178,6 +177,7 @@ books <- rename(books, :::::::::::::::::::::::::::::::::::::::::::::::::: + ### Recoding values It is often necessary to recode or reclassify values in your data. 
For example, @@ -192,9 +192,9 @@ knitr::include_graphics("fig/BCODE1.png") knitr::include_graphics("fig/BCODE2.png") ``` -You can do this easily using the `recode()` function, also in the `dplyr` -package. Unlike `rename()`, the old value comes first here. Also notice that we -are overwriting the `books$subCollection` variable. +You can reassign names to the values easily using the `recode()` function +from the `dplyr` package. Unlike `rename()`, the old value comes first here. +Also notice that we are overwriting the `books$subCollection` variable. ```{r, comment=FALSE} # first print to the console all of the unique values you will need to recode @@ -231,9 +231,17 @@ books$format <- recode(books$format, "4" = "online video") ``` -### Subsetting dataframes +Once you have finished recoding the values for the two variables, examine +the dataset again and check to see if you have different values this time: -#### Subsetting using `filter()` in the `dplyr` package +```{r, purl=FALSE, eval=FALSE} +distinct(books, subCollection, format) +``` + + +## Subsetting dataframes + +### **Subset rows with `filter()`** In the last lesson we learned how to subset a data frame using brackets. As with other R functions, the `dplyr` package makes it much more straightforward, using @@ -277,7 +285,7 @@ serial_microform <- filter(books, format %in% c("serial", "microform")) ::::::::::::::::::::::::::::::::::::::: challenge -### Filtering with `filter()` +### Exercise: Subsetting with `filter()` 1. Use `filter()` to create a data frame called `booksJuv` consisting of `format` books and `subCollection` juvenile materials. @@ -298,7 +306,10 @@ mean(booksJuv$tot_chkout) :::::::::::::::::::::::::::::::::::::::::::::::::: -### Selecting variables +
+ + +### **Subset columns with `select()`** The `select()` function allows you to keep or remove specific columns It also provides a convenient way to reorder variables. @@ -315,7 +326,20 @@ books <- select(books, -location) booksReordered <- select(books, title, tot_chkout, loutdate, everything()) ``` -### Ordering data +::::::::::::::::::::::::::::::::::::::: callout + +### Tips: + +If your variable name contains spaces, wrap the name in backticks. In this sketch, `df` and `my variable` are placeholder names: + +```{r, eval=FALSE} +select(df, -`my variable`) +``` + +::::::::::::::::::::::::::::::::::::::: + + +## Ordering data with `arrange()` The `arrange()` function in the `dplyr` package allows you to sort your data by alphabetical or numerical order. @@ -330,7 +354,8 @@ booksHighestChkout booksChkoutYear <- arrange(books, desc(tot_chkout), desc(pubyear)) ``` -### Creating new variables + +## Creating new variables with `mutate()` The `mutate()` function allows you to create new variables. Here, we use the `str_sub()` function from the `stringr` package to extract the first character @@ -353,7 +378,9 @@ books <- mutate(books, pubyear = as.integer(pubyear)) We see the error message `NAs introduced by coercion`. This is because non-numerical variables become `NA` and the remainder become integers. -### Putting it all together with %>% + + +## Putting it all together with %>% The [Pipe Operator](https://www.datacamp.com/community/tutorials/pipe-r-tutorial) `%>%` is loaded with the `tidyverse`. It takes the output of one statement and makes it @@ -378,7 +405,7 @@ myBooks ::::::::::::::::::::::::::::::::::::::: challenge -### Playing with pipes `%>%` +### Exercise: Playing with pipes `%>%` 1. 
Create a new data frame `booksKids` with these conditions: @@ -405,14 +432,18 @@ mean(booksKids$tot_chkout) :::::::::::::::::::::::::::::::::::::::::::::::::: -### Split-apply-combine data analysis and the `summarize()` function + +### Split-apply-combine and the `summarize()` function Many data analysis tasks can be approached using the *split-apply-combine* paradigm: split the data into groups, apply some analysis to each group, and then combine the results. **`dplyr`** makes this very easy through the use of the `group_by()` function. -##### The `summarize()` function +
+ + +#### The `summarize()` function `group_by()` is often used together with `summarize()`, which collapses each group into a single-row summary of that group. `group_by()` takes as arguments @@ -454,7 +485,10 @@ Let's break this down step by step: - Finally, we arrange `sum_tot_chkout` in descending order, so we can see the class with the most total checkouts. We can see it is the `E` class (History of America), followed by `NA` (items with no call number data), followed by `H` (Social Sciences) and `P` (Language and Literature). -### Pattern matching +
+ + +### **Pattern matching** Cleaning text with the `stringr` package is easier when you have a basic understanding of 'regex', or regular expression pattern matching. Regex is especially useful for manipulating strings (alphanumeric data), and is the backbone of search-and-replace operations in most applications. Pattern matching is common to all programming languages but regex syntax is often code-language specific. Below, find an example of using pattern matching to find and replace data in R: @@ -472,7 +506,9 @@ books %>% select(title_modified, title) ``` -### Exporting data + + +## Exporting data Now that you have learned how to use **`dplyr`** to extract information from or summarize your raw data, you may want to export these new data sets to share @@ -481,14 +517,11 @@ them with your collaborators or for archival. Similar to the `read_csv()` function used for reading CSV files into R, there is a `write_csv()` function that generates CSV files from data frames. -Before using `write_csv()`, we are going to create a new folder, `data_output`, -in our working directory that will store this generated dataset. We don't want -to write generated datasets in the same directory as our raw data. It's good -practice to keep them separate. The `data` folder should only contain the raw, +We have previously created the directory `data_output` which we will use to +store our generated dataset. It is good practice to keep your input and output +data in separate folders--the `data` folder should only contain the raw, unaltered data, and should be left alone to make sure we don't delete or modify -it. In contrast, our script will generate the contents of the `data_output` -directory, so even if the files it contains are deleted, we can always -re-generate them. +it. In preparation for our next lesson on plotting, we are going to create a version of the dataset with most of the changes we made above. We will first read in the original, then make all the changes with pipes. 
@@ -545,6 +578,7 @@ We now write it to a CSV and put it in the `data/output` sub-directory: write_csv(books_reformatted, "./data_output/books_reformatted.csv") ``` + ## Help with dplyr - Read more about `dplyr` at [https://dplyr.tidyverse.org/](https://dplyr.tidyverse.org/). @@ -552,15 +586,17 @@ write_csv(books_reformatted, "./data_output/books_reformatted.csv") - See the [http://r4ds.had.co.nz/transform.html]("Data Transformation" chapter) in Garrett Grolemund and Hadley Wickham's book *R for Data Science.* - Watch this Data School video: [https://www.youtube.com/watch?v=jWjqLW-u3hc](Hands-on dplyr tutorial for faster data manipulation in R.) -## Wrangling dataframes with tidyr :::::::::::::::::::::::::::::::::::::::: keypoints - Use the `dplyr` package to manipulate dataframes. -- Use `select()` to choose variables from a dataframe. -- Use `filter()` to choose data based on values. -- Use `group_by()` and `summarize()` to work with subsets of data. +- Subset data frames using `select()` and `filter()`. +- Rename variables in a data frame using `rename()`. +- Recode values in a data frame using `recode()`. - Use `mutate()` to create new variables. +- Sort data using `arrange()`. +- Use `group_by()` and `summarize()` to work with subsets of data. +- Use pipe (`%>%`) to combine multiple commands. :::::::::::::::::::::::::::::::::::::::::::::::::: From 4a7a9269413afb763ff704c1fb85123ac2b0871c Mon Sep 17 00:00:00 2001 From: Jia Qi Beh Date: Wed, 26 Mar 2025 12:08:30 +1100 Subject: [PATCH 2/2] modified episode-04 --- episodes/04-data-viz-ggplot.Rmd | 163 ++++++++++++++------------------ 1 file changed, 71 insertions(+), 92 deletions(-) diff --git a/episodes/04-data-viz-ggplot.Rmd b/episodes/04-data-viz-ggplot.Rmd index 59fd3d1b..08298aae 100644 --- a/episodes/04-data-viz-ggplot.Rmd +++ b/episodes/04-data-viz-ggplot.Rmd @@ -7,6 +7,7 @@ source: Rmd ::::::::::::::::::::::::::::::::::::::: objectives +- Understand the basic syntax of the `ggplot` function. 
- Produce scatter plots, boxplots, and time series plots using ggplot. - Set universal plot settings. - Describe what faceting is and apply faceting in ggplot. @@ -32,23 +33,11 @@ books2 <- read_csv("./data/books_reformatted.csv") ## Getting set up -### Set up your directories and data +### **Set up your directories and data** -If you have not already done so, open your R Project file (`library_carpentry.Rproj`) created in the `Before We Start` lesson. +If you have not already done so, open your R Project file (`library_carpentry.Rproj`) created in the `Before We Start` lesson. Ensure that your project folder contains the following sub-directories: `data/`, `data_output/` and `fig_output/`. -**If you did not complete that step** then do the following. Only do this if you -didn't complete it in previous lessons. - -- Under the `File` menu, click on `New project`, choose `New directory`, then - `New project` -- Enter the name `library_carpentry` for this new folder (or "directory"). This - will be your **working directory** for the rest of the day. -- Click on `Create project` -- Create a new file where we will type our scripts. Go to File > New File > R - script. Click the save icon on your toolbar and save your script as - "`script.R`". -- Copy and paste the below lines of code to create three new subdirectories and - download the original and the reformatted `books` data: +If any of them are missing, run the following code chunk to create the sub-directories and download the original and the reformatted `books` data. ```{r create-dirs, eval=FALSE} library(fs) # https://fs.r-lib.org/. fs is a cross-platform, uniform interface to file system operations via R. @@ -61,25 +50,24 @@ download.file("https://ndownloader.figshare.com/files/22051506", "data_output/books_reformatted.csv", mode = "wb") ``` -### Load the `tidyverse` and data frame into your R session - -Load the `tidyverse` and the `lubridate` packages. 
`lubridate` is installed with -the tidyverse, but is not one of the core tidyverse packages loaded with -`library(tidyverse)`, so it needs to be explicitly called. `lubridate` makes -working with dates and times easier in R. +Then load the packages required for this lesson. ```{r load-data, purl=FALSE, results="hold"} library(tidyverse) # load the core tidyverse library(lubridate) # load lubridate ``` +The `lubridate` package is installed with the tidyverse, but is not one of the core tidyverse packages loaded with `library(tidyverse)`, so it needs to be loaded explicitly. `lubridate` makes working with dates and times easier in R. + We also load the `books_reformatted` data we saved in the previous -lesson. We'll assign it to `books2`. +lesson. Here, we'll assign it to `books2`. You can create the reformatted data by running the code from [here](https://librarycarpentry.github.io/lc-r/03-data-cleaning-and-transformation.html#exporting-data) or by loading the CSV file saved in the previous episode. ```{r, purl=FALSE, eval=FALSE} books2 <- read_csv("data_output/books_reformatted.csv") # load the data and assign it to books ``` + + ## Plotting with **`ggplot2`** Base R contains a number of functions for quick data visualization such as @@ -97,6 +85,10 @@ change or if we decide to change from a bar plot to a scatterplot. This helps in creating publication quality plots with minimal amounts of adjustments and tweaking. +::::::::::::::::::::::::::::::::::::::::: callout + +## Tips on plotting with ggplot2: + `ggplot2` functions like data in the 'long' format, i.e., a column for every dimension, and a row for every observation. Well-structured data will save you lots of time when making figures with `ggplot2` @@ -104,40 +96,35 @@ lots of time when making figures with `ggplot2` ggplot graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots. 
+::::::::::::::::::::::::::::::::::::::::: + + To build a ggplot, we will use the following basic template that can be used for different types of plots: ``` ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() ``` -- use the `ggplot()` function and bind the plot to a specific data frame using - the `data` argument +- The `data` argument binds the plot to a specific data frame. +- The `mapping` argument defines the variables mapped to various aesthetics of the plot, e.g. the x and y axes. +- The geom function (`<GEOM_FUNCTION>()`) defines the type of plot, e.g. bar plot, scatter plot, boxplot. ::::::::::::::::::::::::::::::::::::::::::::; callout -**Note on Aesthetic Mappings:** -In our basic template the `aes()` function is used inside `ggplot()`. +## `ggplot2` versus `ggplot` -```r -ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() -``` - -This sets **global** aesthetics that apply to all layers you add later, such as, geoms and scales. You might sometimes see the aesthetics defined inside a specific geom function like so: - -```r -ggplot(data = booksPlot) + - geom_histogram(aes(x = tot_chkout), binwidth = 10) + - scale_y_log10() -``` +People sometimes confuse `ggplot2` with `ggplot`. The former is the name of the package, while the latter refers to the `ggplot()` function that you run. -In this case, the aesthetic mapping is **local** to `geom_histogram()`. This approach lets you specify or override settings for that particular layer without affecting others. In short, using `aes()` globally means every layer inherits the same settings, while using it locally gives you the flexibility to tailor individual layers as needed. +::::::::::::::::::::::::::::::::::::::::: -:::::::::: When you run the `ggplot()` function, it plots directly to the Plots tab in the Navigation Pane (lower right). Alternatively, you can assign a plot to an R object, then call `print()`to view the plot in the Navigation Pane. 
+ +## Creating your first plot + Let's create a `booksPlot` and limit our visualization to only items in `subCollection` general collection, juvenile, and k-12, and filter out items with `NA` in `call_class`. We do this by using the | key on the keyboard to specify a boolean OR, and use the `!is.na()` function to keep only those items that are NOT `NA` in the `call_class` column. ```{r, purl=FALSE} @@ -149,35 +136,14 @@ booksPlot <- books2 %>% !is.na(call_class)) ``` -**`ggplot2`** -**`ggplot2`** functions like data in the 'long' format, i.e., a column for every -dimension, and a row for every observation. Well-structured data will save you -lots of time when making figures with **`ggplot2`** - -ggplot graphics are built step by step by adding new elements. Adding layers in -this fashion allows for extensive flexibility and customization of plots. - -To build a ggplot, we will use the following basic template that can be used for different types of plots: - -``` -ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() -``` - -Use the `ggplot()` function and bind the plot to a specific data frame using the -`data` argument. +To start, we bind our data to the `ggplot()` function using the `data` +argument. ```{r, purl=FALSE} ggplot(data = booksPlot) # a blank canvas ``` -Not very interesting. We need to add layers to it by defining a mapping -aesthetic and adding `geoms`. - -## Define a mapping with `aes()` and display data with `geoms` - -Define a mapping (using the aesthetic (`aes()`) function), by selecting the -variables to be plotted and specifying how to present them in the graph, e.g. +Running this code creates a blank canvas. Next, we define an aesthetic mapping with the `mapping` argument, using the `aes()` function. This defines the variables to be plotted and specifies how to present them in the graph, e.g. 
as x/y positions or characteristics such as size, shape, color, etc. ```{r, purl=FALSE} ggplot(data = booksPlot, mapping = aes(x = call_class)) # define the x axis aesthetic @@ -186,7 +152,10 @@ ggplot(data = booksPlot, mapping = aes(x = call_class)) # define the x axis aest Here we define the x axes, but because we have not yet added any `geoms`, we still do not see any data being visualized. -Data is visualized in the canvas with "geometric shapes" such as bars and lines; + +## Defining the `geoms` + +Data is visualized with different "geometric shapes" such as bars and lines; what are called *geoms*. In your console, type `geom_` and press the tab key to see the geoms included–there are over 30. For example: @@ -235,10 +204,32 @@ Library of Congress call number classification. beginning of the line containing the new layer, **`ggplot2`** will not add the new layer and will return an error message. - :::::::::::::::::::::::::::::::::::::::::::::::::: -## Univariate geoms +
+ + +## **Applying a different geom** + +This same exact data can be visualized in a couple different ways by replacing the geom\_histogram() function with either `geom_density()` (adding a logarithmic x scale) or `geom_freqpoly()`: + +```{r, purl=FALSE} +# create a density plot +ggplot(data = booksPlot) + + geom_density(aes(x = tot_chkout)) + + scale_y_log10() + + scale_x_log10() + +# create a frequency polygon +ggplot(data = booksPlot) + + geom_freqpoly(aes(x = tot_chkout), binwidth = 30) + + scale_y_log10() +``` + + +## Univariate and bivariate geoms + +### **Univariate geoms** *"Univariate"* refers to a single variable. A histogram is a univariate plot: it shows the frequency counts of each value inside a single variable. Let's say we @@ -287,29 +278,14 @@ ggplot has thus given us an easy way to visualize the distribution of checkouts. If you test this on your own print and ebook usage data, you will likely find something similar. -### Changing the geom - -This same exact data can be visualized in a couple different ways by replacing the geom\_histogram() function with either `geom_density()` (adding a logarithmic x scale) or `geom_freqpoly()`: - -```{r, purl=FALSE} -# create a density plot -ggplot(data = booksPlot) + - geom_density(aes(x = tot_chkout)) + - scale_y_log10() + - scale_x_log10() +
-# create a frequency polygon -ggplot(data = booksPlot) + - geom_freqpoly(aes(x = tot_chkout), binwidth = 30) + - scale_y_log10() -``` - -## Bivariate geoms +### **Bivariate geoms** Bivariate plots visualize two variables. Let's take a look at some higher usage items, but first eliminate the `NA` values and keep only items with more than 10 checkouts, which we will do with `filter()` from the `dplyr` package and assign -it to `booksHighUsage` +it to `booksHighUsage`. ```{r} # filter booksPlot to include only items with over 10 checkouts @@ -417,14 +393,13 @@ ggplot(data = booksHighUsage, aes(x = subCollection, y = tot_chkout)) + :::::::::::::::::::::::::::::::::::::::::::::::::: + ## Add a third variable As we saw in that exercise, you can convey even more information in your visualization by adding a third variable, in addition to the first two on the x and y scales. -### Add a third variable with `aes()` - We can use arguments in `aes()` to map a visual aesthetic in the plot to a variable in the dataset. Specifically, we will map `color` to the `subCollection` variable. Because we are now mapping features of the data to a @@ -482,7 +457,8 @@ ggplot(data = booksHighUsage, aes(x = fct_rev(fct_infreq(call_class)))) + coord_flip() ``` -### Plotting time series data + +## Plotting time series data Let's calculate number of counts per year for each format for items published after 1990 and before 2002 in the `booksHighUsage` data frame created above. @@ -504,7 +480,7 @@ class(booksPlot$pubyear) # integer class(booksPlot$pubyear_ymd) # Date ``` -Next we can use `filter` to remove the `NA` values and get books published +Next, we can use `filter` to remove the `NA` values and get books published between 1990 and 2003. Notice that we use the `&` as an AND operator to indicate that the date must fall between that range. We then need to group the data and count records within each group. 
@@ -541,7 +517,8 @@ ggplot(data = yearly_counts, mapping = aes(x = pubyear_ymd, y = n, color = subCo geom_line() ``` -### Add a third variable with facets + +### Faceting Rather than creating a single plot with side-by-side bars for each sub-collection, we may want to create multiple plots, where each plot shows the @@ -562,7 +539,7 @@ Both geometries allow to to specify faceting variables specified within `vars()` For example, `facet_wrap(facets = vars(facet_variable))` or `facet_grid(rows = vars(row_variable), cols = vars(col_variable))`. -Here we use `facet_wrap()` to make a time series plot for each subCollection +Here we use `facet_wrap()` to make a time series plot for each subCollection: ```{r first-facet, fig.alt="Three line plots, one each for general collection, juvenile, and K-12 sub-collection materials, showing the relationship of count of books to publication year", purl=FALSE} ggplot(data = yearly_counts, mapping = aes(x = pubyear_ymd, y = n)) + @@ -617,6 +594,7 @@ theme(axis.text.x = element_text(angle = 60, hjust = 1)) :::::::::::::::::::::::::::::::::::::::::::::::::: + ## **`ggplot2`** themes Usually plots with white background look more readable when printed. @@ -649,10 +627,10 @@ The [**`ggplot2`** extensions website](https://exts.ggplot2.tidyverse.org/) prov of packages that extend the capabilities of **`ggplot2`**, including additional themes. + ## Customization -Take a look at the [**`ggplot2`** cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf), and -think of ways you could improve your plots. +Take a look at the [**`ggplot2`** cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf), and think of ways you could improve your plots. For example, by default, the axes labels on a plot are determined by the name of the variable being plotted. 
We can change names of axes to something more @@ -732,6 +710,7 @@ ggplot(data = yearly_checkouts, mapping = aes(x = pubyear_ymd, y = checkouts_sum :::::::::::::::::::::::::::::::::::::::::::::::::: + ## Save and export After creating your plot, you can save it to a file in your favorite format. The