Commit f726ea5

committed
Trello TODO (Thilde corrections) from part1-part4
1 parent 34ecd22 commit f726ea5

20 files changed

+1330
-1351
lines changed

data/.Rhistory

Lines changed: 511 additions & 511 deletions
Large diffs are not rendered by default.

exercises/exercise0_github.qmd

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ An R script is a plain text file containing a series of R commands and code used
4545

4646
7. Run `getwd()` again.
4747

48-
8. Type in a few lines of code and some comments and re-save the file.
48+
8. Read in the file from `data/diabetes.csv` using the `read_csv()` function and check the structure of the data with the `str()` function. Re-save the file.
4949

5050
## Quarto
5151

@@ -72,7 +72,7 @@ getwd()
7272
```
7373
:::
7474

75-
13. Create some code chunks, write text and headers. Re-save the file.
75+
13. Create a code chunk and write the same code as you did in **8**. Write a description of what you did above the code chunk. Re-save the file.
7676

7777
14. Render the Quarto document and have a look at the html file.
7878
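
The revised step 8 in `exercise0_github.qmd` asks for `read_csv()` followed by `str()`. A minimal sketch of that pattern is below. The temporary file and its columns (`ID`, `BMI`, `Sex`) are hypothetical stand-ins so the sketch runs anywhere; in the exercise the path would be `data/diabetes.csv`, whose real columns are not shown in this diff.

```r
library(readr)

# Hypothetical stand-in file; in the exercise, read "data/diabetes.csv" instead
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,BMI,Sex", "1,22.1,Female", "2,30.4,Male"), tmp)

# read_csv() guesses a type for each column and returns a tibble
diabetes <- read_csv(tmp, show_col_types = FALSE)

# str() prints the dimensions and the type of every column
str(diabetes)
```

Because `read_csv()` guesses column types, `str()` is a quick way to confirm that each column was parsed as intended.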

exercises/exercise1.qmd

Lines changed: 13 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -20,9 +20,13 @@ In this exercise you will practice your R skills by loading, inspecting and clea
2020

2121
## Explore the data
2222

23-
3. How many missing values (NA's) are there in each column?
23+
3. Compute summary statistics: check the number of observations (rows) and variables (columns), as well as their data types.
2424

25-
4. Check the distribution of each of the variables. Consider that the variables are of different classes. Do any values strike you as odd?
25+
4. For each variable evaluate the data type and change the ones you find necessary.
26+
27+
5. (Include/exclude?) How many missing values (NA's) are there in each column?
28+
29+
6. Check the distribution of each variable by plotting the categorical variables in bar plots and the continuous variables in box plots. Do any values strike you as odd?
2630

2731
## Clean up the data
2832

@@ -36,39 +40,27 @@ Consider the following:
3640

3741
- Are there zeros in the data? Are they true zeros or errors?
3842

39-
- Do you want to change any of the data types of the variables?
40-
41-
5. Clean the data according to your considerations.
42-
43-
::: {.callout-tip collapse="true"}
44-
## Hint
45-
Have a look at `BloodPressure`, `BMI`, `Sex`, `Diabetes` and `ID`.
46-
:::
43+
7. Clean the data according to your considerations.
4744

4845
## Meta Data
4946

50-
There is some metadata to accompany the dataset you have just cleaned in `diabetes_meta_toy_messy.csv`. This is a csv file, not an excel sheet, so you need to use the `read_delim` function to load it. Load in the dataset and inspect it.
47+
There is some metadata to accompany the dataset you have just cleaned in `diabetes_meta_toy_messy.csv`. This is a csv file, not an excel sheet, so you need to use the `read_csv` function to load it. Load in the dataset and inspect it.
5148

52-
6. Now clean the metadata and do data exploration by repeating step 3-5 from above.
49+
8. Now clean the metadata and do data exploration by repeating steps **3**-**7** from above.
5350

5451
## Join the datasets
5552

5653
We will combine both datasets together into one tibble.
5754

58-
7. Consider what variable the datasets should be joined on.
55+
9. Consider which variable the datasets should be joined on.
5956

6057
::: {.callout-tip collapse="true"}
6158
## Hint
6259
The joining variable must be the same type in both datasets.
6360
:::
6461

65-
8. Join the datasets by the variable you selected above.
66-
67-
9. How many rows does the joined dataset have? Explain why.
62+
10. Join the datasets by the variable you selected above.
6863

69-
::: {.callout-tip collapse="true"}
70-
## Hint
71-
Because we used left_join, only the IDs that are in `diabetes_clinical_clean` are kept.
72-
:::
64+
11. How many rows does the joined dataset have? Explain why.
7365

74-
10. Export the joined dataset. Think about which directory you want to save the file in.
66+
12. Export the joined dataset. Think about which directory you want to save the file in.
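
Steps 9 to 11 of `exercise1.qmd` depend on two things: the join key must have the same type in both tables, and the join type determines the row count. The sketch below illustrates both with toy tibbles. All values and the `Smoker` column are hypothetical, and `left_join()` is used because the hint removed in this commit mentioned it; the real datasets may behave differently.

```r
library(dplyr)

# Toy stand-ins for the cleaned clinical data and the metadata (hypothetical values)
clinical <- tibble(ID = c(1, 2, 3), BMI = c(22.1, 30.4, 27.8))
meta <- tibble(ID = c("2", "3", "4"), Smoker = c("Yes", "No", "No"))

# The key is numeric in one table and character in the other,
# so convert it before joining
meta <- meta %>% mutate(ID = as.numeric(ID))

# left_join() keeps every row of the left table;
# inner_join() keeps only IDs present in both tables
joined_left <- left_join(clinical, meta, by = "ID")
joined_inner <- inner_join(clinical, meta, by = "ID")

nrow(joined_left)
nrow(joined_inner)
```

Comparing the two row counts shows which IDs exist in only one of the tables. In the left join, unmatched rows get `NA` in the columns that come from the metadata.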

exercises/exercise2.qmd

Lines changed: 2 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -20,21 +20,6 @@ In this exercise you will do some more advance tidyverse operations such as pivo
2020

2121
4. Restructure the glucose dataset into a long format. Name the column that describes which measurement the row refers to, i.e. Glucose_0, Glucose_60 or Glucose_120, `Measurement`. How many rows are there per ID? Does that make sense?
2222

23-
::: {.callout-tip collapse="true"}
24-
## Hint
25-
26-
Remember the flow:
27-
28-
```{r eval = FALSE}
29-
pivot_longer(cols = LIST_WITH_COLUMNS_TO_PIVOT,
30-
names_to = "NEW_COLUMN_CONTAINING_COLUMN_NAMES",
31-
values_to = "NEW_COLUMN_CONTAINING_COLUMN_VALUES")
32-
```
33-
34-
Have a look at slide 16 for a visual overview.
35-
36-
:::
37-
3823
5. In your long format dataframe you should have one column that describes which measurement the row refers to, i.e. Glucose_0, Glucose_60 or Glucose_120. Transform this column so that you only have the numerical part, i.e. **only** 0, 60 or 120. Then change the data type of that column to `factor`. Check the order of the factor levels and if necessary change them to the proper order.
3924

4025
::: {.callout-tip collapse="true"}
@@ -44,21 +29,11 @@ The `stringr` packages is a part of tidyverse and has many functions for manipul
4429
Have a look at the help for factors `?factor` to see how to influence the levels.
4530
:::
4631

47-
6. Merge the glucose dataset with the joined diabetes dataset.
32+
6. Merge the long-format glucose dataset you made in **4** with the joined diabetes dataset you loaded in **2**.
4833

4934
7. Pull the glucose measurements from your favorite ID.
5035

51-
::: {.callout-tip collapse="true"}
52-
## Hint
53-
First `filter` for your favorite ID and then `pull` the columns.
54-
:::
55-
56-
8. Calculate the mean glucose measure for each measurement timepoint.
57-
58-
::: {.callout-tip collapse="true"}
59-
## Hint
60-
You will need to use `group_by()`, and `summerise()`.
61-
:::
36+
8. Calculate the mean glucose measure for each measurement time point.
6237

6338
9. Calculate mean and standard deviation for all numeric columns.
6439
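
Several hints were removed from `exercise2.qmd` in this commit, so here is a hedged sketch of the flow that steps 4, 5 and 8 describe: pivot to long format, strip the `Glucose_` prefix, set the factor levels explicitly, then summarise per time point. The toy values and the value column name `Glucose` are assumptions; only `Measurement` and the `Glucose_0`, `Glucose_60`, `Glucose_120` column names come from the exercise text.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide-format glucose data (hypothetical values)
glucose_wide <- tibble(
  ID = c(1, 2),
  Glucose_0 = c(5.1, 6.0),
  Glucose_60 = c(7.9, 9.2),
  Glucose_120 = c(6.3, 8.8)
)

glucose_long <- glucose_wide %>%
  # One row per ID and measurement time point
  pivot_longer(cols = starts_with("Glucose_"),
               names_to = "Measurement",
               values_to = "Glucose") %>%
  # Keep only the numeric part, then fix the level order explicitly;
  # the default order would sort the strings as "0", "120", "60"
  mutate(Measurement = str_remove(Measurement, "Glucose_"),
         Measurement = factor(Measurement, levels = c("0", "60", "120")))

levels(glucose_long$Measurement)

# Mean glucose per measurement time point
glucose_means <- glucose_long %>%
  group_by(Measurement) %>%
  summarise(mean_glucose = mean(Glucose))

glucose_means
```

Setting `levels` explicitly matters later for plotting, since ggplot orders facets and axes by factor level rather than by numeric value.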

exercises/exercise3A.qmd

Lines changed: 9 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,13 @@ In this exercise you will do a lot of plotting with ggplot. For a reminder on ho
1414

1515
2. Load data from the `.rds` file you created in Exercise 2. Have a guess at what the function is called.
1616

17+
## SECTION IN PROGRESS - CHECK NORMAL DISTRIBUTION
18+
19+
```{r}
20+
21+
```
22+
23+
1724
## Plotting - Part 1
1825

1926
You will first do some basic plots to get started with ggplot again.
@@ -26,34 +33,18 @@ If it has been a while since you worked with ggplot, have a look at the [ggplot
2633

2734
5. Now, create the same two plots as before, but this time stratify them by `Diabetes`. Do you notice any trends?
2835

29-
::: {.callout-tip collapse="true"}
30-
## Hint
31-
32-
You can stratify a plot by a categorical variable in several ways, depending on the type of plot. The purpose of stratification is to distinguish samples based on their categorical values, making patterns or differences easier to identify. This can be done using aesthetics like `color`, `fill`, `shape`.
33-
34-
:::
35-
36-
3736
6. Create a boxplot of `BMI` stratified by `Diabetes`. Give the plot a meaningful title.
3837

39-
40-
7. Create a boxplot of `PhysicalActivity` stratified by `Smoker`. Give the plot a meaningful title.
41-
38+
7. Create a violin plot (`geom_violin`) of `PhysicalActivity` stratified by `Smoker`. Add horizontal lines at the 25%, 50%, and 75% quantiles of each violin. Give the plot a meaningful title.
4239

4340
## Plotting - Part 2
4441

4542
In order to plot the data inside the nested variable, the data needs to be unnested.
4643

47-
8. Create a boxplot of the glucose measurements at time 0 stratified by `Diabetes`. Give the plot a meaningful title.
44+
8. Create a `ggridges::geom_density_ridges` plot of the glucose measurements at time 0 stratified by `Diabetes`. What kind of plot is this? Give the plot a meaningful title.
4845

4946
9. Create these boxplots for each time point (0, 60, 120) by using faceting by `Measurement`. Give the plot a meaningful title.
5047

51-
::: {.callout-tip collapse="true"}
52-
## Hint
53-
Faceting allows you to create multiple plots based on the values of a categorical variable, making it easier to compare patterns across groups. In ggplot2, you can use `facet_wrap` for a single variable or `facet_grid` for multiple variables.
54-
55-
:::
56-
5748
10. Calculate the mean glucose levels for each time point.
5849

5950
::: {.callout-tip collapse="true"}

exercises/exercise4.qmd

Lines changed: 0 additions & 118 deletions
This file was deleted.
