Center-for-Health-Data-Science
diff --git a/data/.Rhistory b/data/.Rhistory
Lines changed: 511 additions & 511 deletions
diff --git a/exercises/exercise0_github.qmd b/exercises/exercise0_github.qmd
Lines changed: 2 additions & 2 deletions
diff --git a/exercises/exercise1.qmd b/exercises/exercise1.qmd
Lines changed: 13 additions & 21 deletions
diff --git a/exercises/exercise2.qmd b/exercises/exercise2.qmd
Lines changed: 2 additions & 27 deletions
diff --git a/exercises/exercise3A.qmd b/exercises/exercise3A.qmd
Lines changed: 9 additions & 18 deletions
diff --git a/exercises/exercise4.qmd b/exercises/exercise4.qmd
Lines changed: 0 additions & 118 deletions
@@ -45,7 +45,7 @@ An R script is a plain text file containing a series of R commands and code used
 
 7. Run `getwd()` again.
 
-8. Type in a few lines of code and some comments and re-save the file.
+8. Read in the file from `data/diabetes.csv` using the `read_csv()` function and check the structure of the data with the `str()` function. Re-save the file.
 
 ## Quarto
 
@@ -72,7 +72,7 @@ getwd()
 ```
 :::
 
-13. Create some code chunks, write text and headers. Re-save the file.
+13. Create a code chunk and write the same code as you did in **8**. Write a description of what you did above the code chunk. Re-save the file.
 
 14. Render the Quarto document and have a look at the html file.
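
Steps 8 and 13 both ask for the same read-and-inspect code. A minimal sketch of what that might look like is given here; because `data/diabetes.csv` is not available in this context, the example writes a small temporary CSV with invented columns (`ID`, `Age`, `Sex`) as a stand-in. Only `read_csv()` and `str()` come from the exercise text.

```r
library(readr)

# Hypothetical stand-in for data/diabetes.csv so the sketch runs anywhere;
# the column names and values are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,Age,Sex", "1,34,Female", "2,51,Male"), csv_path)

# read_csv() returns a tibble and guesses a type for each column
diabetes <- read_csv(csv_path, show_col_types = FALSE)

# str() prints the dimensions plus the type and first values of each column
str(diabetes)
```

In the exercise itself the path would be `"data/diabetes.csv"`, resolved relative to the working directory reported by `getwd()`.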
 
 
@@ -20,9 +20,13 @@ In this exercise you will practice your R skills by loading, inspecting and clea
 
 ## Explore the data
 
-3.  How many missing values (NA's) are there in each column?
+3. Summarise the data by checking the number of observations (rows) and variables (columns), as well as the data type of each variable.
 
-4.  Check the distribution of each of the variables. Consider that the variables are of different classes. Do any values strike you as odd?
+4. For each variable, evaluate the data type and change it where necessary.
+
+5. (Include/exclude?) How many missing values (NA's) are there in each column? 
+
+6.  Check the distribution of each variable by plotting the categorical variables as bar plots and the continuous variables as box plots. Do any values strike you as odd?
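
The exploration steps above (dimensions, data types, missing values) could be sketched roughly as follows. This is only an illustration in base R on an invented toy data frame; the values are made up, and the real dataset will have more columns.

```r
# Toy data frame with invented values; it stands in for the exercise data.
toy <- data.frame(
  ID = c(1, 2, 3, 4),
  Sex = c("Female", "Male", NA, "Female"),
  BMI = c(22.5, 0, 31.2, NA),
  stringsAsFactors = FALSE
)

dim(toy)            # number of rows and columns
sapply(toy, class)  # data type of each column

# Convert a categorical column stored as character into a factor
toy$Sex <- as.factor(toy$Sex)

# Number of missing values (NA) per column
na_counts <- colSums(is.na(toy))
na_counts
```

Note that the `BMI` value of 0 is not counted as missing here; whether such a zero is a true value or an error is a separate judgement that has to be made during cleaning.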
 
 ## Clean up the data
 
@@ -36,39 +40,27 @@ Consider the following:
 
 -   Are there zeros in the data? Are they true zeros or errors?
 
--   Do you want to change any of the data types of the variables?
-
-5.  Clean the data according to your considerations.
-
-::: {.callout-tip collapse="true"}
-## Hint
-Have a look at `BloodPressure`, `BMI`, `Sex`, `Diabetes` and `ID`. 
-:::
+7.  Clean the data according to your considerations.
 
 ## Meta Data
 
-There is some metadata to accompany the dataset you have just cleaned in `diabetes_meta_toy_messy.csv`. This is a csv file, not an excel sheet, so you need to use the `read_delim` function to load it. Load in the dataset and inspect it. 
+There is some metadata to accompany the dataset you have just cleaned in `diabetes_meta_toy_messy.csv`. This is a csv file, not an excel sheet, so you need to use the `read_csv` function to load it. Load in the dataset and inspect it. 
 
-6. Now clean the metadata and do data exploration by repeating step 3-5 from above. 
+8. Now clean the metadata and do data exploration by repeating steps **3**-**7** from above. 
 
 ## Join the datasets
 
 We will combine both datasets together into one tibble.
 
-7. Consider what variable the datasets should be joined on.
+9. Consider which variable the datasets should be joined on.
 
 ::: {.callout-tip collapse="true"}
 ## Hint
 The joining variable must be the same type in both datasets.
 :::
 
-8. Join the datasets by the variable you selected above.
-
-9. How many rows does the joined dataset have? Explain why. 
+10. Join the datasets by the variable you selected above.
 
-::: {.callout-tip collapse="true"}
-## Hint
-Because we used left_join, only the IDs that are in `diabetes_clinical_clean` are kept. 
-:::
+11. How many rows does the joined dataset have? Explain why. 
 
-10. Export the joined dataset. Think about which directory you want to save the file in. 
+12. Export the joined dataset. Think about which directory you want to save the file in. 
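
To illustrate the hint that the joining variable must have the same type in both datasets, here is a small sketch using `dplyr::left_join()` on two invented toy tables. The table names, the values, and the choice of `ID` as the key are assumptions made for the example, not the exercise solution.

```r
library(dplyr)

# Invented toy tables; ID is numeric in one and character in the other
clinical <- tibble(ID = c(1, 2, 3), BMI = c(22.5, 31.2, 27.0))
meta <- tibble(ID = c("2", "3", "4"), Smoker = c("Yes", "No", "No"))

# Joining as-is would fail because the key types differ, so convert first
meta <- mutate(meta, ID = as.numeric(ID))

# left_join() keeps every row of the left table; unmatched keys get NA
joined <- left_join(clinical, meta, by = "ID")
nrow(joined)
```

With a left join the row count follows the left table as long as the keys in the right table are unique, which is why the ID that appears only in `meta` is absent from the result.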
@@ -20,21 +20,6 @@ In this exercise you will do some more advance tidyverse operations such as pivo
 
 4.  Restructure the glucose dataset into a long format. Name the column that describes which measurement the row refers to, i.e. Glucose_0, Glucose_60 or Glucose_120, `Measurement`. How many rows are there per ID? Does that make sense?
 
-::: {.callout-tip collapse="true"}
-## Hint
-
-Remember the flow:
-
-```{r eval = FALSE}
-pivot_longer(cols = LIST_WITH_COLUMNS_TO_PIVOT,
-             names_to = "NEW_COLUMN_CONTAINING_COLUMN_NAMES",
-             values_to = "NEW_COLUMN_CONTAINING_COLUMN_VALUES")
-```
-
-Have a look at slide 16 for a visual overview.
-
-:::
-
 5. In your long format dataframe you should have one column that described which measurement the row refers to, i.e. Glucose_0, Glucose_60 or Glucose_120. Transform this column so that you only have the numerical part, i.e. **only** 0, 60 or 120. Then change the data type of that column to `factor`. Check the order of the factor levels and if necessary change them to the proper order.
 
 ::: {.callout-tip collapse="true"}
@@ -44,21 +29,11 @@ The `stringr` packages is a part of tidyverse and has many functions for manipul
 Have a look at the help for factors `?factors` to see how to influence the levels.
 :::
 
-6. Merge the glucose dataset with the joined diabetes dataset.
+6. Merge the long-format glucose dataset you made in **4** with the joined diabetes dataset you loaded in **2**.
 
 7.  Pull the glucose measurements from your favorite ID.
 
-::: {.callout-tip collapse="true"}
-## Hint
-First `filter` for your favorite ID and then `pull` the columns. 
-:::
-
-8. Calculate the mean glucose measure for each measurement timepoint.
-
-::: {.callout-tip collapse="true"}
-## Hint
-You will need to use `group_by()`, and `summerise()`.
-:::
+8. Calculate the mean glucose measure for each measurement time point.
 
 9. Calculate mean and standard deviation for all numeric columns.
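
The reshaping and summarising steps in this exercise can be sketched end to end on a toy table. The glucose values below are invented, and the value column name `Glucose` is an assumption; only the `Glucose_0`, `Glucose_60`, `Glucose_120` and `Measurement` names come from the exercise text.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Invented wide-format toy data: one row per ID, one column per time point
glucose <- tibble(
  ID = c(1, 2),
  Glucose_0 = c(5.1, 6.3),
  Glucose_60 = c(7.8, 9.9),
  Glucose_120 = c(6.0, 8.4)
)

glucose_long <- glucose %>%
  # Wide to long: column names go to Measurement, cell values to Glucose
  pivot_longer(
    cols = starts_with("Glucose_"),
    names_to = "Measurement",
    values_to = "Glucose"
  ) %>%
  mutate(
    # Keep only the numeric part of the name, then fix the level order
    Measurement = str_remove(Measurement, "Glucose_"),
    Measurement = factor(Measurement, levels = c("0", "60", "120"))
  )

# Mean glucose per time point
means <- glucose_long %>%
  group_by(Measurement) %>%
  summarise(mean_glucose = mean(Glucose))
means
```

Setting `levels` explicitly matters because the default factor ordering sorts the strings alphabetically, which would place "120" before "60".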
 
 
@@ -14,6 +14,13 @@ In this exercise you will do a lot of plotting with ggplot. For a reminder on ho
 
 2. Load data from the `.rds` file you created in Exercise 2. Have a guess at what the function is called. 
 
+## SECTION IN PROGRESS - CHECK NORMAL DISTRIBUTION
+
+```{r}
+
+```
+
+
 ## Plotting - Part 1
 
 You will first do some basic plots to get started with ggplot again. 
@@ -26,34 +33,18 @@ If it has been a while since you worked with ggplot, have a look at the [ggplot
 
 5. Now, create the same two plots as before, but this time stratify them by `Diabetes`. Do you notice any trends?
 
-::: {.callout-tip collapse="true"}
-## Hint
-
-You can stratify a plot by a categorical variable in several ways, depending on the type of plot. The purpose of stratification is to distinguish samples based on their categorical values, making patterns or differences easier to identify. This can be done using aesthetics like `color`, `fill`, `shape`. 
-
-:::
-
-
 6. Create a boxplot of `BMI` stratified by `Diabetes.` Give the plot a meaningful title.
 
-
-7. Create a boxplot of `PhysicalActivity` stratified by `Smoker`. Give the plot a meaningful title.
-
+7. Create a violin plot (`geom_violin`) of `PhysicalActivity` stratified by `Smoker`. Add horizontal lines at the 25%, 50%, and 75% quantiles of each violin. Give the plot a meaningful title.
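
One possible way to get the quantile lines asked for in step 7 is the `draw_quantiles` argument of `geom_violin()`. The sketch below uses simulated data with invented distributions, and it assumes a ggplot2 version in which `draw_quantiles` is still supported (newer releases may emit a deprecation warning and offer a replacement).

```r
library(ggplot2)

# Simulated toy data; the distributions are invented for illustration
set.seed(1)
toy <- data.frame(
  Smoker = rep(c("No", "Yes"), each = 50),
  PhysicalActivity = c(rnorm(50, mean = 60, sd = 15),
                       rnorm(50, mean = 45, sd = 15))
)

# draw_quantiles adds horizontal lines at the requested quantiles,
# computed from each violin's density estimate
p <- ggplot(toy, aes(x = Smoker, y = PhysicalActivity, fill = Smoker)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
  labs(title = "Physical activity by smoking status")

p
```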
 
 ## Plotting - Part 2
 
 In order to plot the data inside the nested variable, the data needs to be unnested. 
 
-8. Create a boxplot of the glucose measurements at time 0 stratified by `Diabetes`. Give the plot a meaningful title.
+8. Create a `ggridges::geom_density_ridges` plot of the glucose measurements at time 0 stratified by `Diabetes`. What kind of plot is this? Give the plot a meaningful title.
 
 9. Create these boxplots for each time point (0, 60, 120) by using faceting by `Measurement`. Give the plot a meaningful title.
 
-::: {.callout-tip collapse="true"}
-## Hint
-Faceting allows you to create multiple plots based on the values of a categorical variable, making it easier to compare patterns across groups. In ggplot2, you can use `facet_wrap` for a single variable or `facet_grid` for multiple variables.
-
-:::
-
 10. Calculate the mean glucose levels for each time point. 
 
 ::: {.callout-tip collapse="true"}