Commit aa801c4

Exercises and Solutions 5A and 5B done
1 parent b51badc commit aa801c4

32 files changed

+3204
-699
lines changed

docs/exercises/ExtraExercise5.html

Lines changed: 1067 additions & 0 deletions
Large diffs are not rendered by default.

docs/search.json

Lines changed: 56 additions & 0 deletions
Large diffs are not rendered by default.

docs/solutions/solution1.html

Lines changed: 48 additions & 37 deletions
@@ -7,7 +7,7 @@
77
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
88

99

10-
<title>Exercise 1 - Solutions: Data Cleanup (Base R and Tidyverse) – R for Data Science</title>
10+
<title>Exercise 1 - Solutions: Data Cleanup and Summary Statistics – R for Data Science</title>
1111
<style>
1212
code{white-space: pre-wrap;}
1313
span.smallcaps{font-variant: small-caps;}
@@ -196,19 +196,19 @@
196196
</li>
197197
<li class="sidebar-item">
198198
<div class="sidebar-item-container">
199-
<a href="../presentations/presentation4_main_script.html" class="sidebar-item-text sidebar-link">
199+
<a href="../presentations/presentation4_main_script.qmd" class="sidebar-item-text sidebar-link">
200200
<span class="menu-text">Presentation 4: Scripting in R</span></a>
201201
</div>
202202
</li>
203203
<li class="sidebar-item">
204204
<div class="sidebar-item-container">
205-
<a href="../presentations/presentation4_functions.html" class="sidebar-item-text sidebar-link">
205+
<a href="../presentations/presentation4_functions.qmd" class="sidebar-item-text sidebar-link">
206206
<span class="menu-text">Presentation 4, Functions: Scripting in R</span></a>
207207
</div>
208208
</li>
209209
<li class="sidebar-item">
210210
<div class="sidebar-item-container">
211-
<a href="../presentations/presentation5.html" class="sidebar-item-text sidebar-link">
211+
<a href="../presentations/presentation5.qmd" class="sidebar-item-text sidebar-link">
212212
<span class="menu-text">Presentation 5: Intro to Modelling in R</span></a>
213213
</div>
214214
</li>
@@ -267,13 +267,13 @@
267267
</li>
268268
<li class="sidebar-item">
269269
<div class="sidebar-item-container">
270-
<a href="../exercises/exercise4.html" class="sidebar-item-text sidebar-link">
270+
<a href="../exercises/exercise4.qmd" class="sidebar-item-text sidebar-link">
271271
<span class="menu-text">Exercise 4: Scripting in R</span></a>
272272
</div>
273273
</li>
274274
<li class="sidebar-item">
275275
<div class="sidebar-item-container">
276-
<a href="../exercises/exercise5.html" class="sidebar-item-text sidebar-link">
276+
<a href="../exercises/exercise5.qmd" class="sidebar-item-text sidebar-link">
277277
<span class="menu-text">Exercise 5: Modelling in R</span></a>
278278
</div>
279279
</li>
@@ -320,19 +320,19 @@
320320
</li>
321321
<li class="sidebar-item">
322322
<div class="sidebar-item-container">
323-
<a href="../solutions/solution4.html" class="sidebar-item-text sidebar-link">
323+
<a href="../solutions/solution4.qmd" class="sidebar-item-text sidebar-link">
324324
<span class="menu-text">Exercise 4 - Solution</span></a>
325325
</div>
326326
</li>
327327
<li class="sidebar-item">
328328
<div class="sidebar-item-container">
329-
<a href="../solutions/solution4_functions.html" class="sidebar-item-text sidebar-link">
329+
<a href="../solutions/solution4_functions.qmd" class="sidebar-item-text sidebar-link">
330330
<span class="menu-text">Exercise 4, Functions - Solution</span></a>
331331
</div>
332332
</li>
333333
<li class="sidebar-item">
334334
<div class="sidebar-item-container">
335-
<a href="../solutions/solution5.html" class="sidebar-item-text sidebar-link">
335+
<a href="../solutions/solution5.qmd" class="sidebar-item-text sidebar-link">
336336
<span class="menu-text">Exercise 5 - Solution</span></a>
337337
</div>
338338
</li>
@@ -402,7 +402,7 @@ <h2 id="toc-title">On this page</h2>
402402

403403
<header id="title-block-header" class="quarto-title-block default"><nav class="quarto-page-breadcrumbs quarto-title-breadcrumbs d-none d-lg-block" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../solutions/solution1.html">Solutions</a></li><li class="breadcrumb-item"><a href="../solutions/solution1.html">Exercise 1 - Solution</a></li></ol></nav>
404404
<div class="quarto-title">
405-
<h1 class="title">Exercise 1 - Solutions: Data Cleanup (Base R and Tidyverse)</h1>
405+
<h1 class="title">Exercise 1 - Solutions: Data Cleanup and Summary Statistics</h1>
406406
</div>
407407

408408

@@ -466,16 +466,16 @@ <h2 class="anchored" data-anchor-id="explore-the-data">Explore the data</h2>
466466
</div>
467467
</div>
468468
<ol start="4" type="1">
469-
<li>Check the ranges and distribution of each of the variables. Consider that the variables are of different classes. Do any values strike you as odd?</li>
469+
<li>Check the ranges and distribution of each of the variables. Remember, the variables might be different types. Do any values seem weird or unexpected?</li>
470470
</ol>
471471
<p>For the categorical variables we can use <code>table</code>:</p>
472472
<p>The <code>Sex</code> values are not consistent.</p>
473473
<div class="cell">
474474
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">table</span>(diabetes_clinical<span class="sc">$</span>Sex)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
475475
<div class="cell-output cell-output-stdout">
476476
<pre><code>
477-
FEMALE Female Male male
478-
2 291 237 2 </code></pre>
477+
Female FEMALE male Male
478+
291 2 2 237 </code></pre>
479479
</div>
480480
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">table</span>(diabetes_clinical<span class="sc">$</span>Smoker)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
481481
<div class="cell-output cell-output-stdout">
@@ -593,7 +593,7 @@ <h2 class="anchored" data-anchor-id="clean-up-the-data">Clean up the data</h2>
593593
<li><p>Do you want to change any of the classes of the variables?</p></li>
594594
</ul>
595595
<ol start="5" type="1">
596-
<li>Clean the data according to your considerations.</li>
596+
<li>Make a clean version of the dataset according to your considerations.</li>
597597
</ol>
598598
<div class="callout callout-style-default callout-tip callout-titled">
599599
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
@@ -659,12 +659,12 @@ <h2 class="anchored" data-anchor-id="meta-data">Meta Data</h2>
659659
<pre><code># A tibble: 6 × 3
660660
ID Married Work
661661
&lt;dbl&gt; &lt;chr&gt; &lt;chr&gt;
662-
1 33879 Yes Self-employed
663-
2 52800 Yes Private
664-
3 16817 Yes Private
665-
4 70676 Yes Self-employed
666-
5 6319 No Public
667-
6 71379 No Public </code></pre>
662+
1 48368 Yes Private
663+
2 36706 No Public
664+
3 32729 Yes Private
665+
4 48272 Yes Private
666+
5 9404 Yes Private
667+
6 16934 Yes Self-employed</code></pre>
668668
</div>
669669
</div>
670670
<p>6.3. How many missing values (NAs) are there in each column?</p>
@@ -682,13 +682,13 @@ <h2 class="anchored" data-anchor-id="meta-data">Meta Data</h2>
682682
<div class="cell-output cell-output-stdout">
683683
<pre><code>
684684
No No Yes Yes
685-
183 3 345 1 </code></pre>
685+
178 1 332 4 </code></pre>
686686
</div>
687687
<div class="sourceCode cell-code" id="cb42"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1"><a href="#cb42-1" aria-hidden="true" tabindex="-1"></a><span class="fu">table</span>(diabetes_meta<span class="sc">$</span>Work)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
688688
<div class="cell-output cell-output-stdout">
689689
<pre><code>
690690
Private Public Retired Self-employed
691-
283 154 6 89 </code></pre>
691+
273 150 6 86 </code></pre>
692692
</div>
693693
</div>
694694
<p>By investigating the unique values of the <code>Married</code> variable we see that some of the values have whitespace.</p>
@@ -700,7 +700,7 @@ <h2 class="anchored" data-anchor-id="meta-data">Meta Data</h2>
700700
</div>
701701
<ol start="6" type="1">
702702
<li><ol start="5" type="1">
703-
<li>Clean the data according to your considerations.</li>
703+
<li>Make a clean version of the dataset according to your considerations.</li>
704704
</ol></li>
705705
</ol>
706706
<p>My considerations:</p>
@@ -712,7 +712,7 @@ <h2 class="anchored" data-anchor-id="meta-data">Meta Data</h2>
712712
<div class="cell">
713713
<div class="sourceCode cell-code" id="cb46"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1"><a href="#cb46-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_meta)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
714714
<div class="cell-output cell-output-stdout">
715-
<pre><code>[1] 532</code></pre>
715+
<pre><code>[1] 515</code></pre>
716716
</div>
717717
</div>
718718
<div class="cell">
@@ -731,38 +731,49 @@ <h2 class="anchored" data-anchor-id="meta-data">Meta Data</h2>
731731
<div class="cell">
732732
<div class="sourceCode cell-code" id="cb51"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb51-1"><a href="#cb51-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_meta_clean)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
733733
<div class="cell-output cell-output-stdout">
734-
<pre><code>[1] 532</code></pre>
734+
<pre><code>[1] 515</code></pre>
735735
</div>
736736
</div>
737737
</section>
738738
<section id="join-the-datasets" class="level2">
739739
<h2 class="anchored" data-anchor-id="join-the-datasets">Join the datasets</h2>
740740
<ol start="7" type="1">
741-
<li>Consider what variable the datasets should be joined on.</li>
741+
<li><p>Consider which variable the datasets should be joined on.</p></li>
742+
<li><p>Consider how you want to join the datasets. Do you want to use <code>full_join</code>, <code>inner_join</code>, <code>left_join</code> or <code>right_join</code>?</p></li>
742743
</ol>
743744
<p>The joining variable must be the same type in both datasets.</p>
744-
<ol start="8" type="1">
745-
<li>Join the datasets by the variable you selected above.</li>
745+
<ol start="9" type="1">
746+
<li>Join the cleaned versions of the clinical and meta datasets by the variable and with the function you considered above.</li>
746747
</ol>
747748
<div class="cell">
748-
<div class="sourceCode cell-code" id="cb53"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a>diabetes_join <span class="ot">&lt;-</span> diabetes_clinical_clean <span class="sc">%&gt;%</span> </span>
749-
<span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">left_join</span>(diabetes_meta_clean, <span class="at">by =</span> <span class="st">'ID'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
749+
<div class="sourceCode cell-code" id="cb53"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a><span class="co"># We use full_join to keep all observations before we know which variables we are interested in. </span></span>
750+
<span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a>diabetes_join <span class="ot">&lt;-</span> diabetes_clinical_clean <span class="sc">%&gt;%</span> </span>
751+
<span id="cb53-3"><a href="#cb53-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">full_join</span>(diabetes_meta_clean, <span class="at">by =</span> <span class="st">'ID'</span>)</span>
752+
<span id="cb53-4"><a href="#cb53-4" aria-hidden="true" tabindex="-1"></a></span>
753+
<span id="cb53-5"><a href="#cb53-5" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_join)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
754+
<div class="cell-output cell-output-stdout">
755+
<pre><code>[1] 531</code></pre>
750756
</div>
751-
<ol start="9" type="1">
752-
<li>How many rows does the joined dataset have? Explain why.</li>
757+
</div>
758+
<ol start="10" type="1">
759+
<li>How many rows does the joined dataset have? Explain how the join function you used resulted in the given number of rows.</li>
753760
</ol>
754-
<p>Because we used <code>left_join</code>, only the IDs that are in <code>diabetes_clinical_clean</code> are kept.</p>
755761
<div class="cell">
756-
<div class="sourceCode cell-code" id="cb54"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1"><a href="#cb54-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_join)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
762+
<div class="sourceCode cell-code" id="cb55"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Because we used `full_join`, all the unique IDs across both data sets are kept.</span></span>
763+
<span id="cb55-2"><a href="#cb55-2" aria-hidden="true" tabindex="-1"></a><span class="fu">c</span>(diabetes_clinical_clean<span class="sc">$</span>ID, diabetes_meta_clean<span class="sc">$</span>ID) <span class="sc">%&gt;%</span> <span class="fu">unique</span>() <span class="sc">%&gt;%</span> <span class="fu">length</span>()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
757764
<div class="cell-output cell-output-stdout">
758-
<pre><code>[1] 490</code></pre>
765+
<pre><code>[1] 531</code></pre>
759766
</div>
767+
<div class="sourceCode cell-code" id="cb57"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1"><a href="#cb57-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_join)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
768+
<div class="cell-output cell-output-stdout">
769+
<pre><code>[1] 531</code></pre>
760770
</div>
761-
<ol start="10" type="1">
771+
</div>
772+
<ol start="11" type="1">
762773
<li>Export the joined dataset. Think about which directory you want to save the file in.</li>
763774
</ol>
764775
<div class="cell">
765-
<div class="sourceCode cell-code" id="cb56"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1"><a href="#cb56-1" aria-hidden="true" tabindex="-1"></a>writexl<span class="sc">::</span><span class="fu">write_xlsx</span>(diabetes_join, <span class="st">'../out/diabetes_join.xlsx'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
776+
<div class="sourceCode cell-code" id="cb59"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1"><a href="#cb59-1" aria-hidden="true" tabindex="-1"></a>writexl<span class="sc">::</span><span class="fu">write_xlsx</span>(diabetes_join, <span class="st">'../out/diabetes_join.xlsx'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
766777
</div>
767778

768779

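The last hunk switches the solution from `left_join` to `full_join`, and the reported row count moves from 490 to 531, which matches the number of unique IDs across both tables. The sketch below illustrates why the join type changes the row count. It uses two small made-up data frames (`clinical` and `meta` are hypothetical stand-ins, not the course data) and assumes each ID appears at most once per table.

```r
library(dplyr)

# Hypothetical toy tables, not the course datasets.
# IDs 3 and 4 occur in both; 1 and 2 only in clinical; 5 only in meta.
clinical <- data.frame(ID = c(1, 2, 3, 4), Age = c(34, 51, 47, 29))
meta     <- data.frame(ID = c(3, 4, 5), Married = c("Yes", "No", "Yes"))

# left_join keeps every row of the left table only.
left  <- left_join(clinical, meta, by = "ID")

# inner_join keeps only IDs present in both tables.
inner <- inner_join(clinical, meta, by = "ID")

# full_join keeps every ID found in either table.
full  <- full_join(clinical, meta, by = "ID")

# With unique IDs per table, the full join has one row per distinct ID.
n_unique_ids <- length(unique(c(clinical$ID, meta$ID)))

nrow(left)
nrow(inner)
nrow(full)
n_unique_ids
```

If an ID were duplicated within one of the tables, the joined row count could exceed the number of unique IDs, so the equality checked in the solution holds only under the uniqueness assumption.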