<h1 class="title">Exercise 1 - Solutions: Data Cleanup (Base R and Tidyverse)</h1>
405
+
<h1 class="title">Exercise 1 - Solutions: Data Cleanup and Summary Statistics</h1>
406
406
</div>
407
407
408
408
@@ -466,16 +466,16 @@ <h2 class="anchored" data-anchor-id="explore-the-data">Explore the data</h2>
466
466
</div>
467
467
</div>
468
468
<ol start="4" type="1">
469
-
<li>Check the ranges and distribution of each of the variables. Consider that the variables are of different classes. Do any values strike you as odd?</li>
469
+
<li>Check the ranges and distribution of each of the variables. Remember, the variables may be of different types. Do any values seem weird or unexpected?</li>
470
470
</ol>
471
471
<p>For the categorical variables we can use <code>table</code>:</p>
472
472
<p>The <code>Sex</code> values are not consistent.</p>
473
473
<div class="cell">
474
474
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">table</span>(diabetes_clinical<span class="sc">$</span>Sex)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
475
475
<div class="cell-output cell-output-stdout">
476
476
<pre><code>
477
-
FEMALE Female   Male   male
478
-
     2    291    237      2</code></pre>
477
+
Female FEMALE   male   Male
478
+
   291      2      2    237</code></pre>
479
479
</div>
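<p>One possible way to make the <code>Sex</code> spellings consistent is sketched below in base R. The vector <code>sex</code> is hypothetical toy data that only mirrors the four spellings in the table above; it is not the exercise dataset, and the exercise's own solution may use a different approach.</p>

```r
# Hypothetical toy vector with the four spellings seen in the table above.
sex <- c("Female", "FEMALE", "male", "Male", "Female")

# Upper-case the first letter and lower-case the rest of each value.
sex_clean <- paste0(toupper(substr(sex, 1, 1)), tolower(substring(sex, 2)))

# Only two categories should remain.
table(sex_clean)
```

<p>After this step every value is either <code>Female</code> or <code>Male</code>, so <code>table</code> reports two categories instead of four.</p>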
480
480
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">table</span>(diabetes_clinical<span class="sc">$</span>Smoker)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
481
481
<div class="cell-output cell-output-stdout">
@@ -593,7 +593,7 @@ <h2 class="anchored" data-anchor-id="clean-up-the-data">Clean up the data</h2>
593
593
<li><p>Do you want to change any of the classes of the variables?</p></li>
594
594
</ul>
595
595
<ol start="5" type="1">
596
-
<li>Clean the data according to your considerations.</li>
596
+
<li>Make a clean version of the dataset according to your considerations.</li>
<div class="sourceCode cell-code" id="cb51"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb51-1"><a href="#cb51-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_meta_clean)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
733
733
<div class="cell-output cell-output-stdout">
734
-
<pre><code>[1] 532</code></pre>
734
+
<pre><code>[1] 515</code></pre>
735
735
</div>
736
736
</div>
737
737
</section>
738
738
<section id="join-the-datasets" class="level2">
739
739
<h2 class="anchored" data-anchor-id="join-the-datasets">Join the datasets</h2>
740
740
<ol start="7" type="1">
741
-
<li>Consider what variable the datasets should be joined on.</li>
741
+
<li><p>Consider which variable the datasets should be joined on.</p></li>
742
+
<li><p>Consider how you want to join the datasets. Do you want to use <code>full_join</code>, <code>inner_join</code>, <code>left_join</code> or <code>right_join</code>?</p></li>
742
743
</ol>
743
744
<p>The joining variable must be the same type in both datasets.</p>
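<p>A minimal sketch of checking and aligning the type of the joining variable is given below. The data frames <code>clinical</code> and <code>meta</code> and their columns are hypothetical toy data, not the exercise datasets. To my knowledge the dplyr join functions stop with an error when the key columns have incompatible types (for example character versus numeric), which is why the check is worth doing first.</p>

```r
# Hypothetical toy tables: ID is numeric in one and character in the other.
clinical <- data.frame(ID = c(1, 2, 3), Age = c(34, 51, 47))
meta <- data.frame(ID = c("2", "3", "4"),
                   Married = c("Yes", "No", "Yes"),
                   stringsAsFactors = FALSE)

# Inspect the class of the key column in each table.
class(clinical$ID)
class(meta$ID)

# Convert one side so that both key columns share the same type.
meta$ID <- as.numeric(meta$ID)

# Confirm that the classes now agree before joining.
identical(class(clinical$ID), class(meta$ID))
```

<p>Which side to convert is a judgment call; converting to character is the safer choice when some IDs are not purely numeric.</p>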
744
-
<ol start="8" type="1">
745
-
<li>Join the datasets by the variable you selected above.</li>
745
+
<ol start="9" type="1">
746
+
<li>Join the cleaned versions of the clinical and meta datasets by the variable and with the function you chose above.</li>
<span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a><span class="fu">left_join</span>(diabetes_meta_clean, <span class="at">by =</span> <span class="st">'ID'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
749
+
<div class="sourceCode cell-code" id="cb53"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a><span class="co"># We use full_join to keep all observations before we know which variables we are interested in. </span></span>
<span id="cb53-5"><a href="#cb53-5" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_join)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
754
+
<div class="cell-output cell-output-stdout">
755
+
<pre><code>[1] 531</code></pre>
750
756
</div>
751
-
<ol start="9" type="1">
752
-
<li>How many rows does the joined dataset have? Explain why.</li>
757
+
</div>
758
+
<ol start="10" type="1">
759
+
<li>How many rows does the joined dataset have? Explain how the join function you used results in that number of rows.</li>
753
760
</ol>
754
-
<p>Because we used <code>left_join</code>, only the IDs that are in <code>diabetes_clinical_clean</code> are kept.</p>
755
761
<div class="cell">
756
-
<div class="sourceCode cell-code" id="cb54"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1"><a href="#cb54-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_join)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
762
+
<div class="sourceCode cell-code" id="cb55"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Because we used `full_join`, all the unique IDs across both data sets are kept.</span></span>
763
+
<span id="cb55-2"><a href="#cb55-2" aria-hidden="true" tabindex="-1"></a><span class="fu">c</span>(diabetes_clinical_clean<span class="sc">$</span>ID, diabetes_meta_clean<span class="sc">$</span>ID) <span class="sc">%>%</span> <span class="fu">unique</span>() <span class="sc">%>%</span> <span class="fu">length</span>()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
757
764
<div class="cell-output cell-output-stdout">
758
-
<pre><code>[1] 490</code></pre>
765
+
<pre><code>[1] 531</code></pre>
759
766
</div>
767
+
<div class="sourceCode cell-code" id="cb57"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1"><a href="#cb57-1" aria-hidden="true" tabindex="-1"></a><span class="fu">nrow</span>(diabetes_join)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
768
+
<div class="cell-output cell-output-stdout">
769
+
<pre><code>[1] 531</code></pre>
760
770
</div>
761
-
<ol start="10" type="1">
771
+
</div>
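<p>The link between join type and row count can be illustrated with a small sketch. It uses base R <code>merge</code> as a stand-in for the dplyr joins (<code>all = TRUE</code> behaves like <code>full_join</code>, <code>all.x = TRUE</code> like <code>left_join</code>, and the default like <code>inner_join</code>). The data frames below are hypothetical toy data with unique IDs, not the exercise datasets.</p>

```r
# Hypothetical toy tables with partly overlapping, unique IDs.
clinical <- data.frame(ID = c(1, 2, 3, 4), Age = c(34, 51, 47, 29))
meta <- data.frame(ID = c(3, 4, 5),
                   Married = c("Yes", "No", "Yes"),
                   stringsAsFactors = FALSE)

# Base R analogues of the dplyr joins.
full  <- merge(clinical, meta, by = "ID", all = TRUE)    # like full_join
left  <- merge(clinical, meta, by = "ID", all.x = TRUE)  # like left_join
inner <- merge(clinical, meta, by = "ID")                # like inner_join

# Number of distinct IDs across both tables.
n_unique_ids <- length(unique(c(clinical$ID, meta$ID)))

nrow(full)    # equals n_unique_ids (5)
nrow(left)    # equals nrow(clinical) (4)
nrow(inner)   # only IDs present in both tables (2)
```

<p>These equalities hold only when each ID appears at most once per table; duplicated IDs would add rows to the joined result.</p>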
772
+
<ol start="11" type="1">
762
773
<li>Export the joined dataset. Think about which directory you want to save the file in.</li>
763
774
</ol>
764
775
<div class="cell">
765
-
<div class="sourceCode cell-code" id="cb56"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1"><a href="#cb56-1" aria-hidden="true" tabindex="-1"></a>writexl<span class="sc">::</span><span class="fu">write_xlsx</span>(diabetes_join, <span class="st">'../out/diabetes_join.xlsx'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
776
+
<div class="sourceCode cell-code" id="cb59"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1"><a href="#cb59-1" aria-hidden="true" tabindex="-1"></a>writexl<span class="sc">::</span><span class="fu">write_xlsx</span>(diabetes_join, <span class="st">'../out/diabetes_join.xlsx'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>