Commit 6288de0

source commit: b6a8179
0 parents  commit 6288de0

24 files changed

Lines changed: 2728 additions & 0 deletions

00-sql-introduction.md

Lines changed: 287 additions & 0 deletions

01-sql-basic-queries.md

Lines changed: 359 additions & 0 deletions
@@ -0,0 +1,359 @@
---
title: Accessing Data With Queries
teaching: 30
exercises: 5
---

::::::::::::::::::::::::::::::::::::::: objectives

- Write and build queries.
- Filter data given various criteria.
- Sort the results of a query.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How do I write a basic query in SQL?

::::::::::::::::::::::::::::::::::::::::::::::::::

## Writing my first query

Let's start by using the **surveys** table. Here we have data on every
individual that was captured at the site, including when they were captured,
what plot they were captured on, their species ID, sex and weight in grams.

Let's write an SQL query that selects all of the columns in the surveys table. SQL queries can be written in the box located under the "Execute SQL" tab. Click on the right arrow above the query box to execute the query. (You can also use the keyboard shortcut "Cmd-Enter" on a Mac or "Ctrl-Enter" on a Windows machine to execute a query.) The results are displayed in the box below your query. If you want to display all of the columns in a table, use the wildcard \*.

```sql
SELECT *
FROM surveys;
```

We have capitalized the words SELECT and FROM because they are SQL keywords.
SQL keywords are case insensitive, but capitalizing them helps readability and is good style.
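
Case insensitivity applies to keywords, not to text values inside quotes. The short sketch below illustrates the difference using literal values only, so it does not depend on any table. It assumes SQLite with its default text comparison, where a true comparison is shown as `1` and a false one as `0`; other database systems may behave differently.

```sql
-- Keywords may be written in any case; these two statements are equivalent.
SELECT 1 + 1;
select 1 + 1;

-- Quoted text values are compared exactly (SQLite default behaviour).
SELECT 'DM' = 'DM';  -- 1 (true)
SELECT 'DM' = 'dm';  -- 0 (false)
```

Under these assumptions, a value typed in the wrong case will not match the stored one, even though the keywords around it can be written in any case.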

If we want to select a single column, we can type the column name instead of the wildcard \*.

```sql
SELECT year
FROM surveys;
```

If we want more information, we can add more columns to the list of fields,
right after SELECT:

```sql
SELECT year, month, day
FROM surveys;
```

### Limiting results

Sometimes you don't want to see all the results; you just want to get a sense of what's being returned. In that case, you can use a `LIMIT` clause. This is particularly useful when you are working with large databases.

```sql
SELECT *
FROM surveys
LIMIT 10;
```

### Unique values

If we want only the unique values, so that we can quickly see what species have
been sampled, we use `DISTINCT`:

```sql
SELECT DISTINCT species_id
FROM surveys;
```

If we select more than one column, then the distinct pairs of values are
returned:

```sql
SELECT DISTINCT year, species_id
FROM surveys;
```

### Calculated values

We can also do calculations with the values in a query.
For example, if we wanted to look at the mass of each individual
on different dates, but we needed it in kg instead of g, we would use:

```sql
SELECT year, month, day, weight / 1000
FROM surveys;
```

When we run the query, the expression `weight / 1000` is evaluated for each
row and appended in a new column to the table returned by the query. Note that
the new column only exists in the query results; the surveys table itself is
not changed. If we had used the `INTEGER` data type for the weight field,
integer division would have been performed. To obtain the correct results in that
case, divide by `1000.0` instead. Expressions can use any fields, any arithmetic
operators (`+`, `-`, `*`, and `/`) and a variety of built-in functions. For
example, we could round the values to make them easier to read.

```sql
SELECT plot_id, species_id, sex, weight, ROUND(weight / 1000, 2)
FROM surveys;
```
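
The integer-division caveat mentioned above is easy to check with literal numbers. The sketch below assumes SQLite and uses a made-up weight of 218 g purely for illustration; it does not read from the surveys table.

```sql
-- Both operands are integers, so the fractional part is discarded.
SELECT 218 / 1000;              -- 0

-- Writing the divisor as 1000.0 forces real-number division.
SELECT 218 / 1000.0;            -- 0.218

-- ROUND keeps the requested number of decimal places.
SELECT ROUND(218 / 1000.0, 2);  -- 0.22
```

If a converted weight column shows only zeros, this is the first thing to check.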

::::::::::::::::::::::::::::::::::::::: challenge

## Challenge

- Write a query that returns the year, month, day, species\_id and weight in mg.

::::::::::::::: solution

## Solution

```sql
SELECT year, month, day, species_id, weight * 1000
FROM surveys;
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

## Filtering

Databases can also filter data, selecting only the data meeting certain
criteria. For example, let's say we only want data for the species
*Dipodomys merriami*, which has a species code of DM. We need to add a
`WHERE` clause to our query:

```sql
SELECT *
FROM surveys
WHERE species_id = 'DM';
```

We can do the same thing with numbers.
Here, we only want the data from 2000 onward:

```sql
SELECT *
FROM surveys
WHERE year >= 2000;
```

If we used the `TEXT` data type for the year, the `WHERE` clause should
be `year >= '2000'`.
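
The data type matters because text is compared character by character rather than by numeric value. A minimal sketch with literal values, assuming SQLite (which displays true as `1` and false as `0`); the value 999 is made up for illustration:

```sql
-- Compared as numbers, 999 is smaller than 2000.
SELECT 999 >= 2000;      -- 0 (false)

-- Compared as text, '9' sorts after '2', so '999' counts as larger.
SELECT '999' >= '2000';  -- 1 (true)
```

For four-digit years both comparisons give the same answer, which is why `year >= '2000'` still works when the year is stored as text.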

We can use more sophisticated conditions by combining tests
with `AND` and `OR`. For example, suppose we want the data on *Dipodomys merriami*
starting in the year 2000:

```sql
SELECT *
FROM surveys
WHERE (year >= 2000) AND (species_id = 'DM');
```

Note that the parentheses are not needed, but again, they help with
readability. They also ensure that the computer combines `AND` and `OR`
in the way that we intend.

If we wanted to get data for any of the *Dipodomys* species, which have
species codes `DM`, `DO`, and `DS`, we could combine the tests using OR:

```sql
SELECT *
FROM surveys
WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS');
```
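
The earlier remark about parentheses matters most once `AND` and `OR` appear in the same condition, because SQL evaluates `AND` before `OR`. The two queries below are an illustrative sketch against the same surveys table; they look similar but return different rows.

```sql
-- Without parentheses, this is read as:
--   species_id = 'DM' OR (species_id = 'DO' AND year >= 2000)
-- so DM records from every year are included.
SELECT *
FROM surveys
WHERE species_id = 'DM' OR species_id = 'DO' AND year >= 2000;

-- Parentheses apply the year filter to both species.
SELECT *
FROM surveys
WHERE (species_id = 'DM' OR species_id = 'DO') AND (year >= 2000);
```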

::::::::::::::::::::::::::::::::::::::: challenge

## Challenge

- Produce a table listing the data for all individuals in Plot 1
  that weighed more than 75 grams, telling us the date, species id code, and weight
  (in kg).

::::::::::::::: solution

## Solution

```sql
SELECT day, month, year, species_id, weight / 1000
FROM surveys
WHERE (plot_id = 1) AND (weight > 75);
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

## Building more complex queries

Now, let's combine the above queries to get data for the three *Dipodomys* species from
the year 2000 on. This time, let's use IN as one way to make the query easier
to understand. It is equivalent to saying `WHERE (species_id = 'DM') OR (species_id = 'DO') OR (species_id = 'DS')`, but reads more neatly:

```sql
SELECT *
FROM surveys
WHERE (year >= 2000) AND (species_id IN ('DM', 'DO', 'DS'));
```

We started with something simple, then added more clauses one by one, testing
their effects as we went along. For complex queries, this is a good strategy
to make sure you are getting what you want. It can also help to copy a
small, easily inspected subset of the data into a temporary database and practice
your queries there before working on a larger or more complicated database.
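
One possible way to get such a practice subset without creating a separate database file is a temporary table. This is only a sketch, assuming SQLite: `surveys_sample` is a hypothetical name chosen for this example, `RANDOM()` is SQLite's random-number function, and the sample size of 100 is arbitrary.

```sql
-- Copy 100 randomly chosen rows into a temporary table.
-- Temporary tables disappear when the database connection is closed.
CREATE TEMP TABLE surveys_sample AS
SELECT *
FROM surveys
ORDER BY RANDOM()
LIMIT 100;

-- Practice on the small table first, then switch back to surveys.
SELECT *
FROM surveys_sample
WHERE species_id IN ('DM', 'DO', 'DS');
```

Because the sample is small, it is easy to check by eye that a query returns exactly the rows you expect.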

When the queries become more complex, it can be useful to add comments. In SQL,
comments are started by `--`, and end at the end of the line. For example, a
commented version of the above query can be written as:

```sql
-- Get data from 2000 onward on Dipodomys species
-- These are in the surveys table, and we are interested in all columns
SELECT * FROM surveys
-- Sampling year is in the column `year`, and we want to include 2000
WHERE (year >= 2000)
-- Dipodomys species have the `species_id` DM, DO, and DS
AND (species_id IN ('DM', 'DO', 'DS'));
```

Although SQL queries often read like plain English, it is *always* useful to add
comments; this is especially true of more complex queries.

## Sorting

We can also sort the results of our queries by using `ORDER BY`.
For simplicity, let's go back to the **species** table and alphabetize it by taxa.

First, let's look at what's in the **species** table. It's a table of the species\_id and the full genus, species and taxa information for each species\_id. Having this in a separate table is nice, because we don't need to include all
this information in our main **surveys** table.

```sql
SELECT *
FROM species;
```

Now let's order it by taxa.

```sql
SELECT *
FROM species
ORDER BY taxa ASC;
```

The keyword `ASC` tells the database to sort in ascending order.
We could alternatively use `DESC` to get descending order.

```sql
SELECT *
FROM species
ORDER BY taxa DESC;
```

`ASC` is the default.

We can also sort on several fields at once.
To truly be alphabetical, we might want to order by genus then species.

```sql
SELECT *
FROM species
ORDER BY genus ASC, species ASC;
```
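
Each field in the `ORDER BY` list takes its own direction, so ascending and descending order can be mixed. As an illustrative sketch on the same species table:

```sql
-- Taxa in reverse alphabetical order; within each taxon,
-- genus and species in ordinary alphabetical order.
SELECT *
FROM species
ORDER BY taxa DESC, genus ASC, species ASC;
```

The fields are applied from left to right: later fields only decide the order among rows that tie on the earlier ones.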

::::::::::::::::::::::::::::::::::::::: challenge

## Challenge

- Write a query that returns year, species\_id, and weight in kg from
  the surveys table, sorted with the largest weights at the top.

::::::::::::::: solution

## Solution

```sql
SELECT year, species_id, weight / 1000
FROM surveys
ORDER BY weight DESC;
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

## Order of execution

One more note on ordering: we don't actually have to display a column to sort by
it. For example, let's say we want to order the birds by their species ID, but
we only want to see genus and species.

```sql
SELECT genus, species
FROM species
WHERE taxa = 'Bird'
ORDER BY species_id ASC;
```

We can do this because sorting occurs earlier in the computational pipeline than
field selection.

The computer is basically doing this:

1. Filtering rows according to WHERE
2. Sorting results according to ORDER BY
3. Displaying requested columns or expressions.

Clauses are written in a fixed order: `SELECT`, `FROM`, `WHERE`, then `ORDER BY`.
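
The `LIMIT` clause introduced earlier fits into this written order too: it goes last, after `ORDER BY`, and is applied to the rows once they have been filtered and sorted. A sketch that should return the five heaviest *Dipodomys merriami* records, assuming the surveys table used throughout this episode:

```sql
SELECT year, species_id, weight
FROM surveys
WHERE species_id = 'DM'   -- 1. keep only DM rows
ORDER BY weight DESC      -- 2. heaviest first
LIMIT 5;                  -- 3. keep the first five sorted rows
```

Writing the clauses in any other order, for example `LIMIT` before `ORDER BY`, produces a syntax error.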

:::::::::::::::::::::::::::::::::::::: discussion

## Multiple statements

It is possible to write a query as a single line, but for readability, we recommend putting each clause on its own line.
The standard way to separate one SQL statement from the next is with a semicolon. This allows more than one SQL statement to be executed together.

::::::::::::::::::::::::::::::::::::::::::::::::::
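
As a small illustration of the point about semicolons, the sketch below contains two independent statements that can be run together. It only uses tables and columns already seen in this episode.

```sql
-- First statement: which taxa are present?
SELECT DISTINCT taxa
FROM species;

-- Second statement: a quick look at the first few survey records.
SELECT *
FROM surveys
LIMIT 5;
```

Without the semicolon after the first statement, the two would be read as one malformed query.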

::::::::::::::::::::::::::::::::::::::: challenge

## Challenge

- Let's try to combine what we've learned so far in a single
  query. Using the surveys table, write a query to display the three date fields,
  `species_id`, and weight in kilograms (rounded to two decimal places), for
  individuals captured in 1999, ordered alphabetically by the `species_id`.
- Write the query as a single line, then put each clause on its own line, and
  see how much more legible the query becomes!

::::::::::::::: solution

## Solution

```sql
SELECT year, month, day, species_id, ROUND(weight / 1000, 2)
FROM surveys
WHERE year = 1999
ORDER BY species_id;
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: keypoints

- It is useful to apply conventions when writing SQL queries to aid readability.
- Use logical connectors such as AND or OR to create more complex queries.
- Calculations using mathematical symbols can also be performed in SQL queries.
- Adding comments in SQL helps keep complex queries understandable.

::::::::::::::::::::::::::::::::::::::::::::::::::
