Commit 0e0f26f

Author: Cuihtlauac ALVARADO
Import Rewrote Set Tutorial from V2 PR
@NebuPookins rewrote the set tutorial for [V2](https://github.com/ocaml/v2.ocaml.org) in 2021. The PR was neither merged nor rejected: ocaml/v2.ocaml.org#1596
1 parent c7cab97 commit 0e0f26f

1 file changed

+230
-123
lines changed

data/tutorials/ds_02_set.md

Lines changed: 230 additions & 123 deletions
@@ -1,146 +1,253 @@
1-
---
2-
id: sets
3-
title: Sets
4-
description: >
5-
The standard library's Set module
6-
category: "data-structures"
7-
date: 2021-05-27T21:07:30-00:00
8-
---
9-
10-
# Sets
11-
12-
## Module Set
13-
To make a set of strings:
14-
15-
```ocaml
16-
# module SS = Set.Make(String);;
17-
module SS :
18-
sig
19-
type elt = string
20-
type t = Set.Make(String).t
21-
val empty : t
22-
val is_empty : t -> bool
23-
val mem : elt -> t -> bool
24-
val add : elt -> t -> t
25-
val singleton : elt -> t
26-
val remove : elt -> t -> t
27-
val union : t -> t -> t
28-
val inter : t -> t -> t
29-
val disjoint : t -> t -> bool
30-
val diff : t -> t -> t
31-
val compare : t -> t -> int
32-
val equal : t -> t -> bool
33-
val subset : t -> t -> bool
34-
val iter : (elt -> unit) -> t -> unit
35-
val map : (elt -> elt) -> t -> t
36-
val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
37-
val for_all : (elt -> bool) -> t -> bool
38-
val exists : (elt -> bool) -> t -> bool
39-
val filter : (elt -> bool) -> t -> t
40-
val filter_map : (elt -> elt option) -> t -> t
41-
val partition : (elt -> bool) -> t -> t * t
42-
val cardinal : t -> int
43-
val elements : t -> elt list
44-
val min_elt : t -> elt
45-
val min_elt_opt : t -> elt option
46-
val max_elt : t -> elt
47-
val max_elt_opt : t -> elt option
48-
val choose : t -> elt
49-
val choose_opt : t -> elt option
50-
val split : elt -> t -> t * bool * t
51-
val find : elt -> t -> elt
52-
val find_opt : elt -> t -> elt option
53-
val find_first : (elt -> bool) -> t -> elt
54-
val find_first_opt : (elt -> bool) -> t -> elt option
55-
val find_last : (elt -> bool) -> t -> elt
56-
val find_last_opt : (elt -> bool) -> t -> elt option
57-
val of_list : elt list -> t
58-
val to_seq_from : elt -> t -> elt Seq.t
59-
val to_seq : t -> elt Seq.t
60-
val to_rev_seq : t -> elt Seq.t
61-
val add_seq : elt Seq.t -> t -> t
62-
val of_seq : elt Seq.t -> t
63-
end
64-
```
1+
<!-- ((! set title Set !)) ((! set learn !)) -->
2+
<!-- ((! set center !)) -->
653

66-
To create a set you need to start somewhere so here is the empty set:
4+
# Set
675

68-
```ocaml
69-
# let s = SS.empty;;
70-
val s : SS.t = <abstr>
71-
```
6+
`Set` provides a functor, `Set.Make`, which is a module that is parameterized
7+
by another module. More concretely, this means you cannot directly create
8+
a set; instead, you must first specify what type of elements your set will
9+
contain.
7210

73-
Alternatively if we know an element to start with we can create a set
74-
like
11+
The functor `Set.Make` behaves like a function: it accepts a module as a
12+
parameter, and returns a new module representing a set whose elements have
13+
the type that you passed in. For example, if you want to work with sets of
14+
strings, you can invoke `Set.Make(String)` which will return you a new module
15+
which you can assign the name `SS` (short for "String Set"). Note: Be sure to
16+
pay attention to the case; you need to type `Set.Make(String)` and not
17+
`Set.Make(string)`. The reason behind this is explained in the
18+
"Technical Details" section at the bottom.
7519

76-
```ocaml
77-
# let s = SS.singleton "hello";;
78-
val s : SS.t = <abstr>
79-
```
20+
Doing this in OCaml's top level will yield a lot of output:
8021

81-
To add some elements to the set we can do.
22+
```ocamltop
23+
module SS = Set.Make(String);;
24+
```
8225

83-
```ocaml
84-
# let s =
85-
List.fold_right SS.add ["hello"; "world"; "community"; "manager";
86-
"stuff"; "blue"; "green"] s;;
87-
val s : SS.t = <abstr>
26+
What happened here is that after assigning your newly created module to the name
27+
`SS`, OCaml's top level then displayed the module, which in this case contains
28+
a large number of convenience functions for working with sets (for example `is_empty`
29+
for checking if your set is empty, `add` to add an element to your set, `remove` to
30+
remove an element from your set, and so on).
31+
32+
Note also that this module defines two types: `type elt = String.t` representing
33+
the type of the elements, and `type t = Set.Make(String).t` representing the type of
34+
the set itself. It's important to note this, because these types are used in the
35+
signatures of many of the functions defined in this module.
36+
37+
For example, the `add` function has the signature `elt -> t -> t`, which means
38+
that it expects an element (a `string`) and a set of strings, and will return to you
39+
a set of strings. As you gain more experience in OCaml and other functional languages,
40+
you will find that the type signatures of functions are often the most convenient form of documentation
41+
on how to use those functions.
42+
43+
## Creating a Set
44+
45+
You've created your module representing a set of strings, but now you actually want
46+
to create an instance of a set of strings. So how do we go about doing this? Well, you
47+
could search through the documentation for the original `Set` functor to try and
48+
find what function or value you should use to do this, but this is an excellent
49+
opportunity to practice reading the type signatures and inferring the answer from them.
50+
51+
You want to create a new set (as opposed to modifying an existing set). So you should
52+
look for functions whose return result has type `t` (the type representing the set),
53+
and which *do not* require a parameter of type `t`.
54+
55+
Skimming through the list of functions in the module, there are only a handful of functions
56+
that match those criteria: `empty : t`, `singleton : elt -> t`, `of_list : elt list -> t`
57+
and `of_seq : elt Seq.t -> t`.
58+
59+
Perhaps you already know how to work with lists and sequences in OCaml or
60+
perhaps you don't. For now, let's assume you don't know, and so we'll focus
61+
our attention on the first two functions in that list: `empty` and `singleton`.
62+
63+
The type signature for `empty` says that it simply returns `t`, i.e. an instance
64+
of our set, without requiring any parameters at all. By intuition, you might
65+
guess that the only reasonable set that a library function could return when
66+
given zero parameters is the empty set. And the fact that the function is named
67+
`empty` reinforces this theory.
68+
69+
Is there a way to test this theory? Perhaps if we had a function which
70+
could print out the size of a set, then we could check if the set we get
71+
from `empty` has a size of zero. In other words, we want a function which
72+
receives a set as a parameter, and returns an integer as a result. Again,
73+
skimming through the list of functions in the module, we see there is a
74+
function which matches this signature: `cardinal : t -> int`. If you're
75+
not familiar with the word "cardinal", you can look it up on Wikipedia
76+
and notice that it basically refers to the size of sets, so this reinforces
77+
the idea that this is exactly the function we want.
78+
79+
So let's test our hypothesis:
80+
81+
```ocamltop
82+
let s = SS.empty;;
83+
SS.cardinal s;;
8884
```
8985

90-
Now if we are playing around with sets we will probably want to see what
91-
is in the set that we have created. To do this we can write a function
92-
that will print the set out.
86+
Excellent, it looks like `SS.empty` does indeed create an empty set,
87+
and `SS.cardinal` does indeed return the size of a set.
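
The same check can be made with `is_empty`, one of the functions listed in the module. The following is a minimal sketch, not part of the original tutorial; the names `empty_set`, `empty_is_empty` and `empty_size` are illustrative only.

```ocaml
(* Assumes the same string-set module as in the tutorial. *)
module SS = Set.Make (String)

(* empty_set is a hypothetical name used only for this illustration. *)
let empty_set = SS.empty

(* is_empty : t -> bool reports whether a set has no elements. *)
let empty_is_empty = SS.is_empty empty_set

(* cardinal : t -> int returns the number of elements. *)
let empty_size = SS.cardinal empty_set

let () =
  Printf.printf "is_empty: %b, cardinal: %d\n" empty_is_empty empty_size
```

Both checks should agree: a set for which `is_empty` holds has a cardinality of zero.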
88+
89+
What about that other function we saw, `singleton : elt -> t`? Again,
90+
using our intuition, if we provide the function with a single element,
91+
and the function returns a set, then probably the function will return
92+
a set containing that element (otherwise, what would it do with the
93+
parameter we gave it?). The name of the function is `singleton`, and
94+
again if you're unfamiliar with that word, you can look it up on
95+
Wikipedia and see that the word means "a set with exactly one element".
96+
It sounds like we're on the right track again. Let's test our theory.
97+
98+
```ocamltop
99+
let s = SS.singleton "hello";;
100+
SS.cardinal s;;
101+
```
93102

94-
```ocaml
95-
# let print_set s =
96-
SS.iter print_endline s;;
97-
val print_set : SS.t -> unit = <fun>
103+
It looks like we were right again!
104+
105+
## Working with Sets
106+
107+
Now let's say we want to build bigger and more complex sets. Specifically,
108+
let's say we want to add another element to our existing set. So we're
109+
looking for a function with two parameters: One of the parameters should
110+
be the element we wish to add, and the other parameter should be the set
111+
that we're adding to. For the return value, we would expect it to either
112+
return unit (if the function modifies the set in place), or return a
113+
new set representing the result of adding the new element. So we're
114+
looking for signatures that look something like `elt -> t -> unit` or
115+
`t -> elt -> unit` (since we don't know what order the two parameters
116+
should appear in), or `elt -> t -> t` or `t -> elt -> t`.
117+
118+
Skimming through the list, we see 2 functions with matching signatures:
119+
`add : elt -> t -> t` and `remove : elt -> t -> t`. Based on their names,
120+
`add` is probably the function we're looking for. `remove` probably removes
121+
an element from a set, and using our intuition again, it does seem like
122+
the type signature makes sense: To remove an element from a set, you need
123+
to tell it what set you want to perform the removal on and what element
124+
you want to remove; and the return result will be the resulting set after
125+
the removal.
126+
127+
Furthermore, because we see that these functions return `t` and not `unit`,
128+
we can infer that these functions do not modify the set in place, but
129+
instead return a new set. Again, we can test this theory:
130+
131+
```ocamltop
132+
let firstSet = SS.singleton "hello";;
133+
let secondSet = SS.add "world" firstSet;;
134+
SS.cardinal firstSet;;
135+
SS.cardinal secondSet;;
98136
```
99137

100-
If we want to remove a specific element of a set there is a remove
101-
function. However if we want to remove several elements at once we could
102-
think of it as doing a 'filter'. Let's filter out all words that are
103-
longer than 5 characters.
138+
It looks like our theories were correct!
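
The guess about `remove` can be tested in the same way. This is a hedged sketch rather than part of the original tutorial: the snake_case names are illustrative, and `mem : elt -> t -> bool` is used here on the assumption that it tests membership, as its signature suggests.

```ocaml
module SS = Set.Make (String)

let first_set = SS.singleton "hello"
let second_set = SS.add "world" first_set

(* Adding an element that is already present leaves the size unchanged. *)
let same_size = SS.cardinal (SS.add "hello" second_set)

(* remove returns a new set; second_set itself is not changed. *)
let third_set = SS.remove "hello" second_set

(* mem checks whether an element belongs to a set. *)
let still_has_world = SS.mem "world" third_set
let still_has_hello = SS.mem "hello" third_set

let () =
  Printf.printf "second: %d, third: %d\n"
    (SS.cardinal second_set) (SS.cardinal third_set)
```

If the reasoning above holds, `second_set` keeps both elements while `third_set` contains only `"world"`.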
104139

105-
This can be written as:
140+
## Sets With Custom Comparators
106141

107-
```ocaml
108-
# let my_filter str =
109-
String.length str <= 5;;
110-
val my_filter : string -> bool = <fun>
111-
# let s2 = SS.filter my_filter s;;
112-
val s2 : SS.t = <abstr>
113-
```
142+
The `SS` module we created uses the built-in comparison function provided
143+
by the `String` module, which performs a case-sensitive comparison. We
144+
can test that with the following code:
114145

115-
or using an anonymous function:
146+
```ocamltop
147+
let firstSet = SS.singleton "hello";;
148+
let secondSet = SS.add "HELLO" firstSet;;
149+
SS.cardinal firstSet;;
150+
SS.cardinal secondSet;;
151+
```
116152

117-
```ocaml
118-
# let s2 = SS.filter (fun str -> String.length str <= 5) s;;
119-
val s2 : SS.t = <abstr>
153+
As we can see, the `secondSet` has a cardinality of 2, indicating that
154+
`"hello"` and `"HELLO"` are considered two distinct elements.
155+
156+
Let's say we want to create a set which performs a case-insensitive
157+
comparison instead. To do this, we simply have to change the parameter
158+
that we pass to the `Set.Make` function.
159+
160+
The `Set.Make` function expects a struct with two fields: a type `t`
161+
that represents the type of the element, and a function `compare`
162+
whose signature is `t -> t -> int` and which returns 0 if two values are equal,
163+
a negative integer if the first is smaller, and a positive integer if the first is greater. It just so happens
164+
that the `String` module matches that structure, which is why we could
165+
directly pass `String` as a parameter to `Set.Make`. Incidentally, many
166+
other modules also have that structure, including `Int` and `Float`,
167+
and so they too can be directly passed into `Set.Make` to construct a
168+
set of integers, or a set of floating point numbers.
169+
170+
For our use case, we still want our elements to be of type string, but
171+
we want to change the comparison function to ignore the case of the
172+
strings. We can accomplish this by directly passing in a literal struct
173+
to the `Set.Make` function:
174+
175+
```ocamltop
176+
module CISS = Set.Make(struct
177+
type t = string
178+
let compare a b = compare (String.lowercase_ascii a) (String.lowercase_ascii b)
179+
end);;
120180
```
121181

122-
If we want to check and see if an element is in the set it might look
123-
like this.
182+
We name the resulting module CISS (short for "Case Insensitive String Set").
183+
We can now test whether this module has the desired behavior:
124184

125-
```ocaml
126-
# SS.mem "hello" s2;;
127-
- : bool = true
128-
```
129185

130-
The Set module also provides the set theoretic operations union,
131-
intersection and difference. For example, the difference of the original
132-
set and the set with short strings (≤ 5 characters) is the set of long
133-
strings:
186+
```ocamltop
187+
let firstSet = CISS.singleton "hello";;
188+
let secondSet = CISS.add "HELLO" firstSet;;
189+
CISS.cardinal firstSet;;
190+
CISS.cardinal secondSet;;
191+
```
134192

135-
```ocaml
136-
# print_set (SS.diff s s2);;
137-
community
138-
manager
139-
- : unit = ()
193+
Success! `secondSet` has a cardinality of 1, showing that `"hello"`
194+
and `"HELLO"` are now considered to be the same element in this set.
195+
We now have a set of strings whose compare function performs a
196+
case-insensitive comparison.
197+
198+
Note that this technique can also be used to allow arbitrary types
199+
to be used as the element type for a set, as long as you can define a
200+
meaningful compare operation:
201+
202+
```ocamltop
203+
type color = Red | Green | Blue;;
204+
205+
module SC = Set.Make(struct
206+
type t = color
207+
let compare a b =
208+
match (a, b) with
209+
| (Red, Red) -> 0
210+
| (Red, Green) -> 1
211+
| (Red, Blue) -> 1
212+
| (Green, Red) -> -1
213+
| (Green, Green) -> 0
214+
| (Green, Blue) -> 1
215+
| (Blue, Red) -> -1
216+
| (Blue, Green) -> -1
217+
| (Blue, Blue) -> 0
218+
end);;
140219
```
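
A set of colors can then be used like any other set. The sketch below is self-contained, so it redefines the color type and module; to keep it short it replaces the explicit `match` with a hypothetical helper `rank` that encodes the same ordering (Red greatest, Blue smallest). The names `palette`, `palette_size` and `palette_in_order` are illustrative, and `elements` is assumed to list the set in increasing order of `compare`.

```ocaml
(* Same color type as in the example above. *)
type color = Red | Green | Blue

(* rank is a hypothetical helper: it maps each color to an integer so that
   Red > Green > Blue, the same ordering as the explicit match above. *)
let rank = function
  | Red -> 2
  | Green -> 1
  | Blue -> 0

module SC = Set.Make (struct
  type t = color
  let compare a b = Int.compare (rank a) (rank b)
end)

(* Duplicates are dropped: Red appears twice in the list but once in the set. *)
let palette = SC.of_list [Red; Blue; Red; Green]
let palette_size = SC.cardinal palette

(* elements lists the set in increasing order according to compare. *)
let palette_in_order = SC.elements palette
```

Writing the comparison through an integer rank is only one option; the explicit `match` makes every case visible, while the helper is shorter and harder to get inconsistent.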
141220

142-
Note that the Set module provides a purely functional data structure:
143-
removing an element from a set does not alter that set but, rather,
144-
returns a new set that is very similar to (and shares much of its
145-
internals with) the original set.
221+
## Technical Details
222+
223+
### Set.Make, types and modules
224+
225+
As mentioned in a previous section, the `Set.Make` function accepts a structure
226+
with two specific fields, `t` and `compare`. Modules have structure, and thus
227+
it's possible (but not guaranteed) for a module to have the structure that
228+
`Set.Make` expects. On the other hand, types do not have structure, and so you
229+
can never pass a type to the `Set.Make` function. In OCaml, module names start with
230+
an uppercase letter and type names start with a lowercase letter. This is why
231+
when creating a set of strings, you have to use `Set.Make(String)` (passing in
232+
the module named `String`), and not `Set.Make(string)` (which would be attempting
233+
to pass in the type named `string`, which will not work).
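
To make the distinction concrete, here is a minimal sketch (not from the original tutorial) that passes two modules to `Set.Make`: the standard `Int` module, and a hypothetical module `IntPair` written by hand with the two required items. The names `IS`, `PS`, `ints` and `pairs` are illustrative.

```ocaml
(* Int is a module providing type t = int and compare : t -> t -> int. *)
module IS = Set.Make (Int)

(* IntPair is a hypothetical module spelling out the two required items:
   a type t and a compare function over it. *)
module IntPair = struct
  type t = int * int

  (* Compare the first components, then the second ones on a tie. *)
  let compare (a1, b1) (a2, b2) =
    match Int.compare a1 a2 with
    | 0 -> Int.compare b1 b2
    | c -> c
end

module PS = Set.Make (IntPair)

let ints = IS.of_list [3; 1; 3; 2]
let pairs = PS.of_list [(1, 2); (1, 2); (0, 5)]
```

In both cases the argument is a module name (capitalized); the element types `int` and `int * int` only appear inside those modules, as the definition of `t`.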
234+
235+
### Purely Functional Data Structures
236+
237+
The data structure implemented by the Set functor is a purely functional one.
238+
What exactly that means is a big topic in itself (feel free to search for
239+
"Purely Functional Data Structure" in Google or Wikipedia to learn more). As a
240+
short oversimplification, this means that all instances of the data structure
241+
that you create are immutable. The functions like `add` and `remove` do not
242+
actually modify the set you pass in, but instead return a new set representing
243+
the results of having performed the corresponding operation.
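
One practical consequence is that building a larger set means passing each newly returned set on to the next step. The following sketch assumes the `SS` module from earlier and uses `List.fold_left` from the standard library; the names `words` and `word_set` are illustrative.

```ocaml
module SS = Set.Make (String)

let words = ["hello"; "world"; "hello"]

(* Each SS.add returns a new set, which becomes the accumulator
   for the next word; SS.empty itself is never modified. *)
let word_set = List.fold_left (fun acc w -> SS.add w acc) SS.empty words

let () = Printf.printf "distinct words: %d\n" (SS.cardinal word_set)
```

After the fold, `SS.empty` is still the empty set, and `word_set` holds the distinct words from the list.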
244+
245+
### Full API documentation
246+
247+
This tutorial focused on teaching how to quickly find a function that does what
248+
you want by looking at the type signature. This is often the quickest and most
249+
convenient way to discover useful functions. However, sometimes you do want to
250+
see the formal documentation for the API provided by a module. For sets, the
251+
API documentation you probably want to look at is at
252+
https://ocaml.org/api/Set.Make.html
146253
