|
1 |
| ---- |
2 |
| -id: sets |
3 |
| -title: Sets |
4 |
| -description: > |
5 |
| - The standard library's Set module |
6 |
| -category: "data-structures" |
7 |
| -date: 2021-05-27T21:07:30-00:00 |
8 |
| ---- |
9 |
| - |
10 |
| -# Sets |
11 |
| - |
12 |
| -## Module Set |
13 |
| -To make a set of strings: |
14 |
| - |
15 |
| -```ocaml |
16 |
| -# module SS = Set.Make(String);; |
17 |
| -module SS : |
18 |
| - sig |
19 |
| - type elt = string |
20 |
| - type t = Set.Make(String).t |
21 |
| - val empty : t |
22 |
| - val is_empty : t -> bool |
23 |
| - val mem : elt -> t -> bool |
24 |
| - val add : elt -> t -> t |
25 |
| - val singleton : elt -> t |
26 |
| - val remove : elt -> t -> t |
27 |
| - val union : t -> t -> t |
28 |
| - val inter : t -> t -> t |
29 |
| - val disjoint : t -> t -> bool |
30 |
| - val diff : t -> t -> t |
31 |
| - val compare : t -> t -> int |
32 |
| - val equal : t -> t -> bool |
33 |
| - val subset : t -> t -> bool |
34 |
| - val iter : (elt -> unit) -> t -> unit |
35 |
| - val map : (elt -> elt) -> t -> t |
36 |
| - val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a |
37 |
| - val for_all : (elt -> bool) -> t -> bool |
38 |
| - val exists : (elt -> bool) -> t -> bool |
39 |
| - val filter : (elt -> bool) -> t -> t |
40 |
| - val filter_map : (elt -> elt option) -> t -> t |
41 |
| - val partition : (elt -> bool) -> t -> t * t |
42 |
| - val cardinal : t -> int |
43 |
| - val elements : t -> elt list |
44 |
| - val min_elt : t -> elt |
45 |
| - val min_elt_opt : t -> elt option |
46 |
| - val max_elt : t -> elt |
47 |
| - val max_elt_opt : t -> elt option |
48 |
| - val choose : t -> elt |
49 |
| - val choose_opt : t -> elt option |
50 |
| - val split : elt -> t -> t * bool * t |
51 |
| - val find : elt -> t -> elt |
52 |
| - val find_opt : elt -> t -> elt option |
53 |
| - val find_first : (elt -> bool) -> t -> elt |
54 |
| - val find_first_opt : (elt -> bool) -> t -> elt option |
55 |
| - val find_last : (elt -> bool) -> t -> elt |
56 |
| - val find_last_opt : (elt -> bool) -> t -> elt option |
57 |
| - val of_list : elt list -> t |
58 |
| - val to_seq_from : elt -> t -> elt Seq.t |
59 |
| - val to_seq : t -> elt Seq.t |
60 |
| - val to_rev_seq : t -> elt Seq.t |
61 |
| - val add_seq : elt Seq.t -> t -> t |
62 |
| - val of_seq : elt Seq.t -> t |
63 |
| - end |
64 |
| -``` |
| 1 | +<!-- ((! set title Set !)) ((! set learn !)) --> |
| 2 | +<!-- ((! set center !)) --> |
65 | 3 |
|
66 |
| -To create a set you need to start somewhere so here is the empty set: |
| 4 | +# Set |
67 | 5 |
|
68 |
| -```ocaml |
69 |
| -# let s = SS.empty;; |
70 |
| -val s : SS.t = <abstr> |
71 |
| -``` |
| 6 | +`Set.Make` is a functor, which means that it is a module that is parameterized
| 7 | +by another module. More concretely, this means you cannot directly create
| 8 | +a set; instead, you must first specify what type of elements your set will
| 9 | +contain.
72 | 10 |
|
73 |
| -Alternatively if we know an element to start with we can create a set
74 |
| -like
| 11 | +The `Set` module provides the functor `Make`, which accepts a module as a
| 12 | +parameter, and returns a new module representing a set whose elements have
| 13 | +the type that you passed in. For example, if you want to work with sets of
| 14 | +strings, you can invoke `Set.Make(String)`, which returns a new module
| 15 | +that you can bind to the name `SS` (short for "String Set"). Note: Be sure to
| 16 | +pay attention to the case; you need to type `Set.Make(String)` and not |
| 17 | +`Set.Make(string)`. The reason behind this is explained in the |
| 18 | +"Technical Details" section at the bottom. |
75 | 19 |
|
76 |
| -```ocaml |
77 |
| -# let s = SS.singleton "hello";; |
78 |
| -val s : SS.t = <abstr> |
79 |
| -``` |
| 20 | +Doing this in OCaml's top level will yield a lot of output:
80 | 21 |
|
81 |
| -To add some elements to the set we can do. |
| 22 | +```ocamltop |
| 23 | +module SS = Set.Make(String);; |
| 24 | +``` |
82 | 25 |
|
83 |
| -```ocaml |
84 |
| -# let s = |
85 |
| - List.fold_right SS.add ["hello"; "world"; "community"; "manager"; |
86 |
| - "stuff"; "blue"; "green"] s;; |
87 |
| -val s : SS.t = <abstr> |
| 26 | +What happened here is that after assigning your newly created module to the name |
| 27 | +`SS`, OCaml's top level then displayed the module, which in this case contains |
| 28 | +a large number of convenience functions for working with sets (for example `is_empty` |
| 29 | +for checking if your set is empty, `add` to add an element to your set, `remove` to
| 30 | +remove an element from your set, and so on). |
| 31 | + |
| 32 | +Note also that this module defines two types: `type elt = String.t` representing |
| 33 | +the type of the elements, and `type t = Set.Make(String).t` representing the type of |
| 34 | +the set itself. It's important to note this, because these types are used in the |
| 35 | +signatures of many of the functions defined in this module. |
| 36 | + |
| 37 | +For example, the `add` function has the signature `elt -> t -> t`, which means
| 38 | +that it expects an element (a string) and a set of strings, and returns
| 39 | +a set of strings. As you gain more experience in OCaml and other functional languages,
| 40 | +you will find that type signatures are often the most convenient form of
| 41 | +documentation on how to use functions.
| 42 | + |
| 43 | +## Creating a Set |
| 44 | + |
| 45 | +You've created your module representing a set of strings, but now you actually want |
| 46 | +to create an instance of a set of strings. So how do we go about doing this? Well, you |
| 47 | +could search through the documentation for the original `Set` functor to try and |
| 48 | +find what function or value you should use to do this, but this is an excellent |
| 49 | +opportunity to practice reading the type signatures and inferring the answer from them. |
| 50 | + |
| 51 | +You want to create a new set (as opposed to modifying an existing set). So you should
| 52 | +look for functions whose return result has type `t` (the type representing the set),
| 53 | +and which *do not* require a parameter of type `t`.
| 54 | +
| 55 | +Skimming through the list of functions in the module, there are only a handful
| 56 | +that match those criteria: `empty : t`, `singleton : elt -> t`, `of_list : elt list -> t`
| 57 | +and `of_seq : elt Seq.t -> t`.
| 58 | + |
| 59 | +Perhaps you already know how to work with lists and sequences in OCaml or |
| 60 | +perhaps you don't. For now, let's assume you don't know, and so we'll focus |
| 61 | +our attention on the first two functions in that list: `empty` and `singleton`. |
| 62 | + |
| 63 | +The type signature for `empty` says that it simply returns `t`, i.e. an instance |
| 64 | +of our set, without requiring any parameters at all. By intuition, you might |
| 65 | +guess that the only reasonable set that a library function could return when |
| 66 | +given zero parameters is the empty set. And the fact that the function is named |
| 67 | +`empty` reinforces this theory. |
| 68 | + |
| 69 | +Is there a way to test this theory? Perhaps if we had a function which |
| 70 | +could print out the size of a set, then we could check if the set we get |
| 71 | +from `empty` has a size of zero. In other words, we want a function which |
| 72 | +receives a set as a parameter, and returns an integer as a result. Again, |
| 73 | +skimming through the list of functions in the module, we see there is a |
| 74 | +function which matches this signature: `cardinal : t -> int`. If you're |
| 75 | +not familiar with the word "cardinal", you can look it up on Wikipedia |
| 76 | +and notice that it basically refers to the size of sets, so this reinforces |
| 77 | +the idea that this is exactly the function we want. |
| 78 | + |
| 79 | +So let's test our hypothesis: |
| 80 | + |
| 81 | +```ocamltop |
| 82 | +let s = SS.empty;; |
| 83 | +SS.cardinal s;; |
88 | 84 | ```
|
89 | 85 |
|
90 |
| -Now if we are playing around with sets we will probably want to see what |
91 |
| -is in the set that we have created. To do this we can write a function |
92 |
| -that will print the set out. |
| 86 | +Excellent, it looks like `SS.empty` does indeed create an empty set, |
| 87 | +and `SS.cardinal` does indeed return the size of a set.
| 88 | + |
| 89 | +What about that other function we saw, `singleton : elt -> t`? Again, |
| 90 | +using our intuition, if we provide the function with a single element, |
| 91 | +and the function returns a set, then probably the function will return |
| 92 | +a set containing that element (or else what else would it do with the |
| 93 | +parameter we gave it?). The name of the function is `singleton`, and |
| 94 | +again if you're unfamiliar with that word, you can look it up on
| 95 | +Wikipedia and see that the word means "a set with exactly one element". |
| 96 | +It sounds like we're on the right track again. Let's test our theory. |
| 97 | + |
| 98 | +```ocamltop |
| 99 | +let s = SS.singleton "hello";; |
| 100 | +SS.cardinal s;; |
| 101 | +``` |
93 | 102 |
|
94 |
| -```ocaml |
95 |
| -# let print_set s = |
96 |
| - SS.iter print_endline s;; |
97 |
| -val print_set : SS.t -> unit = <fun> |
| 103 | +It looks like we were right again! |
| 104 | + |
| 105 | +## Working with Sets |
| 106 | + |
| 107 | +Now let's say we want to build bigger and more complex sets. Specifically, |
| 108 | +let's say we want to add another element to our existing set. So we're |
| 109 | +looking for a function with two parameters: One of the parameters should |
| 110 | +be the element we wish to add, and the other parameter should be the set |
| 111 | +that we're adding to. For the return value, we would expect it to either |
| 112 | +return unit (if the function modifies the set in place), or return a
| 113 | +new set representing the result of adding the new element. So we're |
| 114 | +looking for signatures that look something like `elt -> t -> unit` or |
| 115 | +`t -> elt -> unit` (since we don't know what order the two parameters |
| 116 | +should appear in), or `elt -> t -> t` or `t -> elt -> t`. |
| 117 | + |
| 118 | +Skimming through the list, we see 2 functions with matching signatures: |
| 119 | +`add : elt -> t -> t` and `remove : elt -> t -> t`. Based on their names, |
| 120 | +`add` is probably the function we're looking for. `remove` probably removes |
| 121 | +an element from a set, and using our intuition again, it does seem like |
| 122 | +the type signature makes sense: To remove an element from a set, you need |
| 123 | +to tell it what set you want to perform the removal on and what element |
| 124 | +you want to remove; and the return result will be the resulting set after |
| 125 | +the removal. |
| 126 | + |
| 127 | +Furthermore, because we see that these functions return `t` and not `unit`, |
| 128 | +we can infer that these functions do not modify the set in place, but |
| 129 | +instead return a new set. Again, we can test this theory: |
| 130 | + |
| 131 | +```ocamltop |
| 132 | +let firstSet = SS.singleton "hello";; |
| 133 | +let secondSet = SS.add "world" firstSet;; |
| 134 | +SS.cardinal firstSet;; |
| 135 | +SS.cardinal secondSet;; |
98 | 136 | ```
|
99 | 137 |
|
100 |
| -If we want to remove a specific element of a set there is a remove |
101 |
| -function. However if we want to remove several elements at once we could |
102 |
| -think of it as doing a 'filter'. Let's filter out all words that are |
103 |
| -longer than 5 characters. |
| 138 | +It looks like our theories were correct! |
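
The same signature-reading approach applies to other entries in the module signature. As a minimal sketch, assuming the `SS` module from above (redefined here so the block is self-contained), the following tries `of_list`, `mem`, `union`, `inter`, `diff` and `elements`. The names `a`, `b`, `has_hello`, `both`, `either` and `only_a` are made up for illustration.

```ocaml
(* Set of strings, as created earlier in the tutorial. *)
module SS = Set.Make (String)

(* of_list : elt list -> t builds a set from a list. *)
let a = SS.of_list ["hello"; "world"]
let b = SS.of_list ["world"; "community"]

(* mem : elt -> t -> bool tests membership. *)
let has_hello = SS.mem "hello" a

(* inter, union and diff all have the signature t -> t -> t.
   elements : t -> elt list returns the elements in increasing order. *)
let both = SS.elements (SS.inter a b)    (* ["world"] *)
let either = SS.cardinal (SS.union a b)  (* 3 *)
let only_a = SS.elements (SS.diff a b)   (* ["hello"] *)
```

Because every one of these returns a new value, `a` and `b` are unchanged afterwards.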
104 | 139 |
|
105 |
| -This can be written as: |
| 140 | +## Sets With Custom Comparators
106 | 141 |
|
107 |
| -```ocaml |
108 |
| -# let my_filter str = |
109 |
| - String.length str <= 5;; |
110 |
| -val my_filter : string -> bool = <fun> |
111 |
| -# let s2 = SS.filter my_filter s;; |
112 |
| -val s2 : SS.t = <abstr> |
113 |
| -``` |
| 142 | +The `SS` module we created uses the built-in comparison function provided |
| 143 | +by the `String` module, which performs a case-sensitive comparison. We |
| 144 | +can test that with the following code: |
114 | 145 |
|
115 |
| -or using an anonymous function: |
| 146 | +```ocamltop |
| 147 | +let firstSet = SS.singleton "hello";; |
| 148 | +let secondSet = SS.add "HELLO" firstSet;; |
| 149 | +SS.cardinal firstSet;; |
| 150 | +SS.cardinal secondSet;; |
| 151 | +``` |
116 | 152 |
|
117 |
| -```ocaml |
118 |
| -# let s2 = SS.filter (fun str -> String.length str <= 5) s;; |
119 |
| -val s2 : SS.t = <abstr> |
| 153 | +As we can see, the `secondSet` has a cardinality of 2, indicating that |
| 154 | +`"hello"` and `"HELLO"` are considered two distinct elements. |
| 155 | + |
| 156 | +Let's say we want to create a set which performs a case-insensitive |
| 157 | +comparison instead. To do this, we simply have to change the parameter |
| 158 | +that we pass to the `Set.Make` function. |
| 159 | + |
| 160 | +The `Set.Make` function expects a struct with two fields: a type `t` |
| 161 | +that represents the type of the element, and a function `compare` |
| 162 | +whose signature is `t -> t -> int` and which returns 0 if two values are equal,
| 163 | +a negative integer if the first is smaller, and a positive integer otherwise. It just so happens
| 164 | +that the `String` module matches that structure, which is why we could |
| 165 | +directly pass `String` as a parameter to `Set.Make`. Incidentally, many |
| 166 | +other modules also have that structure, including `Int` and `Float`, |
| 167 | +and so they too can be directly passed into `Set.Make` to construct a |
| 168 | +set of integers, or a set of floating point numbers. |
| 169 | + |
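
As a small sketch of the point about `Int`, the following builds a set of integers using only functions already seen above. It assumes an OCaml version that ships the `Int` module (4.08 or later); the names `IS`, `numbers`, `size`, `smallest` and `sorted` are made up for illustration.

```ocaml
(* A set of integers: Int provides both `t` and `compare`. *)
module IS = Set.Make (Int)

(* 3 is added twice, but a set keeps only distinct elements. *)
let numbers = IS.add 3 (IS.add 1 (IS.add 2 (IS.singleton 3)))

let size = IS.cardinal numbers      (* 3 *)
let smallest = IS.min_elt numbers   (* 1 *)
let sorted = IS.elements numbers    (* [1; 2; 3] *)
```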
| 170 | +For our use case, we still want our elements to be of type string, but |
| 171 | +we want to change the comparison function to ignore the case of the |
| 172 | +strings. We can accomplish this by directly passing in a literal struct |
| 173 | +to the `Set.Make` function: |
| 174 | + |
| 175 | +```ocamltop |
| 176 | +module CISS = Set.Make(struct |
| 177 | + type t = string |
| 178 | + let compare a b = compare (String.lowercase_ascii a) (String.lowercase_ascii b) |
| 179 | +end);; |
120 | 180 | ```
|
121 | 181 |
|
122 |
| -If we want to check and see if an element is in the set it might look |
123 |
| -like this. |
| 182 | +We name the resulting module CISS (short for "Case Insensitive String Set"). |
| 183 | +We can now test whether this module has the desired behavior: |
124 | 184 |
|
125 |
| -```ocaml |
126 |
| -# SS.mem "hello" s2;; |
127 |
| -- : bool = true |
128 |
| -``` |
129 | 185 |
|
130 |
| -The Set module also provides the set theoretic operations union, |
131 |
| -intersection and difference. For example, the difference of the original |
132 |
| -set and the set with short strings (≤ 5 characters) is the set of long |
133 |
| -strings: |
| 186 | +```ocamltop |
| 187 | +let firstSet = CISS.singleton "hello";; |
| 188 | +let secondSet = CISS.add "HELLO" firstSet;; |
| 189 | +CISS.cardinal firstSet;; |
| 190 | +CISS.cardinal secondSet;; |
| 191 | +``` |
134 | 192 |
|
135 |
| -```ocaml |
136 |
| -# print_set (SS.diff s s2);; |
137 |
| -community |
138 |
| -manager |
139 |
| -- : unit = () |
| 193 | +Success! `secondSet` has a cardinality of 1, showing that `"hello"` |
| 194 | +and `"HELLO"` are now considered to be the same element in this set. |
| 195 | +We now have a set of strings whose compare function performs a
| 196 | +case-insensitive comparison.
| 197 | + |
| 198 | +Note that this technique can also be used to allow arbitrary types |
| 199 | +to be used as the element type for a set, as long as you can define a
| 200 | +meaningful compare operation: |
| 201 | + |
| 202 | +```ocamltop |
| 203 | +type color = Red | Green | Blue;; |
| 204 | +
|
| 205 | +module SC = Set.Make(struct |
| 206 | + type t = color |
| 207 | + let compare a b = |
| 208 | + match (a, b) with |
| 209 | + | (Red, Red) -> 0 |
| 210 | + | (Red, Green) -> 1 |
| 211 | + | (Red, Blue) -> 1 |
| 212 | + | (Green, Red) -> -1 |
| 213 | + | (Green, Green) -> 0 |
| 214 | + | (Green, Blue) -> 1 |
| 215 | + | (Blue, Red) -> -1 |
| 216 | + | (Blue, Green) -> -1 |
| 217 | + | (Blue, Blue) -> 0 |
| 218 | +end);; |
140 | 219 | ```
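
For a variant type with only constant constructors, writing out every pair is not strictly necessary. The sketch below is an alternative, not the approach shown above: it delegates to the polymorphic `Stdlib.compare`, which orders such constructors by declaration order (so `Red < Green < Blue` here, the reverse of the hand-written ordering above). The names `palette`, `count`, `has_green` and `first` are made up for illustration.

```ocaml
type color = Red | Green | Blue

module SC = Set.Make (struct
  type t = color
  (* Polymorphic comparison orders constant constructors by their
     declaration order: Red < Green < Blue. *)
  let compare (a : t) (b : t) = Stdlib.compare a b
end)

(* Red is added twice, so the set holds two elements: Red and Blue. *)
let palette = SC.add Red (SC.add Blue (SC.singleton Red))

let count = SC.cardinal palette       (* 2 *)
let has_green = SC.mem Green palette  (* false *)
let first = SC.min_elt palette        (* Red *)
```

A hand-written `compare` remains the better choice when the desired order differs from the declaration order.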
|
141 | 220 |
|
142 |
| -Note that the Set module provides a purely functional data structure: |
143 |
| -removing an element from a set does not alter that set but, rather, |
144 |
| -returns a new set that is very similar to (and shares much of its |
145 |
| -internals with) the original set. |
| 221 | +## Technical Details |
| 222 | + |
| 223 | +### Set.Make, types and modules |
| 224 | + |
| 225 | +As mentioned in a previous section, the `Set.Make` function accepts a structure |
| 226 | +with two specific fields, `t` and `compare`. Modules have structure, and thus |
| 227 | +it's possible (but not guaranteed) for a module to have the structure that |
| 228 | +`Set.Make` expects. On the other hand, types do not have structure, and so you |
| 229 | +can never pass a type to the `Set.Make` function. In OCaml, modules start with |
| 230 | +an upper case letter and types start with a lower case letter. This is why |
| 231 | +when creating a set of strings, you have to use `Set.Make(String)` (passing in |
| 232 | +the module named `String`), and not `Set.Make(string)` (which would be attempting |
| 233 | +to pass in the type named `string`, which will not work). |
| 234 | + |
| 235 | +### Purely Functional Data Structures |
| 236 | + |
| 237 | +The data structure implemented by the Set functor is a purely functional one. |
| 238 | +What exactly that means is a big topic in itself (feel free to search for |
| 239 | +"Purely Functional Data Structure" in Google or Wikipedia to learn more). As a |
| 240 | +short oversimplification, this means that all instances of the data structure |
| 241 | +that you create are immutable. The functions like `add` and `remove` do not |
| 242 | +actually modify the set you pass in, but instead return a new set representing |
| 243 | +the results of having performed the corresponding operation. |
| 244 | + |
| 245 | +### Full API documentation |
| 246 | + |
| 247 | +This tutorial focused on teaching how to quickly find a function that does what |
| 248 | +you want by looking at the type signature. This is often the quickest and most |
| 249 | +convenient way to discover useful functions. However, sometimes you do want to |
| 250 | +see the formal documentation for the API provided by a module. For sets, the |
| 251 | +API documentation you probably want to look at is at |
| 252 | +https://ocaml.org/api/Set.Make.html |
146 | 253 |
|