Commit b8a5546

Docs: Improve README to describe parallel execution (#91)
1 parent 1725f45 commit b8a5546

1 file changed, 83 additions and 2 deletions

README.md

@@ -36,6 +36,8 @@ the files, so it is not able to report which values are different.
 
 ### Compare two directories
 
+#### Basic Directory Comparison
+
 Let's compare two directories with the following structures:
 
 ```bash
@@ -76,6 +78,8 @@ was detected. As mentioned previously, this is because `dir-content-diff` is onl
 in the compared directory that are also present in the reference directory, so the file
 `sub_file_3.b` is just ignored in this case.
 
+#### Using Custom Comparators
+
 If ``reference_dir/file_1.c`` is the following JSON-like file:
 
 ```json
@@ -117,6 +121,8 @@ The previous code will output the following dictionary:
 }
 ```
 
+#### Assertion-based Comparison
+
 It is also possible to check whether the two directories are equal or not with the following code:
 
 ```python
@@ -135,7 +141,9 @@ Changed the value of '[a]' from 1 to 2.
 Changed the value of '[b][0]' from 1 to 10.
 ```
 
-Finally, the comparators have parameters that can be passed either to be used for all files of a
+#### Advanced Configuration Options
+
+The comparators have parameters that can be passed either to be used for all files of a
 given extension or only for a specific file:
 
 ```python
@@ -163,6 +171,8 @@ dir_content_diff.assert_equal_trees(
 
 Each comparator has different arguments that are detailed in the documentation.
 
+##### File-specific Comparators
+
 It's also possible to specify an arbitrary comparator for a specific file:
 
 ```python
@@ -174,6 +184,8 @@ specific_args = {
 }
 ```
 
+##### Pattern-based Configuration
+
 Another possibility is to use regular expressions to associate specific arguments to
 a set of files:
 
@@ -186,7 +198,9 @@ specific_args = {
 }
 ```
 
-And last but not least, it's possible to filter files from the reference directory (for example
+##### File Filtering
+
+Last but not least, it's possible to filter files from the reference directory (for example
 because the reference directory contains temporary files that should not be compared). For
 example, the following code will ignore all files whose name does not start with `file_` and does
 not end with `_tmp.yaml`:
@@ -203,6 +217,73 @@ dir_content_diff.compare_trees(
 ```
 
 
+### Parallel Execution
+
+By default, `dir-content-diff` compares files sequentially. When many files must be compared, parallel execution with either threads or processes can improve performance.
+
+#### Configuration Options
+
+Parallel execution can be configured using the following parameters:
+
+- **`executor_type`**: Controls the type of parallel execution:
+  - `"sequential"` (default): No parallel execution; files are compared one by one
+  - `"thread"`: Uses `ThreadPoolExecutor` (recommended for I/O-bound tasks)
+  - `"process"`: Uses `ProcessPoolExecutor` (recommended for CPU-intensive comparisons)
+
+- **`max_workers`**: Maximum number of worker threads/processes. If `None` (default), it defaults to `min(32, (os.cpu_count() or 1) + 4)`.
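
To make the default concrete, the formula quoted for `max_workers` can be evaluated directly. This is only an illustrative sketch: `default_max_workers` is a hypothetical helper, not part of `dir-content-diff`, and it assumes the library applies the formula exactly as written above.

```python
import os
from typing import Optional


def default_max_workers(cpu_count: Optional[int] = None) -> int:
    """Evaluate min(32, (cpu_count or 1) + 4), the default quoted above.

    Hypothetical helper for illustration only; pass cpu_count to simulate
    a machine with a given number of CPUs.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count()  # may itself return None
    return min(32, (cpu_count or 1) + 4)


print(default_max_workers())    # value for the current machine
print(default_max_workers(4))   # 8
print(default_max_workers(64))  # 32 (capped)
```

With this formula the worker count never drops below 5 and never exceeds 32, so passing an explicit `max_workers` mainly matters on very small or very large machines.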
+
+#### Usage Examples
+
+Enable thread-based parallel execution:
+
+```python
+import dir_content_diff
+
+dir_content_diff.compare_trees(
+    "reference_dir",
+    "compared_dir",
+    executor_type="thread",
+    max_workers=8
+)
+```
+
+Enable process-based parallel execution with automatic worker count:
+
+```python
+import dir_content_diff
+
+dir_content_diff.compare_trees(
+    "reference_dir",
+    "compared_dir",
+    executor_type="process"
+)
+```
+
+Use a configuration object:
+
+```python
+import dir_content_diff
+
+config = dir_content_diff.ComparisonConfig(
+    executor_type="thread",
+    max_workers=4
+)
+
+dir_content_diff.compare_trees(
+    "reference_dir",
+    "compared_dir",
+    config=config
+)
+```
+
+#### Performance Considerations
+
+- **Thread-based execution** (`executor_type="thread"`) is generally recommended, as file comparisons are typically I/O-bound
+- **Process-based execution** (`executor_type="process"`) may be beneficial when using computationally intensive comparators or when dealing with very large files
+- Parallel execution is automatically disabled when only one file needs to be compared; the comparison then runs sequentially
+- The optimal number of workers depends on your system's capabilities and the nature of your files; too many workers may decrease performance because of scheduling and start-up overhead
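
The trade-offs listed above can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by `dir-content-diff`: `compare_pairs` and `_same_content` are hypothetical names, and `filecmp.cmp` stands in for the library's comparators. It shows the same three modes and the sequential fallback when there is at most one file to compare.

```python
import filecmp
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def _same_content(pair):
    """Return True when both files of the pair contain identical bytes."""
    ref, comp = pair
    return filecmp.cmp(ref, comp, shallow=False)


def compare_pairs(pairs, executor_type="sequential", max_workers=None):
    """Hypothetical helper: compare (reference, compared) file pairs."""
    pairs = list(pairs)
    # A pool brings no benefit for zero or one task, so run inline.
    if executor_type == "sequential" or len(pairs) <= 1:
        return [_same_content(pair) for pair in pairs]
    executors = {"thread": ThreadPoolExecutor, "process": ProcessPoolExecutor}
    with executors[executor_type](max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(_same_content, pairs))
```

Timing such a helper on a representative set of files, once per `executor_type`, is a simple way to choose a mode. When trying the `"process"` variant in a script, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.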
+
+
 ### Export formatted data
 
 Some comparators have to format the data before comparing them. For example, if one wants to
