* [TTDB-831] fixed communication with pg_dump and pg_restore
Added:
- Postgres version detection, used to select a suitable pg_dump version
- run option `--config` for a configuration file with paths to specific versions of the pg_dump/pg_restore utilities
Updated:
- moved all modules with pg_anon modes into package `modes`
- updated pyproject.toml
- tests: replaced `anon_funcs.random_inn()` because this function cannot guarantee unique values
* [TTDB-831] pg_anon modes refactoring
Updated:
- modes `init` and `create-dict` wrapped into classes
- class `MainRoutine` decomposed into small methods
- class `PgAnonResult` is now used only in `MainRoutine` and includes methods for updating statuses
- updated tests
* [TTDB-831] Improving regexp searching
Added:
- matching by `schema_mask` and `table_mask` in `skip_rules` and `include_rules`
Updated:
- search in `data_regex.rules` now handles text that contains `\n`
- search in `data_const.partial_constants` is now case-insensitive
- updated tests and README.md
* [TTDB-831] Optimization for reading data from database
* [TTDB-831] Excluded `view-fields` and `view-data` modes from checking postgres utils
* [TTDB-831] Updated phone regexp in examples and tests
* [TTDB-831] Added postgres version requirement into README.md
---------
Co-authored-by: Max Ibragimov <maxim.ibragimov@tantorlabs.ru>
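The matching changes listed in the message can be illustrated with a small sketch. This is not pg_anon's code: the helper names `name_matches`, `matches_regex_rule` and `matches_partial_constant` are hypothetical, the phone pattern is only an example, and the sketch assumes that masks and rules are applied with `re.search` (with `re.DOTALL` for data rules), which may differ from the actual implementation.

```python
import re


def name_matches(rule: dict, key: str, name: str) -> bool:
    """Match `name` by exact value (rule[key]) or by regexp (rule[key + "_mask"])."""
    if key in rule:
        return rule[key] == name
    mask = rule.get(key + "_mask")
    # Assumption: a mask is searched anywhere in the name; a missing key matches everything.
    return True if mask is None else re.search(mask, name) is not None


def matches_regex_rule(value: str, rules: list) -> bool:
    """True if any regexp is found in the value, including values with line breaks."""
    # Assumption: re.DOTALL is used so that "." also matches line breaks.
    return any(re.search(rule, value, re.DOTALL) for rule in rules)


def matches_partial_constant(value: str, constants: list) -> bool:
    """Case-insensitive substring check against a list of constants."""
    lowered = value.lower()
    return any(const.lower() in lowered for const in constants)


rule = {"schema_mask": r"^schm_mask_ext_exclude_\d+$", "table": "card_numbers"}
print(name_matches(rule, "schema", "schm_mask_ext_exclude_2"))    # True
print(name_matches(rule, "table", "card_numbers"))                # True
print(matches_regex_rule("call me:\n+7 999 123-45-67",
                         [r"\+7 \d{3} \d{3}-\d{2}-\d{2}"]))       # True
print(matches_partial_constant("User SECRET token", ["secret"]))  # True
```

Keeping exact names and masks under separate keys means a plain name such as `card_numbers` is never interpreted as a regular expression by accident.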
 |`--output-sens-dict-file`| Output file with sensitive fields will be saved to this value |
 |`--output-no-sens-dict-file`| Output file with not sensitive fields will be saved to this value (Optional) |
 |`--scan-mode`| defines whether to scan all data or only part of it ["full", "partial"] (default "partial") |
-|`--scan-partial-rows`| In `--scan-mode partial` defines amount of rows to scan (default 10000)|
+|`--scan-partial-rows`| In `--scan-mode partial` defines amount of rows to scan (default 10000). The actual row count can be smaller after getting unique values|

 #### Requirements for input --meta-dict-file (metadict):

@@ -428,24 +433,24 @@ var = {
     },
     "skip_rules": [  # List of schemas, tables, and fields to skip
         {
-            # possibly some schema or table contains a lot of data that is not worth scanning. Skipped objects will not be automatically included in the resulting dictionary. Masks are not supported in this object.
-            "schema": "schm_mask_ext_exclude_2", #Schema specification is mandatory
-            "table": "card_numbers", # Optional. If there is no "table", the entire schema will be skipped.
+            # possibly some schema or table contains a lot of data that is not worth scanning. Skipped objects will not be automatically included in the resulting dictionary
+            "schema": "schm_mask_ext_exclude_2",  # Use "schema" for full name matching or "schema_mask" for regexp matching. One of them is required.
+            "table": "card_numbers",  # Optional. Use "table" for full name matching or "table_mask" for regexp matching. If there is no "table"/"table_mask", the entire schema will be skipped.
             "fields": ["val_skip"]  # Optional. If there are no "fields", the entire table will be skipped.
         }
     ],
     "include_rules": [  # List of schemas, tables, and fields to scan
         {
             # possibly you need specific fields for scanning or you can debug some functions on specific field
-            "schema": "schm_other_2", # Required. Schema specification is mandatory
-            "table": "tbl_test_anon_functions", # Optional. If there is no "table", the entire schema will be included.
-            "fields": ["fld_5_email"] # Optional. If there are no "fields", the entire table will be included.
+            "schema": "schm_other_2",  # Use "schema" for full name matching or "schema_mask" for regexp matching. One of them is required.
+            "table": "tbl_test_anon_functions",  # Optional. Use "table" for full name matching or "table_mask" for regexp matching. If there is no "table"/"table_mask", the entire schema will be included.
+            "fields": ["fld_5_email"]  # Optional. If there are no "fields", the entire table will be included.
         }
     ],
     "data_regex": {  # List of regular expressions to search for sensitive data
-|`--prepared-sens-dict-file`| Input file or file list with sensitive fields, which was obtained in previous use by option `--output-sens-dict-file` or prepared manually |
-|`--dbg-stage-1-validate-dict`| Validate dictionary, show the tables and run SQL queries without data export (default false) |
-|`--dbg-stage-2-validate-data`| Validate data, show the tables and run SQL queries with data export in prepared database (default false) |
-|`--dbg-stage-3-validate-full`| Makes all logic with "limit" in SQL queries (default false) |
-|`--clear-output-dir`| In dump mode clears output dict from previous dump or another files. (default true) |
-|`--pg-dump`| Path to the `pg_dump` Postgres tool (default `/usr/bin/pg_dump`). |
-|`--output-dir`| Output directory for dump files. (default "") |
+|`--prepared-sens-dict-file`| Input file or file list with sensitive fields, which was obtained in previous use by option `--output-sens-dict-file` or prepared manually |
+|`--dbg-stage-1-validate-dict`| Validate dictionary, show the tables and run SQL queries without data export (default false) |
+|`--dbg-stage-2-validate-data`| Validate data, show the tables and run SQL queries with data export in prepared database (default false) |
+|`--dbg-stage-3-validate-full`| Makes all logic with "limit" in SQL queries (default false) |
+|`--clear-output-dir`| In dump mode clears output dict from previous dump or another files. (default true) |
+|`--pg-dump`| Path to the `pg_dump` Postgres tool (default `/usr/bin/pg_dump`). |
+|`--output-dir`| Output directory for dump files. (default "") |

 ### Run restore mode

@@ -721,6 +726,40 @@ from (
 ]
 ```

+### Configuring pg_anon
+
+To specify the `pg_dump` and `pg_restore` utilities, use the `--pg-dump` and `--pg-restore` parameters.
+For advanced configuration, use `--config`. This parameter accepts a YAML file in this format:
+If the current Postgres version does not match any version in this config, the `pg_dump` and `pg_restore` versions from the `default` section are used. For example, `pg_anon` can be run with this config on Postgres 16. In that case `pg_dump 17` and `pg_restore 17` are used, i.e. the ones from the `default` section.
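The fallback to the `default` section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `select_pg_utils`, the dictionary layout and all paths are hypothetical and do not reproduce pg_anon's real config schema. The config is shown as an already parsed dictionary, for example the result of loading the YAML file.

```python
def select_pg_utils(config: dict, server_major_version: str) -> dict:
    """Return tool paths for the server version, or the "default" section if absent."""
    return config.get(server_major_version, config["default"])


# Hypothetical parsed config: keys and paths are illustrative, not pg_anon's real schema.
config = {
    "15": {"pg_dump": "/usr/lib/postgresql/15/bin/pg_dump",
           "pg_restore": "/usr/lib/postgresql/15/bin/pg_restore"},
    "default": {"pg_dump": "/usr/lib/postgresql/17/bin/pg_dump",
                "pg_restore": "/usr/lib/postgresql/17/bin/pg_restore"},
}

print(select_pg_utils(config, "15")["pg_dump"])  # /usr/lib/postgresql/15/bin/pg_dump
print(select_pg_utils(config, "16")["pg_dump"])  # /usr/lib/postgresql/17/bin/pg_dump
```

With this layout a server on Postgres 16, which has no section of its own, resolves to the version 17 tools from `default`, matching the example in the text.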