Commit 88df72a

Profile data dumping (#6723)
Data dumping for groups and profiles (#6723)

This PR adds functionality to incrementally dump profile and group data into a human-readable output folder, and refactors the internal logic of the recently released process dumping.

### Public API

The data dumping feature can be used from the CLI via the `verdi {profile|group|process} dump` commands. Furthermore, the classes `aiida.manage.configuration.Profile`, `aiida.orm.Group` and `aiida.orm.ProcessNode` are extended by a new public method `dump` that takes the same dumping options as the CLI entry points. The internal implementation of the feature is contained in the private module `aiida.tools._dumping`, which is currently excluded from `codecov`. Further testing and modifications will be applied based on user feedback in smaller, more manageable PRs.

### Configuration of dumping

To organize the extensive options, a data class for the config options is created using the `pydantic` `BaseModel` in the `aiida.tools._dumping.config` module. Each type of dumping (process/group/profile) has its own set of options, contained in the three config classes `ProcessDumpConfig`, `ProfileDumpConfig` and `GroupDumpConfig`, which all inherit from `BaseDumpConfig`. The `*DumpConfig` classes use the mixin pattern to organize the options via `TimeFilterMixin`, `EntityFilterMixin`, `ProcessHandlingMixin` and `GroupManagementMixin`, since not every option is available for each type of entity being dumped. The new CLI entry points and the new `dump` method all map their inputs to the respective config class. Mapping both kinds of input to the `*DumpConfig` classes unifies the validation process and reduces code duplication.

### State of dumping folder

The dumping functionality tracks the state of the dumped folder with respect to the AiiDA database. This requires persistent storage of the current state of the dumping folder, as well as logic that compares the state of the dumping folder with that of the database.
To prevent expensive file reads of the dumping folder, the state is stored in a `json` file after the dumping process. The logic to evaluate and track the state and compare it with the database is contained in the modules `aiida.tools._dumping.tracking` and `aiida.tools._dumping.mapping`. Changes since the last dump (new/deleted nodes/groups, relabeled groups, node membership changes, etc.) are then picked up via the module `aiida.tools._dumping.detector`. This enables incremental dumping, so that the human-readable output folder tracks the state of the AiiDA DB as it evolves.

### Execution of the dump

The `aiida.tools._dumping.engine` module is responsible for the top-level orchestration of the dumping process (including reading in the `json` state file from the previous dump, or deleting it if overwrite mode is selected), as well as common setup and teardown operations. For group and profile dumping, changes in AiiDA's DB since the last dump that are not yet reflected in the dumping output folder are then applied incrementally. This includes deleting output directories of nodes and groups that were previously dumped but have since been deleted from AiiDA's DB, applying group relabeling carried out by the user, and dumping new nodes and groups. This functionality is contained in the `aiida.tools._dumping.executors` module, which provides the `DeletionExecutor`, `ProcessDumpExecutor`, `GroupDumpExecutor` and `ProfileDumpExecutor` classes and forms the core of the feature implementation. Finally, code that was previously contained in the `ProcessDumper` class has moved to the `ProcessDumpExecutor`, while a `ProcessDumper` "facade" class is still provided via the `aiida.tools._dumping.facades` module and exposed as public API for backwards compatibility.
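The incremental behaviour described in the commit message (a `json` state file from the previous dump is compared with the current database contents) can be illustrated with a small standalone sketch. This is not the code in `aiida.tools._dumping.detector`: the function names, the state layout (a UUID-to-label mapping) and the file name `dump_state.json` are all assumptions made for illustration, and only the standard library is used.

```python
import json
import tempfile
from pathlib import Path


def detect_changes(previous: dict, current: dict) -> dict:
    """Classify differences between two {uuid: label} mappings (hypothetical layout)."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        'new': sorted(curr_ids - prev_ids),
        'deleted': sorted(prev_ids - curr_ids),
        'relabeled': sorted(u for u in prev_ids & curr_ids if previous[u] != current[u]),
    }


def load_state(state_file: Path) -> dict:
    """Read the state of the previous dump; a missing file means nothing was dumped yet."""
    if not state_file.exists():
        return {}
    return json.loads(state_file.read_text())


def save_state(state_file: Path, state: dict) -> None:
    """Persist the state so the next run does not have to re-read the dump folder."""
    state_file.write_text(json.dumps(state, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / 'dump_state.json'  # hypothetical file name

    # First run: no state file yet, so every entity counts as new.
    db_now = {'uuid-a': 'relax', 'uuid-b': 'bands'}
    first = detect_changes(load_state(state_file), db_now)
    save_state(state_file, db_now)

    # Second run: one entity relabeled, one deleted, one added.
    db_later = {'uuid-a': 'relax-v2', 'uuid-c': 'phonons'}
    second = detect_changes(load_state(state_file), db_later)
```

In this sketch an overwrite mode would simply delete the state file before the comparison, which makes every entity appear as new again.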
1 parent cf07e9f commit 88df72a

39 files changed, +7825 -1159 lines changed

codecov.yml

Lines changed: 3 additions & 0 deletions
@@ -15,3 +15,6 @@ coverage:
     patch:
       default:
         threshold: 0.1%
+
+ignore:
+  - src/aiida/tools/_dumping/**/*

docs/source/howto/data.rst

Lines changed: 379 additions & 46 deletions
Large diffs are not rendered by default.

docs/source/reference/command_line.rst

Lines changed: 2 additions & 0 deletions
@@ -223,6 +223,7 @@ Below is a list with all available subcommands.
   create       Create an empty group with a given label.
   delete       Delete groups and (optionally) the nodes they contain.
   description  Change the description of a group.
+  dump         Dump data of an AiiDA group to disk.
   list         Show a list of existing groups.
   move-nodes   Move the specified NODES from one group to another.
   path         Inspect groups of nodes, with delimited label paths.
@@ -397,6 +398,7 @@ Below is a list with all available subcommands.
 Commands:
   configure-rabbitmq  Configure RabbitMQ for a profile.
   delete              Delete one or more profiles.
+  dump                Dump all data in an AiiDA profile's storage to disk.
   list                Display a list of all available profiles.
   set-default         Set a profile as the default profile.
   setdefault          (Deprecated) Set a profile as the default profile.

pyproject.toml

Lines changed: 5 additions & 0 deletions
@@ -279,6 +279,11 @@ Documentation = 'https://aiida.readthedocs.io'
 Home = 'http://www.aiida.net/'
 Source = 'https://github.com/aiidateam/aiida-core'

+[tool.coverage.run]
+omit = [
+    "src/aiida/tools/_dumping/**/*"
+]
+
 [tool.flit.module]
 name = 'aiida'

src/aiida/cmdline/commands/cmd_group.py

Lines changed: 109 additions & 1 deletion
@@ -137,7 +137,7 @@ def group_move_nodes(source_group, target_group, force, nodes, all_entries):

     if not force:
         click.confirm(
-            f'Are you sure you want to move {len(nodes)} nodes from {source_group} ' f'to {target_group}?', abort=True
+            f'Are you sure you want to move {len(nodes)} nodes from {source_group} to {target_group}?', abort=True
         )

     source_group.remove_nodes(nodes)
@@ -325,6 +325,11 @@ def group_relabel(group, label):
         echo.echo_critical(str(exception))
     else:
         echo.echo_success(f"Label changed to '{label}'")
+        msg = (
+            'Note that if you are dumping your profile data to disk, to reflect the relabeling of the group, '
+            'run your `verdi profile dump` command again.'
+        )
+        echo.echo_report(msg)


 @verdi_group.command('description')
@@ -632,3 +637,106 @@ def group_path_ls(path, type_string, recursive, as_table, no_virtual, with_descr
         if no_virtual and child.is_virtual:
             continue
         echo.echo(child.path, bold=not child.is_virtual)
+
+
+@verdi_group.command('dump')
+@arguments.GROUP()
+@options.PATH()
+@options.DRY_RUN()
+@options.OVERWRITE()
+@options.PAST_DAYS()
+@options.START_DATE()
+@options.END_DATE()
+@options.FILTER_BY_LAST_DUMP_TIME()
+@options.ONLY_TOP_LEVEL_CALCS()
+@options.ONLY_TOP_LEVEL_WORKFLOWS()
+@options.DELETE_MISSING()
+@options.SYMLINK_CALCS()
+@options.INCLUDE_INPUTS()
+@options.INCLUDE_OUTPUTS()
+@options.INCLUDE_ATTRIBUTES()
+@options.INCLUDE_EXTRAS()
+@options.FLAT()
+@options.DUMP_UNSEALED()
+@click.pass_context
+@with_dbenv()
+def group_dump(
+    ctx,
+    group,
+    path,
+    dry_run,
+    overwrite,
+    past_days,
+    start_date,
+    end_date,
+    filter_by_last_dump_time,
+    delete_missing,
+    only_top_level_calcs,
+    only_top_level_workflows,
+    symlink_calcs,
+    include_inputs,
+    include_outputs,
+    include_attributes,
+    include_extras,
+    flat,
+    dump_unsealed,
+):
+    """Dump data of an AiiDA group to disk."""
+
+    import traceback
+    from pathlib import Path
+
+    from aiida.cmdline.utils import echo
+    from aiida.tools._dumping.utils import DumpPaths
+
+    warning_msg = (
+        'This is a new feature which is still in its testing phase. '
+        'If you encounter unexpected behavior or bugs, please report them via Discourse or GitHub.'
+    )
+    echo.echo_warning(warning_msg)
+
+    try:
+        if path is None:
+            group_path = DumpPaths.get_default_dump_path(group)
+            dump_base_output_path = Path.cwd() / group_path
+            echo.echo_report(f'No output path specified. Using default: `{dump_base_output_path}`')
+        else:
+            dump_base_output_path = Path(path).resolve()
+            echo.echo_report(f'Using specified output path: `{dump_base_output_path}`')
+
+        # --- Logical Checks ---
+        if dry_run and overwrite:
+            msg = (
+                '`--dry-run` and `--overwrite` selected (or set in config). Overwrite operation will NOT be performed.'
+            )
+            echo.echo_warning(msg)
+
+        # Run the dumping
+        group.dump(
+            output_path=dump_base_output_path,
+            dry_run=dry_run,
+            overwrite=overwrite,
+            past_days=past_days,
+            start_date=start_date,
+            end_date=end_date,
+            filter_by_last_dump_time=filter_by_last_dump_time,
+            only_top_level_calcs=only_top_level_calcs,
+            only_top_level_workflows=only_top_level_workflows,
+            symlink_calcs=symlink_calcs,
+            include_inputs=include_inputs,
+            include_outputs=include_outputs,
+            include_attributes=include_attributes,
+            include_extras=include_extras,
+            flat=flat,
+            dump_unsealed=dump_unsealed,
+        )
+
+        if not dry_run:
+            msg = f'Raw files for group `{group.label}` dumped into folder `{dump_base_output_path.name}`.'
+            echo.echo_success(msg)
+        else:
+            echo.echo_success('Dry run completed.')
+
+    except Exception as e:
+        msg = f'Unexpected error during dump of group {group.label}:\n ({e!s}).\n'
+        echo.echo_critical(msg + traceback.format_exc())
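The output-path handling in `group_dump` follows a simple rule: without `--path`, a default directory under the current working directory is used; with `--path`, the given path is resolved to an absolute one. The sketch below isolates that rule as a pure function so it can be read and tested without an AiiDA profile. The names `resolve_dump_path`, `default_name` and `overwrite_is_effective` are hypothetical; in the diff the default name comes from `DumpPaths.get_default_dump_path`, whose naming scheme is not shown here.

```python
from pathlib import Path
from typing import Optional


def resolve_dump_path(path: Optional[str], default_name: str, cwd: Optional[Path] = None) -> Path:
    """Return the base output directory for a dump.

    `default_name` stands in for the value returned by
    `DumpPaths.get_default_dump_path(...)` (assumption: a relative path).
    """
    base = cwd if cwd is not None else Path.cwd()
    if path is None:
        # No explicit path: place the default directory under the working directory.
        return base / default_name
    # Explicit path: make it absolute, as `Path(path).resolve()` does in the diff.
    return Path(path).resolve()


def overwrite_is_effective(dry_run: bool, overwrite: bool) -> bool:
    """With both flags set, the command warns that the overwrite is not performed."""
    return overwrite and not dry_run


default_target = resolve_dump_path(None, 'my-group-dump', cwd=Path('/tmp/work'))
explicit_target = resolve_dump_path('some/relative/dir', 'ignored-default')
```

Keeping the rule in a function without side effects is one way to unit-test it separately from the `click` command, which additionally reports the chosen path via `echo.echo_report`.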

src/aiida/cmdline/commands/cmd_process.py

Lines changed: 59 additions & 62 deletions
@@ -14,6 +14,7 @@
 from aiida.cmdline.params import arguments, options, types
 from aiida.cmdline.params.options.overridable import OverridableOption
 from aiida.cmdline.utils import decorators, echo
+from aiida.cmdline.utils.decorators import with_dbenv
 from aiida.common.log import LOG_LEVELS, capture_logging

 REPAIR_INSTRUCTIONS = """\
@@ -583,98 +584,94 @@ def process_repair(manager, broker, dry_run):
 @verdi_process.command('dump')
 @arguments.PROCESS()
 @options.PATH()
+@options.DRY_RUN()
 @options.OVERWRITE()
-@click.option(
-    '--include-inputs/--exclude-inputs',
-    default=True,
-    show_default=True,
-    help='Include the linked input nodes of the `CalculationNode`(s).',
-)
-@click.option(
-    '--include-outputs/--exclude-outputs',
-    default=False,
-    show_default=True,
-    help='Include the linked output nodes of the `CalculationNode`(s).',
-)
-@click.option(
-    '--include-attributes/--exclude-attributes',
-    default=True,
-    show_default=True,
-    help='Include attributes in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
-)
-@click.option(
-    '--include-extras/--exclude-extras',
-    default=True,
-    show_default=True,
-    help='Include extras in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
-)
-@click.option(
-    '-f',
-    '--flat',
-    is_flag=True,
-    default=False,
-    show_default=True,
-    help='Dump files in a flat directory for every step of the workflow.',
-)
-@click.option(
-    '--dump-unsealed',
-    is_flag=True,
-    default=False,
-    show_default=True,
-    help='Also allow the dumping of unsealed process nodes.',
-)
-@options.INCREMENTAL()
+@options.INCLUDE_INPUTS()
+@options.INCLUDE_OUTPUTS()
+@options.INCLUDE_ATTRIBUTES()
+@options.INCLUDE_EXTRAS()
+@options.FLAT()
+@options.DUMP_UNSEALED()
+@click.pass_context
+@with_dbenv()
 def process_dump(
+    ctx,
     process,
     path,
+    dry_run,
     overwrite,
     include_inputs,
     include_outputs,
     include_attributes,
     include_extras,
     flat,
     dump_unsealed,
-    incremental,
 ) -> None:
     """Dump process input and output files to disk.

     Child calculations/workflows (also called `CalcJob`s/`CalcFunction`s and `WorkChain`s/`WorkFunction`s in AiiDA
     jargon) run by the parent workflow are contained in the directory tree as sub-folders and are sorted by their
-    creation time. The directory tree thus mirrors the logical execution of the workflow, which can also be queried by
+    creation time. The directory tree thus mirrors the logical execution of the workflow, which can also be queried by
     running `verdi process status <pk>` on the command line.

     By default, input and output files of each calculation can be found in the corresponding "inputs" and
     "outputs" directories (the former also contains the hidden ".aiida" folder with machine-readable job execution
     settings). Additional input and output files (depending on the type of calculation) are placed in the "node_inputs"
     and "node_outputs", respectively.

-    Lastly, every folder also contains a hidden, human-readable `.aiida_node_metadata.yaml` file with the relevant AiiDA
+    Lastly, every folder also contains a hidden, human-readable `aiida_node_metadata.yaml` file with the relevant AiiDA
     node data for further inspection.
     """
+    import traceback
+    from pathlib import Path

+    from aiida.cmdline.utils import echo
+    from aiida.tools._dumping.utils import DumpPaths
     from aiida.tools.archive.exceptions import ExportValidationError
-    from aiida.tools.dumping.processes import ProcessDumper
-
-    process_dumper = ProcessDumper(
-        include_inputs=include_inputs,
-        include_outputs=include_outputs,
-        include_attributes=include_attributes,
-        include_extras=include_extras,
-        overwrite=overwrite,
-        flat=flat,
-        dump_unsealed=dump_unsealed,
-        incremental=incremental,
+
+    warning_msg = (
+        'This is a new feature which is still in its testing phase. '
+        'If you encounter unexpected behavior or bugs, please report them via Discourse or GitHub.'
     )
+    echo.echo_warning(warning_msg)

+    # Check for dry_run + overwrite
+    if overwrite and dry_run:
+        msg = 'Both `dry_run` and `overwrite` set to true. Operation will NOT be performed.'
+        echo.echo_warning(msg)
+        return
+
+    if path is None:
+        process_path = DumpPaths.get_default_dump_path(process)
+        dump_base_output_path = Path.cwd() / process_path
+        msg = f'No output path specified. Using default: `{dump_base_output_path}`'
+        echo.echo_report(msg)
+    else:
+        echo.echo_report(f'Using specified output path: `{path}`')
+        dump_base_output_path = Path(path).resolve()
+
+    if dry_run:
+        echo.echo_success('Dry run completed.')
+        return
+
+    # Execute dumping
     try:
-        dump_path = process_dumper.dump(process_node=process, output_path=path)
-    except FileExistsError:
-        echo.echo_critical(
-            'Dumping directory exists and overwrite is False. Set overwrite to True, or delete directory manually.'
+        process.dump(
+            output_path=dump_base_output_path,
+            dry_run=dry_run,
+            overwrite=overwrite,
+            include_inputs=include_inputs,
+            include_outputs=include_outputs,
+            include_attributes=include_attributes,
+            include_extras=include_extras,
+            flat=flat,
+            dump_unsealed=dump_unsealed,
         )
+
+        msg = f'Raw files for process `{process.pk}` dumped into folder `{dump_base_output_path.name}`.'
+        echo.echo_success(msg)
     except ExportValidationError as e:
-        echo.echo_critical(f'{e!s}')
+        echo.echo_critical(f'Data validation error during dump: {e!s}')
     except Exception as e:
-        echo.echo_critical(f'Unexpected error while dumping {process.__class__.__name__} <{process.pk}>:\n ({e!s}).')
-
-    echo.echo_success(f'Raw files for {process.__class__.__name__} <{process.pk}> dumped into folder `{dump_path}`.')
+        msg = f'Unexpected error during dump of process {process.pk}:\n ({e!s}).\n'
+        echo.echo_critical(msg + traceback.format_exc())
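Both commands forward their options as keyword arguments to `dump`, and the commit message states that these are mapped onto `*DumpConfig` classes assembled from mixins. The sketch below illustrates that pattern only. It uses standard-library `dataclasses` rather than the `pydantic` `BaseModel` named in the commit message so that it runs without extra dependencies, and the field selection, the assignment of mixins to classes, the validation rule and the helper `build_config` are assumptions, not the contents of `aiida.tools._dumping.config`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BaseDumpConfig:
    """Options shared by every kind of dump (illustrative subset)."""

    dry_run: bool = False
    overwrite: bool = False
    include_inputs: bool = True
    include_outputs: bool = False


@dataclass
class TimeFilterMixin:
    """Time-based filtering, assumed here to apply to groups but not to single processes."""

    past_days: Optional[int] = None

    def __post_init__(self) -> None:
        # Hypothetical validation rule, for illustration only.
        if self.past_days is not None and self.past_days < 0:
            raise ValueError('past_days must be non-negative')


@dataclass
class ProcessDumpConfig(BaseDumpConfig):
    flat: bool = False


@dataclass
class GroupDumpConfig(TimeFilterMixin, BaseDumpConfig):
    delete_missing: bool = False


def build_config(config_cls, **options):
    """Map CLI or method keyword arguments onto a config class, rejecting unknown options."""
    known = {f.name for f in fields(config_cls)}
    unknown = sorted(set(options) - known)
    if unknown:
        raise TypeError(f'Unsupported options for {config_cls.__name__}: {unknown}')
    return config_cls(**options)


group_config = build_config(GroupDumpConfig, past_days=7, dry_run=True)
process_config = build_config(ProcessDumpConfig, flat=True)
```

Because the CLI command and the `dump` method would both go through the same constructor in such a design, an option that is invalid for one entity type fails in a single place rather than in two separate validation paths.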
