add notebook to compare MCS datasets from daps and core processing #112

crispy-wonton · 2025-06-09T16:54:24Z

Fixes #111

Summary

Compare the MCS and MCS-EPC joined output files from the daps and core processing pipelines in a notebook to identify any differences.

New files:

asf_core_data/analysis/compare_processing/compare_mcs_installations_processing.py - notebook to compare processing outputs

Please note: the notebook layout makes the code very repetitive, so I have left comments on the file on GitHub marking where code is repeated from a previous section. This should save you from reviewing the same code twice.

Instructions for reviewer:

To create the notebook, run the following commands:

pip install jupytext
jupytext --to notebook asf_core_data/analysis/compare_processing/compare_mcs_installations_processing.py

You can then run the notebook as normal from your chosen IDE.

Please pay special attention to:

The files being loaded, to confirm that we are comparing the correct datasets
The creation of unique IDs
The preprocessing applied to the last two comparison datasets (MCS-EPC full and MCS-EPC most relevant) before the row-by-row comparison
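
For reviewers who want a reference point for the unique ID and row-by-row steps, here is a minimal sketch of the general pattern. It is not the notebook's code: the column names (installer, date, cost) and both helper functions are hypothetical, and it assumes the key columns identify each row uniquely within a dataset.

```python
import hashlib

import pandas as pd


def add_unique_id(df, key_cols):
    """Return a copy of df with a `unique_id` column hashed from key_cols.

    Hypothetical helper: the notebook may build its IDs differently.
    """
    out = df.copy()
    # Join the key columns into one string per row, then hash it.
    joined = out[key_cols].astype(str).apply(lambda row: "|".join(row), axis=1)
    out["unique_id"] = joined.map(lambda s: hashlib.md5(s.encode()).hexdigest())
    return out


def compare_by_id(core_df, daps_df, id_col="unique_id"):
    """Return IDs found only in core, IDs found only in daps, and cell-level
    differences for rows present in both (assumes IDs are unique)."""
    core = core_df.set_index(id_col)
    daps = daps_df.set_index(id_col)
    only_core = core.index.difference(daps.index)
    only_daps = daps.index.difference(core.index)
    shared = core.index.intersection(daps.index)
    cols = core.columns.intersection(daps.columns)
    # DataFrame.compare needs identically labelled frames, so both sides are
    # selected with the same row and column labels in the same order.
    diffs = core.loc[shared, cols].compare(daps.loc[shared, cols])
    return only_core, only_daps, diffs


# Made-up example data, purely for illustration.
key_cols = ["installer", "date"]
core_df = add_unique_id(
    pd.DataFrame(
        {
            "installer": ["A", "B", "C"],
            "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
            "cost": [100.0, 200.0, 300.0],
        }
    ),
    key_cols,
)
daps_df = add_unique_id(
    pd.DataFrame(
        {
            "installer": ["A", "B", "D"],
            "date": ["2024-01-01", "2024-01-02", "2024-01-04"],
            "cost": [100.0, 250.0, 400.0],
        }
    ),
    key_cols,
)

only_core, only_daps, diffs = compare_by_id(core_df, daps_df)
print(len(only_core), len(only_daps))
print(diffs)
```

In the output of `compare`, the `self` columns hold the core values and the `other` columns hold the daps values. If the key columns do not uniquely identify rows, the shared-row comparison can misalign, which is why the ID creation deserves a close look.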

Is there anything missing that you think would be good to investigate?

Checklist:

crispy-wonton · 2025-06-09T16:56:00Z

asf_core_data/analysis/compare_processing/compare_mcs_installations_processing.py

+raw_daps_df = pd.read_parquet(daps_epc_path)
+
+# %%
+# Preprocess datasets to make them comparable


The processing below is exactly the same as for the MCS-EPC full dataset.

crispy-wonton · 2025-06-09T16:58:54Z

asf_core_data/analysis/compare_processing/compare_mcs_installations_processing.py

+# %%
+# Compare with y-data profiling
+core_report = ProfileReport(core_df, title=f"Core {dataset.upper()}", minimal=True)
+daps_report = ProfileReport(daps_df, title=f"Daps {dataset.upper()}", minimal=True)
+comparison_report = core_report.compare(daps_report)
+comparison_report.to_file(f"{dataset}_comparison.html")


These rows are identical to lines 30-33.

add notebook to compare MCS datasets from daps and core processing

9f88e98

crispy-wonton requested a review from sofiapinto June 9, 2025 17:02
