152 parallelise garden script #161
base: dev
chunk into groups of 30 files instead of single files
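A minimal sketch of the chunking this commit describes, assuming the input is a flat list of file keys. The `chunk` helper and the file names are hypothetical illustrations, not the code in this PR:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], chunk_size: int = 30) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `chunk_size` long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start : start + chunk_size]


# Hypothetical file keys; the real flow would list these from its data source.
file_keys = [f"file_{i:03d}.gpkg" for i in range(65)]

# Each chunk would be handed to one parallel task instead of one file per task.
chunks = list(chunk(file_keys))
chunk_sizes = [len(c) for c in chunks]
```

Grouping files this way trades some granularity for fewer, larger tasks, which usually reduces per-task start-up overhead on batch machines.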
asf_heat_pump_suitability/pipeline/flows/run_calculate_garden_size_flow.py
…arden_size_flow.py`
quarter = Parameter(
    name="quarter",
    help="EPC data quarter",
-    help="EPC data quarter",
+    help="EPC data quarter, 1-4",
    required=True,
    default="ews",
)
We likely want a debug flag here, so that the flow can be run on just the sample of 3 building footprint + land registry file pairs.
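One way such a debug flag could work, as a rough sketch only. The `select_file_pairs` helper, the `debug` argument, and the file names are all hypothetical; in the flow itself the flag would presumably be exposed as a boolean Metaflow `Parameter` and passed in:

```python
from typing import List, Tuple

# A (building footprint file, land registry file) pair; names are hypothetical.
FilePair = Tuple[str, str]


def select_file_pairs(
    file_pairs: List[FilePair], debug: bool, sample_size: int = 3
) -> List[FilePair]:
    """Return only the first `sample_size` pairs in debug mode, else all pairs."""
    if debug:
        return file_pairs[:sample_size]
    return list(file_pairs)


# Hypothetical inputs standing in for the real file listing.
all_pairs = [
    (f"footprints_{i}.gpkg", f"land_registry_{i}.gpkg") for i in range(10)
]

debug_pairs = select_file_pairs(all_pairs, debug=True)   # sample of 3 pairs
full_pairs = select_file_pairs(all_pairs, debug=False)   # every pair
```

Keeping the selection in a small pure function means the sample size can be changed or tested without running the whole flow.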
Thanks for this Roisín, looks great! High level comments, I suggested potentially:
I was able to run the script; it took about an hour, which seems fine! I quickly checked the head of the resulting parquet file and the garden sizes seemed to check out, which is good. Let me know if anything is unclear! Aidan
Fixes #152
Description
Refactor garden size pipeline into metaflow.
New files:
- /pipeline/flows/run_calculate_garden_size_flow.py - new flow to estimate garden size for individual properties. Refactored version of /pipeline/run_scripts/run_calculate_garden_size.py
- /utils/parallel_utils.py - new utils file with new function to assist parallelisation.
- MANIFEST.in - required to import asf_heat_pump_suitability as package in batch machine
Updated files:
- setup.cfg; setup.py - changes required to import asf_heat_pump_suitability as package in batch machine
Instructions for Reviewer
I have set up the script to run on a sample of 3 building footprint + land registry file pairs and save it to S3. This is simply to test the flow of the whole pipeline, but the results should still make sense as this is just a subset of gardens. It should be relatively quick to run (~30-60 mins - if it's taking much longer, please kill the run and let me know). Please could you test that the flow runs all the way through successfully.
After the test run is complete, check that there are no flows running with the following line of code:
python asf_heat_pump_suitability/pipeline/flows/run_calculate_garden_size_flow.py --datastore=s3 batch list --my-runs
Please pay special attention to ...
Checklist:
- notebooks/
- pre-commit and addressed any issues not automatically fixed
- dev
- READMEs