Commit 6b4c3d6

Merge pull request #119 from MetOffice/develop
Merge all changes from develop onto main before merging CORDEX changes
2 parents dd42a86 + c3c66be commit 6b4c3d6

19 files changed: +889 −99 lines changed

CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
@@ -20,11 +20,11 @@ conda activate pyprecis-environment
 :exclamation: *Note: As of v1.0 we are unable to provision the model data necessary for reproducing the full PyPRECIS learning environment via GitHub due to its large file size. Contact the PRECIS team for more information.*

 ## Before you start...
-Read through the current issues to see what you can help with. If you have your own ideas for improvements, please start a new issues so we can track and discuss your improvement. You must create a new branch for any changes you make.
+Read through the current issues to see what you can help with. If you have your own ideas for improvements, please start a new issue so we can track and discuss your improvement. You must create a new branch for any changes you make.

 **Please take note of the following guidelines when contributing to the PyPRECIS repository.**

-* Please do **not** make changes to the `master` branch. The `master` branch is reserved for files and code that has been fully tested and reviewed. Only the core PyPRECIS developers can/should push to the `master` branch.
+* Please do **not** make changes to the `main` or `develop` branches. The `main` branch is reserved for files and code that has been fully tested and reviewed. Only the core PyPRECIS developers can push to the `main` and `develop` branches.

 * The `develop` branch contains the latest holistic version of the `PyPRECIS` repository. Please branch off `develop` to fix a particular issue or add a new feature.
 * Please use the following tokens at the start of a new branch name to help sign-post and group branches:
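For illustration, branching off `develop` typically looks like the following; the token and branch name here are placeholders, not taken from this diff:

```
git checkout develop
git pull origin develop
git checkout -b <token>_<short-description>
```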
@@ -66,5 +66,5 @@ have questions.**

 <h5 align="center">
 <img src="notebooks/img/MO_MASTER_black_mono_for_light_backg_RBG.png" width="200" alt="Met Office"> <br>
-&copy; British Crown Copyright 2018 - 2019, Met Office
+&copy; British Crown Copyright 2018 - 2022, Met Office
 </h5>

README.md

Lines changed: 12 additions & 5 deletions

@@ -31,7 +31,7 @@ PyPRECIS is built on [Jupyter Notebooks](https://jupyter.org/), with data proces
 Further information about PRECIS can be found on the [Met Office website](https://www.metoffice.gov.uk/precis).

 ## Contents
-The teaching elements of PyPRECIS are contained in the `notebooks` directory. The primary worksheets are:
+The teaching elements of PyPRECIS are contained in the `notebooks` directory. The core worksheets are:

 Worksheet | Aims
 :----: | -----------
@@ -42,7 +42,7 @@ Worksheet | Aims
 [5](notebooks/worksheet5.ipynb) | <li>Have an appreciation for working with daily model data</li><li>Understand how to calculate some useful climate extremes statistics</li><li>Be aware of some coding strategies for dealing with large data sets</li></ul>
 [6](notebooks/worksheet6.ipynb) | An extended coding exercise designed to allow you to put everything you've learned into practice

-Additional tutorials specific to the CSSP 20th Century reanalysis datasets:
+Additional tutorials specific to the CSSP 20th Century reanalysis dataset:

 Worksheet | Aims
 :----: | -----------
@@ -55,10 +55,17 @@ Three additional worksheets are available for use by workshop instructors:

 * `makedata.ipynb`: Provides scripts for preparing raw model output for use in notebook exercises.
 * `worksheet_solutions.ipynb`: Solutions to worksheet exercises.
-* `worksheet6example.ipynb`: Example code for Worksheet 6.
+* `worksheet6example.ipynb`: Example code for Worksheet 6.

 ## Data
-The data used in the worksheets is currently only available within the Met Office. Data relating to the CSSP_20CRDS_Tutorials is also available in Zarr format in an Azure Blob Storage Service. See the `data/DATA-ACESS.md` for further details.
+Data relating to the PyPRECIS project is currently held internally at the Met Office.
+
+The total data volume for the core worksheets is 36.68 GB, of which ~20 GB is raw pp data. This is too large to be stored on GitHub, or via Git LFS.
+As of v2.0, the storage solution for making this data available alongside the notebooks is still under investigation.
+
+Data relating to the **CSSP 20CRDS** tutorials is held online in an Azure Blob Storage Service. To access these data, users will need a valid shared access signature (SAS) token. The data is in [Zarr](https://zarr.readthedocs.io/en/stable/) format and the total volume is ~2 TB. The data is stored at hourly, 3-hourly, 6-hourly, daily and monthly frequencies, held separately under the `metoffice-20cr-ds` container on MS Azure. Monthly data only is also available via [Zenodo](https://zenodo.org/record/2558135).

 ## Contributing
 Information on how to contribute can be found in the [Contributing guide](CONTRIBUTING.md).
@@ -69,5 +76,5 @@ PyPRECIS is licenced under BSD 3-clause licence for use outside of the Met Offic

 <h5 align="center">
 <img src="notebooks/img/MO_MASTER_black_mono_for_light_backg_RBG.png" width="200" alt="Met Office"> <br>
-&copy; British Crown Copyright 2018 - 2020, Met Office
+&copy; British Crown Copyright 2018 - 2022, Met Office
 </h5>

data/DATA-ACCESS.md

Lines changed: 0 additions & 12 deletions
This file was deleted.

dockerfile

Lines changed: 23 additions & 0 deletions

```
FROM continuumio/miniconda3

RUN apt-get update

# Set working directory for the project
WORKDIR /app

SHELL ["/bin/bash", "--login", "-c"]

RUN apt-get install -y git

# Create Conda environment from the YAML file
COPY environment.yml .
RUN pip install --upgrade pip

RUN conda env create -f environment.yml

RUN conda init bash
RUN conda activate pyprecis-environment

RUN pip install ipykernel && \
    python -m ipykernel install --name pyprecis-training
```
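A minimal sketch of building and entering this image locally; the tag name is an assumption, and Docker must be installed:

```
docker build -t pyprecis-training .
docker run -it pyprecis-training /bin/bash
```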

environment.yml

Lines changed: 14 additions & 8 deletions

@@ -1,11 +1,17 @@
 name: pyprecis-environment
 channels:
   - conda-forge
-  - defaults
-dependencies:
-  - python=3.6.6
-  - numpy
-  - matplotlib
-  - cartopy=0.16.0
-  - dask=0.19.4
-  - iris=2.2.0
+dependencies:
+  - python=3.6.10
+  - iris=2.4.0
+  - numpy=1.17.4
+  - matplotlib=3.1.3
+  - nc-time-axis=1.2.0
+  - jupyter_client=6.1.7
+  - jupyter_core=4.6.3
+  - dask=2.11.0
+  - notebook=5.7.8
+  - mo_pack=0.2.0
+  - boto3
+  - botocore
+  - tqdm
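For reference, this environment can be recreated locally with conda, mirroring what the dockerfile above does:

```
conda env create -f environment.yml
conda activate pyprecis-environment
```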

notebooks/awsutils/README-AWS.md

Lines changed: 129 additions & 0 deletions

## AWS

### Create an EC2 instance

* Select the eu-west-2 (London) region from the top right of the navigation bar
* Click on "Launch instance"
* Choose the Amazon Linux 2 AMI (HVM), Kernel 5.10, 64-bit (x86) machine and click "Select"
* Choose t2.2xlarge and click "Next: Configure instance details"
* Choose the default subnet eu-west-2c
* Under IAM role, choose the existing trainings-ec2-dev role and click "Next: Add storage"
* 8 GB is fine; click "Next: Add tags"
* Add the following tags:
  * Name: [unique instance name]
  * Tenable: FA
  * ServiceOwner: [firstname.lastname]
  * ServiceCode: PABCLT
* Add a security group: select the existing security group IAStrainings-ec2-mo
* Click "Review and launch", then select "Launch"
* It will prompt you to set a key pair (to allow ssh). Create a new key and download it.

This creates the instance. To see it, go to Instances; the instance state should show "Running". A hedged AWS CLI equivalent of these console steps is sketched below.
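Every bracketed value in this sketch is a placeholder, since the AMI, key pair, subnet and security-group IDs are not given in this guide:

```
aws ec2 run-instances \
    --region eu-west-2 \
    --image-id <amazon-linux-2-ami-id> \
    --instance-type t2.2xlarge \
    --key-name <your-key-pair> \
    --subnet-id <eu-west-2c-subnet-id> \
    --security-group-ids <IAStrainings-ec2-mo-id> \
    --iam-instance-profile Name=trainings-ec2-dev \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=<unique-instance-name>},{Key=ServiceCode,Value=PABCLT}]'
```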
### SSH to the instance from VDI

* Save the key (.pem) to ~/.ssh and set its permissions: chmod 0400 ~/.ssh/your_key.pem
* Open ~/.ssh/config and add the following:

```
Host ec2-*.eu-west-2.compute.amazonaws.com
    IdentityFile ~/.ssh/your_key.pem
    User ec2-user
```

* Find the public IPv4 DNS and ssh in using it: ssh ec2-<ip address>.eu-west-2.compute.amazonaws.com. The public IPv4 DNS can be found in the instance details on AWS: click on your instance to open the details.

* Remember to shut down the instance when not using it; it will save cost.
### Create an S3 bucket

* Go to the S3 service and press "Create bucket"
* Name the bucket
* Set the region to EU (London) eu-west-2
* Add tags:
  * Name: [name of bucket or any unique name]
  * ServiceOwner: [your-name]
  * ServiceCode: PABCLT
  * Tenable: FA
* Click on "Create bucket" (a hedged CLI equivalent is sketched below)
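Assuming the AWS CLI is configured, the same bucket could be created from the command line; the bucket name is a placeholder, and outside us-east-1 the LocationConstraint must be set explicitly:

```
aws s3api create-bucket \
    --bucket <your-bucket-name> \
    --region eu-west-2 \
    --create-bucket-configuration LocationConstraint=eu-west-2

aws s3api put-bucket-tagging \
    --bucket <your-bucket-name> \
    --tagging 'TagSet=[{Key=ServiceCode,Value=PABCLT},{Key=Tenable,Value=FA}]'
```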
### Key configurations

The AWS scripts only run when the config files contain current keys. To update the keys:

* Go to AB climate training dev --> Administrator access --> command line or programmatic access
* Copy the keys under "Option 1: Set AWS environment variables"
* In VDI, paste these keys into ~/.aws/config, replacing any existing ones
* Add [default] on the first line
* Copy the keys under "Option 2: Add a profile to your AWS credentials file"
* In VDI, paste the keys into the credentials file ~/.aws/credentials (remove the first copied line, which looks something like [198477955030_AdministratorAccess])
* Add [default] on the first line

The config and credentials files should look like this (with your own keys):

```
[default]
export AWS_ACCESS_KEY_ID="XXXXXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
export AWS_SESSION_TOKEN="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```
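A quick way to check that the pasted keys work is a standard AWS CLI identity call (not part of the original guide):

```
aws sts get-caller-identity
```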
### Loading data into the S3 bucket from VDI (using boto3)

To upload file(s) to S3, use: /aws-scripts/s3_file_upload.py
To upload directory(s) to S3, use: /aws-scripts/s3_bulk_data_upload.py
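If the repo scripts are not to hand, a plain AWS CLI upload does the same job; the bucket name and paths are placeholders:

```
aws s3 cp myfile.nc s3://<your-bucket-name>/data/
aws s3 cp mydata/ s3://<your-bucket-name>/data/ --recursive
```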
### AWS Elastic Container Registry

The following instructions create an image repository on ECR and upload a container image:

* ssh to the previously created EC2 instance and make an empty Git repo:

```
sudo yum install -y git
git init
```

* On VDI, run the following command to push the PyPRECIS repo containing the dockerfile to the EC2 instance:

```
git push <ec2 host name>:~
```

* Now check out the branch on EC2: git checkout [branch-name]
* Install Docker and start the Docker service:

```
sudo amazon-linux-extras install docker
sudo service docker start
```

* Build the Docker image:

```
sudo docker build .
```

* Go to the AWS ECR console, press "Create repository", make it private and name it
* Once created, press "View push commands"
* Copy the commands and run them on the EC2 instance; this pushes the container image to the repository. If you get a "permission denied" error, add "sudo" before "docker" in the command. Typical push commands are sketched below.
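The push commands ECR displays typically look like the following; the account ID and repository name are placeholders (prefix docker with sudo on the EC2 instance if needed):

```
aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.eu-west-2.amazonaws.com
docker tag <local-image-id> <account-id>.dkr.ecr.eu-west-2.amazonaws.com/<repo-name>:latest
docker push <account-id>.dkr.ecr.eu-west-2.amazonaws.com/<repo-name>:latest
```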
### AWS SageMaker: run a notebook using a custom kernel

The instructions below follow this tutorial:
https://aws.amazon.com/blogs/machine-learning/bringing-your-own-custom-container-image-to-amazon-sagemaker-studio-notebooks/

* Go to SageMaker and "Open SageMaker domain"
* Add a user
* Name the user and select the default AmazonSageMaker execution role

* Once the user is created, go to "Attach image"
* Select "New image" and add the image URI (copied from the image repository)
* Give the new image a name and display name, select the SageMaker execution role, add tags, and attach the image
* Add a kernel name and display name (both can be the same)
* Now launch App -> Studio and it will open the notebook dashboard.
* Select a Python notebook and add your custom named kernel
Lines changed: 111 additions & 0 deletions
```
import io
import os
import boto3
from urllib.parse import urlparse
from fnmatch import fnmatch
from shutil import copyfile


def _fetch_s3_file(s3_uri, save_to):
    bucket_name, key = _split_s3_uri(s3_uri)
    print(f"Fetching s3 object {key} from bucket {bucket_name}")

    client = boto3.client("s3")
    obj = client.get_object(
        Bucket=bucket_name,
        Key=key,
    )
    with io.FileIO(save_to, "w") as f:
        for i in obj["Body"]:
            f.write(i)


def _save_s3_file(s3_uri, out_filename, file_to_save="/tmp/tmp"):
    bucket, folder = _split_s3_uri(s3_uri)
    out_filepath = os.path.join(folder, out_filename)
    print(f"Save s3 object {out_filepath} to bucket {bucket}")
    client = boto3.client("s3")
    client.upload_file(
        Filename=file_to_save,
        Bucket=bucket,
        Key=out_filepath
    )


def _split_s3_uri(s3_uri):
    parsed_uri = urlparse(s3_uri)
    return parsed_uri.netloc, parsed_uri.path[1:]


def find_matching_s3_keys(in_fileglob):
    bucket_name, file_and_folder_name = _split_s3_uri(in_fileglob)
    folder_name = os.path.split(file_and_folder_name)[0]
    all_key_responses = _get_all_files_in_s3_folder(bucket_name, folder_name)
    matching_keys = []
    for key in [k["Key"] for k in all_key_responses]:
        if fnmatch(key, file_and_folder_name):
            matching_keys.append(key)
    return matching_keys


def _get_all_files_in_s3_folder(bucket_name, folder_name):
    client = boto3.client("s3")
    response = client.list_objects_v2(
        Bucket=bucket_name,
        Prefix=folder_name,
    )
    all_key_responses = []
    if "Contents" in response:
        all_key_responses = response["Contents"]
    while response["IsTruncated"]:
        continuation_token = response["NextContinuationToken"]
        response = client.list_objects_v2(
            Bucket=bucket_name,
            Prefix=folder_name,
            ContinuationToken=continuation_token,
        )
        if "Contents" in response:
            all_key_responses += response["Contents"]
    return all_key_responses


def copy_s3_files(in_fileglob, out_folder):
    '''
    Copy files from an s3 bucket to a local directory.

    args
    ---
    in_fileglob: s3 uri of files (wildcards can be used)
    out_folder: local path where the data will be stored
    '''
    matching_keys = find_matching_s3_keys(in_fileglob)
    in_bucket_name = _split_s3_uri(in_fileglob)[0]
    out_scheme = urlparse(out_folder).scheme
    for key in matching_keys:
        new_filename = os.path.split(key)[1]
        temp_filename = os.path.join("/tmp", new_filename)
        in_s3_uri = os.path.join(f"s3://{in_bucket_name}", key)
        _fetch_s3_file(in_s3_uri, temp_filename)
        if out_scheme == "s3":
            _save_s3_file(
                out_folder,
                new_filename,
                temp_filename,
            )
        else:
            copyfile(
                temp_filename, os.path.join(out_folder, new_filename)
            )
        os.remove(temp_filename)


def main():
    in_fileglob = 's3://ias-pyprecis/data/cmip5/*.nc'
    out_folder = '/home/h01/zmaalick/myprojs/PyPRECIS/aws-scripts'
    copy_s3_files(in_fileglob, out_folder)


if __name__ == "__main__":
    main()
```
