Commit 1abb4c1

Merge pull request #1 from geospatial-jeff/dev
v0.2
2 parents: 73d16fd + 09d7351

10 files changed (+332 −36 lines)

README.md (+37 −11)
@@ -17,7 +17,9 @@ Use the [stac-updater CLI](stac_updater/cli.py) to build and deploy your service
 stac-updater new-service

 # Build AWS resources to update collection
-stac-updater update-collection --root https://stac.com/landsat-8-l1/catalog.json
+stac-updater update-collection --root https://stac.com/landsat-8-l1/catalog.json \
+    --path {landsat:path}/{landsat:row} \
+    --filename {date}/{id}

 # Modify kickoff event source to s3:ObjectCreated
 stac-updater modify-kickoff --type s3 --bucket_name stac-updater-kickoff
@@ -26,20 +28,14 @@ stac-updater modify-kickoff --type s3 --bucket_name stac-updater-kickoff
 stac-updater deploy
 ```

-Once deployed, any STAC Item uploaded to the `stac-updater-kickoff` bucket will be ingested by the service and added to the `https://stac.com/landsat-8-l1/catalog.json` collection. Regardless of event source, the service expects the following JSON payload:
-
-| Field Name | Type | Description | Example |
-| ---------- | ----- | ----------- | ------- |
-| stac_item | dict | **REQUIRED.** [STAC Item](https://github.com/radiantearth/stac-spec/tree/master/item-spec) to ingest into collection. | [link](https://github.com/radiantearth/stac-spec/blob/dev/item-spec/examples/sample-full.json) |
-| path | str | String pattern indicating subcatalogs. Used by [sat-stac](https://github.com/sat-utils/sat-stac/blob/master/tutorial-1.ipynb#Views) to automatically build sub catalogs from item properties. | '${landsat:path}/${landsat:row}' |
-| filename | str | String pattern indicating filename. Used by [sat-stac](https://github.com/sat-utils/sat-stac/blob/master/tutorial-1.ipynb#Views) to automatically build item filename from item properties. | '${date}/${id}' |
+Once deployed, any STAC Item uploaded to the `stac-updater-kickoff` bucket will be ingested by the service and added to the `https://stac.com/landsat-8-l1/catalog.json` collection. Regardless of event source, the service expects the payload to contain a [STAC Item](https://github.com/radiantearth/stac-spec/tree/master/item-spec).

 Each call to `update-collection` tells the service to update a single collection. Updating multiple collections within a single deployment is accomplished with multiple calls to `update-collection`. When updating multiple collections, the service uses an SNS fanout pattern to distribute messages across multiple queues (1 queue per collection).

 ![abc](docs/images/update-collection.png)

 ## SNS Notifications
-You may additionally deploy a SNS topic which publishes messages whenever a STAC Item is succesfully uploaded to a collection.
+You may deploy an SNS topic which publishes messages whenever a STAC Item is successfully uploaded to a collection.

 ```
 # Add SNS notification
@@ -60,8 +56,38 @@ Once deployed, end-users may subscribe to the newly created SNS topic to be noti

 ![abc](docs/images/sns-notifications.png)

+## Logging
+You may pipe CloudWatch logs to a deployed instance of AWS Elasticsearch Service for monitoring and visualization with Kibana.
+
+```
+# Add ES logging
+stac-updater add-logging --es_host xxxxxxxxxx.region.es.amazonaws.com
+```
+
+Logs are saved to the `stac_updater_logs_YYYYMMDD` index (a new index is created each day) with the following schema:
+
+| Field Name | Type | Description | Example |
+| ---------- | ----- | ----------- | ------- |
+| id | string | Unique ID of the CloudWatch log event. | 34819275800 |
+| timestamp | date | Date of the lambda invocation. | June 23rd 2019, 21:25:26.649 |
+| BilledDuration | number | Time (ms) charged for execution. | 87 |
+| CollectionName | string | Name of collection. | landsat8 |
+| Duration | number | Runtime (ms) of the lambda function. | 442.49 |
+| ItemCount | number | Number of STAC Items processed by the invocation. | 4 |
+| ItemLinks | string array | URLs of STAC Items processed by the invocation. | ['https://stac.s3.amazonaws.com/landsat8/item.json'] |
+| MemorySize | number | Memory limit (MB) of the lambda function. | 1024 |
+| MaxMemoryUsed | number | Maximum memory (MB) consumed by the lambda function. | 87 |
+| RequestId | string | Unique request ID of the lambda invocation. | 87 |
+
+The following image is a Kibana time-series visualization showing the number of lambda invocations, binned into 15-second intervals, after 200 STAC Items were pushed into the queue. Notice how lambda scales up to handle the initial burst of messages.
+
+![es-logging-1](docs/images/es-logging-invokes.png)
+
+It took 86 total invocations to process the 200 STAC Items.
+
+![es-logging-2](docs/images/es-logging-summary.png)
+

 # TODOS
 - Add support for dynamic catalogs ([sat-api](https://github.com/sat-utils/sat-api), [staccato](https://github.com/boundlessgeo/staccato)).
-- Add aggregator service for metrics/logging etc on batch jobs.
-- Add SNS event source.
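
For orientation, a minimal sketch of pushing an item through the `s3:ObjectCreated` kickoff configured in the README above. The bucket name comes from the README example; the item body is a hypothetical, pared-down STAC Item (per the new kickoff code, only a `collection` field in `properties` or at the top level is needed to route it):

```python
import json

import boto3

# Hypothetical minimal STAC Item; kickoff() resolves the target collection from
# properties.collection, falling back to a top-level collection field.
item = {
    "id": "LC81530252014153LGN00",
    "properties": {
        "collection": "landsat-8-l1",
        "landsat:path": 153,
        "landsat:row": 25,
        "datetime": "2014-06-02T09:22:02Z",
    },
    "assets": {},
}

# Uploading the item to the kickoff bucket fires s3:ObjectCreated,
# which invokes the kickoff lambda with the new object's key.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="stac-updater-kickoff",
    Key=f"{item['id']}.json",
    Body=json.dumps(item),
)
```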

docs/images/es-logging-invokes.png (binary, 27.2 KB)

docs/images/es-logging-summary.png (binary, 9.85 KB)

requirements.txt (+2)

@@ -1,3 +1,5 @@
 click==7.0
+elasticsearch>=6.0.0,<7.0.0
+requests-aws4auth==0.9
 sat-stac==0.1.3
 PyYAML==5.1.1
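
The two new dependencies hint at how the log-shipping lambda talks to the Amazon Elasticsearch domain: `requests-aws4auth` signs requests with SigV4, and the `elasticsearch` client is pinned below 7.x to match a 6.x domain API. A minimal connection sketch under those assumptions; the repo's actual `stac_updater/logging.py` module is not part of this diff, and the host and region values are placeholders:

```python
import boto3
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

region = "us-east-1"  # assumption: region of the ES domain
credentials = boto3.Session().get_credentials()

# Sign requests to the domain with the lambda's IAM credentials (service name "es").
awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    "es",
    session_token=credentials.token,
)

# elasticsearch>=6,<7 keeps the client's major version in step with the 6.x domain.
es = Elasticsearch(
    hosts=[{"host": "xxxxxxxxxx.region.es.amazonaws.com", "port": 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)
```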

setup.py (+1 −1)

@@ -4,7 +4,7 @@
 requirements = [line.rstrip() for line in reqs]

 setup(name="STAC Updater",
-      version='0.1',
+      version='0.2',
       author='Jeff Albrecht',
       author_email='geospatialjeff@gmail.com',
       packages=find_packages(exclude=['package']),

stac_updater/cli.py (+45 −4)

@@ -24,19 +24,24 @@ def new_service():
 @click.option('--root', '-r', type=str, required=True, help="URL of collection.")
 @click.option('--long-poll/--short-poll', default=False, help="Enable long polling.")
 @click.option('--concurrency', type=int, default=1, help="Sets lambda concurrency limit when polling the queue.")
-def update_collection(root, long_poll, concurrency):
+@click.option('--path', type=str, help="Pattern used by sat-stac to build sub-catalogs.")
+@click.option('--filename', type=str, help="Pattern used by sat-stac to build item name.")
+def update_collection(root, long_poll, concurrency, path, filename):
     # Create a SQS queue for the collection
     # Subscribe SQS queue to SNS topic with filter policy on collection name
     # Configure lambda function and attach to SQS queue (use ENV variables to pass state)

     name = Collection.open(root).id
     filter_rule = {'collection': [name]}

+    pattern = re.compile(r'[\W_]+')
+    name = pattern.sub('', name)
+
     with open(sls_config_path, 'r') as f:
         # Using unsafe load to preserve type.
         sls_config = yaml.unsafe_load(f)

-    aws_resources = resources.update_collection(name, root, filter_rule, long_poll, concurrency)
+    aws_resources = resources.update_collection(name, root, filter_rule, long_poll, concurrency, path, filename)
     sls_config['resources']['Resources'].update(aws_resources['resources'])
     sls_config['functions'].update(aws_resources['functions'])

@@ -45,14 +50,17 @@ def update_collection(root, long_poll, concurrency):

 @stac_updater.command(name='modify-kickoff', short_help="modify event source of kickoff")
 @click.option('--type', '-t', type=str, default='lambda', help="Type of event source used by kickoff.")
-@click.option('--bucket_name', '-n', type=str, help="Required if type=='s3'; creates new bucket used by event source.")
-def modify_kickoff(type, bucket_name):
+@click.option('--bucket_name', type=str, help="Required if type=='s3'; defines name of bucket used by event source.")
+@click.option('--topic_name', type=str, help="Required if type=='sns'; defines name of SNS topic used by event source.")
+def modify_kickoff(type, bucket_name, topic_name):
     func_name = 'kickoff'

     if type == 's3':
         kickoff_func = resources.lambda_s3_trigger(func_name, bucket_name)
     elif type == 'lambda':
         kickoff_func = resources.lambda_invoke(func_name)
+    elif type == 'sns':
+        kickoff_func = resources.lambda_sns_trigger(func_name, topic_name)
     else:
         raise ValueError("The `type` parameter must be one of ['s3', 'lambda', 'sns'].")

@@ -64,6 +72,9 @@ def modify_kickoff(type, bucket_name):
         sls_config = yaml.unsafe_load(f)
     sls_config['functions']['kickoff'].update(kickoff_func)

+    if type == 'lambda' and 'events' in sls_config['functions']['kickoff']:
+        del sls_config['functions']['kickoff']['events']
+
     with open(sls_config_path, 'w') as outf:
         yaml.dump(sls_config, outf, indent=1)

@@ -87,6 +98,36 @@ def add_notifications(topic_name):
     with open(sls_config_path, 'w') as outf:
         yaml.dump(sls_config, outf, indent=1)

+@stac_updater.command(name='add-logging', short_help="Pipe cloudwatch logs into elasticsearch.")
+@click.option('--es_host', type=str, required=True, help="Domain name of elasticsearch instance.")
+def add_logging(es_host):
+    # Add the ES_LOGGING lambda function (cloudwatch trigger).
+    # Add es_domain to ES_LOGGING lambda as environment variable.
+    # Update IAM permissions (es:*, arn:aws:es:*)
+    with open(sls_config_path, 'r') as f:
+        sls_config = yaml.unsafe_load(f)
+
+    # Create lambda function
+    service_name = sls_config['custom']['service-name']
+    service_stage = sls_config['custom']['stage']
+    collection_names = [x.split('_')[0] for x in list(sls_config['functions']) if x not in ['kickoff', 'es_log_ingest']]
+    func = resources.lambda_cloudwatch_trigger("es_log_ingest", service_name, service_stage, collection_names)
+    func.update({'environment': {'ES_HOST': es_host}})
+    sls_config['functions'].update({'es_log_ingest': func})
+
+    # Expanding IAM role
+    if 'es:*' not in sls_config['provider']['iamRoleStatements'][0]['Action']:
+        sls_config['provider']['iamRoleStatements'][0]['Action'].append('es:*')
+    if 'arn:aws:es:*' not in sls_config['provider']['iamRoleStatements'][0]['Resource']:
+        sls_config['provider']['iamRoleStatements'][0]['Resource'].append('arn:aws:es:*')
+
+    with open(sls_config_path, 'w') as outf:
+        yaml.dump(sls_config, outf, indent=1)
+
+
 @stac_updater.command(name='deploy', short_help="deploy service to aws")
 def deploy():
     subprocess.call("docker build . -t stac-updater:latest", shell=True)
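
One note on the new sanitization step in `update_collection`: the collection ID seeds the names of the generated per-collection AWS resources, and `[\W_]+` strips everything but letters and digits, presumably because CloudFormation logical resource IDs must be strictly alphanumeric. A quick illustration with the README's example collection:

```python
import re

pattern = re.compile(r'[\W_]+')

# 'landsat-8-l1' -> 'landsat8l1': the hyphens would be illegal in a
# CloudFormation logical resource ID, which must be alphanumeric.
print(pattern.sub('', 'landsat-8-l1'))  # landsat8l1
```

Note that the SNS filter rule is built from the unsanitized ID, since the `collection` message attribute published by kickoff carries the raw collection name; only the resource names use the sanitized form.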

stac_updater/handler.py (+44 −14)

@@ -1,8 +1,10 @@
 import os
 import json
+import base64
+import gzip

-from satstac import Collection, Item
 import boto3
+from satstac import Collection, Item

 from stac_updater import utils

@@ -13,7 +15,6 @@
 REGION = os.getenv('REGION')
 NOTIFICATION_TOPIC = os.getenv('NOTIFICATION_TOPIC')

-
 def kickoff(event, context):
     event_source = os.getenv('EVENT_SOURCE')

@@ -24,18 +25,22 @@ def kickoff(event, context):
         content_object = s3_res.Object(bucket, key)
         file_content = content_object.get()['Body'].read().decode('utf-8')
         payload = json.loads(file_content)
+    elif event_source == "sns":
+        payload = json.loads(event['Records'][0]['Sns']['Message'])
     else:
         # Default is lambda
         payload = event

+    print(payload)
+
     try:
-        coll_name = payload['stac_item']['properties']['collection']
+        coll_name = payload['properties']['collection']
     except KeyError:
-        coll_name = payload['stac_item']['collection']
+        coll_name = payload['collection']

     sns_client.publish(
         TopicArn=f"arn:aws:sns:{REGION}:{ACCOUNT_ID}:newStacItemTopic",
-        Message=json.dumps(event),
+        Message=json.dumps(payload),
         MessageAttributes={
             'collection': {
                 'DataType': 'String',
@@ -44,26 +49,51 @@ def kickoff(event, context):
         }
     )

-
 def update_collection(event, context):
     collection_root = os.getenv('COLLECTION_ROOT')
+    path = os.getenv('PATH')
+    filename = os.getenv('FILENAME')
+
+    item_count = len(event['Records'])
+    stac_links = []

     for record in event['Records']:
-        message = json.loads(record['body'])
+        stac_item = json.loads(record['body'])
+
+        print(stac_item)

         col = Collection.open(collection_root)
-        kwargs = {'item': Item(message['stac_item'])}
-        if 'path' in message:
-            kwargs.update({'path': message['path']})
-        if 'filename' in message:
-            kwargs.update({'filename': message['filename']})
+        collection_name = col.id
+        kwargs = {'item': Item(stac_item)}
+        if path:
+            kwargs.update({'path': '$' + '/$'.join(path.split('/'))})
+        if filename:
+            kwargs.update({'filename': '$' + '/$'.join(filename.split('/'))})
+        print(kwargs)
         col.add_item(**kwargs)
         col.save()

+        stac_links.append(kwargs['item'].links('self')[0])
+
     # Send message to SNS Topic if enabled
     if NOTIFICATION_TOPIC:
-        kwargs = utils.stac_to_sns(message['stac_item'])
+        kwargs = utils.stac_to_sns(kwargs['item'].data)
         kwargs.update({
             'TopicArn': f"arn:aws:sns:{REGION}:{ACCOUNT_ID}:{NOTIFICATION_TOPIC}"
         })
-        sns_client.publish(**kwargs)
+        sns_client.publish(**kwargs)
+
+    print(f"LOGS CollectionName: {collection_name}\tItemCount: {item_count}\tItemLinks: {stac_links}")
+
+
+def es_log_ingest(event, context):
+    from stac_updater import logging
+
+    cw_data = event['awslogs']['data']
+    compressed_payload = base64.b64decode(cw_data)
+    uncompressed_payload = gzip.decompress(compressed_payload)
+    payload = json.loads(uncompressed_payload)

+    # Index to ES
+    logging.index_logs(payload)