This repository was archived by the owner on Dec 30, 2024. It is now read-only.

CHANGELOG.md

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.7.0] - 2022-02-14

### Added

- The capability to ingest custom data by uploading files in JSON, XLSX, or CSV format

### Updated

- Use Amazon Kinesis Data Firehose dynamic partitioning to store and partition the data by source date (instead of system processing date)
- Use Amazon Athena partition projection to run SQL queries on data stored in the S3 bucket
- AWS CDK version 1.137.0
- AWS SDK version 2.1067.0

### Removed

- Creation of AWS Glue partitions (replaced with Amazon Athena partition projection)

## [1.6.1] - 2021-10-26

### Fixed

- GitHub [issue #42](https://github.com/aws-solutions/discovering-hot-topics-using-machine-learning/issues/42). To fix the issue, the RSS feed ingestion Lambda function and the SQL queries related to the Amazon QuickSight dashboard were updated.

### Updated

- AWS CDK version to 1.125.0
- AWS SDK version to 2.1008.0

## [1.6.0] - 2021-09-27

### Added

- Capability to ingest YouTube comments

### Updated

- AWS CDK version to 1.121.0
- AWS SDK version to 2.991.0
- Updated Amazon QuickSight analysis and dashboard to reflect the new ingestion source

## [1.5.0] - 2021-07-22

### Added

- Ingest RSS feeds from more than 3,000 news websites around the world

### Updated

- AWS CDK version to 1.110.1
- AWS SDK version to 2.945.0
- Updated Node.js Lambda runtimes to use Node.js 14.x
- Updated Amazon QuickSight analysis and dashboard to reflect the new ingestion source
- Updated AWS Step Functions workflows to handle parallel ingestion (tweets from Twitter and RSS feeds from news websites)

### Fixed

- Truncated tweets, fixed by merging [GitHub pull request #26](https://github.com/awslabs/discovering-hot-topics-using-machine-learning/pull/26)

## [1.4.0] - 2021-02-04

### Added

- Capability to use geo coordinates when invoking the Twitter API to filter tweets returned by its Search API
- New visuals and sheets (tabs) in Amazon QuickSight to perform analysis using geo coordinates (when available with tweets)
- Additional remediation to handle throttling conditions from Twitter v1.1 API calls, with additional information pushed to Amazon CloudWatch Logs that can be used to create alarms or notifications through CloudWatch metric filters

### Updated

- Switched to AWS managed KMS keys for the AWS Glue security configuration
- AWS CDK version to 1.83.0
- AWS SDK version to 2.828.0

## [1.3.0] - 2020-11-24

### Changed

- Refactored the implementation to reuse the following architecture patterns from [AWS Solutions Constructs](https://aws.amazon.com/solutions/constructs/):
  - aws-kinesisfirehose-s3
  - aws-kinesisstreams-lambda
  - aws-lambda-step-function

### Updated

- The join condition for Topic Modeling in the Amazon QuickSight dataset, to provide accurate topic identification for a specific run
- ID and name generation for Amazon QuickSight resources, to use a dynamic value based on the stack name
- AWS CDK version to 1.73.0
- AWS SDK version to 2.790.0

## [1.2.0] - 2020-10-29

### Added

- New and simplified interactive Amazon QuickSight dashboard that is now generated automatically through an AWS CloudFormation deployment and that customers can extend to suit their business case

### Updated

- Updated to AWS CDK v1.69.0
- Consolidated Amazon S3 access logging into a single bucket across the solution. All access log files have a prefix that corresponds to the bucket for which they are generated

## [1.1.0] - 2020-09-29

### Updated

- S3 storage for inference outputs to use Apache Parquet
- Add partitioning to AWS Glue tables
- Update to AWS CDK v1.63.0
- Update to AWS SDK v2.755.0

README.md

The solution automates digital asset (text and image) ingestion from Twitter, RS…

The solution provides the following key features:

- **Performs topic modeling to detect dominant topics**: identifies the terms that collectively form a topic within customer feedback
- **Identifies the sentiment of what customers are saying**: uses contextual semantic search to understand the nature of online discussions
- **Determines if images associated with your brand contain unsafe content**: detects unsafe and negative imagery in content
- **Helps customers identify insights in near real time**: provides a visualization dashboard that you can use to better understand context, threats, and opportunities almost instantly

This solution deploys an AWS CloudFormation template that supports Twitter, RSS feeds, and YouTube comments as data source options for ingestion, but the solution can be customized to aggregate other social media platforms and internal enterprise systems.

For a detailed solution deployment guide, refer to [Discovering Hot Topics using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning).

## On this Page

- [Architecture Overview](#architecture-overview)
- [Deployment](#deployment)
- [Source Code](#source-code)
- [Creating a custom build](#creating-a-custom-build)

## Architecture Overview

After you deploy the solution, use the included Amazon QuickSight dashboard to v…

[AWS CDK Solutions Constructs](https://aws.amazon.com/solutions/constructs/) make it easier to consistently create well-architected applications. All AWS Solutions Constructs are reviewed by AWS and use best practices established by the AWS Well-Architected Framework. This solution uses the following AWS CDK Constructs:

- aws-events-rule-lambda
- aws-kinesisfirehose-s3
- aws-kinesisstreams-lambda
- aws-lambda-dynamodb
- aws-lambda-s3
- aws-lambda-step-function
- aws-sqs-lambda

## Deployment

The solution is deployed using a CloudFormation template with a Lambda-backed custom …

```
├── bin [entrypoint of the CDK application]
├── lambda [folder containing the source code of the Lambda functions]
│ ├── capture_news_feed [Lambda function to ingest news feeds]
│ ├── firehose_topic_proxy [Lambda function to write topic analysis output to Amazon Kinesis Data Firehose]
│ ├── firehose-text-proxy [Lambda function to write text analysis output to Amazon Kinesis Data Firehose]
│ ├── ingestion-consumer [Lambda function that consumes messages from Amazon Kinesis Data Streams]
│ ├── ingestion-custom [Lambda function that reads files from an Amazon S3 bucket and pushes data to Amazon Kinesis Data Streams]
│ ├── ingestion-producer [Lambda function that makes Twitter API calls and pushes data to Amazon Kinesis Data Streams]
│ ├── ingestion-youtube [Lambda function that ingests comments from YouTube videos and pushes data to Amazon Kinesis Data Streams]
│ ├── integration [Lambda function that publishes inference outputs to Amazon EventBridge]
│ ├── layers [Lambda layer function library for Node and Python layers]
│ │ ├── aws-nodesdk-custom-config
...
│ ├── ingestion [CDK constructs for data ingestion]
│ ├── integration [CDK constructs for Amazon EventBridge]
│ ├── quicksight-custom-resources [CDK construct that invokes custom resources to create Amazon QuickSight resources]
│ ├── s3-event-notification [CDK construct that configures S3 events to be pushed to Amazon EventBridge]
│ ├── storage [CDK constructs that define storage of the inference events]
│ ├── text-analysis-workflow [CDK constructs for text analysis of ingested data]
│ ├── topic-analysis-workflow [CDK constructs for topic visualization of ingested data]
│ └── visualization [CDK constructs to build a relational database model for visualization]
├── discovering-hot-topics.ts
```

## Creating a custom build

Clone this git repository

### 2. Build the solution for deployment

- To run the unit tests:

```
cd <rootDir>/source
chmod +x ./run-all-tests.sh
./run-all-tests.sh
```

- Configure the bucket name of your target Amazon S3 distribution bucket:

```
export DIST_OUTPUT_BUCKET=my-bucket-name
export VERSION=my-version
```

- Now build the distributable:

```
cd <rootDir>/deployment
chmod +x ./build-s3-dist.sh
```

- Parameter details:

```
$DIST_OUTPUT_BUCKET - This is the global name of the distribution. For the bucket name, the AWS Region is added to the global name (example: 'my-bucket-name-us-east-1') to create a regional bucket. The Lambda artifacts should be uploaded to the regional buckets so that the CloudFormation template can pick them up for deployment.
$CF_TEMPLATE_BUCKET_NAME - The name of the S3 bucket where the CloudFormation te…
$QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates should be sourced for Amazon QuickSight analysis and dashboard creation
```
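
The regional naming rule described in the parameter details can be sketched as two small shell helpers. This is a hypothetical illustration, not part of `build-s3-dist.sh`: the function names `regional_bucket_name` and `is_valid_bucket_name` are assumptions, and the check covers only the basic Amazon S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; first and last character a letter or digit).

```
#!/bin/sh
# Hypothetical helpers, not part of the solution's build scripts.

# Append the AWS Region to the global bucket name,
# e.g. my-bucket-name + us-east-1 -> my-bucket-name-us-east-1
regional_bucket_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Succeed only if the name passes the basic S3 naming rules:
# 3-63 characters, lowercase letters/digits/dots/hyphens,
# starting and ending with a letter or digit.
# (Simplified: rules such as "no IP-address-like names" are not checked.)
is_valid_bucket_name() {
  name=$1
  len=${#name}
  if [ "$len" -lt 3 ] || [ "$len" -gt 63 ]; then
    return 1
  fi
  # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales
  printf '%s' "$name" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]*[a-z0-9]$'
}
```

For example, `is_valid_bucket_name "$(regional_bucket_name "$DIST_OUTPUT_BUCKET" us-east-1)"` returns a non-zero status if appending the Region pushes the name past 63 characters, which is why the global name needs to leave room for the Region suffix.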

- When creating and using buckets, we recommend that you:
  - Use randomized names or UUIDs as part of your bucket naming strategy.
  - Ensure buckets are not public.
  - Verify bucket ownership prior to uploading templates or code artifacts.
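
The bucket recommendations can be applied with the AWS CLI. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the random suffix (8 hex characters read from `/dev/urandom` with `od`) is one possible naming scheme rather than the solution's own, and the caller supplies the 12-digit AWS account ID that is expected to own the bucket. `aws s3api put-public-access-block` and the `--expected-bucket-owner` option of `aws s3api head-bucket` are standard AWS CLI features.

```
#!/bin/sh
# Hypothetical helpers illustrating the bucket recommendations.
# Only random_bucket_name runs without AWS credentials; the other two call the AWS CLI.

# 1. Randomized name: append 8 random hex characters to a base name,
#    e.g. my-bucket-name-3f9a0c1d
random_bucket_name() {
  suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
  printf '%s-%s\n' "$1" "$suffix"
}

# 2. Not public: turn on all four S3 Block Public Access settings for the bucket.
block_public_access() {
  aws s3api put-public-access-block --bucket "$1" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
}

# 3. Ownership: head-bucket fails unless the bucket is owned by the
#    expected account ID ($2), so run this before uploading artifacts.
verify_bucket_owner() {
  aws s3api head-bucket --bucket "$1" --expected-bucket-owner "$2"
}
```

For example, derive a base name with `random_bucket_name my-bucket-name`, then run `block_public_access <bucket>` and `verify_bucket_owner <bucket> <account-id>` before copying any templates or code artifacts; `head-bucket` exits with a non-zero status if the bucket belongs to a different account.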

- Deploy the distributable to an Amazon S3 bucket in your account. _Note:_ you must have the AWS Command Line Interface installed.