This repository was archived by the owner on Nov 8, 2022. It is now read-only.

Commit bd73011

danielkorat authored and Peter Izsak committed
Sentiment Solution (#365)
* Added Aspect Based Sentiment Analysis model
* Added Aspect Based Sentiment Analysis solution
1 parent 4d8cca4 commit bd73011

84 files changed

+9925
-116
lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -105,6 +105,7 @@ NLP models that provide best (or near) in class performance:
105105

106106
Natural Language Understanding (NLU) models that address semantic understanding:
107107

108+
* [Aspect Based Sentiment Analysis (ABSA)](http://nlp_architect.nervanasys.com/absa.html)
108109
* [Noun phrase embedding representation (NP2Vec)](http://nlp_architect.nervanasys.com/np2vec.html)
109110
* [Most common word sense detection](http://nlp_architect.nervanasys.com/word_sense.html)
110111
* [Relation identification](http://nlp_architect.nervanasys.com/identifying_semantic_relation.html)
@@ -127,6 +128,7 @@ Solutions (End-to-end applications) using one or more models:
127128

128129
* [Term Set expansion](http://nlp_architect.nervanasys.com/term_set_expansion.html) - uses the included word chunker as a noun phrase extractor and NP2Vec to create semantic term sets
129130
* [Topics and trend analysis](http://nlp_architect.nervanasys.com/trend_analysis.html) - analyzing trending phrases in temporal corpora
131+
* [Aspect Based Sentiment Analysis (ABSA)](http://nlp_architect.nervanasys.com/absa_solution.html)
130132

131133
## Documentation
132134

datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_test.csv

Lines changed: 2001 additions & 0 deletions
Large diffs are not rendered by default.

datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv

Lines changed: 2001 additions & 0 deletions
Large diffs are not rendered by default.

doc/source/absa.rst

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
1+
.. ---------------------------------------------------------------------------
2+
.. Copyright 2016-2018 Intel Corporation
3+
..
4+
.. Licensed under the Apache License, Version 2.0 (the "License");
5+
.. you may not use this file except in compliance with the License.
6+
.. You may obtain a copy of the License at
7+
..
8+
.. http://www.apache.org/licenses/LICENSE-2.0
9+
..
10+
.. Unless required by applicable law or agreed to in writing, software
11+
.. distributed under the License is distributed on an "AS IS" BASIS,
12+
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
.. See the License for the specific language governing permissions and
14+
.. limitations under the License.
15+
.. ---------------------------------------------------------------------------
16+
17+
Aspect Based Sentiment Analysis (ABSA)
18+
######################################
19+
20+
Overview
21+
========
22+
Aspect Based Sentiment Analysis is the task of co-extracting opinion terms and aspect terms
23+
(opinion targets) and the relations between them in a given corpus.
24+
25+
Algorithm Overview
26+
==================
27+
Training: the training phase takes training data as input and outputs an opinion lexicon and an aspect lexicon.
28+
The training flow consists of the following four main steps:
29+
30+
1. The first training step is text pre-processing that is performed by Spacy_. This step includes
31+
tokenization, part-of-speech tagging and sentence breaking.
32+
33+
2. The second training step is to apply a dependency parser to the training
34+
data. For this purpose we used the parser described in [1]_.
35+
For more details regarding steps 1 and 2, see the :doc:`BIST <spacy_bist>` dependency parser.
36+
37+
3. The third step applies the bootstrap lexicon acquisition algorithm described in [2]_;
38+
the algorithm uses a generic lexicon introduced by [3]_ as the initial step for the bootstrap process.
39+
40+
4. The last step applies an MLP-based opinion term re-ranking and polarity estimation
41+
algorithm. This step uses the word embedding similarities between each acquired term
42+
and a set of generic opinion terms as features. A pre-trained re-ranking model is provided.
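The feature construction in step 4 can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses toy vectors invented for illustration; the function names ``cosine`` and ``similarity_features`` are hypothetical and are not part of the NLP Architect API, and the MLP itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(term, generic_terms, embeddings):
    """One feature per generic opinion term: its similarity to the acquired term."""
    term_vec = embeddings[term]
    return [cosine(term_vec, embeddings[g]) for g in generic_terms]


# Toy 3-dimensional vectors, invented for illustration only (not real GloVe values).
TOY_EMBEDDINGS = {
    "tasty": [1.0, 0.0, 0.0],
    "good": [1.0, 0.0, 0.0],
    "bad": [-1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
}

features = similarity_features("tasty", ["good", "bad", "table"], TOY_EMBEDDINGS)
```

In a setup like this, the resulting feature vector would be the input to the re-ranking classifier; the actual feature set and network used by the library may differ.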
43+
44+
Inference: the inference phase takes inference data as input, along with the opinion lexicon and aspect
45+
lexicon generated by the training phase. The output of the inference phase is a list of aspect-opinion
46+
pairs (along with their polarity and score) extracted from the inference data.
47+
The inference approach is based on detecting syntactically related aspect-opinion pairs.
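As a rough illustration of detecting syntactically related aspect-opinion pairs, the sketch below pairs an aspect term with an opinion term whenever a dependency edge links them directly. This is a simplification under stated assumptions: the edge list is hand-written rather than produced by a parser, the ``POS`` label is a placeholder, and ``extract_pairs`` is a hypothetical helper, not the library's inference code, which may apply richer syntactic rules.

```python
def extract_pairs(tokens, edges, aspect_lex, opinion_lex):
    """Return (aspect, opinion, polarity) for every dependency edge that
    directly connects an aspect term and an opinion term."""
    pairs = []
    for head, dep in edges:
        # Check both directions: either end of the edge may be the aspect.
        for a, o in ((head, dep), (dep, head)):
            aspect, opinion = tokens[a].lower(), tokens[o].lower()
            if aspect in aspect_lex and opinion in opinion_lex:
                pairs.append((aspect, opinion, opinion_lex[opinion]))
    return pairs


tokens = ["The", "food", "was", "wonderful"]
# (head index, dependent index) pairs, written by hand for this example.
edges = [(1, 0), (3, 1), (3, 2)]
aspect_lex = {"food"}
opinion_lex = {"wonderful": "POS"}

pairs = extract_pairs(tokens, edges, aspect_lex, opinion_lex)
```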
48+
49+
50+
Flow
51+
====
52+
.. image :: assets/absa_flow.png
53+
54+
Training
55+
========
56+
Full code example is available at ``examples/absa/train.py``.
57+
There are two training modes:
58+
59+
1. Providing training data in a raw text format. In this case the training flow will
60+
apply the dependency parser to the data:
61+
62+
.. code:: bash
63+
64+
python3 examples/absa/train.py --data=TRAINING_DATASET
65+
66+
67+
Arguments:
68+
69+
``--data=TRAINING_DATASET`` - path to the input training dataset. Should point to a single raw text file with documents
70+
separated by newlines or a single csv file containing one doc per line or a directory containing one raw
71+
text file per document.
72+
73+
Optional arguments:
74+
75+
``--rerank-model=RERANK_MODEL`` - path to re-rank model. By default when running the training
76+
for the first time this model will be downloaded to ``~/nlp-architect/cache/absa/train/reranking_model``
77+
78+
Notes:
79+
80+
a. The generated opinion and aspect lexicons are written as csv files to:
81+
``~/nlp-architect/cache/absa/train/output/generated_opinion_lex_reranked.csv`` and to ``~/nlp-architect/cache/absa/train/output/generated_aspect_lex.csv``
82+
83+
b. In this mode the parsed data (JSON files of ParsedDocument objects) is written to ``~/nlp-architect/cache/absa/train/parsed``.
84+
85+
c. When running the training for the first time the system will download
86+
the GloVe word embedding model (the user will be prompted for authorization) to
87+
``~/nlp-architect/cache/absa/train/word_emb_unzipped`` (this may take a while)
88+
89+
d. For demonstration purposes we provide a sample of tripadvisor.co.uk restaurant reviews under the
90+
`Creative Commons Attribution-Share-Alike 3.0 License <https://creativecommons.org/licenses/by-sa/3.0/>`__ (Copyright 2018 Wikimedia Foundation).
91+
The dataset can be found at ``datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv``.
92+
93+
94+
2. Providing parsed training data. In this case the training flow skips the parsing step:
95+
96+
.. code:: bash
97+
98+
python3 examples/absa/train.py --parsed-data=PARSED_TRAINING_DATASET
99+
100+
Arguments:
101+
102+
``--parsed-data=PARSED_TRAINING_DATASET`` - path to the parsed format (jsons of ParsedDocument objects) of the training dataset.
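The three raw-input layouts accepted by ``--data`` (a newline-separated text file, a csv file with one document per line, or a directory of text files) could be normalized with a loader along the following lines. This is only a sketch: ``load_documents`` is a hypothetical helper, and it assumes the document text sits in the first csv column, which the source does not specify.

```python
import csv
from pathlib import Path


def load_documents(data_path):
    """Return a list of document strings from one of three input layouts."""
    path = Path(data_path)
    if path.is_dir():
        # Directory: one raw text file per document.
        return [f.read_text(encoding="utf-8").strip()
                for f in sorted(path.iterdir()) if f.is_file()]
    if path.suffix.lower() == ".csv":
        # CSV: one document per line; assumes the text is in the first column.
        with path.open(newline="", encoding="utf-8") as handle:
            return [row[0] for row in csv.reader(handle) if row]
    # Raw text: documents separated by newlines; blank lines are skipped.
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```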
103+
104+
Inference
105+
=========
106+
Full code example is available at ``examples/absa/inference/inference.py``.
107+
There are two inference modes:
108+
109+
1. Providing inference data in a raw text format.
110+
111+
.. code:: python
112+
113+
inference = SentimentInference(ASPECT_LEX, OPINION_LEX)
114+
sentiment_doc = inference.run(doc="The food was wonderful and fresh. Staff were friendly.")
115+
116+
Arguments:
117+
118+
``ASPECT_LEX`` - path to aspect lexicon (csv file) that was produced by the training phase.
119+
aspect.csv may be manually edited for grouping alias aspect names (e.g. 'drinks' and 'beverages')
120+
together. Simply copy all alias names to the same line in the csv file.
121+
122+
``OPINION_LEX`` - path to opinion lexicon (csv file) that was produced by the training phase.
123+
124+
``doc`` - input sentence.
125+
126+
2. Providing parsed inference data (ParsedDocument format). In this case the parsing step is skipped:
127+
128+
.. code:: python
129+
130+
inference = SentimentInference(ASPECT_LEX, OPINION_LEX, parse=False)
131+
doc_parsed = json.load(open('/path/to/parsed_doc.json'), object_hook=CoreNLPDoc.decoder)
132+
sentiment_doc = inference.run(parsed_doc=doc_parsed)
133+
134+
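The alias grouping described for ``ASPECT_LEX`` (placing alias names such as 'drinks' and 'beverages' on the same csv line) can be pictured with the sketch below. It assumes each row simply lists an aspect followed by its aliases; the real column layout of the generated lexicon may differ, and ``load_alias_map`` is a hypothetical helper rather than part of the library.

```python
import csv
import io


def load_alias_map(csv_text):
    """Map every name on a row to the first name on that row."""
    alias_to_canonical = {}
    for row in csv.reader(io.StringIO(csv_text)):
        names = [cell.strip().lower() for cell in row if cell.strip()]
        if not names:
            continue
        canonical = names[0]
        for name in names:
            alias_to_canonical[name] = canonical
    return alias_to_canonical


# Invented sample: 'beverages' was copied onto the 'drinks' line.
sample = "drinks,beverages\nfood\n"
alias_map = load_alias_map(sample)
```

With a mapping like this, mentions of 'beverages' and 'drinks' would be counted under a single aspect.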
135+
Inference - interactive mode
136+
============================
137+
138+
The provided file ``examples/absa/inference/interactive.py`` enables using generated lexicons in interactive mode:
139+
140+
141+
.. code:: bash
142+
143+
python3 interactive.py --aspects=ASPECT_LEX --opinions=OPINION_LEX
144+
145+
146+
Arguments:
147+
148+
``--aspects=ASPECT_LEX`` - path to aspect lexicon (csv file format)
149+
150+
``--opinions=OPINION_LEX`` - path to opinion lexicon (csv file format)
151+
152+
153+
References
154+
==========
155+
156+
.. [1] `Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations <https://transacl.org/ojs/index.php/tacl/article/view/885/198>`__, Kiperwasser, E., & Goldberg, Y, Transactions Of The Association For Computational Linguistics (2016), 4, 313-327.
157+
.. [2] `Opinion word expansion and target extraction through double propagation <https://dl.acm.org/citation.cfm?id=1970422>`__, Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen, In Computational Linguistics, volume 37(1).
158+
.. [3] `Mining and Summarizing Customer Reviews <http://dx.doi.org/10.1145/1014052.1014073>`__, Minqing Hu and Bing Liu, Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp. 168-177, 2004.
159+
160+
.. _Spacy: https://spacy.io

doc/source/absa_solution.rst

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
1+
.. ---------------------------------------------------------------------------
2+
.. Copyright 2016-2018 Intel Corporation
3+
..
4+
.. Licensed under the Apache License, Version 2.0 (the "License");
5+
.. you may not use this file except in compliance with the License.
6+
.. You may obtain a copy of the License at
7+
..
8+
.. http://www.apache.org/licenses/LICENSE-2.0
9+
..
10+
.. Unless required by applicable law or agreed to in writing, software
11+
.. distributed under the License is distributed on an "AS IS" BASIS,
12+
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
.. See the License for the specific language governing permissions and
14+
.. limitations under the License.
15+
.. ---------------------------------------------------------------------------
16+
17+
18+
Aspect Based Sentiment Analysis (ABSA) Solution
19+
###############################################
20+
21+
Overview
22+
========
23+
Aspect Based Sentiment Analysis is the task of co-extracting opinion terms and aspect terms
24+
(opinion targets) and the relations between them in a given corpus.
25+
26+
Solution Overview
27+
=================
28+
The input to the solution is an inference dataset along with opinion and aspect
29+
lexicons. The output of the solution is a visualization of
30+
a statistical analysis of sentiment towards specific aspects in the inference dataset.
31+
32+
The solution uses the opinion and aspect lexicons that are generated using NLP Architect's ABSA
33+
training model (see blue blocks in 'flow' diagram below). For more details regarding the training step
34+
see :doc:`ABSA <absa>`.
35+
36+
The solution encapsulates the ABSA inference model (green blocks) and adds on top of it a
37+
statistical analysis module (grey block) for calculating the number of positive and negative sentiment
38+
instances per aspect and a visualization module (grey block).
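The per-aspect counting performed by the statistical analysis module can be sketched as follows. This is an illustration under assumptions, not the module's code: the input is a hand-written list of ``(aspect, polarity)`` tuples, the ``POS``/``NEG`` labels are placeholders, and ``count_polarity_per_aspect`` is a hypothetical name.

```python
from collections import defaultdict


def count_polarity_per_aspect(pairs):
    """Count positive and negative sentiment instances for each aspect."""
    counts = defaultdict(lambda: {"POS": 0, "NEG": 0})
    for aspect, polarity in pairs:
        counts[aspect][polarity] += 1
    return dict(counts)


# Invented inference output for illustration.
pairs = [("food", "POS"), ("food", "POS"), ("service", "NEG"), ("food", "NEG")]
stats = count_polarity_per_aspect(pairs)
```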
39+
40+
Flow
41+
====
42+
.. image :: assets/absa_solution_flow.png
43+
44+
Visualization
45+
=============
46+
47+
The visualization includes two sections:
48+
49+
1. A bar chart that displays the number of positive and negative sentiment instances for each aspect:
50+
51+
.. image :: assets/absa_solution_ui_1.png
52+
53+
54+
2. Textual table that displays (upon clicking a specific aspect on the bar chart) sentences that
55+
include positive/negative sentiment towards that specific aspect. Aspect terms are colored in
56+
blue and opinion terms are colored in green/red:
57+
58+
.. image :: assets/absa_solution_ui_2.png
59+
60+
61+
Solution execution
62+
==================
63+
64+
The solution execution is divided into two parts, training and inference:
65+
66+
Training
67+
========
68+
69+
See training under :doc:`ABSA <absa>`
70+
71+
72+
Inference
73+
=========
74+
75+
Full code example is available at ``examples/absa/solution/absa_solution.py``.
76+
There are two inference modes:
77+
78+
1. Providing solution data in a raw text format. In this case the solution flow will
79+
apply the dependency parser to the data:
80+
81+
82+
.. code:: python
83+
84+
SentimentSolution().run(aspect_lex='/path/to/aspects.csv',
85+
opinion_lex='/path/to/opinions.csv',
86+
data='/path/to/text/file/or/directory')
87+
88+
89+
Arguments:
90+
91+
``aspect_lex='/path/to/aspects.csv'`` - path to aspect lexicon (csv file) that was produced by the training phase.
92+
aspect.csv may be manually edited for grouping alias aspect names (e.g. 'drinks' and 'beverages')
93+
together. Simply copy all alias names to the same line in the csv file.
94+
95+
``opinion_lex='/path/to/opinions.csv'`` - path to opinion lexicon (csv file) that was produced by the training phase.
96+
97+
``data='/path/to/text/file/or/directory'`` - path to a single raw text file with documents
98+
separated by newlines or a single csv file containing one doc per line or a directory containing one raw
99+
text file per document.
100+
101+
Notes:
102+
103+
a. For demonstration purposes we provide a sample of tripadvisor.co.uk
104+
restaurant reviews under the `Creative Commons Attribution-Share-Alike 3.0 License <https://creativecommons.org/licenses/by-sa/3.0/>`__ (Copyright 2018 Wikimedia Foundation).
105+
The dataset is located at ``datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_test.csv``.
106+
107+
b. By default, when the execution terminates, a browser window is opened, displaying the
108+
visualization UI.
109+
110+
111+
2. Providing parsed inference data. In this case the solution flow skips the parsing step:
112+
113+
.. code:: python
114+
115+
stats_dataframe = SentimentSolution().run(aspect_lex='/path/to/aspects.csv',
116+
opinion_lex='/path/to/opinions.csv',
117+
parsed_data='/path/to/parsed/directory',
118+
ui=False)
119+
120+
Note:
121+
122+
Setting ``ui=False`` disables the UI and returns the sentiment statistics as a dataframe. This
123+
enables users to use those statistics as input to their own custom-built visualization.
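A custom visualization built on top of such statistics might look like the text-only sketch below. The source does not describe the columns of the returned dataframe, so this example works on a hypothetical ``{aspect: {"POS": n, "NEG": m}}`` mapping instead; ``render_text_bars`` is an invented helper and the numbers are made up.

```python
def render_text_bars(stats, width=20):
    """Render one text bar per aspect and polarity, scaled to the largest count."""
    largest = max((max(c["POS"], c["NEG"]) for c in stats.values()), default=0)
    lines = []
    for aspect in sorted(stats):
        for label in ("POS", "NEG"):
            value = stats[aspect][label]
            length = round(width * value / largest) if largest else 0
            lines.append(f"{aspect:<10} {label} {'#' * length} {value}")
    return lines


# Hypothetical statistics, e.g. converted from the returned dataframe.
example_stats = {"food": {"POS": 4, "NEG": 2}}
for line in render_text_bars(example_stats, width=4):
    print(line)
```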
124+

doc/source/assets/absa_flow.png

226 KB
295 KB
66.5 KB
161 KB

doc/source/index.rst

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@
3434
:maxdepth: 1
3535
:caption: NLP/NLU Components
3636

37+
Aspect Based Sentiment Analysis <absa.rst>
3738
chunker.rst
3839
ner_crf.rst
3940
intent.rst
@@ -61,6 +62,7 @@
6162
:maxdepth: 1
6263
:caption: Solutions
6364

65+
Aspect Based Sentiment Analysis <absa_solution.rst>
6466
Set Expansion <term_set_expansion.rst>
6567
Trend Analysis <trend_analysis.rst>
6668
