.. ---------------------------------------------------------------------------
.. Copyright 2016-2018 Intel Corporation
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
..     http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. ---------------------------------------------------------------------------

Aspect Based Sentiment Analysis (ABSA)
######################################

Overview
========
Aspect Based Sentiment Analysis is the task of co-extracting opinion terms and aspect terms
(opinion targets) and the relations between them in a given corpus.

Algorithm Overview
==================
Training: the training phase takes training data as input and outputs an opinion lexicon and an aspect lexicon.
The training flow consists of the following four main steps:

1. The first training step is text pre-processing, performed by Spacy_. This step includes
tokenization, part-of-speech tagging and sentence breaking (sketched after this list).

2. The second training step applies a dependency parser to the training
data. For this purpose we used the parser described in [1]_.
For more details regarding steps 1 & 2 see the :doc:`BIST <spacy_bist>` dependency parser.

3. The third step applies the bootstrap lexicon acquisition algorithm described in [2]_.
The algorithm uses a generic lexicon introduced by [3]_ as the initial step of the bootstrap process.

4. The last step applies an MLP based opinion term re-ranking and polarity estimation
algorithm. This step uses the word embedding similarities between each acquired term
and a set of generic opinion terms as features (also sketched after this list). A pre-trained re-ranking model is provided.
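
The two sketches below are illustrative only and are not the library's implementation. The first
shows the kind of spaCy pre-processing performed in step 1 (the model name ``en_core_web_sm`` is an
assumption made for the example):

.. code:: python

    # Illustrative pre-processing: tokenization, POS tagging and sentence breaking.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The food was wonderful and fresh. Staff were friendly.")

    for sent in doc.sents:                 # sentence breaking
        for token in sent:                 # tokenization
            print(token.text, token.pos_)  # part-of-speech tag

The second sketches how word-embedding similarities to a set of generic opinion terms can serve as
re-ranking features for step 4, assuming the embedding vectors are already loaded:

.. code:: python

    # Illustrative re-ranking features: cosine similarity between an acquired
    # candidate term and each term of a small generic opinion set.
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def rerank_features(candidate_vec, generic_opinion_vecs):
        return np.array([cosine(candidate_vec, v) for v in generic_opinion_vecs])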

Inference: the inference phase takes inference data as input, along with the opinion lexicon and aspect
lexicon generated by the training phase. The output of the inference phase is a list of aspect-opinion
pairs (along with their polarity and score) extracted from the inference data.
The inference approach is based on detecting syntactically related aspect-opinion pairs, as sketched below.
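
As a rough illustration of this idea (not the library's actual matching rules), an opinion term can
be paired with an aspect term found among its syntactic neighbours in the dependency parse. The toy
lexicons below stand in for the generated ones:

.. code:: python

    # Toy aspect-opinion pairing over a dependency parse (illustrative only).
    import spacy

    nlp = spacy.load("en_core_web_sm")
    aspect_lex = {"food", "staff"}                    # stand-in aspect lexicon
    opinion_lex = {"wonderful", "fresh", "friendly"}  # stand-in opinion lexicon

    doc = nlp("The food was wonderful and fresh. Staff were friendly.")
    pairs = []
    for token in doc:
        if token.lemma_.lower() in opinion_lex:
            # look at the opinion word's head, children and siblings
            neighbours = [token.head] + list(token.children) + list(token.head.children)
            for nb in neighbours:
                if nb.lemma_.lower() in aspect_lex:
                    pairs.append((nb.text, token.text))
    print(pairs)  # e.g. [('food', 'wonderful'), ('Staff', 'friendly')]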


Flow
====
.. image:: assets/absa_flow.png

Training
========
Full code example is available at ``examples/absa/train.py``.
There are two training modes:

1. Providing training data in a raw text format. In this case the training flow will
apply the dependency parser to the data:

.. code:: bash

    python3 examples/absa/train.py --data=TRAINING_DATASET


Arguments:

``--data=TRAINING_DATASET`` - path to the input training dataset. Should point to a single raw text file with documents
separated by newlines, a single csv file containing one doc per line, or a directory containing one raw
text file per document.
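
For example, the "documents separated by newlines" layout can be produced as follows (the file name
and review texts are made up for illustration):

.. code:: python

    # Write a toy raw-text training file: one document per line.
    docs = [
        "The food was wonderful and fresh. Staff were friendly.",
        "Service was slow but the pasta was excellent.",
    ]
    with open("restaurant_reviews.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(docs))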

Optional arguments:

``--rerank-model=RERANK_MODEL`` - path to the re-rank model. By default, when running the training
for the first time, this model will be downloaded to ``~/nlp-architect/cache/absa/train/reranking_model``

Notes:

a. The generated opinion and aspect lexicons are written as csv files to
``~/nlp-architect/cache/absa/train/output/generated_opinion_lex_reranked.csv`` and to
``~/nlp-architect/cache/absa/train/output/generated_aspect_lex.csv``

b. In this mode the parsed data (jsons of ParsedDocument objects) is written to ``~/nlp-architect/cache/absa/train/parsed``

c. When running the training for the first time the system will download the
GloVe word embedding model (the user will be prompted for authorization) to
``~/nlp-architect/cache/absa/train/word_emb_unzipped`` (this may take a while)

d. For demonstration purposes we provide a sample of tripadvisor.co.uk restaurant reviews under the
`Creative Commons Attribution-Share-Alike 3.0 License <https://creativecommons.org/licenses/by-sa/3.0/>`__ (Copyright 2018 Wikimedia Foundation).
The dataset can be found at ``datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv``.

2. Providing parsed training data. In this case the training flow skips the parsing step:

.. code:: bash

    python3 examples/absa/train.py --parsed-data=PARSED_TRAINING_DATASET

Arguments:

``--parsed-data=PARSED_TRAINING_DATASET`` - path to the parsed format (jsons of ParsedDocument objects) of the training dataset.
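
After training completes, the generated lexicon files listed in note (a) above can be inspected with a
few lines of Python. The exact column layout of these csv files is not described here, so the sketch
below simply prints the first few raw rows of each file:

.. code:: python

    # Peek at the generated lexicons (prints raw csv rows; no column layout assumed).
    import csv
    from pathlib import Path

    out_dir = Path.home() / "nlp-architect/cache/absa/train/output"
    for name in ("generated_aspect_lex.csv", "generated_opinion_lex_reranked.csv"):
        with open(out_dir / name, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.reader(f)):
                print(name, row)
                if i == 4:
                    break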

Inference
=========
Full code example is available at ``examples/absa/inference/inference.py``.
There are two inference modes:

1. Providing inference data in a raw text format:

.. code:: python

    inference = SentimentInference(ASPECT_LEX, OPINION_LEX)
    sentiment_doc = inference.run(doc="The food was wonderful and fresh. Staff were friendly.")

Arguments:

``ASPECT_LEX`` - path to the aspect lexicon (csv file) that was produced by the training phase.
The aspect lexicon may be manually edited to group alias aspect names (e.g. 'drinks' and 'beverages')
together: simply copy all alias names to the same line of the csv file.

``OPINION_LEX`` - path to the opinion lexicon (csv file) that was produced by the training phase.

``doc`` - the input sentence.

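Putting mode 1 together, a self-contained sketch might look like the following. The import path is an
assumption based on the nlp-architect package layout and may differ between versions, and the lexicon
paths point at the training outputs mentioned above:

.. code:: python

    # End-to-end inference sketch (import path and lexicon locations are assumptions).
    import os
    from nlp_architect.models.absa.inference.inference import SentimentInference

    ASPECT_LEX = os.path.expanduser(
        "~/nlp-architect/cache/absa/train/output/generated_aspect_lex.csv")
    OPINION_LEX = os.path.expanduser(
        "~/nlp-architect/cache/absa/train/output/generated_opinion_lex_reranked.csv")

    inference = SentimentInference(ASPECT_LEX, OPINION_LEX)
    sentiment_doc = inference.run(doc="The food was wonderful and fresh. Staff were friendly.")
    print(sentiment_doc)  # aspect-opinion pairs with their polarity and score
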
2. Providing parsed inference data (ParsedDocument format). In this case the parsing step is skipped:

.. code:: python

    inference = SentimentInference(ASPECT_LEX, OPINION_LEX, parse=False)
    doc_parsed = json.load(open('/path/to/parsed_doc.json'), object_hook=CoreNLPDoc.decoder)
    sentiment_doc = inference.run(parsed_doc=doc_parsed)


Inference - interactive mode
============================

The provided file ``examples/absa/inference/interactive.py`` enables using the generated lexicons in interactive mode:

.. code:: bash

    python3 interactive.py --aspects=ASPECT_LEX --opinions=OPINION_LEX


Arguments:

``--aspects=ASPECT_LEX`` - path to the aspect lexicon (csv file format)

``--opinions=OPINION_LEX`` - path to the opinion lexicon (csv file format)


References
==========

.. [1] `Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations <https://transacl.org/ojs/index.php/tacl/article/view/885/198>`__, Kiperwasser, E., & Goldberg, Y., Transactions of the Association for Computational Linguistics (2016), 4, 313-327.
.. [2] `Opinion Word Expansion and Target Extraction through Double Propagation <https://dl.acm.org/citation.cfm?id=1970422>`__, Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen, Computational Linguistics, volume 37(1), 2011.
.. [3] `Mining and Summarizing Customer Reviews <http://dx.doi.org/10.1145/1014052.1014073>`__, Minqing Hu and Bing Liu, Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp. 168-177, 2004.

.. _Spacy: https://spacy.io