|
38 | 38 | "metadata": {},
|
39 | 39 | "source": [
|
40 | 40 | "## Loading and Preparing the Dataset\n",
|
41 |
| - "We will use an open source dataset consisting of approx. 28000 customer reviews for a clothing store. The dataset is available at [Shopper Sentiments](https://www.kaggle.com/datasets/nelgiriyewithana/shoppersentiments).\n", |
| 41 | + "We will use an open dataset consisting of approx. 28000 customer reviews for a clothing store. The dataset is available at [Shopper Sentiments](https://www.kaggle.com/datasets/nelgiriyewithana/shoppersentiments).\n", |
42 | 42 | "\n",
|
43 | 43 | "We will load the dataset and convert it into a JSON format that can be used by Haystack.\n"
|
44 | 44 | ]
|
|
122 | 122 | "source": [
|
123 | 123 | "## Setting up Azure AI Search and Indexing Pipeline\n",
|
124 | 124 | "\n",
|
125 |
| - "We set up indexing pipeline with `AzureAISearchDocumentStore` by following these steps:\n", |
| 125 | + "We set up an indexing pipeline with `AzureAISearchDocumentStore` by following these steps:\n", |
126 | 126 | "1. Configure semantic search for the index\n",
|
127 | 127 | "2. Initialize the document store with custom metadata fields and semantic search configuration\n",
|
128 | 128 | "3. Create an indexing pipeline that:\n",
|
|
187 | 187 | "\n",
|
188 | 188 | "# Indexing Pipeline\n",
|
189 | 189 | "indexing_pipeline = Pipeline()\n",
|
190 |
| - "indexing_pipeline.add_component(\"document_embedder\", AzureOpenAIDocumentEmbedder())\n", |
| 190 | + "indexing_pipeline.add_component(instance=AzureOpenAIDocumentEmbedder(), name=\"document_embedder\")\n", |
191 | 191 | "indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name=\"doc_writer\")\n",
|
192 | 192 | "indexing_pipeline.connect(\"document_embedder\", \"doc_writer\")\n",
|
193 | 193 | "\n",
|
|
202 | 202 | "\n",
|
203 | 203 | "Here we set up the query pipeline that will retrieve relevant reviews based on user queries. The pipeline consists of:\n",
|
204 | 204 | "\n",
|
205 |
| - "1. A text embedder (`AzureOpenAITextEmbedder`) that converts user queries into vector embeddings\n", |
| 205 | + "1. A text embedder (`AzureOpenAITextEmbedder`) that converts user queries into embeddings.\n", |
206 | 206 | "2. A hybrid retriever (`AzureAISearchHybridRetriever`) that uses vector and semantic search to retrieve the most relevant reviews.\n"
|
207 | 207 | ]
|
208 | 208 | },
|
|
303 | 303 | "import numpy as np\n",
|
304 | 304 | "\n",
|
305 | 305 | "\n",
|
306 |
| - "def plot_sentiment_distribution(topics):\n", |
307 |
| - " # Create DataFrame from topics data\n", |
| 306 | + "def plot_sentiment_distribution(aspects):\n", |
| 307 | + " # Create DataFrame from aspects data\n", |
308 | 308 | " data = [(topic, review['sentiment']['analyzer_rating'], \n",
|
309 | 309 | " review['review']['rating'], review['sentiment']['label'])\n",
|
310 |
| - " for topic, reviews in topics.items()\n", |
| 310 | + " for topic, reviews in aspects.items()\n", |
311 | 311 | " for review in reviews]\n",
|
312 | 312 | " \n",
|
313 | 313 | " df = pd.DataFrame(data, columns=['Topic', 'Normalized Score', 'Original Rating', 'Sentiment'])\n",
|
|
367 | 367 | "\n",
|
368 | 368 | "Create a tool to perform aspect-based sentiment analysis on customer reviews using the VADER sentiment analyzer. It involves:\n",
|
369 | 369 | "\n",
|
370 |
| - "- Identifying specific topics within reviews (e.g., product quality, shipping, customer service, pricing) using predefined keywords\n", |
371 |
| - "- Calculating sentiment scores for each review mentioning these topics\n", |
| 370 | + "- Identifying specific aspects within reviews (e.g., product quality, shipping, customer service, pricing) using predefined keywords\n", |
| 371 | + "- Calculating sentiment scores for each review mentioning these aspects\n", |
372 | 372 | "- Categorizing sentiment as 'positive', 'negative', or 'neutral' \n",
|
373 | 373 | "- Normalizing sentiment scores to a scale of 1 to 5 for comparison with customer ratings\n"
|
374 | 374 | ]
|
|
394 | 394 | " sentiment scores using VADER and categorizes the sentiment as 'positive', 'negative', or 'neutral'.\n",
|
395 | 395 | " \n",
|
396 | 396 | " \"\"\"\n",
|
397 |
| - " topics = {\n", |
| 397 | + " aspects = {\n", |
398 | 398 | " \"product_quality\": [],\n",
|
399 | 399 | " \"shipping\": [],\n",
|
400 | 400 | " \"customer_service\": [],\n",
|
|
432 | 432 | " sentiment_label = 'neutral'\n",
|
433 | 433 | " \n",
|
434 | 434 | " # Append the review along with its sentiment analysis result\n",
|
435 |
| - " topics[topic].append({\n", |
| 435 | + " aspects[topic].append({\n", |
436 | 436 | " \"review\": review,\n",
|
437 | 437 | " \"sentiment\": {\n",
|
438 | 438 | " \"analyzer_rating\": normalized_score,\n",
|
439 | 439 | " \"label\": sentiment_label\n",
|
440 | 440 | " }\n",
|
441 | 441 | " })\n",
|
442 |
| - " plot_sentiment_distribution(topics)\n", |
| 442 | + " plot_sentiment_distribution(aspects)\n", |
443 | 443 | "\n",
|
444 | 444 | " return {\n",
|
445 | 445 | " \"total_reviews\": len(reviews),\n",
|
446 |
| - " \"sentiment_analysis\": topics,\n", |
| 446 | + " \"sentiment_analysis\": aspects,\n", |
447 | 447 | " \"average_rating\": sum(r.get(\"rating\", 3) for r in reviews) / len(reviews)\n",
|
448 | 448 | " }\n",
|
449 | 449 | "\n",
|
|
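The aspect-based sentiment analysis steps listed in the notebook's markdown cell (keyword matching, labeling, and normalizing to a 1 to 5 scale) can be sketched roughly as follows. This is a minimal illustration, not the notebook's implementation: the keyword lists and the helper names `find_aspects`, `label_sentiment`, and `normalize_to_rating` are hypothetical, the ±0.05 cutoff is the conventional VADER compound-score threshold rather than a value confirmed by this diff, and the linear mapping is one plausible way to reach the 1 to 5 range. The compound score is passed in as a plain float so the sketch runs without the VADER package.

```python
# Hypothetical keyword lists; the notebook's actual keywords are not shown in this diff.
ASPECT_KEYWORDS = {
    "product_quality": ["quality", "material", "fabric"],
    "shipping": ["shipping", "delivery", "arrived"],
    "customer_service": ["service", "support", "refund"],
    "pricing": ["price", "expensive", "cheap"],
}


def find_aspects(text):
    """Return the aspects whose keywords occur (as substrings) in the review text."""
    lowered = text.lower()
    return [
        aspect
        for aspect, keywords in ASPECT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a label (threshold is an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def normalize_to_rating(compound):
    """Linearly rescale a compound score from [-1, 1] to the rating range [1, 5]."""
    return (compound + 1.0) * 2.0 + 1.0
```

With a real analyzer, the `compound` argument would typically come from the `compound` entry of VADER's `polarity_scores` result for the review text; the normalized value can then be compared directly with the customer's star rating.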