|
38 | 38 | "metadata": {}, |
39 | 39 | "source": [ |
40 | 40 | "## Loading and Preparing the Dataset\n", |
41 | | - "We will use an open source dataset consisting of approx. 28000 customer reviews for a clothing store. The dataset is available at [Shopper Sentiments](https://www.kaggle.com/datasets/nelgiriyewithana/shoppersentiments).\n", |
| 41 | + "We will use an open dataset consisting of approx. 28000 customer reviews for a clothing store. The dataset is available at [Shopper Sentiments](https://www.kaggle.com/datasets/nelgiriyewithana/shoppersentiments).\n", |
42 | 42 | "\n", |
43 | 43 | "We will load the dataset and convert it into a JSON format that can be used by Haystack.\n" |
44 | 44 | ] |
|
122 | 122 | "source": [ |
123 | 123 | "## Setting up Azure AI Search and Indexing Pipeline\n", |
124 | 124 | "\n", |
125 | | - "We set up indexing pipeline with `AzureAISearchDocumentStore` by following these steps:\n", |
| 125 | + "We set up an indexing pipeline with `AzureAISearchDocumentStore` by following these steps:\n", |
126 | 126 | "1. Configure semantic search for the index\n", |
127 | 127 | "2. Initialize the document store with custom metadata fields and semantic search configuration\n", |
128 | 128 | "3. Create an indexing pipeline that:\n", |
|
187 | 187 | "\n", |
188 | 188 | "# Indexing Pipeline\n", |
189 | 189 | "indexing_pipeline = Pipeline()\n", |
190 | | - "indexing_pipeline.add_component(\"document_embedder\", AzureOpenAIDocumentEmbedder())\n", |
| 190 | + "indexing_pipeline.add_component(instance=AzureOpenAIDocumentEmbedder(), name=\"document_embedder\")\n", |
191 | 191 | "indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name=\"doc_writer\")\n", |
192 | 192 | "indexing_pipeline.connect(\"document_embedder\", \"doc_writer\")\n", |
193 | 193 | "\n", |
|
202 | 202 | "\n", |
203 | 203 | "Here we set up the query pipeline that will retrieve relevant reviews based on user queries. The pipeline consists of:\n", |
204 | 204 | "\n", |
205 | | - "1. A text embedder (`AzureOpenAITextEmbedder`) that converts user queries into vector embeddings\n", |
| 205 | + "1. A text embedder (`AzureOpenAITextEmbedder`) that converts user queries into embeddings.\n", |
206 | 206 | "2. A hybrid retriever (`AzureAISearchHybridRetriever`) that uses vector and semantic search to retrieve the most relevant reviews.\n" |
207 | 207 | ] |
208 | 208 | }, |
|
303 | 303 | "import numpy as np\n", |
304 | 304 | "\n", |
305 | 305 | "\n", |
306 | | - "def plot_sentiment_distribution(topics):\n", |
307 | | - " # Create DataFrame from topics data\n", |
| 306 | + "def plot_sentiment_distribution(aspects):\n", |
| 307 | + " # Create DataFrame from aspects data\n", |
308 | 308 | " data = [(topic, review['sentiment']['analyzer_rating'], \n", |
309 | 309 | " review['review']['rating'], review['sentiment']['label'])\n", |
310 | | - " for topic, reviews in topics.items()\n", |
| 310 | + " for topic, reviews in aspects.items()\n", |
311 | 311 | " for review in reviews]\n", |
312 | 312 | " \n", |
313 | 313 | " df = pd.DataFrame(data, columns=['Topic', 'Normalized Score', 'Original Rating', 'Sentiment'])\n", |
|
367 | 367 | "\n", |
368 | 368 | "Create a tool to perform aspect-based sentiment analysis on customer reviews using the VADER sentiment analyzer. It involves:\n", |
369 | 369 | "\n", |
370 | | - "- Identifying specific topics within reviews (e.g., product quality, shipping, customer service, pricing) using predefined keywords\n", |
371 | | - "- Calculating sentiment scores for each review mentioning these topics\n", |
| 370 | + "- Identifying specific aspects within reviews (e.g., product quality, shipping, customer service, pricing) using predefined keywords\n", |
| 371 | + "- Calculating sentiment scores for each review mentioning these aspects\n", |
372 | 372 | "- Categorizing sentiment as 'positive', 'negative', or 'neutral' \n", |
373 | 373 | "- Normalizing sentiment scores to a scale of 1 to 5 for comparison with customer ratings\n" |
374 | 374 | ] |
|
394 | 394 | " sentiment scores using VADER and categorizes the sentiment as 'positive', 'negative', or 'neutral'.\n", |
395 | 395 | " \n", |
396 | 396 | " \"\"\"\n", |
397 | | - " topics = {\n", |
| 397 | + " aspects = {\n", |
398 | 398 | " \"product_quality\": [],\n", |
399 | 399 | " \"shipping\": [],\n", |
400 | 400 | " \"customer_service\": [],\n", |
|
432 | 432 | " sentiment_label = 'neutral'\n", |
433 | 433 | " \n", |
434 | 434 | " # Append the review along with its sentiment analysis result\n", |
435 | | - " topics[topic].append({\n", |
| 435 | + " aspects[topic].append({\n", |
436 | 436 | " \"review\": review,\n", |
437 | 437 | " \"sentiment\": {\n", |
438 | 438 | " \"analyzer_rating\": normalized_score,\n", |
439 | 439 | " \"label\": sentiment_label\n", |
440 | 440 | " }\n", |
441 | 441 | " })\n", |
442 | | - " plot_sentiment_distribution(topics)\n", |
| 442 | + " plot_sentiment_distribution(aspects)\n", |
443 | 443 | "\n", |
444 | 444 | " return {\n", |
445 | 445 | " \"total_reviews\": len(reviews),\n", |
446 | | - " \"sentiment_analysis\": topics,\n", |
| 446 | + " \"sentiment_analysis\": aspects,\n", |
447 | 447 | " \"average_rating\": sum(r.get(\"rating\", 3) for r in reviews) / len(reviews)\n", |
448 | 448 | " }\n", |
449 | 449 | "\n", |
|
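The aspect tagging, sentiment labelling, and 1-to-5 normalization described in the markdown cell could be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the notebook's actual implementation: the keyword lists, the helper names (`find_aspects`, `normalize_compound`, `label_from_compound`, `average_rating`), and the linear mapping from [-1, 1] onto [1, 5] are all hypothetical. The ±0.05 cutoffs follow the thresholds conventionally used with VADER's compound score. In the notebook the compound score would come from the VADER analyzer; here it is passed in as a plain float so the sketch runs without extra packages.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists; the notebook's real keywords are not visible in this diff.
ASPECT_KEYWORDS: Dict[str, List[str]] = {
    "product_quality": ["quality", "fabric", "material"],
    "shipping": ["shipping", "delivery", "arrived"],
    "customer_service": ["support", "service", "staff"],
    "pricing": ["price", "expensive", "cheap"],
}


def find_aspects(text: str) -> List[str]:
    """Return aspects whose keywords occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [
        aspect
        for aspect, keywords in ASPECT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def normalize_compound(compound: float) -> float:
    """Linearly map a compound score in [-1, 1] onto the 1-5 rating scale (assumed mapping)."""
    clamped = max(-1.0, min(1.0, compound))
    return round((clamped + 1.0) * 2.0 + 1.0, 2)


def label_from_compound(compound: float) -> str:
    """Label a compound score using the conventional +/-0.05 cutoffs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def average_rating(reviews: List[dict]) -> Optional[float]:
    """Average the 'rating' field (defaulting to 3, as in the diff); None for an empty list."""
    if not reviews:
        return None
    return sum(r.get("rating", 3) for r in reviews) / len(reviews)
```

The `average_rating` helper also guards against an empty review list, which the `sum(...) / len(reviews)` expression in the diff would turn into a `ZeroDivisionError`. Whether an empty result set can actually reach that code path depends on the retriever and is not visible in these hunks.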