Commit 13a867b

Add cosmosdb page (#280)
* Add cosmosdb page * Update azure-cosmos-db.md
1 parent 2975543 commit 13a867b

4 files changed

+144
-0
lines changed

images/azure-cosmosdb-collection.png

294 KB

images/azure-cosmosdb-quickstart.png

119 KB

integrations/azure-cosmos-db.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
---
layout: integration
name: Azure CosmosDB
description: Use Azure CosmosDB with Haystack
authors:
  - name: deepset
    socials:
      github: deepset-ai
      twitter: deepset_ai
      linkedin: https://www.linkedin.com/company/deepset-ai/
pypi: https://pypi.org/project/mongodb-atlas-haystack/
repo: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mongodb_atlas
type: Document Store
report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues
logo: /logos/azure-cosmos-db.png
toc: true
version: Haystack 2.0
---

**Table of Contents**

- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)

## Overview

[Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction) is a fully managed NoSQL, relational, and vector database for modern app development. It offers single-digit millisecond response times, automatic and instant scalability, and guaranteed speed at any scale. It is the database that ChatGPT relies on to dynamically scale with high reliability and low maintenance.

[Azure Cosmos DB for MongoDB](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/introduction) makes it easy to use Azure Cosmos DB as if it were a MongoDB database. You can use your existing MongoDB skills and continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application to the connection string for your account using the API for MongoDB. Learn more in the [Azure Cosmos DB for MongoDB documentation](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/).

## Installation

You can connect to your MongoDB cluster in Azure Cosmos DB through the `MongoDBAtlasDocumentStore`. To do so, install the `mongodb-atlas-haystack` integration:

```bash
pip install mongodb-atlas-haystack
```
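
To confirm that the integration is available in the current environment, a quick check along these lines may help. This is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of Haystack.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("mongodb-atlas-haystack"))
```

If this prints `None`, the package was most likely installed into a different Python environment than the one running the script.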

## Usage

To use Azure Cosmos DB for MongoDB with `MongoDBAtlasDocumentStore`, first set up an Azure Cosmos DB for MongoDB vCore cluster through the Azure portal. For a step-by-step guide, refer to [Quickstart: Azure Cosmos DB for MongoDB vCore](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/quickstart-portal).

After setting up your cluster, set the `MONGO_CONNECTION_STRING` environment variable to the connection string for your cluster. To find the connection string, follow the [cluster credentials instructions](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/quickstart-portal#get-cluster-credentials) in the quickstart. The format should look like this:

```python
import os

os.environ["MONGO_CONNECTION_STRING"] = "mongodb+srv://<username>:<password>@<clustername>.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
```
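
If the username or password contains characters such as `@`, `/`, or `:`, they generally need to be percent-encoded before they go into a MongoDB connection string. The sketch below fills in the format shown above from its three placeholders; `build_connection_string` is a hypothetical helper, and the credentials are made-up values for illustration only.

```python
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str, cluster_name: str) -> str:
    """Fill in the connection string format shown above, percent-encoding the credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb+srv://{user}:{pwd}@{cluster_name}.mongocluster.cosmos.azure.com/"
        "?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
    )


# Made-up credentials: the "@" and "/" in the password are encoded as %40 and %2F.
print(build_connection_string("admin", "p@ss/word", "mycluster"))
```

The returned string can then be assigned to `os.environ["MONGO_CONNECTION_STRING"]` in the same way as the literal example.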

Next, navigate to the Quickstart page of your cluster and click "Launch Quickstart."

![Azure CosmosDB cluster quickstart](https://raw.githubusercontent.com/deepset-ai/haystack-integrations/main/images/azure-cosmosdb-quickstart.png)

This starts the Quickstart guide, which walks you through creating a database and a collection.

![Azure CosmosDB collection](https://raw.githubusercontent.com/deepset-ai/haystack-integrations/main/images/azure-cosmosdb-collection.png)

Once this is done, you can initialize the [`MongoDBAtlasDocumentStore`](https://docs.haystack.deepset.ai/docs/mongodbatlasdocumentstore) in Haystack with the appropriate configuration.

```python
from haystack import Document
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore

# The connection string is read from the MONGO_CONNECTION_STRING environment variable.
document_store = MongoDBAtlasDocumentStore(
    database_name="quickstartDB",  # your db name
    collection_name="sampleCollection",  # your collection name
    vector_search_index="haystack-test",  # your vector search index name
)

document_store.write_documents([Document(content="this is my first doc")])
```
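
Because the setup above relies on `MONGO_CONNECTION_STRING` being present, it can help to fail early with a clear message when it is missing. The following is a minimal sketch using only the standard library; `require_env` is a hypothetical helper, not a Haystack function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Check the variable before creating the document store.
try:
    require_env("MONGO_CONNECTION_STRING")
    print("Connection string found.")
except RuntimeError as err:
    print(err)
```

Running a check like this before constructing the document store keeps configuration problems separate from connection or authentication errors.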

### Example pipelines

The following example builds an end-to-end RAG app on Azure Cosmos DB. It consists of an indexing pipeline that embeds the documents and writes them to the store, and a generative pipeline that answers questions about them.

```python
from haystack import Pipeline, Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.writers import DocumentWriter
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

# Create some example documents
documents = [
    Document(content="My name is Jean and I live in Paris."),
    Document(content="My name is Mark and I live in Berlin."),
    Document(content="My name is Giorgio and I live in Rome."),
]

document_store = MongoDBAtlasDocumentStore(
    database_name="quickstartDB",  # your db name
    collection_name="sampleCollection",  # your collection name
    vector_search_index="haystack-test",  # your vector search index name
)

# Define some more components
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
query_embedder = SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2")

# Pipeline that ingests documents for retrieval
indexing_pipe = Pipeline()
indexing_pipe.add_component(instance=doc_embedder, name="doc_embedder")
indexing_pipe.add_component(instance=doc_writer, name="doc_writer")

indexing_pipe.connect("doc_embedder.documents", "doc_writer.documents")
indexing_pipe.run({"doc_embedder": {"documents": documents}})

# Build a RAG pipeline with a Retriever to get documents relevant to
# the query, a PromptBuilder to create a custom prompt and the OpenAIGenerator (LLM)
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

\nQuestion: {{question}}
\nAnswer:
"""
rag_pipeline = Pipeline()
rag_pipeline.add_component(instance=query_embedder, name="query_embedder")
rag_pipeline.add_component(instance=MongoDBAtlasEmbeddingRetriever(document_store=document_store), name="retriever")
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable by default
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.connect("query_embedder", "retriever.query_embedding")
# The source component must match the name registered above ("retriever")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

# Ask a question on the data you just added.
question = "Where does Mark live?"
result = rag_pipeline.run(
    {
        "query_embedder": {"text": question},
        "prompt_builder": {"question": question},
    }
)
print(result)
```
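
The `result` printed above is a dictionary keyed by component name. As a minimal sketch, the generated answer can be pulled out of the `llm` entry, assuming the generator returns its text under a `replies` list as `OpenAIGenerator` does. The helper `first_reply` is hypothetical, and `example_result` is a hand-written stand-in rather than real model output.

```python
from typing import Optional


def first_reply(result: dict, generator_name: str = "llm") -> Optional[str]:
    """Return the first generated reply from a pipeline result, or None if there is none."""
    replies = result.get(generator_name, {}).get("replies", [])
    return replies[0] if replies else None


# Hand-written stand-in with the same shape as the pipeline output.
example_result = {"llm": {"replies": ["Mark lives in Berlin."], "meta": [{}]}}
print(first_reply(example_result))  # prints "Mark lives in Berlin."
```

Returning `None` for a missing or empty `replies` list lets the caller handle an empty generation without catching a `KeyError` or `IndexError`.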

logos/azure-cosmos-db.png

110 KB