Commit 746d73d

Update apify.md
1 parent dad8f86 commit 746d73d

1 file changed: +15 -13 lines changed

integrations/apify.md

Lines changed: 15 additions & 13 deletions
@@ -32,6 +32,8 @@ toc: true
 It helps automate web tasks and extract content from e-commerce websites, social media (Facebook, Instagram, TikTok), search engines, online maps, and more.
 Apify provides more than two thousand ready-made cloud solutions called Actors.
 
+> Follow 🧑‍🍳 [Cookbook: Extract and use website content for question answering with Apify-Haystack integration](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/apify_haystack_rag.ipynb) for the full example
+
 ## Installation
 
 Install the Apify-haystack integration:
@@ -75,12 +77,14 @@ Haystack is an open-source framework fo...', meta: {'url': 'https://docs.haystac
 
 ```python
 from dotenv import load_dotenv
+import os
 from haystack import Document
 
 from apify_haystack import ApifyDatasetFromActorCall
 
-# Set APIFY-API-TOKEN here or load it from .env file
-apify_api_token = "" or load_dotenv()
+# Use APIFY_API_TOKEN from .env file or set it
+load_dotenv()
+os.environ["APIFY_API_TOKEN"] = "YOUR APIFY_API_TOKEN"
 
 actor_id = "apify/website-content-crawler"
 run_input = {
@@ -104,8 +108,7 @@ def dataset_mapping_function(dataset_item: dict) -> Document:
 actor = ApifyDatasetFromActorCall(
     actor_id=actor_id,
     run_input=run_input,
-    dataset_mapping_function=dataset_mapping_function,
-    apify_api_token=apify_api_token,
+    dataset_mapping_function=dataset_mapping_function
 )
 print(f"Calling the Apify Actor {actor_id} ... crawling will take some time ...")
 print("You can monitor the progress at: https://console.apify.com/actors/runs")
@@ -117,7 +120,7 @@ for d in dataset:
     print(d)
 ```
 
-### ApifyDatasetFromActorCall in a [RAG pipeline](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)
+### ApifyDatasetFromActorCall in a RAG pipeline
 
 *Retrieval-Augmented Generation (RAG):* Extracting text content from a website and using it for question answering.
 Answer questions about the https://haystack.deepset.ai website using the extracted text content.
@@ -128,7 +131,7 @@ question: "What is haystack?"
 answer: Haystack is an open-source framework for building production-ready LLM applications
 ``````
 
-In addition to the `Apify API token`, you also need to specify `OpenAI API token` to run this example.
+In addition to the `APIFY_API_TOKEN`, you also need to specify `OPENAI_API_KEY` to run this example.
 
 ```python
 
@@ -145,10 +148,10 @@ from haystack.utils.auth import Secret
 
 from apify_haystack import ApifyDatasetFromActorCall
 
-# Set APIFY-API-TOKEN here or use it from .env file
+# Set APIFY_API_TOKEN and OPENAI_API_KEY here or use it from .env file
 load_dotenv()
-apify_api_token = "" or os.getenv("APIFY_API_TOKEN")
-openai_api_key = "" or os.getenv("OPENAI_API_KEY")
+os.environ["APIFY_API_TOKEN"] = getpass("Enter YOUR APIFY_API_TOKEN")
+os.environ["OPENAI_API_KEY"] = getpass("Enter YOUR OPENAI_API_KEY")
 
 actor_id = "apify/website-content-crawler"
 run_input = {
@@ -172,16 +175,15 @@ def dataset_mapping_function(dataset_item: dict) -> Document:
 apify_dataset_loader = ApifyDatasetFromActorCall(
     actor_id=actor_id,
     run_input=run_input,
-    dataset_mapping_function=dataset_mapping_function,
-    apify_api_token=apify_api_token,
+    dataset_mapping_function=dataset_mapping_function
 )
 
 # Components
 print("Initializing components...")
 document_store = InMemoryDocumentStore()
 
-docs_embedder = OpenAIDocumentEmbedder(api_key=Secret.from_token(openai_api_key))
-text_embedder = OpenAITextEmbedder(api_key=Secret.from_token(openai_api_key))
+docs_embedder = OpenAIDocumentEmbedder()
+text_embedder = OpenAITextEmbedder()
 retriever = InMemoryEmbeddingRetriever(document_store)
 generator = OpenAIGenerator(model="gpt-3.5-turbo")
 
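Review note: in the updated snippets, `load_dotenv()` is followed by an unconditional assignment to `os.environ[...]`, so a value loaded from `.env` is overwritten by the placeholder string or by the `getpass` prompt. A minimal sketch of one way to keep both paths is shown here. It assumes a hypothetical helper, `ensure_env`, which is not part of this commit, `apify_haystack`, or Haystack, and it prompts only when the variable is missing or empty:

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name]; ask for it only if it is unset or empty.

    `ensure_env` is a hypothetical helper for illustration, not an
    apify_haystack or Haystack API.
    """
    value = os.environ.get(name, "")
    if not value:
        # Fall back to an interactive prompt and export the result so that
        # libraries reading the variable later can find it.
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Possible usage in the docs snippet (after load_dotenv() has run):
#   ensure_env("APIFY_API_TOKEN")
#   ensure_env("OPENAI_API_KEY")
```

With this shape, a token already present in the environment or in `.env` is left untouched, and the prompt appears only as a fallback.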