Commit 8212340

Merge pull request #866 from ScrapeGraphAI/deps-cleanup: Deps cleanup

2 parents 927c99b + 8d9c909

35 files changed: +355 −2616 lines

.github/update-requirements.yml

Lines changed: 0 additions & 26 deletions
This file was deleted.

.github/workflows/python-publish.yml

Lines changed: 0 additions & 32 deletions
This file was deleted.

README.md

Lines changed: 47 additions & 64 deletions
````diff
@@ -24,21 +24,6 @@ Just say which information you want to extract and the library will do it for yo
     <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/sgai-hero.png" alt="ScrapeGraphAI Hero" style="width: 100%;">
 </p>
 
-## 🔗 ScrapeGraph API & SDKs
-If you are looking for a quick solution to integrate ScrapeGraph in your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)
-
-<p align="center">
-  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
-</p>
-
-We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:
-
-| SDK | Language | GitHub Link |
-|-----------|----------|-----------------------------------------------------------------------------|
-| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
-| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |
-
-The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
 
 ## 🚀 Quick install
 
@@ -47,35 +32,12 @@ The reference page for Scrapegraph-ai is available on the official page of PyPI:
 ```bash
 pip install scrapegraphai
 
+# IMPORTANT (to fetch websites content)
 playwright install
 ```
 
 **Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
 
-<details>
-<summary><b>Optional Dependencies</b></summary>
-Additional dependecies can be added while installing the library:
-
-- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
-
-  This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
-  ```bash
-  pip install scrapegraphai[other-language-models]
-  ```
-- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.
-
-  ```bash
-  pip install scrapegraphai[more-semantic-options]
-  ```
-
-- <b>Browsers Options</b>: this group includes additional browser management tools/services, such as Browserbase.
-
-  ```bash
-  pip install scrapegraphai[more-browser-options]
-  ```
-
-</details>
-
 
 ## 💻 Usage
 There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
@@ -84,13 +46,12 @@ The most common one is the `SmartScraperGraph`, which extracts information from
 
 
 ```python
-import json
 from scrapegraphai.graphs import SmartScraperGraph
 
 # Define the configuration for the scraping pipeline
 graph_config = {
     "llm": {
-        "api_key": "YOUR_OPENAI_APIKEY",
+        "api_key": "YOUR_OPENAI_API_KEY",
         "model": "openai/gpt-4o-mini",
     },
     "verbose": True,
@@ -99,33 +60,45 @@ graph_config = {
 
 # Create the SmartScraperGraph instance
 smart_scraper_graph = SmartScraperGraph(
-    prompt="Extract me all the news from the website",
-    source="https://www.wired.com",
+    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
+    source="https://scrapegraphai.com/",
     config=graph_config
 )
 
 # Run the pipeline
 result = smart_scraper_graph.run()
+
+import json
 print(json.dumps(result, indent=4))
 ```
 
 The output will be a dictionary like the following:
 
 ```python
-"result": {
-    "news": [
-        {
-            "title": "The New Jersey Drone Mystery May Not Actually Be That Mysterious",
-            "link": "https://www.wired.com/story/new-jersey-drone-mystery-maybe-not-drones/",
-            "author": "Lily Hay Newman"
-        },
-        {
-            "title": "Former ByteDance Intern Accused of Sabotage Among Winners of Prestigious AI Award",
-            "link": "https://www.wired.com/story/bytedance-intern-best-paper-neurips/",
-            "author": "Louise Matsakis"
-        },
-        ...
-    ]
+{
+    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
+    "founders": [
+        {
+            "name": "Marco Perini",
+            "role": "Founder & Technical Lead",
+            "linkedin": "https://www.linkedin.com/in/perinim/"
+        },
+        {
+            "name": "Marco Vinciguerra",
+            "role": "Founder & Software Engineer",
+            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
+        },
+        {
+            "name": "Lorenzo Padoan",
+            "role": "Founder & Product Engineer",
+            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
+        }
+    ],
+    "social_media_links": {
+        "linkedin": "https://www.linkedin.com/company/101881123",
+        "twitter": "https://x.com/scrapegraphai",
+        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
+    }
 }
 ```
 There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
@@ -145,20 +118,30 @@ It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**,
 
 Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command, if you want to use local models.
 
-## 🔍 Demo
-Official streamlit demo:
-
-[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-demo-demo.streamlit.app)
 
-Try it directly on the web using Google Colab:
+## 📖 Documentation
 
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)
 
-## 📖 Documentation
-
 The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
 Check out also the Docusaurus [here](https://docs-oss.scrapegraphai.com/).
 
+## 🔗 ScrapeGraph API & SDKs
+If you are looking for a quick solution to integrate ScrapeGraph in your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
+</p>
+
+We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:
+
+| SDK | Language | GitHub Link |
+|-----------|----------|-----------------------------------------------------------------------------|
+| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
+| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |
+
+The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
+
 ## 🏆 Sponsors
 <div style="text-align: center;">
     <a href="https://2ly.link/1zaXG">
````
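One side effect of the README change above is that `import json` now appears only at the point where the result is printed. A minimal sketch of that final output step, using a hypothetical `result` dict in place of a live `smart_scraper_graph.run()` call (the values below are illustrative, not real scraper output):

```python
import json

# Stand-in for the dict a SmartScraperGraph run would return;
# the shape mirrors the README's example output.
result = {
    "description": "ScrapeGraphAI transforms websites into clean, organized data.",
    "founders": [{"name": "Marco Perini", "role": "Founder & Technical Lead"}],
    "social_media_links": {"github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"},
}

# Pretty-print the nested result, as the README's last line does
print(json.dumps(result, indent=4))
```

Deferring the import this way keeps the setup portion of the snippet focused on the graph configuration; the trade-off is purely stylistic, since `json` is in the standard library.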

docs/turkish.md

Lines changed: 0 additions & 25 deletions
````diff
@@ -31,31 +31,6 @@ playwright install
 
 **Not**: Diğer kütüphanelerle çakışmaları önlemek için kütüphaneyi sanal bir ortamda kurmanız önerilir 🐱
 
-<details>
-<summary><b>Opsiyonel Bağımlılıklar</b></summary>
-Kütüphaneyi kurarken ek bağımlılıklar ekleyebilirsiniz:
-
-- **Daha Fazla Dil Modeli**: Fireworks, Groq, Anthropic, Hugging Face ve Nvidia AI Endpoints gibi ek dil modelleri kurulur.
-
-  Bu grup, Fireworks, Groq, Anthropic, Together AI, Hugging Face ve Nvidia AI Endpoints gibi ek dil modellerini kullanmanızı sağlar.
-
-  ```bash
-  pip install scrapegraphai[other-language-models]
-  ```
-
-- **Semantik Seçenekler**: Graphviz gibi gelişmiş semantik işleme araçlarını içerir.
-
-  ```bash
-  pip install scrapegraphai[more-semantic-options]
-  ```
-
-- **Tarayıcı Seçenekleri**: Browserbase gibi ek tarayıcı yönetim araçları/hizmetlerini içerir.
-
-  ```bash
-  pip install scrapegraphai[more-browser-options]
-  ```
-
-</details>
 
 ## 💻 Kullanım
 
````

examples/anthropic/csv_scraper_anthropic.py

Lines changed: 5 additions & 9 deletions
```diff
@@ -3,9 +3,8 @@
 """
 import os
 from dotenv import load_dotenv
-import pandas as pd
 from scrapegraphai.graphs import CSVScraperGraph
-from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
+from scrapegraphai.utils import prettify_exec_info
 
 load_dotenv()
 
@@ -17,7 +16,8 @@
 curr_dir = os.path.dirname(os.path.realpath(__file__))
 file_path = os.path.join(curr_dir, FILE_NAME)
 
-text = pd.read_csv(file_path)
+with open(file_path, 'r') as file:
+    text = file.read()
 
 # ************************************************
 # Define the configuration for the graph
@@ -41,7 +41,7 @@
 
 csv_scraper_graph = CSVScraperGraph(
     prompt="List me all the last names",
-    source=str(text),  # Pass the content of the file, not the file object
+    source=text,  # Pass the content of the file
     config=graph_config
 )
 
@@ -53,8 +53,4 @@
 # ************************************************
 
 graph_exec_info = csv_scraper_graph.get_execution_info()
-print(prettify_exec_info(graph_exec_info))
-
-# Save to json or csv
-convert_to_csv(result, "result")
-convert_to_json(result, "result")
+print(prettify_exec_info(graph_exec_info))
```
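The pandas-to-plain-read swap above matters because the graph's `source` receives the file's raw text; `str(pd.read_csv(...))` would instead pass a pretty-printed `DataFrame` with index columns and alignment padding. A self-contained sketch of the new read pattern (file name and contents are invented for illustration):

```python
import os
import tempfile

# Create a throwaway CSV standing in for the example's input file
csv_content = "first_name,last_name\nAda,Lovelace\nAlan,Turing\n"
with tempfile.TemporaryDirectory() as curr_dir:
    file_path = os.path.join(curr_dir, "example.csv")
    with open(file_path, "w") as f:
        f.write(csv_content)

    # Read the raw text, as the updated examples do: no pandas,
    # and no DataFrame formatting artifacts in the scraped source
    with open(file_path, "r") as file:
        text = file.read()

print(text == csv_content)  # True: the graph receives the verbatim file content
```

Dropping `pandas` here also supports the PR's stated goal of trimming dependencies, since the examples no longer need it at all.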

examples/anthropic/csv_scraper_graph_multi_anthropic.py

Lines changed: 3 additions & 7 deletions
```diff
@@ -3,9 +3,8 @@
 """
 import os
 from dotenv import load_dotenv
-import pandas as pd
 from scrapegraphai.graphs import CSVScraperMultiGraph
-from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
+from scrapegraphai.utils import prettify_exec_info
 
 load_dotenv()
 # ************************************************
@@ -16,7 +15,8 @@
 curr_dir = os.path.dirname(os.path.realpath(__file__))
 file_path = os.path.join(curr_dir, FILE_NAME)
 
-text = pd.read_csv(file_path)
+with open(file_path, 'r') as file:
+    text = file.read()
 
 # ************************************************
 # Define the configuration for the graph
@@ -48,7 +48,3 @@
 
 graph_exec_info = csv_scraper_graph.get_execution_info()
 print(prettify_exec_info(graph_exec_info))
-
-# Save to json or csv
-convert_to_csv(result, "result")
-convert_to_json(result, "result")
```

examples/openai/depth_search_graph_openai.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@
 
 load_dotenv()
 
-openai_key = os.getenv("OPENAI_APIKEY")
+openai_key = os.getenv("OPENAI_API_KEY")
 
 graph_config = {
     "llm": {
```
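This and the following two example diffs rename the environment variable to the conventional `OPENAI_API_KEY`. For users whose `.env` files still carry the old `OPENAI_APIKEY` spelling, a tolerant lookup can bridge the migration; this helper is a suggestion for such users, not part of the PR:

```python
import os

def get_openai_key():
    # Prefer the standard name; fall back to the legacy spelling
    return os.getenv("OPENAI_API_KEY") or os.getenv("OPENAI_APIKEY")

# Demonstrate the fallback with a dummy value (not a real key)
os.environ.pop("OPENAI_API_KEY", None)
os.environ["OPENAI_APIKEY"] = "sk-legacy-example"
print(get_openai_key())  # prints "sk-legacy-example"
```

Once `.env` files are updated, the helper can be dropped in favor of the plain `os.getenv("OPENAI_API_KEY")` shown in the diff.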

examples/openai/search_graph_openai.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,7 +11,7 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("OPENAI_APIKEY")
+openai_key = os.getenv("OPENAI_API_KEY")
 
 graph_config = {
     "llm": {
```

examples/openai/smart_scraper_openai.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -28,7 +28,7 @@
 # ************************************************
 
 smart_scraper_graph = SmartScraperGraph(
-    prompt="Extract me all the articles",
+    prompt="Extract me the first article",
    source="https://www.wired.com",
    config=graph_config
)
```

examples/openai/speech_graph_openai.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -20,7 +20,7 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("OPENAI_APIKEY")
+openai_key = os.getenv("OPENAI_API_KEY")
 
 graph_config = {
     "llm": {
```
