Commit e3a4916

update unstructured (#249)
1 parent c408a1d commit e3a4916

1 file changed: +41 -14 lines changed

integrations/unstructured-file-converter.md

Lines changed: 41 additions & 14 deletions
@@ -14,37 +14,64 @@ type: Data Ingestion
 report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues
 logo: /logos/unstructured.svg
 version: Haystack 2.0
+toc: true
 ---
+- [Overview](#overview)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Connecting to the Unstructured API](#connecting-to-the-unstructured-api)
+- [Hosted API](#hosted-api)
+- [Local API (Docker)](#local-api-docker)
+- [Running Unstructured File Converter](#running-unstructured-file-converter)
+- [In isolation](#in-isolation)
+- [In a Haystack Pipeline](#in-a-haystack-pipeline)
 
-Component for the Haystack (2.x) LLM framework to easily convert files and directories into Documents using the Unstructured API.
 
-**[Unstructured](https://unstructured-io.github.io/unstructured/index.html)** provides a series of tools to do **ETL for LLMs**. This component calls the Unstructured API that simply extracts text and other information from a vast range of file formats. See [supported file types](https://docs.unstructured.io/api-reference/api-services/overview#supported-file-types).
+
+## Overview
+Component for the Haystack (2.x) LLM framework to convert files and directories into Documents using the Unstructured API.
+
+**[Unstructured](https://unstructured-io.github.io/unstructured/index.html)** provides ETL tools for LLMs, extracting text and other information from various file formats. See [supported file types](https://docs.unstructured.io/api-reference/api-services/overview#supported-file-types) for more details.
 
 ## Installation
+To install the [Unstructured File Converter](https://docs.haystack.deepset.ai/docs/unstructuredfileconverter), run:
 
 ```bash
 pip install unstructured-fileconverter-haystack
 ```
 
-### Hosted API
-If you plan to use the hosted version of the Unstructured API, you just need the **(free) Unstructured API key**. You can get it by signing up [here](https://unstructured.io/api-key-free).
+## Usage
+
+### Connecting to the Unstructured API
+#### Hosted API
+
+The Unstructured API is available in both free and paid versions: Unstructured Serverless API or Free Unstructured API.
 
-### Local API (Docker)
-If you want to run your own local instance of the Unstructured API, you need Docker and you can find instructions [here](https://unstructured-io.github.io/unstructured/api.html#using-docker-images).
+For the Free Unstructured API, the API URL is `https://api.unstructured.io/general/v0/general`. For the Unstructured Serverless API, find your unique API URL in your Unstructured account.
 
-In short, this should work:
+Note that the API keys for free and paid versions are not interchangeable.
+
+Set the Unstructured API key as an environment variable:
 ```bash
-docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
+export UNSTRUCTURED_API_KEY=your_api_key
 ```
 
-## Usage
+#### Local API (Docker)
+You can run a local instance of the Unstructured API using Docker:
 
-If you plan to use the hosted version of the Unstructured API, set the Unstructured API key as an environment variable `UNSTRUCTURED_API_KEY`:
 ```bash
-export UNSTRUCTURED_API_KEY=your_api_key
+docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
+```
+
+When initializing the component, specify the localhost URL:
+```python
+from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter
+
+converter = UnstructuredFileConverter(api_url="http://localhost:8000/general/v0/general")
 ```
 
-### In isolation
+### Running Unstructured File Converter
+#### In isolation
 ```python
 import os
 from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter
@@ -53,7 +80,7 @@ converter = UnstructuredFileConverter()
 documents = converter.run(paths = ["a/file/path.pdf", "a/directory/path"])["documents"]
 ```
 
-### In a Haystack Pipeline
+#### In a Haystack Pipeline
 ```python
 import os
 from haystack import Pipeline
@@ -69,4 +96,4 @@ indexing.add_component("writer", DocumentWriter(document_store))
 indexing.connect("converter", "writer")
 
 indexing.run({"converter": {"paths": ["a/file/path.pdf", "a/directory/path"]}})
-```
+```
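
The updated docs describe three ways to reach the Unstructured API: the free hosted endpoint, an account-specific serverless endpoint, and a local Docker instance. As a rough sketch of how a caller might choose between them, the snippet below maps a mode name to a URL and passes it to the converter through `api_url`, as the docs do for the local case. The helpers `resolve_api_url` and `build_converter`, the mode names, and the up-front check on `UNSTRUCTURED_API_KEY` are assumptions made for illustration and are not part of the integration.

```python
import os

# Free hosted endpoint, as stated in the updated docs.
FREE_API_URL = "https://api.unstructured.io/general/v0/general"
# Local endpoint, matching the `docker run -p 8000:8000 ...` command in the docs.
LOCAL_API_URL = "http://localhost:8000/general/v0/general"


def resolve_api_url(mode, serverless_url=None):
    """Hypothetical helper: map a deployment mode to an Unstructured API URL."""
    if mode == "free":
        return FREE_API_URL
    if mode == "local":
        return LOCAL_API_URL
    if mode == "serverless":
        # The serverless URL is account-specific, so the caller must supply it.
        if not serverless_url:
            raise ValueError("serverless mode needs the API URL from your Unstructured account")
        return serverless_url
    raise ValueError(f"unknown mode: {mode!r}")


def build_converter(mode, serverless_url=None):
    """Hypothetical helper: build the converter for the chosen mode.

    Requires `pip install unstructured-fileconverter-haystack`.
    """
    # Imported lazily so resolve_api_url works without the integration installed.
    from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter

    # Assumption: fail early if a hosted mode is chosen without the key in the environment.
    if mode != "local" and not os.environ.get("UNSTRUCTURED_API_KEY"):
        raise RuntimeError("set UNSTRUCTURED_API_KEY before using a hosted API")
    return UnstructuredFileConverter(api_url=resolve_api_url(mode, serverless_url))
```

With the Docker container from the docs running, `build_converter("local")` would give a converter pointed at `http://localhost:8000/general/v0/general`. Because free and paid keys are not interchangeable, the key in the environment has to match the mode chosen.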
