Add support for CrateDB to LangChain LLM framework #1

Draft: wants to merge 28 commits into base: release-v0.3.4
Showing changes from 24 commits

Commits (28)
284c5d9
CrateDB vector: Add vector store support
amotl Sep 15, 2023
ba95bde
CrateDB vector: Add documentation
amotl Sep 15, 2023
00159ff
CrateDB loader: Add SQLAlchemy document loader
amotl Sep 16, 2023
473b66a
CrateDB loader: Add document loader support
amotl Sep 16, 2023
b9015c9
Community: Generalize `SQLChatMessageHistory` to improve code reusabi…
amotl Sep 17, 2023
8a0f3d6
CrateDB memory: Add conversational memory support
amotl Sep 17, 2023
3330b0d
CrateDB vector: Fix usage when only reading, and not storing
amotl Oct 27, 2023
38c2374
CrateDB vector: Unable to invoke `add_embeddings` without embeddings
amotl Oct 27, 2023
0f6adf9
CrateDB vector: Improve SQLAlchemy model factory
amotl Nov 20, 2023
2d30228
CrateDB vector: Fix cascading deletes
amotl Nov 20, 2023
9dfc828
CrateDB vector: Add CrateDBVectorSearchMultiCollection
amotl Nov 21, 2023
b72a06c
CrateDB vector: Improve SQLAlchemy data model query utility functions
amotl Nov 21, 2023
f8317fe
CrateDB vector: Improve testing when initialized without dimensionality
amotl Nov 21, 2023
53aee67
CrateDB vector: Use SA's `bulk_save_objects` method for inserting emb…
amotl Nov 21, 2023
70685ce
CrateDB vector: Test non-deterministic values by using pytest.approx
amotl Nov 22, 2023
ccd2a25
CrateDB vector: Fix initialization of vector dimensionality
amotl Nov 27, 2023
800ace6
CrateDB: Refactor to `langchain_community`
amotl Jan 18, 2024
b40c24f
CrateDB vector: Adjustments for updates to pgvector adapter
amotl Jan 18, 2024
cb06a66
CrateDB vector: Relax test constraint
amotl Jan 19, 2024
fa28b24
CrateDB loader: SQLAlchemyLoader has been superseded by SQLDatabaseLo…
amotl Jun 5, 2024
41ccacf
CrateDB: Migrate from `crate[sqlalchemy]` to `sqlalchemy-cratedb`
amotl Jun 10, 2024
3bc63a8
CrateDB: Stop using CrateDB Toolkit
amotl Jun 18, 2024
c561a95
CrateDB: Stop using local `FloatVector` implementation
amotl Jun 25, 2024
8b278a8
CrateDB: Format code. Satisfy linter and type checker. ruff + mypy
amotl Oct 24, 2024
41f6462
CrateDB: Remove adjustment to ConsistentFakeEmbeddings in langchain-core
amotl Oct 28, 2024
19a09ab
CrateDB: Refactor leftovers from langchain-core to langchain-community
amotl Oct 28, 2024
91da770
CrateDB: Remove documentation about SQLDatabaseLoader
amotl Oct 28, 2024
1faedfe
CrateDB: Remove leftovers in langchain-core
amotl Oct 28, 2024
3 changes: 2 additions & 1 deletion docs/docs/.gitignore
@@ -4,4 +4,5 @@ node_modules/

.docusaurus
.cache-loader
-docs/api
+docs/api
+example.sqlite
165 changes: 165 additions & 0 deletions docs/docs/how_to/sql_database.mdx
@@ -0,0 +1,165 @@
# SQLDatabaseLoader


## About

The `SQLDatabaseLoader` loads records from any database supported by
[SQLAlchemy]. See [SQLAlchemy dialects] for the full list of supported
SQL databases and dialects.

You can query with either plain SQL or an SQLAlchemy `Select` statement
object, if you are using SQLAlchemy Core or ORM.

You can select which columns to place into the document, which columns
to place into its metadata, which columns to use as a `source` attribute
in metadata, and whether to include the result row number and/or the SQL
query expression in the metadata.


## Example

This example uses PostgreSQL and the `psycopg2` driver.


### Prerequisites

```shell
psql postgresql://postgres@localhost/ --command "CREATE DATABASE testdrive;"
psql postgresql://postgres@localhost/testdrive < ./libs/langchain/tests/integration_tests/examples/mlb_teams_2012.sql
```


### Basic loading

```python
from langchain_community.document_loaders.sql_database import SQLDatabaseLoader
from pprint import pprint


loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={}),
Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={}),
Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={})]
```

</CodeOutputBlock>
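
The output above suggests that each row is rendered as one `column: value` line per column. The following is a minimal standard-library sketch of that mapping, not the loader's actual implementation. It assumes an in-memory SQLite table holding the three rows shown above instead of the PostgreSQL database, and the helper name `row_to_page_content` is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mlb_teams_2012 ("Team" TEXT, "Payroll (millions)" REAL, "Wins" INTEGER)'
)
conn.executemany(
    "INSERT INTO mlb_teams_2012 VALUES (?, ?, ?)",
    [("Nationals", 81.34, 98), ("Reds", 82.2, 97), ("Yankees", 197.96, 95)],
)


def row_to_page_content(columns, row):
    """Hypothetical helper: render one row as newline-separated `column: value` pairs."""
    return "\n".join(f"{name}: {value}" for name, value in zip(columns, row))


cursor = conn.execute("SELECT * FROM mlb_teams_2012 LIMIT 3;")
# Column names are the first item of each entry in `cursor.description`.
columns = [description[0] for description in cursor.description]
contents = [row_to_page_content(columns, row) for row in cursor.fetchall()]
```

Each entry of `contents` then corresponds to the `page_content` of one document in the listing above.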


## Enriching metadata

Use the `include_rownum_into_metadata` and `include_query_into_metadata` options to
add the result row number and the SQL query expression to the `metadata` dictionary.

Having the `query` in the metadata is useful when documents loaded from database
tables are used in chains that answer questions using the queries the documents
originated from.

```python
loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
    include_rownum_into_metadata=True,
    include_query_into_metadata=True,
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'row': 0, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}),
Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'row': 1, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}),
Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'row': 2, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'})]
```

</CodeOutputBlock>
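
To make the effect of the two flags concrete, here is a small plain-Python sketch of how such metadata could be assembled per row. It is an illustration under assumptions, not the loader's code: the helper `build_metadata` and its parameter names are hypothetical, while the `row` and `query` keys and the zero-based row numbers are taken from the output above.

```python
QUERY = "SELECT * FROM mlb_teams_2012 LIMIT 3;"

# Rows as they might come back from the query (values taken from the output above).
rows = [
    {"Team": "Nationals", "Payroll (millions)": 81.34, "Wins": 98},
    {"Team": "Reds", "Payroll (millions)": 82.2, "Wins": 97},
    {"Team": "Yankees", "Payroll (millions)": 197.96, "Wins": 95},
]


def build_metadata(rownum, query, include_rownum=False, include_query=False):
    """Hypothetical helper: collect the optional metadata entries for one row."""
    metadata = {}
    if include_rownum:
        metadata["row"] = rownum
    if include_query:
        metadata["query"] = query
    return metadata


# Row numbers start at zero, matching the output shown above.
metadata_per_row = [
    build_metadata(index, QUERY, include_rownum=True, include_query=True)
    for index, _row in enumerate(rows)
]
```

With both flags left at `False`, the sketch yields an empty dictionary, which matches the `metadata={}` seen in the basic loading example.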


## Customizing metadata

Use the `page_content_mapper` and `metadata_mapper` options to control which columns
go into the document content and which into the `metadata` dictionary. When no columns
are selected for the page content, all columns are used.

```python
import functools

row_to_content = functools.partial(
    SQLDatabaseLoader.page_content_default_mapper,
    column_names=["Payroll (millions)", "Wins"],
)
row_to_metadata = functools.partial(
    SQLDatabaseLoader.metadata_default_mapper,
    column_names=["Team"],
)

loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
    page_content_mapper=row_to_content,
    metadata_mapper=row_to_metadata,
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Payroll (millions): 81.34\nWins: 98', metadata={'Team': 'Nationals'}),
Document(page_content='Payroll (millions): 82.2\nWins: 97', metadata={'Team': 'Reds'}),
Document(page_content='Payroll (millions): 197.96\nWins: 95', metadata={'Team': 'Yankees'})]
```

</CodeOutputBlock>
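
The `functools.partial` pattern used above can be tried without a database. The sketch below uses two hypothetical stand-in mappers, `content_mapper` and `metadata_mapper`, that accept a row and an optional `column_names` argument. Their behavior is an assumption modeled on the output shown above, not the library's implementation.

```python
import functools


def content_mapper(row, column_names=None):
    """Hypothetical mapper: render the selected columns (all when None) as text."""
    names = column_names if column_names is not None else list(row)
    return "\n".join(f"{name}: {row[name]}" for name in names)


def metadata_mapper(row, column_names=None):
    """Hypothetical mapper: copy the selected columns into a metadata dict."""
    names = column_names if column_names is not None else []
    return {name: row[name] for name in names}


# Bind the column selection up front; the resulting callables take only the row.
row_to_content = functools.partial(
    content_mapper, column_names=["Payroll (millions)", "Wins"]
)
row_to_metadata = functools.partial(metadata_mapper, column_names=["Team"])

row = {"Team": "Nationals", "Payroll (millions)": 81.34, "Wins": 98}
page_content = row_to_content(row)
metadata = row_to_metadata(row)
```

Binding `column_names` ahead of time keeps the mapper interface down to a single row argument, so any callable with that shape could be passed in its place.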


## Specify column(s) to identify the document source

Use the `source_columns` option to specify the columns to use as a "source" for the
document created from each row. This is useful for identifying documents through
their metadata. Typically, you would use the primary key column(s) for that purpose.

```python
loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
    source_columns=["Team"],
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'source': 'Nationals'}),
Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'source': 'Reds'}),
Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'source': 'Yankees'})]
```

</CodeOutputBlock>
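
As a rough illustration of how a `source` value can be derived from row columns, consider the plain-Python sketch below. The helper `source_from_row` is hypothetical. The single-column result matches the output above, whereas joining several columns with a comma is purely an assumption of this sketch, since this guide does not show how the loader combines multiple source columns.

```python
def source_from_row(row, source_columns, separator=","):
    """Hypothetical helper: build a `source` value from the chosen columns.

    Assumption: multiple column values are joined with `separator`.
    """
    return separator.join(str(row[name]) for name in source_columns)


row = {"Team": "Nationals", "Payroll (millions)": 81.34, "Wins": 98}

# Single source column, as in the example above.
metadata = {"source": source_from_row(row, ["Team"])}

# Composite key; the joined format is an assumption of this sketch.
composite_source = source_from_row(row, ["Team", "Wins"])
```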


[SQLAlchemy]: https://www.sqlalchemy.org/
[SQLAlchemy dialects]: https://docs.sqlalchemy.org/en/20/dialects/