crate-workbench
diff --git a/docs/extras/integrations/document_loaders/sqlalchemy.ipynb b/docs/extras/integrations/document_loaders/sqlalchemy.ipynb
diff --git a/docs/snippets/modules/data_connection/document_loaders/how_to/sqlalchemy.mdx b/docs/snippets/modules/data_connection/document_loaders/how_to/sqlalchemy.mdx
@@ -0,0 +1,237 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# SQLAlchemy\n",
+    "\n",
+    "This notebook demonstrates how to load documents from an [SQLite] database,\n",
+    "using the [SQLAlchemy] document loader.\n",
+    "\n",
+    "It loads the result of a database query with one document per row.\n",
+    "\n",
+    "[SQLAlchemy]: https://www.sqlalchemy.org/\n",
+    "[SQLite]: https://sqlite.org/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Prerequisites"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "#!pip install langchain termsql"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Provide the input data as an SQLite database: write a small CSV file, then import it with `termsql`."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Overwriting example.csv\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%file example.csv\n",
+    "Team,Payroll\n",
+    "Nationals,81.34\n",
+    "Reds,82.20"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Nationals|81.34\r\n",
+      "Reds|82.2\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "!termsql --infile=example.csv --head --delimiter=\",\" --outfile=example.sqlite --table=payroll"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Usage"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders.sqlalchemy import SQLAlchemyLoader\n",
+    "from pprint import pprint\n",
+    "\n",
+    "loader = SQLAlchemyLoader(\n",
+    "    \"SELECT * FROM payroll\",\n",
+    "    url=\"sqlite:///example.sqlite\",\n",
+    ")\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Document(page_content='Team: Nationals\\nPayroll: 81.34', metadata={}),\n",
+      " Document(page_content='Team: Reds\\nPayroll: 82.2', metadata={})]\n"
+     ]
+    }
+   ],
+   "source": [
+    "pprint(documents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Specifying Which Columns are Content vs Metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = SQLAlchemyLoader(\n",
+    "    \"SELECT * FROM payroll\",\n",
+    "    url=\"sqlite:///example.sqlite\",\n",
+    "    page_content_columns=[\"Team\"],\n",
+    "    metadata_columns=[\"Payroll\"],\n",
+    ")\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Document(page_content='Team: Nationals', metadata={'Payroll': 81.34}),\n",
+      " Document(page_content='Team: Reds', metadata={'Payroll': 82.2})]\n"
+     ]
+    }
+   ],
+   "source": [
+    "pprint(documents)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Adding Source to Metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = SQLAlchemyLoader(\n",
+    "    \"SELECT * FROM payroll\",\n",
+    "    url=\"sqlite:///example.sqlite\",\n",
+    "    source_columns=[\"Team\"],\n",
+    ")\n",
+    "documents = loader.load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Document(page_content='Team: Nationals\\nPayroll: 81.34', metadata={'source': 'Nationals'}),\n",
+      " Document(page_content='Team: Reds\\nPayroll: 82.2', metadata={'source': 'Reds'})]\n"
+     ]
+    }
+   ],
+   "source": [
+    "pprint(documents)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
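The notebook's prerequisite cells rely on the `termsql` command-line tool to turn `example.csv` into `example.sqlite`. Where installing it is not an option, an equivalent table could probably be built with Python's standard library alone. The following is a minimal sketch under that assumption: the helper `csv_to_sqlite` and the column types (`TEXT`, `REAL`) are illustrative choices, not part of this PR, and the sketch uses an in-memory database.

```python
import csv
import io
import sqlite3

# Same data as the notebook's example.csv.
CSV_TEXT = "Team,Payroll\nNationals,81.34\nReds,82.20\n"


def csv_to_sqlite(csv_text, conn, table="payroll"):
    """Hypothetical helper: load two-column CSV text into an SQLite table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line holds the column names
    if header != ["Team", "Payroll"]:
        raise ValueError(f"unexpected header: {header}")
    rows = [(team, float(payroll)) for team, payroll in reader]
    # The column types are an assumption made for this sketch.
    conn.execute(f'CREATE TABLE "{table}" ("Team" TEXT, "Payroll" REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = csv_to_sqlite(CSV_TEXT, conn)
result = conn.execute(
    'SELECT "Team", "Payroll" FROM payroll ORDER BY rowid'
).fetchall()
print(result)
```

Passing `"example.sqlite"` instead of `":memory:"` to `sqlite3.connect` would write a file that the `sqlite:///example.sqlite` URL used in the notebook can open.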
@@ -0,0 +1,155 @@
+# SQLAlchemy
+
+
+## About
+
+The [SQLAlchemy] document loader loads records from any database that
+SQLAlchemy supports. See [SQLAlchemy dialects] for the full list.
+
+You can write the query either as a plain SQL string or, if you are using
+SQLAlchemy Core or ORM, as an SQLAlchemy `Select` statement object.
+
+You can select which columns to place into the document content, which
+columns to place into its metadata, which columns to use as the `source`
+attribute in the metadata, and whether to add the result row number and/or
+the SQL query expression to the metadata.
+
+
+## Example
+
+This example uses PostgreSQL and the `psycopg2` driver.
+
+
+### Prerequisites
+
+```shell
+psql postgresql://postgres@localhost/ --command "CREATE DATABASE testdrive;"
+psql postgresql://postgres@localhost/testdrive < ./libs/langchain/tests/integration_tests/examples/mlb_teams_2012.sql
+```
+
+
+### Basic loading
+
+```python
+from langchain.document_loaders.sqlalchemy import SQLAlchemyLoader
+from pprint import pprint
+
+
+loader = SQLAlchemyLoader(
+    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
+    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
+)
+docs = loader.load()
+```
+
+```python
+pprint(docs)
+```
+
+<CodeOutputBlock lang="python">
+
+```
+[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={}),
+ Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={}),
+ Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={})]
+```
+
+</CodeOutputBlock>
+
+
+## Enriching metadata
+
+Use the `include_rownum_into_metadata` and `include_query_into_metadata` options to
+add the zero-based row number (`row`) and the SQL query (`query`) to the `metadata` dictionary.
+
+Having the `query` within metadata is useful when using documents loaded from
+database tables for chains that answer questions using their origin queries.
+
+```python
+loader = SQLAlchemyLoader(
+    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
+    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
+    include_rownum_into_metadata=True,
+    include_query_into_metadata=True,
+)
+docs = loader.load()
+```
+
+```python
+pprint(docs)
+```
+
+<CodeOutputBlock lang="python">
+
+```
+[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'row': 0, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}),
+ Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'row': 1, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}),
+ Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'row': 2, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'})]
+```
+
+</CodeOutputBlock>
+
+
+## Customizing metadata
+
+Use the `page_content_columns` and `metadata_columns` options to choose which columns
+go into the document content and which go into the `metadata` dictionary. When
+`page_content_columns` is empty, all columns are used for the content.
+
+```python
+loader = SQLAlchemyLoader(
+    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
+    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
+    page_content_columns=["Payroll (millions)", "Wins"],
+    metadata_columns=["Team"],
+)
+docs = loader.load()
+```
+
+```python
+pprint(docs)
+```
+
+<CodeOutputBlock lang="python">
+
+```
+[Document(page_content='Payroll (millions): 81.34\nWins: 98', metadata={'Team': 'Nationals'}),
+ Document(page_content='Payroll (millions): 82.2\nWins: 97', metadata={'Team': 'Reds'}),
+ Document(page_content='Payroll (millions): 197.96\nWins: 95', metadata={'Team': 'Yankees'})]
+```
+
+</CodeOutputBlock>
+
+
+## Specify column(s) to identify the document source
+
+Use the `source_columns` option to specify the columns to use as a "source" for the
+document created from each row. This is useful for identifying documents through
+their metadata. Typically, you may use the primary key column(s) for that purpose.
+
+```python
+loader = SQLAlchemyLoader(
+    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
+    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
+    source_columns=["Team"],
+)
+docs = loader.load()
+```
+
+```python
+pprint(docs)
+```
+
+<CodeOutputBlock lang="python">
+
+```
+[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'source': 'Nationals'}),
+ Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'source': 'Reds'}),
+ Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'source': 'Yankees'})]
+```
+
+</CodeOutputBlock>
+
+
+[SQLAlchemy]: https://www.sqlalchemy.org/
+[SQLAlchemy dialects]: https://docs.sqlalchemy.org/en/20/dialects/
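The options documented in `sqlalchemy.mdx` can be read as a mapping from one result row to one document. The sketch below mimics that mapping with the standard library only, so it runs without PostgreSQL or LangChain. It is not the loader's implementation: the `rows_to_docs` helper, the plain-dict documents, the shortened parameter names `include_rownum` and `include_query`, and the `-` separator for multi-column sources are all assumptions made for illustration. The sample rows are copied from the outputs shown in the documentation.

```python
import sqlite3

QUERY = "SELECT * FROM mlb_teams_2012 LIMIT 3;"


def rows_to_docs(conn, query, page_content_columns=None, metadata_columns=None,
                 source_columns=None, include_rownum=False, include_query=False):
    """Hypothetical helper: turn each result row into a document-like dict."""
    cursor = conn.execute(query)
    names = [col[0] for col in cursor.description]
    # An empty selection means "use every column for the content".
    content_columns = page_content_columns or names
    docs = []
    for rownum, values in enumerate(cursor):
        row = dict(zip(names, values))
        text = "\n".join(f"{name}: {row[name]}" for name in content_columns)
        metadata = {}
        if include_rownum:
            metadata["row"] = rownum  # zero-based, as in the documented output
        if include_query:
            metadata["query"] = query
        for name in metadata_columns or []:
            metadata[name] = row[name]
        if source_columns:
            # Joining several columns with "-" is an assumption of this sketch.
            metadata["source"] = "-".join(str(row[name]) for name in source_columns)
        docs.append({"page_content": text, "metadata": metadata})
    return docs


# In-memory stand-in for the PostgreSQL table used in the documentation.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mlb_teams_2012 '
    '("Team" TEXT, "Payroll (millions)" REAL, "Wins" INTEGER)'
)
conn.executemany(
    "INSERT INTO mlb_teams_2012 VALUES (?, ?, ?)",
    [("Nationals", 81.34, 98), ("Reds", 82.2, 97), ("Yankees", 197.96, 95)],
)

docs = rows_to_docs(
    conn,
    QUERY,
    page_content_columns=["Payroll (millions)", "Wins"],
    metadata_columns=["Team"],
)
for doc in docs:
    print(doc)
```

Calling the helper with `source_columns=["Team"]` shows why a list is the safer argument shape: a bare string would be iterated character by character in a sketch like this one.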