This document lists the SQL surface currently supported by the lance DuckDB extension, with short examples.
INSTALL lance FROM community;
LOAD lance;For local development builds, load the extension artifact directly:
LOAD 'build/release/extension/lance/lance.duckdb_extension';Query a dataset by selecting from its URI directly:
SELECT *
FROM 'path/to/dataset.lance'
LIMIT 10;-- Search a vector column, returning distances in `_distance` (smaller is closer)
SELECT id, label, _distance
FROM lance_vector_search(
'path/to/dataset.lance',
'vec',
[0.1, 0.2, 0.3, 0.4]::FLOAT[4],
k = 5,
use_index = true,
nprobs = 4,
refine_factor = 2,
prefilter = true
)
ORDER BY _distance ASC;Signature: lance_vector_search(uri, vector_column, query_vector, ...)
Positional arguments:
uri(VARCHAR): Dataset root path or object store URI (e.g.s3://...).vector_column(VARCHAR): Vector column name.query_vector(FLOAT[dim] or DOUBLE[dim], preferred): Query vector (must be non-empty; values are cast to float32).FLOAT[]/DOUBLE[]are also accepted.
Named parameters:
k(BIGINT, default10): Number of results to return. Must be > 0.use_index(BOOLEAN, defaulttrue): Iftrue, allow ANN index usage when available.nprobs(BIGINT, optional): Number of IVF partitions to probe when using a vector index. Must be > 0. Only affects IVF-based vector indices.refine_factor(BIGINT, optional): Over-fetch factor for re-ranking using original vectors. Must be > 0. A value of1still enables re-ranking.prefilter(BOOLEAN, defaultfalse): Iftrue, filters are applied before top-k selection.explain_verbose(BOOLEAN, defaultfalse): Emit a more verbose Lance plan inEXPLAINoutput.
Output:
- Dataset columns plus
_distance(smaller is closer).
Filter semantics:
- If
prefilter=false, filter pushdown is best-effort. If pushdown fails, the query is retried without pushed filters and DuckDB applies filters for correctness. - If
prefilter=true, prefilterable filters must be pushed down, otherwise the query fails with an error.
-- Search a text column, returning BM25-like scores in `_score` (larger is better)
SELECT id, text, _score
FROM lance_fts('path/to/dataset.lance', 'text', 'puppy', k = 10, prefilter = true)
ORDER BY _score DESC;Signature: lance_fts(uri, text_column, query, ...)
Positional arguments:
uri(VARCHAR): Dataset root path or object store URI (e.g.s3://...).text_column(VARCHAR): Text column name.query(VARCHAR): Query string.
Named parameters:
k(BIGINT, default10): Number of results to return. Must be > 0.prefilter(BOOLEAN, defaultfalse): Iftrue, filters are applied before top-k selection.
Output:
- Dataset columns plus
_score(larger is better).
Filter semantics:
- If
prefilter=false, filter pushdown is best-effort. If pushdown fails, the query is retried without pushed filters and DuckDB applies filters for correctness. - If
prefilter=true, prefilterable filters must be pushed down, otherwise the query fails with an error.
-- Combine vector and text scores, returning `_hybrid_score` (larger is better)
SELECT id, _hybrid_score, _distance, _score
FROM lance_hybrid_search(
'path/to/dataset.lance',
'vec',
[0.1, 0.2, 0.3, 0.4]::FLOAT[4],
'text',
'puppy',
k = 10,
prefilter = false,
alpha = 0.5,
oversample_factor = 4
)
ORDER BY _hybrid_score DESC;Signature: lance_hybrid_search(uri, vector_column, query_vector, text_column, query, ...)
Positional arguments:
uri(VARCHAR): Dataset root path or object store URI (e.g.s3://...).vector_column(VARCHAR): Vector column name.query_vector(FLOAT[dim] or DOUBLE[dim], preferred): Query vector (must be non-empty; values are cast to float32).FLOAT[]/DOUBLE[]are also accepted.text_column(VARCHAR): Text column name.query(VARCHAR): Query string.
Named parameters:
k(BIGINT, default10): Number of results to return. Must be > 0.prefilter(BOOLEAN, defaultfalse): Iftrue, filters are applied before top-k selection.alpha(FLOAT, default0.5): Vector/text mixing weight. Larger values weigh vector similarity more heavily.oversample_factor(INTEGER, default4): Oversample factor for candidate generation. If provided, must be > 0.
Output:
- Dataset columns plus
_hybrid_score(larger is better),_distance, and_score.
Filter semantics:
- If
prefilter=false, filter pushdown is best-effort. If pushdown fails, the query is retried without pushed filters and DuckDB applies filters for correctness. - If
prefilter=true, prefilterable filters must be pushed down, otherwise the query fails with an error.
Namespaces let you treat a directory (or a remote namespace service) as a database catalog and access datasets as tables.
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
-- A dataset stored at path/to/dir/my_table.lance becomes ns.main.my_table
SELECT count(*) FROM ns.main.my_table;
SHOW TABLES FROM ns.main;
DETACH ns;ATTACH 'namespace_id' AS ns (TYPE LANCE, ENDPOINT 'http://127.0.0.1:2333');
SHOW TABLES FROM ns.main;
SELECT count(*) FROM ns.main.some_table;
DETACH ns;Write a new dataset (or overwrite an existing one):
COPY (
SELECT 1::BIGINT AS id, 'a'::VARCHAR AS s
UNION ALL
SELECT 2::BIGINT AS id, 'b'::VARCHAR AS s
) TO 'path/to/out.lance' (FORMAT lance, mode 'overwrite');Append to an existing dataset:
COPY (SELECT 3::BIGINT AS id, 'c'::VARCHAR AS s)
TO 'path/to/out.lance' (FORMAT lance, mode 'append');Create an empty dataset (schema only):
COPY (
SELECT 1::BIGINT AS id, 'x'::VARCHAR AS s
LIMIT 0
) TO 'path/to/empty.lance' (FORMAT lance, mode 'overwrite', write_empty_file true);Notes:
modesupports at leastoverwriteandappend.write_empty_filecontrols whether an empty dataset is materialized when the input produces zero rows.
When a directory is attached as a namespace, CREATE TABLE and CREATE TABLE AS SELECT write datasets into the namespace root.
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
-- Schema-only (creates an empty dataset)
CREATE OR REPLACE TABLE ns.main.my_empty (id BIGINT, s VARCHAR);
-- CTAS (writes query results)
CREATE OR REPLACE TABLE ns.main.my_dataset AS
SELECT 1::BIGINT AS id, 'a'::VARCHAR AS s
UNION ALL
SELECT 2::BIGINT AS id, 'b'::VARCHAR AS s;
SELECT count(*) FROM ns.main.my_dataset;
DETACH ns;These statements apply to tables inside an attached namespace (e.g. ns.main.my_table).
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
INSERT INTO ns.main.my_table VALUES (3::BIGINT, 'c'::VARCHAR);
INSERT INTO ns.main.my_table SELECT 4::BIGINT AS id, 'd'::VARCHAR AS s;
DETACH ns;ATTACH 'path/to/dir' AS ns (TYPE LANCE);
UPDATE ns.main.my_table SET s = 'bb' WHERE id = 2;
UPDATE ns.main.my_table SET s = 'x'; -- update all rows
UPDATE ns.main.my_table SET s = DEFAULT WHERE id = 2;
DETACH ns;ATTACH 'path/to/dir' AS ns (TYPE LANCE);
DELETE FROM ns.main.my_table WHERE id <= 2;
DELETE FROM ns.main.my_table; -- delete all rows
DETACH ns;MERGE is supported on Lance tables, including:
WHEN MATCHEDwithUPDATE/DELETE/DO NOTHING/ERRORWHEN NOT MATCHED(BY TARGET) withINSERT/DO NOTHINGWHEN NOT MATCHED BY SOURCEwithUPDATE/DELETE/DO NOTHINGRETURNING(including DuckDB'smerge_actionvirtual column)
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
MERGE INTO ns.main.my_table AS t
USING (
SELECT 2::BIGINT AS id, 'bb'::VARCHAR AS s
UNION ALL
SELECT 3::BIGINT AS id, 'c'::VARCHAR AS s
) AS src
ON t.id = src.id
WHEN MATCHED THEN UPDATE SET s = src.s
WHEN NOT MATCHED THEN INSERT (id, s) VALUES (src.id, src.s);
MERGE INTO ns.main.my_table AS t
USING (SELECT 3::BIGINT AS id) AS src
ON t.id = src.id
WHEN MATCHED THEN DELETE
RETURNING merge_action, id, s;
DETACH ns;Notes:
- Mutating the same target row more than once in a single
MERGEraises a constraint error. MERGEuses a single Lance transaction per statement.
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
TRUNCATE TABLE ns.main.my_table;
DETACH ns;ATTACH 'path/to/dir' AS ns (TYPE LANCE);
DROP TABLE ns.main.my_table;
DROP TABLE IF EXISTS ns.main.my_table;
DETACH ns;Notes:
- Directory namespaces reject unsafe dataset names that attempt path traversal.
Schema evolution:
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
ALTER TABLE ns.main.my_table
ADD COLUMN age_plus_one BIGINT DEFAULT (age + 1);
ALTER TABLE ns.main.my_table RENAME COLUMN score TO score2;
ALTER TABLE ns.main.my_table ALTER COLUMN age TYPE BIGINT;
ALTER TABLE ns.main.my_table DROP COLUMN score2;
DETACH ns;Table and column comments:
ATTACH 'path/to/dir' AS ns (TYPE LANCE);
COMMENT ON TABLE ns.main.my_table IS 'table comment';
COMMENT ON COLUMN ns.main.my_table.age_plus_one IS 'col comment';
DETACH ns;Notes:
ALTER COLUMN ... SET NOT NULLis currently not supported.
The extension supports index DDL targeting a dataset path.
Vector ANN index:
CREATE INDEX vec_idx ON 'path/to/dataset.lance' (vec)
USING IVF_FLAT WITH (num_partitions=1, metric_type='l2');Scalar index:
CREATE INDEX label_idx ON 'path/to/dataset.lance' (label)
USING BTREE;Full-text index:
CREATE INDEX text_idx ON 'path/to/dataset.lance' (text)
USING INVERTED;Notes:
CREATE INDEXcurrently supports a single column.
SHOW INDEXES ON 'path/to/dataset.lance';DROP INDEX vec_idx ON 'path/to/dataset.lance';ALTER INDEX vec_idx
ON 'path/to/dataset.lance'
OPTIMIZE WITH (mode = 'append');Supported mode values:
appendmerge(num_indices_to_mergeis supported)retrain
Maintenance statements accept either:
- A dataset path string literal (for example
'path/to/dataset.lance') - An attached Lance table name (for example
ns.main.my_table)
OPTIMIZE 'path/to/dataset.lance' WITH (
target_rows_per_fragment = 1048576,
max_rows_per_group = 1024,
max_bytes_per_file = 0,
materialize_deletions = true,
materialize_deletions_threshold = 0.1,
num_threads = 0,
batch_size = 0,
defer_index_remap = false
);Returns:
Operation(compact)TargetMetricsJSON(compaction metrics)
VACUUM LANCE 'path/to/dataset.lance' WITH (
older_than_seconds = 1209600,
delete_unverified = false,
error_if_tagged_old_versions = true,
retain_n_versions = 3
);Returns:
Operation(cleanup)TargetMetricsJSON(cleanup metrics such as removed bytes / versions)
ALTER INDEX vec_idx
ON 'path/to/dataset.lance'
OPTIMIZE WITH (
mode = 'merge',
num_indices_to_merge = 4
);Supported mode values:
appendmerge(num_indices_to_mergeis supported)retrain
Returns:
Operation(optimize_index)TargetMetricsJSON
ALTER TABLE 'path/to/dataset.lance'
SET AUTO_CLEANUP WITH (interval = 1, older_than = '1h', retain_versions = 3);
ALTER TABLE 'path/to/dataset.lance' UNSET AUTO_CLEANUP;
SHOW MAINTENANCE ON 'path/to/dataset.lance';SHOW MAINTENANCE returns key/value rows including:
enabledinterval(if configured)older_than(if configured)retain_versions(if configured)