This repository was archived by the owner on Mar 25, 2026. It is now read-only.
feat(vectors): add pgvector embeddings, /vectors page, and bdp-embed Python CLI #1
Open
sebastianstupak wants to merge 41 commits into main from
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…olumes
- Terraform IaC: Hetzner VPS (cx22), persistent primary IP, data volume (prevent_destroy), Storage Box for restic backups, Cloudflare DNS
- Cloud-init: Docker, Dokploy install, UFW, restic cron, /etc/dokploy symlink for Let's Encrypt cert persistence
- docker-compose: remove standalone traefik, add MinIO (replaces OVH S3), postgres bind mounts
- xtask infra: 16 commands (bootstrap, plan, apply, ssh, status, post-deploy, backup-*, logs, update, etc.)
- infrastructure/README.md updated for the new setup
- .secrets.example: use plain key=value format (no TF_VAR_ prefix)
- infra.rs load_env_preamble: parse .secrets and export each key both as key=val (direct) and TF_VAR_key=val (for Terraform)
- infra.rs ssh_key_path: read the lowercase ssh_key_path= key
- bootstrap: reference the lowercase $ssh_key_path var
- Add .github/workflows/infrastructure.yml for Hetzner Terraform CI with plan/apply/destroy via GitHub Environment secrets (TF_VAR_*)
- Remove the old OVH infrastructure.yml.disabled (superseded)

No .tfvars files: all Terraform vars come from TF_VAR_* env vars. GitHub CI stores secrets as TF_VAR_<key> in the production environment.
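A minimal Python sketch of the `.secrets` to shell-preamble mapping described above. It is a hypothetical re-implementation for illustration, not the Rust `load_env_preamble` in `infra.rs`; `parse_secrets` and `env_preamble` are invented names. It lowercases each key before adding the `TF_VAR_` prefix, as a later commit in this PR does (a no-op for keys that are already lowercase). Terraform reads an environment variable `TF_VAR_<name>` as the value of input variable `<name>`.

```python
from __future__ import annotations

import shlex


def parse_secrets(text: str) -> dict[str, str]:
    """Parse plain key=value lines, skipping blank lines and # comments."""
    secrets: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip()
    return secrets


def env_preamble(secrets: dict[str, str]) -> str:
    """Emit two exports per key: the key as-is and TF_VAR_<lowercased key>."""
    lines = []
    for key, value in secrets.items():
        quoted = shlex.quote(value)  # safe to source/eval even with spaces
        lines.append(f"export {key}={quoted}")
        lines.append(f"export TF_VAR_{key.lower()}={quoted}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "# Hetzner API token\nHCLOUD_TOKEN=abc123\n"
    print(env_preamble(parse_secrets(sample)))
```

Values go through `shlex.quote`, so secrets containing spaces or shell metacharacters survive being sourced as a preamble.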
Full design for pgvector-based semantic embeddings across all BDP bioinformatics registry entries, WizMap-style quadtree tile visualization page using regl-scatterplot, and semantic search for MCP integration.
…ledge graph

Full architecture spec covering a deck.gl v9 tile-based streaming renderer, FlatBuffers binary protocol, Rust CQRS tile server with PostGIS spatial indexing, offline Louvain+ForceAtlas2 layout pipeline, extensible entity/edge type registry pre-seeded with all future bioinformatics domains, and a 9-phase ingestion roadmap.
- .secrets.example: all keys now CAPITALIZED_LIKE_THIS (standard .env style)
- load_env_preamble: lowercases each key before adding the TF_VAR_ prefix so Terraform vars match (HCLOUD_TOKEN -> TF_VAR_hcloud_token)
- ssh_key_path: reads SSH_KEY_PATH (uppercase)
- bootstrap: prints a ready-to-paste SSH_PUBLIC_KEY= line for existing keys
- Add `cargo xtask infra gen-secrets`: generates all random secrets (passwords + restic passphrase) and prints the remaining manual steps
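The commit describes `gen-secrets` only as generating "all random secrets". A Python sketch of the same idea, assuming alphanumeric secrets and the hypothetical key names in `GENERATED_KEYS` (the real list is whatever `.secrets.example` defines); it is not the xtask implementation.

```python
from __future__ import annotations

import secrets
import string

# Hypothetical key names for illustration; the real set comes from .secrets.example.
GENERATED_KEYS = ("POSTGRES_PASSWORD", "MINIO_ROOT_PASSWORD", "RESTIC_PASSWORD")
ALPHABET = string.ascii_letters + string.digits


def random_secret(length: int = 32) -> str:
    """Cryptographically random alphanumeric string (uses the OS CSPRNG)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def gen_secrets(keys=GENERATED_KEYS, length: int = 32) -> dict[str, str]:
    """One fresh secret per key."""
    return {key: random_secret(length) for key in keys}


if __name__ == "__main__":
    for key, value in gen_secrets().items():
        print(f"{key}={value}")
    # Values that cannot be generated locally still need a manual step.
    print("# Fill in manually: HCLOUD_TOKEN, SSH_PUBLIC_KEY")
```

Restricting the alphabet to letters and digits means the values need no quoting in a `KEY=value` file or a shell, while 32 characters still carry about 190 bits of entropy.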
…tors page
…ental writes

- Add db.py with a psycopg async connection helper (get_conn context manager)
- Implement embed.py with an embed command that:
  - Fetches unembedded registry entries from the database
  - Batches entries and calls the OpenAI text-embedding-3-small API
  - Implements exponential backoff for rate limiting
  - Truncates text to 32k chars for safety
  - Writes vectors to the entry_embeddings table with an upsert
  - Shows progress with tqdm and user-friendly messages
- Update cli.py to import and register the embed subcommand
- Supports DATABASE_URL and OPENAI_API_KEY from the environment
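The batching, truncation, and retry behaviour listed for `embed.py` can be sketched as follows. This is not the PR's code: `RateLimited` is a hypothetical stand-in for the client's rate-limit exception (in the v1 `openai` Python SDK that role is played by `openai.RateLimitError`), `embed_batch` is any callable mapping a list of strings to a list of vectors, and the batch size, retry count, and upsert column names are assumptions.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
MAX_CHARS = 32_000  # the 32k-character safety cap from the commit message


class RateLimited(Exception):
    """Hypothetical stand-in for the API client's rate-limit error."""


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def truncate(text: str, limit: int = MAX_CHARS) -> str:
    return text[:limit]


def with_backoff(call: Callable[[], T], max_retries: int = 5, base: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, doubling the delay each attempt plus jitter."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt >= max_retries:
                raise  # give up and surface the last error
            delay = base * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))
            attempt += 1


def embed_all(entries, embed_batch, batch_size: int = 100, **backoff_kwargs):
    """[(entry_id, text)] -> [(entry_id, vector)], one API call per batch.

    Each batch would then be upserted, e.g. (column names assumed):
      INSERT INTO entry_embeddings (entry_id, embedding) VALUES (%s, %s)
      ON CONFLICT (entry_id) DO UPDATE SET embedding = EXCLUDED.embedding
    """
    rows = []
    for batch in batched(entries, batch_size):
        texts = [truncate(text) for _, text in batch]
        vectors = with_backoff(lambda: embed_batch(texts), **backoff_kwargs)
        rows.extend(zip([entry_id for entry_id, _ in batch], vectors))
    return rows
```

Injecting `sleep` keeps the backoff testable without real delays, and the random jitter spreads out retries when several workers hit the rate limit at the same moment.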
…pload

Implement quadtree tile generation from 2D point projections with:
- Vectorized cell assignment for O(N) performance
- Adaptive downsampling (fewer points at lower zoom levels)
- Multi-level tile generation (zoom 0-14 by default)
- S3/MinIO upload with progress tracking
- Database status update on completion

Includes comprehensive unit tests for tile key generation, point filtering, and quadtree building with progressive downsampling verification. Uncomment the tiles import in cli.py to register the new subcommand.
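A pure-Python sketch of multi-level quadtree tiling with per-tile downsampling. The commit describes a vectorized implementation; this version loops for readability. The `z/x/y` key layout matches the tile path used by the `get_tile` query, while `max_per_tile`, the shared shuffle, and the aspect-preserving normalization are illustrative assumptions.

```python
from __future__ import annotations

import random
from collections import defaultdict


def normalize(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Rescale points into [0, 1] x [0, 1] using one span (keeps aspect ratio)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1.0
    return [((x - x0) / span, (y - y0) / span) for x, y in points]


def tile_xy(u: float, v: float, z: int) -> tuple[int, int]:
    """Tile indices of a normalized point at zoom z (2**z tiles per axis)."""
    n = 1 << z
    # Clamp so that u == 1.0 or v == 1.0 lands in the last tile, not tile n.
    return min(int(u * n), n - 1), min(int(v * n), n - 1)


def build_quadtree(points, max_zoom: int = 14, max_per_tile: int = 1000,
                   seed: int = 0) -> dict[str, list[int]]:
    """Map "z/x/y" keys to the indices of the points kept in that tile."""
    norm = normalize(points)
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # same random order at every zoom level
    tiles: dict[str, list[int]] = {}
    for z in range(max_zoom + 1):
        buckets: dict[tuple[int, int], list[int]] = defaultdict(list)
        for i in order:
            bucket = buckets[tile_xy(*norm[i], z)]
            # Per-tile cap: coarse tiles cover more area, so they keep a sample.
            if len(bucket) < max_per_tile:
                bucket.append(i)
        for (x, y), kept in buckets.items():
            tiles[f"{z}/{x}/{y}"] = kept
    return tiles
```

Because every zoom level walks the same shuffled order, a point kept in a coarse tile is always kept in its finer child tiles, so points never vanish as the user zooms in.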
Introduces the vectors feature module with:
- GetVectorStatsQuery / VectorStatsResponse following the CQRS pattern (implements Request<Result<…>> and crate::cqrs::middleware::Query)
- Live counts from registry_entries and entry_embeddings
- Most-recent row from vector_projection_runs
- queries/mod.rs with only get_stats (semantic_search, get_neighbors, get_tile added in Tasks 9-10)
- vectors/mod.rs with a stub comment for routes (added in Task 10)
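The query/handler split follows a standard CQRS mediator pattern. The Rust code expresses it with `Request<Result<…>>` and `crate::cqrs::middleware::Query`; the Python analogue below only illustrates the shape and is not a translation. The response field names, `Mediator`, and the `db` methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Type


@dataclass(frozen=True)
class GetVectorStatsQuery:
    """Parameterless query: 'give me the current vector stats'."""


@dataclass
class VectorStatsResponse:
    total_entries: int          # live count from registry_entries
    embedded_entries: int       # live count from entry_embeddings
    latest_run: Optional[dict]  # most recent vector_projection_runs row, if any

    @property
    def coverage(self) -> float:
        return self.embedded_entries / self.total_entries if self.total_entries else 0.0


class Mediator:
    """Routes each query type to exactly one registered handler."""

    def __init__(self) -> None:
        self._handlers: Dict[Type, Callable[[Any], Any]] = {}

    def register(self, query_type: Type, handler: Callable[[Any], Any]) -> None:
        if query_type in self._handlers:
            raise ValueError(f"handler already registered for {query_type.__name__}")
        self._handlers[query_type] = handler

    def send(self, query: Any) -> Any:
        # KeyError if no handler was registered for this query type.
        return self._handlers[type(query)](query)


def make_stats_handler(db):
    """`db` is a hypothetical read-only source exposing three count/lookup methods."""
    def handle(_: GetVectorStatsQuery) -> VectorStatsResponse:
        return VectorStatsResponse(
            total_entries=db.count_registry_entries(),
            embedded_entries=db.count_embeddings(),
            latest_run=db.latest_projection_run(),
        )
    return handle
```

Rejecting duplicate registrations mirrors the one-handler-per-query rule of a mediator, and keeping the handler free of HTTP concerns lets routes stay thin.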
Implements SemanticSearchQuery (OpenAI embed → pgvector KNN with moka in-process cache) and GetNeighborsQuery (seed-vector KNN excluding self). Both follow the CQRS Query pattern via mediator trait impls.
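Both queries reduce to k-nearest-neighbour search over cosine distance. An illustrative Python version with exact (brute-force) KNN, where `functools.lru_cache` stands in for the role Moka plays in the Rust service. `VectorSearch`, `embed`, and the in-memory `index` dict are assumptions; a real deployment pushes the KNN into PostgreSQL instead (e.g. `ORDER BY embedding <=> $1 LIMIT k`, with an assumed `embedding` column).

```python
from __future__ import annotations

import math
from functools import lru_cache
from typing import Callable, Dict, Hashable, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat zero vectors as unrelated instead of dividing by zero
    return 1.0 - dot / (na * nb)


def knn(query: Sequence[float], index: Dict[Hashable, Sequence[float]], k: int,
        exclude: Optional[Hashable] = None) -> list:
    """Exact k nearest neighbours; an HNSW index approximates this ordering."""
    scored = sorted(
        (cosine_distance(query, vec), entry_id)
        for entry_id, vec in index.items()
        if entry_id != exclude
    )
    return [entry_id for _, entry_id in scored[:k]]


class VectorSearch:
    def __init__(self, embed: Callable[[str], Sequence[float]],
                 index: Dict[Hashable, Sequence[float]], cache_size: int = 1024):
        # In-process cache: repeated query strings skip the embedding API call.
        self._embed = lru_cache(maxsize=cache_size)(lambda q: tuple(embed(q)))
        self._index = index

    def semantic_search(self, q: str, k: int = 10) -> list:
        return knn(self._embed(q.strip()), self._index, k)

    def neighbors(self, entry_id: Hashable, k: int = 10) -> list:
        # Seed-vector KNN: query with the entry's own vector, excluding itself.
        return knn(self._index[entry_id], self._index, k, exclude=entry_id)
```

Caching the query embedding rather than the result list keeps cached entries valid even when new entries are embedded between searches.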
…handlers in mediator

- Create get_tile.rs query: fetches tile bytes from S3 at vectors/tiles/{run_id}/{z}/{x}/{y}.json using storage.download()
- Create routes.rs: mounts /stats, /search, /:entry_id/neighbors, /tiles/:run_id/:z/:x/:y with proper error mapping and cache headers
- Register 4 vector handlers in cqrs/mod.rs (get_stats, semantic_search, get_neighbors, get_tile)
- Add the vectors module to features/mod.rs and mount it at /vectors
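The tile route has to turn untrusted path parameters into an object key. A Python sketch of that validation, using the key layout from the commit message; `MAX_ZOOM`, the run-id alphabet, and the header values are assumptions, since the commit only says "proper error mapping and cache headers".

```python
import re

TILE_KEY = "vectors/tiles/{run_id}/{z}/{x}/{y}.json"  # layout from the commit message
MAX_ZOOM = 14  # assumed to match the tile generator's default zoom range
RUN_ID = re.compile(r"[A-Za-z0-9_-]+")  # hypothetical run-id alphabet (covers UUIDs)


def tile_object_key(run_id: str, z: int, x: int, y: int) -> str:
    """Validate route params, then build the object key to download."""
    if not RUN_ID.fullmatch(run_id):  # rejects "../" and other path tricks
        raise ValueError("invalid run_id")
    if not 0 <= z <= MAX_ZOOM:
        raise ValueError("zoom out of range")
    n = 1 << z  # quadtree level z has 2**z tiles per axis
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("tile index out of range for this zoom")
    return TILE_KEY.format(run_id=run_id, z=z, x=x, y=y)


def tile_headers() -> dict:
    """Response headers; assumes a run's tiles are never rewritten once uploaded."""
    return {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=31536000, immutable",
    }
```

A handler could map `ValueError` to HTTP 400 and a missing object to 404. Long-lived `immutable` caching is only safe if each run gets a new `run_id` rather than overwriting existing tiles.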
Add VectorSidebar component showing point metadata and nearest neighbors, VectorSearchBar with debounced semantic search and centroid fly-to, and a /vectors nav link in the header following the existing icon+text pattern.
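The sidebar and search bar are React components; the two pieces of logic they rely on can be modeled in Python for clarity. `debounce_events` replays a keystroke timeline through a trailing-edge debounce, and `centroid` computes a fly-to target as the mean position of the hits. Both function names, the wait time, and the mean-based centroid are assumptions; the commit specifies none of them.

```python
from typing import List, Optional, Sequence, Tuple


def debounce_events(events: Sequence[Tuple[int, str]], wait_ms: int) -> List[str]:
    """Queries a trailing-edge debounce would send for (timestamp_ms, query) keystrokes.

    A query fires only if no further keystroke arrives within wait_ms of it.
    """
    fired = []
    for i, (t, q) in enumerate(events):
        next_t = events[i + 1][0] if i + 1 < len(events) else None
        if next_t is None or next_t - t >= wait_ms:
            fired.append(q)
    return fired


def centroid(points: Sequence[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Mean (x, y) of the search hits, or None when there are no hits."""
    if not points:
        return None
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

With a 300 ms wait, typing "r", "ri", "rib" in quick succession, pausing, then finishing "ribosome" sends only two search requests instead of five.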
…react
…ous notes
…isabled harness
Summary
- `entry_embeddings` (halfvec(512) + HNSW index), `entry_projections` (pre-computed 2D coords), and `vector_projection_runs` (pipeline tracking)
- Python CLI (`tools/bdp-embed`): `embed` (OpenAI → `entry_embeddings`), `project` (landmark UMAP → `entry_projections` via MinIO), and `tiles` (quadtree WizMap tiles → MinIO) subcommands
- `get_stats`, `semantic_search` (Moka-cached embeddings), `get_neighbors` (KNN), `get_tile` (MinIO tile proxy); all registered at `/api/v1/vectors/`
- `/vectors` page with regl-scatterplot WebGL canvas, tile-based loading, source-type legend, sidebar (neighbors), search bar (semantic), and header nav link

Test Plan
- `cargo xtask db migrate`, then `cargo xtask sqlx prepare` to generate SQLx metadata for the new queries
- `cd tools/bdp-embed && pip install -e .`
- `bdp-embed embed --db-url $DATABASE_URL --openai-key $OPENAI_API_KEY` to populate embeddings
- `bdp-embed project`, then `bdp-embed tiles` to generate the projection and tiles
- `GET /api/v1/vectors/stats` → returns JSON with run status and counts
- `GET /api/v1/vectors/search?q=ribosome` → returns semantic search results
- `/vectors` → scatter plot renders with points colored by source type

🤖 Generated with Claude Code