noesisnoema-pipeline is a minimal, practical pipeline for:
- Fetching GGUF LLMs via the Hugging Face CLI – to run with llama.cpp–compatible runtimes on iOS/desktop/server.
- Building a RAGpack (chunks + embeddings) – split documents, embed them, and ship as a `.zip` your apps can load.
🎥 Demo video: Watch on YouTube
- Safely download GGUF (often quantized) community models from Hugging Face.
- Produce a RAGpack v1.1 (`chunks.json`, `embeddings.npy`, `citations.jsonl`, `manifest.json`).
- (Optional) Execute the same workflow on Google Colab using our helper notebook.
- Precise Citations: Paragraph boundaries, character offsets, and optional span‑level source mapping for highlighting.
- Rich Metadata: Embedder version, chunker parameters, indexing timestamps, and source diversity metrics.
- Preview Support: Snippet extraction with context for DeepSearch UI and API.
- Validation: Built‑in CLI validation with `nn-pack validate`, including schema checks.
- Backward Compatible: Automatically handles v1.0 RAGpacks with clear deprecation warnings.
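To eyeball a pack before loading it, you can open the archive from Python; a minimal sketch (the archive path is illustrative, and it assumes `manifest.json` records the pack format version, per the v1.0/v1.1 handling above):

```python
import json
import zipfile

# Open a RAGpack archive and list its members (path is illustrative).
with zipfile.ZipFile("exported/my-docs-ragpack.zip") as pack:
    print(pack.namelist())  # expect chunks.json, embeddings.npy, citations.jsonl, manifest.json
    manifest = json.loads(pack.read("manifest.json"))
    print(manifest.get("version"))  # assumed field; v1.0 packs may differ
```

For real checks, prefer `nn-pack validate`, which also runs the schema checks.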
- macOS / Linux (Windows works best via WSL)
- Python 3.10+ (CLI usage also works on 3.8+)
- `git`
- Create an account: https://huggingface.co/join
- Issue a token: Settings → Access Tokens → New token
- Role: Read
- Prefer Fine‑grained and enable Gated repos: Read (required for Meta Llama and other gated repos).
- For gated models, visit the model page and Accept the license/usage policy.
python -m pip install -U "huggingface_hub[cli]"
# or, if you prefer pipx
# pipx install 'huggingface_hub[cli]'
huggingface-cli login # paste your token when prompted
huggingface-cli whoami # sanity check
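If you are working in a notebook instead of a shell, the same login is available from Python:

```python
from huggingface_hub import login

# Prompts interactively for your token; a token="hf_..." argument also works.
login()
```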
For faster downloads, enable the HF Transfer extension:
python -m pip install -U hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
- Jan-v1-4B (example)
huggingface-cli download janhq/Jan-v1-4B-GGUF-Q4_K_M \
--include "*Q4_K_M.gguf" \
--local-dir models/jan-v1-4b
- TinyLlama (lightweight / quick check)
# Example community GGUF repo
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
--include "*Q4_K_M.gguf" \
--local-dir models/tinyllama-1.1b
Verify:
ls -lh models/<your_model_dir>
shasum -a 256 models/<your_model_dir>/*.gguf # optional integrity check
Why the CLI over `git clone`? Large LFS repos often include many artifacts you don’t need. `huggingface-cli download --include` pulls only what you ask for and avoids common failures/timeouts.
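If you prefer scripting downloads to shelling out, the same filtered fetch works from the `huggingface_hub` Python API; a sketch mirroring the TinyLlama example above:

```python
from huggingface_hub import snapshot_download

# allow_patterns plays the role of the CLI's --include filter.
snapshot_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    allow_patterns=["*Q4_K_M.gguf"],
    local_dir="models/tinyllama-1.1b",
)
```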
Use the notebook under `notebooks/` to turn your documents into a self‑contained RAGpack. Output files:
- `chunks.json` — split text using improved token‑based chunking
- `embeddings.npy` — NumPy embeddings (fast to load)
- `embeddings.csv` — CSV embeddings (easy to load from Swift/iOS, etc.)
- `metadata.json` — enhanced with chunking parameters
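As a rough illustration of how an app might consume these files, a minimal Python sketch; it assumes the pack is unzipped locally, that `chunks.json` is a JSON array aligned row for row with `embeddings.npy`, and that the query vector comes from the same embedder that built the pack:

```python
import json
import numpy as np

# Load the unzipped RAGpack artifacts.
with open("chunks.json", encoding="utf-8") as f:
    chunks = json.load(f)                 # assumption: list aligned with embedding rows
embeddings = np.load("embeddings.npy")    # shape: (num_chunks, dim)

def top_k(query_vec: np.ndarray, k: int = 3):
    # Cosine similarity between the query and every chunk embedding.
    sims = embeddings @ query_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    return [(float(sims[i]), chunks[i]) for i in np.argsort(-sims)[:k]]
```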
The chunker now uses token-based splitting with configurable overlap instead of simple character-based splitting:
- Chunk size: Configure in tokens (default 512) for better LLM compatibility
- Overlap: Configurable token overlap (default 50) for context preservation
- Smart boundaries: Attempts to break at sentence boundaries when possible
- Unicode support: Proper handling of non-ASCII text, emojis, and multiple languages
For more details, see `chunker/README.md`.
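For intuition, a token‑window splitter with overlap looks roughly like the sketch below; this is illustrative only (the real implementation lives under `chunker/`, and the whitespace split stands in for proper tokenization and sentence‑boundary handling):

```python
def chunk_tokens(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    # Whitespace split is a stand-in; the pipeline's chunker uses real tokenization
    # and tries to break at sentence boundaries.
    tokens = text.split()
    step = chunk_size - overlap  # each window keeps `overlap` tokens of prior context
    return [
        " ".join(tokens[i : i + chunk_size])
        for i in range(0, max(len(tokens) - overlap, 1), step)
    ]
```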
RAGpack is model‑agnostic and independent of the GGUF download step.
You can do the same on Colab using the helper notebook. Choose a `repo_id` and download `.gguf` files directly to a mounted Google Drive folder or local Colab storage.
Notebook: `gguf_downloader_colab.ipynb`
Usage:
- Upload the notebook to Colab and run the first cell to install dependencies.
- (Optional) Mount Google Drive if you want to persist models.
- Log in with your HF token (fine‑grained, Read; enable Gated repos: Read if necessary).
- Enter the `repo_id` of the model you want.
- The notebook lists `.gguf` files → choose one → Download.
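Under the hood this boils down to a couple of `huggingface_hub` calls; a sketch of the list‑then‑download step (the `repo_id` is a placeholder):

```python
from huggingface_hub import HfApi, hf_hub_download

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"  # placeholder: use your repo_id

# List the repo's .gguf files, then fetch the chosen one.
gguf_files = [f for f in HfApi().list_repo_files(repo_id) if f.endswith(".gguf")]
print(gguf_files)
hf_hub_download(repo_id=repo_id, filename=gguf_files[0], local_dir="models")
```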
- 403 Forbidden (gated): Accept the license on the model page and ensure your token allows Gated repos: Read.
- Nothing downloads / 404: Double‑check `repo_id` and make sure the repo actually contains `.gguf` files.
- Slow/unstable: Install `hf_transfer` and set `HF_HUB_ENABLE_HF_TRANSFER=1`. Use `--resume-download` to continue interrupted downloads.
- Colab disk limits: Mount Google Drive and set `--local-dir` to a Drive folder.
noesisnoema-pipeline/
├── notebooks/ # RAGpack notebook(s), Colab‑friendly
├── exported/ # Artifacts (kept empty; has a `.gitkeep`)
├── README.md
└── .gitignore
`.gitignore` (excerpt):
__pycache__/
.ipynb_checkpoints/
*.pyc
*.pyo
*.pyd
.env
.venv
.DS_Store
*.log
*.csv
*.npy
*.jsonl
*.gguf
exported/
models/
dist/
build/
This project provides tools (pipelines, utilities, and examples) for creating RAGpacks and experimenting with Retrieval‑Augmented Generation (RAG). No copyrighted texts, PDFs, or derivative datasets are included in this repository.
Demonstration videos (YouTube) are included in the README for educational purposes; they do not distribute copyrighted materials, only show the workflow.
Users are responsible for ensuring that their use of this project complies with applicable copyright and data‑protection laws in their jurisdiction. For example, creating embeddings from copyrighted works may be permissible for private research or experimentation (e.g., under "text and data mining" exceptions), but redistribution of the original texts or derived chunks is typically prohibited.
This repository and its maintainers do not provide legal advice. Use at your own risk.
MIT License (see `LICENSE`). Each model retains its own license; always follow the model’s Hugging Face page.
- Hugging Face and the OSS community.
- All contributors to NoesisNoema / RAGfish.