Commit 68642d5

Improve glob filtering and docs
1 parent e5f6e89 commit 68642d5

8 files changed: +320 -210 lines changed

CHANGELOG.md

Lines changed: 21 additions & 2 deletions
@@ -5,8 +5,22 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [1.0.2] - 2025-09-17
+## [1.0.3] - 2025-10-06

+### Changed
+- Promote the CLI configuration defaults and optional flag handling refinements for general availability.
+
+### Notes
+- Approved performance-focused guidance for faster pro preset runs (targeted excludes, tighter budgets, and optional manifest skips).
+
+## [1.0.2] - 2025-10-06
+
+### Added
+- **Pyproject-driven defaults**: CLI now reads `[tool.dir2md]` configuration to seed argument defaults before parsing.
+- **Legacy TOML support**: Bundled `tomli` for Python 3.10 and earlier so configuration loading works across environments.
+
+### Changed
+- **Optional flag handling**: CLI leaves optional switches unset unless provided, preserving config defaults and canonical output filenames.
 ### Fixed
 - **CLI Option Functionality**
 - Fixed `--include-glob` having no effect during report generation
@@ -77,4 +91,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Token budget optimization with LLM modes (off, ref, summary, inline)
 - Basic masking capabilities for sensitive data
 - Gitignore integration and file filtering
-- Manifest generation and statistics reporting
+- Manifest generation and statistics reporting
+
+
+
+
+
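
The 1.0.2 entry above says the CLI reads `[tool.dir2md]` to seed argument defaults before parsing. A minimal sketch of how that precedence can fall out of `argparse.set_defaults`, the call this commit adds in `cli.py`. The parser below is a cut-down stand-in with only two options, and `build_parser` is a hypothetical helper name, not part of dir2md.

```python
import argparse


def build_parser(file_defaults: dict) -> argparse.ArgumentParser:
    """Build a tiny parser whose defaults are seeded from a config mapping."""
    ap = argparse.ArgumentParser(prog="demo")
    ap.add_argument("--preset", choices=["iceberg", "pro", "raw"])
    ap.add_argument("--budget-tokens", type=int)
    # Parser-level defaults are used only when the flag is absent from argv.
    ap.set_defaults(**file_defaults)
    return ap


# Illustrative values standing in for a parsed [tool.dir2md] table.
file_defaults = {"preset": "iceberg", "budget_tokens": 4000}
parser = build_parser(file_defaults)

from_file = parser.parse_args([])                  # nothing on the command line
from_cli = parser.parse_args(["--preset", "pro"])  # one flag given explicitly

print(from_file.preset, from_file.budget_tokens)  # iceberg 4000
print(from_cli.preset, from_cli.budget_tokens)    # pro 4000
```

Values passed through `set_defaults` are not run through `type=` or `choices=`, which is presumably why the loader in this commit coerces them before handing them to the parser.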

FEATURES.md

Lines changed: 2 additions & 2 deletions
@@ -21,12 +21,12 @@ Dir2md follows an **Open-Core** model - providing essential functionality for fr

 ### Core Functionality
 - **📁 Directory Scanning**: Complete file tree analysis with `.gitignore` support
-- **🎯 Smart Filtering**: Include/exclude/omit glob patterns
+- **🎯 Smart Filtering**: Include/exclude/omit glob patterns (gitwildmatch semantics)
 - **📊 Token Optimization**: Head/tail sampling with configurable budgets
 - **🔄 Duplicate Detection**: SimHash-based content deduplication
 - **📋 Manifest Generation**: JSON metadata with file hashes and statistics
 - **⏰ Deterministic Output**: `--no-timestamp` for reproducible builds
-- **🎨 Multiple Presets**: `iceberg`, `pro`, `raw` (default: `raw` for developers)
+- **🎨 Multiple Presets**: `iceberg`, `pro`, `raw` (default: `raw` for developers; raw disables `--emit-manifest`)

 ### Basic Security
 - **🛡️ Essential Masking**: Protection for common secrets

README.md

Lines changed: 20 additions & 3 deletions
@@ -78,6 +78,20 @@ dir2md . --emit-manifest --no-timestamp
 dir2md . --budget-tokens 4000 --preset iceberg
 ```

+#### pyproject.toml defaults
+
+Dir2md reads `[tool.dir2md]` from the nearest `pyproject.toml` so teams can share default presets, budgets, and filters.
+
+```toml
+[tool.dir2md]
+preset = "iceberg"
+include_glob = ["src/**/*.py", "tests/**/*.py"]
+exclude_glob = ["**/__pycache__/**"]
+emit_manifest = true
+```
+
+Patterns use gitignore-style `gitwildmatch` semantics. Command-line flags still override any value loaded from the configuration file.
+
 ### Output Example

 ```markdown
@@ -106,7 +120,7 @@ dir2md . --budget-tokens 4000 --preset iceberg

 | Preset | Description | Best For |
 |--------|-------------|-----------|
-| `raw` | Full content inclusion | Development, code review |
+| `raw` | Full inline content; manifest disabled | Development, code review |
 | `iceberg` | Balanced sampling | General documentation |
 | `pro` | Advanced optimization | Large projects, LLM context |

@@ -152,8 +166,9 @@ dir2md [path] -o output.md --preset [iceberg|pro|raw]
 --sample-tail 40 # Lines from file end

 # Filtering
---include-glob "*.py,*.md" # Include patterns
---exclude-glob "test*,*.tmp" # Exclude patterns
+--include-glob "*.py" # Gitignore-style include (gitwildmatch)
+--exclude-glob "**/__pycache__/**" # Gitignore-style exclude
+--omit-glob "tests/**" # Omit content but keep tree
 --only-ext "py,js,ts" # File extensions only

 # Security
@@ -165,6 +180,8 @@ dir2md [path] -o output.md --preset [iceberg|pro|raw]
 --dry-run # Preview without writing
 ```

+Note: the `raw` preset always forces `--emit-manifest` off. Use the `pro` preset with manual flags if you need inline output plus a manifest.
+
 ## 🤝 Contributing

 We welcome contributions! Dir2md follows an open-core model:
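
The `[tool.dir2md]` example in the README hunk above uses snake_case keys and list-valued globs. As a rough sketch of the kind of normalization the loader in `cli.py` applies (hyphens mapped to underscores, unknown keys dropped, scalar globs wrapped in a list, extension lists joined with commas), here is a simplified version covering only a few keys. `normalize`, `KNOWN_KEYS`, and `GLOB_KEYS` are hypothetical names, and unlike the commit's code this sketch skips `None` for every key.

```python
KNOWN_KEYS = {"preset", "include_glob", "exclude_glob", "only_ext"}
GLOB_KEYS = {"include_glob", "exclude_glob"}


def normalize(raw: dict) -> dict:
    """Map a raw [tool.*] table onto argparse-style destination names."""
    out = {}
    for raw_key, value in raw.items():
        key = raw_key.replace("-", "_")  # accept kebab-case spellings
        if key not in KNOWN_KEYS or value is None:
            continue  # ignore unknown keys and explicit nulls
        if key in GLOB_KEYS:
            # Globs always end up as a list of strings.
            out[key] = [str(v) for v in value] if isinstance(value, list) else [str(value)]
        elif key == "only_ext":
            # Extensions are passed on as one comma-separated string.
            out[key] = ",".join(str(v) for v in value) if isinstance(value, list) else str(value)
        else:
            out[key] = value
    return out


result = normalize({"include-glob": "*.py", "only_ext": ["py", "md"], "colour": "red"})
print(result)  # {'include_glob': ['*.py'], 'only_ext': 'py,md'}
```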

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -4,13 +4,13 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "dir2md"
-version = "1.0.2"
+version = "1.0.3"
 description = "Generate a Markdown blueprint: directory tree + optional file contents (token-optimized, ICEBERG preset)"
 readme = "README.md"
 authors = [{name = "Flamehaven", email = "info@flamehaven.space"}]
 license = {text = "MIT"}
 requires-python = ">=3.9"
-dependencies = ["pathspec>=0.12.0"]
+dependencies = ["pathspec>=0.12.0", 'tomli>=2.0.0; python_version < "3.11"']

 [project.scripts]
 dir2md = "dir2md.cli:main"
@@ -20,3 +20,4 @@ package-dir = {"" = "src"}

 [tool.setuptools.packages.find]
 where = ["src"]
+
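
The new `tomli` requirement carries the marker `python_version < "3.11"` because `tomllib` joined the standard library in Python 3.11. A minimal sketch of the matching import fallback and a `[tool.dir2md]` lookup, assuming at least one of the two parsers is importable. `read_tool_table` is a hypothetical helper and the TOML text is illustrative.

```python
try:
    import tomllib as toml_loader  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as toml_loader  # backport exposing the same load()/loads() API

PYPROJECT_TEXT = """
[tool.dir2md]
preset = "iceberg"
include_glob = ["src/**/*.py"]
emit_manifest = true
"""


def read_tool_table(text: str, tool: str = "dir2md") -> dict:
    """Return the [tool.<tool>] table, or an empty dict if it is missing."""
    data = toml_loader.loads(text)
    table = data.get("tool", {}).get(tool)
    return table if isinstance(table, dict) else {}


config = read_tool_table(PYPROJECT_TEXT)
print(config["preset"])         # iceberg
print(config["emit_manifest"])  # True
print(read_tool_table("[project]\nname = 'x'\n"))  # {}
```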

src/dir2md/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
 from .core import Config, generate_markdown_report

 __all__ = ["__version__", "apply_masking", "Config", "generate_markdown_report"]
-__version__ = "1.0.2"
+__version__ = "1.0.3"
+

src/dir2md/cli.py

Lines changed: 152 additions & 48 deletions
@@ -1,12 +1,21 @@
 from __future__ import annotations
 import argparse, zipfile, hashlib, os
 from pathlib import Path
+from typing import Any
+
+try:
+    import tomllib as _toml_loader
+except ModuleNotFoundError:
+    try:
+        import tomli as _toml_loader
+    except ModuleNotFoundError:
+        _toml_loader = None
+
 from .core import Config, generate_markdown_report
 from . import __version__

 # Load .env file if it exists (for Pro license and configuration)
 def load_env_file():
-    # Try current directory first, then parent directories
     current = Path.cwd()
     for parent in [current] + list(current.parents):
         env_file = parent / '.env'
@@ -17,17 +26,44 @@ def load_env_file():
                     if line and not line.startswith('#') and '=' in line:
                         key, value = line.split('=', 1)
                         os.environ[key.strip()] = value.strip()
-                break  # Stop after first .env file found
+                break
             except Exception:
-                pass  # Silently ignore .env file errors
+                pass

-# Load environment configuration on import
 load_env_file()

+DEFAULT_OUTPUT = "PROJECT_BLUEPRINT.md"
 DEFAULT_EXCLUDES = [
     ".git", "__pycache__", "node_modules", ".venv",
     "build", "dist", "*.pyc", ".DS_Store",
 ]
+_CONFIG_KEYS = {
+    "path",
+    "output",
+    "preset",
+    "llm_mode",
+    "budget_tokens",
+    "max_file_tokens",
+    "dedup",
+    "sample_head",
+    "sample_tail",
+    "explain",
+    "include_glob",
+    "exclude_glob",
+    "omit_glob",
+    "only_ext",
+    "respect_gitignore",
+    "follow_symlinks",
+    "max_bytes",
+    "max_lines",
+    "emit_manifest",
+    "stats",
+    "capsule",
+    "dry_run",
+    "no_timestamp",
+    "masking",
+}
+

 def positive_int(v: str) -> int:
     try:
@@ -38,75 +74,139 @@ def positive_int(v: str) -> int:
         raise argparse.ArgumentTypeError("Only positive integers are allowed.")
     return iv

+
+def _load_pyproject_config() -> dict[str, Any]:
+    if _toml_loader is None:
+        return {}
+
+    decode_error = getattr(_toml_loader, "TOMLDecodeError", ValueError)
+    current_dir = Path.cwd()
+    for path in [current_dir] + list(current_dir.parents):
+        pyproject_path = path / "pyproject.toml"
+        if not pyproject_path.is_file():
+            continue
+        try:
+            with pyproject_path.open("rb") as handle:
+                data = _toml_loader.load(handle)
+        except (decode_error, OSError):
+            return {}
+
+        tool_config = data.get("tool", {}).get("dir2md")
+        if not isinstance(tool_config, dict):
+            return {}
+
+        sanitized: dict[str, Any] = {}
+        for raw_key, value in tool_config.items():
+            key = raw_key.replace('-', '_')
+            if key not in _CONFIG_KEYS:
+                continue
+            if key in {"include_glob", "exclude_glob", "omit_glob"}:
+                if value is None:
+                    continue
+                if isinstance(value, list):
+                    sanitized[key] = [str(item) for item in value]
+                else:
+                    sanitized[key] = [str(value)]
+                continue
+            if key == "only_ext":
+                if value is None:
+                    continue
+                if isinstance(value, list):
+                    sanitized[key] = ",".join(str(item) for item in value)
+                else:
+                    sanitized[key] = str(value)
+                continue
+            if key in {"budget_tokens", "max_file_tokens", "dedup", "sample_head", "sample_tail", "max_bytes", "max_lines"}:
+                try:
+                    sanitized[key] = int(value)
+                except (TypeError, ValueError):
+                    continue
+                continue
+            if key in {"respect_gitignore", "follow_symlinks", "emit_manifest", "stats", "capsule", "dry_run", "no_timestamp", "explain"}:
+                sanitized[key] = bool(value)
+                continue
+            sanitized[key] = value
+        return sanitized
+    return {}
+
+
 def main(argv: list[str] | None = None) -> int:
-    ap = argparse.ArgumentParser(prog="dir2md", description="Directory → Markdown exporter with LLM optimization")
+    config_from_file = _load_pyproject_config()
+
+    ap = argparse.ArgumentParser(prog="dir2md", description="Directory -> Markdown exporter with LLM optimization")
     ap.add_argument("path", nargs="?", default=".")
-    ap.add_argument("-o", "--output", default="PROJECT_BLUEPRINT.md")
-
-    # Preset options
-    ap.add_argument("--preset", default="raw", choices=["iceberg","pro","raw"], help="Preset mode: iceberg/pro/raw")
-
-    # Token and selection control
-    ap.add_argument("--llm-mode", choices=["off","ref","summary","inline"], default=None)
-    ap.add_argument("--budget-tokens", type=int, default=6000)
-    ap.add_argument("--max-file-tokens", type=int, default=1200)
-    ap.add_argument("--dedup", type=int, default=16)
-    ap.add_argument("--sample-head", type=int, default=120)
-    ap.add_argument("--sample-tail", type=int, default=40)
+    ap.add_argument("-o", "--output")
+
+    ap.add_argument("--preset", choices=["iceberg", "pro", "raw"], help="Preset mode: iceberg/pro/raw")
+
+    ap.add_argument("--llm-mode", choices=["off", "ref", "summary", "inline"])
+    ap.add_argument("--budget-tokens", type=int)
+    ap.add_argument("--max-file-tokens", type=int)
+    ap.add_argument("--dedup", type=int)
+    ap.add_argument("--sample-head", type=int)
+    ap.add_argument("--sample-tail", type=int)
     ap.add_argument("--explain", action="store_true", help="Include selection rationale and drift_score in capsule comments")

-    # Filtering and safety controls
-    ap.add_argument("--include-glob", action="append", default=[])
-    ap.add_argument("--exclude-glob", action="append", default=[])
-    ap.add_argument("--omit-glob", action="append", default=[])
-    ap.add_argument("--only-ext", default="")
+    ap.add_argument("--include-glob", action="append", help="Gitignore-style include pattern (gitwildmatch syntax)")
+    ap.add_argument("--exclude-glob", action="append", help="Gitignore-style exclude pattern")
+    ap.add_argument("--omit-glob", action="append", help="Gitignore-style omit pattern (skips content)")
+    ap.add_argument("--only-ext", help="Comma-separated extension list (e.g. py,md)")
     ap.add_argument("--respect-gitignore", action="store_true")
     ap.add_argument("--follow-symlinks", action="store_true")
-    ap.add_argument("--max-bytes", type=positive_int, default=200_000)
-    ap.add_argument("--max-lines", type=positive_int, default=2000)
+    ap.add_argument("--max-bytes", type=positive_int)
+    ap.add_argument("--max-lines", type=positive_int)

-    # Output options
-    ap.add_argument("--emit-manifest", action="store_true")
+    ap.add_argument("--emit-manifest", action="store_true", help="Write JSON manifest (raw preset overrides to off)")
     ap.add_argument("--stats", action="store_true")
     ap.add_argument("--capsule", action="store_true", help="Package md+manifest into zip")
     ap.add_argument("--dry-run", action="store_true")
     ap.add_argument("--no-timestamp", action="store_true", help="Omit timestamp for reproducible output")
-    ap.add_argument("--masking", choices=["off", "basic", "advanced"], default="off", help="Secret masking mode (advanced requires Pro license)")
+    ap.add_argument("--masking", choices=["off", "basic", "advanced"], help="Secret masking mode (advanced requires Pro license)")

     ap.add_argument("-V", "--version", action="version", version=f"dir2md {__version__}")

+    if config_from_file:
+        ap.set_defaults(**config_from_file)
+
     ns = ap.parse_args(argv)

     root = Path(ns.path).resolve()
-    output = Path(ns.output)
-    only_ext = {e.strip().lstrip('.') for e in ns.only_ext.split(',') if e.strip()} or None
+
+    if ns.output:
+        output = Path(ns.output)
+    else:
+        if root.is_dir():
+            output = root / f"{root.name}.md"
+        else:
+            output = Path(DEFAULT_OUTPUT).resolve()
+    only_ext = {e.strip().lstrip('.') for e in (ns.only_ext or "").split(',') if e.strip()} or None

     cfg = Config(
         root=root,
         output=output,
-        include_globs=list(ns.include_glob),
-        exclude_globs=list(ns.exclude_glob) + DEFAULT_EXCLUDES,
-        omit_globs=list(ns.omit_glob),
-        respect_gitignore=bool(ns.respect_gitignore),
-        follow_symlinks=bool(ns.follow_symlinks),
-        max_bytes=int(ns.max_bytes) if ns.max_bytes else None,
-        max_lines=int(ns.max_lines) if ns.max_lines else None,
+        include_globs=list(ns.include_glob or []),
+        exclude_globs=list(ns.exclude_glob or []) + DEFAULT_EXCLUDES,
+        omit_globs=list(ns.omit_glob or []),
+        respect_gitignore=bool(ns.respect_gitignore or False),
+        follow_symlinks=bool(ns.follow_symlinks or False),
+        max_bytes=int(ns.max_bytes) if ns.max_bytes is not None else 200_000,
+        max_lines=int(ns.max_lines) if ns.max_lines is not None else 2000,
         include_contents=True,
         only_ext=only_ext,
-        add_stats=bool(ns.stats),
+        add_stats=bool(ns.stats or False),
         add_toc=False,
         llm_mode=(ns.llm_mode or "ref"),
-        budget_tokens=int(ns.budget_tokens),
-        max_file_tokens=int(ns.max_file_tokens),
-        dedup_bits=int(ns.dedup),
-        sample_head=int(ns.sample_head),
-        sample_tail=int(ns.sample_tail),
+        budget_tokens=int(ns.budget_tokens) if ns.budget_tokens is not None else 6000,
+        max_file_tokens=int(ns.max_file_tokens) if ns.max_file_tokens is not None else 1200,
+        dedup_bits=int(ns.dedup) if ns.dedup is not None else 16,
+        sample_head=int(ns.sample_head) if ns.sample_head is not None else 120,
+        sample_tail=int(ns.sample_tail) if ns.sample_tail is not None else 40,
         strip_comments=False,
-        emit_manifest=bool(ns.emit_manifest),
-        preset=str(ns.preset),
-        explain_capsule=bool(ns.explain),
-        no_timestamp=bool(ns.no_timestamp),
-        masking_mode=str(ns.masking),
+        emit_manifest=bool(ns.emit_manifest if ns.emit_manifest is not None else True),
+        preset=str(ns.preset or "raw"),
+        explain_capsule=bool(ns.explain or False),
+        no_timestamp=bool(ns.no_timestamp or False),
+        masking_mode=str(ns.masking or "off"),
     )

     md = generate_markdown_report(cfg)
@@ -125,8 +225,12 @@ def main(argv: list[str] | None = None) -> int:
     try:
         print(f"[dir2md] Wrote: {output}")
     except UnicodeEncodeError:
-        print(f"[dir2md] Wrote: (File path contains unprintable characters, but the file was likely created successfully)")
+        print("[dir2md] Wrote: (File path contains unprintable characters, but the file was likely created successfully)")
     return 0

+
 if __name__ == "__main__":
     raise SystemExit(main())
+
+
+
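
One detail in the `cli.py` diff above may be worth double-checking: `argparse` gives `store_true` flags a default of `False`, not `None`, so a test such as `ns.emit_manifest is not None` holds even when the flag was never passed, and an `else True` fallback would not be reached. A minimal sketch of that behaviour, next to `argparse.BooleanOptionalAction` (Python 3.9+) with `default=None` as one possible way to keep a switch genuinely unset. This is an illustration with a stand-in parser, not the project's code.

```python
import argparse

ap = argparse.ArgumentParser(prog="demo")
# Plain store_true: the attribute is False whenever the flag is absent.
ap.add_argument("--emit-manifest", action="store_true")
# Tri-state alternative: None when absent, True/False when given explicitly.
ap.add_argument("--stats", action=argparse.BooleanOptionalAction, default=None)

unset = ap.parse_args([])
on = ap.parse_args(["--stats"])
off = ap.parse_args(["--no-stats"])

print(unset.emit_manifest)  # False
print(unset.stats)          # None
print(on.stats, off.stats)  # True False
```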
