LLMs often ignore instructions to avoid smart quotes, em/en dashes, and other symbols. This macOS menu bar app combines spaCy NLP for context-aware processing with a rule-based system to scrub typographic characters from LLM (or any other) output.
See TODO.md for planned improvements.
- Menu Bar: Runs as a menu bar app
- NLP Processing: Uses spaCy for context detection
- Configurable: All character replacements can be customized via JSON config
- Smart Quotes: Replaces “ ” ‘ ’ with straight quotes " '
- Smart Dashes: Converts em dashes — and en dashes – to hyphens - with context-aware logic
- Ellipsis: Replaces … with three dots ...
- Symbols: Converts typographic symbols to ASCII equivalents
- Unicode: Handles accented characters by removing diacritics
- Various Others: Supports trademarks, fractions, mathematical symbols, currency, units, and more
- Notifications: Shows success/error notifications
- NLP Stats: Built-in performance monitoring and statistics
# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub
# Build and install the app
make build
make install

# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub
# Set up environment (handles Python version compatibility and spaCy model)
make setup
# Run the app
make run

# Clone the repository
git clone https://github.com/nisc/LLM-output-scrub.git
cd LLM-output-scrub
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies (includes spaCy and English language model)
pip install -e ".[dev,build]"
# Run the app
PYTHONPATH=src python src/run_app.py

- Copy LLM output with smart quotes or typographic characters
- Click the robot icon 🤖 in your menu bar
- Select "Scrub Clipboard" from the menu
- Paste anywhere - now with plain ASCII characters!
The app uses spaCy's natural language processing for context-aware em dash replacement.
The system uses spaCy's linguistic analysis instead of hardcoded wordlists:
- Part-of-Speech (POS) Analysis: Identifies nouns, verbs, adjectives, etc.
- Dependency Parsing: Understands grammatical relationships
- Sentence Structure Analysis: Detects boundaries and context
- Token-level Processing: Analyzes individual words and their roles
The system detects and handles these em dash contexts:
- Compound Words: self—driving → self-driving
- Parenthetical/Appositive: text—additional info—more text → text, additional info, more text
- Emphasis: The result—amazingly—was perfect → The result, amazingly, was perfect
- Dialogue: "Hello"—she said → "Hello", she said
- Conjunctions: A—or B → A, or B
- Default Cases: simple—text → simple-text
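The contexts listed above can be approximated with a small stdlib-only sketch. This is a simplified stand-in, not the app's implementation: it uses regex and character heuristics instead of spaCy's POS tags and dependency parses, treats the whole input as one sentence, and the names `replace_em_dashes` and `CONJUNCTIONS` are hypothetical.

```python
import re

EM_DASH = "\u2014"
# Assumed short list of conjunctions; the real app relies on spaCy instead.
CONJUNCTIONS = {"and", "or", "but", "nor", "yet"}


def replace_em_dashes(text: str, contextual: bool = True) -> str:
    """Replace em dashes with a comma or a hyphen using simple heuristics."""
    if not contextual:
        # Non-contextual mode: every em dash becomes a plain hyphen.
        return text.replace(EM_DASH, "-")

    # Drop spaces around dashes so each replacement controls its own spacing.
    text = re.sub(rf"\s*{EM_DASH}\s*", EM_DASH, text)
    # Two or more dashes are treated as a parenthetical or emphatic aside.
    paired = text.count(EM_DASH) >= 2

    def pick(match: re.Match) -> str:
        before = text[match.start() - 1 : match.start()]
        after = text[match.end():]
        first_word = re.match(r"\w+", after)
        if before in {'"', "'"} or after[:1] in {'"', "'"}:
            return ", "  # dialogue: dash next to a quotation mark
        if first_word and first_word.group().lower() in CONJUNCTIONS:
            return ", "  # conjunction right after the dash
        if paired:
            return ", "  # parenthetical / emphasis
        return "-"  # compound word or default case

    return re.sub(EM_DASH, pick, text)


print(replace_em_dashes("The result\u2014amazingly\u2014was perfect"))
print(replace_em_dashes("self\u2014driving"))
```

Heuristics like these misfire on inputs that real linguistic analysis handles, such as a possessive apostrophe directly before a dash, which is the motivation for using a parser in the first place.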
All settings can be managed via the app's menu:
- Click the menu bar icon 🤖 and select "Configuration"
- Toggle any setting or sub-setting by number
- Restore defaults with option 0
A JSON config file is also stored at ~/.llm_output_scrub/config.json for advanced/manual editing.
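For manual editing, it can help to picture how a file like this is typically read: defaults first, then user overrides on top. The sketch below assumes that pattern; the key names in `DEFAULTS` and the `load_config` helper are hypothetical and are not taken from the app's actual schema, so check the generated file for the real keys.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".llm_output_scrub" / "config.json"

# Hypothetical keys for illustration only; the real file may differ.
DEFAULTS = {
    "remove_accent_marks": True,
    "em_dash_contextual": True,
}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the defaults overlaid with any values found in the JSON file."""
    config = dict(DEFAULTS)
    try:
        config.update(json.loads(path.read_text(encoding="utf-8")))
    except FileNotFoundError:
        pass  # No user file yet: fall back to the defaults.
    return config


print(load_config())
```

Overlaying onto defaults means a hand-edited file only needs to list the settings that differ.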
| Setting | Effect |
|---|---|
| Decompose Unicode | Converts composed chars (é) to base + combining accent (e + U+0301) |
| Remove Accent Marks | Removes combining marks (e + U+0301 → e) |
| Remove All Non-ASCII | Removes any character not in standard ASCII |
| Clean Up Extra Spacing | Normalizes whitespace, trims excess, removes extra blank lines |
| Enable Debug Mode | Shows "NLP Stats" menu item for performance monitoring |
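The first four settings in the table above map onto standard-library operations. The following is a minimal sketch of what each step could look like, assuming NFD normalization for decomposition; the function names are hypothetical and the app's own code may differ.

```python
import re
import unicodedata


def decompose(text: str) -> str:
    """Split composed characters into base character + combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def remove_accent_marks(text: str) -> str:
    """Drop combining marks; only has an effect on decomposed text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def remove_non_ascii(text: str) -> str:
    """Discard every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def clean_spacing(text: str) -> str:
    """Collapse repeated spaces, trim line edges, and limit blank lines."""
    text = re.sub(r"[ \t]+", " ", text)     # runs of spaces/tabs -> one space
    text = re.sub(r" ?\n ?", "\n", text)    # spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()


print(remove_accent_marks(decompose("caf\u00e9")))
```

Order matters here: removing accent marks without decomposing first leaves a precomposed é untouched, because it is a single code point with no separate combining mark.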
| Category | Replacement |
|---|---|
| Smart Quotes | “ ” ‘ ’ → " ' |
| Em Dashes | — → - (context-aware, see below) |
| En Dashes | – → - |
| Ellipsis | … → ... |
| Angle Quotes | ‹ › « » → < > << >> |
| Trademarks | ™ ® → (TM) (R) |
| Mathematical | ≤ ≥ ≠ ≈ ± → <= >= != ~ +/- |
| Fractions | ¼ ½ ¾ → 1/4 1/2 3/4 |
| Footnotes | † ‡ → * ** |
| Units | × ÷ ‰ ‱ → * / per thousand per ten thousand |
| Currency | € £ ¥ ¢ → EUR GBP JPY cents |
Em Dashes → Contextual/NLP mode: When enabled (default), em dashes are replaced using spaCy NLP for context-aware output. When disabled, every em dash becomes a simple hyphen. Toggle this in the menu.
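The fixed, context-free replacements in the table lend themselves to a single lookup table. Below is a minimal sketch using `str.maketrans` and `str.translate` with a subset of the rows; the names `REPLACEMENTS` and `scrub` are hypothetical, and em dashes are left out on purpose because they are handled separately.

```python
# A subset of the table, keyed by code point escapes to avoid encoding issues.
REPLACEMENTS = {
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u2013": "-",    # en dash
    "\u2026": "...",  # ellipsis
    "\u2122": "(TM)",
    "\u00ae": "(R)",
    "\u2264": "<=",
    "\u2265": ">=",
    "\u2260": "!=",
    "\u00bd": "1/2",
    "\u20ac": "EUR",
}

# str.maketrans accepts single-character keys mapped to strings of any length.
TABLE = str.maketrans(REPLACEMENTS)


def scrub(text: str) -> str:
    """Apply every fixed replacement in one pass over the text."""
    return text.translate(TABLE)


print(scrub("\u201cHi\u201d \u2026 5 \u2264 6 \u2122"))
```

A translation table processes the text in a single pass, so adding rows does not add extra scans the way chained `str.replace` calls would.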
make setup # Set up environment
make build # Build the standalone macOS app
make install # Install the app to /Applications
make run # Run the app
make test-unit # Unit tests
make test # Integration tests
make clean # Clean build artifacts
make distclean # Remove all build artifacts and the virtual environment
make uninstall # Remove the app from /Applications

- Virtual environment issues: Run make clean-venv && make setup to recreate the environment.
- Import errors: The app uses package-style imports. Run with make run or manually with PYTHONPATH=src python src/run_app.py.
Follow existing code style, add tests for new features, and run make test-unit before submitting PRs.
llm_output_scrub/
├── src/llm_output_scrub/    # Source code
│   ├── __init__.py          # Python init
│   ├── app.py               # Main application
│   ├── config_manager.py    # Configuration management
│   ├── nlp.py               # spaCy-based NLP processing
│   └── py.typed             # Type hints marker
├── src/run_app.py           # Entry point script
├── tests/                   # Test suite
│   ├── test_scrub.py        # Unit tests
│   ├── integration-test.sh  # Integration test script
│   └── input.txt            # Test input data
├── assets/                  # App assets (icons, spaCy model)
├── typings/                 # Type stubs (e.g., rumps.pyi)
├── pyproject.toml           # Project configuration & dependencies
├── setup.py                 # py2app build configuration
├── Makefile                 # Build commands
├── TODO.md                  # Development roadmap
└── LICENSE                  # MIT license
Key dependencies: rumps (menu bar), pyperclip (clipboard), spacy (NLP), py2app (bundling). See pyproject.toml for full list.
This project is licensed under the MIT License - see the LICENSE file for details.