A powerful subtitle file converter that ensures proper UTF-8 encoding with robust support for Arabic and other languages. SubZilla automatically detects the input file encoding and converts it to UTF-8, making it perfect for fixing subtitle encoding issues. Built with SOLID, YAGNI, KISS, and DRY principles in mind.
- Automatic encoding detection.
- Converts subtitle files to UTF-8.
- Supports multiple subtitle formats (
.srt,.sub,.txt). - Strong support for Arabic and other non-Latin scripts.
- Simple command-line interface.
- Batch processing with glob pattern support.
- Parallel processing for better performance.
- Preserves original file formatting.
- Creates backup of original files.
- Node.js (v14 or higher)
- Yarn package manager
# Install globally using yarn
yarn global add subzilla
# Or using npm
npm install -g subzilla# Clone the repository
git clone https://github.com/onyxdevs/subzilla.git
cd subzilla
# Install dependencies (installs all workspace packages)
yarn install
# Build all packages
yarn build
# Run the CLI
yarn start
# Development mode (watch for changes)
yarn dev# Convert a single subtitle file
subzilla convert path/to/subtitle.srt
# The converted file will be saved as path/to/subtitle.utf8.srt
# Strip HTML formatting
subzilla convert input.srt --strip-html
# Strip color codes
subzilla convert input.srt --strip-colors
# Strip style tags
subzilla convert input.srt --strip-styles
# Replace URLs with [URL]
subzilla convert input.srt --strip-urls
# Strip all formatting
subzilla convert input.srt --strip-all
# Create backup and strip formatting
subzilla convert input.srt -b --strip-all
# Create numbered backups instead of overwriting existing backup
subzilla convert input.srt -b --no-overwrite-backup
# Combine multiple strip options
subzilla convert input.srt --strip-html --strip-colorsConvert multiple subtitle files at once using glob patterns:
# Convert all .srt files in current directory
subzilla batch "*.srt"
# Convert files recursively in all subdirectories
subzilla batch "**/*.srt" -r
# Convert multiple formats
subzilla batch "**/*.{srt,sub,txt}" -r
# Specify output directory
subzilla batch "**/*.srt" -o converted/
# Process files in parallel for better performance
subzilla batch "**/*.srt" -p
# Skip existing UTF-8 files
subzilla batch "**/*.srt" -s
# Combine basic options for maximum efficiency
subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/
# Advanced Directory Processing
# Limit recursive depth to 2 levels
subzilla batch "**/*.srt" -r -d 2
# Only process files in specific directories
subzilla batch "**/*.srt" -r -i "movies" "series"
# Exclude specific directories
subzilla batch "**/*.srt" -r -x "temp" "backup"
# Preserve directory structure in output
subzilla batch "**/*.srt" -r -o converted/ --preserve-structure
# Complex example combining all features
subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ \
-d 3 -i "movies" "series" -x "temp" "backup" --preserve-structure
# Strip formatting in batch mode
subzilla batch "**/*.srt" -r --strip-all
# Strip specific formatting in batch mode
subzilla batch "**/*.srt" -r --strip-html --strip-colors
# Create backups and strip formatting
subzilla batch "**/*.srt" -r -b --strip-all
# Create numbered backups instead of overwriting existing ones
subzilla batch "**/*.srt" -r -b --no-overwrite-backup --strip-all
# Complex example with formatting options
subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ \
-d 3 -i "movies" "series" -x "temp" "backup" \
--preserve-structure --strip-all -bOptions:
-o, --output-dir <dir>: Save converted files to specified directory.-r, --recursive: Search for files in subdirectories.-p, --parallel: Process files in parallel (faster for many files).-s, --skip-existing: Skip files that already have a UTF-8 version.-d, --max-depth <depth>: Maximum directory depth for recursive search.-i, --include-dirs <dirs...>: Only process files in these directories.-x, --exclude-dirs <dirs...>: Exclude files in these directories.--preserve-structure: Preserve directory structure in output.-b, --backup: Create backup of original files.--no-overwrite-backup: Create numbered backups instead of overwriting existing backup.--strip-html: Strip HTML tags.--strip-colors: Strip color codes.--strip-styles: Strip style tags.--strip-urls: Replace URLs with [URL].--strip-all: Strip all formatting (equivalent to all strip options).
Features:
- Progress bar showing conversion status.
- Per-directory progress tracking.
- Detailed statistics after completion.
- Error tracking and reporting.
- Parallel processing support.
- Skip existing files option.
- Time tracking and performance metrics.
- Directory structure preservation.
- Directory filtering and depth control.
- HTML tag stripping.
- Color code removal.
- Style tag removal.
- URL replacement.
- Whitespace normalization.
- Original file backup.
Example Output:
π Found 25 files in 5 directories...
Converting |==========| 100% | 25/25 | Total Progress
Converting |==========| 100% | 8/8 | Processing movies
Converting |==========| 100% | 7/7 | Processing series/season1
Converting |==========| 100% | 5/5 | Processing series/season2
Converting |==========| 100% | 3/3 | Processing series/specials
Converting |==========| 100% | 2/2 | Processing extras
π Batch Processing Summary:
ββββββββββββββββββββββββββ
Total files processed: 25
Directories processed: 5
β
Successfully converted: 23
β Failed: 1
βοΈ Skipped: 1
β±οΈ Total time: 5.32s
β‘ Average time per file: 0.22s
π Directory Statistics:
ββββββββββββββββββββ
movies:
Total: 8
β
Success: 8
β Failed: 0
βοΈ Skipped: 0
series/season1:
Total: 7
β
Success: 6
β Failed: 1
βοΈ Skipped: 0
series/season2:
Total: 5
β
Success: 5
β Failed: 0
βοΈ Skipped: 0
series/specials:
Total: 3
β
Success: 2
β Failed: 0
βοΈ Skipped: 1
extras:
Total: 2
β
Success: 2
β Failed: 0
βοΈ Skipped: 0
β Errors:
βββββββββ
series/season1/broken.srt: Failed to detect encoding
SubZilla provides flexible backup options to protect your original files:
# Basic backup creation
subzilla convert input.srt -b
# By default, subsequent runs overwrite the existing backup
# First run: creates input.srt.bak
# Second run: overwrites input.srt.bak (clean, no accumulation)
# Create numbered backups instead (legacy behavior)
subzilla convert input.srt -b --no-overwrite-backup
# First run: creates input.srt.bak
# Second run: creates input.srt.bak.1
# Third run: creates input.srt.bak.2
# Configure backup behavior in config file
# .subzillarc:
# output:
# createBackup: true
# overwriteBackup: false # Creates numbered backupsBackup Behavior Summary:
overwriteBackup: true(default): Clean backup management - always overwrites existing backupoverwriteBackup: false: Legacy behavior - creates numbered backups (.bak.1,.bak.2, etc.)- CLI override: Use
--no-overwrite-backupto temporarily disable backup overwriting
# Specify output file (single file conversion)
subzilla convert input.srt -o output.srt
# Get help
subzilla --help
# Get version
subzilla --version
# Get help for specific command
subzilla convert --help
subzilla batch --helpSubZilla supports flexible configuration through YAML files and environment variables. All settings are optional with sensible defaults.
SubZilla looks for configuration files in the following order:
- Path specified via
--configoption .subzillarcin the current directory.subzilla.ymlor.subzilla.yamlsubzilla.config.ymlorsubzilla.config.yaml
Several example configurations are provided in the examples/config directory:
-
Full Configuration (
.subzillarc):input: encoding: auto # auto, utf8, utf16le, utf16be, ascii, windows1256 format: auto # auto, srt, sub, ass, ssa, txt output: directory: ./converted # Output directory path createBackup: true # Create backup of original files overwriteBackup: true # Overwrite existing backup files (default: true) format: srt # Output format encoding: utf8 # Always UTF-8 bom: false # Add BOM to output files lineEndings: lf # lf, crlf, or auto # ... and more settings
-
Minimal Configuration (
minimal.subzillarc):input: encoding: auto format: auto output: directory: ./converted createBackup: true overwriteBackup: true # Overwrite existing backup files format: srt strip: html: true colors: true styles: true batch: recursive: true parallel: true skipExisting: true preserveStructure: true # Maintain directory structure chunkSize: 5
-
Performance-Optimized (
performance.subzillarc):output: createBackup: false # Skip backups overwriteBackup: true # When backups are created, overwrite existing ones overwriteInput: true # Overwrite input files overwriteExisting: true # Don't check existing files batch: parallel: true preserveStructure: false # Flat output structure chunkSize: 20 # Larger chunks retryCount: 0 # No retries failFast: true # Stop on first error
-
Arabic-Optimized (
arabic.subzillarc):input: encoding: windows1256 # Common Arabic encoding output: bom: true # Add BOM for compatibility lineEndings: crlf # Windows line endings batch: includeDirectories: - arabic - Ω Ψ³ΩΨ³ΩΨ§Ψͺ - Ψ£ΩΩΨ§Ω
You can also configure SubZilla using environment variables. Copy .env.example to .env and modify as needed:
# Input Settings
SUBZILLA_INPUT_ENCODING=utf8
SUBZILLA_INPUT_FORMAT=srt
SUBZILLA_INPUT_DEFAULT_LANGUAGE=ar
# Output Settings
SUBZILLA_OUTPUT_DIRECTORY=./output
SUBZILLA_OUTPUT_CREATE_BACKUP=true
# Complex settings use JSON
SUBZILLA_STRIP='{"html":true,"colors":true,"styles":true}'
SUBZILLA_BATCH_INCLUDE_DIRECTORIES='["movies","series"]'Settings are merged in the following order (later ones override earlier ones):
- Default values.
- Configuration file.
- Environment variables.
- Command-line arguments.
encoding: Input file encoding (auto,utf8,utf16le,utf16be,ascii,windows1256).format: Input format (auto,srt,sub,ass,ssa,txt).
directory: Output directory path.createBackup: Create backup of original files.overwriteBackup: Overwrite existing backup files (default:true).format: Output format.encoding: Output encoding (alwaysutf8).bom: Add BOM to output files.lineEndings: Line ending style (lf,crlf,auto).overwriteInput: Overwrite input files.overwriteExisting: Overwrite existing files.
html: Remove HTML tags.colors: Remove color codes.styles: Remove style tags.urls: Replace URLs with[URL].timestamps: Replace timestamps with[TIMESTAMP].numbers: Replace numbers with#.punctuation: Remove punctuation.emojis: Replace emojis with[EMOJI].brackets: Remove brackets.
recursive: Process subdirectories.parallel: Process files in parallel.skipExisting: Skip existing UTF-8 files.maxDepth: Maximum directory depth.includeDirectories: Only process these directories.excludeDirectories: Skip these directories.preserveStructure: Maintain directory structure.chunkSize: Files per batch.retryCount: Number of retry attempts.retryDelay: Delay between retries (ms).failFast: Stop on first error.
SubZilla follows a modular monorepo architecture with clear separation of concerns:
@subzilla/cli
βββ @subzilla/core
β βββ @subzilla/types
βββ @subzilla/types
- @subzilla/types: Foundation package with no dependencies
- @subzilla/core: Depends on types, provides core functionality
- @subzilla/cli: Depends on both core and types, provides user interface
- SOLID Principles: Single responsibility, open/closed, Liskov substitution, interface segregation, dependency inversion
- YAGNI: You Aren't Gonna Need It - avoid over-engineering
- KISS: Keep It Simple, Stupid - prioritize simplicity and clarity
- DRY: Don't Repeat Yourself - shared code in appropriate packages
The monorepo uses TypeScript project references for:
- Faster incremental builds
- Better IDE support
- Proper dependency tracking
- Type-safe cross-package imports
SubZilla is organized as a Yarn Workspaces monorepo with three main packages:
subzilla/
βββ packages/
β βββ cli/ # @subzilla/cli - Command-line interface
β β βββ src/
β β β βββ commands/ # CLI command implementations
β β β βββ constants/# Shared CLI options
β β β βββ registry/ # Command registration system
β β β βββ utils/ # CLI utilities
β β β βββ main.ts # CLI entry point
β β βββ package.json
β βββ core/ # @subzilla/core - Core processing logic
β β βββ src/
β β β βββ utils/ # Output strategies
β β β βββ *.ts # Core services and processors
β β β βββ index.ts # Package exports
β β βββ package.json
β βββ types/ # @subzilla/types - TypeScript definitions
β βββ src/
β β βββ cli/ # CLI-related types
β β βββ core/ # Core functionality types
β β βββ index.ts # Main exports
β β βββ validation.ts # Zod schemas
β βββ package.json
βββ examples/ # Configuration examples
βββ package.json # Workspace root configuration
βββ tsconfig.json # TypeScript project references
Each package has comprehensive documentation:
- @subzilla/cli - Command-line interface with all available commands and options
- @subzilla/core - Core processing services and batch operations
- @subzilla/types - TypeScript interfaces and validation schemas
SubZilla includes a comprehensive Jest testing framework with 83 passing tests across all packages:
# Run all tests
yarn test
# Test specific package
yarn workspace @subzilla/core test
yarn workspace @subzilla/cli test
yarn workspace @subzilla/types testTest Coverage:
- @subzilla/types (13 tests): Zod schema validation, configuration validation
- @subzilla/core (57 tests): Encoding detection/conversion, formatting stripping, end-to-end processing
- @subzilla/cli (13 tests): Command registration, CLI parsing, error handling
Key Features:
- Multi-project Jest setup with TypeScript support
- Real file system testing with temporary directories
- CLI integration tests using
execSync - Proper TypeScript mocking with generic type annotations
- Arabic text encoding tests for Windows-1256 support
- CI/CD integration with GitHub Actions
Workspace-level scripts:
yarn build: Build all packages in dependency orderyarn start: Run the SubZilla CLIyarn dev: Development mode with watch for all packagesyarn test: Run tests across all packagesyarn type-check: TypeScript type checking for all packagesyarn lint: Run linter across all packagesyarn lint:fix: Fix linting issues across all packagesyarn format: Format code using Prettier across all packagesyarn format:check: Check code formatting across all packagesyarn clean: Clean all build artifacts
Package-specific scripts:
# Build specific package
yarn workspace @subzilla/core build
# Run CLI directly
yarn workspace @subzilla/cli start
# Develop specific package
yarn workspace @subzilla/types devThe workspace structure provides several advantages:
- Shared Dependencies: Common dependencies are hoisted to the root, reducing duplication
- Type Safety: Cross-package imports are fully type-checked at compile time
- Atomic Changes: Related changes across packages can be made in a single commit
- Consistent Tooling: Shared linting, formatting, and build configurations
- Simplified Development: Single
yarn installandyarn buildfor the entire project
-
Fork the repository
-
Clone your fork and install dependencies
git clone https://github.com/your-username/subzilla.git cd subzilla yarn install -
Create your feature branch
git checkout -b feature/amazing-feature
-
Make your changes
- Follow the existing code style and patterns
- Add tests for new functionality
- Update documentation as needed
- Ensure all packages build successfully:
yarn build
-
Test your changes
yarn build yarn test yarn lint yarn type-check -
Commit your changes
git commit -m 'Add some amazing feature' -
Push to your branch
git push origin feature/amazing-feature
-
Open a Pull Request
# Start development mode (watches all packages)
yarn dev
# Build specific package
yarn workspace @subzilla/core build
# Test specific package
yarn workspace @subzilla/cli test
# Run CLI during development
yarn start --help
# Clean and rebuild everything
yarn clean
yarn buildThis project is licensed under the ISC License - see the LICENSE file for details.
If you encounter any issues or have questions, please:
- Check the issues page
- Create a new issue if your problem isn't already listed
- Provide as much detail as possible, including:
- SubZilla version
- Node.js version
- Operating system
- Sample subtitle file (if possible)
- Thanks to all contributors.
- Inspired by the need for better subtitle encoding support.
- Built with TypeScript and Node.js.
Planned improvements and feature additions:
-
Enhanced Format Support
- Add support for
.assand.ssasubtitle formats - Handle multiple subtitle files in batch
- Support subtitle format conversion (SRT β ASS β SSA)
- Add WebVTT format support
- Support subtitle timing synchronization
- Add support for
-
User Interface & Experience
- Interactive CLI mode with comprehensive commands
- Progress bars for batch operations
- Create a web interface for browser-based conversion
- Build a native macOS app using Electron
- Add drag-and-drop GUI interface
- Implement real-time encoding preview
-
Performance & Reliability
- Parallel processing for batch operations
- Configurable chunk size for parallel processing
- Retry mechanism for failed conversions
- Batch processing progress tracking and statistics
- Memory usage optimization for large files
- Streaming processing for very large subtitle files
- Performance benchmarking and profiling tools
- Caching mechanism for repeated operations
-
Advanced Features
- Comprehensive subtitle validation with Zod schemas
- Extensive formatting stripping (HTML, colors, styles, emojis)
- Subtitle timing adjustment and synchronization
- Subtitle merging and splitting
- Character encoding preview and detection confidence
- JSON/CSV export for batch processing results
- AI-powered subtitle translation integration
- Subtitle quality analysis and scoring
-
Developer Experience & Infrastructure
- Comprehensive test suite (83 tests across all packages)
- TypeScript monorepo with project references
- Detailed API documentation for all packages
- Configuration examples and templates
- GitHub Actions CI/CD workflow
- Automated release management
- Performance regression testing
- Docker containerization
- Plugin system for custom processors
- Webhook integration for automated workflows
-
Integration & Ecosystem
- VS Code extension for subtitle editing
- API server mode for remote processing
- Integration with popular media players
- Cloud storage integration (S3, Google Drive, Dropbox)
- Batch processing via file watching
- Integration with subtitle databases (OpenSubtitles, etc.)
Want to contribute to these enhancements? Check our Contributing section!