Fast and accurate language detection CLI tool that analyzes your codebase while automatically filtering out dependencies, build artifacts, and cache files.
Sherlock uses a two-stage intelligent filtering system:
- Smart Filtering: Automatically skips non-source files using sophisticated pattern matching
  - Dependencies: node_modules/, venv/, vendor/, target/
  - Build artifacts: dist/, build/, __pycache__/, .class files
  - IDE/Editor files: .vscode/, .idea/, .git/
  - 25+ language ecosystems with deep knowledge of their tooling
- Advanced Language Detection: Identifies programming languages using sophisticated multi-stage analysis
  - Extension-based detection with conflict resolution (.xml → XML vs Maven)
  - Content analysis with advanced pattern scoring (shebangs, syntax patterns, keywords)
  - Special files detection (Dockerfile, Makefile, package.json)
  - Smart disambiguation for ambiguous files using content patterns
  - 100+ supported languages with 98%* accuracy
Result: Analyze only your actual source code, not the noise!
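The special-file and shebang checks mentioned above can be sketched roughly as follows. This is a minimal illustration, not Sherlock's actual code: the function names `detect_special_file`, `detect_shebang`, and `detect`, the interpreter table, and the labels returned (for example "Shell", or "JSON" for package.json) are all assumptions made for the example.

```rust
use std::path::Path;

/// Hypothetical helper: map well-known file names to a language label.
/// The labels are assumptions, not Sherlock's real output.
fn detect_special_file(path: &Path) -> Option<&'static str> {
    match path.file_name()?.to_str()? {
        "Dockerfile" => Some("Dockerfile"),
        "Makefile" => Some("Makefile"),
        "package.json" => Some("JSON"),
        _ => None,
    }
}

/// Hypothetical helper: infer a language from a shebang line such as
/// `#!/usr/bin/env python3` or `#!/bin/bash`.
fn detect_shebang(first_line: &str) -> Option<&'static str> {
    let rest = first_line.strip_prefix("#!")?;
    // Split the interpreter path and its arguments into tokens.
    let mut tokens = rest.split_whitespace();
    // Keep only the last path segment, e.g. `/bin/bash` becomes `bash`.
    let mut interpreter = tokens.next()?.rsplit('/').next()?;
    // `env` delegates to the next token, e.g. `/usr/bin/env python3`.
    if interpreter == "env" {
        interpreter = tokens.next()?;
    }
    match interpreter {
        "sh" | "bash" | "zsh" => Some("Shell"),
        "node" => Some("JavaScript"),
        "ruby" => Some("Ruby"),
        i if i.starts_with("python") => Some("Python"),
        _ => None,
    }
}

/// Special file names win; otherwise fall back to the shebang on line one.
fn detect(path: &Path, content: &str) -> Option<&'static str> {
    detect_special_file(path)
        .or_else(|| detect_shebang(content.lines().next().unwrap_or("")))
}
```

Calling `detect(Path::new("run"), "#!/usr/bin/env python3\n")` yields `Some("Python")` in this sketch, while a file with no special name and no shebang yields `None` and would fall through to extension and content analysis.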
# Install from crates.io (recommended)
cargo install sherlock-io
# Or build from source
cargo build --release
cargo install --path .
# Analyze current directory
sherlock
# Analyze specific directory
sherlock /path/to/project
# Set depth limit
sherlock -d 5 /path/to/project
# Output formats
sherlock --format table # default
sherlock --format json
sherlock --format csv
# Options
sherlock --verbose # detailed output
sherlock --include-hidden # include hidden files
sherlock --min-percentage 1.0 # minimum threshold
Language Detection Report
Total Files: 127 (filtered from 50,000+ files)
Total Size: 2.3 MB
Language    Files  Percentage  Size      Bar
──────────────────────────────────────────────────────────────────────
Rust           45       35.4%  1.2 MB    ███████████████░░░░░ 35.4%
JavaScript     32       25.2%  654.2 KB  ██████████░░░░░░░░░░ 25.2%
TypeScript     18       14.2%  423.1 KB  █████░░░░░░░░░░░░░░░ 14.2%
JSON           12        9.4%  89.4 KB   ███░░░░░░░░░░░░░░░░░  9.4%
Markdown        8        6.3%  45.2 KB   ██░░░░░░░░░░░░░░░░░░  6.3%
Notice: Only source files analyzed - dependencies, build artifacts, and cache files automatically filtered out!
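The percentages in the sample report are consistent with a per-file share of the total (45 of 127 files is 35.4%). A minimal sketch of that arithmetic is shown below, along with a threshold filter in the spirit of `--min-percentage`. The names `LanguageCount`, `percentage`, and `report` are hypothetical, and rounding before applying the threshold is an assumption rather than documented behavior.

```rust
/// One row of the report: a language and the number of files attributed to it.
struct LanguageCount {
    language: &'static str,
    files: u64,
}

/// Share of `files` out of `total_files`, as a percentage rounded to one decimal.
fn percentage(files: u64, total_files: u64) -> f64 {
    if total_files == 0 {
        return 0.0;
    }
    let raw = files as f64 * 100.0 / total_files as f64;
    (raw * 10.0).round() / 10.0
}

/// Keep only languages at or above `min_percentage`
/// (assumed to mirror the `--min-percentage` option).
fn report(
    counts: &[LanguageCount],
    total_files: u64,
    min_percentage: f64,
) -> Vec<(&'static str, f64)> {
    counts
        .iter()
        .map(|c| (c.language, percentage(c.files, total_files)))
        .filter(|(_, pct)| *pct >= min_percentage)
        .collect()
}
```

With the sample data, a threshold of 10.0 would keep Rust, JavaScript, and TypeScript and drop JSON and Markdown.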
SherlockIO automatically ignores these non-source files:
Dependencies & Packages
- node_modules/, venv/, vendor/, target/, deps/
- __pycache__/, .pytest_cache/, .gradle/, .m2/
Build Artifacts
- dist/, build/, out/, bin/, *.class, *.o, *.so
- .next/, .nuxt/, coverage/, *.min.js
IDE & Editor Files
- .vscode/, .idea/, .git/, .DS_Store
- Lock files: package-lock.json, yarn.lock, Cargo.lock
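One way to picture the ignore rules above is a predicate over a path. The directory, file, and suffix names below come from the lists in this section, but the matching strategy and the names `IGNORED_DIRS`, `IGNORED_FILES`, `IGNORED_SUFFIXES`, and `is_ignored` are assumptions for illustration, not Sherlock's implementation.

```rust
use std::path::{Component, Path};

// Names taken from the ignore lists above; how they are matched is an assumption.
const IGNORED_DIRS: &[&str] = &[
    "node_modules", "venv", "vendor", "target", "deps",
    "__pycache__", ".pytest_cache", ".gradle", ".m2",
    "dist", "build", "out", "bin", ".next", ".nuxt", "coverage",
    ".vscode", ".idea", ".git",
];
const IGNORED_FILES: &[&str] = &[".DS_Store", "package-lock.json", "yarn.lock", "Cargo.lock"];
const IGNORED_SUFFIXES: &[&str] = &[".class", ".o", ".so", ".min.js"];

/// Hypothetical predicate: true when `path` should be skipped.
fn is_ignored(path: &Path) -> bool {
    // Skip anything with a path component matching an ignored directory name.
    let in_ignored_dir = path.components().any(|c| match c {
        Component::Normal(name) => name
            .to_str()
            .map_or(false, |n| IGNORED_DIRS.contains(&n)),
        _ => false,
    });
    if in_ignored_dir {
        return true;
    }
    // Otherwise compare the file name against exact names and suffix patterns.
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return false,
    };
    IGNORED_FILES.contains(&name) || IGNORED_SUFFIXES.iter().any(|s| name.ends_with(*s))
}
```

This sketch treats any path component named like an ignored directory as ignored, so a regular file that happens to be called `build` would also be skipped. A real walker would typically check the entry type and prune ignored directories during traversal instead of testing every file path.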
- Programming: Rust, Python, JavaScript, TypeScript, Go, Java, C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala, Haskell, Elixir, Clojure, and more
- Web: HTML, CSS, SCSS, Vue, React (JSX/TSX), Svelte
- Data: JSON, YAML, TOML, XML, SQL, GraphQL
- Config: Dockerfile, Makefile, CMake, Gradle
- Documentation: Markdown, reStructuredText, AsciiDoc
🚀 Major Detection Improvements
- Fixed XML Detection Bug: XML files are no longer incorrectly filtered out
- Advanced Conflict Resolution: Smart disambiguation between similar file types (XML vs Maven POM files)
- Improved Pattern Scoring: Sophisticated algorithm considering pattern rarity, specificity, and context
- Enhanced Content Analysis: Better detection of languages with shared syntax patterns
- 98%+ Accuracy: Comprehensive testing across all supported languages
🔧 Technical Enhancements
- Multi-language extension support (one extension can map to multiple languages)
- Content-based disambiguation for ambiguous file extensions
- Advanced pattern scoring with keyword specificity bonuses
- Improved shebang detection and special file handling
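The idea of one extension mapping to several candidate languages, resolved by weighted content patterns, can be sketched as below. The `Pattern` and `Candidate` types, the `XML_CANDIDATES` table, and the specific markers and weights are hypothetical. They illustrate specificity-weighted scoring in general, not Sherlock's actual rules or data.

```rust
/// A content marker with a weight; rarer, more specific markers weigh more.
struct Pattern {
    needle: &'static str,
    weight: u32,
}

/// One language an extension may map to, with the markers that support it.
struct Candidate {
    language: &'static str,
    patterns: &'static [Pattern],
}

// Hypothetical candidates for the `.xml` extension (illustrative weights).
const XML_CANDIDATES: &[Candidate] = &[
    Candidate {
        language: "XML",
        patterns: &[Pattern { needle: "<?xml", weight: 1 }],
    },
    Candidate {
        language: "Maven POM",
        patterns: &[
            Pattern { needle: "<project", weight: 2 },
            Pattern { needle: "<modelVersion>", weight: 5 },
            Pattern { needle: "<artifactId>", weight: 5 },
        ],
    },
];

/// Sum the weights of every marker that occurs in `content`.
fn score(candidate: &Candidate, content: &str) -> u32 {
    candidate
        .patterns
        .iter()
        .filter(|p| content.contains(p.needle))
        .map(|p| p.weight)
        .sum()
}

/// Pick the highest-scoring candidate. The first candidate wins ties,
/// so it acts as the default for the extension.
fn disambiguate(candidates: &'static [Candidate], content: &str) -> Option<&'static str> {
    let mut best: Option<(&'static str, u32)> = None;
    for c in candidates {
        let s = score(c, content);
        if best.map_or(true, |(_, b)| s > b) {
            best = Some((c.language, s));
        }
    }
    best.map(|(language, _)| language)
}
```

Under these illustrative weights, a file containing `<modelVersion>` and `<artifactId>` scores higher as a Maven POM than as generic XML, while a file with only an XML declaration stays XML.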
MIT License