This document provides an overview of all the scripts in the scripts/ directory and their usage.
The build.sh script builds the FTS5 ICU tokenizer for a specific locale, or the universal tokenizer when no locale is given.
Usage:
# Build the universal tokenizer
./scripts/build.sh
# Build for a specific locale (e.g., Japanese)
./scripts/build.sh ja
The build_all.sh script builds all supported locale-specific tokenizers and the universal tokenizer using API v2 (current).
Usage:
./scripts/build_all.sh
This script:
- Builds separate shared libraries for each locale with optimized rules (API v2)
- Places all libraries in the build/ directory
- Shows progress and any warnings during the build process
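The per-locale loop that build_all.sh performs can be pictured with the minimal POSIX shell sketch below. It assumes the script drives build.sh once with no argument (universal tokenizer) and then once per locale; the LOCALES list and the build_all function are hypothetical, and ja is the only locale this document confirms.

```shell
#!/bin/sh
# Hypothetical sketch: not the actual contents of scripts/build_all.sh.

# Hypothetical locale list; only "ja" is confirmed by this document.
LOCALES="${LOCALES:-ja}"

# build_all [BUILD_CMD]
# Runs BUILD_CMD once with no argument (universal tokenizer), then once per
# locale. Stops at the first failure and returns that command's exit status.
build_all() {
  build_cmd="${1:-./scripts/build.sh}"

  echo "Building universal tokenizer"
  "$build_cmd" || return $?

  for locale in $LOCALES; do
    echo "Building tokenizer for locale: $locale"
    "$build_cmd" "$locale" || return $?
  done
}
```

Sourcing this file and calling build_all with no arguments would run ./scripts/build.sh first with no argument and then with ja. Stopping at the first failure keeps a broken locale build from being hidden by later successful ones.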
The build_legacy.sh script builds the FTS5 ICU tokenizer using API v1 (legacy) for a specific locale, or the universal tokenizer when no locale is given.
Usage:
# Build the universal tokenizer (API v1)
./scripts/build_legacy.sh
# Build for a specific locale (e.g., Japanese, API v1)
./scripts/build_legacy.sh ja
The build_all_legacy.sh script builds all supported locale-specific tokenizers and the universal tokenizer using only API v1 (legacy).
Usage:
./scripts/build_all_legacy.sh
This script:
- Builds separate shared libraries for each locale with optimized rules (API v1 only)
- Places all libraries in the build/ directory with appropriate suffixes
- Shows progress and any warnings during the build process
The build_all_with_legacy.sh script builds all supported locale-specific tokenizers and the universal tokenizer for both API v1 and API v2.
Usage:
./scripts/build_all_with_legacy.sh
This script:
- Builds separate shared libraries for each locale with optimized rules (API v1 and API v2)
- Places all libraries in the build/ directory with appropriate suffixes
- Shows progress and any warnings during the build process
The build_all_with_clang.sh script builds all supported locale-specific tokenizers with the Clang compiler to check portability.
Usage:
./scripts/build_all_with_clang.sh
This script:
- Uses Clang instead of GCC for compilation
- Builds all tokenizers to ensure cross-compiler compatibility
- Performs the same operations as build_all.sh, but with Clang
The test.sh script builds and runs tests for the universal tokenizer and the Japanese tokenizer (API v2 by default).
Usage:
./scripts/test.sh
The test_legacy.sh script builds and runs tests for the universal tokenizer and the Japanese tokenizer using API v1 (legacy).
Usage:
./scripts/test_legacy.sh
The test_all.sh script tests all built libraries (API v2 by default) with sample text suited to each tokenizer.
Usage:
./scripts/test_all.sh
This script:
- Tests the universal tokenizer with mixed language content
- Tests each locale-specific tokenizer with sample text in that language
- Reports success or failure for each test
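Per-test success and failure reporting, as described for test_all.sh, can be done with a small helper like the one below. This is a hypothetical POSIX shell sketch. The run_check and print_summary names and the PASS/FAIL output format are assumptions, not the script's actual output.

```shell
#!/bin/sh
# Hypothetical sketch of per-test pass/fail reporting; not the real test_all.sh.

passed=0
failed=0

# run_check NAME COMMAND [ARGS...]
# Runs COMMAND with its output discarded and records whether it exited with 0.
run_check() {
  name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $name"
    passed=$((passed + 1))
  else
    echo "FAIL: $name"
    failed=$((failed + 1))
  fi
}

# print_summary: prints the totals and returns non-zero if anything failed.
print_summary() {
  echo "Passed: $passed, Failed: $failed"
  [ "$failed" -eq 0 ]
}
```

A caller would invoke run_check once per tokenizer test and end with print_summary, so the script's exit status reflects any failure.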
The test_all_legacy.sh script tests all built libraries using API v1 (legacy) with sample text suited to each tokenizer.
Usage:
./scripts/test_all_legacy.sh
This script:
- Tests the universal tokenizer (API v1) with mixed language content
- Tests each locale-specific tokenizer (API v1) with sample text in that language
- Reports success or failure for each test
The code-format.sh script formats all source code files with clang-format and removes trailing whitespace.
Usage:
./scripts/code-format.sh
This script:
- Applies clang-format rules to all .c and .h files in the src/ directory
- Removes trailing whitespace from all .c and .h files
- Modifies files in place using the -i flag
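The two formatting steps for code-format.sh can be approximated with standard find, clang-format, and sed commands. This is a sketch, not the script itself. It assumes GNU sed (BSD/macOS sed needs -i '' instead of -i), it skips the clang-format step when clang-format is not installed, and the format_sources function name is hypothetical.

```shell
#!/bin/sh
# Sketch of the steps described for code-format.sh; not its actual contents.
# Assumes GNU sed, whose -i option edits files in place without a backup suffix.

# format_sources [DIR]
# Formats every .c and .h file under DIR (default: src) in place.
format_sources() {
  src_dir="${1:-src}"

  # Apply clang-format in place when it is available.
  if command -v clang-format >/dev/null 2>&1; then
    find "$src_dir" -type f \( -name '*.c' -o -name '*.h' \) \
      -exec clang-format -i {} +
  fi

  # Remove trailing spaces and tabs from every line.
  find "$src_dir" -type f \( -name '*.c' -o -name '*.h' \) \
    -exec sed -i 's/[[:space:]]*$//' {} +
}
```

Restricting find to *.c and *.h leaves other files, such as scripts or documentation, untouched.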
The lint-check.sh script performs static code analysis with cppcheck.
Usage:
./scripts/lint-check.sh
This script runs cppcheck with the following configuration:
- Exhaustive checking level
- Enables warning, style, performance, and portability checks
- Checks code against the C11 standard
- Provides verbose output
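The configuration listed for lint-check.sh maps onto cppcheck command-line options roughly as follows. This sketch assumes cppcheck 2.11 or later, which introduced --check-level, and assumes the sources live in src/. The run_lint function and CPPCHECK_ARGS variable are hypothetical names, and the real script may pass additional options.

```shell
#!/bin/sh
# Sketch of a cppcheck invocation matching the configuration described for
# lint-check.sh; not the script's actual contents.

# One option per item in the configuration list.
CPPCHECK_ARGS="--check-level=exhaustive --enable=warning,style,performance,portability --std=c11 --verbose"

# run_lint [DIR]
# Runs cppcheck on DIR (default: src); returns 127 if cppcheck is missing.
run_lint() {
  src_dir="${1:-src}"
  if ! command -v cppcheck >/dev/null 2>&1; then
    echo "cppcheck not found" >&2
    return 127
  fi
  # CPPCHECK_ARGS is intentionally unquoted so it splits into separate options.
  cppcheck $CPPCHECK_ARGS "$src_dir"
}
```

In cppcheck, enabling style already turns on warning, performance, and portability messages. Listing them explicitly is redundant but harmless, and it keeps the command aligned with the configuration list.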
The build_test.sh script builds the ICU transliterator test program for development and debugging.
Usage:
./scripts/build_test.sh
The run_test.sh script builds and runs the ICU transliterator test program.
Usage:
./scripts/run_test.sh
This script:
- Builds the test program
- Runs the original test program
- Runs the locale-specific test program
The demo.sh script demonstrates how to build and test the Japanese and universal tokenizers.
Usage:
./scripts/demo.sh
This script:
- Builds the Japanese tokenizer
- Checks the built libraries
- Tests both Japanese and universal tokenizers
- Shows how to use the implementation in projects