FTS5 ICU Tokenizer - Scripts Reference

This document provides an overview of every script in the `scripts/` directory and how to use it.

Build Scripts

`build.sh`

Builds the FTS5 ICU tokenizer for a specific locale or the universal tokenizer.

Usage:

```bash
# Build the universal tokenizer
./scripts/build.sh

# Build for a specific locale (e.g., Japanese)
./scripts/build.sh ja
```

`build_all.sh`

Builds all supported locale-specific tokenizers and the universal tokenizer using API v2 (current).

Usage:

```bash
./scripts/build_all.sh
```

This script:

- Builds separate shared libraries for each locale with optimized rules (API v2)
- Places all libraries in the `build/` directory
- Shows progress and any warnings during the build process
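
If you only need a few locales rather than everything `build_all.sh` produces, a small wrapper around `build.sh` may be enough. The sketch below is not how `build_all.sh` is implemented; it relies only on the `build.sh [locale]` usage shown above, and the function name `build_locales` and the `BUILD_SCRIPT` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: builds the universal
# tokenizer, then only the locales passed as arguments, by calling
# build.sh once per locale (the interface documented above).
set -euo pipefail

# Command used for each build; can be overridden for dry runs or testing.
BUILD_SCRIPT="${BUILD_SCRIPT:-./scripts/build.sh}"

build_locales() {
  echo "==> Building universal tokenizer"
  "$BUILD_SCRIPT" || { echo "FAILED: universal" >&2; return 1; }

  local loc
  for loc in "$@"; do
    echo "==> Building locale: $loc"
    # Stop at the first locale that fails to build.
    "$BUILD_SCRIPT" "$loc" || { echo "FAILED: $loc" >&2; return 1; }
  done
}
```

For example, after sourcing the file, `build_locales ja` would build the universal tokenizer and then the Japanese one. Valid locale codes are whatever `build.sh` accepts; only `ja` appears in this document.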

`build_legacy.sh`

Builds the FTS5 ICU tokenizer using API v1 (legacy) for a specific locale or the universal tokenizer.

Usage:

# Build the universal tokenizer (API v1)
./scripts/build_legacy.sh

# Build for a specific locale (e.g., Japanese, API v1)
./scripts/build_legacy.sh ja

`build_all_legacy.sh`

Builds all supported locale-specific tokenizers and the universal tokenizer using API v1 (legacy) only.

Usage:

```bash
./scripts/build_all_legacy.sh
```

This script:

- Builds separate shared libraries for each locale with optimized rules (API v1 only)
- Places all libraries in the `build/` directory with appropriate suffixes
- Shows progress and any warnings during the build process

`build_all_with_legacy.sh`

Builds all supported locale-specific tokenizers and the universal tokenizer for both API v1 and API v2.

Usage:

```bash
./scripts/build_all_with_legacy.sh
```

This script:

- Builds separate shared libraries for each locale with optimized rules (API v1 and API v2)
- Places all libraries in the `build/` directory with appropriate suffixes
- Shows progress and any warnings during the build process

`build_all_with_clang.sh`

Builds all supported locale-specific tokenizers using the Clang compiler to check portability.

Usage:

```bash
./scripts/build_all_with_clang.sh
```

This script:

- Uses Clang instead of GCC for compilation
- Builds all tokenizers to verify cross-compiler compatibility
- Performs the same operations as `build_all.sh`, but with Clang

Test Scripts

`test.sh`

Builds and runs tests for the universal and Japanese tokenizers (API v2 by default).

Usage:

```bash
./scripts/test.sh
```

`test_legacy.sh`

Builds and runs tests for the universal and Japanese tokenizers using API v1 (legacy).

Usage:

```bash
./scripts/test_legacy.sh
```

`test_all.sh`

Tests all built libraries (API v2 by default) with appropriate sample text.

Usage:

```bash
./scripts/test_all.sh
```

This script:

- Tests the universal tokenizer with mixed-language content
- Tests each locale-specific tokenizer with sample text in that language
- Reports success or failure for each test
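
Per-test success/failure reporting like this usually follows a common shell pattern. A minimal sketch, assuming each test can be expressed as a command whose exit status signals pass or fail; `run_case` and `summary` are hypothetical names, not functions taken from `test_all.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of per-test PASS/FAIL reporting; not taken from
# test_all.sh. Each test case is an arbitrary command.
set -euo pipefail

passed=0
failed=0

# run_case NAME CMD [ARGS...]: run one test command, print PASS or FAIL,
# and update the counters. The command's own output is suppressed.
run_case() {
  local name="$1"
  shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS: $name"
    passed=$((passed + 1))
  else
    echo "FAIL: $name"
    failed=$((failed + 1))
  fi
}

# summary: print the totals and return non-zero if any case failed,
# so a caller (or CI job) can detect failures from the exit status.
summary() {
  echo "Passed: $passed, Failed: $failed"
  [ "$failed" -eq 0 ]
}
```

Calling `summary` as the last step makes the script's exit status non-zero whenever any case failed.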

`test_all_legacy.sh`

Tests all built libraries using API v1 (legacy) with appropriate sample text.

Usage:

```bash
./scripts/test_all_legacy.sh
```

This script:

- Tests the universal tokenizer (API v1) with mixed-language content
- Tests each locale-specific tokenizer (API v1) with sample text in that language
- Reports success or failure for each test

Code Quality Scripts

`code-format.sh`

Formats all source code files using clang-format and removes trailing whitespace.

Usage:

```bash
./scripts/code-format.sh
```

This script:

- Applies clang-format rules to all `.c` and `.h` files in the `src/` directory
- Removes trailing whitespace from all `.c` and `.h` files
- Modifies files in place using the `-i` flag
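
The steps listed above can be approximated by hand, for example when you only want to format part of the tree. This is a sketch, not the actual contents of `code-format.sh`; the function names `strip_trailing_ws` and `format_sources` are hypothetical, and the recursive search under `src/` is an assumption that may differ from the script.

```bash
#!/usr/bin/env bash
# Approximation of the steps code-format.sh is described as performing;
# NOT the script's actual contents.
set -euo pipefail

# Strip trailing spaces/tabs from each file, in place. Writing back via
# `cat >` keeps the file's permissions and avoids the GNU/BSD
# difference in `sed -i` syntax.
strip_trailing_ws() {
  local f tmp
  for f in "$@"; do
    tmp="$(mktemp)"
    sed 's/[[:blank:]]*$//' "$f" > "$tmp"
    cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Format every .c and .h file under DIR (default: src) with
# `clang-format -i`, then strip trailing whitespace from the same files.
format_sources() {
  local dir="${1:-src}" f
  local files=()
  while IFS= read -r -d '' f; do
    files+=("$f")
  done < <(find "$dir" -type f \( -name '*.c' -o -name '*.h' \) -print0)

  if [ "${#files[@]}" -eq 0 ]; then
    echo "No .c/.h files under $dir" >&2
    return 0
  fi

  if command -v clang-format > /dev/null 2>&1; then
    clang-format -i "${files[@]}"
  else
    echo "clang-format not found; only stripping whitespace" >&2
  fi
  strip_trailing_ws "${files[@]}"
}
```

Run `format_sources` from the repository root to process `src/`, or pass another directory as the first argument.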

`lint-check.sh`

Performs static code analysis using cppcheck.

Usage:

```bash
./scripts/lint-check.sh
```

This script runs cppcheck with the following configuration:

- Exhaustive checking level
- Enables warning, style, performance, and portability checks
- Uses C11 standard compliance
- Provides verbose output
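
To run a comparable analysis directly, the configuration above maps onto standard cppcheck options, as sketched below. The exact flags used by `lint-check.sh`, and analyzing `src/` by default, are assumptions here; `--check-level` is only accepted by newer cppcheck releases, so older versions may reject it. The names `CPPCHECK_OPTS` and `lint` are hypothetical.

```bash
#!/usr/bin/env bash
# Approximate cppcheck invocation matching the configuration listed
# above; the options in lint-check.sh itself may differ.
set -euo pipefail

CPPCHECK_OPTS=(
  # Exhaustive checking level (newer cppcheck releases only)
  --check-level=exhaustive
  # Extra check categories beyond errors
  --enable=warning,style,performance,portability
  # C11 standard
  --std=c11
  # Verbose output
  --verbose
)

# lint DIR: run cppcheck over DIR (default: src, an assumption).
lint() {
  local dir="${1:-src}"
  if ! command -v cppcheck > /dev/null 2>&1; then
    echo "cppcheck not found in PATH" >&2
    return 127
  fi
  cppcheck "${CPPCHECK_OPTS[@]}" "$dir"
}
```

For example, `lint src` from the repository root.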

Test Development Scripts

`build_test.sh`

Builds the ICU transliterator test program for development and debugging.

Usage:

```bash
./scripts/build_test.sh
```

`run_test.sh`

Builds and runs the ICU transliterator test program.

Usage:

```bash
./scripts/run_test.sh
```

This script:

- Builds the test program
- Runs the original test program
- Runs the locale-specific test program

Utility Scripts

`demo.sh`

Demonstrates building and testing the Japanese and universal tokenizers.

Usage:

```bash
./scripts/demo.sh
```

This script:

- Builds the Japanese tokenizer
- Checks the built libraries
- Tests both the Japanese and universal tokenizers
- Shows how to use the tokenizer in your own projects
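
As a rough illustration of using a built tokenizer from SQLite, the sketch below generates a SQL script for the `sqlite3` shell: it loads the extension with `.load`, creates an FTS5 table with the `tokenize` option, and runs a `MATCH` query. The library path and tokenizer name are hypothetical placeholders, not values taken from this project; check `build/` for the actual file name and the project documentation for the name the tokenizer registers. The function name `demo_sql` is also hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical example of using a built tokenizer from the sqlite3 shell.
# The library path and tokenizer name are placeholders supplied by the caller.
set -euo pipefail

# demo_sql LIB_PATH TOKENIZER: print a SQL script that loads the
# extension, creates an FTS5 table using the tokenizer, and queries it.
demo_sql() {
  local lib="$1" tok="$2"
  cat <<EOF
.load $lib
CREATE VIRTUAL TABLE docs USING fts5(body, tokenize = '$tok');
INSERT INTO docs(body) VALUES ('東京都に住んでいます');
SELECT rowid, body FROM docs WHERE docs MATCH '東京都';
EOF
}
```

For example: `demo_sql ./build/<library-file> <tokenizer-name> | sqlite3 :memory:`. This requires an `sqlite3` build with FTS5 and extension loading enabled; which rows match depends on how the tokenizer segments the text.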
