Developer Guide
So you want to contribute code to neural-sparse-cpp? Excellent! We're glad you're here. Here's what you need to do.
Fork opensearch-project/neural-sparse-cpp and clone locally.
Example:
git clone https://github.com/[your username]/neural-sparse-cpp.git

neural-sparse-cpp requires C++20 and uses CMake as its build system. You will need:
- A C++20 compatible compiler (with OpenMP support version 2 or higher), such as GCC 11+, Clang 14+, or MSVC 2022+
- CMake 3.15 or higher
- OpenMP
- SWIG (only if building Python bindings)
Linux (Ubuntu/Debian)
sudo apt update
sudo apt install -y g++ cmake libomp-dev swig libabsl-dev

macOS
brew install cmake libomp swig abseil

Note: Abseil is an optional system dependency. If not found, CMake will automatically fetch it from GitHub during the build.
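Before configuring, it can help to confirm what the compiler reports about itself. The sketch below is not part of neural-sparse-cpp; `ToolchainInfo`, `DescribeToolchain`, and `MeetsCpp20` are hypothetical names. It relies only on the standard `__cplusplus` and `_OPENMP` predefined macros, and assumes you compile it with the same standard and OpenMP flags you intend to use for the project.

```cpp
#include <cassert>

// Hypothetical helper, not part of neural-sparse-cpp: records what the
// compiler reports through standard predefined macros.
struct ToolchainInfo {
    long cpp_standard;  // value of __cplusplus, 202002L for C++20
    long openmp_date;   // value of _OPENMP (yyyymm), or 0 if OpenMP is off
};

inline ToolchainInfo DescribeToolchain() {
    ToolchainInfo info{};
    // Note: MSVC reports 199711L here unless /Zc:__cplusplus is passed.
    info.cpp_standard = __cplusplus;
#ifdef _OPENMP
    // Only defined when OpenMP is enabled (for example with -fopenmp).
    info.openmp_date = _OPENMP;
#else
    info.openmp_date = 0;
#endif
    return info;
}

// True when the reported language standard is C++20 or newer.
inline bool MeetsCpp20(const ToolchainInfo& info) {
    return info.cpp_standard >= 202002L;
}
```

Calling `MeetsCpp20(DescribeToolchain())` from a small `main` and printing `openmp_date` gives a quick answer before a full CMake run. An `openmp_date` of 0 usually means the OpenMP flag was not passed, not that OpenMP is missing from the system.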
If you plan to build or use the Python bindings, you will also need:
- Python 3.8+ with development headers
- pip
Linux (Ubuntu/Debian)
sudo apt install -y python3-dev python3-pip

Note: Replace `python3-dev` with your specific version package (e.g., `python3.12-dev`) if needed.
Configure and build the project using CMake:
cmake -S . -B build
cmake --build build -j

The following CMake options are available:

| Option | Default | Description |
|---|---|---|
| `NSPARSE_OPT_LEVEL` | `generic` | SIMD optimization level |
| `NSPARSE_ENABLE_PYTHON` | `OFF` | Build Python bindings |
| `NSPARSE_ENABLE_TESTS` | `OFF` | Build unit tests |
| `NSPARSE_ENABLE_BENCHMARKS` | `OFF` | Build benchmarks |

Example with multiple options:
cmake -S . -B build -DNSPARSE_ENABLE_TESTS=ON -DNSPARSE_OPT_LEVEL=avx2
cmake --build build -j

The `NSPARSE_OPT_LEVEL` option controls which SIMD instruction sets are compiled:

| Value | Architecture | Description |
|---|---|---|
| `generic` | Any | No SIMD specialization (default) |
| `avx2` | x86_64 | AVX2 + FMA + F16C + POPCNT |
| `avx512` | x86_64 | AVX-512 (F, CD, VL, DQ, BW) + AVX2 |
| `sve` | ARM (non-Apple) | Scalable Vector Extension |

Note: ARM NEON is used automatically on ARM platforms. SVE is not supported on Apple Silicon.
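To check which instruction sets a given set of compiler flags actually enables, you can inspect the compiler's feature macros. The sketch below is an illustration only: `CompiledSimdLevel` is a hypothetical helper, not a neural-sparse-cpp function, and it assumes the chosen optimization level reaches the translation unit as ordinary compiler flags (for example `-mavx2`). The macros themselves (`__AVX512F__`, `__AVX2__`, `__ARM_FEATURE_SVE`, `__ARM_NEON`) are standard compiler-defined names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of neural-sparse-cpp: reports the widest SIMD
// family enabled for this translation unit, using names from the table above.
inline std::string CompiledSimdLevel() {
#if defined(__AVX512F__)
    // Checked first because AVX-512 builds also enable AVX2.
    return "avx512";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__ARM_FEATURE_SVE)
    // Checked before NEON because SVE targets also define __ARM_NEON.
    return "sve";
#elif defined(__ARM_NEON)
    // Not an NSPARSE_OPT_LEVEL value; NEON is picked up automatically on ARM.
    return "neon";
#else
    return "generic";
#endif
}
```

Compiled with no architecture flags on x86_64 this normally returns `generic`, and on 64-bit ARM it normally returns `neon`, which matches the note that NEON needs no explicit option.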
Build with tests enabled and run:
cmake -S . -B build -DNSPARSE_ENABLE_TESTS=ON
cmake --build build -j
ctest --test-dir build --output-on-failure

To run specific test suites using GoogleTest filters:
./build/tests/nsparse_test --gtest_filter="SparseVectors*"
./build/tests/nsparse_test --gtest_filter="SeismicIndex*"

Build with benchmarks enabled and run:
cmake -S . -B build -DNSPARSE_ENABLE_BENCHMARKS=ON
cmake --build build -j
./build/benchmarks/nsparse_benchmark

On Linux, the benchmarks support hardware performance counters via libpfm. Install `libpfm4-dev` to enable this.

python3 -m venv venv
source venv/bin/activate
pip install -r nsparse/python/requirements.txt
cmake -S . -B build -DNSPARSE_ENABLE_PYTHON=ON -DNSPARSE_OPT_LEVEL=avx2
cmake --build build -j
cd build/nsparse/python
pip install .

conda create -n nsparse python=3.12 numpy
conda activate nsparse
cmake -S . -B build -DNSPARSE_ENABLE_PYTHON=ON -DNSPARSE_OPT_LEVEL=avx2
cmake --build build -j
cd build/nsparse/python
pip install .

After building and installing, you can run the demo scripts:
python demos/seismic_sq.py
python demos/seismic_sq_idmap.py
python demos/seismic_sq_idmap_idselector.py

For debugging with GDB or LLDB, build in Debug mode:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -DNSPARSE_ENABLE_TESTS=ON
cmake --build build -j

Then attach your debugger to the test binary:
# GDB
gdb ./build/tests/nsparse_test
# LLDB
lldb ./build/tests/nsparse_test

In VS Code, you can set breakpoints and debug directly from the IDE using the test or benchmark targets.

| Dependency | Purpose | Acquisition |
|---|---|---|
| Abseil | Hash containers (`flat_hash_set`, `flat_hash_map`) | System or auto-fetched |
| GoogleTest | Unit testing framework | Auto-fetched via CMake |
| Google Benchmark | Benchmarking framework | Auto-fetched via CMake |
| OpenMP | Parallelism | System package |
| SWIG | Python bindings generation | System package |
See CONTRIBUTING.
Class names should use CamelCase. File names should use snake_case.
Header files use the .h extension and source files use .cpp.
Try to put new classes into existing directories if the directory name abstracts the purpose of the class. The project is organized as follows:
- `nsparse/`: Core library (index types, sparse vectors, inverted index)
- `nsparse/cluster/`: Clustering algorithms (k-means, inverted list clusters)
- `nsparse/invlists/`: Inverted list storage
- `nsparse/io/`: Serialization and I/O
- `nsparse/utils/`: Utilities (distance functions, SIMD, quantization, ranker)
- `nsparse/python/`: Python bindings (SWIG)

Organize code into small classes and methods with a single concise purpose. Prefer multiple small methods over a single long one that does everything.
Document your code. That includes the purpose of new classes, every public method, and code sections that have critical or non-trivial logic.
Use Doxygen-style block comments:
/**
 * Brief description of the class/method.
 *
 * @param name Description of parameter
 * @return Description of return value
 */

The project uses Google C++ Style as a base with 4-space indentation, configured via `.clang-format`:
BasedOnStyle: Google
IndentWidth: 4
AccessModifierOffset: -4
Additional conventions:
- Use descriptive names for classes, methods, fields, and variables.
- Avoid abbreviations unless they are widely accepted.
- Use `const` wherever possible.
- Prefer smart pointers (`std::unique_ptr`, `std::shared_ptr`) over raw pointers for ownership.
- Use `override` on all overridden virtual methods.
- SWIG `.i` files are excluded from formatting (see `.clang-format-ignore`).
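The conventions above can be illustrated with a short sketch. Every name in it (`Scorer`, `DotProductScorer`, `MakeDotProductScorer`) is hypothetical and chosen for illustration; none of them is part of the neural-sparse-cpp API. It shows CamelCase class names, documentation comments on public members, `const` on parameters and methods, `override` on the derived method, and `std::unique_ptr` for ownership, laid out with 4-space indentation and flush-left access modifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

/**
 * Scores a candidate vector. Hypothetical interface for illustration only.
 */
class Scorer {
public:
    virtual ~Scorer() = default;

    /**
     * @param weights Candidate weights, one per dimension
     * @return Score of the candidate; higher is better
     */
    virtual float Score(const std::vector<float>& weights) const = 0;
};

/** Scores a candidate by its dot product with a fixed query vector. */
class DotProductScorer : public Scorer {
public:
    explicit DotProductScorer(std::vector<float> query)
        : query_(std::move(query)) {}

    float Score(const std::vector<float>& weights) const override {
        // Only dimensions present in both vectors contribute.
        const std::size_t count = std::min(query_.size(), weights.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < count; ++i) {
            sum += query_[i] * weights[i];
        }
        return sum;
    }

private:
    const std::vector<float> query_;
};

/** Returns an owning pointer so the caller never handles a raw `new`. */
inline std::unique_ptr<Scorer> MakeDotProductScorer(std::vector<float> query) {
    return std::make_unique<DotProductScorer>(std::move(query));
}
```

Returning `std::unique_ptr<Scorer>` from the factory makes ownership explicit at the call site, and the compiler rejects the `override` if the base signature ever changes.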
The project uses clang-format for code formatting and clang-tidy for static analysis.
To format code:
# Format a single file
clang-format -i nsparse/index.cpp
# Format all source files
find nsparse -name '*.cpp' -o -name '*.h' | xargs clang-format -i

To run static analysis:
# Run clang-tidy on a single file (requires compile_commands.json)
cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
clang-tidy -p build nsparse/index.cpp

The `.clang-tidy` configuration enables checks from `bugprone-*`, `modernize-*`, `performance-*`, and `readability-*` categories.

Write unit tests for your new functionality using GoogleTest. Tests live in the tests/ directory with the naming convention <module>_test.cpp.
Unit tests are preferred as they are fast and cheap. Try to cover all possible combinations of parameters.
If your changes could affect backward compatibility, please include relevant tests along with your PR.
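One way to cover parameter combinations is to sweep the cross product of inputs and settings and check properties that must hold for every pair. The sketch below is dependency-free and does not use GoogleTest, so it is a stand-in for the technique, not the project's test layout. `TopK`, `TopKHolds`, and `CountFailingCombinations` are hypothetical names and not part of neural-sparse-cpp. In a real `<module>_test.cpp`, the same sweep would more naturally be a value-parameterized test (`TEST_P` with `testing::Combine`), with each check written as an `EXPECT_*` assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical function under test (not part of neural-sparse-cpp):
// returns the k largest values in descending order.
inline std::vector<float> TopK(std::vector<float> values, std::size_t k) {
    const std::size_t count = std::min(k, values.size());
    const auto middle = values.begin() + static_cast<std::ptrdiff_t>(count);
    std::partial_sort(values.begin(), middle, values.end(),
                      std::greater<float>());
    values.resize(count);
    return values;
}

// Checks properties that must hold for every (input, k) pair.
inline bool TopKHolds(const std::vector<float>& input, std::size_t k) {
    const std::vector<float> result = TopK(input, k);
    if (result.size() != std::min(k, input.size())) {
        return false;
    }
    if (!std::is_sorted(result.begin(), result.end(), std::greater<float>())) {
        return false;
    }
    // A non-empty result must start with the overall maximum.
    if (!result.empty() &&
        result.front() != *std::max_element(input.begin(), input.end())) {
        return false;
    }
    return true;
}

// Sweeps every combination of input and k; returns the number of failures.
inline int CountFailingCombinations() {
    const std::vector<std::vector<float>> inputs = {
        {}, {1.0f}, {2.0f, 2.0f}, {0.5f, 3.0f, 1.5f, 3.0f}};
    const std::vector<std::size_t> ks = {0, 1, 2, 10};
    int failures = 0;
    for (const auto& input : inputs) {
        for (const std::size_t k : ks) {
            if (!TopKHolds(input, k)) {
                ++failures;
            }
        }
    }
    return failures;
}
```

The input list deliberately includes the empty vector, duplicates, and a `k` larger than the input, since boundary combinations like these are the ones most often missed by hand-written cases.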
Do not submit code that is unused or unneeded, even if it is commented out. We rely on Git for version control; code can be restored from history if needed.