TFHE-rs v1.1.0
Summary
TFHE-rs v1.1.0 brings several new features and improvements on both the CPU & GPU backends:
- CPU: This release introduces new scalar operations including CMUX/Select, subtraction with the scalar on the left, and dot product between a vector of Booleans and scalars. It also adds user-friendly APIs to manage noise squashing.
- GPU: This release adds 128-bit Programmable Bootstrapping (PBS) and upgrades cryptographic parameters to match the CPU standard, now offering a failure probability of 2⁻¹²⁸ for FHE operations.
What's Changed
Breaking changes
Warning
- The directions of the integer block rotation and block shift primitives have been inverted so that left and right now match their intended meaning.
- The NTT for the prime $2^{64} - 2^{32} + 1$ now uses new twiddle factors, allowing bit shifts instead of multiplications. NTT keys generated with older versions are no longer compatible.
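The block-direction change is easiest to see on clear data. The Rust sketch below is a plain-integer model, not TFHE-rs code: it assumes radix integers store their blocks little-endian (block 0 is the least significant) with 2-bit blocks, and all helper names are hypothetical. Under that assumption, rotating the integer's *value* left by one block corresponds to a *right* rotation of the block vector, which shows how easily the two notions of "direction" can be confused.

```rust
/// Splits `value` into `num_blocks` little-endian blocks of `bits` bits
/// (block 0 holds the least significant bits). Hypothetical helper.
fn to_blocks(value: u64, num_blocks: usize, bits: u32) -> Vec<u64> {
    let mask = (1u64 << bits) - 1;
    (0..num_blocks)
        .map(|i| (value >> (i as u32 * bits)) & mask)
        .collect()
}

/// Reassembles little-endian blocks into an integer.
fn from_blocks(blocks: &[u64], bits: u32) -> u64 {
    blocks
        .iter()
        .enumerate()
        .fold(0, |acc, (i, &b)| acc | (b << (i as u32 * bits)))
}

/// Rotates the represented integer left (towards the most significant end)
/// by `n` blocks. With little-endian storage this is a right rotation of the Vec.
fn block_rotate_left(blocks: &[u64], n: usize) -> Vec<u64> {
    let mut out = blocks.to_vec();
    let len = out.len();
    if len > 0 {
        out.rotate_right(n % len);
    }
    out
}

/// Shifts the represented integer left by `n` blocks: zero blocks enter at the
/// least significant end and the most significant blocks are dropped.
fn block_shift_left(blocks: &[u64], n: usize) -> Vec<u64> {
    let len = blocks.len();
    let n = n.min(len);
    let mut out = vec![0; n];
    out.extend_from_slice(&blocks[..len - n]);
    out
}
```

For example, `0b1101_0010` split into 2-bit blocks is `[2, 0, 1, 3]`; a one-block rotate left gives `0b0100_1011`, the same result as `u8::rotate_left(2)` on the clear value.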
New features
CPU
- Add scalar subtraction with the scalar as the left operand in the integer and High-Level API
- Add scalar `Select` (CMUX) in the integer and High-Level API, allowing scalar values to be used as the selected operands
- Add dot product between a vector of `FheBool` and a vector of scalars
- Add trivial encrypt/decrypt support for string types
- Add chunked `LweBootstrapKey` and `SeededLweBootstrapKey` generation for memory-constrained systems
- Add a noise squashing API in the integer and High-Level API to support use cases requiring noise flooding
- Add the `extended-types` feature, enabling more static typing in the High-Level API
- Add GLWE keyswitch primitives
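The new scalar operations act on encrypted values, but their expected decrypted results can be stated on clear data. The following clear-text model is only a sketch: the function names are hypothetical (they are not TFHE-rs APIs), it uses 8-bit unsigned values, it assumes both branches of the scalar select are clear values, and it assumes results wrap modulo $2^8$, as TFHE-rs unsigned integer arithmetic does.

```rust
/// Clear-text model of `scalar - x` (scalar on the left), wrapping modulo 2^8.
fn scalar_sub_left(scalar: u8, x: u8) -> u8 {
    scalar.wrapping_sub(x)
}

/// Clear-text model of a select (CMUX) whose two branches are scalars.
fn scalar_select(condition: bool, if_true: u8, if_false: u8) -> u8 {
    if condition {
        if_true
    } else {
        if_false
    }
}

/// Clear-text model of a dot product between a vector of booleans and a
/// vector of scalars: sums the scalars whose boolean is true, wrapping modulo 2^8.
fn bool_scalar_dot_product(bools: &[bool], scalars: &[u8]) -> u8 {
    assert_eq!(bools.len(), scalars.len(), "vectors must have the same length");
    bools
        .iter()
        .zip(scalars)
        .filter(|(b, _)| **b)
        .fold(0u8, |acc, (_, &s)| acc.wrapping_add(s))
}
```

For instance, `scalar_sub_left(3, 10)` gives `249` (that is, $3 - 10 \bmod 2^8$), and the dot product of `[true, false, true]` with `[3, 100, 4]` sums only the selected scalars, giving `7`.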
GPU
- Implement `fft128` in the CUDA backend
- Implement 128-bit classic PBS
Improvements
CPU
- The NTT for the Solinas prime $2^{64} - 2^{32} + 1$ now uses twiddles enabling bit shifts instead of costly multiplications
- Remove usage of `unwrap` in various conformance checks
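Why bit shifts become possible: with $p = 2^{64} - 2^{32} + 1$ we have $2^{64} \equiv 2^{32} - 1 \pmod p$, hence $2^{96} = 2^{32} \cdot 2^{64} \equiv 2^{64} - 2^{32} \equiv -1 \pmod p$ and $2^{192} \equiv 1 \pmod p$. The element 2 is therefore a primitive 192nd root of unity, and multiplying by a twiddle of the form $2^k$ needs only a shift, an optional negation and a reduction. The Rust sketch below checks this against a generic modular multiplication. It is an illustration under our own assumptions, not the TFHE-rs NTT; which twiddles TFHE-rs actually selects is not specified here, and the `%` reductions stand in for the specialized reduction a real implementation would use.

```rust
/// The Solinas prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Generic modular multiplication through a 128-bit product.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % (P as u128)) as u64
}

/// Square-and-multiply exponentiation modulo p.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Multiplies x (< p) by the twiddle 2^k using shifts instead of a general
/// multiplication. Relies on 2^96 = -1 (mod p), so 2^k = -2^(k - 96) for k >= 96.
fn mul_by_pow2(x: u64, k: u32) -> u64 {
    let k = k % 192;
    let (k, negate) = if k >= 96 { (k - 96, true) } else { (k, false) };
    let shifted: u64 = if k < 64 {
        (((x as u128) << k) % (P as u128)) as u64
    } else {
        // Split a 64..=95-bit shift in two so the u128 intermediate never overflows.
        let y = ((x as u128) << 32) % (P as u128);
        ((y << (k - 32)) % (P as u128)) as u64
    };
    if negate && shifted != 0 {
        P - shifted
    } else {
        shifted
    }
}
```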
GPU
- Add modulus-switch noise reduction on GPU for the classical PBS
- Update GPU cryptographic parameters to reach a 2⁻¹²⁸ probability of failure, as on CPU
- Use hexadecimal constants to initialize the twiddles of the 64-bit FFT, for better precision
- Refactor `double2` operators to use CUDA intrinsics and match CPU floating-point arithmetic
- Track degree and noise level in all integer operations in the CUDA backend
- Fix block comparison logic with zero to match the CPU implementation
- Retain LUT indexes on the CPU for each LUT application to avoid copying them back from the GPU
- Add an alias for GPU compression parameters
- Detect first/last iteration of split-kernel multi-bit & classical PBS via template argument
- Detect first/last iteration of 128-bit PBS via template argument
- Modify integer & ERC20 throughput benchmarks for better multi-GPU performance
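On the hexadecimal twiddles: initializing a constant from its exact IEEE-754 bit pattern fixes the value bit for bit, whereas computing it with `cos`/`sin` at run time inherits whatever rounding error the math library has. Rust has no hex float literals (C++17 does, e.g. `0x1.6a09e667f3bcdp-1`), so this small sketch uses `f64::from_bits` to build the eight twiddles $e^{-2\pi i k/8}$. It only illustrates the idea and is not the CUDA backend's code; the function name is hypothetical.

```rust
/// Twiddle factors e^{-2*pi*i*k/8} for k = 0..8 as (re, im) pairs, built from
/// an exact bit pattern for 1/sqrt(2) instead of calling cos/sin at run time.
/// 0x3FE6_A09E_667F_3BCD is the IEEE-754 double nearest to 1/sqrt(2)
/// (written 0x1.6a09e667f3bcdp-1 as a C++17 hex float literal).
fn twiddles_8() -> [(f64, f64); 8] {
    let h = f64::from_bits(0x3FE6_A09E_667F_3BCD);
    [
        (1.0, 0.0),  // k = 0
        (h, -h),     // k = 1
        (0.0, -1.0), // k = 2
        (-h, -h),    // k = 3
        (-1.0, 0.0), // k = 4
        (-h, h),     // k = 5
        (0.0, 1.0),  // k = 6
        (h, h),      // k = 7
    ]
}
```

Because `0x3FE6_A09E_667F_3BCD` is the double nearest to $1/\sqrt{2}$, it equals `std::f64::consts::FRAC_1_SQRT_2` bit for bit on every platform.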
Fixes
CPU
- Fix a corner case in encryption where negative values were sometimes not sign-extended
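What sign extension means for block-decomposed integers: a negative value widened to more blocks must have its sign bit copied into every new high block. The sketch below contrasts a correct two's-complement decomposition with a zero-extending variant, which is one hypothetical way such a bug can appear; it does not reproduce the actual corner case fixed in this release. Helper names and the 2-bit block size are assumptions.

```rust
/// Decomposes `value` into `num_blocks` little-endian blocks of `bits` bits
/// using two's complement, so a negative value's sign fills every high block.
fn signed_to_blocks(value: i64, num_blocks: usize, bits: u32) -> Vec<u64> {
    let mask = (1u64 << bits) - 1;
    (0..num_blocks)
        .map(|i| {
            // Arithmetic right shift on i64 copies the sign bit; clamping at 63
            // keeps the shift in range and yields 0 or -1 past the value's width.
            let shift = (i as u32 * bits).min(63);
            ((value >> shift) as u64) & mask
        })
        .collect()
}

/// Buggy variant (hypothetical): the i8 is reinterpreted as unsigned, so the
/// high blocks are filled with zeros and the sign is lost.
fn zero_extended_blocks(value: i8, num_blocks: usize, bits: u32) -> Vec<u64> {
    let raw = value as u8 as u64;
    let mask = (1u64 << bits) - 1;
    (0..num_blocks)
        .map(|i| (raw >> (i as u32 * bits).min(63)) & mask)
        .collect()
}

/// Reassembles blocks (at most 64 bits in total) into a signed integer,
/// sign-extending from the most significant block.
fn blocks_to_signed(blocks: &[u64], bits: u32) -> i64 {
    let total = blocks.len() as u32 * bits;
    let raw = blocks
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, &b)| acc | (b << (i as u32 * bits)));
    if total >= 64 {
        raw as i64
    } else {
        let s = 64 - total;
        ((raw << s) as i64) >> s
    }
}
```

With 2-bit blocks, `-3` over 8 blocks decomposes to `[1, 3, 3, 3, 3, 3, 3, 3]`, while the zero-extended variant gives `[1, 3, 3, 3, 0, 0, 0, 0]`, which reads back as `253` instead of `-3`.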
GPU
- Fix a maximum shared memory bug in the cooperative-groups PBS
Resources
- https://docs.zama.ai/tfhe-rs/configuration/run_on_gpu/multi_gpu
- https://docs.zama.ai/tfhe-rs/fhe-computation/operations/dot-product
- https://docs.zama.ai/tfhe-rs/fhe-computation/operations/ternary-conditional-operations
- https://docs.zama.ai/tfhe-rs/developers/contributing
- https://docs.zama.ai/tfhe-rs/get-started/benchmarks