TFHE-rs v1.1.0

IceTDrinker released this 10 Apr 12:50
tfhe-rs-1.1.0

Summary

TFHE-rs v1.1.0 brings several new features and improvements on both the CPU & GPU backends:

  • CPU: This release introduces new scalar operations including CMUX/Select, subtraction with the scalar on the left, and dot product between a vector of Booleans and scalars. It also adds user-friendly APIs to manage noise squashing.
  • GPU: This release adds 128-bit Programmable Bootstrapping (PBS) and upgrades cryptographic parameters to match the CPU standard, now offering a failure probability of 2⁻¹²⁸ for FHE operations.

What's Changed

Breaking changes

Warning

  • The directions of the integer block rotation and block shift primitives have been inverted to correct their semantics. Code that relied on the previous directions must be updated.
  • The NTT for the prime $2^{64} - 2^{32} + 1$ now uses new twiddle factors that allow bit shifts instead of multiplications. NTT keys generated with older versions are now incompatible.
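
To illustrate why bit shifts can replace multiplications modulo this prime, here is a minimal sketch in plain Rust. It is not the TFHE-rs implementation; the names `P`, `reduce128`, `mul_pow2_mod` and `mul_mod` are hypothetical. It relies only on two identities that hold modulo $p = 2^{64} - 2^{32} + 1$: $2^{64} \equiv 2^{32} - 1$ and $2^{96} \equiv -1$. When a twiddle factor is a power of two, multiplying by it reduces to a shift followed by a cheap reduction.

```rust
/// The prime p = 2^64 - 2^32 + 1 (hypothetical constant name).
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Reduces a 128-bit value modulo P without a division.
/// Writes x = lo + 2^64 * hi_lo + 2^96 * hi_hi and uses
/// 2^64 = 2^32 - 1 (mod P) and 2^96 = -1 (mod P).
fn reduce128(x: u128) -> u64 {
    let lo = x as u64 as u128;
    let hi = (x >> 64) as u64;
    let hi_lo = (hi & 0xFFFF_FFFF) as u128;
    let hi_hi = (hi >> 32) as u128;
    // Adding P before subtracting hi_hi keeps the value non-negative.
    let mut v = lo + hi_lo * ((1u128 << 32) - 1) + (P as u128 - hi_hi);
    // v < 3 * 2^64, so at most three subtractions are needed.
    while v >= P as u128 {
        v -= P as u128;
    }
    v as u64
}

/// Multiplies `a` by 2^k modulo P using a shift instead of a multiplication.
/// Only valid for k < 64 in this sketch.
fn mul_pow2_mod(a: u64, k: u32) -> u64 {
    assert!(k < 64);
    reduce128((a as u128) << k)
}

/// Reference: generic modular multiplication through a 128-bit product.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}
```

For any `a` and `k < 64`, `mul_pow2_mod(a, k)` agrees with `mul_mod(a, 1 << k)`, but the shift-based version avoids the 64-by-64-bit multiplication.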

New features

CPU

  • Add scalar subtraction with the scalar as the left operand in the integer and High-Level API
  • Add scalar Select in the integer and High-Level API, so that the values being selected can be clear scalars
  • Add dot product between a vector of FheBool and a vector of clear scalars
  • Add trivial encrypt/decrypt support for string types
  • Add chunked LweBootstrapKey and SeededLweBootstrapKey generation for memory-constrained systems
  • Add a noise squashing API in the integer and High-Level API to support use cases requiring noise flooding
  • Add the extended-types feature, enabling more static typing in the High-Level API
  • Add GLWE keyswitch primitives
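
As a rough guide to what the new scalar operations compute, the following cleartext sketch models their expected results on plain `u8` values. It does not use the TFHE-rs API: the function names are hypothetical, the condition and one operand would be encrypted in practice, and wrap-around (modular) arithmetic on 8-bit values is an assumption made for the illustration.

```rust
/// Cleartext model of a scalar select: returns `if_true` when `cond` holds,
/// `if_false` otherwise. Written in the branch-free arithmetic form
/// cond * a + (1 - cond) * b, which is the shape of a CMUX.
fn scalar_select(cond: bool, if_true: u8, if_false: u8) -> u8 {
    let c = cond as u8;
    c * if_true + (1 - c) * if_false
}

/// Cleartext model of a subtraction with the scalar on the left:
/// computes `scalar - value`, wrapping modulo 2^8 (assumed behavior).
fn left_scalar_sub(scalar: u8, value: u8) -> u8 {
    scalar.wrapping_sub(value)
}

/// Cleartext model of a dot product between a vector of Booleans and a
/// vector of scalars: sums the scalars whose matching Boolean is true.
fn bool_dot_product(bits: &[bool], scalars: &[u8]) -> u8 {
    bits.iter()
        .zip(scalars)
        .fold(0u8, |acc, (&b, &s)| acc.wrapping_add(b as u8 * s))
}
```

For example, `bool_dot_product(&[true, false, true], &[2, 5, 11])` adds 2 and 11 and skips 5.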

GPU

  • Implement fft128 in the CUDA backend
  • Implement 128-bit classic PBS

Improvements

CPU

  • The NTT for the Solinas prime $2^{64} - 2^{32} + 1$ now uses twiddle factors that enable bit shifts instead of costly multiplications
  • Remove usage of unwrap in various conformance checks

GPU

  • Add modulus-switch noise reduction on GPU for the classical PBS
  • Update GPU cryptographic parameters to reach a 2⁻¹²⁸ probability of failure, as on CPU
  • Use hexadecimal literals to initialize twiddles for the 64-bit FFT for better precision
  • Refactor double2 operators to use CUDA intrinsics and match CPU floating-point arithmetic
  • Track degree and noise level in all integer operations in the CUDA backend
  • Fix block comparison logic with zero to match the CPU implementation
  • Retain LUT indexes on the CPU for each LUT application to avoid copying them back from GPU
  • Add alias for GPU compression parameters
  • Detect first/last iteration of split-kernel multi-bit & classical PBS via template argument
  • Detect first/last iteration of 128-bit PBS via template argument
  • Modify integer & ERC20 throughput benchmarks for better multi-GPU performance

Fixes

CPU

  • Fix a corner case in encryption where negative values were sometimes not sign-extended
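
The release notes do not detail the corner case, so the sketch below only illustrates what sign extension means when a signed value is split into small blocks before encryption. It is plain Rust, not TFHE-rs code; `to_blocks` is a hypothetical helper, and 2-bit blocks are an assumption chosen for the example.

```rust
/// Splits `value` into `num_blocks` little-endian blocks of
/// `bits_per_block` bits each, sign-extending negative values.
fn to_blocks(value: i8, bits_per_block: u32, num_blocks: usize) -> Vec<u8> {
    assert!((1..=8).contains(&bits_per_block));
    let mask = (1i64 << bits_per_block) - 1;
    // Casting a signed integer to a wider signed type sign-extends it,
    // and `>>` on a signed integer is an arithmetic shift, so the upper
    // blocks of a negative value are filled with ones, not zeros.
    let mut wide = value as i64;
    (0..num_blocks)
        .map(|_| {
            let block = (wide & mask) as u8;
            wide >>= bits_per_block;
            block
        })
        .collect()
}
```

With 2-bit blocks, `to_blocks(-1, 2, 8)` yields eight blocks all equal to 3. Without sign extension, the upper four blocks would be 0 and the 16-bit value would decode as 255 instead of -1.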

GPU

  • Fix max shared memory bug for cooperative-groups PBS

Resources