GPU Acceleration
Hardware acceleration for cryptographic operations
Lux crypto libraries support GPU acceleration through Metal (Apple Silicon) and CUDA (NVIDIA) backends.
Overview
In the benchmarks below, GPU acceleration delivers 8-50x speedups over the CPU baseline for computationally intensive cryptographic operations:
- BLS Pairings: Accelerated elliptic curve operations
- NTT/FFT: Fast polynomial multiplication for lattice crypto
- FHE Operations: Homomorphic encryption with GPU parallelism
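The NTT is a discrete Fourier transform over integers modulo a prime, so multiplying two polynomials reduces to transforming both, multiplying pointwise, and transforming back. The self-contained Go sketch below illustrates that idea on the CPU. It does not call the Lux API, its function names are hypothetical, and it uses toy parameters (q = 17, n = 4, root of unity 4) with the O(n^2) definition rather than the O(n log n) butterfly form a GPU kernel would use.

```go
package main

import "fmt"

// Toy parameters (assumptions, far smaller than real lattice parameters):
// q = 17 is prime, n = 4 divides q-1 = 16, and omega = 4 has order 4 mod 17.
const (
	q     = 17
	n     = 4
	omega = 4
)

// powMod computes base^exp mod q by square-and-multiply.
func powMod(base, exp uint64) uint64 {
	result := uint64(1)
	base %= q
	for exp > 0 {
		if exp&1 == 1 {
			result = result * base % q
		}
		base = base * base % q
		exp >>= 1
	}
	return result
}

// ntt evaluates a (length n) at powers of root: A[k] = sum_j a[j]*root^(j*k) mod q.
func ntt(a []uint64, root uint64) []uint64 {
	out := make([]uint64, n)
	for k := 0; k < n; k++ {
		var sum uint64
		for j := 0; j < n; j++ {
			sum = (sum + a[j]*powMod(root, uint64(j*k))) % q
		}
		out[k] = sum
	}
	return out
}

// intt inverts ntt: transform with omega^-1, then scale by n^-1 mod q.
func intt(a []uint64) []uint64 {
	omegaInv := powMod(omega, q-2) // Fermat: x^(q-2) = x^-1 mod prime q
	nInv := powMod(n, q-2)
	out := ntt(a, omegaInv)
	for i := range out {
		out[i] = out[i] * nInv % q
	}
	return out
}

// mulNTT multiplies a and b modulo (X^n - 1, q) via pointwise products in the NTT domain.
func mulNTT(a, b []uint64) []uint64 {
	fa, fb := ntt(a, omega), ntt(b, omega)
	prod := make([]uint64, n)
	for i := range prod {
		prod[i] = fa[i] * fb[i] % q
	}
	return intt(prod)
}

func main() {
	a := []uint64{1, 2, 3, 4}
	b := []uint64{5, 6, 7, 8}
	fmt.Println(intt(ntt(a, omega))) // [1 2 3 4]
	fmt.Println(mulNTT(a, b))        // [15 0 15 9]
}
```

The product here is taken modulo X^n - 1 (a cyclic convolution); lattice schemes usually work modulo X^n + 1, which needs an extra twisting step not shown.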
Backend Selection
```go
import "github.com/luxfi/crypto/gpu"

// Check available backends
backends := gpu.AvailableBackends()
// Lists the backends available on this machine (some of "metal", "cuda", "cpu")

// Select backend (auto-selects best available)
ctx := gpu.NewContext(gpu.BackendAuto)

// Or specify explicitly (use = because ctx is already declared)
ctx = gpu.NewContext(gpu.BackendMetal) // Apple Silicon
ctx = gpu.NewContext(gpu.BackendCUDA)  // NVIDIA
ctx = gpu.NewContext(gpu.BackendCPU)   // SIMD fallback
```
Supported Operations
Array Operations
```go
import "github.com/luxfi/crypto/gpu"

// Create arrays on GPU
a := gpu.NewArray([]float64{1, 2, 3, 4})
b := gpu.NewArray([]float64{5, 6, 7, 8})
// Element-wise operations
c := gpu.Add(a, b)
d := gpu.Mul(a, b)
// Matrix operations
m1 := gpu.NewMatrix([][]float64{{1, 2}, {3, 4}})
m2 := gpu.NewMatrix([][]float64{{5, 6}, {7, 8}})
result := gpu.MatMul(m1, m2)
```
FFT/NTT
```go
import "github.com/luxfi/crypto/gpu"

// Fast Fourier Transform
data := gpu.NewArray(signal)
spectrum := gpu.FFT(data)
recovered := gpu.IFFT(spectrum)
// Number Theoretic Transform (for lattice crypto)
poly := gpu.NewArray(coefficients)
ntt := gpu.NTT(poly, modulus)
intt := gpu.INTT(ntt, modulus)
```
BLS Acceleration
```go
import "github.com/luxfi/crypto/bls"

// GPU-accelerated signing
sig := bls.Sign(privateKey, message) // Uses GPU if available
// Batch verification (highly parallel)
results := bls.VerifyBatch(publicKeys, messages, signatures)
// Aggregate verification
agg := bls.AggregateSignatures(signatures)
valid := bls.VerifyAggregate(publicKeys, message, agg)
```
Performance Benchmarks
Apple M1 Max
| Operation | CPU | Metal GPU | Speedup |
|---|---|---|---|
| BLS Sign | 1.2 ms | 0.15 ms | 8x |
| BLS Verify | 2.5 ms | 0.3 ms | 8x |
| BLS Batch (100) | 250 ms | 15 ms | 17x |
| NTT (n=4096) | 50 μs | 5 μs | 10x |
| NTT (n=65536) | 1 ms | 50 μs | 20x |
| FFT (n=1M) | 100 ms | 5 ms | 20x |
| MatMul (1024x1024) | 500 ms | 10 ms | 50x |
NVIDIA RTX 4090
| Operation | CPU | CUDA GPU | Speedup |
|---|---|---|---|
| BLS Sign | 1.2 ms | 0.1 ms | 12x |
| BLS Verify | 2.5 ms | 0.2 ms | 12x |
| BLS Batch (100) | 250 ms | 8 ms | 31x |
| NTT (n=4096) | 50 μs | 3 μs | 17x |
| NTT (n=65536) | 1 ms | 25 μs | 40x |
| FFT (n=1M) | 100 ms | 2 ms | 50x |
FHE Acceleration
Fully Homomorphic Encryption (FHE) workloads are dominated by arithmetic on large polynomials, so they benefit greatly from GPU acceleration:
```go
import "github.com/luxfi/crypto/fhe"

// Create FHE context with GPU
ctx := fhe.NewContext(fhe.Config{
	Backend: fhe.BackendGPU,
	Scheme:  fhe.CKKS,
	Params:  fhe.PN14QP438,
})
// Encrypt vectors
ct1 := ctx.Encrypt([]float64{1.0, 2.0, 3.0})
ct2 := ctx.Encrypt([]float64{4.0, 5.0, 6.0})
// Homomorphic operations (run on GPU)
sum := ctx.Add(ct1, ct2)      // ~10 μs
prod := ctx.Mul(ct1, ct2)     // ~30 μs
rotated := ctx.Rotate(ct1, 1) // ~20 μs
```
FHE Performance
| Operation | CPU | GPU | Speedup |
|---|---|---|---|
| CKKS Encrypt | 500 μs | 50 μs | 10x |
| CKKS Add | 100 μs | 10 μs | 10x |
| CKKS Multiply | 500 μs | 30 μs | 17x |
| CKKS Rotate | 200 μs | 20 μs | 10x |
| TFHE Bootstrap | 20 ms | 1 ms | 20x |
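Lattice-based FHE schemes such as CKKS represent ciphertexts as polynomials in the ring Z_q[X]/(X^n + 1), so homomorphic multiplication spends much of its time on polynomial products in that ring, which parallelize well on GPUs. The schoolbook O(n^2) Go sketch below (hypothetical function name, toy modulus q = 17, not the Lux implementation) shows the negacyclic wrap-around rule that any accelerated kernel must reproduce.

```go
package main

import "fmt"

// mulNegacyclic multiplies a and b in Z_q[X]/(X^n + 1) by schoolbook convolution.
// Because X^n = -1 in this ring, terms that wrap past X^(n-1) are subtracted.
// Assumes inputs are already reduced mod q and q < 2^32 so products fit in uint64.
func mulNegacyclic(a, b []uint64, q uint64) []uint64 {
	n := len(a)
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			term := a[i] * b[j] % q
			k := i + j
			if k < n {
				out[k] = (out[k] + term) % q
			} else {
				// X^k = X^(k-n) * X^n = -X^(k-n)
				out[k-n] = (out[k-n] + q - term) % q
			}
		}
	}
	return out
}

func main() {
	q := uint64(17)
	a := []uint64{1, 1, 0, 0} // 1 + X
	b := []uint64{0, 0, 0, 1} // X^3
	// (1 + X) * X^3 = X^3 + X^4 = X^3 - 1, and -1 = 16 mod 17
	fmt.Println(mulNegacyclic(a, b, q)) // [16 0 0 1]
}
```

Real CKKS parameter sets use ring degrees in the thousands or more, which is why NTT-based multiplication and GPU parallelism matter at that scale.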
Memory Model
Unified Memory (Metal)
Apple Silicon provides unified memory between CPU and GPU:
```go
// Data automatically available on both CPU and GPU
arr := gpu.NewArray(data)
// No explicit transfers needed
result := gpu.Add(arr, arr)
// Access result on CPU
values := result.ToSlice()
```
Discrete Memory (CUDA)
Discrete NVIDIA GPUs have their own device memory, separate from host RAM, so data must be moved between them explicitly:
```go
// Explicit transfers for CUDA
arr := gpu.NewArray(data)           // Create the array from host data
arr.ToDevice()                      // Explicit host-to-GPU transfer
result := gpu.Add(arr, arr)         // Runs on GPU
values := result.ToHost().ToSlice() // Copy back to CPU
```
Building with GPU Support
macOS (Metal)
```bash
# Metal ships with macOS on Apple Silicon; enable the backend with the metal build tag
go build -tags=metal ./...
# Test GPU availability
go test -v -run TestGPUAvailable ./gpu
```
Linux (CUDA)
```bash
# Install CUDA toolkit first
# https://developer.nvidia.com/cuda-downloads
# Build with CUDA support
CGO_ENABLED=1 go build -tags=cuda ./...
# Test CUDA availability
go test -v -run TestCUDAAvailable ./gpu
```
Fallback Behavior
When no GPU is available, operations automatically fall back to the CPU backend:
```go
import (
	"log"
	"github.com/luxfi/crypto/gpu"
)
ctx := gpu.NewContext(gpu.BackendAuto)
if ctx.Backend() == gpu.BackendCPU {
	log.Println("Running on CPU (GPU not available)")
}
// Operations work the same regardless of backend
result := gpu.Add(a, b)
```
C++ Libraries
For direct C++ usage, see the C++ Libraries documentation.
The Go packages wrap these C++ libraries:
| Go Package | C++ Library |
|---|---|
| github.com/luxfi/crypto/gpu | luxcpp/gpu |
| github.com/luxfi/crypto/bls | luxcpp/crypto |
Next Steps
- C++ Libraries - Native C++ implementation
- BLS Signatures - Signature aggregation
- Post-Quantum Crypto - Lattice-based algorithms