GPU Acceleration

Hardware acceleration for cryptographic operations

Lux crypto libraries support GPU acceleration through Metal (Apple Silicon) and CUDA (NVIDIA) backends.

Overview

GPU acceleration provides speedups of roughly 8x to 50x for computationally intensive cryptographic operations (see Performance Benchmarks):

  • BLS Pairings: Accelerated elliptic curve operations
  • NTT/FFT: Fast polynomial multiplication for lattice crypto
  • FHE Operations: Homomorphic encryption with GPU parallelism

Backend Selection

import "github.com/luxfi/crypto/gpu"

// Check available backends
backends := gpu.AvailableBackends()
// Returns: ["metal", "cuda", "cpu"]

// Select backend (auto-selects best available)
ctx := gpu.NewContext(gpu.BackendAuto)

// Or specify one explicitly (plain assignment, since ctx is already declared)
ctx = gpu.NewContext(gpu.BackendMetal) // Apple Silicon
ctx = gpu.NewContext(gpu.BackendCUDA)  // NVIDIA
ctx = gpu.NewContext(gpu.BackendCPU)   // SIMD fallback
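
The auto-selection step can be pictured as a simple preference scan over the available backends. The sketch below is illustrative only: `pickBackend` and the `preference` order are hypothetical names and assumptions, not part of the `gpu` package, and the library's real selection policy may differ.

```go
package main

import "fmt"

// preference lists backend names from most to least preferred.
// This order is an assumption for illustration, not the library's documented policy.
var preference = []string{"metal", "cuda", "cpu"}

// pickBackend returns the first preferred backend present in available,
// and falls back to "cpu" when nothing matches.
func pickBackend(available []string) string {
	have := make(map[string]bool, len(available))
	for _, name := range available {
		have[name] = true
	}
	for _, name := range preference {
		if have[name] {
			return name
		}
	}
	return "cpu"
}

func main() {
	fmt.Println(pickBackend([]string{"cuda", "cpu"})) // cuda
	fmt.Println(pickBackend([]string{"cpu"}))         // cpu
	fmt.Println(pickBackend(nil))                     // cpu
}
```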

Supported Operations

Array Operations

import "github.com/luxfi/crypto/gpu"

// Create arrays on GPU
a := gpu.NewArray([]float64{1, 2, 3, 4})
b := gpu.NewArray([]float64{5, 6, 7, 8})

// Element-wise operations
c := gpu.Add(a, b)
d := gpu.Mul(a, b)

// Matrix operations
m1 := gpu.NewMatrix([][]float64{{1, 2}, {3, 4}})
m2 := gpu.NewMatrix([][]float64{{5, 6}, {7, 8}})
result := gpu.MatMul(m1, m2)
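
For checking results, it can help to have a plain CPU reference for the same inputs. The sketch below assumes that `gpu.Add` and `gpu.Mul` are element-wise and that `gpu.MatMul` is the standard matrix product; the helper names `addVec`, `mulVec`, and `matMul` are hypothetical and not part of the library.

```go
package main

import "fmt"

// addVec returns the element-wise sum of a and b (equal lengths assumed).
func addVec(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

// mulVec returns the element-wise product of a and b (equal lengths assumed).
func mulVec(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] * b[i]
	}
	return out
}

// matMul returns the matrix product a*b (inner dimensions assumed to match).
func matMul(a, b [][]float64) [][]float64 {
	out := make([][]float64, len(a))
	for i := range a {
		out[i] = make([]float64, len(b[0]))
		for j := range b[0] {
			for k := range b {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out
}

func main() {
	a, b := []float64{1, 2, 3, 4}, []float64{5, 6, 7, 8}
	fmt.Println(addVec(a, b)) // [6 8 10 12]
	fmt.Println(mulVec(a, b)) // [5 12 21 32]

	m1 := [][]float64{{1, 2}, {3, 4}}
	m2 := [][]float64{{5, 6}, {7, 8}}
	fmt.Println(matMul(m1, m2)) // [[19 22] [43 50]]
}
```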

FFT/NTT

import "github.com/luxfi/crypto/gpu"

// Fast Fourier Transform
data := gpu.NewArray(signal)
spectrum := gpu.FFT(data)
recovered := gpu.IFFT(spectrum)

// Number Theoretic Transform (for lattice crypto)
poly := gpu.NewArray(coefficients)
ntt := gpu.NTT(poly, modulus)
intt := gpu.INTT(ntt, modulus)
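
To make concrete what an NTT computes, here is a naive O(n²) CPU reference over a small prime field. It is a sketch, not the library's implementation: the GPU path presumably uses a fast butterfly algorithm, and the function names, the modulus 17, and the root of unity 2 are illustrative assumptions. Multiplying two polynomials becomes a pointwise product in the transformed domain, which is why lattice schemes rely on it.

```go
package main

import "fmt"

// modPow computes base^exp mod q by square-and-multiply.
// q must be below 2^32 so that products fit in uint64.
func modPow(base, exp, q uint64) uint64 {
	result := uint64(1)
	base %= q
	for exp > 0 {
		if exp&1 == 1 {
			result = result * base % q
		}
		base = base * base % q
		exp >>= 1
	}
	return result
}

// ntt evaluates a at the powers of omega: out[k] = sum_j a[j]*omega^(j*k) mod q.
// omega must be a primitive len(a)-th root of unity modulo the prime q.
func ntt(a []uint64, omega, q uint64) []uint64 {
	n := uint64(len(a))
	out := make([]uint64, n)
	for k := uint64(0); k < n; k++ {
		var sum uint64
		for j := uint64(0); j < n; j++ {
			sum = (sum + a[j]%q*modPow(omega, j*k, q)) % q
		}
		out[k] = sum
	}
	return out
}

// intt inverts ntt: transform with omega^-1, then scale by n^-1.
// Inverses use Fermat's little theorem, valid because q is prime.
func intt(a []uint64, omega, q uint64) []uint64 {
	n := uint64(len(a))
	out := ntt(a, modPow(omega, q-2, q), q)
	nInv := modPow(n, q-2, q)
	for i := range out {
		out[i] = out[i] * nInv % q
	}
	return out
}

// mulPoly multiplies a and b modulo (x^n - 1, q) via pointwise products.
func mulPoly(a, b []uint64, omega, q uint64) []uint64 {
	fa, fb := ntt(a, omega, q), ntt(b, omega, q)
	for i := range fa {
		fa[i] = fa[i] * fb[i] % q
	}
	return intt(fa, omega, q)
}

func main() {
	const q, omega = 17, 2 // 2 has multiplicative order 8 modulo 17
	a := []uint64{1, 2, 0, 0, 0, 0, 0, 0} // 1 + 2x
	b := []uint64{3, 4, 0, 0, 0, 0, 0, 0} // 3 + 4x
	fmt.Println(mulPoly(a, b, omega, q))  // [3 10 8 0 0 0 0 0], i.e. 3 + 10x + 8x^2
}
```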

BLS Acceleration

import "github.com/luxfi/crypto/bls"

// GPU-accelerated signing
sig := bls.Sign(privateKey, message)  // Uses GPU if available

// Batch verification (highly parallel)
results := bls.VerifyBatch(publicKeys, messages, signatures)

// Aggregate verification
agg := bls.AggregateSignatures(signatures)
valid := bls.VerifyAggregate(publicKeys, message, agg)

Performance Benchmarks

Apple M1 Max

Operation            CPU       Metal GPU   Speedup
BLS Sign             1.2 ms    0.15 ms     8x
BLS Verify           2.5 ms    0.3 ms      8x
BLS Batch (100)      250 ms    15 ms       17x
NTT (n=4096)         50 μs     5 μs        10x
NTT (n=65536)        1 ms      50 μs       20x
FFT (n=1M)           100 ms    5 ms        20x
MatMul (1024x1024)   500 ms    10 ms       50x

NVIDIA RTX 4090

Operation            CPU       CUDA GPU    Speedup
BLS Sign             1.2 ms    0.1 ms      12x
BLS Verify           2.5 ms    0.2 ms      12x
BLS Batch (100)      250 ms    8 ms        31x
NTT (n=4096)         50 μs     3 μs        17x
NTT (n=65536)        1 ms      25 μs       40x
FFT (n=1M)           100 ms    2 ms        50x
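
The speedup column is the CPU time divided by the GPU time, and the batch rows can also be read as a per-signature cost. The sketch below reproduces that arithmetic for the M1 Max batch row; the helpers `speedup` and `perItem` are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times faster the GPU run is than the CPU run.
func speedup(cpu, gpu time.Duration) float64 {
	return float64(cpu) / float64(gpu)
}

// perItem returns the amortized cost of one item in a batch of n.
func perItem(total time.Duration, n int) time.Duration {
	return total / time.Duration(n)
}

func main() {
	// BLS Batch (100) on Apple M1 Max, taken from the table above.
	batchCPU, batchGPU := 250*time.Millisecond, 15*time.Millisecond

	fmt.Printf("batch speedup: %.1fx\n", speedup(batchCPU, batchGPU)) // 16.7x
	fmt.Println("per signature on GPU:", perItem(batchGPU, 100))      // 150µs
}
```

At 150 μs per signature, batching on Metal costs about half as much per signature as a single GPU verification (0.3 ms), which is where the extra speedup in the batch row comes from.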

FHE Acceleration

Fully Homomorphic Encryption (FHE) operations run roughly 10x to 20x faster on GPU (see FHE Performance):

import "github.com/luxfi/crypto/fhe"

// Create FHE context with GPU
ctx := fhe.NewContext(fhe.Config{
    Backend: fhe.BackendGPU,
    Scheme:  fhe.CKKS,
    Params:  fhe.PN14QP438,
})

// Encrypt vectors
ct1 := ctx.Encrypt([]float64{1.0, 2.0, 3.0})
ct2 := ctx.Encrypt([]float64{4.0, 5.0, 6.0})

// Homomorphic operations (run on GPU)
sum := ctx.Add(ct1, ct2)      // ~10 μs
prod := ctx.Mul(ct1, ct2)     // ~30 μs
rotated := ctx.Rotate(ct1, 1) // ~20 μs

FHE Performance

Operation        CPU       GPU      Speedup
CKKS Encrypt     500 μs    50 μs    10x
CKKS Add         100 μs    10 μs    10x
CKKS Multiply    500 μs    30 μs    17x
CKKS Rotate      200 μs    20 μs    10x
TFHE Bootstrap   20 ms     1 ms     20x

Memory Model

Unified Memory (Metal)

Apple Silicon provides unified memory between CPU and GPU:

// Data automatically available on both CPU and GPU
arr := gpu.NewArray(data)

// No explicit transfers needed
result := gpu.Add(arr, arr)

// Access result on CPU
values := result.ToSlice()

Discrete Memory (CUDA)

NVIDIA GPUs have their own device memory, separate from host RAM, so data must be copied in each direction:

// Explicit transfers for CUDA
arr := gpu.NewArray(data)           // Copies to GPU
arr.ToDevice()                       // Explicit GPU transfer
result := gpu.Add(arr, arr)          // Runs on GPU
values := result.ToHost().ToSlice()  // Copy back to CPU
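
With discrete memory, copy time counts against the GPU whenever data does not stay resident on the device. The sketch below is a rough cost model, not a measurement: `effectiveSpeedup` is a hypothetical helper, and the 8-byte coefficient size and 10 GB/s transfer rate are placeholder assumptions. It also assumes the benchmark timings exclude transfers, which the tables do not state.

```go
package main

import "fmt"

// effectiveSpeedup returns the CPU time divided by GPU time plus transfers.
// Times are in microseconds; bytes is the payload copied in each direction,
// and bytesPerMicro is the assumed host<->device transfer rate.
func effectiveSpeedup(cpuMicros, gpuMicros float64, bytes int, bytesPerMicro float64) float64 {
	transfer := 2 * float64(bytes) / bytesPerMicro // copy in + copy out
	return cpuMicros / (gpuMicros + transfer)
}

func main() {
	// NTT (n=65536) from the RTX 4090 table: 1 ms on CPU, 25 μs on GPU.
	// Assumed placeholders: 8-byte coefficients, 10 GB/s = 10000 bytes/μs.
	const bytes = 65536 * 8

	fmt.Printf("kernel only: %.1fx\n", effectiveSpeedup(1000, 25, 0, 10000))     // 40.0x
	fmt.Printf("with copies: %.1fx\n", effectiveSpeedup(1000, 25, bytes, 10000)) // 7.7x
}
```

Under these assumptions, chaining several operations on the device before copying back preserves far more of the raw kernel speedup than transferring after every call.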

Building with GPU Support

macOS (Metal)

# Metal support is automatic on Apple Silicon
go build -tags=metal ./...

# Test GPU availability
go test -v -run TestGPUAvailable ./gpu

Linux (CUDA)

# Install CUDA toolkit first
# https://developer.nvidia.com/cuda-downloads

# Build with CUDA support
CGO_ENABLED=1 go build -tags=cuda ./...

# Test CUDA availability
go test -v -run TestCUDAAvailable ./gpu

Fallback Behavior

When no GPU is available, operations automatically fall back to the CPU:

ctx := gpu.NewContext(gpu.BackendAuto)

if ctx.Backend() == gpu.BackendCPU {
    log.Println("Running on CPU (GPU not available)")
}

// Operations work the same regardless of backend
result := gpu.Add(a, b)
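
One common way to get this behavior is to hide each backend behind an interface and pick the implementation once at startup. The following is a minimal sketch of that pattern, not the library's actual design: `Adder`, `cpuAdder`, `newGPUAdder`, and `newAdder` are hypothetical, and the GPU constructor is a stub that always fails to simulate a machine without a GPU.

```go
package main

import (
	"errors"
	"fmt"
)

// Adder is a hypothetical interface standing in for a compute backend.
type Adder interface {
	Name() string
	Add(a, b []float64) []float64
}

// cpuAdder is the always-available fallback implementation.
type cpuAdder struct{}

func (cpuAdder) Name() string { return "cpu" }

func (cpuAdder) Add(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

var errNoGPU = errors.New("no GPU device found")

// newGPUAdder is a placeholder that always fails in this sketch.
func newGPUAdder() (Adder, error) { return nil, errNoGPU }

// newAdder tries the GPU first and falls back to the CPU on any error.
func newAdder() Adder {
	if g, err := newGPUAdder(); err == nil {
		return g
	}
	return cpuAdder{}
}

func main() {
	backend := newAdder()
	// Callers use the same interface regardless of which backend was chosen.
	fmt.Println(backend.Name(), backend.Add([]float64{1, 2}, []float64{3, 4})) // cpu [4 6]
}
```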

C++ Libraries

For direct C++ usage, see the C++ Libraries documentation.

The Go packages wrap these C++ libraries:

Go Package                    C++ Library
github.com/luxfi/crypto/gpu   luxcpp/gpu
github.com/luxfi/crypto/bls   luxcpp/crypto
