Details
Transform Definition
\(\omega_{N}^{k,n} = e^{-2\pi i \frac{k n}{N}}\): Forward transform from space domain to frequency domain
\(\omega_{N}^{k,n} = e^{2\pi i \frac{k n}{N}}\): Backward transform from frequency domain to space domain
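To make the sign convention concrete, here is a minimal Python sketch (independent of SpFFT itself) of the forward transform applied to a pure wave: with the \(e^{-2\pi i \frac{kn}{N}}\) kernel, all energy lands in the bin matching the wave's frequency.

```python
import cmath

def forward_dft(x):
    """Naive forward transform: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A pure wave at frequency k0 in the space domain.
N, k0 = 8, 3
wave = [cmath.exp(2j * cmath.pi * k0 * n / N) for n in range(N)]
spectrum = forward_dft(wave)
# With this convention, spectrum[k0] == N and all other bins are zero.
```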
Complex Number Format
SpFFT always assumes an interleaved format in double or single precision. The alignment of memory provided for space domain data is guaranteed to fulfill the requirements for std::complex (for C++17), C complex types and GPU complex types of CUDA or ROCm.
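The interleaved layout can be illustrated in plain Python (the layout is general, not specific to SpFFT's buffers): complex element \(k\) occupies slots \(2k\) (real part) and \(2k + 1\) (imaginary part) of the underlying real-valued buffer.

```python
from array import array

# Interleaved layout: real and imaginary parts alternate in one contiguous buffer.
values = [1.0 + 2.0j, 3.0 + 4.0j]
interleaved = array("d")
for z in values:
    interleaved.extend((z.real, z.imag))
# interleaved now holds [1.0, 2.0, 3.0, 4.0]

# Reading back: element k occupies slots 2k (real) and 2k + 1 (imaginary).
recovered = [complex(interleaved[2 * k], interleaved[2 * k + 1])
             for k in range(len(interleaved) // 2)]
```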
Indexing
Indices for a dimension of size n must be either in the interval \([0, n - 1]\) or \(\left [ \left \lfloor \frac{n}{2} \right \rfloor - n + 1, \left \lfloor \frac{n}{2} \right \rfloor \right ]\). For Real-To-Complex transforms additional restrictions apply (see next section).
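Both intervals address the same frequencies: an index from the signed interval maps into \([0, n - 1]\) by taking it modulo \(n\). A small Python sketch:

```python
def wrap_index(idx, n):
    """Map an index from [floor(n/2) - n + 1, floor(n/2)] to [0, n - 1]."""
    return idx % n

n = 5  # floor(5/2) = 2, so the signed interval is [-2, 2]
signed = list(range(n // 2 - n + 1, n // 2 + 1))
wrapped = [wrap_index(i, n) for i in signed]
# signed  -> [-2, -1, 0, 1, 2]
# wrapped -> [ 3,  4, 0, 1, 2]
```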
Real-To-Complex Transforms
Only non-redundant z-columns on the y-z plane at \(x = 0\) have to be provided. A z-column must be complete and can be provided at either \(y\) or \(-y\).
All redundant values in the z-column at \(x = 0\), \(y = 0\) can be omitted.
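The redundancy stems from the Hermitian symmetry of the transform of real input, \(X_{-k} = \overline{X_k}\). A small Python check using a naive DFT (illustrative only, shown in 1D):

```python
import cmath

def dft(x):
    """Naive forward DFT with the e^{-2*pi*i*kn/N} convention."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

real_signal = [0.5, -1.0, 2.0, 4.0, -3.0, 1.5]
X = dft(real_signal)
N = len(X)
# Hermitian symmetry of real input: X[(N - k) % N] == conj(X[k]),
# so the coefficient at -k is redundant given the one at +k.
symmetric = all(abs(X[(N - k) % N] - X[k].conjugate()) < 1e-9 for k in range(N))
```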
Normalization
Normalization is only available for the forward transform, with a scaling factor of \(\frac{1}{N_x N_y N_z}\). Applying a scaled forward transform followed by a backward transform will therefore reproduce the original input (within numerical accuracy).
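A 1D Python sketch of this behavior (illustrative, not SpFFT's API): applying the \(\frac{1}{N}\) factor during the forward transform makes the forward-backward round trip the identity.

```python
import cmath

N = 8
signal = [complex(n % 3, -n) for n in range(N)]

def forward_scaled(x):
    """Forward transform including the 1/N normalization factor."""
    n_tot = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n_tot)
                for m in range(n_tot)) / n_tot
            for k in range(n_tot)]

def backward(X):
    """Unnormalized backward transform."""
    n_tot = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n_tot)
                for k in range(n_tot))
            for m in range(n_tot)]

# With scaling on the forward transform, the round trip recovers the input.
roundtrip = backward(forward_scaled(signal))
```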
Optimal sizing
The underlying computation is done by FFT libraries such as FFTW and cuFFT, which provide optimized implementations for sizes of the form \(2^a 3^b 5^c 7^d\), where \(a, b, c, d\) are non-negative integers. Smaller prime factors typically perform better, so the size of each dimension is ideally chosen accordingly.
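A simple way to pick such sizes is to round a desired dimension up to the next number whose only prime factors are 2, 3, 5 and 7. A small Python helper (illustrative, not part of SpFFT):

```python
def is_supported_size(n):
    """True if n factors as 2^a * 3^b * 5^c * 7^d."""
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1

def next_supported_size(n):
    """Smallest size >= n of the form 2^a 3^b 5^c 7^d."""
    while not is_supported_size(n):
        n += 1
    return n

padded = next_supported_size(101)  # 105 = 3 * 5 * 7
```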
Data Distribution
MPI Exchange
The MPI exchange is based on a collective MPI call. The following options are available:
- SPFFT_EXCH_BUFFERED
Exchange with MPI_Alltoall. Requires repacking of data into a buffer. Possibly best optimized by MPI implementations for a large number of ranks, but does not adapt well to non-uniform data distributions.
- SPFFT_EXCH_COMPACT_BUFFERED
Exchange with MPI_Alltoallv. Requires repacking of data into a buffer. Performance is usually close to MPI_Alltoall, and it adapts well to non-uniform data distributions.
- SPFFT_EXCH_UNBUFFERED
Exchange with MPI_Alltoallw. Does not require repacking of data into a buffer (outside of the MPI library). Performance varies widely between systems and MPI implementations. It is generally difficult to optimize for a large number of ranks, but may perform best under certain conditions.
Thread-Safety
The creation of Grid and Transform objects is thread-safe only if:
- No FFTW library calls are executed concurrently.
- In the distributed case, MPI thread support is set to MPI_THREAD_MULTIPLE.
The execution of transforms is thread-safe if:
- Each thread executes using its own Grid and associated Transform object.
- In the distributed case, MPI thread support is set to MPI_THREAD_MULTIPLE.
GPU
Note
Additional environment variables may have to be set for some MPI implementations to allow GPUDirect usage.
Note
The execution of a transform is synchronized with the default stream.
Multi-GPU
Multi-GPU support is not available for individual transform operations, but each Grid / Transform can be associated with a different GPU. At creation time, the current GPU id is stored internally and used for later operations. Multiple GPUs can therefore be used at the same time, either through the asynchronous execution mode or through the multi-transform functionality.