Carlos Osorio

Fast Hartley Transform (FHT) For GPU Acceleration

Recent progress in edge computing has been driven by specialized accelerators that deliver high levels of hardware parallelism. This has been particularly impactful in computational imaging (CI), where GPU acceleration plays a crucial role in reconstructing 2D images through techniques such as Single-Pixel Imaging (SPI). Within SPI, algorithms such as compressive sensing (CS), deep learning, and the Fourier transform are essential for 2D image reconstruction, and they benefit immensely from parallel processing, which shortens processing times. To get the most out of GPUs, strategies such as optimizing memory usage, loop unrolling, writing efficient kernels that minimize operations, using asynchronous operations, and increasing the number of active threads and warps are employed. In laboratory settings, embedded GPUs are key to improving the efficiency of algorithms on System-on-Chip GPUs (SoC-GPUs). This study focuses on the accelerated optimization of the Fast Hartley Transform (FHT) for 2D image reconstruction on the Nvidia Xavier platform. By applying various parallelism techniques with PyCUDA, we have managed to triple the processing speed, approaching real-time performance.
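For readers unfamiliar with the transform itself, below is a minimal NumPy sketch of the discrete Hartley transform that the FHT computes quickly. The function name and the direct O(N^2) formulation are illustrative only; the study's implementation uses an O(N log N) butterfly decomposition parallelized with PyCUDA.

import numpy as np

def dht_naive(x):
    # Direct O(N^2) discrete Hartley transform:
    #   H[k] = sum_n x[n] * cas(2*pi*k*n/N), with cas(t) = cos(t) + sin(t).
    # A fast Hartley transform (FHT) produces the same result in O(N log N)
    # via butterfly stages, which is what gets parallelized on the GPU.
    n = len(x)
    idx = np.arange(n)
    angles = 2.0 * np.pi * np.outer(idx, idx) / n
    cas = np.cos(angles) + np.sin(angles)
    return cas @ x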


Fig. 1. Improving the 2D reconstruction process of the FHT by implementing diverse optimization methods, with particular emphasis on leveraging CUDA to parallelize the algorithms.


Our team has developed a range of optimization methods to accelerate the Fast Hartley Transform (FHT) algorithm on the NVIDIA Xavier NX GPU. We introduced two distinct kernel types: one that improves the calculation of the inverse FHT (IKFHT) and another that handles the digit-reversal reordering step. Our tests reveal notable execution-time differences for the FHT algorithm across computing platforms: on the Central Processing Unit (CPU) the FHT takes 103 ms, while on the GPU it drops to 45 ms. This significant reduction highlights the GPU's superior parallel processing capabilities, making it exceptionally well suited to the FHT algorithm's requirements.
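As an illustration of the second kernel type, the sketch below shows a PyCUDA digit-reversal (bit-reversal) reordering kernel. The kernel body, names, and launch configuration are our own illustrative choices, not code taken from the study.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void bit_reverse_reorder(const float *src, float *dst, int n, int log2n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Reverse the lowest log2n bits of the index: each thread writes one element
    // to its bit-reversed position, the reordering the FHT butterfly stages expect.
    unsigned int r = __brev((unsigned int)i) >> (32 - log2n);
    dst[r] = src[i];
}
""")
bit_reverse_reorder = mod.get_function("bit_reverse_reorder")

n, log2n = 1024, 10
x = np.random.rand(n).astype(np.float32)
y = np.empty_like(x)
bit_reverse_reorder(cuda.In(x), cuda.Out(y), np.int32(n), np.int32(log2n),
                    block=(256, 1, 1), grid=((n + 255) // 256, 1))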

Furthermore, we adopted a pre-indexing technique to boost the FHT algorithm's efficiency even further. Pre-indexing pre-calculates specific frequently used values, shortening the duration of each algorithm iteration. With pre-indexing, CPU execution time is nearly halved to 43 ms, a substantial improvement, and GPU processing time drops to 34 ms, showing how effective the technique is at reducing the computational effort of tasks that require quick recalculations and adjustments of image gradients.

Memory consumption remains modest on both platforms, though the GPU exhibits slightly higher memory usage; pre-indexing significantly lowers memory demand, particularly on the GPU, where it falls from 1.92% to 1.24%. Finally, we measured the speedup, an essential indicator of the performance gained by using the GPU instead of the CPU. Without pre-indexing, the GPU is roughly 2.28 times faster than the baseline CPU implementation (103 ms vs. 45 ms). With pre-indexing, the speedup over that baseline reaches 2.39 times on the CPU (43 ms) and roughly 3 times on the GPU (34 ms). This underscores the advantage of combining GPU execution with pre-indexing when optimizing the FHT algorithm.
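The snippet below sketches the pre-indexing idea under our own naming: the trigonometric twiddle factors that every butterfly stage reuses are computed once on the host and uploaded to the GPU as lookup tables, so each iteration avoids recomputing them. The exact tables pre-computed in the study may differ.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.gpuarray as gpuarray

def precompute_twiddles(n):
    # Cos/sin tables for every butterfly stage size m = 4, 8, ..., n.
    # The FHT butterfly uses the cosine and sine terms separately, so both are kept.
    cos_t, sin_t = [], []
    m = 4
    while m <= n:
        theta = 2.0 * np.pi * np.arange(m // 2) / m
        cos_t.append(np.cos(theta))
        sin_t.append(np.sin(theta))
        m *= 2
    return (np.concatenate(cos_t).astype(np.float32),
            np.concatenate(sin_t).astype(np.float32))

# Uploaded once, then reused by every FHT launch instead of being recomputed.
cos_gpu, sin_gpu = (gpuarray.to_gpu(t) for t in precompute_twiddles(1024))

Because the tables depend only on the transform size, they can be built once at start-up and shared across all subsequent reconstructions.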
