
Research Blog


Abstract

Capturing 3D details of human pose and body shape from just one monocular image poses a significant challenge in computer vision. Traditional approaches rely on RGB images, which are limited by changes in lighting and obstructions. However, advances in imaging technology have led to novel methods like single-pixel imaging (SPI), which overcome these obstacles. SPI, especially in the near-infrared (NIR) spectrum, excels at detecting 3D human poses. This wavelength can go through clothing and is less affected by lighting changes than visible light, offering a dependable way to capture accurate body shape and pose information, even in challenging environments. In this research, we investigate using an SPI camera operating in the NIR spectrum with time-of-flight (TOF) technology at 850 and 1550 nm wavelengths. This setup is designed to identify humans in low-light conditions. We employ the vision transformers (ViT) model to recognize and extract human features, integrating them into a 3D body model called SMPL-X through deep learning-based 3D body shape regression. To test the effectiveness of NIR-SPI for 3D image reconstruction, we created a lab environment that mimics night conditions, allowing us to explore the potential of NIR-SPI as a vision sensor in outdoor night settings. By analyzing the data from this experiment, we aim to showcase NIR-SPI's capabilities as a powerful tool for nighttime human detection and for capturing precise 3D human body pose and shape.


Human modeling


Parametric human models such as SMPL-X provide a concise representation of human shapes, using shape and pose parameters to encode variation [6]. The SMPL-X model offers several advantages:


  • It disentangles human shape and pose, allowing for independent analysis and control of each factor.

  • It avoids directly modeling rugged and twisted shapes, which are difficult for neural-network-based methods, by using a skinning process to model deformation.

  • It is differentiable, so it integrates easily with neural networks.

For this research, we used SMPL-X as the underlying representation for modeling 3D humans.
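The linear, differentiable structure of such models can be illustrated with a toy version of the shape-blending step. This is only a sketch: the dimensions and randomly generated arrays below are illustrative stand-ins, not the real SMPL-X assets.

```python
import numpy as np

# Toy dimensions: a real SMPL-X mesh has 10,475 vertices; we use 5 here.
N_VERTS, N_BETAS = 5, 3

rng = np.random.default_rng(0)
template = rng.normal(size=(N_VERTS, 3))             # mean template mesh T
shape_dirs = rng.normal(size=(N_BETAS, N_VERTS, 3))  # shape blend shapes S

def shaped_vertices(betas):
    """Vertices after shape blending: V = T + sum_i beta_i * S_i."""
    return template + np.tensordot(betas, shape_dirs, axes=1)

betas = np.array([0.5, -1.0, 0.2])  # shape coefficients
V = shaped_vertices(betas)
print(V.shape)  # (5, 3)
```

Because the mapping from `betas` to vertices is linear, gradients flow through it trivially, which is exactly what makes the model easy to plug into a neural regressor.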


Proposed Method


The process used to obtain the 3D human model from NIR-SPI imaging involves several steps that combine different computer vision techniques to reconstruct a 3D human pose from a single low-resolution image. Here is a detailed explanation of each step:


  • Capture a single-pixel low-resolution image. This step captures an image of a human. The image contrast is adjusted to extract the person's basic shape, and the background is then removed using U2Net, a deep-learning model that accurately segments the foreground and background of an image. This segmentation isolates the person from the background and yields the silhouette: an image showing only the person's outline, without any surface or texture detail.

  • Classify the pose. Applied to the silhouette image, a ViT classifier identifies one of four human poses: lying, bending, sitting, and standing. The identified pose is then used to generate a 3D human pose with the VIBE method, a deep-learning model that estimates the 3D pose of a human from a single image or video.

  • Finally, we can reconstruct the human body shape and pose in 3D space. As discussed above, this can be done using a tool such as SMPL-X.
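The first step above can be prototyped in plain NumPy. In this sketch a simple global threshold stands in for U2Net, and the synthetic frame, threshold value, and function names are all illustrative, not the actual pipeline code.

```python
import numpy as np

def stretch_contrast(img):
    """Linearly rescale intensities to the full [0, 1] range."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def silhouette(img, thresh=0.5):
    """Binary foreground mask: a crude stand-in for U2Net segmentation."""
    return (stretch_contrast(img) > thresh).astype(np.uint8)

# Simulated 32x32 low-resolution NIR-SPI frame: bright subject, dark background.
frame = np.zeros((32, 32))
frame[8:24, 12:20] = 0.9                                      # "person" region
frame += 0.05 * np.random.default_rng(1).random(frame.shape)  # sensor noise

mask = silhouette(frame)
print(mask.sum())  # 128 foreground pixels (the 16 x 8 subject block)
```

The resulting binary mask is what the downstream ViT classifier and shape regressor consume; in the real pipeline the learned segmentation handles far noisier boundaries than a fixed threshold can.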

Fig.1. Overview of the proposed network architecture, which takes NIR single-pixel imaging input and outputs a 3D body reconstruction based on SMPL-X shape and pose parameters. The network consists of three main modules: (i) NIR-SPI-based image acquisition. (ii) Feature extraction using deep learning: the background is removed from the NIR-SPI image to obtain the silhouette. (iii) 3D pose estimation using a regression-based approach: the silhouette image is used to obtain the gait features (shape estimation), which are combined with ViT pose classification and skeleton joint features. These features pre-define the SMPL-X parameters (pose θ, shape β, and camera s, R, T), which are then fed to the off-the-shelf SMPL-X model to obtain the reconstructed 3D human mesh.


Fig.2. Human pose imaging captured at a distance of 1 m: (a) NIR-SPI images of a person standing, sitting, and bending; (b) silhouette images; and (c) 3D human pose regression based on the SMPL-X model.


The proposed method obtains a 3D human model from NIR-SPI imaging for human poses such as lying, bending, sitting, and standing. The best accuracy was achieved in the sitting position, at around 91%, as reflected in the V2V and MPJPE errors. The results demonstrate the effectiveness of the proposed approach, with limitations in hand positioning due to the low contrast of the NIR-SPI image. Nevertheless, core person detection yields an accurate estimate of the person's 3D pose in both qualitative and quantitative evaluations. These findings highlight the potential of the proposed approach for 3D human modeling from a single low-resolution image.


In comparison, the presented SMPL-X model captures the body, face, and hands jointly, and the SMPL-X approach fits the model to a single NIR-SPI image and 2D joint detections. The results of this work demonstrate the expressivity of SMPL-X in capturing bodies, hands, and faces from NIR-SPI images. However, we observed that the bending and lying poses presented the highest V2V and MPJPE error levels, indicating limitations in the pose parameters θ; we therefore recommend implementing a compensation model in future applications. Future work may include building a dataset of in-the-wild SMPL-X fits and directly regressing SMPL-X parameters from NIR-SPI images.


BibTeX

@article{OsorioQuero:24,
  author    = {Carlos Osorio Quero and Daniel Durini and Jose Rangel-Magdaleno and Jose Martinez-Carranza and Ruben Ramos-Garcia},
  journal   = {J. Opt. Soc. Am. A},
  title     = {Enhancing 3D human pose estimation with NIR single-pixel imaging and time-of-flight technology: a deep learning approach},
  keywords  = {Image metrics; Imaging techniques; Machine vision; Single pixel imaging; Three dimensional imaging; Three dimensional reconstruction},
  volume    = {41},
  number    = {3},
  pages     = {414--423},
  month     = {Mar},
  year      = {2024},
  publisher = {Optica Publishing Group},
  url       = {https://opg.optica.org/josaa/abstract.cfm?URI=josaa-41-3-414},
  doi       = {10.1364/JOSAA.499933},
}
 

Carlos Osorio

Recent progress in edge computing has been significantly influenced by innovations that have introduced specialized accelerators for achieving high levels of hardware parallelism. This has been particularly impactful in the field of computer imaging (CI), where GPU acceleration plays a crucial role, notably in reconstructing 2D images through techniques such as Single-Pixel Imaging (SPI). Within SPI, algorithms such as compressive sensing (CS), deep learning, and Fourier transformation are essential for the reconstruction of 2D images. These algorithms benefit immensely from parallel processing, which enhances performance by shortening processing times. To optimize GPU performance, strategies such as memory usage optimization, loop unrolling, the creation of efficient kernels to minimize operations, the use of asynchronous operations, and an increase in the utilization of active threads and warps are employed. In laboratory settings, the integration of embedded GPUs is key to improving the efficiency of algorithms on System-on-Chip GPUs (SoC-GPUs). This study emphasizes the accelerated optimization of the fast Hartley transform (FHT) for 2D image reconstruction on the Nvidia Xavier platform. Through the application of various parallelism techniques using PyCUDA, we have managed to triple the processing speed, approaching real-time processing capabilities.


Fig.1. Improving the 2D reconstruction process of the FHT by implementing diverse optimization methods, with particular emphasis on leveraging CUDA to parallelize the algorithms.


Our team has developed a range of optimization methods for enhancing the Fast Hartley Transform (FHT) algorithm on the NVIDIA Xavier NX GPU. We've introduced two distinct kernel types: one to improve the calculation of the Inverse FHT (IKFHT) and another for managing the digit reversal process. Our tests reveal notable execution time disparities for the FHT algorithm on different computing platforms. On a Central Processing Unit (CPU), the FHT algorithm's execution time is 103 ms, while on the GPU, it drops to 45 ms. This significant reduction highlights the GPU's superior parallel processing capabilities, making it exceptionally suited for the FHT algorithm's requirements.

Furthermore, we've adopted a pre-indexing technique to boost the FHT algorithm's efficiency further. Pre-indexing pre-calculates specific frequently used values, thus shortening the duration of each algorithm iteration. With pre-indexing on the CPU, execution time is nearly halved to 43 ms, marking a substantial improvement. For the GPU, this technique reduces processing time to 34 ms, demonstrating its effectiveness in decreasing the computational effort needed for tasks that require quick recalculations and adjustments in image gradients. Memory consumption remains modest on both platforms, though the GPU exhibits slightly higher memory usage. However, pre-processing significantly lowers memory demand, particularly on the GPU, where it falls from 1.92% to 1.24%. Lastly, we measured the speedup percentage, an essential indicator of performance improvement with GPU use over the CPU. Without pre-processing, the GPU's performance is roughly 2.28 times faster than that of the CPU. With pre-processing, this acceleration increases to 2.39 times on the CPU and to 3 times on the GPU. This underscores the advantages of leveraging both the GPU and pre-processing methods for optimizing the FHT algorithm.
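Before hand-tuning CUDA kernels, the transform itself is easy to prototype on the host. A common NumPy sketch (not the paper's PyCUDA implementation) derives the 2D discrete Hartley transform from the FFT: the cas kernel gives H = Re(F) - Im(F), and the DHT is its own inverse up to a 1/N scale factor, which is what makes round-trip reconstruction cheap.

```python
import numpy as np

def dht2(img):
    """2D discrete Hartley transform via the FFT: H = Re(F) - Im(F)."""
    F = np.fft.fft2(img)
    return F.real - F.imag

def idht2(H):
    """Inverse 2D DHT: the DHT is an involution up to a 1/N scale factor."""
    return dht2(H) / H.size

img = np.arange(64, dtype=float).reshape(8, 8)
recovered = idht2(dht2(img))
print(np.allclose(recovered, img))  # True
```

Since the forward and inverse transforms share one kernel, a single optimized GPU kernel (plus the digit-reversal permutation) serves both directions, which is one reason the FHT maps well onto SoC-GPUs.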

Carlos Osorio

Updated: Apr 19, 2024

Abstract


Recent advancements in vision technology, especially in Single-Pixel Imaging (SPI) cameras, have caught the eye of many in the tech world. This document delves into this cutting-edge imaging method's progress and potential uses. By leveraging advanced reconstruction algorithms, SPI can create images from compressed data collected by just one detector element. This technology's development towards smaller, more integrated forms has allowed it to be embedded in lightweight mobile devices, broadening its application spectrum. Real-time imaging and video recording achievements enable capturing and examining moving scenes. Through innovative hardware and computational strategies, improvements in sensitivity and resolution have been realized. Applying deep learning techniques further advances the imaging process and helps extract valuable insights from the gathered information. Fields as varied as medical imaging, biophotonics, object identification, tracking, remote sensing, observation of the Earth, industrial inspection, and quality control are reaping the benefits of SPI technology. The ongoing progress in SPI camera technology is set to transform numerous sectors, opening up exciting new possibilities for imaging and analysis.


Challenges Faced by Single-Pixel Imaging Technology


In the last ten years, there has been an impressive leap forward in technologies such as autonomous robots, self-driving cars, and drones. Alongside this, there has been remarkable progress in improving vision systems with innovative methods. Various sensors, such as LiDAR, RADAR, thermal, and infrared (IR) cameras, have played a crucial role in these advancements. A groundbreaking development in vision technology is the use of the SPI system. This new approach allows sensors to adjust to various wavelengths, from visible light to near-infrared (NIR) and even longer wavelengths. The key benefit of this flexibility is its effectiveness under challenging weather conditions such as fog, rain, or low visibility: the SPI system maintains its efficiency by merely changing the light source, without reconfiguring the SPI setup. As commonly presented in academic studies, SPI's potential uses go well beyond theoretical concepts. For instance, it can be applied to create an effective obstacle-detection system for vehicles in foggy or rainy weather. Furthermore, when combined with LiDAR technology, SPI can significantly improve the detail and accuracy of scene reconstruction, achieving a more precise and efficient process by tailoring the number of samples according to the SPI principle.
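The SPI principle itself (reconstructing a scene from a sequence of single-detector measurements under structured illumination) can be simulated in a few lines of NumPy with orthogonal Hadamard patterns. This is a fully sampled toy sketch; compressive variants keep only a subset of the patterns, which is the "tailoring the number of samples" idea mentioned above.

```python
import numpy as np

def sylvester_hadamard(n):
    """Build an n x n Hadamard matrix (n a power of two) by Sylvester's construction."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 8                       # scene resolution: n x n pixels
N = n * n
H = sylvester_hadamard(N)   # N orthogonal +/-1 illumination patterns

scene = np.zeros((n, n))
scene[2:6, 3:5] = 1.0       # a simple bright object
x = scene.ravel()

# Each measurement projects one pattern onto the scene and sums the
# reflected light on the single detector.
y = H @ x

# Hadamard rows are orthogonal (H @ H.T = N * I), so recovery is a transpose.
x_rec = (H.T @ y) / N
print(np.allclose(x_rec.reshape(n, n), scene))  # True
```

Swapping the light source changes the operating wavelength without touching this measurement model, which is precisely the flexibility the text describes.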


Fig.1 presents a compelling illustration of the possible future uses of single-pixel camera infrared technology, highlighting its improved image-capturing abilities in a range of difficult lighting and scattering scenarios. Such progress is particularly beneficial in intricate areas such as urban landscapes or dense forests.


Integrating Deep Learning Models into SPI Technology


Single Pixel Imaging (SPI) technology represents a significant leap forward in image capture, utilizing a single-pixel detector alongside structured illumination patterns to achieve innovative results. This technique has broadened the horizons for object detection, segmentation, tracking, and depth mapping applications. Enhancements from recent developments in deep learning models have significantly improved SPI's image reconstruction and analysis capabilities. A key innovation in this space is the Single-Pixel Object Detection (SPOD) method, which employs deep learning algorithms to detect and pinpoint objects within a scene through SPI measurements alone. This method is particularly beneficial when traditional imaging struggles, such as in low light or with limited hardware. Another deep learning application within SPI technology is image segmentation. By training neural networks with segmented images reconstructed from single-pixel data, it's feasible to delineate precise object boundaries, leading to superior-quality segmented images. This advancement aids in better object recognition and analysis across various sectors, including medical imaging and autonomous vehicles.


Fig.2. New era in imaging as Deep Learning Models seamlessly merge with SPI Technology.


Deep learning's impact extends to object tracking through single-pixel imaging as well. Utilizing convolutional neural networks (CNNs) to recognize object motion patterns from SPI data enables tracking algorithms to accurately monitor objects across dynamic environments, overcoming obstacles like occlusions or clutter. This opens up new possibilities for sophisticated tracking in surveillance, robotics, and augmented reality. Depth mapping benefits similarly, with deep learning models trained on extensive datasets of single-pixel depth measurements and accurate ground truth depth maps, facilitating the creation of precise depth maps from sparse data. This is crucial for 3D reconstruction, virtual reality, and autonomous navigation. Integrating deep learning with SPI also explores innovative avenues beyond conventional imaging. Combining SPI with Neural Radiance Fields (NeRF), for instance, allows for the reconstruction of detailed 3D human poses, objects, and 4D spatial-temporal data from limited measurements, finding uses in virtual try-on, computer graphics, and telepresence technologies. Moreover, merging hyperspectral imaging and high-speed video with SPI enables the capturing of multispectral information in dynamic scenes. Employing deep learning for data fusion enhances the analysis and comprehension of complex environments and is useful in environmental monitoring and remote sensing.


Deep learning models further elevate SPI quality through techniques like Deep Image Prior (DIP) and diffusion models, which refine and clarify low-resolution or noisy single-pixel images. These approaches leverage learned priors or diffusion processes to improve image restoration and sharpening, delivering higher fidelity in SPI applications. Overall, the fusion of deep learning models with SPI technology marks a series of breakthroughs in the field. From enhancing object detection, segmentation, and tracking to depth mapping and image quality improvement, deep learning algorithms facilitate more precise and efficient image reconstruction and analysis. These advancements not only broaden the applicability of SPI across various domains but also set the stage for ongoing innovation in imaging technology.


BibTeX

@ARTICLE{10472984,
  author   = {Quero, Carlos Osorio and Durini, Daniel and de Jesús Rangel-Magdaleno, José and Martinez-Carranza, José and Ramos-Garcia, Rubén},
  journal  = {IEEE Instrumentation \& Measurement Magazine},
  title    = {Emerging Vision Technology: SPI Camera an Overview},
  year     = {2024},
  volume   = {27},
  number   = {2},
  pages    = {38-47},
  keywords = {Deep learning; Industries; Visualization; Surveillance; Robot vision systems; Transforms; Cameras},
  doi      = {10.1109/MIM.2024.10472984}
}
 



