publications | Saurabh Singh

An up-to-date list is available on Google Scholar.

2025

GPU Memory Safety

Let-Me-In: (Still) Employing In-pointer Bounds Metadata for Fine-grained GPU Memory Safety

Jaewon Lee, Euijun Chung, Saurabh Singh, and 4 more authors

In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2025

The importance of ensuring the robustness of GPU systems has grown significantly, especially as GPUs have become vital in critical decision-making systems such as autonomous driving and medical diagnostics. However, GPU programming languages, primarily based on C/C++, inherit memory vulnerabilities that threaten the robustness of GPU applications. The heterogeneous GPU memory hierarchy makes it more difficult to find effective universal solutions. While several studies have proposed advanced GPU memory safety mechanisms, they still grapple with significant challenges, including substantial metadata storage and access overhead, elevated hardware implementation costs, and limited security coverage, particularly regarding fine-grained memory safety. We address this issue with Let-Me-In (LMI), a fine-grained memory safety mechanism specifically designed for GPUs. LMI features an efficient hardware bounds-checking mechanism that ensures negligible impact on performance and hardware costs, even in scenarios where thousands of concurrent threads perform memory operations across buffers in heap and local memory. This is achieved by aligning memory allocation to powers of two and performing static analysis to identify and mark pointer arithmetic instructions. This approach also enables storing metadata inside the unused upper bits of pointers, which are shrinking due to the expansion of the virtual memory address space. The unique characteristics of GPU programs make this approach feasible, unlike in CPU programs, where the inherent complexity of programs poses challenges. Our evaluation shows that LMI incurs only negligible hardware and performance overhead, making it a practical and efficient solution for enhancing GPU memory safety.
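The in-pointer metadata idea in the abstract can be illustrated with a small software model. This is a minimal sketch and not the LMI hardware design: the 48-bit address width, the choice to store log2 of the allocation size in the upper bits, and the names `tag_pointer` and `checked_add` are all assumptions made for illustration.

```python
ADDR_BITS = 48                      # assumed usable virtual-address width
ADDR_MASK = (1 << ADDR_BITS) - 1


def round_up_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << max(n - 1, 0).bit_length()


def tag_pointer(addr, size):
    """Pack log2(rounded allocation size) into the bits above the address.

    Assumes the allocator already aligned `addr` to the rounded size.
    """
    log2 = round_up_pow2(size).bit_length() - 1
    if addr % (1 << log2) != 0:
        raise ValueError("allocation is not aligned to its power-of-two size")
    return (log2 << ADDR_BITS) | addr


def checked_add(ptr, offset):
    """Pointer arithmetic with a bounds check derived from the tag alone."""
    log2 = ptr >> ADDR_BITS
    addr = ptr & ADDR_MASK
    new_addr = (addr + offset) & ADDR_MASK
    # Inside the same aligned block, every bit above the low `log2` bits
    # is unchanged; any difference means the pointer left its allocation.
    if (addr >> log2) != (new_addr >> log2):
        raise IndexError("pointer arithmetic left the allocation")
    return (log2 << ADDR_BITS) | new_addr
```

Because the allocation is aligned to a power of two, the check needs only the pointer itself and no separate metadata lookup, which is the property the abstract relies on. Note that in this model a 100-byte request is checked against its rounded 128-byte block.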

2022

Approx computing

MEGA-MAC: A Merged Accumulation based Approximate MAC Unit for Error Resilient Applications

Vishesh Mishra, Sparsh Mittal, Saurabh Singh, and 2 more authors

In Proceedings of the Great Lakes Symposium on VLSI 2022, Irvine, CA, USA, 2022

This paper proposes a novel merged-accumulation-based approximate MAC (multiply-accumulate) unit, MEGA-MAC, for accelerating error-resilient applications. MEGA-MAC utilizes a novel rearrangement and compression strategy in the multiplication stage and a novel approximate "carry predicting adder" (CPA) in the accumulation stage. Addition and multiplication operations are merged, which reduces the delay. MEGA-MAC provides knobs to exercise a tradeoff between accuracy and resource overhead. Compared to the accurate MAC unit, MEGA-MAC(8,6) (i.e., a MEGA-MAC unit with a chunk size of 6 bits, operating on 8-bit input operands) reduces the power-delay-product (PDP) by 49.4%, while incurring a mean error percentage of only 4.2%. Compared to state-of-the-art approximate MAC units, MEGA-MAC achieves a better balance between resource savings and accuracy loss. The source code is available at https://sites.google.com/view/mega-mac-approximate-mac-unit/.

Approx computing

ART-MAC: Approximate Rounding and Truncation based MAC Unit for Fault-Tolerant Applications

Vishesh Mishra, Divy Pandey, Saurabh Singh, and 4 more authors

In 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 2022

In recent times, approximate computing has emerged as a promising technique to achieve significant power and energy benefits in computational systems. It is widely employed in fault-tolerant, computationally intensive applications that require large arithmetic blocks. Applications such as image processing and machine learning often invoke the Multiply-Accumulate (MAC) unit for convolution operations. This paper proposes a novel architecture for an (unsigned x unsigned) approximate rounding and truncation based MAC unit named ART-MAC. It replaces the accurate multiplier architecture with an approximate multiplier proposed along with this work, thus improving the overall Quality of Results (QoR). The proposed design consumes 35.35% less power and showcases a significant speedup of 1.23 times when compared to the conventional MAC unit. On average, ART-MAC consumes 7.44% less on-chip area and showcases a 13.49% lower power-delay-product (PDP) compared to existing state-of-the-art designs.

Approx computing

AxLEAP: Enabling Low-Power Approximations Through Unified Power Format

Sagar Satapathy, Saurabh Singh, Kaustav Goswami, and 3 more authors

In 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 2022

Approximate computing aims at achieving better performance at a marginal loss of accuracy in error-resilient applications. Several approximate arithmetic circuits have been proposed in the past which use carry prediction schemes, block-based approaches, and genetic algorithms. However, these architectures are usually not power-aware and often incur large area overhead with the introduction of re-configurability. This work explores a new facet of approximation, which involves using the Unified Power Format (UPF) model to introduce approximation on additions. We call this methodology AxLEAP. Further, we validate the proposed methodology on a new approximate adder, which we term AxL-Add. AxL-Add has a simple and re-configurable design with a marginal area overhead of 1.69% over an accurate adder. After extensive evaluation, we show that our methodology is up to 67% better in terms of power consumption while providing near-accurate results at the end application.

Approx computing

HPAM: An 8-bit High-Performance Approximate Multiplier Design for Error Resilient Applications

Divy Pandey, Vishesh Mishra, Saurabh Singh, and 3 more authors

In 2022 23rd International Symposium on Quality Electronic Design (ISQED), 2022

In recent times, approximate computing has been widely employed in the design of power-aware hardware architectures. Approximate computing techniques can be used to benefit a major class of error-resilient applications. It has emerged as a computing paradigm that can efficiently cater to several popular applications that can tolerate bounded imprecision in results. Applications such as image processing, machine learning, and deep learning extensively use multiplication and addition operations on 8-bit numbers. This work proposes an 8-bit High-Performance Approximate Multiplier (HPAM) for error-resilient applications. HPAM is capable of providing significant speedup at the application end while simultaneously maintaining high accuracy standards. It is designed with the motivation of providing a broad error bound, making it suitable for applications with high accuracy demands as well as those with low accuracy standards. Additionally, an approximate version of the conventional ripple carry adder (RCA), a Segmented Ripple Carry Approximate Adder (SRCA), is also proposed along with this work. To validate the efficacy of the proposed design, its performance is compared with the conventional Wallace tree multiplier and the existing state-of-the-art designs such as TOSAM, DSM, and LETAM. On average, HPAM provides a speedup of 27.08% and 48.06% more accurate results in comparison to the existing state-of-the-art designs.

Approx computing

EFCSA: An Efficient Carry Speculative Approximate Adder with Rectification

Saurabh Singh, Vishesh Mishra, Sagar Satapathy, and 4 more authors

In 2022 23rd International Symposium on Quality Electronic Design (ISQED), 2022

Approximate computing offers the flexibility to trade off accuracy for computational speed, reduced power consumption, and smaller on-chip area. Such techniques have attracted extensive attention in recent times as these can be used in most error-resilient applications. Although several approximate adder designs have been proposed in the past, there still exists scope for further improvement. Existing state-of-the-art designs often involve a trade-off between the margin of acceptable error and its Quality of Results (QoR). This paper proposes an approximate adder with higher accuracy and better QoR for error-resilient applications called an efficient reconfigurable carry speculative approximate adder with rectification, or simply EFCSA adder. Its reconfigurable sister version, called REFCSA adder, is inherently reconfigurable, allowing accurate configuration during runtime. The proposed design aims to limit the carry chain's length in the conventional ripple carry adder (RCA) using a block-based mechanism. EFCSA showcases results that are 12.3x faster than the conventional RCA. On average, the adder is 45.1% more accurate and has 31.97% better power-delay-product (PDP) than several existing state-of-the-art approximate designs.
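To make the block-based idea concrete, the following is a minimal behavioral sketch of a generic block-based carry-speculative adder. It is not the EFCSA design and has no rectification logic: the function name `block_speculative_add`, the 16-bit width, the 4-bit block size, and the rule of speculating each carry-in from only the immediately preceding block are assumptions chosen for illustration.

```python
def block_speculative_add(a, b, width=16, block=4):
    """Approximate a + b by cutting the carry chain at block boundaries.

    Each block's carry-in is speculated from the previous block alone,
    computed as if that block itself received no carry-in. The longest
    carry chain is therefore two blocks instead of the full width.
    """
    mask = (1 << block) - 1
    result = 0
    for lo in range(0, width, block):
        x = (a >> lo) & mask
        y = (b >> lo) & mask
        if lo == 0:
            carry_in = 0
        else:
            prev_x = (a >> (lo - block)) & mask
            prev_y = (b >> (lo - block)) & mask
            carry_in = (prev_x + prev_y) >> block   # speculated carry
        # Keep only this block's sum bits; its carry-out is re-derived
        # by the next block's speculation rather than propagated.
        result |= ((x + y + carry_in) & mask) << lo
    return result
```

The result is exact whenever no carry needs to travel through more than one full block. For example, `0x00FF + 0x0001` requires a carry to ripple from block 0 through block 1 into block 2, so this sketch returns `0x0000` instead of `0x0100`; errors of this kind are what a rectification stage would target.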

2020

Approx computing

An Approximate Carry Estimating Simultaneous Adder with Rectification

Rajat Bhattacharjya, Vishesh Mishra, Saurabh Singh, and 2 more authors

In Proceedings of the 2020 on Great Lakes Symposium on VLSI, Virtual Event, China, 2020

Approximate computing has in recent times found significant applications towards lowering power, area, and time requirements for arithmetic operations. Several works done in recent years have furthered approximate computing along these directions. In this work, we propose a new approximate adder that employs a carry prediction method. This allows parallel propagation of the carry, enabling faster calculations. In addition to the basic adder design, we also propose rectification logic that enables higher accuracy for larger computations. Experimental results show that our adder produces results 91.2% faster than the conventional ripple-carry adder. In terms of accuracy, the addition of rectification logic to the basic design produces results that are more accurate than state-of-the-art adders like SARA [13] and BCSA [5] by 74%.