Quantum Physics
[Submitted on 4 Apr 2026]
GPU-Accelerated Quantum Simulation: Empirical Backend Selection, Gate Fusion, and Adaptive Precision
Poornima Kumaresan, Pavithra Muruganantham, Lakshmi Rajendran, Santhosh Sivasubramani
Classical simulation of quantum circuits remains indispensable for algorithm development, hardware validation, and error analysis in the noisy intermediate-scale quantum (NISQ) era. However, state-vector simulation faces exponential memory scaling, with an n-qubit system requiring O(2^n) complex amplitudes, and existing simulators often lack the flexibility to exploit heterogeneous computing resources at runtime. This paper presents a GPU-accelerated quantum circuit simulation framework that introduces three contributions: (1) an empirical backend selection algorithm that benchmarks CuPy, PyTorch-CUDA, and NumPy-CPU backends at runtime and selects the optimal execution path based on measured throughput; (2) a directed acyclic graph (DAG) based gate fusion engine that reduces circuit depth through automated identification of fusible gate sequences, coupled with adaptive precision switching between complex64 and complex128 representations; and (3) a memory-aware fallback mechanism that monitors GPU memory consumption and gracefully degrades to CPU execution when resources are exhausted. The framework integrates with Qiskit, Cirq, PennyLane, and Amazon Braket through a unified adapter layer. Benchmarks on an NVIDIA A100-SXM4 (40 GiB) GPU demonstrate speedups of 64x to 146x over NumPy CPU execution for state-vector simulation of circuits with 20 to 28 qubits, with speedups exceeding 5x from 16 qubits onward. Hardware validation on an IBM quantum processing unit (QPU) confirms Bell state fidelity of 0.939, a five-qubit Greenberger-Horne-Zeilinger (GHZ) state fidelity of 0.853, and circuit depth reduction from 42 to 14 gates through the fusion pipeline. The system is designed for portability across NVIDIA consumer and data-center GPUs, requiring no vendor-specific compilation steps.
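The memory figures behind the abstract's O(2^n) scaling, precision switching, and CPU fallback can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the paper's implementation: the names `statevector_bytes` and `choose_device` and the `headroom` factor of 2 (an allowance for scratch buffers) are assumptions introduced here. It relies only on the facts that a complex64 amplitude occupies 8 bytes, a complex128 amplitude occupies 16 bytes, and the benchmark GPU has 40 GiB.

```python
GIB = 1024 ** 3

# Bytes per amplitude for the two precisions named in the abstract.
ITEMSIZE = {"complex64": 8, "complex128": 16}


def statevector_bytes(n_qubits: int, dtype: str = "complex128") -> int:
    """Bytes needed to store all 2**n amplitudes of an n-qubit state vector."""
    return (1 << n_qubits) * ITEMSIZE[dtype]


def choose_device(n_qubits: int, free_gpu_bytes: int,
                  dtype: str = "complex128", headroom: float = 2.0) -> str:
    """Hypothetical fallback rule (an assumption, not the paper's policy):
    run on the GPU only if the state vector, scaled by a headroom factor
    for temporary buffers, fits in the free GPU memory reported by the caller."""
    needed = statevector_bytes(n_qubits, dtype) * headroom
    return "gpu" if needed <= free_gpu_bytes else "cpu"


if __name__ == "__main__":
    free = 40 * GIB  # capacity of the A100-SXM4 used in the benchmarks
    for n in (20, 28, 32):
        for dtype in ("complex64", "complex128"):
            size_gib = statevector_bytes(n, dtype) / GIB
            device = choose_device(n, free, dtype)
            print(f"{n} qubits, {dtype}: {size_gib:.4f} GiB -> {device}")
```

Under these assumptions a 28-qubit state vector needs 4 GiB in complex128 and 2 GiB in complex64, so it fits comfortably on a 40 GiB device, while a 32-qubit state vector needs 64 GiB in complex128 and would be sent to the CPU. Halving the precision halves the footprint, which is why precision switching and memory-aware fallback are natural companions.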
Comments: 27 pages, 6 figures, 8 tables
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
Cite as: arXiv:2604.03816 [quant-ph]
(or arXiv:2604.03816v1 [quant-ph] for this version)
https://doi.org/10.48550/arXiv.2604.03816
Submission history
From: Santhosh Sivasubramani Prof
[v1] Sat, 4 Apr 2026 17:46:37 UTC (699 KB)