# Optimizing Superconducting Three-Qubit Gates for Surface-Code Error Correction

Stephan Tasler,<sup>1,\*</sup> Josias Old,<sup>2,3,†</sup> Lukas Heunisch,<sup>1</sup> Verena Feulner,<sup>1</sup> Timo Eckstein,<sup>1</sup> Markus Müller,<sup>2,3</sup> and Michael J. Hartmann<sup>1</sup>

<sup>1</sup>Physics Department, Friedrich-Alexander-Universität Erlangen Nürnberg, Germany
<sup>2</sup>Institute for Quantum Information, RWTH Aachen University, D-52056 Aachen, Germany
<sup>3</sup>Peter Grünberg Institute, Theoretical Nanoelectronics,
Forschungszentrum Jülich, D-52425 Jülich, Germany

(Dated: June 11, 2025)

Quantum error correction (QEC) is one of the crucial building blocks for developing quantum computers that have significant potential for reaching a quantum advantage in applications. Prominent candidates for QEC are stabilizer codes for which periodic readout of stabilizer operators is typically implemented via successive two-qubit entangling gates, and is repeated many times during a computation. To improve QEC performance, it is thus beneficial to make the stabilizer readout faster and less prone to fault-tolerance-breaking errors. Here we design a 3-qubit CZZ gate for superconducting transmon qubits that maps the parity of two data qubits onto one measurement qubit in a single step. We find that the gate can be executed in a duration of 35 ns with a fidelity of F = 99.96%. To optimize the gate, we use an error model obtained from the microscopic gate simulation to systematically suppress Pauli errors that are particularly harmful to the QEC protocol. Using this error model, we investigate the implementation of this 3-qubit gate in a surface code syndrome readout schedule. We find that for the rotated surface code, the implementation of CZZ gates increases the error threshold by nearly 50% to  $\approx 1.2\%$  and decreases the logical error rate, in the experimentally relevant regime, by up to one order of magnitude, compared to the standard CZ readout protocol. We also show that for the unrotated surface code, strictly fault-tolerant readout schedules can be found. This opens a new perspective for below-threshold surface-code error correction, where it can be advantageous to use multi-qubit gates instead of two-qubit gates to obtain a better QEC performance.

#### I. INTRODUCTION

Quantum computing is regarded as a technology that enables solving particular computationally hard problems much faster than classical computation [1]. For this reason the interest in building universal quantum computers has experienced a rapid growth over the last decades. Although the fidelities of physical quantum gates have steadily improved over the last years, quantum circuits are still vulnerable to residual errors. Therefore quantum error correction (QEC) is vital in developing scalable, fault-tolerant quantum computers.

Promising candidates for realizable near-term QEC codes originate from the family of topological error-correcting codes [2, 3]. In this work we focus on the surface code [4–6], which is particularly suitable for implementation on superconducting circuit hardware, as it can be implemented on a 2D lattice chip using only nearest-neighbour interactions. It also has one of the highest circuit-level noise error thresholds of about 1% [7]. Recent experiments demonstrated increasingly large implementations on superconducting hardware [8–11], including first entangling operations between logical qubits via lattice surgery [12].

The surface code is a stabilizer code, where repetitive measurements of stabilizers are executed during the error correction process. Stabilizer operators do not change the encoded logical quantum state but, if measured, give information about errors that may have occurred. Successful QEC typically requires frequent stabilizer measurements, which, in the surface code, are commonly done via a sequence of two-qubit gates.

Given the high number of repeated error correction cycles during a quantum computation and the limitation of correctable physical qubit errors imposed by the fidelity of the two-qubit gates [6], improving the stabilizer measurement protocol can significantly accelerate the overall computation and boost QEC performance. By parallelizing the stabilizer readout [13, 14], the duration of a stabilizer measurement cycle can be reduced. Several studies for different platforms have shown that parallelized stabilizer readout protocols can increase the physical error correction threshold of surface code QEC schedules [15–17]. Moreover, gate optimizations for suppression of experimental systematic errors [18] and for logical qubit performance [19] in the context of erasure conversion in Rydberg atoms have been proposed.

In this work, we design and optimize a hardware-specific gate for parallelizing stabilizer readout in superconducting qubits and show that the best performance of the error-correcting code is not achieved by maximizing the fidelity of the gate, as common knowledge would suggest. Instead, the performance of the code is further enhanced if, among all errors that are generated by the gate, those that result in errors that cannot be corrected by the surface code of a given distance are suppressed the most. More specifically, we parallelize the stabilizer readout protocol, as depicted in Fig. 1 c) and d), by designing a CZZ gate for superconducting qubits, where one qubit acts as a measurement qubit and the other two qubits act as data qubits. The CZZ gate can be viewed as a Z operation that is applied on the measurement qubit, controlled by the parity of the two data qubits [20]. This CZZ gate can thus be used as a parity mapping gate for the Z-, X- and Y-stabilizer readout, which makes it a natural building block of two-qubit stabilizer readouts as they are needed in subsystem QEC codes [21, 22] and Floquet codes [23–25]. In addition, it is well suited for surface-code error correction, where only Z-stabilizer measurements (Fig. 1 c)) and X-stabilizer measurements (Fig. 1 d)) are needed. We optimize this gate to maximize the logical qubit fidelity by explicitly analyzing which Pauli errors it causes and suppressing those Pauli errors that harm the surface code QEC procedure most.

<sup>\*</sup> stephan.tasler@fau.de

<sup>†</sup> j.old@fz-juelich.de



FIG. 1. Schematic illustration of the unrotated (a) (rotated (b)) surface code readout schedule. The dark blue circles represent the data qubits, the blue (red) squares show the Z- (X-) stabilizer plaquettes. The Z-stabilizer readout depicted here consists of two steps: in each step multiple CZZ gates represented by the green lines are applied in parallel. The thick blue (red) lines depict the logical Z- (X-) Pauli operators of the underlying surface code. Because of the larger qubit overhead, the unrotated surface code has a higher robustness against fault-tolerance-breaking errors, which allows for a fault-tolerant CZZ readout schedule. c) shows the readout circuits for the Z- (X-) stabilizer displayed in the blue (red) box. The commonly used readout circuit, based on four sequential two-qubit gates, is displayed on the left side, the readout using the CZZ gates as proposed in this work is depicted on the right side.

We find that, by using the QEC-performance-optimized CZZ gate, the logical error of the rotated surface code can be suppressed by about one order of magnitude for higher code distances in the range of experimentally realistic physical qubit error rates. Moreover, the physical error threshold can be increased significantly, by almost 50%, from  $\approx 0.66\,\%$  to  $\approx 1.1\,\%$ . For the unrotated surface code we find a similar physical error threshold improvement from  $\approx 0.66\,\%$  to  $\approx 1.1\,\%$ . Although the unrotated surface code is operated in a fault-tolerant readout schedule, we still find an improvement in the logical error rate.

The paper is organized as follows: Section II introduces the fundamental concept of the CZZ gate under investigation and discusses the proposed parallelized parity-check protocol. In Section III, we provide a detailed quantum electrodynamic description of the CZZ gate. Section IV covers the derivation of the gate-based Pauli error model, and addresses the suppression of fault-tolerance-breaking errors using optimal control. Finally, we discuss the performance of the resulting surface-code error correction protocol in Section V.

#### II. PARALLELIZED CZ GATES

In this work we focus on the execution of two parallelized controlled-Z (CZ) gates forming a CZZ gate. CZ gates are generated by ZZ-interactions. Hence, the fundamental concept of parallelizing such CZ gates is based on the commutativity of the involved ZZ-interactions, which allows one to combine the execution of multiple CZ gates in one step. For an optimal parallelization of the CZ gates for three qubits (labeled  $Q_1,Q_2,Q_3$ ), the ZZ-interactions between the two outer qubits ( $Q_1,Q_3$ ) and the center qubit ( $Q_2$ ) should be identical. We can describe the ideal interaction by a simplified two-level Hamiltonian,

$$H = \sum_{i=1,3} J_i \sigma_i^Z \sigma_2^Z \,, \tag{1}$$

where  $J_i$  denotes the coupling strength,  $\sigma_i^Z$  is the Pauli-Z matrix of the i-th (outer) qubit and  $\sigma_2^Z$  the Pauli-Z matrix of the 2nd (center) qubit.



FIG. 2. Dynamics for the idealized toy model of Eq. (1). Depending on the parity of the states of the outer qubits, the center qubit oscillates between the states  $|+\rangle$  and  $|-\rangle$ . Depending on the value of J, here at 45 ns, there is a maximal contrast between the oscillations corresponding to even and odd ZZ-parity input states.

The dynamics generated by this Hamiltonian is shown in Fig. 2. The parallel ZZ-interactions cause a phase oscillation, which is dependent on the parity of the two outer qubits. We choose the duration of the ZZ-interactions in a way that the phase accumulation is equal to the one obtained by applying a CZ gate between each of the i-th (outer) qubit and the 2nd (center) qubit. The resulting dynamics then corresponds to the unitary

$$U_{\text{CZZ}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \tag{2}$$

with matrix elements  $(U_{\text{CZZ}})_{j,l} = \langle \tilde{\mathbf{Q}}_1, \tilde{\mathbf{Q}}_2, \tilde{\mathbf{Q}}_3 | U_{\text{CZZ}} | \mathbf{Q}_1, \mathbf{Q}_2, \mathbf{Q}_3 \rangle$ , where  $\tilde{\mathbf{Q}}_1, \tilde{\mathbf{Q}}_2$  and  $\tilde{\mathbf{Q}}_3$   $(\mathbf{Q}_1, \mathbf{Q}_2 \text{ and } \mathbf{Q}_3)$  are the digits of  $j = 0, 1, \dots 7$   $(l = 0, 1, \dots 7)$  in binary representation. As our aim is to use this gate for parity checks, we now discuss a protocol that maps the parity of the two outer qubits  $(\mathbf{Q}_1, \mathbf{Q}_3)$  onto the state of the center qubit  $(\mathbf{Q}_2)$ .
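As a numerical cross-check of Eq. (2), the following minimal Python sketch builds the diagonal of $U_{\text{CZZ}}$ as the product of two CZ phases and compares it with the evolution under the toy Hamiltonian of Eq. (1). It assumes $\hbar = 1$ and equal couplings $J_1 = J_3 = J$ with $Jt = \pi/4$. The local Z rotations and the global phase are not specified in the text; they are chosen here only so that the identity holds exactly. All function names are illustrative.

```python
import cmath
import math
from itertools import product


def czz_diagonal():
    """Diagonal of CZ(Q1,Q2) * CZ(Q2,Q3) in the basis |Q1 Q2 Q3>."""
    diag = []
    for q1, q2, q3 in product((0, 1), repeat=3):
        cz12 = (-1) ** (q1 * q2)  # phase of the CZ between Q1 and the center qubit
        cz23 = (-1) ** (q2 * q3)  # phase of the CZ between the center qubit and Q3
        diag.append(cz12 * cz23)
    return diag


def zz_evolution_diagonal(jt=math.pi / 4):
    """Diagonal of exp(-i H t) for H = J (Z1 Z2 + Z3 Z2), with assumed corrections."""
    diag = []
    for q1, q2, q3 in product((0, 1), repeat=3):
        z1, z2, z3 = 1 - 2 * q1, 1 - 2 * q2, 1 - 2 * q3  # Z eigenvalues (+1 or -1)
        zz_phase = -jt * (z1 * z2 + z3 * z2)
        # Assumption: local Z rotations plus a global phase complete the gate.
        local_phase = (math.pi / 4) * (z1 + 2 * z2 + z3) - math.pi / 2
        diag.append(cmath.exp(1j * (zz_phase + local_phase)))
    return diag


target = czz_diagonal()
evolved = zz_evolution_diagonal()
print(target)
print(max(abs(a - b) for a, b in zip(target, evolved)))
```

Under these assumptions the sign flips appear exactly for the basis states in which the center qubit is excited and the outer qubits have odd parity.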

#### A. Parity Check Protocol

For the following discussion of the parity check measurement procedure we focus only on checks of Z-parity. X- and Y-parity checks can be easily obtained by adding additional single-qubit rotations on each qubit at the beginning and the end of the circuit. To simplify the reading of this paper, we will refer to Hadamard and S gates when mentioning the basis rotations for the different stabilizers. It should be noted, however, that in an experiment, both the Hadamard and S gates are executed by single-qubit rotations  $(R_X(\theta), R_Y(\theta), R_Z(\theta))$ . The readout protocols for the individual bases are shown in Fig. 3.



FIG. 3. The CZZ gate can be used to measure the Z-, X- and Y-stabilizer, by parallelizing the ZZ interactions between the two outer qubits (circle with white surface) and the middle qubit (circle with black surface).

For the further discussion, we divide the protocol into five steps. In the first step, we prepare the center (or measurement) qubit in the X-basis eigenstate  $|+\rangle$  with a Hadamard gate. In the second step we activate multiple parallel ZZ-interactions between the center qubit and the outer qubits. This causes an oscillation of the center qubit between the  $|+\rangle$  and  $|-\rangle$  states. Thus, if the ZZ-interactions have the same strength, the oscillation depends on the parity of the data qubits, and after a time that depends on the strength of the ZZ-interactions, the Pauli X expectation value of the ancilla qubit is perfectly split depending on whether the outer qubits have even or odd parity, see Fig. 2. At this point, in a third step, we switch off the ZZ-interaction and apply in a fourth step a second Hadamard gate to the center qubit. This transforms the center qubit back into the Z-basis eigenstate  $(|+\rangle \rightarrow |0\rangle \text{ and } |-\rangle \rightarrow |1\rangle)$ . The final step involves measuring the measurement qubit to extract the parity of the outer qubits.
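The five steps above can be traced in a small state-vector calculation. The sketch below is a minimal illustration in plain Python, assuming an ideal CZZ gate as in Eq. (2), ideal Hadamard gates, and data qubits prepared in computational basis states; the helper names are hypothetical.

```python
import math
from itertools import product


def apply_h_center(state):
    """Apply a Hadamard gate to the center qubit Q2 of an 8-amplitude state |Q1 Q2 Q3>."""
    s = 1 / math.sqrt(2)
    new = [0j] * 8
    for q1, q3 in product((0, 1), repeat=2):
        i0 = (q1 << 2) | q3  # index with Q2 = 0
        i1 = i0 | 0b010      # index with Q2 = 1
        new[i0] = s * (state[i0] + state[i1])
        new[i1] = s * (state[i0] - state[i1])
    return new


def apply_czz(state):
    """Apply the ideal CZZ gate: sign flip if Q2 = 1 and Q1, Q3 have odd parity."""
    out = []
    for i, amp in enumerate(state):
        q1, q2, q3 = (i >> 2) & 1, (i >> 1) & 1, i & 1
        out.append(amp * (-1) ** (q2 * (q1 ^ q3)))
    return out


def probability_center_is_one(q1, q3):
    """Run prepare -> CZZ -> rotate back and return P(Q2 = 1)."""
    state = [0j] * 8
    state[(q1 << 2) | q3] = 1.0          # data qubits in |q1>, |q3>, center in |0>
    state = apply_h_center(state)        # step 1: center qubit to |+>
    state = apply_czz(state)             # steps 2-3: parallel ZZ interaction
    state = apply_h_center(state)        # step 4: back to the Z basis
    return sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> 1) & 1)


results = {(q1, q3): probability_center_is_one(q1, q3)
           for q1, q3 in product((0, 1), repeat=2)}
print(results)
```

In this idealized setting the measurement outcome of the center qubit coincides with the Z-parity of the two data qubits.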

#### III. CIRCUIT ANALYSIS

While the simplified model explains the working principle, it does not describe the full dynamics of a possible experimental implementation. To investigate this in more depth, we consider a system consisting of three transmon qubits  $(Q_1,Q_2,Q_3)$  and two intermediate tunable couplers (TC<sub>1</sub>, TC<sub>2</sub>), displayed in Fig. 4, in a lumped-element description. For transmon qubits, there are multiple techniques for designing a CZ gate. In a system of two qubits coupled by a coupler ($|\text{qubit}, \text{coupler}, \text{qubit}\rangle$), one technique is to bring the states  $|101\rangle$  and  $|200\rangle$  into resonance and control the interaction time to accumulate a desired phase in a Larmor precession [26]. Another approach is to use dispersive shifts, which alter specific eigenenergies of the system [27, 28]. To engineer and tune the desired ZZ-interaction, these experiments employ a controllable coupling element, allowing them to regulate the gained phase via the duration of the interaction. This adiabatic approach to CZ gates offers the advantage of avoiding modifications to qubit transition frequencies, which can cause unwanted resonances with spectator qubits. We here adopt this second method to design a strategy for executing two CZ gates in parallel.



FIG. 4. The parallel CZ gate is implemented on a superconducting circuit that consists of three fixed-frequency transmons (blue, orange) and two frequency-tunable transmons (tunable couplers, green). By simultaneously tuning the two tunable couplers, parallel ZZ-interactions between the outer qubits and the center qubit can be engineered.

The circuit we investigate consists of five circuit elements, two of which are frequency-tunable couplers and three are fixed-frequency transmon qubits consisting of a Josephson junction and a large shunt capacitance. While fixed-frequency transmons tend to have longer coherence times, one could also consider frequency-tunable transmons for enhanced flexibility, e.g. to avoid resonances with two-level defects. The circuit diagram is shown in Fig. 4. In the transmon regime the qubits are well protected against charge noise. The two couplers (green) are frequency-tunable, i.e. they each consist of a capacitively shunted DC-SQUID [29, 30]. Our design keeps the fixed-frequency qubits at their sweet spot while only adjusting the frequency of the tunable couplers [26, 27, 31].

To analyze the physics of the circuit depicted in Fig. 4, we derive the following Hamiltonian using canonical circuit quantization [29, 30],

$$\begin{aligned}
H ={}& \sum_{i=1,2,3} \left( \omega_{i} b_{i}^{\dagger} b_{i} + \frac{\alpha_{i}}{2} b_{i}^{\dagger} b_{i}^{\dagger} b_{i} b_{i} \right) \\
&+ \sum_{j=1,2} \left( \omega_{c_{j}} c_{j}^{\dagger} c_{j} + \frac{\alpha_{c_{j}}}{2} c_{j}^{\dagger} c_{j}^{\dagger} c_{j} c_{j} \right) \\
&+ \sum_{k=1,2,3;\,l=1,2} g_{k,c_{l}} (b_{k} - b_{k}^{\dagger}) (c_{l} - c_{l}^{\dagger}) \\
&+ \sum_{m < o=1,2,3} g_{mo} (b_{m} - b_{m}^{\dagger}) (b_{o} - b_{o}^{\dagger}) \\
&+ g_{c_{1},c_{2}} (c_{1} - c_{1}^{\dagger}) (c_{2} - c_{2}^{\dagger}).
\end{aligned} \tag{3}$$

Here the  $b_i^{\dagger}$  ( $b_i$ ) are the bosonic creation (annihilation) operators for the three qubits (i = 1, 2, 3) and  $c_j^{\dagger}$  ( $c_j$ ) are the creation (annihilation) operators for the two tunable couplers (j = 1, 2). For simplicity, the constant terms were neglected. The frequencies are denoted by  $\omega_i$ , the anharmonicities by  $\alpha_i$  and the individual couplings by  $g_{i,j}$ , where the indices correspond to the respective qubits or couplers.

The qubit frequencies are given in terms of charging and Josephson energies by  $\omega_i = \sqrt{8E_{C_{ii}}E_{J_i}} - E_{C_{ii}}$  for  $i=1,2,3$  (qubits) and  $i=c_1,c_2$  (couplers). The anharmonicities are given by  $\alpha_i=-E_{C_{ii}}$  for  $i=1,c_1,2,c_2,3$ . The qubit-coupler couplings are expressed by  $g_{k,c_l}=-4E_{C_{kc_l}}(\sqrt{\zeta_k\zeta_{c_l}})^{-1}$ , where  $k=1,2,3$  and  $l=1,2$ , the qubit-qubit couplings by  $g_{mo}=-4E_{C_{mo}}(\sqrt{\zeta_m\zeta_o})^{-1}$ , with  $m< o=1,2,3$ , and the coupler-coupler coupling by  $g_{c_1,c_2}=-4E_{C_{c_1c_2}}(\sqrt{\zeta_{c_1}\zeta_{c_2}})^{-1}$ . In these expressions, the impedance is defined as  $\zeta_i=\sqrt{(8E_{C_{ii}})/E_{J_i}}$  for  $i=1,c_1,2,c_2,3$ . Here  $E_{C_{xy}}$  denotes the capacitive energy for the corresponding active circuit nodes  $x,y$  and  $E_{J_x}$  the Josephson energy for the active circuit node  $x$ , with  $x,y\in\{1,c_1,2,c_2,3\}$ . A more detailed circuit quantization is given in Appendix A.
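To make these relations concrete, the following minimal Python sketch evaluates them for a single transmon and one capacitive coupling. The numerical values ($E_C = 0.25$ GHz, $E_J = 12.5$ GHz and a mutual capacitive energy of $5$ MHz) are hypothetical and chosen only for round results; the actual circuit parameters are those listed in Appendix B.

```python
import math


def transmon_parameters(e_c, e_j):
    """Return (omega, alpha, zeta) from the charging and Josephson energies."""
    omega = math.sqrt(8 * e_c * e_j) - e_c  # transition frequency
    alpha = -e_c                            # anharmonicity
    zeta = math.sqrt(8 * e_c / e_j)         # impedance as defined in the text
    return omega, alpha, zeta


def capacitive_coupling(e_c_xy, zeta_x, zeta_y):
    """Coupling g = -4 E_C_xy / sqrt(zeta_x * zeta_y) between nodes x and y."""
    return -4 * e_c_xy / math.sqrt(zeta_x * zeta_y)


# Hypothetical example values in GHz (not taken from the paper).
omega, alpha, zeta = transmon_parameters(e_c=0.25, e_j=12.5)
g = capacitive_coupling(e_c_xy=0.005, zeta_x=zeta, zeta_y=zeta)
print(omega, alpha, zeta, g)
```

With these example numbers the ratio $E_J/E_C = 50$ lies in the transmon regime, and the sketch yields a frequency of 4.75 GHz, an anharmonicity of $-0.25$ GHz and a coupling of $-50$ MHz.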

To analyze the parallel execution of two CZ gates, we investigate tuning the couplers with two magnetic fluxes near the frequency of the qubits. This causes the prepared initial state to acquire a phase that depends on the state of the qubits. Besides dynamics in the qubit subspace, leakage into non-computational states will also occur during the execution of the gates. The control pulses of the couplers therefore have to be chosen in a way that, after the gate, any leakage population has returned into the computational subspace.

In an experimental realization, the relevant computational basis is formed by the eigenstates of the Hamiltonian in Eq. (3) in the idle regime, where interactions between the individual qubit-circuits are maximally suppressed. Since these computational basis states do not deviate strongly from the bare states  $|ijklm\rangle_b$ , where the indices i, j, k, l, m correspond to the excitation numbers of  $Q_1, TC_1, Q_2, TC_2, Q_3$ , we use the same indices for the computational basis states  $|ijklm\rangle$ .

#### IV. GATE OPTIMIZATION

To obtain an optimal parity-check gate, we determine suitable circuit and pulse parameters. Since the gate should involve the same ZZ-interaction between each data qubit and the ancilla qubit, it is beneficial to choose symmetric parameters for the circuit model given in Fig. 4. For the capacitances and Josephson energies of the fixed-frequency qubits we choose an experimentally feasible value range [27, 32]. We then optimize the control pulses, i.e. the flux pulses that are applied to the tunable couplers. Details of the optimization procedure and final values for the fixed circuit parameters and control pulses are provided in Appendix B. In this way, we obtain gate pulses and unitaries  $U_{\rm sim}$  (see Fig. 5 for an example), which are derived from the numerical time-evolution generated by the Hamiltonian given in Eq. (3).



FIG. 5. The simulated channel  $U_{\rm sim}$  for the 35 ns gate is a very good approximation of the target unitary  $U_{\rm ideal}$ , up to an irrelevant global phase. The relative phase between input state and the state after the parity check gate is depicted by the color of the squares. The size of the squares and the light grey numbers correspond to the occupation probability of the input and time-evolved states. More detail about obtaining the simulated unitary matrix is given in App. B.

To quantify the gate performance, one commonly calculates the gate-specific error as

$$\varepsilon_{\text{gate}} \approx 1 - F = 1 - \frac{\left| \text{Tr} \left( U_{\text{sim}}^{\dagger} U_{\text{CZZ}} \right) \right|}{8}, \tag{4}$$

where F is the fidelity of the gate. To make a more realistic assessment, we here also consider the error due to decoherence, which we estimate as

$$\varepsilon_{\rm dec} \approx 1 - e^{-\frac{t_{\rm gate}}{\tau}}, \tag{5}$$

and assume a coherence time of a transmon to be  $\tau \approx 50\,\mu s$  [26, 33]. We design our gate times such that the decoherence error and the gate error are of the same order of magnitude. Even though the gate error could be reduced further at the expense of a longer gate time, this would not lead to a benefit, as the accuracy of the gate would still be limited by decoherence. A shorter gate, on the other hand, would reduce the effects of decoherence but lead to larger gate errors.
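As a quick numerical illustration of Eq. (5), the short Python sketch below evaluates the decoherence error for the two gate durations considered later, using the coherence time $\tau = 50\,\mu s$ assumed above. The function name is illustrative.

```python
import math


def decoherence_error(t_gate, tau=50e-6):
    """Estimate eps_dec = 1 - exp(-t_gate / tau), both times in seconds."""
    # expm1 avoids loss of precision for t_gate << tau.
    return -math.expm1(-t_gate / tau)


eps_35 = decoherence_error(35e-9)
eps_50 = decoherence_error(50e-9)
print(f"35 ns: {eps_35:.5e}")
print(f"50 ns: {eps_50:.5e}")
```

Both values are of order $10^{-3}$ and agree with the $\varepsilon_{\rm dec}$ entries of Table I up to rounding.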

#### A. Error model and error based optimization

The fidelities we obtain by simulating the CZZ gate give a good measure of the performance of a single CZZ gate. However, our main use case for the CZZ gate is to improve surface code quantum error correction. To assess the CZZ gate performance in a surface code error correction protocol, we therefore seek an error model for the gate in terms of Pauli errors. To do so, we take our process matrix displayed in Fig. 5 and follow Ref. [34] to obtain an effective Pauli noise model for our gate. We write the error channel as a composition of the simulated channel  $\mathcal{U}_{\text{sim}}$  and the ideal quantum channel  $\mathcal{U}_{\text{CZZ}} = U_{\text{CZZ}} \otimes U_{\text{CZZ}}^*$ . Introducing Pauli errors P and Q we write the closest Pauli error channel as

$$\mathcal{U}_{\text{err}} = \mathcal{U}_{\text{sim}} \circ \mathcal{U}_{\text{CZZ}}^{-1} = \sum_{PQ} w_{PQ} P \otimes Q^*. \tag{6}$$

Here "closest" means with respect to the metric induced by the norm  $||A|| = \sqrt{\langle A,A \rangle}$  with the normalized Frobenius inner product  $\langle A,B \rangle = \mathrm{Tr}(A^\dagger B)/\mathrm{dim}(A)$ . The Pauli coefficients  $w_{PQ}$  correspond to the probability of the Pauli channel PQ and  $\circ$  denotes the successive execution of two channels. We calculate the Pauli coefficients according to [34] by flattening  $\mathcal{U}_{\mathrm{err}}(|a\rangle\langle b|)$  into a column vector  $|\mathcal{U}_{\mathrm{err}},ab\rangle = \mathcal{U}_{\mathrm{err}}|ab\rangle$ . With this technique we calculate all the  $4^N$  column vectors for all  $a,b\in\{0,1\}^N$ , to obtain the matrix  $M=\sum_{a,b}\mathcal{U}_{\mathrm{err}}|ab\rangle\langle ab|$  and calculate the Pauli coefficients  $w_{PQ}$  by taking the normalized Frobenius inner product  $w_{PQ}=\langle P\otimes Q^*,M\rangle$ .

We are only interested in the symmetric Pauli channels, which we define as  $w_{PP}$ . Therefore we apply the Pauli twirling approximation (PTA) [35] to neglect the asymmetric Pauli coefficients. The asymmetric Pauli coefficients correspond to the leakage out of the computational subspace and are neglected by the PTA. We ensure that leakage is suppressed via an additional contribution to the cost function used in the optimization. In an experimental realization, remaining leakage errors can also be converted into Pauli errors via reset protocols [36–39]. As the diagonal elements retained by the PTA do not sum to unity, we normalize their coefficients. We then define a base error probability  $\tilde{p} = 1 - w_{III}$ , where  $w_{III}$  denotes the Pauli coefficient for the Pauli channel III. In our surface code simulations, where we vary a physical error parameter p, we thus rescale all Pauli coefficients such that  $1 - w_{III} = \sum_{P \neq III} w_{PP} = p$ .
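The twirling, normalization and rescaling steps can be sketched as follows. This is a minimal NumPy illustration, not the procedure of Ref. [34]: it assumes that the simulated gate is given as an $8\times 8$ matrix on the computational subspace, so that the error channel acts as $\rho \to E\rho E^\dagger$ with $E = U_{\rm sim} U_{\rm CZZ}^\dagger$. Expanding $E = \sum_P c_P P$ then gives $w_{PP} = |c_P|^2$ for the diagonal coefficients. The coherent over-rotation used as input is a hypothetical example.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'IZI'."""
    op = np.array([[1.0 + 0j]])
    for letter in label:
        op = np.kron(op, PAULIS[letter])
    return op


def twirled_pauli_coefficients(u_sim, u_ideal):
    """Diagonal Pauli coefficients w_PP of the error E = u_sim @ u_ideal^dagger."""
    dim = u_ideal.shape[0]
    n_qubits = dim.bit_length() - 1
    err = u_sim @ u_ideal.conj().T
    w = {}
    for letters in itertools.product("IXYZ", repeat=n_qubits):
        label = "".join(letters)
        c = np.trace(pauli_string(label).conj().T @ err) / dim
        w[label] = float(abs(c) ** 2)
    total = sum(w.values())  # below 1 if u_sim is not unitary on the subspace
    return {label: value / total for label, value in w.items()}


def rescale_to_error_rate(w, p):
    """Rescale the non-identity coefficients so that they sum to p."""
    identity = "I" * len(next(iter(w)))
    p_tilde = 1 - w[identity]
    return {label: (1 - p if label == identity else value * p / p_tilde)
            for label, value in w.items()}


# Hypothetical input: ideal CZZ followed by a small Z over-rotation on Q2.
u_czz = np.diag([1, 1, 1, -1, 1, 1, -1, 1]).astype(complex)
theta = 0.02
z2 = np.array([1 - 2 * ((i >> 1) & 1) for i in range(8)])
u_sim = np.diag(np.exp(-1j * theta * z2)) @ u_czz

w = twirled_pauli_coefficients(u_sim, u_czz)
rescaled = rescale_to_error_rate(w, p=0.01)
print({k: v for k, v in w.items() if v > 1e-12})
```

For this toy input only the channels III and IZI carry weight, with $w_{IZI} = \sin^2\theta$, and after rescaling the full error probability $p$ is assigned to IZI.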

A core feature of our gate optimization is the choice of our cost function that can address multiple goals depending on the task of the gate. In our case we want to design a gate, which specifically suppresses fault-tolerance-breaking errors in the surface code QEC. We design our cost to consist of three terms, which we weight by different weights  $\eta_1, \eta_2$  and  $\eta_3$ . We can thus fine-tune our cost by adapting the weights  $\eta_j$  according to our aims. Here we define our cost as

$$\mathrm{cost} = \eta_1 \varepsilon_L + \eta_2 \sum_P w_{PP} + \eta_3 \sum_K \gamma_K \cdot w_{KK}. \tag{7}$$

Independently of the task of the gate, we want to minimize leakage out of the qubit subspace. This is ensured by the first term  $\eta_1 \varepsilon_L = \eta_1 \left(1 - \frac{1}{8} \cdot \left| \text{Tr} \left( U_{\text{sim}}^\dagger U_{\text{sim}} \right) \right| \right)$  that penalizes such leakage. The second term that is implemented in the cost regardless of the task is the sum over all unweighted Pauli coefficients  $w_{PP}$ , excluding the unity coefficient. This prevents certain Pauli coefficients from being unintentionally maximized due to their absence from the cost function. The third term sums up the Pauli coefficients  $w_{KK}$  of specific error channels that one aims to particularly suppress. These error channels have individual weightings  $\gamma_K$ , which can be chosen individually for a given task.

This error-based gate optimization approach enables the targeted suppression of specific Pauli error channels. It is particularly well suited for designing gates with a noise bias or for gates involved in quantum error correction (QEC), where certain Pauli errors are more detrimental than others. We choose all Pauli channels that lead to fault-tolerance-breaking errors to be in the specific set of the third term in Eq. (7). In the following section, we describe how the weightings  $\gamma_K$  are determined to prioritize the suppression of errors that are most harmful to the QEC process.
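The structure of the cost in Eq. (7) can be summarized in a few lines of Python. The sketch below assumes that the leakage error $\varepsilon_L$ and the diagonal Pauli coefficients $w_{PP}$ have already been computed; the function name and all numerical inputs are hypothetical placeholders.

```python
def gate_cost(eps_leak, w, gamma, eta):
    """Evaluate cost = eta1*eps_L + eta2*sum_P w_PP + eta3*sum_K gamma_K*w_KK.

    eps_leak: leakage error eps_L
    w:        dict mapping Pauli labels (e.g. 'IYI') to coefficients w_PP
    gamma:    dict mapping the targeted Pauli labels K to weights gamma_K
    eta:      tuple (eta1, eta2, eta3)
    """
    eta1, eta2, eta3 = eta
    identity = "I" * len(next(iter(w)))
    # Second term: all Pauli coefficients except the identity channel.
    unweighted = sum(value for label, value in w.items() if label != identity)
    # Third term: only the channels selected for targeted suppression.
    targeted = sum(weight * w.get(label, 0.0) for label, weight in gamma.items())
    return eta1 * eps_leak + eta2 * unweighted + eta3 * targeted


# Hypothetical toy numbers, for illustration only.
w_example = {"III": 0.999, "IYI": 6e-4, "ZIZ": 4e-4}
cost_value = gate_cost(eps_leak=1e-4, w=w_example,
                       gamma={"IYI": 1.0}, eta=(1, 10, 0.1))
print(cost_value)
```

Increasing $\eta_3$ relative to $\eta_2$ shifts the optimization from reducing the total Pauli error towards reducing the selected channels.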

#### V. APPLICATION TO THE SURFACE CODE

We now focus on the integration of our CZZ gate in a surface-code error correction protocol. A key characteristic of QEC codes, including the surface code, is the code distance d, defined as the minimum number of physical qubits that need to be faulty in order to create a non-detectable logical error. This means that a QEC code with a distance d can detect errors that affect up to d-1 physical qubits, and can correct errors on up to  $t = \lfloor \frac{d-1}{2} \rfloor$  faulty physical qubits. In the following we investigate two variants of the surface code: the unrotated surface code, shown in Fig. 1 a) for d = 3, which requires  $2d^2 + 2(d-1)^2 - 1$  physical qubits to encode one logical qubit (including auxiliary qubits for stabilizer readout), and the more qubit-efficient rotated surface code, shown in Fig. 1 b), which requires only  $2d^2 - 1$  physical qubits. Because of the smaller qubit count, experimental realizations mainly focus on the rotated surface code, as illustrated by a number of recent experimental demonstrations on superconducting hardware [8–10].
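For orientation, the qubit counts and the number of correctable faults quoted above can be tabulated with a short Python sketch; the function name is illustrative.

```python
def code_resources(d):
    """Resources of a distance-d surface code, including auxiliary qubits."""
    return {
        "correctable_faults": (d - 1) // 2,                    # t = floor((d-1)/2)
        "unrotated_qubits": 2 * d**2 + 2 * (d - 1) ** 2 - 1,   # equals (2d-1)^2
        "rotated_qubits": 2 * d**2 - 1,
    }


for distance in (3, 5, 7):
    print(distance, code_resources(distance))
```

For d = 3 this gives 25 physical qubits for the unrotated and 17 for the rotated code, and for d = 5 it gives 81 and 49, respectively.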

In Ref. [40] some of us have shown that – contrary to the folklore that multi-qubit gates are incompatible with strictly fault-tolerant circuit designs – the unrotated surface code can retain its fault-tolerance when employing a syndrome readout protocol based on 3-qubit CZZ gates. Fault-tolerance for our purposes means that a circuit is distance preserving in the following sense: If we implement the stabilizer measurements of a distance-d QEC code using a circuit with a circuit-level noise model where every location can be faulty, all up to weight-t combinations of faults ('order  $p^t$  faults') can still be corrected. Even with 3-qubit depolarizing noise applied after the CZZ gates, unrotated surface code stabilizer readout circuits remain strictly fault-tolerant and achieve a higher threshold than the standard protocol using successive two-qubit CZ gates. Rotated surface codes, however, have their circuit distance halved when allowing for arbitrary 3-qubit depolarizing noise errors on the support of the 3-qubit CZZ gates.

Here we improve the performance of the rotated surface code by optimizing the CZZ gate in a fault-tolerance-adapted way to better suppress fault-tolerance-breaking errors for both the rotated and unrotated surface code. To that end, we first characterize which faults are particularly harmful in the rotated surface-code implementation.

#### A. Fault-tolerance breaking faults

The following analysis examines which of the  $4^3 - 1 = 63$  non-identity Pauli terms on the support of 3-qubit gates are particularly harmful. Although a logical error is a global result of the protocol, we can still identify which local errors on individual gates lead to uncorrectable errors more frequently than others. For concreteness, we restrict ourselves to 5 rounds of distance-5 rotated surface code syndrome measurement circuits, but the approach can be extended to larger distances. For this analysis, we initialize a graph  $\mathcal{G} = (\mathcal{V}, \mathcal{E})$  with the Pauli terms as nodes  $\mathcal{V}$ ,  $|\mathcal{V}| = 63$ , and simulate noisy circuits by explicitly placing all order  $p^2$  faults. As the rotated surface code with CZZ gates is not fault-tolerant, this set of two-fault processes includes faults whose error syndrome leads to a logical error. Whenever the syndrome of a combination of Pauli operators (P, P') at any combination of locations leads to an error guess resulting in a logical error, we add an edge  $E = (P, P')$  to  $\mathcal{E}$  between the respective nodes and call the Paulis conspiring. We then record the occurrence probability of the fault combination,  $p_{(P,P')} = p_{(P)}p_{(P')}\prod p_{(I)}$ , where  $p_{(P)}$  is the probability of Pauli string P in the corresponding channel,  $p_{(I)}$  is the probability that no fault occurs, and the product runs over all non-faulty locations. For a circuit with  $n_{\rm l}$  locations and a uniform single-parameter noise model, one gets  $p_{(P,P')} = \left(\frac{p}{63}\right)^2 (1-p)^{n_{\rm l}-2}$ . If an edge already exists, we add the occurrence probability to the respective record. In general, order  $p^2$  combinations of a CZZ fault and, e.g., a single-qubit error can also lead to a logical error. While we include all other error mechanisms in the occurrence probability, motivated by higher fidelities of single-qubit operations in current experiments, we only consider how the faults on the three-qubit gates conspire, in order to focus on the harmfulness of the gate.

We show the resulting graph for p=0.001 in Fig. 6. We find that four Pauli terms (III, ZIZ, YIY, XIX) are never part of a conspiring combination, while others (e.g. IYI or XYX) lead to a failure in any other (i.e. 61) case. From the obtained data, we calculate single Pauli term marginals as the sum of failure probabilities at adjacent edges,

$$p_P = \sum_{P':(P,P')\in\mathcal{E}} p_{(P,P')}. \tag{8}$$

We interpret these marginals as the average harmfulness of the respective Pauli term in a uniform 3-qubit depolarizing channel on the gate. The marginals are displayed in Fig. 7. Notably, the harmfulness ranges over two orders of magnitude and correlates with the degree of the terms in the graph G. The largest contributions are for example IYI, XYX, YYY and ZYZ.



FIG. 6. Graph G of conspiring Pauli terms. We simulate 5 rounds of distance-5 rotated surface code stabilizer measurement circuits in a Z-memory experiment (see main text). We manually place all possible order-2 combinations of faults of uniform 3-qubit depolarizing channels after the 3-qubit CZZ gates. Whenever a fault combination leads to a logical error, there is an edge in between the constituent Pauli terms. We divide the Pauli operators into groups with the same degree, indicating the harmfulness of specific Pauli terms. For example, four Pauli terms (III, ZIZ, YIY, XIX) are never part of a failing combination, while others (e.g. IYI or XYX) lead to a failure in any other case (i.e. in 61 cases including conspiring with itself).
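The bookkeeping behind the conspiracy graph and the marginals of Eq. (8) can be sketched in plain Python as follows. The decoding step that decides whether a fault pair causes a logical error is not reproduced here; the list of failing pairs and the number of circuit locations are hypothetical inputs, and the helper names are illustrative.

```python
from collections import defaultdict


def two_fault_probability(p, n_locations, n_paulis=63):
    """Occurrence probability (p/63)^2 (1-p)^(n_l - 2) of one specific fault pair."""
    return (p / n_paulis) ** 2 * (1 - p) ** (n_locations - 2)


def record_failure(edges, pauli_a, pauli_b, probability):
    """Add an edge for a failing pair, or accumulate if the edge already exists."""
    key = tuple(sorted((pauli_a, pauli_b)))  # undirected edge
    edges[key] += probability


def pauli_marginals(edges):
    """Sum the recorded probabilities over all edges adjacent to each Pauli term."""
    marginals = defaultdict(float)
    for (pauli_a, pauli_b), probability in edges.items():
        marginals[pauli_a] += probability
        if pauli_b != pauli_a:  # a self-loop is counted once
            marginals[pauli_b] += probability
    return dict(marginals)


# Hypothetical failing fault pairs, e.g. found at different circuit locations.
failing_pairs = [("IYI", "XYX"), ("IYI", "XYX"), ("IYI", "IYI"), ("XYX", "ZZI")]
prob = two_fault_probability(p=0.001, n_locations=1000)  # n_locations is assumed

edges = defaultdict(float)
for a, b in failing_pairs:
    record_failure(edges, a, b, prob)

marginals = pauli_marginals(edges)
print(marginals)
```

Normalizing such marginals yields relative weights of the kind that enter the cost function as $\gamma_K$.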

#### B. Optimized Gates

We now focus on optimizing the logical QEC performance. To this end, we normalize the Pauli marginals given by Eq. 8 and use them as additional weights $\gamma_K$ for the Pauli channels $w_{KK}$ in the cost function defined in Eq. 7. These marginals correlate with the probability with which each Pauli error channel contributes to a fault-tolerance-breaking error event. With this method we can specifically suppress harmful errors and boost the QEC performance. In the following we will use the notation $\eta = [\eta_1, \eta_2, \eta_3]$ for the cost weightings.

We next discuss two possible pulses for the CZZ gate: a pulse with a duration of 35 ns, obtained with the weightings $\eta = [1,10,0.1]$, and one with a duration of 50 ns, obtained with $\eta = [1,10,0.1]$. The respective gate and decoherence errors are listed in Table I. For these gates, we calculate the effective Pauli channels as described in Sec. IV A and display them in Fig. 7. We can see that the Pauli terms with the largest uniform marginals (indicated by the bars on the left side of the plot, except for the "III" entry) are suppressed compared to the other Pauli terms (bars in the right part of the plot).

|                               | $35\,\mathrm{ns}$      | $50\,\mathrm{ns}$     |
|-------------------------------|------------------------|-----------------------|
| $\varepsilon_{\mathrm{gate}}$ | $3.646 \cdot 10^{-4}$  | $6.529 \cdot 10^{-4}$ |
| $\varepsilon_{\mathrm{dec}}$  | $6.9975 \cdot 10^{-4}$ | $9.995 \cdot 10^{-4}$ |

TABLE I. Gate errors $\varepsilon_{\mathrm{gate}}$ and decoherence errors $\varepsilon_{\mathrm{dec}}$ for the $35\,\mathrm{ns}$ and the $50\,\mathrm{ns}$ gate.

#### C. Improved surface code performance and resource analysis

We conduct memory experiments in rotated and unrotated surface codes using stim [41]. For a distance-$d$ Z-(X-)memory experiment, we prepare $|0\rangle^{\otimes n}$ $(|+\rangle^{\otimes n})$, measure the stabilizers $d$ times using the readout protocol shown in Fig. 1 and finally measure the data qubits in the Z (X) basis. We simulate both bases and report the joint logical error rate as detailed in App. C1. The schedule for the rotated surface code is depicted in panel a), and the protocol for the unrotated code in panel b). The readout of the Z-stabilizers is done in two steps: first, all CZZ gates connecting the south and west (SW) data qubits are executed; in the second step, the north-east (NE) CZZ gates are performed. After the gates, the ancilla qubits are measured, while the CZZ gates used to map out the X-stabilizers are applied in a NW-SE ordering. For the readout of the X-stabilizers we additionally apply Hadamard gates before and after the CZZ gates on all qubits, as depicted in Fig. 1 c).

We use a circuit-level noise model inspired by the SI1000 model of Ref. [41], where each circuit location can be faulty. Single-qubit initializations (measurements) in the Z-basis are followed (preceded) by a bit-flip channel of strength $p_{\rm reset}=2p$ ($p_{\rm measure}=5p$). Single- and two-qubit gates are modeled as ideal gates followed by single- and two-qubit depolarizing channels of strength $p$, respectively. For three-qubit gates we compare gates followed by three-qubit uniform depolarizing channels against gates followed by the noise channel derived for our gate, see also Sec. V A. Additionally, we assume idling noise of strength $p_{\rm idle}=0.1p$ for each waiting location.
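As described in App. C4, stim has no native depolarizing channel on more than two qubits, so the three-qubit channels are emulated with a chain of correlated-error instructions. The sketch below generates such instruction text in plain Python. It is an illustration under assumptions, not the authors' code: the helper names are hypothetical, and the conversion to conditional probabilities, $p_i / (1 - \sum_{j<i} p_j)$, follows from the chain semantics in which each ELSE\_CORRELATED\_ERROR only fires if no earlier error in the chain did.

```python
from itertools import product


def uniform_three_qubit_channel(p: float) -> dict:
    """Uniform depolarizing channel: each of the 63 Paulis has weight p/63."""
    paulis = ["".join(s) for s in product("IXYZ", repeat=3)][1:]  # drop "III"
    return {pauli: p / 63 for pauli in paulis}


def correlated_error_lines(probs: dict, qubits: tuple) -> list:
    """Turn Pauli-string probabilities into stim instruction text.

    `probs` maps strings such as "XYI" to their probability; `qubits` holds
    the three qubit indices the gate acts on (hypothetical helper).
    """
    lines = []
    remaining = 1.0  # probability that no earlier error in the chain fired
    for i, (pauli, p_i) in enumerate(probs.items()):
        conditional = p_i / remaining
        targets = " ".join(f"{op}{q}" for op, q in zip(pauli, qubits) if op != "I")
        name = "CORRELATED_ERROR" if i == 0 else "ELSE_CORRELATED_ERROR"
        lines.append(f"{name}({conditional:.12f}) {targets}")
        remaining -= p_i
    return lines


lines = correlated_error_lines(uniform_three_qubit_channel(1e-3), (0, 1, 2))
```

An effective Pauli channel of a specific gate can be passed in the same way by replacing the uniform dictionary with the simulated probabilities.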

The results are shown in Fig. 8, where we compare the error correction performance of the surface codes implementing the CZZ gate readout protocols. We compare the two noise models obtained by optimal control with the uniform noise model. For the unrotated surface codes depicted in Fig. 8 b), we observe a small improvement in logical error rate. Due to the fault-tolerance of the circuits, the scaling of the logical error rate in the low error regime is in any case $\propto p^{t+1}$, where $t = \lfloor \frac{d-1}{2} \rfloor$ is the number of correctable errors. Small deviations from the expected scaling are related to sub-optimal decomposition of faults into matchable faults, see also Ref. [40]. This suggests that optimizing pulses to suppress harmful errors can also enhance standard error correction readout schemes that use two-qubit gates.

For the rotated surface code, displayed in Fig. 8 a), the effect of suppressing fault-tolerance-breaking errors is much more distinct. Here the improvement amounts to about one order of magnitude for higher code distances. Since the rotated surface code does not have a fault-tolerant readout schedule for the CZZ gate, the logical error scaling flattens out toward smaller physical error rates. We observe a scaling $p_L \propto p^{\lceil \frac{d-1}{2} \rceil}$. The expected scaling from the non-FT distance $\frac{d+1}{2}$ is $\lfloor \frac{d-1}{2} \rfloor + 1$, which can be observed around $p \approx 10^{-3}$. We attribute the further reduction for very low physical error rates again to sub-optimal fault decompositions. Nevertheless, in the experimentally relevant regime of $p \in [10^{-3}\dots 10^{-2}]$ [11], the improvement of the error-suppressed CZZ gates is significant. This becomes even more evident in the comparison of the gain in logical error rate, which is displayed in Fig. 9.

Our simulations also show that the new CZZ gate readout schedules improve the physical qubit error threshold from  $\approx 0.66\,\%$  to  $\approx 1.2\,\%$  for the rotated surface code, and from  $\approx 0.66\,\%$  to  $\approx 1.1\,\%$  for the unrotated surface code, as can be seen in Fig. 10. The significant increase in physical error threshold can be attributed to the reduced number of fault locations, as explained in detail in Ref. [40]. Note that these results can be regarded as an upper bound for the physical threshold and logical error reduction, since our error model does not include all errors present in an experimental realization.

We now compare the resource requirements for a fault-tolerant CZ readout protocol under uniform depolarizing noise and the optimized CZZ gate with the effective Pauli noise model as the underlying noise channel on 3-qubit gates. For that, we calculate the number of physical qubits required to reach a target logical error rate. We plot the logical error rate for different physical errors p over the number of physical qubits (including ancilla qubits) in Fig. 11. We investigate these graphs for no noise on idling qubits in Fig. 11 a) (left) and for a noise of  $p_{\rm idle} = 0.1p$  on idling qubits in Fig. 11 b) (right). We fit the function

$$p_L(n) = c_0 \left(\frac{p}{c_1}\right)^{c_2\sqrt{n}} \tag{9}$$

to the simulated data to allow for an extrapolation to large surface codes and plot a step function that rounds to the next larger surface code available. In both simulations, and in particular for physical error rates $\geq 0.3\%$, the CZZ-gate-based readout protocols require fewer physical qubits to reach a target logical error rate. For example, reaching a logical error rate of $p_L = 10^{-6}$ at $p = 0.3\%$ requires 1457 physical qubits (distance 27) for the CZ protocol, but only 1057 physical qubits (distance 23) with the CZZ protocol, an advantage of $\approx 27.5\%$. This reduction increases if we include idling noise, where the CZZ protocol with the same 1057 physical qubits (distance 23) uses $\approx 37.1\%$ fewer qubits than the CZ protocol with 1681 qubits (distance 29).
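The quoted qubit counts and savings follow from the qubit number $n = 2d^2 - 1$ given in the caption of Fig. 11. The short Python sketch below reproduces this arithmetic and shows how the fit of Eq. (9) can be turned into the step function over available code distances. The function names and the fit constants passed to `smallest_distance` are hypothetical placeholders, since the fitted values are not listed in the text.

```python
from math import sqrt


def physical_qubits(distance: int) -> int:
    """Data plus ancilla qubits of a rotated surface code, n = 2 d^2 - 1."""
    return 2 * distance**2 - 1


def relative_saving(d_cz: int, d_czz: int) -> float:
    """Fraction of physical qubits saved by the smaller code."""
    n_cz, n_czz = physical_qubits(d_cz), physical_qubits(d_czz)
    return (n_cz - n_czz) / n_cz


def fitted_logical_error(n: int, p: float, c0: float, c1: float, c2: float) -> float:
    """Fit function of Eq. (9): p_L(n) = c0 * (p / c1) ** (c2 * sqrt(n))."""
    return c0 * (p / c1) ** (c2 * sqrt(n))


def smallest_distance(target, p, c0, c1, c2, max_distance=201):
    """Smallest odd distance whose fitted p_L is at or below the target."""
    for d in range(3, max_distance + 1, 2):
        if fitted_logical_error(physical_qubits(d), p, c0, c1, c2) <= target:
            return d
    return None  # target not reachable within max_distance


print(physical_qubits(27), physical_qubits(23), physical_qubits(29))  # 1457 1057 1681
print(round(100 * relative_saving(27, 23), 1))  # 27.5
print(round(100 * relative_saving(29, 23), 1))  # 37.1

# Placeholder fit constants, for illustration only.
d_needed = smallest_distance(1e-6, p=3e-3, c0=0.1, c1=1e-2, c2=0.3)
```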



FIG. 7. Effective Pauli channels are displayed by the orange bars for the 35 ns gate and the green bars for the 50 ns gate. The grey bars show the Pauli channels for a uniform noise distribution, and the Pauli-term marginals, cf. Eq. (8), are depicted by the black outlined bars. All Pauli channels and Pauli marginals are rescaled to a physical error probability of $p = 0.001$. The Pauli-term marginals $p_P$ of order-2 faults are calculated from the occurrence probabilities, i.e. we sum the outgoing edges of the nodes in the graph $\mathcal{G}$ of Fig. 6. We sort the Pauli operators shown on the horizontal axis by decreasing marginal probability and color the background of their labels according to the degree of the corresponding node in $\mathcal{G}$. Notably, the effective Pauli channels are structured in such a way that terms with an odd number of G or G or G on the bottom. Notably, Pauli terms with higher degrees are more strongly suppressed by the 35 ns gate than those with lower degrees.

## VI. CONCLUSION AND OUTLOOK

In this paper we present a new surface-code stabilizer readout protocol for the rotated and unrotated surface code using CZZ gates. We design our CZZ gate using optimal control to strongly suppress Pauli errors that are particularly harmful for surface-code QEC. We show that although the CZZ gate readout protocol is not fault tolerant for the rotated surface code, in the experimentally achievable regime it outperforms the common readout protocol with two-qubit gates in physical error threshold and logical error rate scaling. For the unrotated surface code some of us have shown in a parallel work [40] that a fault-tolerant readout schedule using CZZ gates can be found. We show that even for the fault-tolerant readout schedule our new readout protocol improves the unrotated surface-code QEC slightly in logical error scaling and significantly in the physical qubit threshold.

These results suggest that shifting the gate optimization to suppress fault-tolerance-breaking Pauli errors could also improve conventional fault-tolerant two-qubit gate readout procedures, such as those used by, e.g., Google [10, 11], ETH Zürich [9] and others. Beyond targeting specific uncorrectable QEC errors, the gate design approach described here can also be applied to the design of other quantum gates, including on platforms other than superconducting qubits, and particularly to gates with a strong noise bias, since it enables stronger suppression of individual Pauli error channels.

Since the underlying superconducting circuit consists only of fixed-frequency and tunable transmon qubits, the proposed stabilizer readout scheme using CZZ gates is well suited for an experimental implementation on existing superconducting circuit hardware, without changing the hardware design. This makes the performance of the CZZ gate and readout protocol on real hardware an interesting research question. Another promising research direction emerging from this work is the implementation of targeted Pauli channel suppression in quantum gate design.

## VII. ACKNOWLEDGEMENTS

This research is part of the Munich Quantum Valley (K-8), which is supported by the Bavarian state government with funds from the Hightech Agenda Bayern Plus. We additionally acknowledge support by the BMBF projects GeQCoS (Grant No. 13N15684) and MUNIQC-ATOMS (Grant No. 13N16070). Furthermore, J.O. and M.M. acknowledge support by the European Union's Horizon Europe research and innovation programme under Grant Agreement No. 101114305 ("MILLENION-SGA1" EU Project) and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy "Cluster of Excellence Matter and Light for Quantum Computing (ML4Q) EXC 2004/1" 390534769 and the ERC Starting Grant QNets through Grant No. 804247. The authors gratefully acknowledge the computing time provided to them at the NHR Center NHR4CES at RWTH Aachen University (Project No. p0020074). This is funded by the Federal Ministry of Education and Research and the state governments participating on the basis of the resolutions of the GWK for national high performance computing at universities.

FIG. 8. Results for rotated (a) and unrotated (b) surface codes: We show the logical error rates of uniform depolarizing error models (circles) and of the effective Pauli channel (epc) of the $35\,\mathrm{ns} - [1, 10, 0.1]$ gate (squares) for different surface code distances displayed in the legend. We include an idling noise strength of $0.1p$. With dashed lines, we show the logical error rates using a fault-tolerant readout with 2-qubit CZ gates. a) The CZZ protocol for the rotated surface code is represented by the continuous lines. It is not fault tolerant, such that for small error rates, the curves show a reduced scaling. However, for larger physical error rates the CZZ protocol has a lower logical error rate, even with a uniform depolarizing noise model. With the effective error model, this regime extends to smaller physical error rates, such that for distance 11, e.g., the CZ protocol outperforms the CZZ protocol only for error rates $< 10^{-3}$. b) For unrotated surface codes, the CZZ protocol is fault tolerant and therefore outperforms the CZ protocol for almost all simulated physical error rates. The effective error model only marginally improves upon the uniform depolarizing model.

FIG. 9. Logical error rates for CZZ circuits for the uniform gate noise model and the $35\,\mathrm{ns}-[1,10,0.1]$ gate with corresponding effective Pauli error model (epc), depicted by the squares and the continuous lines, compared to the uniform depolarizing channel on CZ gates, represented by the dashed lines with circles. The numbers in the legend display the respective surface code distances. When $p_L^{(\mathrm{CZZ})}/p_L^{(\mathrm{CZ,uniform})} < 1$, there is an advantage in using the CZZ gate readout. a) For the rotated surface codes and a uniform depolarizing noise CZZ protocol, there is an advantage for larger physical error rates. The regime of advantage shifts to smaller physical error rates when using the effective error model. b) For the unrotated surface codes, both the uniform and the effective error model on the CZZ protocol outperform the uniform CZ protocol for most simulated physical error rates. The effective error model only marginally improves upon the uniform depolarizing model.

FIG. 10. Threshold plots. a) Rotated surface codes show an increase in threshold from $p_{\rm th}^{\rm (CZ)} \approx 0.66\%$ to $p_{\rm th}^{\rm (CZZ)} \approx 0.83\%$ and $p_{\rm th}^{\rm (CZZ,epc)} \approx 1.2\%$. b) Unrotated surface codes show an increase in threshold from $p_{\rm th}^{\rm (CZ)} \approx 0.66\%$ to $p_{\rm th}^{\rm (CZZ)} \approx 0.84\%$ and $p_{\rm th}^{\rm (CZZ,epc)} \approx 1.1\%$. The colors encode different code distances, circles refer to uniform noise and squares to effective Pauli channel (epc) noise, see legend. Dashed lines mark circuits with two-qubit gates, while solid lines mark circuits using the CZZ gate. Uncertainties are obtained from a finite size scaling analysis, detailed in the appendix.

FIG. 11. Number of physical qubits required to reach a target logical error rate, including auxiliary qubits, i.e. $n=2d^2-1$. We compare the rotated surface code implemented with CZ gates and a uniform depolarizing error model (continuous lines and circles) to the rotated surface code implemented with CZZ gates and the effective Pauli channel of the 35 ns gate (dashed lines and squares). We plot the resource estimations for different physical error probabilities from $p=0.1\%$ to $p=0.6\%$, see legend for color coding. a) No idling noise. For example, to reach a logical error rate of $p_L=10^{-6}$ at $p=0.3\%$, 1457 physical qubits (distance 27) are required for the CZ protocol, but only 1057 physical qubits (distance 23) with the CZZ protocol, an advantage of $\approx 27.5\%$. b) With idling noise of strength $p_{\text{idle}}=0.1p$ this reduction increases. The CZZ protocol with the same 1057 physical qubits (distance 23) uses $\approx 37.1\%$ fewer qubits than the CZ protocol with 1681 qubits (distance 29).

- [1] A. M. Dalzell, S. McArdle, M. Berta, P. Bienias, and C.-F. Chen, Quantum Algorithms: A Survey of Applications and End-to-end Complexities (Cambridge University Press, 2025).
- [2] A. Kitaev, Annals of Physics 303, 2 (2003).
- [3] A. Y. Kitaev, "Quantum error correction with imperfect gates," in *Quantum Communication, Computing, and Measurement* (Springer US, 1997) pp. 181–188.
- [4] S. B. Bravyi and A. Y. Kitaev, (1998), https://doi.org/10.48550/arXiv.quant-ph/9811052, arXiv:quant-ph/9811052 [quant-ph].
- [5] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, Journal of Mathematical Physics 43, 4452 (2002).
- [6] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Physical Review A 86, 032324 (2012).
- [7] D. S. Wang, A. G. Fowler, and L. C. L. Hollenberg, Physical Review A 83, 020302 (2011).
- [8] C. K. Andersen, A. Remm, S. Lazar, S. Krinner, N. Lacroix, G. J. Norris, M. Gabureac, C. Eichler, and A. Wallraff, Nature Physics 16, 875 (2020).
- [9] S. Krinner, N. Lacroix, A. Remm, A. Di Paolo, E. Genois, C. Leroux, C. Hellings, S. Lazar, F. Swiadek, J. Herrmann, G. J. Norris, C. K. Andersen, M. Müller, A. Blais, C. Eichler, and A. Wallraff, Nature 605, 669 (2022).
- [10] Google Quantum AI, Nature **614**, 676 (2023).
- [11] R. Acharya et al., Nature **638**, 920 (2024).
- [12] I. Besedin, M. Kerschbaum, J. Knoll, I. Hesner, L. Bödeker, L. Colmenarez, L. Hofele, N. Lacroix, C. Hellings, F. Swiadek, A. Flasby, M. B. Panah, D. C. Zanuz, M. Müller, and A. Wallraff, "Realizing lattice surgery on two distance-three repetition codes with superconducting qubits," (2025).
- [13] D. P. DiVincenzo and F. Solgun, New Journal of Physics 15, 075001 (2013).
- [14] A. Ciani and D. P. DiVincenzo, Physical Review B 96, 214511 (2017).
- [15] D. Schwerdt, Y. Shapira, T. Manovitz, and R. Ozeri, Physical Review A 105, 022612 (2022).
- [16] G. Üstün, A. Morello, and S. Devitt, Quantum Science and Technology 9, 035037 (2024).
- [17] M. J. Reagor, T. C. Bohdanowicz, D. R. Perez, E. A. Sete, and W. J. Zeng, "Hardware optimized parity check gates for superconducting surface codes," (2022).
- [18] P. Cerfontaine, R. Otten, and H. Bluhm, Physical Review Applied 13, 044071 (2020).
- [19] S. Jandura, J. D. Thompson, and G. Pupillo, PRX Quantum 4, 020336 (2023).
- [20] Note that the 'control' of the CZZ gate is on the measurement qubit, but since CZZ = $|0\rangle \langle 0| \otimes II + |1\rangle \langle 1| \otimes ZZ = I \otimes (|00\rangle \langle 00| + |11\rangle \langle 11|) + Z \otimes (|01\rangle \langle 01| + |10\rangle \langle 10|)$, this gate can also be thought of as a parity-controlled Z-gate.
- [21] D. Kribs, R. Laflamme, and D. Poulin, Physical Review Letters 94, 180501 (2005).
- [22] D. W. Kribs, R. Laflamme, D. Poulin, and M. Lesosky, (2005), 10.48550/ARXIV.QUANT-PH/0504189.
- [23] M. B. Hastings and J. Haah, Quantum 5, 564 (2021).
- [24] M. Davydova, N. Tantivasadakarn, and S. Balasubramanian, PRX Quantum 4, 020341 (2023).
- [25] J. C. Magdalena de la Fuente, J. Old, A. Townsend-Teague, M. Rispler, J. Eisert, and M. Müller, PRX Quantum 6, 010360 (2025).
- [26] Y. Sung, L. Ding, J. Braumüller, A. Vepsäläinen, B. Kannan, M. Kjaergaard, A. Greene, G. O. Samach, C. McNally, D. Kim, A. Melville, B. M. Niedzielski, M. E. Schwartz, J. L. Yoder, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Physical Review X 11, 021058 (2021).
- [27] M. C. Collodo, J. Herrmann, N. Lacroix, C. K. Andersen, A. Remm, S. Lazar, J.-C. Besse, T. Walter, A. Wallraff, and C. Eichler, Physical Review Letters 125, 240502 (2020).
- [28] L. Heunisch, C. Eichler, and M. J. Hartmann, Physical Review Applied 20, 064037 (2023).
- [29] U. Vool and M. Devoret, International Journal of Circuit Theory and Applications 45, 897 (2017).
- [30] S. Rasmussen, K. Christensen, S. Pedersen, L. Kristensen, T. Bækkegaard, N. Loft, and N. Zinner, PRX Quantum 2, 040204 (2021).
- [31] S. Li et al., Chinese Physics Letters **39**, 030302 (2022).
- [32] A. J. Baker, G. B. P. Huber, N. J. Glaser, F. Roy, I. Tsitsilin, S. Filipp, and M. J. Hartmann, Applied Physics Letters 120 (2022), 10.1063/5.0077443.
- [33] C. Wang et al., npj Quantum Information 8 (2022), 10.1038/s41534-021-00510-2.
- [34] M. A. Perlin, "A short note on effective pauli noise models," (2023).
- [35] M. R. Geller and Z. Zhou, Physical Review A 88, 012314 (2013).
- [36] M. McEwen et al., Nature Communications 12 (2021), 10.1038/s41467-021-21982-y.
- [37] K. C. Miao et al., Nature Physics 19, 1780 (2023).
- [38] S. Jandura and G. Pupillo, "Surface code stabilizer measurements for rydberg atoms," (2024).
- [39] N. Lacroix, L. Hofele, A. Remm, O. Benhayoune-Khadraoui, A. McDonald, R. Shillito, S. Lazar, C. Hellings, F. Swiadek, D. Colao-Zanuz, A. Flasby, M. B. Panah, M. Kerschbaum, G. J. Norris, A. Blais, A. Wallraff, and S. Krinner, Physical Review Letters 134, 120601 (2025).
- [40] J. Old, S. Tasler, M. J. Hartmann, and M. Müller, to appear (2025).

- [41] C. Gidney, Quantum 5, 497 (2021).
- [42] J. Johansson, P. Nation, and F. Nori, Computer Physics Communications 183, 1760 (2012).
- [43] J. Johansson, P. Nation, and F. Nori, Computer Physics Communications 184, 1234 (2013).
- [44] A. G. Manes and J. Claes, Quantum 9, 1618 (2025).
- [45] P.-J. H. S. Derks, A. Townsend-Teague, A. G. Burchards, and J. Eisert, "Designing fault-tolerant circuits using detector error models," (2024).
- [46] O. Higgott, "Pymatching: A python package for decoding quantum codes with minimum-weight perfect matching," (2021).
- [47] A. Sorge, (2015), 10.5281/zenodo.35293.

## Appendix A: Circuit quantisation

The circuit given in Fig. 4 is quantized following the standard circuit quantization procedure [29, 30], in which the active nodes are determined by choosing a spanning tree. The active nodes are depicted in Fig. 12 by the red dots. The active node fluxes can be collected in a flux vector

$$\phi = \begin{pmatrix} \phi_1 \\ \phi_{c1} \\ \phi_2 \\ \phi_{c2} \\ \phi_3 \end{pmatrix}. \tag{A1}$$

With the flux vector the capacitance matrix  $\mathbf{C}$  can be set up.



FIG. 12. Circuit diagram of the parity check circuit. The active nodes are depicted by the red dots.

$$\mathbf{C} = \begin{pmatrix} C_{1c_{1}} + C_{12} + C_{1} & -C_{1c_{1}} & -C_{12} & 0 & 0 \\ -C_{1c_{1}} & C_{1c_{1}} + C_{2c_{1}} + C_{c_{1}} & -C_{2c_{1}} & 0 & 0 \\ -C_{12} & -C_{2c_{1}} & C_{12} + C_{2c_{1}} + C_{2c_{2}} + C_{23} + C_{2} & -C_{2c_{2}} & -C_{23} \\ 0 & 0 & -C_{2c_{2}} & C_{2c_{2}} + C_{3c_{2}} + C_{c_{2}} & -C_{3c_{2}} \\ 0 & 0 & -C_{23} & -C_{3c_{2}} & C_{23} + C_{3c_{2}} + C_{3} \end{pmatrix} \tag{A2}$$

The Lagrangian is given by

$$\mathcal{L} = \frac{1}{2}\dot{\boldsymbol{\phi}}^{\mathrm{T}}\mathbf{C}\dot{\boldsymbol{\phi}} - E_{\text{pot}}. \tag{A3}$$

The capacitance matrix can now be used to calculate the conjugate charges for each node by  $q = C\dot{\phi}$ . The Hamiltonian can be derived by applying a Legendre transformation on the Lagrangian. For this one needs to express  $\dot{\phi}$  in terms of q. This is done by inverting the capacitance matrix.

$$\mathcal{H} = \dot{\boldsymbol{\phi}}^{\mathrm{T}} \mathbf{q} - \mathcal{L} = \frac{1}{2} \mathbf{q}^{\mathrm{T}} \mathbf{C}^{-1} \mathbf{q} + E_{\text{pot}}(\boldsymbol{\phi}) \tag{A4}$$

Furthermore, the effective capacitive energy matrix $E_{C_{ij}}$ can be defined as

$$E_{C_{ij}} = \frac{e^2}{2} C_{ij}^{-1}. \tag{A5}$$
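As a numerical illustration of Eqs. (A2) and (A5), the following sketch builds the capacitance matrix from the values listed in Table II and inverts it with NumPy. It is a sketch under stated assumptions rather than the authors' code: the variable names are hypothetical, and expressing the charging energies as frequencies in GHz (division by Planck's constant) is a unit convention chosen here.

```python
import numpy as np

# Capacitances in fF, taken from Table II.
C_Q = 77.8    # qubit shunt capacitances C_1 = C_2 = C_3
C_C = 60.4    # coupler shunt capacitances C_c1 = C_c2
C_QQ = 0.46   # direct qubit-qubit capacitances C_12 = C_23
C_QC = 6.4    # qubit-coupler capacitances

# Node order (phi_1, phi_c1, phi_2, phi_c2, phi_3) as in Eq. (A1).
C = 1e-15 * np.array([
    [C_QC + C_QQ + C_Q, -C_QC, -C_QQ, 0.0, 0.0],
    [-C_QC, 2 * C_QC + C_C, -C_QC, 0.0, 0.0],
    [-C_QQ, -C_QC, 2 * C_QQ + 2 * C_QC + C_Q, -C_QC, -C_QQ],
    [0.0, 0.0, -C_QC, 2 * C_QC + C_C, -C_QC],
    [0.0, 0.0, -C_QQ, -C_QC, C_QQ + C_QC + C_Q],
])  # farad

E_CHARGE = 1.602176634e-19  # elementary charge in coulomb
PLANCK = 6.62607015e-34     # Planck constant in joule seconds

# Eq. (A5), E_C = e^2 / 2 * C^{-1}, converted from joule to GHz (assumption).
E_C_GHz = E_CHARGE**2 / 2 * np.linalg.inv(C) / PLANCK / 1e9
```

The diagonal entries give the charging energies of the individual modes, and the off-diagonal entries set the strength of the capacitive couplings.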

The net number of Cooper pairs for the mode i can be written as

$$\hat{n}_i = -\frac{\hat{q}_i}{2e}.\tag{A6}$$

The potential energy of the Josephson junctions is defined as

$$E_{\text{pot}} = -E_{J_1} \cos(\phi_1) - E_{J_{c_1}} \cos(\phi_{c_1}) \cos(\varphi_{c_1}^{\text{ext}}) - E_{J_2} \cos(\phi_2) - E_{J_{c_2}} \cos(\phi_{c_2}) \cos(\varphi_{c_2}^{\text{ext}}) - E_{J_3} \cos(\phi_3). \tag{A7}$$

After quantization, Eqs. (A5), (A6) and (A7) can be used to rewrite the system Hamiltonian in terms of the quantized Cooper pair number $\hat{n}$ and the quantized flux $\hat{\phi}$. In the following, the transmon approximation of the potential energy, $E_J \cos(\phi) \simeq E_J - \frac{1}{2} E_J \phi^2 + \frac{1}{24} E_J \phi^4$, is used. For clarity, the hats on quantum mechanical operators are omitted from here on.

$$\begin{aligned} H ={} & \sum_{i=1,c_1,2,c_2,3} \left( 4 E_{C_{ii}} n_i^2 + \frac{1}{2} E_{J_i} \phi_i^2 - \frac{1}{24} E_{J_i} \phi_i^4 \right) \\ & + \sum_{k=1,2,3;\,l=1,2} 8 E_{C_{kc_l}} n_k n_{c_l} + \sum_{m < o = 1,2,3} 8 E_{C_{mo}} n_m n_o \\ & + 8 E_{C_{c_1c_2}} n_{c_1} n_{c_2} \end{aligned} \tag{A8}$$
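The prefactors 4 and 8 of the charging terms can be checked directly. Inserting $q_i = -2e\,n_i$ from Eq. (A6) and the definition of Eq. (A5) into the kinetic term of Eq. (A4), and using the symmetry of $\mathbf{C}^{-1}$, one obtains (a short worked step, added here for orientation):

$$\frac{1}{2} \mathbf{q}^{\mathrm{T}} \mathbf{C}^{-1} \mathbf{q} = 2e^2 \sum_{i,j} n_i C_{ij}^{-1} n_j = 4 \sum_{i,j} E_{C_{ij}} n_i n_j = 4 \sum_{i} E_{C_{ii}} n_i^2 + 8 \sum_{i<j} E_{C_{ij}} n_i n_j .$$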

The obtained Hamiltonian can now be written in second quantization by introducing creation and annihilation operators $b_i^{\dagger}, b_i$, where $i=1,2,3$ labels the data qubits and the ancilla qubit, and $c_k^{\dagger}, c_k$, where $k=1,2$ labels the two couplers. The Hamiltonian in second quantization takes the form

$$\begin{aligned} H ={} & \sum_{i=1,2,3} \left( \omega_{i} b_{i}^{\dagger} b_{i} + \frac{\alpha_{i}}{2} b_{i}^{\dagger} b_{i}^{\dagger} b_{i} b_{i} \right) \\ & + \sum_{j=1,2} \left( \omega_{c_{j}} c_{j}^{\dagger} c_{j} + \frac{\alpha_{c_{j}}}{2} c_{j}^{\dagger} c_{j}^{\dagger} c_{j} c_{j} \right) \\ & + \sum_{k=1,2,3;\,l=1,2} g_{k,c_{l}} (b_{k} - b_{k}^{\dagger}) (c_{l} - c_{l}^{\dagger}) \\ & + \sum_{m < o=1,2,3} g_{mo} (b_{m} - b_{m}^{\dagger}) (b_{o} - b_{o}^{\dagger}) \\ & + g_{c_{1}c_{2}} (c_{1} - c_{1}^{\dagger}) (c_{2} - c_{2}^{\dagger}) \end{aligned} \tag{A9}$$

## Appendix B: Parameter Optimization

To obtain an optimal parity check gate, the main challenge is to find adequate circuit and pulse parameters. Since the parity check gate should produce the same ZZ-interaction between each data qubit and the ancilla qubit, it is beneficial to choose symmetric parameters for the circuit model given in Fig. 4. For the choice of capacitance values and of the frequencies of the fixed-frequency qubits, Refs. [27, 32] were used as orientation. The final values for the fixed circuit parameters are displayed in Tables II and III.

After fixing the capacitances and the fixed qubit frequencies, the dynamical circuit parameters, i.e. the frequencies of the two tunable couplers, are obtained by optimal control. In the idle position the two couplers are parked at the frequencies $\omega_{c_1} = 7.496\,\mathrm{GHz}$ and $\omega_{c_2} = 7.44\,\mathrm{GHz}$. The pulses that tune the two couplers are based on a flattop Gaussian function of the following shape:

$$\frac{A_{c_i}}{4} \cdot \left(1 + \operatorname{erf}\left(\frac{t - \Delta t_i - t_{\text{ramp}}^i}{t_{\text{ramp}}^i}\right)\right) \cdot \left(1 + \operatorname{erf}\left(\frac{t_{\text{gate}}^i - t + \Delta t_i - t_{\text{ramp}}^i}{t_{\text{ramp}}^i}\right)\right), \quad i = 1, 2 \tag{B1}$$
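The pulse shape of Eq. (B1) can be evaluated with the standard-library error function. The sketch below is only meant to make the parametrization concrete: the function name and the numerical values used in the example (amplitude, gate time, ramp time) are illustrative assumptions, not the optimized parameters of the paper.

```python
from math import erf


def flattop_gaussian(t, amplitude, t_gate, t_ramp, delay=0.0):
    """Flattop Gaussian envelope of Eq. (B1) for one coupler.

    `delay` plays the role of Delta t_i; all times share the same unit.
    """
    rise = 1.0 + erf((t - delay - t_ramp) / t_ramp)
    fall = 1.0 + erf((t_gate - t + delay - t_ramp) / t_ramp)
    return amplitude / 4.0 * rise * fall


# Illustrative values only: unit amplitude, 35 ns gate, 3 ns ramps.
mid_value = flattop_gaussian(17.5, amplitude=1.0, t_gate=35.0, t_ramp=3.0)
early_value = flattop_gaussian(-50.0, amplitude=1.0, t_gate=35.0, t_ramp=3.0)
```

With these values the envelope reaches the full amplitude on the plateau, vanishes far outside the gate window, and is symmetric about the centre of the pulse at `delay + t_gate / 2`.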

The optimizable parameters are the amplitude $A_{c_i}$, the gate time $t_{\text{gate}}^i$ and the ramp-up time $t_{\text{ramp}}^i$, where the index $i = 1, 2$ labels the corresponding coupler. In contrast to common gate optimizations we use a cost

| Capacitor                                   | Capacitance [fF] |
|---------------------------------------------|------------------|
| $C_1 = C_2 = C_3$                           | 77.8             |
| $C_{c_1} = C_{c_2}$                         | 60.4             |
| $C_{12} = C_{23}$                           | 0.46             |
| $C_{1c_1} = C_{2c_1} = C_{2c_2} = C_{3c_2}$ | 6.4              |

TABLE II. Chosen capacitance values for the circuit consisting of two data qubits and one ancilla qubit. The capacitances were chosen symmetrically to obtain similar ZZ-couplings between the data qubits and the ancilla.

| Qubit      | Frequency [GHz] |
|------------|-----------------|
| $\omega_1$ | 4.89            |
| $\omega_2$ | 5.31            |
| $\omega_3$ | 4.83            |

TABLE III. Frequency choices for the three fixed-frequency qubits Q1, Q2 and Q3.

function given in Eqn. 7, which minimizes fault-tolerance-breaking errors and leakage. For the calculation of the Pauli coefficients explained in Sec. IV A, we define the ideal CZZ unitary in the three-qubit two-level subspace consisting of the two data qubits and the ancilla qubit. We define the two data qubits ($|i\rangle, |j\rangle \in \{|0\rangle, |1\rangle\}$) and the ancilla qubit ($|k\rangle \in \{|0\rangle, |1\rangle\}$) in the Z-basis. This results in the $8\times 8$ matrix given in Eq. 2, which is expressed in the product states $|ijk\rangle$.
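For orientation, the ideal CZZ unitary in the basis $|ijk\rangle$ can be written down explicitly. The sketch below assumes the sign convention of footnote [20] (a $ZZ$ on the data qubits when the ancilla is in $|1\rangle$) with the ancilla as the last tensor factor, and checks that the gate equals two CZ gates between the ancilla and each data qubit. The function names are hypothetical, and Eq. 2 itself is not reproduced here.

```python
from itertools import product

import numpy as np


def ideal_czz() -> np.ndarray:
    """Diagonal CZZ matrix in the basis |ijk>, index = 4*i + 2*j + k.

    i, j are the data qubits and k is the ancilla: the phase is -1 exactly
    when the ancilla is 1 and the data qubits have odd parity.
    """
    diag = [(-1.0) ** (k * (i ^ j)) for i, j, k in product((0, 1), repeat=3)]
    return np.diag(diag)


def cz_with_ancilla(data_index: int) -> np.ndarray:
    """CZ between data qubit `data_index` (0 or 1) and the ancilla (index 2)."""
    diag = [(-1.0) ** (bits[data_index] * bits[2])
            for bits in product((0, 1), repeat=3)]
    return np.diag(diag)


czz = ideal_czz()
two_cz = cz_with_ancilla(0) @ cz_with_ancilla(1)
```

The equality of `czz` and `two_cz` is the reason why two CZ gates placed in the same time step can stand in for one CZZ gate in a stabilizer simulation.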

For the calculation of the Pauli weights, we also need the simulated channel. To obtain the channel $\mathcal{U}_{\text{sim}}$ from the simulation, we take the inner product of each basis state with the time-evolved input state, which can be written as

$$\mathcal{U}_{\text{sim}}[i,j] = \langle i|\mathcal{U}_{\text{gate}}|j\rangle, \tag{B2}$$

where $|i\rangle, |j\rangle \in \{|000\rangle, |001\rangle, |010\rangle, |011\rangle, |100\rangle, |101\rangle, |110\rangle, |111\rangle\}$ label the entries of the channel $\mathcal{U}_{\text{sim}}$. We use the Python package QuTiP [42, 43] for the gate simulation and the Nelder-Mead algorithm of SciPy's minimize for the pulse optimization. The resulting optimal pulses for the two couplers are shown in Fig. 13.

## Appendix C: QEC simulations - numerical methods

#### 1. Memory experiments

We benchmark the performance of the error correction circuits in Z- (X-)memory experiments. For a distance d surface code, we

- 1. Prepare n data qubits in  $|0\rangle^{\otimes n}$   $(|+\rangle^{\otimes n})$ .
- 2. Using n-1 ancilla qubits, perform d rounds of (Z- and X-) syndrome measurements.
- 3. Measure the n data qubits in the Z- (X-) basis.

We construct the circuits in stim [41] as described in the next section and declare DETECTORs, i.e. sets of measurements with deterministic parity, as the parity of consecutive syndrome measurements. For a Z- (X-)basis memory experiment, the first Z- (X-)stabilizer measurement is already deterministic and considered a detector on its own. Also, the last stabilizer measurement is compared to the syndrome reconstructed from the single-qubit measurements. From the final single-qubit Z- (X-)measurements, we can also reconstruct the value of the logical Z (X) operator, which we annotate as OBSERVABLE\_INCLUDE. Asymmetries arising from effective or biased noise channels, as well as details of the stabilizer readout circuits, can break the symmetry between X- and Z-memory experiments. We therefore always simulate both memory experiments and report

$$p_L = 1 - (1 - p_L^{(X)})(1 - p_L^{(Z)}) \tag{C1}$$

as the overall logical error rate, assuming independence of the X- and Z-logical error rates.

FIG. 13. The two tunable couplers are strongly tuned close to the ancilla qubit frequency, which results in a large ZZ-coupling and enables a fast parity check gate.
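Eq. (C1) is straightforward to evaluate; a minimal Python helper (the function name is an assumption) reads:

```python
def combined_logical_error(p_x: float, p_z: float) -> float:
    """Eq. (C1): probability that at least one of two independent
    memory experiments (X basis, Z basis) suffers a logical error."""
    return 1.0 - (1.0 - p_x) * (1.0 - p_z)


print(round(combined_logical_error(0.1, 0.2), 12))  # 0.28
```

For small error rates the result is close to the sum of the two rates, so neither basis is preferred when different gate orderings are compared.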

#### 2. Stabilizer measurement circuits

There are numerous ways to schedule and order the entangling gates required for the projective measurement of the stabilizer generators. While in unrotated surface codes the order does not influence the fault distance of the circuits [5, 44], rotated surface codes require a schedule in which the last two gates interact with data qubits that are orthogonal to the logical operator corresponding to the Pauli type of the measured stabilizer [6]. We show the different schedules in Fig. 14 a), and the respective logical X- and Z-error rates for distance-7 surface codes implementing the CZZ gate with these schedules in Fig. 14 b). We use a uniform depolarizing noise model as described in the main text. These results confirm the ordering independence of the unrotated surface code circuits. For the rotated surface codes and an ordering that is orthogonal (21) or parallel (22) to the respective logical operator, the logical error rates are also symmetric in X and Z. We can, however, make one of the memory experiments fault-tolerant by ordering both Z- and X-stabilizer measurements orthogonal to a fixed logical operator. For ordering 24, e.g., all gates are orthogonal to the X-logical. This allows for distance-preserving protection against X-logical errors, such that the Z-basis memory experiment shows the expected FT scaling (here $\propto p^4$ for $d=7$). This holds analogously for ordering 25 with the Paulis X and Z interchanged. In Fig. 14 c), we show the combined logical error rate (Eqn. C1). The asymmetry between the bases and orderings disappears when relying on this metric.

Finally, we show one round of stabilizer measurements using CZZ gates for rotated (a) and unrotated (b) surface codes in Fig. 15. Here, and in all simulations shown in the main text, we use ordering 21. In the actual simulations, we additionally reduce the number of idling locations by placing reset operations in parallel with entangling gates (i.e., merging timesteps 4 and 5 in Fig. 15).



FIG. 14. a) Orderings 21, 22, 24 and 25 of three-qubit CZZ gates for Z- and X-stabilizer measurements. On the plaquettes, we draw in dashed (dotted) lines the overlap of a Z- (X-) logical operator in the rotated surface code. b) Logical error rate of X- and Z-basis memory experiments on distance d=7 rotated surface codes. For orderings 21 and 22, these show a symmetric behavior. Orderings 24 and 25 perform differently for the two bases as described in the text. c) Combining the X- and Z- logical error rates shows that the logical error rates are very similar across the orderings.



FIG. 15. One round of stabilizer measurements of distance 3 surface codes in the a) rotated and b) unrotated implementation using ordering 21. X- and Z-stabilizers are drawn in light red and blue respectively. We also indicate minimum weight logical operators. CZZ gates always have their control qubit on the ancillary qubit in the center of a plaquette. The implementation with CZZ gates requires two timesteps of entangling gates per Pauli type, compared to 4 timesteps for an equivalent implementation with CZ gates.

#### 3. Noise models

If not otherwise specified, we implement a single-parameter circuit-level noise model inspired by superconducting-qubit architectures, with noise parameters

$$p_{\rm H} = p_{\rm CZ} = p_{\rm CZZ} = p \tag{C2}$$

$$p_{\text{idle}} = 0.1p \tag{C3}$$

$$p_{\text{reset}} = 2p \tag{C4}$$

$$p_{\text{measure}} = 5p. \tag{C5}$$
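The single-parameter model of Eqs. (C2)-(C5) can be collected in a small helper. This is only a sketch: the class name, field names, and the bound on p are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseParameters:
    """Error probabilities of the circuit-level noise model, Eqs. (C2)-(C5)."""
    p_h: float        # Hadamard gates
    p_cz: float       # two-qubit CZ gates
    p_czz: float      # three-qubit CZZ gates
    p_idle: float     # idling locations
    p_reset: float    # reset operations
    p_measure: float  # measurements


def noise_from_p(p: float) -> NoiseParameters:
    """Derive all noise parameters from the single physical error rate p."""
    # Sanity check (an assumption of this sketch): 5p must stay a probability.
    if not 0.0 <= p <= 0.2:
        raise ValueError(f"p must lie in [0, 0.2], got {p}")
    return NoiseParameters(
        p_h=p,
        p_cz=p,
        p_czz=p,
        p_idle=0.1 * p,
        p_reset=2.0 * p,
        p_measure=5.0 * p,
    )


# Hypothetical example value for the physical error rate.
params = noise_from_p(1e-3)
```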

#### 4. Implementing three-qubit gates and n-qubit depolarizing errors in stim

There is no native way to include CZZ gates or depolarizing channels acting on n>2 qubits in stim. We therefore place two CZ gates in the same TICK and use stim's built-in correlated-error feature to mimic three-qubit depolarizing error channels. In the simulation, an ELSE\_CORRELATED\_ERROR(p) P1\*P2\*... occurs with probability p only if none of the preceding correlated errors in the same chain occurred. Given a non-uniform depolarizing channel on n qubits with probabilities $\vec{p} = \{p_i\}_{i=1}^{4^n-1}$ of (Pauli) errors $\vec{E} = \{P_i\}_{i=1}^{4^n-1}$, we can construct the corresponding correlated error channel in stim by rescaling

$$\tilde{p}_i = \frac{p_i}{\prod_{j=1}^{i-1} (1 - \tilde{p}_j)} = \frac{p_i}{1 - \sum_{j=1}^{i-1} p_j}. \tag{C6}$$
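The rescaling of Eq. (C6) can be sketched in plain Python as follows. The example emits stim instruction text for a uniform three-qubit depolarizing channel. The helper names, the qubit indices (0, 1, 2), the ordering of the Pauli labels, and the fixed-point formatting are assumptions of this sketch. Only the instruction names CORRELATED\_ERROR and ELSE\_CORRELATED\_ERROR are taken from stim.

```python
from itertools import product


def rescale_for_else_chain(probs):
    """Apply Eq. (C6): turn disjoint probabilities p_i into conditional ones."""
    rescaled = []
    remaining = 1.0  # probability that no earlier error in the chain occurred
    for p in probs:
        if p < 0.0 or remaining <= 0.0 or p > remaining:
            raise ValueError("probabilities must be non-negative and sum to at most 1")
        rescaled.append(p / remaining)
        remaining -= p
    return rescaled


def pauli_targets(label, qubits):
    """Translate e.g. 'XIZ' on qubits (0, 1, 2) into the target string 'X0 Z2'."""
    return " ".join(f"{pauli}{q}" for pauli, q in zip(label, qubits) if pauli != "I")


def depolarize3_as_chain(p_total, qubits=(0, 1, 2)):
    """Instruction lines mimicking a uniform three-qubit depolarizing channel."""
    # All 4**3 - 1 = 63 non-identity Pauli strings; the first entry 'III' is dropped.
    labels = ["".join(t) for t in product("IXYZ", repeat=3)][1:]
    probs = [p_total / len(labels)] * len(labels)
    lines = []
    for k, (label, q) in enumerate(zip(labels, rescale_for_else_chain(probs))):
        name = "CORRELATED_ERROR" if k == 0 else "ELSE_CORRELATED_ERROR"
        # Fixed-point formatting avoids exponent notation; it rounds to 12 digits.
        lines.append(f"{name}({q:.12f}) {pauli_targets(label, qubits)}")
    return lines


chain = depolarize3_as_chain(0.01)
```

The first line of `chain` opens the chain with CORRELATED\_ERROR, and every later line is conditioned on none of the earlier ones having occurred, so each Pauli error is applied with its original probability overall.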

As an example, consider a 2-qubit bit-flip channel with probabilities  $p(XI) = p_{XI}, p(IX) = p_{IX}, p(XX) = 0$  and  $p(II) = 1 - p_{XI} - p_{IX}$ . We cannot reconstruct the probability distribution of this channel using *independent* single qubit channels, because then  $p(XX) = p_{IX}p_{XI} + p_{XX} \neq 0$ . Using correlated errors, this 2-qubit bit-flip channel can be written as

For  $p_{IX} = p_{XI} = 0.01$ , this is
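A minimal sketch of this two-qubit example, assuming that the first Pauli label acts on qubit 0 and the second on qubit 1 (the qubit indices and the helper name are hypothetical), generates the corresponding stim instruction text:

```python
def bit_flip_chain(p_xi: float, p_ix: float) -> str:
    """Instruction text for a disjoint two-qubit bit-flip channel.

    Assumption of this sketch: 'XI' flips qubit 0 and 'IX' flips qubit 1.
    """
    if p_xi < 0.0 or p_ix < 0.0 or p_xi >= 1.0 or p_xi + p_ix > 1.0:
        raise ValueError("invalid probabilities for a disjoint channel")
    # Eq. (C6): the second error is conditioned on the first one not occurring.
    p_ix_rescaled = p_ix / (1.0 - p_xi)
    return "\n".join([
        f"CORRELATED_ERROR({p_xi:.10f}) X0",
        f"ELSE_CORRELATED_ERROR({p_ix_rescaled:.10f}) X1",
    ])


text = bit_flip_chain(0.01, 0.01)
```

With both probabilities equal to 0.01, the second instruction carries the rescaled probability 0.01/0.99, which is approximately 0.0101. Both errors never occur together, so p(XX) = 0 as required.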

#### 5. Decoding

From the noisy circuit, we construct a *detector error model* [45], i.e., a list of independent error mechanisms and the corresponding detectors and observables that are flipped by them. From this list, we can construct a parity check matrix or decoding graph used to configure a decoder such as pymatching [46].

Note the *independence* of error mechanisms in the detector error model, whereas the effective Pauli channels give non-independent errors on the CZZ gate. The noisy circuit is sampled with the correlated errors; decoding, however, is done on a detector error model that approximates disjoint error channels as independent. This is handled automatically in stim by setting the flag 'approximate\_disjoint\_errors=True'.

A run of a memory experiment gives the values of the detectors, the *syndrome*, and the values of the observables, the *logical outcome*. The syndrome is then fed to the decoder, which returns a proposed set of error mechanisms that produces the same syndrome. These error mechanisms also determine values of the observables, the *logical predictions*. If the actual outcome and the prediction differ, we count a logical error. We summarize the simulation workflow in Fig. 16.
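The final counting step can be sketched in plain Python. The function name and the toy data below are illustrative assumptions. In an actual workflow, the two inputs would presumably be the sampled observable outcomes and the decoder's predictions for the same shots.

```python
def logical_error_rate(actual, predicted):
    """Fraction of shots in which the decoder's prediction differs from the outcome.

    Both arguments are sequences with one entry per shot; each entry holds the
    observable bits of that shot.
    """
    if len(actual) != len(predicted):
        raise ValueError("need one prediction per sampled shot")
    if len(actual) == 0:
        raise ValueError("need at least one shot")
    # A shot counts as a logical error if any observable bit is mispredicted.
    errors = sum(1 for a, b in zip(actual, predicted) if tuple(a) != tuple(b))
    return errors / len(actual)


# Toy data (hypothetical): four shots with a single logical observable each.
actual_outcomes = [(0,), (1,), (0,), (1,)]
decoder_predictions = [(0,), (1,), (1,), (1,)]
rate = logical_error_rate(actual_outcomes, decoder_predictions)
```

In the toy data only the third shot is mispredicted, so one of four shots counts as a logical error.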

#### 6. Additional simulations

We show the logical error rate of the 50 ns gate compared to the 35 ns gate and the CZ circuit with uniform noise in Fig. 17. The performance of the 50 ns gate is slightly worse, which can be explained by the smaller suppression of high-degree Pauli marginals compared to the 35 ns gate, cf. Fig. 7.



FIG. 16. Simulation workflow using stim and pymatching. The circuit is converted to a detector error model that enables fast decoding using a matching graph. While this relies on independent error mechanisms and decomposes them into a matchable problem, we compile a detector sampler directly from the circuit to sample the circuit's detector and observable outcomes under the influence of the actual noise. Whenever the observable prediction of the decoder is different from the actual simulated outcome, we record a logical error.

#### 7. Finite size scaling analysis of thresholds

We simulate the circuits of the main text with physical error rates in the vicinity of the threshold and perform a finite size scaling analysis using pyfssa [47]. We show the results in Fig. 18. For both rotated and unrotated surface codes, the threshold increases by about 25% for a CZZ gate implementation with a uniform depolarizing noise model and by another 30-40% with the effective Pauli channel.



FIG. 17. Logical error rates of the 50 ns gate compared to the 35 ns gate and the CZ circuit with uniform noise for a) rotated and b) unrotated surface codes. The performance of the 50 ns gate is slightly worse, which can be explained by the smaller suppression of high-degree Pauli marginals compared to the 35 ns gate.



FIG. 18. Threshold plots and finite size scaling analysis for circuits shown in the main text. a) - d) Rotated surface codes show an increase in threshold from  $p_{\rm th}^{\rm (CZ)} \approx 0.66\%$  to  $p_{\rm th}^{\rm (CZZ)} \approx 0.83\%$  and  $p_{\rm th}^{\rm (CZZ,epc)} \approx 1.2\%$ . e) - h) Unrotated surface codes show an increase in threshold from  $p_{\rm th}^{\rm (CZ)} \approx 0.66\%$  to  $p_{\rm th}^{\rm (CZZ)} \approx 0.84\%$  and  $p_{\rm th}^{\rm (CZZ,epc)} \approx 1.1\%$ .