This repository gathers Python scripts for performing quantum control tasks based on closed-loop optimization that incorporates measurements. We build upon the work of Volodymyr Sivak on Model-Free Quantum Control with Reinforcement Learning (https://link.aps.org/doi/10.1103/PhysRevX.12.011059) to provide a generic framework for arbitrary state preparation and quantum gate calibration based on Qiskit modules. The interest here is that this repo enables the execution of the algorithm seamlessly on both simulated and real IBM backends, and therefore allows an in-depth analysis of the robustness of the method across a variety of backends. In the future, we will add compatibility with further providers. Additionally, this repository is dedicated to the investigation of contextual gate calibration.
The repo is currently divided into five main folders. The first one, "paper_results", adapts the state preparation algorithm developed by Sivak for bosonic systems to a multi-qubit system simulated with a Qiskit `QuantumCircuit` running on a `QasmSimulator` backend instance.
Two other folders extend the state preparation protocol by integrating quantum gate calibration procedures, which can be described either at the usual gate-level abstraction (one seeks to optimize gate parameters, in the spirit of a variational quantum algorithm) or at the pulse-level abstraction, for more expressivity and hardware-aware noise suppression. The novelty of this gate calibration procedure is that it can be tailor-made to a quantum circuit context, enabling the suppression of spatially and temporally correlated noise.
We briefly explain the overall logic of the algorithm below.
Reinforcement Learning (RL) is an interaction-based learning procedure resulting from the interplay between two entities:
- an Environment: in our case, the quantum system of interest, formed of $n$ qubits, characterized by a quantum state $\rho \in \mathcal{H}$, where the dimension of the Hilbert space $\mathcal{H}$ is $d = 2^n$, and from which observations/measurements $o_i \in \mathcal{O}$ can be extracted. Measurements can be retrieved as bitstrings ($\mathcal{O} = \{0, 1\}^{\otimes n}$) or potentially as IQ pairs if state discrimination is not done, as shown in this recent work (https://arxiv.org/abs/2305.01169). The point here is that we restrict the observation space to values that can be retrieved from the quantum device. Usually, we do not have direct access to typical quantum control benchmarks such as state or gate fidelity, which makes the learning procedure more difficult as our quantum environment (in the RL sense) is only partially observable.
- an Agent: a classical neural network on the PC, whose goal is to learn a stochastic policy $\pi_\theta(a|o_i)$ from which actions $a \in \mathbb{U}$ are sampled and applied on the Environment to set it in a specific target state. Typically, the goal of the Agent in the frame of Quantum Optimal Control is to find actions (e.g., circuit/pulse parameters) enabling the successful preparation of a target quantum state, or the calibration of a quantum gate operation.
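To make the Agent side concrete, below is a minimal PyTorch sketch of such a stochastic Gaussian policy. This is purely illustrative and not the exact agent architecture used in the repo; the observation and action dimensions are placeholders.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Minimal stochastic policy pi_theta(a|o): maps an observation to the mean
    and standard deviation of a Gaussian distribution over continuous actions."""

    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs: torch.Tensor):
        h = self.net(obs)
        dist = torch.distributions.Normal(self.mean_head(h), self.log_std.exp())
        action = dist.sample()                      # actions a ~ pi_theta(.|o)
        return action, dist.log_prob(action).sum(-1)

# Sample a batch of candidate circuit/pulse parameters from a trivial observation
policy = GaussianPolicy(obs_dim=1, action_dim=7)
obs = torch.zeros(32, 1)                            # batch of 32 identical observations
actions, log_probs = policy(obs)
```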
Quantum computing is entering a phase where the prospect of obtaining an advantage over classical computing is foreseeable in the near future. However, one of the main obstacles to the emergence of this technology lies in the fact that quantum processors are extremely sensitive to environmental disturbances. These disturbances lead to errors in quantum information processing, making it difficult to interpret results when executing a quantum algorithm. Numerous research efforts have been made in recent years to limit error rates (e.g., improving the quality of quantum processors, synthesizing logic gates that are more robust to noise) and to correct residual errors (quantum error correction). Although these methods are all promising for enabling the execution of more ambitious algorithms on large-scale quantum processors, they currently do not provide robustness to dynamic and contextual errors. Specifically, we are interested in errors that emerge when a specific quantum circuit is executed on the processor, and whose physical origin can only be attributed to the unique sequence of logic gates executed within this circuit. This circuit context may generate errors carrying either a temporal (e.g., non-Markovian noise) or a spatial dependency (undesired crosstalk between neighboring qubits on the processor). As this spatio-temporal dependency can be difficult to model or to characterize effectively on intermediate-sized processors, we propose an algorithm based on model-free reinforcement learning to determine the best possible calibration of error-prone quantum gates, tailor-made for a specific quantum circuit context.
What we want to achieve here is to show that model-free quantum control with RL (Sivak et al., https://link.aps.org/doi/10.1103/PhysRevX.12.011059) is able to capture the contextual dependence of such a gate calibration task and to provide suitable error suppression strategies. In the examples currently available, we explain the overall logic of the algorithm itself, and detail later how the context dependence can be encompassed through a tailor-made training procedure.
As explained above, our framework builds upon the idea of transforming the quantum control task into an RL problem where an agent is tasked with probing and acting upon an environment in order to steer it into a target state. The way this is usually done in RL is to introduce the concept of a reward signal $R$ that the agent seeks to maximize in expectation:

$$\max_\theta\ \mathbb{E}_{\pi_\theta}[R]$$

where the expectation value is empirically taken over a batch of actions sampled from the policy $\pi_\theta$. The design question is then how to construct, from measurements alone, a reward whose expectation value coincides with the fidelity to the target.
The fidelity between our desired pure target state $\sigma = |\psi_{target}\rangle\langle\psi_{target}|$ and the state $\rho$ actually prepared on the device is $F(\rho, \sigma) = \langle\psi_{target}|\rho|\psi_{target}\rangle = \mathrm{Tr}[\rho\sigma]$. Following the direct fidelity estimation scheme of [2], this quantity can be estimated from a few Pauli measurements.

Let $P_k$, $k \in \{1, \dots, d^2\}$, denote the $n$-qubit Pauli operators.

Define the characteristic function $\chi_\rho(k) = \mathrm{Tr}[\rho P_k]/\sqrt{d}$ (and similarly $\chi_\sigma(k)$ for the target state).

Since those Pauli operators form an orthogonal basis for the density operators under the Hilbert-Schmidt product, one can see that:

$$F(\rho, \sigma) = \mathrm{Tr}[\rho\sigma] = \sum_k \chi_\rho(k)\,\chi_\sigma(k)$$

Note that for our target state $\sigma$, which is pure, $\sum_k \chi_\sigma(k)^2 = 1$, so that $\mathrm{Pr}(k) = \chi_\sigma(k)^2$ defines a probability distribution over the Pauli indices $k$, from which we sample the observables to be measured.
Once those indices have been sampled, we compute the expectation values $\langle P_k\rangle_\rho = \mathrm{Tr}[\rho P_k]$ from repeated measurements on the device.

We can define a single shot reward from the measurement outcome $m \in \{-1, +1\}$ of the sampled Pauli observable $P_k$:

$$R = \frac{C}{\sqrt{d}\,\mathrm{Pr}(k)}\,\chi_\sigma(k)\, m$$

where $C$ is a normalization constant (the `c_factor` hyperparameter below). Our single shot reward is therefore an unbiased estimator of the fidelity (up to this normalization factor). It is easy to see that we have:
$$\mathbb{E}_{\pi_\theta,\, k\sim\mathrm{Pr}(k),\, \sigma}[R] = F(\rho, \sigma)$$
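As a small self-contained numerical check of this estimator, the snippet below uses `qiskit.quantum_info` to sample Pauli indices from $\mathrm{Pr}(k)$ and recover the fidelity. It is purely illustrative (not the repo's implementation): single-shot outcomes are replaced by exact expectation values, and the prepared state is a hand-picked depolarized version of the target.

```python
import numpy as np
from qiskit.quantum_info import DensityMatrix, Statevector, pauli_basis

n_qubits = 1
d = 2 ** n_qubits
paulis = pauli_basis(n_qubits)                      # the d^2 Pauli operators P_k

# Target (pure) state sigma and a slightly depolarized prepared state rho
sigma = DensityMatrix(Statevector([1 / np.sqrt(2), 1 / np.sqrt(2)]))   # |+>
rho = DensityMatrix(0.9 * sigma.data + 0.1 * np.eye(d) / d)

# Characteristic functions chi(k) = Tr[. P_k]/sqrt(d) and sampling distribution Pr(k)
chi_sigma = np.array([np.real(np.trace(sigma.data @ P.to_matrix())) for P in paulis]) / np.sqrt(d)
chi_rho = np.array([np.real(np.trace(rho.data @ P.to_matrix())) for P in paulis]) / np.sqrt(d)
prob_k = chi_sigma ** 2
prob_k /= prob_k.sum()                              # guard against rounding errors

# Monte-Carlo fidelity estimate: sample k ~ Pr(k), reward = chi_sigma(k) <P_k>_rho / (sqrt(d) Pr(k))
rng = np.random.default_rng(0)
samples = rng.choice(len(paulis), size=2000, p=prob_k)
rewards = chi_sigma[samples] * (np.sqrt(d) * chi_rho[samples]) / (np.sqrt(d) * prob_k[samples])
print("DFE estimate:", rewards.mean(), "| exact fidelity:", np.real(np.trace(rho.data @ sigma.data)))
```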
For a gate calibration task, we want to find parameters defining the quantum gate such that its action matches the target gate $G_{target}$ on every possible input state. The state preparation reward above is therefore additionally averaged over a set of input states, each of which yields its own target state $G_{target}|\psi_{input}\rangle$.
In the code, each average is taken empirically and is controlled by specific hyperparameters that must be tuned by the user for each experiment:

- For the average over the policy ($\pi_\theta$): the user should adjust the `batchsize` hyperparameter, which indicates how many actions are drawn from one policy with fixed parameters.
- For the average over input states: at each episode of learning, we sample a different random input state $|\psi_{input}\rangle$, for which the corresponding target state $|\psi_{target}\rangle = G_{target}|\psi_{input}\rangle$ is calculated. Once the target state is deduced, we can start the episode by applying our parametrized gate and measuring the set of Pauli observables defining the reward circuit (Pauli basis rotation at the end of the circuit).
- For the average over the random Pauli observables to sample: the user should adjust the `sampling_Pauli_space` hyperparameter. This number sets how many indices $k$ are sampled from the previously defined probability distribution $\mathrm{Pr}(k)$. If the same index is sampled multiple times, the corresponding Pauli observable is estimated more accurately, since the number of shots dedicated to it scales as "number of times $k$ was sampled $\times\ N_{shots}$". However, due to a constraint of Qiskit, it is not possible to adjust the number of shots for each individual observable; moreover, some observables can be inferred from the same measurements (commuting observables). Due to those facts, we settle for a common minimum number of shots per observable, set by the next hyperparameter.
- For the average over the number of shots used to estimate each Pauli observable with a certain accuracy: the user should adjust the `n_shots` parameter, which is the minimum number of samples with which each observable will be estimated.
Finally, the last hyperparameter of the overall RL algorithm is the number of epochs `n_epochs`, which indicates how many times the policy parameters should be updated in order to reach the optimal, near-deterministic policy enabling the successful preparation of the target state/gate.
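As an illustration, a typical set of training hyperparameters might look as follows. The values are placeholders to be tuned per experiment, and the dictionary keys simply mirror the names introduced above; this is not a fixed API of the repo.

```python
# Illustrative hyperparameter choices (to be tuned for each experiment)
training_hyperparams = {
    "batchsize": 50,              # actions sampled per policy update (average over pi_theta)
    "sampling_Pauli_space": 100,  # number of indices k drawn from Pr(k) per episode
    "n_shots": 200,               # minimum shots used to estimate each Pauli observable
    "n_epochs": 1000,             # number of policy-parameter updates
    "c_factor": 1.0,              # reward normalization factor
}
```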
Our repo relies on modelling the quantum system as a class `QuantumEnvironment`, which is characterized by the following attributes:
- `target: Dict`: a dictionary containing information about the target that the RL agent should prepare. It can be either a target state or a target gate, and should contain the following keys:
  - `gate: qiskit.circuit.Gate`: if the target type is a gate, the quantum gate that the agent should calibrate.
  - `circuit: qiskit.circuit.QuantumCircuit`: if the target type is a state, an ideal quantum circuit preparing the target state of interest.
  - `dm: qiskit.quantum_info.DensityMatrix`: alternatively to the `circuit` key, one can provide a `DensityMatrix` describing the target state of interest (if the target type is a state).
  - `register: Union[QuantumRegister, List[int]]`: list of qubits on which the target state/gate shall be prepared. The indices provided in this list should match the number of qubits addressed by the target gate/state.
  - `input_states: Optional[List[Dict[Union["circuit", "dm"], Union[qiskit.circuit.QuantumCircuit, qiskit.quantum_info.DensityMatrix]]]]`: if the target type is a gate, an optional list of input states to be prepared, forcing the agent to perform simultaneous successful state preparations. To maximize learning, the set of input states should be tomographically complete. Each input state should be provided in the same format as a target state (either a quantum circuit enabling its perfect preparation, or a `DensityMatrix`). Note that it does not need to be provided: by default, a tomographically complete set is loaded.
- `abstraction_level: str = Union["gate", "pulse"]`: the abstraction level at which the parametrized circuit enabling the target preparation is expressed. If the gate level is chosen, a Statevector simulator is used, which significantly speeds up the simulation, unless another `AerBackend` object is provided in the `Qiskit_config` below. Conversely, if the pulse level is chosen, all gates provided in the parametrized circuit should contain a dedicated pulse description, and the simulation of the pulse schedule associated with the circuit is done through the Qiskit Dynamics module (https://qiskit.org/documentation/dynamics).
- `Qiskit_config: QiskitConfig`: a dataclass object containing the necessary information to execute the algorithm on real/simulated hardware. It should contain the following keys:
  - `backend: Optional[qiskit.providers.Backend]`: the IBM backend on which the overall workflow should be executed. If a real backend is selected, the user should make sure they have an IBM account granting access to the resource, and that the backend is among those supporting Qiskit Runtime (see code examples for how to declare such a real backend). If the `abstraction_level` is set to `"gate"` and the user wants to run on a simulated noiseless backend, `backend` can be set to `None`. Alternatively, the user can provide a custom `AerBackend` with a custom noise model. Finally, if the user wants to run the workflow on a simulated backend at the pulse level, a `qiskit_dynamics.DynamicsBackend` instance should be provided (see pulse level abstraction folder).
  - `parametrized_circuit: Callable[qiskit.circuit.QuantumCircuit, Optional[qiskit.circuit.ParameterVector], Optional[qiskit.circuit.QuantumRegister]]`: a function taking as input a `QuantumCircuit` instance and appending to it a custom parametrized gate (specified either as a pulse schedule or as a parametrized gate depending on the abstraction level). This is the key function that introduces the circuit parameters (`qiskit.circuit.Parameter` or `qiskit.circuit.ParameterVector`) which will be replaced by the set of actions that the agent applies on the system.
  - Other optional arguments can be provided if a real backend or the pulse level abstraction is selected. They can be found in the notebook contained in the pulse level abstraction folder.
- `sampling_Pauli_space: int`: as mentioned earlier, indicates how many indices $k$ should be sampled for estimating the fidelity from Pauli observables.
- `n_shots: int`: minimum number of shots per expectation value sampling subroutine.
- `c_factor: float`: normalization factor for the reward; it should be chosen such that the average reward curve matches the average gate fidelity curve (if the target type is a gate) or the state fidelity curve (if the target type is a state).
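As a hypothetical sketch of how these inputs fit together, the snippet below builds a minimal `target` dictionary and a `parametrized_circuit` function. The exact `QiskitConfig` / `QuantumEnvironment` constructors live in the repo's notebooks; only the dictionary keys follow the attribute description above, and the ansatz is an arbitrary example.

```python
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.circuit.library import XGate

def apply_parametrized_circuit(qc: QuantumCircuit) -> None:
    """Append the custom parametrized gate whose parameters the agent will tune."""
    params = ParameterVector("theta", 3)
    qc.u(params[0], params[1], params[2], 0)   # example single-qubit U(theta, phi, lambda) ansatz

target = {
    "gate": XGate(),    # target type: gate -> calibrate an X gate
    "register": [0],    # acting on qubit 0 of the device
}

# With a noiseless gate-level simulation, the backend entry can simply be None:
qiskit_config_kwargs = {
    "backend": None,
    "parametrized_circuit": apply_parametrized_circuit,
}
```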
While a `QuantumCircuit` can be simulated using Statevector simulation when dealing only with the circuit level abstraction, and does not necessarily require an actual backend (although one might want to use the `ibmq_qasm_simulator`), simulating the circuit from a pulse level description requires a dedicated simulation backend.
As our main code revolves around the Qiskit structure, we use the recently released Qiskit Dynamics extension, a framework for simulating arbitrary pulse schedules.
More specifically, we leverage the `DynamicsBackend` object to emulate a real backend receiving pulse gates (you can learn more by checking the Qiskit and Qiskit Dynamics documentations).
The `DynamicsBackend` can be declared in two main ways:

- Using the `from_backend(BackendV1)` method, which creates a `DynamicsBackend` carrying all the properties (in particular the Hamiltonian) of a real IBM backend.
- Creating a custom Hamiltonian (or Lindbladian) describing the quantum system of interest, and declaring a `Solver` object which is then provided to the backend.
We provide examples of both declarations in the pulse_level_abstraction folder.
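For orientation, here is a compact sketch of both declaration styles. The transmon parameters and the commented-out real-backend call are illustrative only; the notebooks in the folder contain the actual configurations used.

```python
import numpy as np
from qiskit_dynamics import DynamicsBackend, Solver

# Option 1: inherit the Hamiltonian of a real IBM backend (requires an IBM account;
# the backend name is a placeholder):
# from qiskit_ibm_provider import IBMProvider
# real_backend = IBMProvider().get_backend("ibm_hanoi")
# dynamics_backend = DynamicsBackend.from_backend(real_backend, subsystem_list=[0, 1])

# Option 2: custom single-transmon Hamiltonian handed to a Solver (illustrative values)
dim = 3                # number of transmon levels kept in the simulation
v0 = 4.86e9            # qubit frequency (Hz)
anharm0 = -0.32e9      # anharmonicity (Hz)
r0 = 0.22e9            # drive strength (Hz)

a = np.diag(np.sqrt(np.arange(1, dim)), 1)      # annihilation operator
N = np.diag(np.arange(dim))                     # number operator

static_ham = 2 * np.pi * v0 * N + np.pi * anharm0 * N @ (N - np.eye(dim))
drive_op = 2 * np.pi * r0 * (a + a.conj().T)

solver = Solver(
    static_hamiltonian=static_ham,
    hamiltonian_operators=[drive_op],
    hamiltonian_channels=["d0"],
    channel_carrier_freqs={"d0": v0},
    dt=2.2222e-10,
    rotating_frame=static_ham,
)

custom_backend = DynamicsBackend(solver=solver, subsystem_dims=[dim])
```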
The subtlety behind this `DynamicsBackend` is that it does not carry calibrations for elementary quantum gates on the qubits.
This is an issue, as the current reward scheme is built on the ability to perform Pauli expectation sampling on the quantum computer.
Indeed, since we only have access to Z-basis measurements on the backend, one needs to be able to perform Pauli basis rotation gates in order to extract all Pauli operators. As a consequence, each qubit should start with calibrated Hadamard and S gates. The whole idea is to provide such a baseline calibration for the custom `DynamicsBackend` instance. To do so, we use the library of elementary calibration experiments provided in the Qiskit Experiments module (which provides calibration workflows for elementary gates).
Therefore, each time the `QuantumEnvironment` object is initialized with a `DynamicsBackend` object, a series of baseline calibrations is automatically run, such that the Estimator primitive (in this case the `BackendEstimator` from Qiskit primitives) can append the appropriate gates for all Pauli expectation value sampling tasks.
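For reference, once such basis-change gates are available, Pauli expectation values can be sampled through the standard primitive interface along these lines. This is a generic Qiskit usage example (with a fake backend standing in for the calibrated `DynamicsBackend`), not code taken from this repo.

```python
from qiskit import QuantumCircuit
from qiskit.primitives import BackendEstimator
from qiskit.quantum_info import SparsePauliOp
from qiskit.providers.fake_provider import FakeManila

backend = FakeManila()                     # stand-in for a DynamicsBackend or real device
estimator = BackendEstimator(backend)

qc = QuantumCircuit(1)
qc.h(0)                                    # prepare |+>

observables = [SparsePauliOp("X"), SparsePauliOp("Z")]
job = estimator.run([qc] * len(observables), observables, shots=1024)
print(job.result().values)                 # ~[1.0, 0.0] up to shot noise and device noise
```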
In this repo, two RL frameworks are used: TensorFlow 2 and PyTorch. While TensorFlow is used for the earlier examples of state preparation and gate calibration, we have for now based the context-aware calibration workflow on PyTorch for more clarity. More specifically, the whole RL workflow is based on a "Gymified" `QuantumEnvironment` called `TorchQuantumEnvironment` and an Agent built on Torch modules. The implementation of PPO closely follows the one proposed by the CleanRL library (http://docs.cleanrl.dev/).
In what follows, we explain the logic behind the context-aware gate calibration workflow.
The main motivation behind this work comes from the realization that in most traditional gate calibration procedures, the noise that emerges from a specific sequence of logical operations is often neglected or averaged out over a large series of different sequences, in order to build an "on-average" noise-robust calibration of the target gate. While this paradigm, inherent to the modular structure of the gate model, is certainly convenient for the execution of complex quantum circuits, we have to acknowledge that it might not be enough to reach satisfying circuit fidelities in the near term. In fact, spatio-temporally correlated noise sources such as non-Markovianity or crosstalk can significantly impact the quality of a typical gate. It is to be noted that such effects are typically dependent on the operations that are executed around the target gate. Characterizing this noise is known to be tedious because of the computational heaviness of the process (e.g., simultaneous RB for crosstalk, process tensor tomography for non-Markovianity). In typical experiments, one first spends some time characterizing the noise channels, and then derives a series of gate calibrations countering all the effects simultaneously. In our quantum control approach, we decide to intertwine the characterization and the calibration procedures, by letting the RL agent build from its training an internal representation of the contextual noise, concurrently with the trial of new controls for suppressing it.
Earlier, we defined our `QuantumEnvironment` to include all details regarding the reward computation and the access to the quantum system (simulation or real backend). In this new `TorchQuantumEnvironment` class, we additionally include all the details regarding the calibration within a circuit context during training.
For this, we first declare the `TorchQuantumEnvironment` as follows:
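The sketch below is hypothetical: the circuit-context construction uses standard Qiskit calls, while the constructor arguments shown in the comments (`q_env`, `circuit_context`, `batch_size`) are assumptions made for illustration; the authoritative signature is the one used in the repo's notebooks.

```python
from qiskit.circuit import QuantumCircuit

# The circuit context in which the target gate appears (here, the CX of a Bell-state circuit)
circuit_context = QuantumCircuit(2)
circuit_context.h(0)
circuit_context.cx(0, 1)

# Hypothetical declaration (argument names are illustrative assumptions):
# torch_env = TorchQuantumEnvironment(
#     q_env,                             # the QuantumEnvironment defined earlier
#     circuit_context=circuit_context,   # circuit providing the spatio-temporal context
#     batch_size=50,                     # actions drawn per policy update
# )
#
# Being "Gymified", the environment is then driven through the usual reset/step interface:
# obs, _ = torch_env.reset()
# obs, reward, terminated, truncated, info = torch_env.step(actions)
```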
[1] V. V. Sivak, A. Eickbusch, H. Liu, B. Royer, I. Tsioutsios, and M. H. Devoret, "Model-Free Quantum Control with Reinforcement Learning", Physical Review X, vol. 12, no. 1, p. 011059, Mar. 2022. DOI: 10.1103/PhysRevX.12.011059. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevX.12.011059
[2] S. T. Flammia and Y.-K. Liu, "Direct Fidelity Estimation from Few Pauli Measurements", Physical Review Letters, vol. 106, no. 23, p. 230501, Jun. 2011. DOI: 10.1103/PhysRevLett.106.230501. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.106.230501