I am trying to understand how to apply custom gates in Qiskit using this PennyLane tutorial. It shows the type-AIII Cartan decomposition method for block-diagonal matrices with two blocks on the diagonal, which can be implemented using controlled gates.

How can this be generalized? For example, if $K$ contains four blocks:

$$ K = \begin{pmatrix} A & 0 & 0 & 0 \\ 0 & B & 0 & 0 \\ 0 & 0 & C & 0 \\ 0 & 0 & 0 & D \end{pmatrix} $$

where $A, B, C, D$ are $4 \times 4$ unitary matrices, what would the circuit look like?
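One way to think about it (a sketch, not taken from the tutorial) is to treat $K$ as a multiplexer, $K = \sum_{i=0}^{3} |i\rangle\langle i| \otimes U_i$ with $U_0 = A, U_1 = B, U_2 = C, U_3 = D$. Two control qubits select the block, and each $U_i$ acts on two target qubits. The circuit is then four 2-controlled gates with control states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. The NumPy check below confirms this identity numerically. It assumes the control qubits are the most significant ones in the matrix (big-endian, as in the matrix above), and it uses random unitaries as stand-ins for $A, B, C, D$. The helper names are illustrative only. The four gates act on orthogonal control subspaces, so they commute and their order does not matter. In general, $2^m$ blocks need $m$ control qubits.

```python
import numpy as np

def random_unitary(n, rng):
    # QR of a complex Gaussian matrix gives a unitary Q; rescale columns by
    # the phases of diag(R) so Q stays unitary (and is Haar-distributed).
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def block_diag(*blocks):
    # Place square blocks along the diagonal of a larger matrix.
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=complex)
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def controlled_on_state(u, state, n_ctrl):
    # Apply u to the target register only when the control register
    # (most significant qubits) is in basis state |state>; identity otherwise.
    dim_c = 2 ** n_ctrl
    dim_t = u.shape[0]
    proj = np.zeros((dim_c, dim_c))
    proj[state, state] = 1.0
    return np.kron(proj, u) + np.kron(np.eye(dim_c) - proj, np.eye(dim_t))

rng = np.random.default_rng(0)
blocks = [random_unitary(4, rng) for _ in range(4)]  # stand-ins for A, B, C, D
K = block_diag(*blocks)

# Product of four 2-controlled gates, one per control state 00, 01, 10, 11.
circuit = np.eye(16, dtype=complex)
for i, u in enumerate(blocks):
    circuit = controlled_on_state(u, i, 2) @ circuit

print(np.allclose(K, circuit))
```

In Qiskit, each factor could be built with something like `UnitaryGate(U).control(2, ctrl_state=i)`. Keep in mind that Qiskit orders qubits little-endian, so the block index corresponds to the two highest-index qubits. Check the mapping against `Operator(circuit)` before relying on it.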