Adversarial attacks were first discovered in the context of deep neural networks (DNNs), where the networks’ gradients were used to produce small bounded-norm perturbations of the input that significantly altered their output. Such attacks target the increase of the model’s loss or the decrease of its accuracy and were shown to undermine the impressive performance of DNNs in multiple fields. The usually considered accessibility setting for adversarial attacks is “white-box” attacks, in which the attacks can access the weights and gradients of the model. However, attacks have also been shown to exist in a “black-box” setting, in which they can only access the input and output of the model.