Quantitative evaluation

We introduce a comprehensive evaluation of WAM across three modalities: audio, images, and volumes. WAM is benchmarked against standard gradient-based attribution methods using Faithfulness, defined as the difference between the Insertion and Deletion scores. The evaluation spans the ESC-50, ImageNet, and MedMNIST3D datasets, using a consistent set of models and attribution baselines including Saliency, SmoothGrad, Integrated Gradients, and GradCAM.
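Since Faithfulness is defined as the Insertion score minus the Deletion score, the metric can be sketched as follows. This is a hypothetical simplification, not the benchmark's exact protocol: the function names, the linear step schedule, and the trapezoidal area-under-curve are illustrative choices.

```python
import numpy as np

def _auc(scores):
    # Trapezoidal area under the curve, with the step axis scaled to [0, 1].
    s = np.asarray(scores, dtype=float)
    return float(np.mean((s[1:] + s[:-1]) / 2))

def faithfulness(model, x, baseline, attribution, steps=5):
    """Insertion/Deletion curves and Faithfulness = Ins - Del.
    Features are ranked by attribution; Insertion reveals the top-k
    features on top of the baseline, Deletion removes them from x."""
    order = np.argsort(attribution.ravel())[::-1]   # most important first
    ins_scores, del_scores = [], []
    for k in np.linspace(0, x.size, steps).astype(int):
        idx = order[:k]
        x_ins, x_del = baseline.ravel().copy(), x.ravel().copy()
        x_ins[idx] = x.ravel()[idx]         # insert top-k features
        x_del[idx] = baseline.ravel()[idx]  # delete top-k features
        ins_scores.append(model(x_ins.reshape(x.shape)))
        del_scores.append(model(x_del.reshape(x.shape)))
    ins_auc, del_auc = _auc(ins_scores), _auc(del_scores)
    return ins_auc, del_auc, ins_auc - del_auc
```

A faithful attribution makes the Insertion curve rise quickly and the Deletion curve fall quickly, so its Faithfulness is large and positive.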

WAM, particularly its path-integrated variant WAM-IG, consistently outperforms traditional attribution methods—especially for images and volumes—while remaining competitive for audio. The performance gain is attributed to path integration, which captures inter-scale dependencies and reveals the relative importance of each scale in the model's prediction. These results highlight the advantage of using wavelets over pixels as the basis for feature attribution.
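To make the path-integration idea concrete, here is a minimal sketch of Integrated Gradients computed over wavelet coefficients rather than raw samples. Everything here is an assumption for illustration (a single-level Haar transform, a straight-line path from a zero baseline, central finite-difference gradients); it is not the paper's actual WAM-IG implementation.

```python
import numpy as np

def haar(x):
    """Single-level Haar analysis: average and difference coefficients."""
    return np.concatenate([x[0::2] + x[1::2], x[0::2] - x[1::2]]) / np.sqrt(2)

def inv_haar(w):
    """Inverse of the single-level Haar transform above."""
    n = w.size // 2
    a, d = w[:n], w[n:]
    x = np.empty(2 * n)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def wam_ig(model, x, steps=32, eps=1e-5):
    """Integrated Gradients over Haar coefficients along a straight
    path from a zero baseline, using central finite differences."""
    w = haar(x)
    grad_sum = np.zeros_like(w)
    for alpha in (np.arange(steps) + 0.5) / steps:  # midpoint rule on the path
        wa = alpha * w
        for j in range(w.size):                     # numerical gradient wrt w_j
            e = np.zeros_like(w)
            e[j] = eps
            grad_sum[j] += (model(inv_haar(wa + e)) - model(inv_haar(wa - e))) / (2 * eps)
    return w * grad_sum / steps  # attribution per wavelet coefficient
```

Because the attribution lives on wavelet coefficients, each entry is tied to a scale and position, which is what lets the method expose the relative importance of each scale. For a linear model the sketch satisfies the IG completeness axiom: the attributions sum to f(x) − f(baseline).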

For more details on the evaluation setup, datasets, models, and additional results, we refer the reader to the main paper and its appendix.

Faithfulness (Faith), Insertion (Ins), and Deletion (Del) scores across modalities: Audio = ResNet on ESC-50, Images = EfficientNet on ImageNet, Volumes = 3D Former on AdrenalMNIST3D. Best values are in bold; second best are underlined.

| Method | Audio Ins ↑ | Audio Del ↓ | Audio Faith ↑ | Images Ins ↑ | Images Del ↓ | Images Faith ↑ | Volumes Ins ↑ | Volumes Del ↓ | Volumes Faith ↑ |
|---|---|---|---|---|---|---|---|---|---|
| Integrated Gradients | 0.267 | **0.047** | **0.264** | 0.113 | 0.113 | 0.000 | 0.666 | 0.743 | -0.077 |
| SmoothGrad | 0.251 | <u>0.067</u> | 0.184 | 0.129 | 0.119 | 0.010 | 0.680 | 0.731 | -0.051 |
| GradCAM | 0.274 | 0.201 | 0.072 | 0.364 | 0.303 | 0.061 | 0.689 | 0.744 | -0.055 |
| Saliency | 0.220 | 0.154 | 0.066 | 0.148 | 0.140 | 0.008 | **0.751** | 0.742 | 0.009 |
| WAM-IG (ours) | <u>0.436</u> | 0.260 | 0.176 | **0.447** | **0.049** | **0.370** | <u>0.719</u> | **0.621** | **0.098** |
| WAM-SG (ours) | **0.449** | 0.252 | <u>0.197</u> | <u>0.419</u> | <u>0.097</u> | <u>0.350</u> | 0.718 | <u>0.648</u> | <u>0.070</u> |