We introduce a comprehensive evaluation of WAM across three modalities: audio, images, and volumes. WAM is benchmarked against standard gradient-based attribution methods using Faithfulness, defined as the Insertion score minus the Deletion score. The evaluation spans the ESC-50, ImageNet, and MedMNIST3D datasets, using a consistent set of models and attribution baselines: Saliency, SmoothGrad, Integrated Gradients, and GradCAM.
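To make the metric concrete, the sketch below computes Insertion, Deletion, and their difference (Faithfulness) for a toy setup: the linear "model", the 8×8 input, and the gradient-times-input attribution map are placeholders, not the networks or attribution maps evaluated in this section. Insertion progressively reveals the most-attributed pixels starting from a zero baseline; Deletion progressively removes them; each curve is summarized by its average score (a step-function AUC).

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(8, 8))      # hypothetical linear "model" weights
x = rng.normal(size=(8, 8))      # toy input

def model(img):
    # stand-in scalar class score; a real setup would use a network's
    # (softmax) output for the predicted class
    return float((W * img).sum())

attribution = W * x              # stand-in attribution map (gradient * input)
order = np.argsort(attribution.ravel())[::-1]   # most important pixels first

def auc_curve(insert):
    # insert=True: start from a zero baseline and reveal pixels;
    # insert=False: start from the input and remove pixels
    cur = np.zeros_like(x) if insert else x.copy()
    scores = [model(cur)]
    for idx in order:
        i, j = divmod(idx, x.shape[1])
        cur[i, j] = x[i, j] if insert else 0.0
        scores.append(model(cur))
    return float(np.mean(scores))   # AUC of the step curve

insertion = auc_curve(insert=True)
deletion = auc_curve(insert=False)
faithfulness = insertion - deletion   # higher is better
```

For an attribution map that ranks pixels well, the insertion curve rises quickly and the deletion curve falls quickly, so Faithfulness is positive; a random ranking drives it toward zero.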
WAM, particularly its path-integrated variant WAM-IG, consistently outperforms traditional attribution methods—especially for images and volumes—while remaining competitive for audio. The performance gain is attributed to path integration, which captures inter-scale dependencies and reveals the relative importance of each scale in the model's prediction. These results highlight the advantage of using wavelets over pixels as the basis for feature attribution.
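To illustrate the wavelet-domain idea behind the path-integrated variant, here is a minimal sketch of integrated gradients computed over wavelet coefficients rather than pixels. The hand-rolled single-level Haar transform and the linear "model" are assumptions chosen so the path integral has a closed form; this shows the principle only, not the WAM implementation.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform; returns the four
    subbands (approx + 3 details) tiled into one array of x's shape."""
    a = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2
    h = (x[0::2, 0::2] - x[0::2, 1::2] + x[1::2, 0::2] - x[1::2, 1::2]) / 2
    v = (x[0::2, 0::2] + x[0::2, 1::2] - x[1::2, 0::2] - x[1::2, 1::2]) / 2
    d = (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2]) / 2
    return np.block([[a, h], [v, d]])

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))   # hypothetical linear model: f(img) = <W, img>
x = rng.normal(size=(8, 8))   # toy input

z = haar_dwt2(x)              # wavelet coefficients of the input
# Because the transform is orthonormal, the gradient of f with respect
# to the coefficients is the forward transform of the pixel gradient W.
grad_z = haar_dwt2(W)
# Integrated gradients with a zero baseline; for a linear f the path
# integral is exact, so the attribution is simply coefficient * gradient.
attribution = z * grad_z
```

The completeness property of integrated gradients carries over: the wavelet-domain attributions sum to `f(x) - f(baseline)`, while each entry is now tied to a scale and location rather than a single pixel.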
For more details on the evaluation setup, datasets, models, and additional results, we refer the reader to the main paper and its appendix.
Modalities: Audio (ResNet, ESC-50), Images (EfficientNet, ImageNet), Volumes (3D Former, AdrenalMNIST3D).

| Method | Audio: Ins ↑ | Audio: Del ↓ | Audio: Faith ↑ | Images: Ins ↑ | Images: Del ↓ | Images: Faith ↑ | Volumes: Ins ↑ | Volumes: Del ↓ | Volumes: Faith ↑ |
|---|---|---|---|---|---|---|---|---|---|
| Integrated Gradients | 0.267 | 0.047 | 0.264 | 0.113 | 0.113 | 0.000 | 0.666 | 0.743 | -0.077 |
| SmoothGrad | 0.251 | 0.067 | 0.184 | 0.129 | 0.119 | 0.010 | 0.680 | 0.731 | -0.051 |
| GradCAM | 0.274 | 0.201 | 0.072 | 0.364 | 0.303 | 0.061 | 0.689 | 0.744 | -0.055 |
| Saliency | 0.220 | 0.154 | 0.066 | 0.148 | 0.140 | 0.008 | 0.751 | 0.742 | 0.009 |
| WAM-IG (ours) | 0.436 | 0.260 | 0.176 | 0.447 | 0.049 | 0.370 | 0.719 | 0.621 | 0.098 |
| WAM-SG (ours) | 0.449 | 0.252 | 0.197 | 0.419 | 0.097 | 0.350 | 0.718 | 0.648 | 0.070 |