Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
Adrián Csiszárik, Melinda F. Kiss, Péter Kőrösi-Szabó, Márton Muntag, Gergely Papp, Dániel Varga
As recently discovered (Ainsworth, Hayase, and Srinivasa 2022, among others), two wide neural networks with identical network topology, trained on similar data, can be permutation-aligned. That is, we can permute their neurons (channels) so that linearly interpolating between the two networks in parameter space becomes a meaningful operation (linear mode connectivity).
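As an illustrative sketch, linear interpolation between aligned parameter vectors can be written as follows; the function name and the flattened-parameter representation are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def interpolate(theta_a: np.ndarray, theta_b_aligned: np.ndarray, lam: float) -> np.ndarray:
    """Convex combination of two parameter vectors.

    theta_a:         flattened weights of network A
    theta_b_aligned: flattened weights of network B, already
                     permutation-aligned to A
    lam:             interpolation coefficient in [0, 1]
    """
    return (1.0 - lam) * theta_a + lam * theta_b_aligned
```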
We extend this notion by considering more general strategies for combining permutation-aligned networks, and we investigate extensively which such strategies succeed and which fail. As an example, picking each weight coordinate-wise at random from one of the two networks yields a well-functioning combined network. This might suggest that the two networks are roughly identical functionally and that interpolation is vacuous. We demonstrate that this is not the case: there is genuine interpolation in functional behavior.
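A minimal sketch of the coordinate-wise random combination, again assuming flattened, already-aligned parameter vectors (the function name and the Bernoulli mixing probability parameter are illustrative assumptions):

```python
import numpy as np

def random_coordinate_mix(theta_a: np.ndarray, theta_b_aligned: np.ndarray,
                          p: float = 0.5, seed: int = 0) -> np.ndarray:
    """Pick each coordinate independently from network A (with
    probability p) or from the permutation-aligned network B."""
    rng = np.random.default_rng(seed)
    mask = rng.random(theta_a.shape) < p  # True -> take A's coordinate
    return np.where(mask, theta_a, theta_b_aligned)
```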