Abstract
Machine learning for classification has seen numerous applications to complex, real-world tasks. Learned classifiers have become important artifacts of software systems that, like code, require careful analysis, maintenance, and evolution. Existing work on the formal verification of learned classifiers has mainly focused on properties of individual classifiers, e.g., safety, fairness, or robustness, but not on the commonalities and differences among multiple classifiers. We present MLDiff, a novel approach to comparing learned classifiers by querying agreements and disagreements between their classifications, where one classifier is an alternative or variant of another. We provide a prototype implementation that leverages an encoding to SMT and can discover differences not (yet) observed in available datasets. The prototype supports MLDiff for any combination of Decision Trees, Linear Support Vector Classification, Logistic Regression, and Neural Networks. We evaluate it on classifiers trained on popular datasets in terms of performance and the effectiveness of the analysis in discovering disagreements between classifiers.
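To illustrate the kind of disagreement query the abstract describes, the following is a minimal sketch, not the paper's implementation: it uses the z3-solver Python API to search for an input on which two hypothetical linear classifiers disagree. The weights, biases, input bounds, and the restriction to linear models are illustrative assumptions.

```python
# Hedged sketch: finding a disagreement between two hypothetical linear
# classifiers with an SMT solver (z3-solver). Weights, biases, and the
# input domain are illustrative assumptions, not values from the paper.
from z3 import Real, Solver, And, Or, sat

# Two assumed linear classifiers over 2 features: predict class 1 iff w.x + b >= 0.
w1, b1 = [1.0, -2.0], 0.5
w2, b2 = [0.9, -1.5], 0.3

x = [Real(f"x{i}") for i in range(2)]
s = Solver()

# Restrict the search to an assumed normalized input domain [0, 1]^2.
for xi in x:
    s.add(xi >= 0, xi <= 1)

score1 = sum(wi * xi for wi, xi in zip(w1, x)) + b1
score2 = sum(wi * xi for wi, xi in zip(w2, x)) + b2

# Disagreement: one classifier predicts class 1 while the other predicts class 0.
s.add(Or(And(score1 >= 0, score2 < 0), And(score1 < 0, score2 >= 0)))

if s.check() == sat:
    m = s.model()
    print("Disagreement witness:", [m.eval(xi) for xi in x])
else:
    print("The two classifiers agree on the whole input domain.")
```

A satisfying assignment is a concrete input witnessing a behavioral difference between the two classifiers, even if no such input occurs in the available datasets; an unsat result certifies agreement over the encoded input domain.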