Sunday 10 July 2022

SIAM AN 22 - Indefinite Large Scale Kernel Approximation: To Lose or to Preserve Information?

Abstract. 

Matrix approximations are a key element in large-scale algebraic machine learning approaches.
Focusing on similarities, a common assumption is to have positive semi-definite (psd) kernel
functions as the underlying source.
This is often too strong a constraint and limits practical applications:
either the approximation cannot be validly applied, or it changes the underlying
similarity function. Approximations can in fact also accidentally lead to non-psd representations.
In any case, the modifications of the original or approximated similarities can have a severe
impact on the encoded information.
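
As a minimal illustration of the last point (our own sketch, not part of the abstract; the kernel choice and all parameters are illustrative), the following Python snippet approximates a tanh (sigmoid) similarity matrix, a classic example of a non-psd kernel, with the standard Nyström method and inspects the smallest eigenvalue of the exact and the approximated matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))

    # tanh (sigmoid) kernel: not psd in general, so the resulting
    # similarity matrix typically carries negative eigenvalues
    def tanh_kernel(A, B):
        return np.tanh(0.5 * A @ B.T + 1.0)

    K = tanh_kernel(X, X)

    # Nyström approximation with m randomly chosen landmark points
    m = 50
    idx = rng.choice(len(X), size=m, replace=False)
    C = tanh_kernel(X, X[idx])        # n x m cross-similarities
    W = C[idx]                        # m x m landmark block
    K_approx = C @ np.linalg.pinv(W) @ C.T

    print("min eigenvalue (exact): ", np.linalg.eigvalsh(K).min())
    print("min eigenvalue (approx):", np.linalg.eigvalsh(K_approx).min())

A negative smallest eigenvalue in either matrix means the similarity is not psd, so kernel methods that require psd inputs cannot be validly applied without a correction.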

One may lose information or introduce disturbances.
This is particularly important if branch-and-bound approaches or mathematically
well-principled error minimizers are used to obtain predictive models.
Strategies to correct similarities before or after the approximation are
often based on spectral corrections, embeddings, or proxy approaches.
These corrections often do not scale to large data and can counteract the approximation.
We explain the problem setting and detail traditional and recent developments
in the domain of indefinite learning to correct symmetric similarities at large scale.
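
As a hedged sketch (our own, not from the talk) of the two most common spectral corrections, eigenvalue clipping and eigenvalue flipping, assuming a symmetric similarity matrix K as input:

    import numpy as np

    def spectral_correction(K, mode="clip"):
        """Turn a symmetric indefinite similarity matrix into a psd one
        by modifying its eigenvalues (dense, illustrative version)."""
        eigvals, eigvecs = np.linalg.eigh(K)
        if mode == "clip":    # set negative eigenvalues to zero
            eigvals = np.maximum(eigvals, 0.0)
        elif mode == "flip":  # replace eigenvalues by their absolute values
            eigvals = np.abs(eigvals)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # reconstruct K from the modified spectrum: V diag(lambda) V^T
        return (eigvecs * eigvals) @ eigvecs.T

    # usage: K_psd = spectral_correction(K, mode="clip")

The full eigendecomposition costs O(n^3) time and O(n^2) memory, which is exactly why such corrections do not scale to large data and why scalable alternatives are the subject of the talk.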


Authors
  • Maximilian Muench, University of Groningen, The Netherlands, m.a.munch@rug.nl
  • Simon Heilig, University of Applied Sciences Würzburg-Schweinfurt, Germany
  • Frank-Michael Schleif, University of Applied Sciences Würzburg-Schweinfurt, Germany, frank-michael.schleif@fhws.de
SIAM AN 22 - Slides