Indefinite Large Scale Kernel Approximation
To Lose or to Preserve Information?
Abstract.
Matrix approximations are a key element in large-scale algebraic machine learning approaches.
Focusing on similarities, a common assumption is to have positive semi-definite (psd) kernel
functions as the underlying source.
This constraint is often too strong and limits practical applications.
The approximation either cannot be validly applied or it changes the underlying
similarity function; approximations may in fact also accidentally produce non-psd representations.
In any case, modifications of the original or approximated similarities can have a severe
impact on the encoded information: one may lose information or introduce disturbances.
This is particularly important if branch-and-bound approaches or mathematically
well-principled error minimizers are used to obtain predictive models.
Strategies to correct similarities before or after the approximation are
often based on spectral corrections, embeddings, or proxy approaches.
These corrections often do not scale to large data and counteract the approximation.
We explain the problem setting and detail traditional and recent developments
in the domain of indefinite learning to correct symmetric similarities at large scale.
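As an illustration, the following is a minimal sketch (in Python with NumPy; function and variable names are illustrative, not taken from the paper) of the classical spectral "clip" correction, which projects an indefinite symmetric similarity matrix onto the psd cone by removing its negative eigenvalues. The full eigendecomposition it relies on costs O(N^3) and discards part of the spectrum, which exemplifies why such corrections neither scale to large data nor leave the encoded information untouched.

```python
# Minimal sketch of a spectral "clip" correction for an indefinite,
# symmetric similarity matrix S. Names are illustrative only.
import numpy as np

def clip_eigenvalues(S):
    """Make S psd by setting all negative eigenvalues to zero (spectral clip)."""
    vals, vecs = np.linalg.eigh(S)            # full eigendecomposition, O(N^3)
    vals_clipped = np.clip(vals, 0.0, None)   # drop the negative part of the spectrum
    return (vecs * vals_clipped) @ vecs.T     # reassemble the corrected matrix

# Toy indefinite similarity matrix (e.g. obtained from a non-metric comparison)
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.8],
              [0.1, 0.8, 1.0]])
print(np.linalg.eigvalsh(S))                    # one eigenvalue is negative
print(np.linalg.eigvalsh(clip_eigenvalues(S)))  # all eigenvalues are now >= 0
```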
- Maximilian Muench, University of Groningen, The Netherlands, m.a.munch@rug.nl
- Simon Heilig, University of Applied Sciences Würzburg-Schweinfurt, Germany
- Frank-Michael Schleif, University of Applied Sciences Würzburg-Schweinfurt, Germany, frank-michael.schleif@fhws.de