Szabo, Z; Sriperumbudur, B; Poczos, B; Gretton, A; (2015) Learning Theory for Vector-Valued Distribution Regression. Presented at: CMStatistics 2015, London, United Kingdom.
Abstract
We focus on the distribution regression problem (DRP): we regress from probability measures to Hilbert-space valued outputs, where the input distributions are only available through samples (this is the 'two-stage sampled' setting). Several important statistical and machine learning problems can be phrased within this framework, including point estimation tasks without an analytical solution (such as entropy estimation) and multi-instance learning. However, due to the two-stage sampled nature of the problem, the theoretical analysis becomes quite challenging: to the best of our knowledge, the only existing method with performance guarantees for the DRP task requires density estimation (which often performs poorly in practice) and requires the distributions to be defined on a compact Euclidean domain. We present a simple, analytically tractable alternative for solving the DRP task: we embed the distributions into a reproducing kernel Hilbert space and perform ridge regression from the embedded distributions to the outputs. We prove that this scheme is consistent under mild conditions, and construct explicit finite-sample bounds on its excess risk as a function of the sample numbers and the problem difficulty, which hold with high probability. Specifically, we establish the consistency of set kernels in regression, which was a 15-year-old open question, and also present new kernels on embedded distributions. The practical efficiency of the studied technique is illustrated in supervised entropy learning and aerosol prediction using multispectral satellite images.
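The two-step scheme described in the abstract (mean embedding, then ridge regression) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the ITE toolbox linked in this record): the Gaussian base kernel, the linear kernel on the embeddings (which gives the set kernel), the bandwidth, the ridge scaling `n_bags * ridge`, all function names, and the toy Gaussian entropy task are choices made here for illustration only.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def set_kernel(bags_a, bags_b, bandwidth):
    """Inner products of empirical mean embeddings.

    Entry (i, j) is the average of the base kernel over all sample pairs
    drawn from bag i of bags_a and bag j of bags_b.
    """
    gram = np.empty((len(bags_a), len(bags_b)))
    for i, bag_a in enumerate(bags_a):
        for j, bag_b in enumerate(bags_b):
            gram[i, j] = gaussian_gram(bag_a, bag_b, bandwidth).mean()
    return gram

def fit_ridge(train_bags, targets, bandwidth, ridge):
    """Solve (K + n * ridge * I) coefs = targets (scaling is an assumption)."""
    n_bags = len(train_bags)
    gram = set_kernel(train_bags, train_bags, bandwidth)
    return np.linalg.solve(gram + n_bags * ridge * np.eye(n_bags), targets)

def predict(test_bags, train_bags, coefs, bandwidth):
    """Evaluate the fitted regressor on new bags of samples."""
    return set_kernel(test_bags, train_bags, bandwidth) @ coefs

# Toy task (hypothetical): each bag holds samples from N(0, s^2) and the
# label is the differential entropy of that Gaussian.
rng = np.random.default_rng(0)

def sample_bags(n_bags, n_samples):
    stds = rng.uniform(0.5, 2.0, size=n_bags)
    bags = [rng.normal(0.0, s, size=(n_samples, 1)) for s in stds]
    entropies = 0.5 * np.log(2.0 * np.pi * np.e * stds**2)
    return bags, entropies

train_bags, train_targets = sample_bags(n_bags=50, n_samples=200)
test_bags, test_targets = sample_bags(n_bags=20, n_samples=200)

coefs = fit_ridge(train_bags, train_targets, bandwidth=1.0, ridge=1e-3)
predictions = predict(test_bags, train_bags, coefs, bandwidth=1.0)

# Compare against a constant predictor that always outputs the training mean.
mae = np.mean(np.abs(predictions - test_targets))
baseline_mae = np.mean(np.abs(train_targets.mean() - test_targets))
print(f"ridge MAE: {mae:.3f}, constant-baseline MAE: {baseline_mae:.3f}")
```

Because `np.linalg.solve` accepts a matrix right-hand side, passing `targets` with shape `(n_bags, q)` gives a finite-dimensional vector-valued variant without further changes. In practice the bandwidth and ridge parameter would be chosen by cross-validation rather than fixed as above.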
Type: | Conference item (Presentation) |
---|---|
Title: | Learning Theory for Vector-Valued Distribution Regression |
Event: | CMStatistics 2015 |
Location: | London, United Kingdom |
Dates: | 12 - 14 December 2015 |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | http://cmstatistics.org/CMStatistics2015/ |
Language: | English |
Additional information: | Preprint: "http://arxiv.org/abs/1411.2066", code: "https://bitbucket.org/szzoli/ite/". Abstract: "http://www.gatsby.ucl.ac.uk/~szabo/talks/invited_talk/Zoltan_Szabo_invited_talk_CMStatistics_12_12_2015_abstract.pdf" |
Keywords: | Distribution regression, two-stage sampling, mean embedding, convergence rate, set kernel, consistency. |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/1469659 |