Introduction

This project page briefly introduces the methodology described in the following paper.

  • Lee, H. and Patrangenaru, V. (2020). Robust Extrinsic Regression Analysis for Manifold Valued Data

  • GitHub repository : RELR

Motivation

Consider two location measures

  • Mean : \[\begin{align*} \bar{\bf{x}} &= \operatorname{argmin}_{{\bf{q}}\in\mathbb{R}^p}\sum_{i=1}^n \Vert{\bf{x}}_i - {\bf{q}}\Vert^2 \\ &= \sum_{i=1}^n\frac{{\bf{x}}_i}{n} \end{align*}\]
    • The mean is overly influenced by outliers
  • Geometric Median : \[\begin{align*} {\bf{m}} = \operatorname{argmin}_{{\bf{q}}\in\mathbb{R}^p}\sum_{i=1}^n \Vert{\bf{x}}_i - {\bf{q}}\Vert \end{align*}\]
    • Robust to outliers; computing it is the Fermat-Weber problem that arises in facility location
    • But it has no closed-form solution in general, so it must be computed iteratively

Toy example

Generate random points in \(\mathbb{R}^2\) and contaminate a few of them with outliers.

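N       <- 100                # Number of data points
n.out   <-  10                # Number of outliers
X1      <- rnorm(N, 0, 0.5)   # First coordinate,  N(0, 0.5^2)
X2      <- rnorm(N, 0, 0.5)   # Second coordinate, N(0, 0.5^2)
out     <- seq_len(n.out)     # Indices of the contaminated points
X1[out] <- X1[out] + 10       # Shift the outliers by (10, 10)
X2[out] <- X2[out] + 10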
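
As a quick check of the motivation, the following sketch regenerates the toy data and compares the sample mean of the contaminated sample with the mean of the clean points only. The seed and the variable names (`X`, `out`, `mean.all`, `mean.clean`) are assumptions made for illustration and are not taken from the repository.

```r
set.seed(1)                          # assumed seed, only for reproducibility

N     <- 100                         # number of data points
n.out <- 10                          # number of outliers
X     <- cbind(rnorm(N, 0, 0.5), rnorm(N, 0, 0.5))
out   <- seq_len(n.out)              # indices of the contaminated points
X[out, ] <- X[out, ] + 10            # shift the outliers by (10, 10)

mean.all   <- colMeans(X)            # mean of the contaminated sample
mean.clean <- colMeans(X[-out, ])    # mean of the uncontaminated points

print(round(rbind(all = mean.all, clean = mean.clean), 3))
```

Because 10% of the points are shifted by 10 in each coordinate, the contaminated mean moves by exactly 1 in each coordinate relative to the mean of the unshifted sample.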

Extrinsic Median

Definition

  • Population extrinsic median : Consider the unsquared Euclidean norm induced by \(J : \mathcal{M} \rightarrow E^d\) \[\begin{align*} {\bf{m}_E} = \operatorname{argmin}_{{\bf{q}}\in\mathcal{M}}\int_\mathcal{M} \Vert J( {\bf{x}}) - J( {\bf{q}}) \Vert P(d\bf{x}) \end{align*}\]
  • Empirical extrinsic median : Suppose \({\bf{x}}_1, \cdots, {\bf{x}}_n\) are i.i.d. copies of an \(\mathcal{M}\)-valued random variable \({\bf{X}}\). \[\begin{align*} \hat{\bf{m}}_E &= \operatorname{argmin}_{{\bf{q}}\in\mathcal{M}} \sum_{i=1}^n \Vert J({\bf{x}}_i)-J({\bf{q}}) \Vert\\ &= J^{-1} \left( \mathcal{P} \left(\operatorname{argmin}_{m \in E^d} \sum_{i=1}^n \Vert J({\bf{x}}_i)-m \Vert \right)\right), \end{align*}\] where \(\mathcal{P} : E^d \rightarrow J(\mathcal{M})\) is the projection onto the image of the embedding.

Q. How do we solve the above problem? A. Weiszfeld's algorithm, an iteratively reweighted averaging scheme for the geometric median.
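
A minimal sketch of Weiszfeld's iteration in Euclidean space, applied to data generated like the toy example, may help make the idea concrete. The function name `weiszfeld`, its arguments (`tol`, `max.iter`, `eps`), and the seed are hypothetical choices for this illustration, not the repository's implementation.

```r
# Weiszfeld's algorithm for the geometric median of the rows of X.
# Each step is a weighted average with weights 1 / distance to the iterate.
weiszfeld <- function(X, tol = 1e-8, max.iter = 1000, eps = 1e-12) {
  m <- colMeans(X)                             # start from the sample mean
  for (it in seq_len(max.iter)) {
    d     <- sqrt(rowSums(sweep(X, 2, m)^2))   # distances to the current iterate
    w     <- 1 / pmax(d, eps)                  # eps guards against division by zero
    m.new <- colSums(X * w) / sum(w)           # reweighted average of the data
    if (sqrt(sum((m.new - m)^2)) < tol) return(m.new)
    m <- m.new
  }
  m
}

# Sum of unsquared distances, i.e. the objective the median minimizes.
objective <- function(X, q) sum(sqrt(rowSums(sweep(X, 2, q)^2)))

set.seed(1)                                    # assumed seed
X <- cbind(rnorm(100, 0, 0.5), rnorm(100, 0, 0.5))
X[1:10, ] <- X[1:10, ] + 10                    # same contamination as the toy example

med <- weiszfeld(X)
print(round(rbind(mean = colMeans(X), median = med), 3))
```

Starting from the mean is only one convenient choice; the `eps` floor is a simple safeguard for the case where an iterate coincides with a data point.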

Applications

1. Circle

Consider two different situations

  • Scenario 1 : von Mises distribution contaminated with outliers \[\begin{align*} f(\theta) = \frac{e^{\kappa\cos(\theta-\mu)}}{2\pi I_0(\kappa)}, \end{align*}\]

    where \(I_0(\kappa)\) is the modified Bessel function of the first kind of order 0. Outliers are drawn from a normal distribution \(N(\mu_{\text{out}}, \sigma^2)\) with \(\mu_{\text{out}} \neq \mu\).

  • Scenario 2 : Wrapped \(\alpha\)-stable distribution \[\begin{align*} f(\theta) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^\infty \exp(-\tau^\alpha k^\alpha)\cos \left( k(\theta-\mu) -\tau^\alpha k^\alpha \beta \tan \frac{\alpha \pi}{2} \right), \end{align*}\]

    where \(\tau \geq 0\) and \(\vert\beta\vert \leq 1\) denote the scale and skewness parameters, respectively.
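
To illustrate Scenario 1, the sketch below evaluates the von Mises density with base R's `besselI()`, draws a contaminated sample, and computes the extrinsic median on the circle using the embedding \(J(\theta) = (\cos\theta, \sin\theta)\). The rejection sampler, the parameter values, and all function names (`dvonmises`, `rvonmises`, `weiszfeld`, `extrinsic.median.circle`, `circ.dist`) are assumptions made for this example, not the authors' code.

```r
set.seed(1)                                        # assumed seed

# von Mises density from Scenario 1.
dvonmises <- function(theta, mu, kappa) {
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, 0))
}

# Simple rejection sampler with a uniform proposal on [0, 2*pi).
rvonmises <- function(n, mu, kappa) {
  out <- numeric(0)
  while (length(out) < n) {
    theta <- runif(n, 0, 2 * pi)
    keep  <- runif(n) < exp(kappa * (cos(theta - mu) - 1))
    out   <- c(out, theta[keep])
  }
  out[seq_len(n)]
}

# Weiszfeld iteration for the geometric median of the rows of Z.
weiszfeld <- function(Z, tol = 1e-8, max.iter = 1000, eps = 1e-12) {
  m <- colMeans(Z)
  for (it in seq_len(max.iter)) {
    d     <- sqrt(rowSums(sweep(Z, 2, m)^2))
    w     <- 1 / pmax(d, eps)
    m.new <- colSums(Z * w) / sum(w)
    if (sqrt(sum((m.new - m)^2)) < tol) return(m.new)
    m <- m.new
  }
  m
}

# Embed, take the geometric median in R^2, project back to the circle.
extrinsic.median.circle <- function(theta) {
  Z <- cbind(cos(theta), sin(theta))               # embedding J
  m <- weiszfeld(Z)                                # median in the ambient space
  atan2(m[2], m[1]) %% (2 * pi)                    # projection, then J^{-1}
}

# Shortest angular distance between two angles.
circ.dist <- function(a, b) abs(atan2(sin(a - b), cos(a - b)))

mu <- pi / 2; kappa <- 10                          # assumed parameter values
mu.out <- pi; n.out <- 10                          # assumed outlier settings
theta <- c(rvonmises(90, mu, kappa),
           rnorm(n.out, mu.out, 0.1) %% (2 * pi))

total <- integrate(dvonmises, 0, 2 * pi, mu = mu, kappa = kappa)$value
med   <- extrinsic.median.circle(theta)
emean <- atan2(mean(sin(theta)), mean(cos(theta))) %% (2 * pi)
print(c(integral = total, median.err = circ.dist(med, mu),
        mean.err = circ.dist(emean, mu)))
```

Since `atan2` is invariant to positive scaling, normalizing the ambient median to unit length before taking the angle is unnecessary, provided the median is not the origin.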

2. Planar Shape

  • Result

Robust Extrinsic Local Regression (RELR)

Method

  • The population robust extrinsic regression function : \[\begin{align*} F(x) &= \operatorname{argmin}_{q \in \mathcal{M}} \int_{\mathcal{M}} \Vert J(q) - J(y) \Vert P(dy \vert x)\\ &= \operatorname{argmin}_{q \in \mathcal{M}} \int_{\widetilde{\mathcal{M}}} \Vert J(q) - z \Vert \widetilde{P}(dz \vert x), \end{align*}\]

    where \(\widetilde{\mathcal{M}} = J(\mathcal{M})\) and \(\widetilde{P}(\cdot \vert x) = P(\cdot \vert x) \circ J^{-1}\) is the conditional probability measure on \(J(\mathcal{M})\) given \(x\) induced by \(P(\cdot \vert x)\) via the embedding \(J\).

  • Estimation :
    Given \(\mathcal{D} = \{({\bf{x}}_i,{\bf{y}}_i)\}_{i=1}^n\), where \({\bf{x}}_i \in \mathbb{R}^p\) and \({\bf{y}}_i \in \mathcal{M}\),
    \[\begin{align*} \hat{F}({\bf{x}}) = J^{-1} \left( \mathcal{P} \left( \operatorname{argmin}_{{\bf{y}} \in E^d} \underbrace{\sum_{i=1}^n \frac{K_H({\bf{x}}_i - {\bf{x}}) \Vert {\bf{y}}-J({\bf{y}}_i) \Vert}{\sum_{j=1}^n K_H({\bf{x}}_j - {\bf{x}})}}_{f({\bf{y}})} \right)\right) \end{align*}\]
    1. \(K : \mathbb{R}^p \rightarrow \mathbb{R}\) is a multivariate kernel s.t. \(\int K(x)dx = 1\) and \(\int xK(x)dx = 0\)

    2. \(H = \text{Diag} (h_1, \cdots, h_p)\) with \(h_j > 0\)

    3. \(K_H(x) = \frac{1}{\vert H\vert} K(H^{-1}x)\), where \(\vert H \vert = h_1 \cdots h_p\)

  • Algorithm for RELR :
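
The algorithm itself is not spelled out in the text here. One plausible sketch, assuming a Gaussian product kernel and a kernel-weighted Weiszfeld iteration for minimizing \(f({\bf{y}})\), is shown below for circle-valued responses with \(J(\theta) = (\cos\theta, \sin\theta)\). All function names, the bandwidth, and the simulated data are hypothetical and chosen only for illustration.

```r
# Normalized kernel weights K_H(x_i - x0) / sum_j K_H(x_j - x0),
# with a Gaussian product kernel and H = Diag(h) (an assumed choice of K).
kernel.weights <- function(X, x0, h) {
  U <- sweep(sweep(X, 2, x0), 2, h, "/")           # H^{-1} (x_i - x0), row-wise
  k <- exp(-0.5 * rowSums(U^2)) / ((2 * pi)^(ncol(X) / 2) * prod(h))
  k / sum(k)
}

# Weighted Weiszfeld iteration: minimizes sum_i w_i * ||y - z_i|| over y.
weighted.weiszfeld <- function(Z, w, tol = 1e-8, max.iter = 1000, eps = 1e-12) {
  m <- colSums(Z * w) / sum(w)                     # start from the weighted mean
  for (it in seq_len(max.iter)) {
    d     <- sqrt(rowSums(sweep(Z, 2, m)^2))
    a     <- w / pmax(d, eps)                      # kernel weight over distance
    m.new <- colSums(Z * a) / sum(a)
    if (sqrt(sum((m.new - m)^2)) < tol) return(m.new)
    m <- m.new
  }
  m
}

# RELR fit at x0 for angles theta: embed, weighted median, project back.
relr.circle <- function(x0, X, theta, h) {
  w <- kernel.weights(X, x0, h)
  Z <- cbind(cos(theta), sin(theta))               # embedding J
  m <- weighted.weiszfeld(Z, w)
  atan2(m[2], m[1]) %% (2 * pi)                    # projection, then J^{-1}
}

set.seed(1)                                        # assumed seed
n     <- 200
X     <- matrix(runif(n), ncol = 1)                # univariate covariate
theta <- (1 + 2 * X[, 1] + rnorm(n, 0, 0.1)) %% (2 * pi)
theta[1:20] <- rnorm(20, 5, 0.1) %% (2 * pi)       # 10% outliers near angle 5

fit <- relr.circle(x0 = 0.5, X = X, theta = theta, h = 0.1)
print(c(fit = fit, truth = 1 + 2 * 0.5))
```

The kernel's normalizing constant cancels in the weights, so any kernel proportional to the one above gives the same fit; it is kept only to mirror the definition of \(K_H\).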

Simulation on \(\Sigma_2^k\)

  • Generate covariates and shapes \[\begin{align*} & \textbf{Generate covariates :} \ X_p \sim \text{Unif}(a,b), \ p = 1,\cdots,P\\[.5em] & \textbf{Coefficients :} \ \beta_k = k/K^2, \ k = 1,\cdots,K\\[.5em] & \textbf{Generate intercept angles :}\ \phi_{0k} = k/2 \\[.5em] & \textbf{Generate intercept radii :}\ r_{0k} = 0.1\\[.5em] & \textbf{Generate shape angles :}\ \phi_{k}^\prime \sim N\left(\phi_{0k} + \sum_{p=1}^P \beta_k X_{p}, \sigma_\phi^2\right)\\[.5em] & \textbf{Generate shape radii :}\ r_{k} \sim N\left(r_{0k} + \sum_{p=1}^P \beta_k X_{p}, \sigma_r^2\right)\\[.5em] & \textbf{Standardize angles :}\ \phi_k = \phi_k^\prime \bmod 2\pi\\[.5em] & \textbf{Convert to complex form for the landmark :}\ z_k = r_k(\cos(\phi_k) + i \sin(\phi_k)) \end{align*}\]
    • Outliers are then added to some of the landmarks
  • A Sample of the above simulation setting and fits (Univariate covariate)

    • For ease of visualization, only the first landmark is contaminated
    • Covariate values are colored in the left panel
  • Result
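
The generation steps listed above can be sketched in R as follows. The values of `n`, `P`, `K`, `a`, `b`, `sigma.phi`, and `sigma.r`, as well as the way outliers are added (a fixed complex offset on the first landmark of a few configurations), are assumptions made for illustration; the text does not specify them.

```r
set.seed(1)                                   # assumed seed

n <- 50; P <- 1; K <- 5                       # sample size, covariates, landmarks (assumed)
a <- 0;  b <- 1                               # covariate range (assumed)
sigma.phi <- 0.05; sigma.r <- 0.01            # noise levels (assumed)

X    <- matrix(runif(n * P, a, b), n, P)      # X_p ~ Unif(a, b)
beta <- seq_len(K) / K^2                      # beta_k = k / K^2
phi0 <- seq_len(K) / 2                        # intercept angles k / 2
r0   <- rep(0.1, K)                           # intercept radii

lin <- outer(rowSums(X), beta)                # n x K matrix of sum_p beta_k * X_p

# Shape angles and radii: intercept + linear term + Gaussian noise.
phi.raw <- matrix(phi0, n, K, byrow = TRUE) + lin +
           matrix(rnorm(n * K, 0, sigma.phi), n, K)
r       <- matrix(r0, n, K, byrow = TRUE) + lin +
           matrix(rnorm(n * K, 0, sigma.r), n, K)

phi <- phi.raw %% (2 * pi)                    # standardize angles to [0, 2*pi)
Z   <- r * (cos(phi) + 1i * sin(phi))         # landmarks in complex form

# Hypothetical contamination: shift the first landmark of a few configurations.
out <- seq_len(5)
Z[out, 1] <- Z[out, 1] + (0.5 + 0.5i)

print(dim(Z))
```

Each row of `Z` holds the `K` complex landmarks of one configuration; centering and scaling to obtain a point in the shape space is left out of this sketch.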