4.2 Quantile regression via a family of hyperplanes
We have seen that, as a way to allow for greater flexibility in the functional form of the parameter \(\beta,\) the covariates in a globally concerned quantile regression model are typically required to lie in a hypercube of the form \([{m}_1,\bar{m}_1]\times\cdots\times[{m}_{\mathrm{D}_X},\bar{m}_{\mathrm{D}_X}]\) for some constants \(0\le m_d \le \bar{m}_d,\) \(d\in\{1,\dots,{\mathrm{D}_X}\}.\) Up to translation and rescaling, we can in fact assume that \(\mathscr{X} \subseteq [0,1]^{\mathrm{D}_X},\) and we can even require that \(0\) and \(1\) lie in the support of each non-constant covariate,13 as the following example clarifies.
Example 4.6 Suppose that the quantile regression model (4.3) holds, and that \(\operatorname{support}(X_d) \subseteq [{m}_d,\bar{m}_d]\) for some constants \(0\le m_d \le \bar{m}_d,\) \(d\in\{2,\dots,{\mathrm{D}_X}\}.\) We are implicitly assuming that \({m}_d\) is the greatest constant for which these inclusions hold, and similarly that \(\bar{m}_d\) is the least constant for which they hold.14 Without loss of generality, we make the assumption that \(X_1 = 1,\) as nothing precludes \(\beta_1(\cdot)\) from being the zero function (this would be the “no intercept” setting). As mentioned earlier, we are also assuming that covariates are linearly independent, and this tells us, in particular, that no covariate other than \(X_1\) is constant, which in turn implies \(\bar{m}_d > {m}_d\) for \(2\le d\le{\mathrm{D}_X}.\) Now let \[ \tilde{X}_d = {(X_d-{m}_d)}/{(\bar{m}_d - {m}_d)},\quad 2\le d\le{\mathrm{D}_X}. \tag{4.4} \] Writing \(\tilde{X} = (1\quad \tilde{X}_2\cdots\tilde{X}_{\mathrm{D}_X})^\prime,\) and with the slightly technical remark that conditioning on \(\tilde{X}\) is the same as conditioning on \(X,\) we then see that the equality \[ Q_{Y|\tilde{X}}(\tau|x) = x^\prime\tilde{\beta}(\tau) \] holds for \(\tau\in(0,1)\) and conformable \(x\in\operatorname{support}(\tilde{X})\subseteq \{1\}\times[0,1]^{{\mathrm{D}_X}-1},\) where \(\tilde{\beta}\colon(0,1)\to\mathbb{R}^{\mathrm{D}_X}\) is defined componentwise through \[ \tilde{\beta}_1 = \beta_1 + \sum_{d=2}^{\mathrm{D}_X}{m}_d\beta_d\quad\textsf{and}\quad \tilde{\beta}_d = (\bar{m}_d - m_d)\beta_d,\,d\ge2. \]
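The algebra behind (4.4) is easy to check numerically. Below is a minimal sketch in R; the bounds \(m_d,\) \(\bar{m}_d\) and the value of \(\beta(\tau)\) are arbitrary choices made purely for illustration.

```r
# Numerical check of the reparametrization in example 4.6, with D_X = 3.
# The bounds m, mbar and the vector beta below are illustrative choices.
set.seed(1)
m    <- c(-2, 5)                      # lower bounds for X2, X3
mbar <- c(3, 9)                       # upper bounds for X2, X3
X2   <- runif(10, m[1], mbar[1])
X3   <- runif(10, m[2], mbar[2])
beta <- c(1, 0.5, -0.25)              # beta(tau) at some fixed tau

# rescaled covariates, as in (4.4): they land in [0, 1]
X2t <- (X2 - m[1]) / (mbar[1] - m[1])
X3t <- (X3 - m[2]) / (mbar[2] - m[2])

# rescaled coefficients, as in the display above
beta_t <- c(beta[1] + m[1]*beta[2] + m[2]*beta[3],
            (mbar[1] - m[1])*beta[2],
            (mbar[2] - m[2])*beta[3])

lhs <- beta[1]   + beta[2]*X2    + beta[3]*X3
rhs <- beta_t[1] + beta_t[2]*X2t + beta_t[3]*X3t
stopifnot(isTRUE(all.equal(lhs, rhs)))   # identical linear predictors
```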
Exercise 4.2 Prove the validity of each assertion in example 4.6, filling in the details where necessary. \(\blacksquare\)
So far, I hope that the reader is convinced that the globally concerned quantile regression model allowing an a priori flexible functional form for the parameter \(\beta\) can always be thought of — at least analytically — as a model in which the covariates “include a constant” and are bound to lie in the \({\mathrm{D}_X}\)-dimensional unit cube; in other words, we can always consider \(\mathscr{X} \subseteq \{1\}\times [0,1]^{{\mathrm{D}_X}-1}.\) However, we cannot force the latter inclusion to be an equality since this would preclude the scenario where some covariates are functionally related, for example when we have a polynomial in a certain variable.
Example 4.7 Assume \(X = (1\quad Z\quad Z^2)^\prime,\) where \(Z\) is uniformly distributed in the unit interval. Then the support of \(X\) is the set \[ \mathscr{X} = \{x\in\mathbb{R}^3\colon\,x_1 = 1,\,x_2\in[0,1],\,x_3=x_2^2\} \] which is a strict subset of \(\{1\}\times [0,1]^{2}.\,\blacksquare\)
In general it is not straightforward to constructively exhibit a non-trivial functional parameter \(\tau\mapsto\beta(\tau)\) for which the mappings \[ \tau\mapsto x^\prime\beta(\tau) \] are non-decreasing and left-continuous for all \(x\in \mathscr{X},\) even after “easing up” a little to ensure that \(\mathscr{X}\subseteq\{1\}\times[0,1]^{{\mathrm{D}_X}-1}.\) In fact, any conditions ensuring that a function \(\beta\colon(0,1)\to\mathbb{R}^{{\mathrm{D}_X}}\) has the preceding attributes will necessarily be tied to the topology and geometry of \(\mathscr{X}.\) When all the covariates are non-negative, a simple sufficient condition is to require that each of the coefficient functions \(\tau\mapsto \beta_d(\tau)\) be non-decreasing. This is somewhat restrictive, however: in this setting, the quantile regression model (4.3) tells us that the covariates mostly impact the location and dispersion of the response. The example below illustrates this fact.15
Example 4.8 Assume (4.3) holds, with \(X_1=1\) and with \((X_2\quad X_3)^\prime\) uniformly distributed in the unit square, so \(\mathscr{X} = \{1\}\times [0,1]\times [0,1].\) For \(\tau\in(0,1),\) let \(\beta_1(\tau) = 3\tau/4,\) \(\beta_2(\tau) =\tau/4+1\) and \(\beta_3(\tau) = \tau/4 - 1.\) Thus, we have for example \[\begin{align} Y\,|\,(X_2=0,X_3=0) &\sim \textsf{Uniform}[0,3/4)\\ Y\,|\,(X_2=1,X_3=0) &\sim \textsf{Uniform}[1,2)\\ Y\,|\,(X_2=0,X_3=1) &\sim \textsf{Uniform}[-1,0)\\ Y\,|\,(X_2=1,X_3=1) &\sim \textsf{Uniform}[0,5/4) \end{align}\] and so on. The figure below displays a scatterplot of \(n\) points drawn from this model.
```r
library(plotly)

n = 200
U = runif(n)
beta1 = function(tau) 3*tau/4
beta2 = function(tau) tau/4 + 1
beta3 = function(tau) tau/4 - 1
X2 = runif(n)
X3 = runif(n)
Y = beta1(U) + beta2(U)*X2 + beta3(U)*X3
dat = data.frame(Y = Y, X2 = X2, X3 = X3)
plot_ly(dat, x = ~X2, y = ~X3, z = ~Y,
        type = "scatter3d",
        mode = "markers",
        marker = list(color = ~Y, colorscale = c('#FFE1A1', '#683531'))
) %>%
  layout(scene = list(
    camera = list(eye = list(x = 0, y = -1.5, z = 0)),
    aspectratio = list(x = 1, y = 1, z = 1/2)
  ))
```
In general, it can be challenging to analytically exhibit coefficient functions \(\tau\mapsto\beta(\tau)\) that yield monotonicity of the linear form \(\tau\mapsto x^\prime\beta(\tau)\) over a relevant range of \(x\)’s. The next result provides a constructive approach to obtain conditional quantile functions with \(\mathscr{X} = \{1\}\times [0,1]^{2}.\) This is particularly useful for Monte Carlo simulation studies. I’m stating the result for the case \({\mathrm{D}_X}= 3\) for simplicity, but a generalization to the case \({\mathrm{D}_X}> 3\) is not too difficult to obtain.
Proposition 4.1 Assume that \(v_{00},\) \(v_{10}\) and \(v_{01}\) are non-decreasing, left-continuous functions from \((0,1)\) to \(\mathbb{R},\) and that \(v_{11}\colon(0,1)\to\mathbb{R}\) satisfies the following requirements:
- \(v_{11} = v_{10} + v_{01} - v_{00}\).
- the mapping \(\tau\mapsto v_{11}(\tau)\) is non-decreasing.
Let \(X\) be a random vector in \(\mathbb{R}^3\) with \(\mathscr{X} \subseteq \{1\}\times[0,1]^2,\) and let \(U\) be a scalar random variable uniformly distributed on the unit interval, independent of \(X.\) Then the random variable \(Y\) defined via \[ Y = v_{00}(U) + \big(v_{10}(U) - v_{00}(U)\big)X_2 + \big(v_{01}(U)-v_{00}(U)\big)X_3 \] satisfies, for any \(\tau\in(0,1)\) and all \(x\in\mathscr{X},\) \[ Q_{Y|X}(\tau\,|\,x) = \beta_1(\tau) + \beta_2(\tau)x_2 + \beta_3(\tau)x_3 \] where \(\beta_1 = v_{00},\) \(\beta_2 = v_{10} - v_{00}\) and \(\beta_3 = v_{01} - v_{00}.\)
Proof. The proof amounts to showing that the mapping \[ \tau\mapsto v_{00}(\tau) + (v_{10}(\tau) - v_{00}(\tau))x_2 + (v_{01}(\tau)-v_{00}(\tau))x_3 \] is non-decreasing and left-continuous, for any \((x_2,x_3)\in[0,1]^2.\) The assertion then follows from the conditional version of the Fundamental Theorem of Simulation. Left-continuity is immediate. For monotonicity, write \[ p_\tau(x_2,x_3) := v_{00}(\tau) + (v_{10}(\tau) - v_{00}(\tau))x_2 + (v_{01}(\tau)-v_{00}(\tau))x_3 \] for \(\tau\in(0,1)\) and \((x_2,x_3)\in[0,1]^2.\) If \(\varsigma\ge\tau,\) then the monotonicity assumptions (together with item 1, which identifies \(p_\tau(1,1)\) with \(v_{11}(\tau)\)) tell us that, at each one of the four vertices \((0,0),\) \((1,0),\) \((0,1)\) and \((1,1),\) the affine function \(p_\varsigma\) lies above the affine function \(p_\tau.\) Since \(p_\varsigma - p_\tau\) is affine, its minimum over the unit square is attained at one of these vertices, and hence \(p_\varsigma(x_2,x_3)\ge p_\tau(x_2,x_3)\) for any \((x_2,x_3)\) in the unit square, completing the proof.
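For concreteness, here is how Proposition 4.1 can be used to set up a Monte Carlo design in R. The vertex functions below are illustrative choices (uniform-type quantile functions); the only thing that needs checking is that \(v_{11} = v_{10}+v_{01}-v_{00}\) is also non-decreasing.

```r
# Simulation design based on proposition 4.1 (illustrative vertex functions).
v00 <- function(tau) tau              # Uniform[0,1) at vertex (0,0)
v10 <- function(tau) 1 + 2*tau        # Uniform[1,3) at vertex (1,0)
v01 <- function(tau) tau/2 - 1        # Uniform[-1,-1/2) at vertex (0,1)
v11 <- function(tau) v10(tau) + v01(tau) - v00(tau)   # item 1

tt <- seq(0.001, 0.999, by = 0.001)
stopifnot(all(diff(v11(tt)) >= 0))    # item 2, checked on a fine grid

set.seed(42)
n  <- 1e4
U  <- runif(n)                        # independent of (X2, X3)
X2 <- runif(n)
X3 <- runif(n)
Y  <- v00(U) + (v10(U) - v00(U))*X2 + (v01(U) - v00(U))*X3

# sanity check: with the covariates held fixed, the empirical quantiles
# of Y should track p_tau(x2, x3)
p  <- function(tau, x2, x3)
  v00(tau) + (v10(tau) - v00(tau))*x2 + (v01(tau) - v00(tau))*x3
Yx <- p(runif(n), 0.3, 0.8)           # draws at (X2, X3) = (0.3, 0.8)
stopifnot(abs(quantile(Yx, 0.5) - p(0.5, 0.3, 0.8)) < 0.05)
```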
Remark. To generalize Proposition 4.1 to dimensions \({\mathrm{D}_X}>3\) one needs to specify functions \(v_d\colon(0,1)\to\mathbb{R},\) \(d\in\{0,1\}^{{\mathrm{D}_X}-1},\) that are non-decreasing and left-continuous. These correspond to the heights of the associated affine hyperplanes at the vertices of the \(({\mathrm{D}_X}-1)\)-dimensional hypercube \([0,1]^{{\mathrm{D}_X}-1}\): \(v_{0\cdots0}\) is the “intercept” (the height of the hyperplane at the origin), \(v_{10\cdots0}\) is the height of the hyperplane at vertex \((1 \quad 0 \cdots 0)^\prime\in\mathbb{R}^{{\mathrm{D}_X}-1},\) and so on. Nonetheless, the number of “compatibility conditions” grows much faster than the dimension. For instance, with \({\mathrm{D}_X}=4\) the functions \(v_d\colon(0,1)\to\mathbb{R},\) \(d\in\{0,1\}^{{\mathrm{D}_X}-1},\) must be chosen in such a way that all of the mappings below are non-decreasing in \(\tau\): \[\begin{align} v_{100}+v_{010} - v_{000} &\qquad v_{100}+v_{001}-v_{000}\\ v_{010}+v_{001} - v_{000} &\qquad v_{100} + v_{010} + v_{001} - 2v_{000} \end{align}\]
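One way to keep track of these conditions for \({\mathrm{D}_X}=4\) is to check, on a grid, that the height of the affine hyperplane at every vertex of \(\{0,1\}^3\) is non-decreasing in \(\tau.\) A sketch in R, with vertex functions chosen purely for illustration:

```r
# Compatibility check for D_X = 4: the height of the affine hyperplane at
# each vertex of {0,1}^3 must be non-decreasing in tau. The four vertex
# functions below are illustrative choices.
v000 <- function(tau) tau
v100 <- function(tau) 2*tau + 1
v010 <- function(tau) tau/2 - 1
v001 <- function(tau) 3*tau

# height of the hyperplane at vertex e = (e2, e3, e4)
height <- function(tau, e)
  v000(tau) + e[1]*(v100(tau) - v000(tau)) +
              e[2]*(v010(tau) - v000(tau)) +
              e[3]*(v001(tau) - v000(tau))

tt <- seq(0.001, 0.999, by = 0.001)
vertices <- as.matrix(expand.grid(0:1, 0:1, 0:1))
for (i in seq_len(nrow(vertices)))
  stopifnot(all(diff(height(tt, vertices[i, ])) >= 0))
```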
A second important remark is that Proposition 4.1 lays out sufficient conditions ensuring that a given candidate function is a bona fide conditional quantile function: if one can find “vertex functions” \(v_d\) with the properties stated in the proposition, then one is assured that a conditional quantile function is at hand. It turns out the condition is also necessary, provided the support of the covariates is the whole set \(\{1\}\times[0,1]^{{\mathrm{D}_X}-1}\) and not merely a proper subset of it. The example below illustrates a case where the condition fails to be necessary.
Example 4.9 Let \(X = (1\quad U\quad V)^\prime,\) where \((U,V)\) is uniformly distributed in the triangle \[ \Delta = \{(u,v)\in\mathbb{R}^2\colon\, u\ge0,\,v\ge0,\,u+v\le1\}. \] Assume \(Y\) is a scalar random variable such that \[ Q_{Y|X}(\tau|1,u,v) = \tau -\tau u - \tau v,\quad \tau\in(0,1),\,(u,v)\in\Delta. \] The slope of this mapping in \(\tau\) is \(1-u-v,\) which is non-negative throughout \(\Delta.\) Extrapolating the above quantile function to the vertex \((1,1,1)\) yields a function which decreases in \(\tau,\) but this is not an issue because \((1,1)\) is not a “possible value” assumed by \((U,V).\)
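Everything in example 4.9 hinges on the sign of the slope in \(\tau.\) A quick check in R (illustrative) confirms that monotonicity holds on the support but fails at the extrapolated vertex:

```r
# In example 4.9 the candidate quantile function is tau * (1 - u - v), so
# its slope in tau at (u, v) is 1 - u - v: non-negative when u + v <= 1,
# but negative at the extrapolated vertex (u, v) = (1, 1).
Q <- function(tau, u, v) tau - tau*u - tau*v
tt <- seq(0.001, 0.999, by = 0.001)
stopifnot(all(diff(Q(tt, 0.2, 0.3)) >= 0))  # point of the support: monotone
stopifnot(all(diff(Q(tt, 1, 1)) <= 0))      # vertex (1,1): decreasing
```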
It is not possible, in general, to ensure that \(\mathscr{X} = \{1\}\times [0,1]^{{\mathrm{D}_X}-1}\): take, for example, a covariate vector \(X = (1\quad X_2\quad X_3)^\prime\) with \((X_2\quad X_3)^\prime\) uniformly distributed in the disc with center \(c = (1/2,\,1/2)\) and radius \(r=1/2.\)↩︎
That is, the constants \({m}_d\) and \(\bar{m}_d\) are determined as the endpoints of the closed interval \[ I_d = \bigcap\{I : I\ \textsf{is a closed interval and}\ \operatorname{support}(X_d)\subseteq I\} \]↩︎
The word “mostly” in the previous assertion was employed informally, and the example is not 100% honest as we could have, say, \(\beta_2\) as the quantile function of a translated asymmetric Beta distribution, in which case the covariate \(X_2\) would affect not only the location and scale of the response, but also its coefficient of asymmetry.↩︎