4 Quantile Regression

Quantile Regression surfaced in its modern guise in the seminal Econometrica paper by Roger Koenker and Gilbert Bassett (Koenker and Bassett 1978). As with (parametric) mean regression models, where the conditional expectation of the response given the covariates is a linear function of the latter, in its most basic form the quantile regression framework stipulates that the conditional quantile function of a scalar random variable \(Y\) given a \({\mathrm{D}_X}\)-dimensional random vector \(X\) is a linear function of these covariates: the cornerstone assumption is that, for \(\tau\in \mathscr{T}\subseteq(0,1)\) and \(x\in\operatorname{support}(X)=:\mathscr{X},\)10 the representation \[\begin{equation} Q_{Y|X}(\tau\,|\,x) = \sum_{d=1}^{\mathrm{D}_X}\beta_d(\tau) x_{d} \equiv x'\beta(\tau), \tag{4.1} \end{equation}\] holds for some functional parameter \(\beta\colon\mathscr{T}\to\mathbb{R}^{\mathrm{D}_X}.\) When \(\mathscr{T} = (0,1),\) Zheng, Peng, and He (2015) call the model in equation (4.1) a globally concerned quantile regression model.11 In contrast, when \(\mathscr{T}\) is a countable set (in particular, when it is a singleton), they call (4.1) a locally concerned quantile regression model. Of course, as in classical regression, one needs to assume that the random variables \(X_1,\dots,X_{\mathrm{D}_X}\) are linearly independent12 so that the parameters in (4.1) are identified.
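To make the representation (4.1) concrete, here is a minimal simulation sketch in Python. The coefficient path \(\beta\) below is a hypothetical choice of mine (not from the text), picked so that \(x^\prime\beta(\tau)\) is strictly increasing in \(\tau\) on the support of \(X\); under that monotonicity, drawing \(U\sim\operatorname{Uniform}(0,1)\) and setting \(Y = X^\prime\beta(U)\) produces data whose conditional quantiles obey (4.1).

```python
import numpy as np

# A minimal simulation sketch: beta below is a hypothetical coefficient path.
# For x_2 >= 0 the map tau |-> x'beta(tau) is strictly increasing, so (4.1)
# defines a genuine conditional quantile function, and Y := X'beta(U) with
# U ~ Uniform(0,1) has exactly these conditional quantiles.
rng = np.random.default_rng(0)

def beta(tau):
    # beta_1: logistic quantile path (intercept); beta_2: increasing slope path
    return np.array([np.log(tau / (1 - tau)), 1.0 + tau])

n = 200_000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 2.0, size=n)])   # D_X = 2
U = rng.uniform(size=n)
B = np.column_stack([np.log(U / (1 - U)), 1.0 + U])                # row i is beta(U_i)
Y = (X * B).sum(axis=1)                                            # Y_i = X_i' beta(U_i)

# Sanity check of (4.1) at x = (1, 1) and tau = 0.75: the empirical
# 0.75-quantile of Y in a thin slice around X_2 = 1 should be close to
# x'beta(0.75).
mask = np.abs(X[:, 1] - 1.0) < 0.05
print(np.quantile(Y[mask], 0.75), np.array([1.0, 1.0]) @ beta(0.75))
```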

Example 4.1 If \(\mathscr{T}=\{^1\!/\!_2\},\) then (4.1) is a median regression model and, putting \(\alpha:=\beta(^1\!/\!_2),\) we can write \[ Y = X^\prime \alpha + \varepsilon \] with \(Q_{\varepsilon | X}(^1\!/\!_2|x) = 0\) for all \(x\in \mathscr{X}.\)

In fact, for a locally concerned quantile regression model with \(\mathscr{T} = \{\tau\},\) we can always write \(Y = X^\prime \alpha + \varepsilon\) for some vector of parameters \(\alpha\in\mathbb{R}^{\mathrm{D}_X}\) and a random variable \(\varepsilon\) satisfying \(Q_{\varepsilon | X}(\tau|x) = 0.\) \(\blacksquare\)
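As a quick numerical illustration of the example (a sketch under assumptions of my own: a hypothetical \(\alpha\) and a heteroskedastic, Cauchy-distributed error), the conditional median of \(Y\) given \(X = x\) equals \(x^\prime\alpha\) even though \(Y\) is not integrable:

```python
import numpy as np

# Sketch with a hypothetical alpha.  The error is Cauchy (so Y is not
# integrable) with a scale that depends on X; its conditional median is
# still 0, hence Q_{Y|X}(1/2 | x) = x'alpha.
rng = np.random.default_rng(1)
alpha = np.array([1.0, -2.0])

n = 200_000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])
eps = (0.5 + X[:, 1]) * rng.standard_cauchy(n)   # heteroskedastic, heavy-tailed
Y = X @ alpha + eps

# The conditional median of Y near x = (1, 0.5) should be close to
# x'alpha = 0, even though E[Y | X] does not exist.
mask = np.abs(X[:, 1] - 0.5) < 0.02
print(np.median(Y[mask]), np.array([1.0, 0.5]) @ alpha)
```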

The situation depicted in the above example has many useful applications. For instance, median regression allows one to obtain a robust point forecast \(x'\widehat{\alpha}\) for the response when the conditional distribution of \(Y\) given \(X\) is heavy-tailed (\(Y\) may even fail to be integrable). However, a locally concerned quantile regression model fails to take full advantage of the fact that (conditional) quantile functions completely characterize the corresponding (conditional) distributions. Therefore, in what follows I’ll always have in mind the globally concerned quantile regression model (4.1) with \(\mathscr{T} = (0,1).\) With this, as we have argued earlier, the joint distribution of \(X\) and \(Y\) is entirely encoded in the marginal distribution of \(X\) and the functional parameter \(\beta\): it holds that \[\begin{align} \begin{split} \mathbf{P}[Y\in B, X\in A] &= \int_{A}\int_0^1 \mathbb{I}[x'\beta(\tau)\in B]\,\mathrm{d}\tau\,F_X(\mathrm{d}x)\\ &= \int_0^1 \mathbf{P}[X'\beta(\tau)\in B, X\in A]\,\mathrm{d}\tau \end{split} \tag{4.2} \end{align}\] for every pair of Borel sets \(B\subseteq \mathbb{R}\) and \(A\subseteq \mathbb{R}^{\mathrm{D}_X}.\)
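Equation (4.2) can be checked by simulation. Below is a minimal sketch, reusing the hypothetical \(\beta\) from the earlier simulation and picking (my own) example sets \(A\) and \(B\): the left-hand side is estimated by an empirical frequency, the right-hand side by averaging \(\mathbf{P}[X^\prime\beta(\tau)\in B, X\in A]\) over a fine grid of \(\tau\) values.

```python
import numpy as np

# Numerical check of (4.2) with the hypothetical path
# beta(tau) = (log(tau/(1-tau)), 1 + tau), A = {x : x_2 < 1}, B = (0, 2).
rng = np.random.default_rng(2)

n = 200_000
X2 = rng.uniform(0.0, 2.0, size=n)
U = rng.uniform(size=n)
Y = np.log(U / (1 - U)) + (1.0 + U) * X2        # Y = X'beta(U)

in_A = X2 < 1.0
lhs = np.mean(((Y > 0.0) & (Y < 2.0)) & in_A)   # P[Y in B, X in A]

taus = (np.arange(400) + 0.5) / 400             # midpoint grid on (0, 1)
rhs = 0.0
for t in taus:
    Qt = np.log(t / (1 - t)) + (1.0 + t) * X2   # X'beta(t), one value per draw
    rhs += np.mean(((Qt > 0.0) & (Qt < 2.0)) & in_A)
rhs /= len(taus)

print(lhs, rhs)   # the two estimates should agree up to Monte Carlo error
```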

Nevertheless, in regression models one is usually not interested in the distribution of the covariates: rather, such models aim to quantify uncertainty about \(Y\) given \(X.\) Mean regression describes this uncertainty in terms of the conditional expected value of \(Y\) given \(X\); median regression, in terms of the conditional median of \(Y;\) and so on. The globally concerned quantile regression model, in turn, quantifies uncertainty by specifying the entire conditional distribution of \(Y\) given \(X\) (of course, via the corresponding conditional quantile functions). In this respect, it is not too different from fully parametric generalized linear models, which also specify the conditional distribution of the response given the predictors. Quantile regression can be seen as a different take on how to achieve said specification: indeed, the model (4.1) could be described as non-parametric, since the parameter of interest is infinite-dimensional, although the functional form of \(Q_{Y|X}\) is partially parametrized/constrained by the “linearity in \(x\)” assumption. At any rate, and this is not obvious at first sight, a conditional quantile function of the form (4.1) permits a very flexible structure of dependence between \(Y\) and \(X,\) allowing the covariates to modify not only the mean but also the variance, the skewness, the number of modes, etc., of the response; a small illustration follows the quote below. To sum up, the quantile regression model is a flexible and parsimonious way to specify the structure of dependence between the response and the covariates. As put forth by Koenker (2005),

An attractive feature of quantile regression that has been repeatedly emphasized is that it enables us to look at slices of the conditional distribution without any reliance on global distributional assumptions.
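For a concrete illustration of this flexibility, consider hypothetical coefficient paths (my choice, not from the text) whose slope varies non-linearly in \(\tau\). The sketch below reads conditional spread and a quantile-based skewness measure straight off \(q(\tau, x) = x^\prime\beta(\tau)\): moving \(x_2\) from 0 to 2 inflates the interquartile range and makes the conditional law right-skewed, with no change to the model's linear-in-\(x\) structure.

```python
import numpy as np

# Hypothetical coefficient paths: intercept log(tau/(1-tau)), slope 1 + tau^2.
# Because the slope is non-linear in tau, x_2 changes the spread *and* the
# asymmetry of Y given X, not just its location.
def q(tau, x2):
    return np.log(tau / (1 - tau)) + (1.0 + tau**2) * x2   # x'beta(tau), x = (1, x2)

for x2 in (0.0, 2.0):
    q25, q50, q75 = (q(t, x2) for t in (0.25, 0.50, 0.75))
    iqr = q75 - q25                          # conditional interquartile range
    bowley = (q75 + q25 - 2 * q50) / iqr     # quantile-based skewness measure
    print(f"x2 = {x2}: IQR = {iqr:.3f}, Bowley skewness = {bowley:.3f}")
```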

Another important property of the quantile regression model (4.1) is that the parameter \(\beta\) appears as a solution to an optimization problem. This will be relevant later on when we are dealing with estimation.

Theorem 4.1 If equation (4.1) holds for all \(\tau\in\mathscr{T}\subseteq(0,1)\) and all \(x\in\mathscr{X},\) then:

  1. \(\mathbf{E}\rho_\tau\big(Y - X^\prime \beta(\tau)\big) \le \mathbf{E}\rho_\tau\big(Y - X^\prime b\big)\) for any \(b\in\mathbb{R}^{\mathrm{D}_X}\) and \(\tau\in\mathscr{T}.\)
  2. \(\int_{\mathscr{T}}\mathbf{E}\rho_\tau\big(Y - X^\prime \beta(\tau)\big)\,\mathrm{d}\tau \le \int_{\mathscr{T}}\mathbf{E}\rho_\tau\big(Y - X^\prime b(\tau)\big)\,\mathrm{d}\tau\) for any measurable function \(b\colon(0,1)\to\mathbb{R}^{\mathrm{D}_X},\) provided \(\mathscr{T}\) is a Borel set (this is the case, in particular, if \(\mathscr{T}=(0,1)\)).
  3. If \(\mathbf{B}\) is any \({\mathrm{D}_X}\times \mathrm{M}\) matrix with \(m\)th column \(b_m\in\mathbb{R}^{\mathrm{D}_X}\), and if \(0<\tau_1<\cdots<\tau_{\mathrm{M}}<1,\) then \(\sum_{m=1}^\mathrm{M}\mathbf{E}\rho_{\tau_m}\big(Y - X^\prime\beta(\tau_m)\big)\le \sum_{m=1}^\mathrm{M}\mathbf{E}\rho_{\tau_m}\big(Y - X^\prime b_m\big).\)

Proof. Write \(Q = Q_{Y|X}\) for simplicity, so \(Q(\tau|x) = x^\prime\beta(\tau).\)

For item 1, from the univariate setting we know that, given any \(x\in\mathscr{X},\) one has \[ \mathbf{E}\big\{\rho_\tau\big(Y - Q(\tau|x)\big)\,|\,X = x\big\} \le \mathbf{E}\big\{\rho_\tau\big(Y - y\big)\,|\,X=x\big\} \] for all \(y\in \mathbb{R},\) and this is true in particular when \(y\) is of the form \(y = x^\prime b\) for some \(b\in\mathbb{R}^{\mathrm{D}_X}.\) Thus, by iterated expectations, monotonicity of the Riemann-Stieltjes integral and the substitution principle, \[\begin{align} \mathbf{E}\rho_\tau(Y - X^\prime\beta(\tau)) &= \int \mathbf{E}\big\{\rho_\tau\big(Y - x^\prime\beta(\tau)\big)\,|\,X = x\big\}\, F_X(\mathrm{d}x)\\ &\le \int \mathbf{E}\big\{\rho_\tau\big(Y - x^\prime b\big)\,|\,X = x\big\}\, F_X(\mathrm{d}x)\\ & =\mathbf{E}\rho_\tau(Y - X^\prime b). \end{align}\]

The second item is just a matter of noticing that, if \(b\colon(0,1)\to\mathbb{R}^{\mathrm{D}_X}\) is any measurable function, then item 1 ensures that \(\mathbf{E}\rho_\tau\big(Y - X^\prime \beta(\tau)\big)\) is bounded above by \(\mathbf{E}\rho_\tau\big(Y - X^\prime b(\tau)\big)\) for all \(\tau\in \mathscr{T}.\) Thus, the asserted inequality follows, again by monotonicity of the Riemann-Stieltjes integral. The third item follows by a similar argument.
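Item 1 of Theorem 4.1 already hints at how estimation will proceed: replace the population expectation by a sample average and minimize over \(b.\) As a minimal sketch (not the estimation theory developed later; data-generating choices are the hypothetical ones from the earlier simulations), the empirical problem \(\min_b n^{-1}\sum_i \rho_\tau(Y_i - X_i^\prime b)\) can be cast as a linear program by splitting each residual into its positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

# Sample analogue of Theorem 4.1, item 1: minimize sum_i rho_tau(y_i - x_i'b)
# over b.  Writing each residual r_i = y_i - x_i'b as u_i - v_i with
# u_i, v_i >= 0 gives rho_tau(r_i) = tau*u_i + (1-tau)*v_i, so the problem
# is the linear program
#   min  tau*sum(u) + (1-tau)*sum(v)   s.t.  X b + u - v = y,  u, v >= 0.
def fit_qr(X, y, tau):
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0.0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:d]

# Data simulated from the hypothetical model used in the earlier sketches:
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(0.0, 2.0, size=n)])
U = rng.uniform(size=n)
y = np.log(U / (1 - U)) + (1.0 + U) * X[:, 1]

tau = 0.75
print(fit_qr(X, y, tau))                      # estimate of beta(0.75)
print(np.log(tau / (1 - tau)), 1.0 + tau)     # true beta(0.75)
```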

Exercise 4.1 Prove equation (4.2).

  10. The support of \(X,\) which we shall denote by \(\mathscr{X},\) is defined by the following two conditions: (i) \(\mathscr{X}\) is a closed subset of \(\mathbb{R}^{\mathrm{D}_X}\) satisfying \(\mathbf{P}[X\in\mathscr{X}]=1\); and (ii) if \(A\subseteq\mathbb{R}^{\mathrm{D}_X}\) is closed and \(\mathbf{P}[X\in A] = 1,\) then \(A\supseteq\mathscr{X}.\) That is, \(\mathscr{X}\) is the smallest closed set in \(\mathbb{R}^{\mathrm{D}_X}\) having full \(\mathbf{P}_X\)-probability.↩︎

  11. Actually, they also call the model globally concerned when \(\mathscr{T}\) is an interval.↩︎

  12. This means that no non-zero \(x\in\mathbb{R}^{\mathrm{D}_X}\) satisfies \(x^\prime X = 0\) \(\mathbf{P}\)-almost surely: for every non-zero \(x\in\mathbb{R}^{\mathrm{D}_X},\) the event \[\left\{\omega\in\Omega : x^\prime X(\omega)=0\right\}\] has \(\mathbf{P}\)-probability less than one.↩︎