A copula is a function used in probability theory and multivariate statistics to describe the dependence structure between random variables. It is particularly useful for complex, high-dimensional data where the relationships between variables are not easily described by traditional methods. Copulas have applications in fields such as finance, economics, and environmental science [1,2], among others.
Copulas are used to model the dependence or association between multiple random variables. In many cases, it’s essential to understand how variables co-move or interact with each other, and copulas provide a way to quantify and analyze this dependence.
Copulas separate the modeling of the marginal distributions from the modeling of the dependence between them. This means you can specify the individual probability distribution of each variable (its marginal distribution) separately from how the variables are related to each other.
Copulas allow for the modeling of various types of dependence structures, including positive or negative association, asymmetric dependence, and tail dependence. This flexibility is particularly valuable when dealing with real-world data, which often exhibits complex patterns of dependence.
In short, copulas let statisticians and data scientists capture and analyze complex dependence between random variables while remaining free to work with different types of marginal distributions. This separation of the marginal distributions from the dependence structure is the central idea of copula theory.
Copula functions offer an alternative way to build a joint distribution when the assumption of normality is not met. When the marginal variables are not normally distributed, a copula still makes it possible to construct a joint distribution, even when each variable follows a different distribution. A copula acts as a connector: it builds a multivariate joint distribution function from the univariate marginal distribution functions. From another perspective, a copula is itself a multivariate distribution function whose margins are uniformly distributed on $[0,1]$.
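The "connector" idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a Gaussian copula (one possible choice, not discussed above), and the function names `sample_gaussian_copula`, `exponential_ppf`, and `uniform_ppf`, the correlation 0.7, and the two marginals are all hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u, v) from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    pairs = []
    for _ in range(n):
        # Build two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a Uniform(0, 1) margin.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def exponential_ppf(u, rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate


def uniform_ppf(u, low, high):
    """Inverse CDF of a uniform distribution on [low, high]."""
    return low + (high - low) * u


# Uniform margins on [0, 1] carrying only the dependence structure.
uv = sample_gaussian_copula(n=5000, rho=0.7)

# Attach two different, non-normal marginals to the same dependence structure.
xy = [(exponential_ppf(u, rate=2.0), uniform_ppf(v, 10.0, 20.0)) for u, v in uv]
```

The pairs in `uv` live on the unit square, while `xy` has an exponential first coordinate and a uniform second coordinate; swapping either inverse CDF for another one changes the margins without touching the dependence.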
Before defining copulas, we first define subcopulas as a certain class of grounded 2-increasing functions with margins; then we define copulas as subcopulas with domain $I^2$ where $I = [0,1]$.
Definition 1: A two-dimensional subcopula (or 2-subcopula, or briefly, a subcopula) is a function $C'$ with the following properties:
- Domain $C'=S_1\times S_2$, where $S_1$ and $S_2$ are subsets of $I$ containing 0 and 1;
- $C'$ is grounded and 2-increasing;
- For every $u$ in $S_1$ and every $v$ in $S_2$, \(\begin{align} C'(u,1)=u \textrm{ and } C'(1,v)=v \end{align}\)
Note that for every $(u,v)$ in Domain $C'$, $0\le C'(u,v)\le1$, so that Range $C'$ is also a subset of $I$.
Definition 2: A two-dimensional copula (or 2-copula, or briefly, a copula) is a 2-subcopula $C$ whose domain is $I^2$.
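Equivalently, a copula is a function $C$ from $I^2$ to $I$ with the following properties:
- For every $u,v$ in $I$, $C(u,0)=0=C(0,v)$ (that is, $C$ is grounded), and $C(u,1)=u$ and $C(1,v)=v$;
- For every $u_1,u_2,v_1,v_2$ in $I$ such that $u_1\le u_2$ and $v_1\le v_2$, $C(u_2,v_2)-C(u_2,v_1)-C(u_1,v_2)+C(u_1,v_1)\ge 0$ (that is, $C$ is 2-increasing).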
Proposition 1: The horizontal, vertical, and diagonal sections of a copula $C$ are all nondecreasing and uniformly continuous on $I$.
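The defining conditions of a copula can be checked numerically on a grid. The sketch below tests three things for a candidate function: groundedness ($C(u,0)=0=C(0,v)$), uniform margins ($C(u,1)=u$, $C(1,v)=v$), and the 2-increasing property (every grid rectangle has nonnegative $C$-volume). The helper name `looks_like_copula` and the three candidate functions are assumptions made for illustration. Passing a finite grid check is only a necessary condition, not a proof.

```python
from itertools import product


def product_copula(u, v):
    """Independence copula: C(u, v) = u * v."""
    return u * v


def min_copula(u, v):
    """Upper Frechet-Hoeffding bound: C(u, v) = min(u, v)."""
    return min(u, v)


def not_a_copula(u, v):
    """max(u, v) is not grounded, so it cannot be a copula."""
    return max(u, v)


def looks_like_copula(c, steps=20, tol=1e-12):
    """Check the copula conditions for c on a (steps + 1) x (steps + 1) grid."""
    grid = [i / steps for i in range(steps + 1)]

    for t in grid:
        # Grounded: C(t, 0) = 0 = C(0, t).
        if abs(c(t, 0.0)) > tol or abs(c(0.0, t)) > tol:
            return False
        # Uniform margins: C(t, 1) = t and C(1, t) = t.
        if abs(c(t, 1.0) - t) > tol or abs(c(1.0, t) - t) > tol:
            return False

    # 2-increasing: the C-volume of every grid cell is nonnegative.
    # Volumes of larger grid rectangles are sums of these cell volumes.
    for i, j in product(range(steps), repeat=2):
        u1, u2 = grid[i], grid[i + 1]
        v1, v2 = grid[j], grid[j + 1]
        volume = c(u2, v2) - c(u2, v1) - c(u1, v2) + c(u1, v1)
        if volume < -tol:
            return False
    return True


results = {
    "product": looks_like_copula(product_copula),
    "min": looks_like_copula(min_copula),
    "max": looks_like_copula(not_a_copula),
}
```

Here `results["product"]` and `results["min"]` are true, while `results["max"]` is false because `max(u, 0)` equals `u` rather than 0.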
For more details on the definition and basic properties of copulas, see the textbook [3].
Sklar's theorem is central to the theory of copulas and is the foundation of many, if not most, of the applications of that theory to statistics. It elucidates the role that copulas play in the relationship between multivariate distribution functions and their univariate margins. We therefore begin this section with a short discussion of distribution functions.
Definition 3: A distribution function is a function $F$ with domain $\bar{R}$ such that
- $F$ is non-decreasing,
- $F(-\infty) = 0$ and $F(\infty) = 1$.
Definition 4: A joint distribution function is a function $F_{XY}$ with domain $\bar{R}^2$ such that
- $F_{XY}$ is 2-increasing,
- $F_{XY}(x,-\infty)=F_{XY}(-\infty,y)=0$, and $F_{XY}(\infty,\infty)=1$.
Thus $F_{XY}$ is grounded, and because Domain $F_{XY} = \bar{R}^2$, $F_{XY}$ has margins $F_X$ and $F_Y$ given by $F_X(x) = F_{XY}(x,\infty)$ and $F_Y(y) = F_{XY}(\infty,y)$. Because a grounded 2-increasing function is nondecreasing in each argument, $F_X$ and $F_Y$ are distribution functions.
Theorem 1 (Sklar's theorem): Let $F_{XY}$ be a joint distribution function with margins $F_X$ and $F_Y$. Then there exists a copula $C$ such that for all $x,y$ in $\bar{R}$, \(\begin{align} F_{XY}(x,y) = C(F_X(x),F_Y(y)) \end{align}\)
This theorem first appeared in [4]. The name “copula” was chosen to emphasize the manner in which a copula “couples” a joint distribution function to its univariate margins.
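Sklar's theorem can be checked numerically on a concrete case. The sketch below uses an illustrative joint distribution function on $[-1,1]\times[0,\infty)$, chosen here only because its margins and copula have closed forms: $F_{XY}(x,y) = (x+1)(e^y-1)/(x+2e^y-1)$, with margins $F_X(x)=(x+1)/2$ and $F_Y(y)=1-e^{-y}$, and copula $C(u,v)=uv/(u+v-uv)$. The function names are hypothetical, and the code only evaluates the identity on a small grid rather than proving it.

```python
import math


def joint_cdf(x, y):
    """F_XY(x, y) = (x + 1)(e^y - 1) / (x + 2e^y - 1) for -1 <= x <= 1, y >= 0."""
    if x <= -1.0 or y <= 0.0:
        return 0.0  # lower boundary of the support, where the CDF is zero
    return (x + 1.0) * (math.exp(y) - 1.0) / (x + 2.0 * math.exp(y) - 1.0)


def margin_x(x):
    """F_X(x) = limit of F_XY(x, y) as y grows = (x + 1) / 2."""
    return (x + 1.0) / 2.0


def margin_y(y):
    """F_Y(y) = F_XY(1, y) = 1 - e^(-y)."""
    return 1.0 - math.exp(-y)


def copula(u, v):
    """C(u, v) = uv / (u + v - uv), with C = 0 when u or v is 0."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return u * v / (u + v - u * v)


# Evaluate both sides of F_XY(x, y) = C(F_X(x), F_Y(y)) on a grid.
xs = [-1.0 + 0.25 * i for i in range(9)]  # -1.0, -0.75, ..., 1.0
ys = [0.5 * j for j in range(11)]         # 0.0, 0.5, ..., 5.0
max_gap = max(
    abs(joint_cdf(x, y) - copula(margin_x(x), margin_y(y)))
    for x in xs
    for y in ys
)
```

On this grid `max_gap` is at the level of floating-point rounding, which is what the theorem predicts: the joint distribution function is recovered by plugging the margins into the copula.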
References
[1] Najib, M. K., Nurdiati, S., & Sopaheluwakan, A. (2022). Copula-based joint distribution analysis of the ENSO effect on the drought indicators over Borneo fire-prone areas. Modeling Earth Systems and Environment, 8(2), 2817-2826.
[2] Najib, M. K., Nurdiati, S., & Sopaheluwakan, A. (2022). Multivariate fire risk models using copula regression in Kalimantan, Indonesia. Natural Hazards, 113(2), 1263-1283.
[3] Nelsen, R. B. (2006). An introduction to copulas. Springer.
[4] Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229-231.
© 2021-2023 Mohamad Khoirun Najib