Joseph K. Blitzstein & Jessica Hwang. (2019). Joint distributions. In *Introduction to Probability* (2nd ed., pp. 304-323). CRC Press.
Joint, Marginal, and Conditional Distributions
Notation

| Symbol | Type | Description |
| --- | --- | --- |
| $X, Y$ | Random variable | Random variables whose distributions are analyzed |
| $F_{X,Y}(x, y)$ | Function | Joint cumulative distribution function (CDF) for $X$ and $Y$ |
| $p_{X,Y}(x, y)$ | Function | Joint probability mass function (PMF) for discrete random variables $X$ and $Y$ |
| $f_{X,Y}(x, y)$ | Function | Joint probability density function (PDF) for continuous random variables $X$ and $Y$ |
| $f_X(x), f_Y(y)$ | Function | Marginal PDF of $X$ and $Y$, respectively |
| $f_{Y \mid X}(y \mid x)$ | Function | Conditional PDF of $Y$ given $X = x$ |
| $\iint_A f_{X,Y}(x, y)\,dx\,dy$ | Operation | Integral of the joint PDF over region $A$ |
| $A$ | Set | A subset of the two-dimensional real space $\mathbb{R}^2$ |
Abbreviations

| Abbreviation | Description |
| --- | --- |
| r.v. | Random variable |
| CDF | Cumulative distribution function |
| PMF | Probability mass function |
| PDF | Probability density function |
| LOTP | Law of total probability |
Discrete
The most general description of the joint distribution of two r.v.s is the joint CDF, which applies to discrete and continuous r.v.s alike.
Joint CDF
Definition: The joint CDF of r.v.s $X$ and $Y$ is the function $F_{X,Y}$ given by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y).$$
The joint CDF of $n$ r.v.s is defined analogously.
For discrete r.v.s, the joint CDF often consists of jumps and flat regions, so we typically work with the joint PMF instead.
Joint PMF
Definition: The joint PMF of discrete r.v.s $X$ and $Y$ is the function $p_{X,Y}$ given by
$$p_{X,Y}(x, y) = P(X = x, Y = y).$$
The joint PMF of $n$ discrete r.v.s is defined analogously.
Just as univariate PMFs must be nonnegative and sum to 1, we require valid joint PMFs to be nonnegative and sum to 1, where the sum is taken over all possible values of $X$ and $Y$:
$$\sum_x \sum_y P(X = x, Y = y) = 1.$$
Marginal PMF
Definition: For discrete r.v.s $X$ and $Y$, the marginal PMF of $X$ is
$$P(X = x) = \sum_y P(X = x, Y = y).$$
The operation of summing over the possible values of $Y$ in order to convert the joint PMF into the marginal PMF of $X$ is known as marginalizing out $Y$.
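To make marginalizing out concrete, here is a minimal Python sketch; the joint PMF table below is hypothetical, made up purely for illustration.

```python
import numpy as np

# Hypothetical joint PMF of X (rows) and Y (columns); the values are
# invented for illustration, but are nonnegative and sum to 1 as required.
joint_pmf = np.array([[0.10, 0.20, 0.10],
                      [0.05, 0.25, 0.30]])
assert np.isclose(joint_pmf.sum(), 1.0)

# Marginalize out Y: sum the joint PMF over all possible values of Y.
marginal_X = joint_pmf.sum(axis=1)  # P(X = x) for each row x
# Marginalize out X to get the marginal PMF of Y.
marginal_Y = joint_pmf.sum(axis=0)  # P(Y = y) for each column y

print(marginal_X)  # [0.4 0.6]
print(marginal_Y)  # [0.15 0.45 0.4]
```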
Conditional PMF
Definition: For discrete r.v.s $X$ and $Y$, the conditional PMF of $Y$ given $X = x$ is
$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.$$
This is viewed as a function of $y$ for fixed $x$.
Independence of discrete r.v.s
Definition: Random variables $X$ and $Y$ are independent if for all $x$ and $y$,
$$F_{X,Y}(x, y) = F_X(x) F_Y(y).$$
If $X$ and $Y$ are discrete, this is equivalent to the condition
$$P(X = x, Y = y) = P(X = x) P(Y = y)$$
for all $x$ and $y$, and it is also equivalent to the condition
$$P(Y = y \mid X = x) = P(Y = y)$$
for all $y$ and all $x$ such that $P(X = x) > 0$.
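Continuing the hypothetical joint PMF table from the sketch above, the conditional PMF and the discrete independence condition can both be checked numerically:

```python
import numpy as np

joint_pmf = np.array([[0.10, 0.20, 0.10],
                      [0.05, 0.25, 0.30]])
marginal_X = joint_pmf.sum(axis=1)
marginal_Y = joint_pmf.sum(axis=0)

# Conditional PMF of Y given X = x: divide row x of the joint PMF by P(X = x).
cond_Y_given_X = joint_pmf / marginal_X[:, None]
assert np.allclose(cond_Y_given_X.sum(axis=1), 1.0)  # each row sums to 1

# X and Y are independent iff the joint PMF equals the outer product
# of the marginal PMFs for all x and y.
print(np.allclose(joint_pmf, np.outer(marginal_X, marginal_Y)))
# False here: e.g. the (0, 0) entry is 0.10, but 0.4 * 0.15 = 0.06
```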
Continuous
Once we have a handle on discrete joint distributions, it isn't much harder to consider continuous joint distributions. We simply make the now-familiar substitutions of integrals for sums and PDFs for PMFs, remembering that the probability of any individual point is now 0.
Formally, in order for $X$ and $Y$ to have a continuous joint distribution, we require that the joint CDF
$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$
be differentiable with respect to $x$ and $y$. The partial derivative with respect to $x$ and $y$ is called the joint PDF. The joint PDF determines the joint distribution, as does the joint CDF.
Joint PDF
Definition: If $X$ and $Y$ are continuous with joint CDF $F_{X,Y}$, their joint PDF is the derivative of the joint CDF with respect to $x$ and $y$:
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y).$$
We require valid joint PDFs to be nonnegative and integrate to 1:
$$f_{X,Y}(x, y) \ge 0, \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1.$$
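As a sketch of this definition (my example, not the book's): two independent Expo(1) r.v.s have joint CDF $F_{X,Y}(x, y) = (1 - e^{-x})(1 - e^{-y})$ for $x, y > 0$, and differentiating once in each variable recovers the joint PDF.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# Joint CDF of two independent Expo(1) r.v.s (valid for x, y > 0).
F = (1 - sp.exp(-x)) * (1 - sp.exp(-y))

# The joint PDF is the mixed partial derivative of the joint CDF.
f = sp.simplify(sp.diff(F, x, y))
print(f)  # exp(-x - y)

# Sanity check: a valid joint PDF integrates to 1 over the support.
print(sp.integrate(f, (x, 0, sp.oo), (y, 0, sp.oo)))  # 1
```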
In the univariate case, the PDF was the function we integrated to get the probability of an interval. Similarly, the joint PDF of two r.v.s is the function we integrate to get the probability of a two-dimensional region. For example,
$$P(X < 3,\ 1 < Y < 4) = \int_1^4 \int_{-\infty}^{3} f_{X,Y}(x, y)\,dx\,dy.$$
For a general region $A \subseteq \mathbb{R}^2$,
$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy.$$
Figure 7.4: A sketch of a joint PDF of two r.v.s.
Figure 7.4 shows a sketch of what a joint PDF of two r.v.s could look like. As usual with continuous r.v.s, we need to keep in mind that the height of the surface at a single point does not represent a probability. The probability of any specific point in the plane is 0. Now that we've gone up a dimension, the probability of any line or curve in the plane is also 0. The only way we can get nonzero probability is by integrating over a region of positive area in the $xy$-plane.
When we integrate the joint PDF over a region , we are calculating the volume under the surface of the joint PDF and above . Thus, probability is represented by volume under the joint PDF. The total volume under a valid joint PDF is 1.
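To see the volume interpretation numerically, here is a hedged sketch using the Expo(1) joint PDF from above and a region of my choosing, the triangle $A = \{(x, y) : x, y > 0,\ x + y < 1\}$:

```python
import numpy as np
from scipy.integrate import dblquad

# Joint PDF of two independent Expo(1) r.v.s, supported on x, y > 0.
f = lambda y, x: np.exp(-x - y)  # dblquad expects the integrand as f(y, x)

# P((X, Y) in A) for A = {x, y > 0, x + y < 1}: integrate x over (0, 1)
# and, for each x, y over (0, 1 - x).
prob, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1 - x)
print(prob)  # about 0.2642, matching the exact answer 1 - 2/e
```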
Marginal PDF
Definition: For continuous r.v.s $X$ and $Y$ with joint PDF $f_{X,Y}$, the marginal PDF of $X$ is
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy.$$
This is the PDF of $X$, viewing $X$ individually rather than jointly with $Y$.
To simplify notation, we have mainly been looking at the joint distribution of two r.v.s rather than $n$ r.v.s, but marginalization works analogously with any number of variables. For example, if we have the joint PDF of $X, Y, Z, W$ but want the joint PDF of $X, W$, we just have to integrate over all possible values of $Y$ and $Z$:
$$f_{X,W}(x, w) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z,W}(x, y, z, w)\,dy\,dz.$$
Conceptually this is easy (just integrate over the unwanted variables to get the joint PDF of the wanted variables), but computing it may or may not be easy.
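As a numerical counterpart (an illustrative joint PDF of my choosing, not from the text): take $f_{X,Y}(x, y) = e^{-y}$ on the wedge $0 < x < y$ and 0 elsewhere; integrating out $y$ should give the Expo(1) marginal $f_X(x) = e^{-x}$.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative joint PDF: f(x, y) = e^{-y} for 0 < x < y, and 0 elsewhere.
def joint_pdf(x, y):
    return np.exp(-y) if 0 < x < y else 0.0

# Marginal PDF of X at a point: integrate the joint PDF over all y.
def marginal_X(x):
    val, _ = quad(lambda y: joint_pdf(x, y), x, np.inf)
    return val

for x in [0.5, 1.0, 2.0]:
    print(marginal_X(x), np.exp(-x))  # the two columns agree: X ~ Expo(1)
```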
Returning to the case of the joint distribution of two r.v.s $X$ and $Y$, let's consider how to update our distribution for $Y$ after observing the value of $X$, using the conditional PDF.
Conditional PDF
Definition: For continuous r.v.s $X$ and $Y$ with joint PDF $f_{X,Y}$, the conditional PDF of $Y$ given $X = x$ is
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
for all $x$ with $f_X(x) > 0$. This is considered as a function of $y$ for fixed $x$. As a convention, in order to make $f_{Y \mid X}(y \mid x)$ well-defined for all real $x$, let $f_{Y \mid X}(y \mid x) = 0$ for all $x$ with $f_X(x) = 0$.
Notation: The subscripts that we place on all the $f$'s are just to remind us that we have three different functions on our plate. We could just as well use three different letters, say $g$ for the joint PDF, $h$ for the marginal PDF of $X$, and $k$ for the conditional PDF of $Y$ given $X$, but that makes it more difficult to remember which letter stands for which function.
Note: We know that $P(X = x) = 0$ for a continuous r.v. $X$, by the definition of PDF. So how can we speak of conditioning on $X = x$ if its probability is 0? Rigorously speaking, we are actually conditioning on the event that $X$ falls within a small interval containing $x$, say $X \in (x - \epsilon, x + \epsilon)$, and then taking a limit as $\epsilon$ approaches 0 from the right. We will not fuss over this technicality; fortunately, many important results such as Bayes' rule work in the continuous case exactly as one would hope.
Continuous form of Bayes' rule and LOTP
Theorem: For continuous r.v.s $X$ and $Y$, we have the following continuous form of Bayes' rule:
$$f_{Y \mid X}(y \mid x) = \frac{f_{X \mid Y}(x \mid y) f_Y(y)}{f_X(x)}, \quad \text{for } f_X(x) > 0.$$
And we have the following continuous form of the law of total probability:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) f_Y(y)\,dy.$$
Proof. By definition of conditional PDFs, we have
$$f_{Y \mid X}(y \mid x) f_X(x) = f_{X,Y}(x, y) = f_{X \mid Y}(x \mid y) f_Y(y).$$
The continuous version of Bayes' rule follows immediately from dividing by $f_X(x)$. The continuous version of LOTP follows immediately from integrating with respect to $y$:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) f_Y(y)\,dy.$$
Out of curiosity, let's see what would have happened if we had plugged in the other expression for $f_{X,Y}(x, y)$ instead in the proof of LOTP:
$$f_X(x) = \int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x) f_X(x)\,dy = f_X(x) \int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x)\,dy.$$
This just says that, for any $x$ with $f_X(x) > 0$,
$$\int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x)\,dy = 1,$$
confirming the fact that conditional PDFs must integrate to 1.
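Both results can be sanity-checked numerically. In the sketch below (a model of my choosing, not the book's), $Y \sim \mathcal{N}(0, 1)$ and, given $Y = y$, $X \sim \mathcal{N}(y, 1)$; LOTP then yields the $\mathcal{N}(0, 2)$ marginal for $X$, and the conditional PDF built from Bayes' rule integrates to 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative model: Y ~ N(0, 1) and, given Y = y, X ~ N(y, 1).
f_Y = lambda y: norm.pdf(y)
f_X_given_Y = lambda x, y: norm.pdf(x, loc=y, scale=1)

# Continuous LOTP: f_X(x) = integral of f_{X|Y}(x|y) f_Y(y) dy.
def f_X(x):
    val, _ = quad(lambda y: f_X_given_Y(x, y) * f_Y(y), -np.inf, np.inf)
    return val

x0 = 0.7
print(f_X(x0), norm.pdf(x0, scale=np.sqrt(2)))  # agree: X ~ N(0, 2)

# Continuous Bayes' rule: f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / f_X(x).
# As a check, this conditional PDF integrates to 1 for the fixed x0.
fx0 = f_X(x0)
total, _ = quad(lambda y: f_X_given_Y(x0, y) * f_Y(y) / fx0, -np.inf, np.inf)
print(total)  # 1.0
```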
Independence of continuous r.v.s
Definition: Random variables $X$ and $Y$ are independent if for all $x$ and $y$,
$$F_{X,Y}(x, y) = F_X(x) F_Y(y).$$
If $X$ and $Y$ are continuous with joint PDF $f_{X,Y}$, this is equivalent to the condition
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$
for all $x$ and $y$, and it is also equivalent to the condition
$$f_{Y \mid X}(y \mid x) = f_Y(y)$$
for all $y$ and all $x$ such that $f_X(x) > 0$.
Here is an important proposition that gives a convenient criterion for the independence of two r.v.s.
Independence and joint PDF factorization
Proposition: Suppose that the joint PDF $f_{X,Y}$ of $X$ and $Y$ factors as
$$f_{X,Y}(x, y) = g(x) h(y)$$
for all $x$ and $y$, where $g$ and $h$ are nonnegative functions. Then $X$ and $Y$ are independent. Also, if either $g$ or $h$ is a valid PDF, then the other one is a valid PDF too and $g$ and $h$ are the marginal PDFs of $X$ and $Y$, respectively. (The analogous result in the discrete case also holds.)
Proof. Let $c = \int_{-\infty}^{\infty} h(y)\,dy$. Multiplying and dividing by $c$, we can write
$$f_{X,Y}(x, y) = c\,g(x) \cdot \frac{h(y)}{c}.$$
(The point of this is that $h(y)/c$ is a valid PDF.) Then the marginal PDF of $X$ is
$$f_X(x) = \int_{-\infty}^{\infty} c\,g(x) \frac{h(y)}{c}\,dy = c\,g(x).$$
It follows that $\int_{-\infty}^{\infty} g(x)\,dx = 1/c$, since a marginal PDF is a valid PDF (knowing the integral of $h$ gave us the integral of $g$ for free!). Then the marginal PDF of $Y$ is
$$f_Y(y) = \int_{-\infty}^{\infty} g(x) h(y)\,dx = \frac{h(y)}{c}.$$
Thus, $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, so $X$ and $Y$ are independent with PDFs $c\,g(x)$ and $h(y)/c$, respectively. If $g$ or $h$ is already a valid PDF, then $c = 1$, so the other one is also a valid PDF.
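Here is a small numerical sketch of the proposition, with factors of my choosing: $g(x) = e^{-2x}$ and $h(y) = 6e^{-3y}$ on the positive half-lines. Neither is a valid PDF by itself, but their product is a valid joint PDF, and the proof's construction recovers the Expo(2) and Expo(3) marginals.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative nonnegative factors (supported on x > 0 and y > 0):
g = lambda x: np.exp(-2 * x)       # integrates to 1/2
h = lambda y: 6 * np.exp(-3 * y)   # integrates to 2

c, _ = quad(h, 0, np.inf)
print(c)  # 2.0

# As in the proof: the marginals are c*g(x) and h(y)/c.
f_X = lambda x: c * g(x)   # 2 e^{-2x}, the Expo(2) PDF
f_Y = lambda y: h(y) / c   # 3 e^{-3y}, the Expo(3) PDF
print(quad(f_X, 0, np.inf)[0], quad(f_Y, 0, np.inf)[0])  # 1.0 1.0

# Independence: the joint PDF g(x) h(y) equals f_X(x) f_Y(y) everywhere.
x0, y0 = 0.4, 1.3
print(np.isclose(g(x0) * h(y0), f_X(x0) * f_Y(y0)))  # True
```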
Note: In the above proposition, we need the joint PDF to factor as a function of $x$ times a function of $y$ for all $(x, y)$ in the plane $\mathbb{R}^2$, not just for $(x, y)$ with $f_{X,Y}(x, y) > 0$. The reason for this is illustrated in the next example.
A simple case of a continuous joint distribution is when the joint PDF is constant over some region in the plane. In the following example, we'll compare a joint PDF that is constant on a square to a joint PDF that is constant on a disk.
Example: Uniform on a region in the plane
Let $(X, Y)$ be a completely random point in the square $\{(x, y) : x, y \in [0, 1]\}$, in the sense that the joint PDF of $X$ and $Y$ is constant over the square and 0 outside of it:
$$f_{X,Y}(x, y) = \begin{cases} 1 & \text{if } x, y \in [0, 1], \\ 0 & \text{otherwise.} \end{cases}$$
The constant 1 is chosen so that the joint PDF will integrate to 1. This distribution is called the Uniform distribution on the square.
Intuitively, it makes sense that $X$ and $Y$ should be Unif$(0, 1)$ marginally. We can check this by computing
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \int_0^1 1\,dy = 1 \quad \text{for } x \in [0, 1],$$
and similarly for $f_Y$. Furthermore, $X$ and $Y$ are independent, since the joint PDF factors into the product of the marginal PDFs (this just reduces to $1 = 1 \cdot 1$, but it's important to note that the value of $X$ does not constrain the possible values of $Y$). So the conditional distribution of $Y$ given $X = x$ is Unif$(0, 1)$, regardless of $x$.
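A quick simulation sketch of the Uniform distribution on the square (the simulation design is mine, not from the text): the marginal histogram of $X$ is flat, and slicing on a value of $X$ leaves the distribution of $Y$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)

# A completely random point in the unit square: X, Y independent Unif(0,1).
n = 10**6
x, y = rng.random(n), rng.random(n)

# Marginal check: the density histogram of X should be flat at height 1.
counts, _ = np.histogram(x, bins=10, range=(0, 1), density=True)
print(counts.round(2))  # all entries close to 1

# Independence check: conditioning on X in a thin slice should not change Y.
y_slice = y[(0.2 < x) & (x < 0.3)]
print(y_slice.mean().round(3), y.mean().round(3))  # both close to 0.5
```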
Now let $(X, Y)$ be a completely random point in the unit disk $\{(x, y) : x^2 + y^2 \le 1\}$, with joint PDF
$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{\pi} & \text{if } x^2 + y^2 \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
Again, the constant $1/\pi$ is chosen to make the joint PDF integrate to 1; the value follows from the fact that the integral of 1 over some region in the plane is the area of that region, and the unit disk has area $\pi$.
Note that $X$ and $Y$ are not independent, since in general, knowing the value of $X$ constrains the possible values of $Y$: larger values of $|X|$ restrict $Y$ to be in a smaller range. It would be a misuse of the previous proposition to conclude independence from the fact that $f_{X,Y}(x, y) = \frac{1}{\pi} = g(x) h(y)$ for all $(x, y)$ in the disk, where $g$ and $h$ are constant functions: the factorization fails outside the disk, where the joint PDF is 0. To see from the definition that $X$ and $Y$ are not independent, note that, for example, $f_{X,Y}(0.9, 0.9) = 0$ since $(0.9, 0.9)$ is not in the unit disk, but $f_X(0.9) f_Y(0.9) > 0$ since 0.9 is in the supports of both $X$ and $Y$.
The marginal distribution of $X$ is now
$$f_X(x) = \int_{-\sqrt{1 - x^2}}^{\sqrt{1 - x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi} \sqrt{1 - x^2}, \quad -1 \le x \le 1.$$
By symmetry, $f_Y(y) = \frac{2}{\pi} \sqrt{1 - y^2}$ for $-1 \le y \le 1$. Note that the marginal distributions of $X$ and $Y$ are not Uniform on $[-1, 1]$; rather, $X$ and $Y$ are more likely to fall near 0 than near $\pm 1$.
Figure 7.6: The effect of observing $X = x$ on the range of $Y$, for the Uniform distribution on the unit disk.
Suppose we observe $X = x$. As illustrated in Figure 7.6, this constrains $Y$ to lie in the interval $[-\sqrt{1 - x^2}, \sqrt{1 - x^2}]$. Specifically, the conditional distribution of $Y$ given $X = x$ is
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{1/\pi}{\frac{2}{\pi} \sqrt{1 - x^2}} = \frac{1}{2\sqrt{1 - x^2}}$$
for $-\sqrt{1 - x^2} \le y \le \sqrt{1 - x^2}$, and 0 otherwise. This conditional PDF is constant as a function of $y$, which tells us that the conditional distribution of $Y$ given $X = x$ is Uniform on the interval $[-\sqrt{1 - x^2}, \sqrt{1 - x^2}]$. The fact that the conditional PDF is not free of $x$ confirms the fact that $X$ and $Y$ are not independent.
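To close with a simulation sketch (my construction, not the book's): rejection sampling from the square gives Uniform points on the disk, whose empirical marginal matches $\frac{2}{\pi}\sqrt{1 - x^2}$ and whose conditional slices look Uniform.

```python
import numpy as np

rng = np.random.default_rng(7)

# Uniform points on the unit disk by rejection: sample uniformly from
# [-1, 1]^2 and keep the points that land inside the disk.
n = 10**6
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
keep = x**2 + y**2 <= 1
x, y = x[keep], y[keep]

# Marginal check: estimate f_X near a few points and compare with
# (2/pi) * sqrt(1 - x^2).
for m in (-0.8, 0.0, 0.8):
    est = np.mean(np.abs(x - m) < 0.01) / 0.02
    print(round(est, 3), round(2 / np.pi * np.sqrt(1 - m**2), 3))

# Conditional check: given X near 0.5, Y should be Uniform on [-b, b]
# with b = sqrt(1 - 0.25), so its standard deviation should be b/sqrt(3).
b = np.sqrt(1 - 0.5**2)
y_slice = y[np.abs(x - 0.5) < 0.01]
print(round(y_slice.std(), 3), round(b / np.sqrt(3), 3))
```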