# Chapter 3 Modeling Loss Severity

Chapter Preview. The traditional loss distribution approach to modeling starts by separately fitting a frequency distribution to the number of losses and a severity distribution to the size of losses. The estimated aggregate loss distribution combines the loss frequency distribution and the loss severity distribution by convolution. Discrete distributions often referred to as counting or frequency distributions were used in Chapter 2 to describe the number of events such as number of accidents to the driver or number of claims to the insurer. Lifetimes, asset values, losses and claim sizes are usually modeled as continuous random variables and as such are modeled using continuous distributions, often referred to as loss or severity distributions. A is a weighted combination of simpler distributions that is used to model phenomenon investigated in a heterogeneous population, such as modelling more than one type of claims in (small frequent claims and large relatively rare claims). In this chapter we explore the use of continuous as well as mixture distributions to model the random size of loss. We present key attributes that characterize continuous models and means of creating new distributions from existing ones. We also explore the effect of coverage modifications, which change the conditions that trigger a payment, such as applying deductibles, limits, or adjusting for inflation, on the distribution of individual loss amounts. The frequency distributions from Chapter 2 will be combined with the ideas from this chapter to describe the aggregate losses over the whole portfolio in Chapter 5.

## 3.1 Basic Distributional Quantities

In this section, you learn how to define some basic distributional quantities:

• moments,
• percentiles, and
• generating functions.

### 3.1.1 Moments

Let $X$ be a with probability density function $f_{X}\left( x \right)$. The k-th of $X$, denoted by $\mu_{k}^{\prime}$, is the of the k-th power of $X$, provided it exists. The first raw moment $\mu_{1}^{\prime}$ is the mean of $X$ usually denoted by $\mu$. The formula for $\mu_{k}^{\prime}$ is given as $\mu_{k}^{\prime} = \mathrm{E}\left( X^{k} \right) = \int_{0}^{\infty}{x^{k}f_{X}\left( x \right)dx } .$ The support of the random variable $X$ is assumed to be nonnegative since actuarial phenomena are rarely negative. An easy integration by parts shows that the raw moments for nonnegative variables can also be computed using $\mu_{k}^{\prime} = \int_{0}^{\infty}{k~x^{k-1}\left[1- F_{X}(x) \right]dx },$ that is based on the survival function, denoted as $S_X(x) = 1-F_{X}(x)$. This formula is particularly useful when $k=1$.

The k-th of $X$, denoted by $\mu_{k}$, is the expected value of the k-th power of the deviation of $X$ from its mean $\mu$. The formula for $\mu_{k}$ is given as $\mu_{k} = \mathrm{E}\left\lbrack {(X - \mu)}^{k} \right\rbrack = \int_{0}^{\infty}{\left( x - \mu \right)^{k}f_{X}\left( x \right) dx }.$ The second central moment $\mu_{2}$ defines the of $X$, denoted by $\sigma^{2}$. The square root of the variance is the $\sigma$.

From a classical perspective, further characterization of the shape of the distribution includes its degree of symmetry as well as its flatness compared to the normal distribution. The ratio of the third central moment to the cube of the standard deviation $\left( \mu_{3} / \sigma^{3} \right)$ defines the coefficient of which is a measure of symmetry. A positive coefficient of skewness indicates that the distribution is skewed to the right (positively skewed). The ratio of the fourth central moment to the fourth power of the standard deviation $\left(\mu_{4} / \sigma^{4} \right)$ defines the coefficient of . The normal distribution has a coefficient of kurtosis of 3. Distributions with a coefficient of kurtosis greater than 3 have heavier tails and higher peak than the normal, whereas distributions with a coefficient of kurtosis less than 3 have lighter tails and are flatter. Section 10.2 describes the tails of distributions from an insurance and actuarial perspective.

Example 3.1.1. Actuarial Exam Question. Assume that the $X$ has a gamma distribution with mean 8 and skewness 1. Find the variance of $X$. (Hint: The gamma distribution is reviewed in Section 3.2.1.)

### 3.1.2 Quantiles

Quantiles can also be used to describe the characteristics of the distribution of $X$. When the distribution of $X$ is continuous, for a given fraction $0 \leq p \leq 1$ the corresponding quantile is the solution of the equation $F_{X}\left( \pi_{p} \right) = p .$ For example, the middle point of the distribution, $\pi_{0.5}$, is the . A is a type of quantile; a $100p$ percentile is the number such that $100 \times p$ percent of the data is below it.

Example 3.1.1. Actuarial Exam Question. Let $X$ be a continuous random variable with density function $f_{X}\left( x \right) = \theta e^{- \theta x}$, for $x > 0$ and 0 elsewhere. If the median of this distribution is $\frac{1}{3}$, find $\theta$.

##### Show Example Solution

Section 4.1.1.3 will extend the definition of quantiles to include distributions that are discrete, continuous, or a hybrid combination.

### 3.1.3 Moment Generating Function

The moment generating function, denoted by $M_{X}(t)$ uniquely characterizes the distribution of $X$. While it is possible for two different distributions to have the same moments and yet still differ, this is not the case with the moment generating function. That is, if two random variables have the same moment generating function, then they have the same distribution. The moment generating function is given by $M_{X}(t) = \mathrm{E}\left( e^{tX} \right) = \int_{0}^{\infty}{e^{\text{tx}}f_{X}\left( x \right) dx }$ for all $t$ for which the expected value exists. The moment generating is a real function whose k-th derivative at zero is equal to the k-th raw moment of $X$. In symbols, this is $\left.\frac{d^k}{dt^k} M_{X}(t)\right|_{t=0} = \mathrm{E}\left( X^{k} \right) .$

Example 3.1.3. Actuarial Exam Question. The random variable $X$ has an exponential distribution with mean $\frac{1}{b}$. It is found that $M_{X}\left( - b^{2} \right) = 0.2$. Find $b$. (Hint: The exponential is a special case of the gamma distribution which is reviewed in Section 3.2.1.)

##### Show Example Solution

Example 3.1.4. Actuarial Exam Question. Let $X_{1}, \ldots, X_{n}$ be random variables, where $X_i$ has a gamma distribution with parameters $\alpha_{i}$ and $\theta$. Find the distribution of $S = \sum_{i = 1}^{n}X_{i}$, the mean $\mathrm{E}(S)$, and the variance $\mathrm{Var}(S)$.

##### Show Example Solution

One can also use the moment generating function to compute the probability generating function

$P_{X}(z) = \mathrm{E}\left( z^{X} \right) = M_{X}\left( \ln z \right) .$

As introduced in Section 2.2.2, the probability generating function is more useful for discrete rvs.

## 3.2 Continuous Distributions for Modeling Loss Severity

In this section, you learn how to define and apply four fundamental severity distributions:

• gamma,
• Pareto,
• Weibull, and
• generalized beta distribution of the second kind.

### 3.2.1 Gamma Distribution

Recall that the traditional approach in modelling losses is to fit separate models for claim frequency and claim severity. When frequency and severity are modeled separately it is common for actuaries to use the Poisson distribution (introduced in Section 2.2.3.2) for claim count and the gamma distribution to model severity. An alternative approach for modelling losses that has recently gained popularity is to create a single model for pure premium (average claim cost) that will be described in Chapter 4.

The continuous variable $X$ is said to have the gamma distribution with shape parameter $\alpha$ and scale parameter $\theta$ if its probability density function is given by $f_{X}\left( x \right) = \frac{\left( x/ \theta \right)^{\alpha}}{x~ \Gamma\left( \alpha \right)}\exp \left( -x/ \theta \right) \ \ \ \text{for } x > 0 .$ Note that $\alpha > 0,\ \theta > 0$.

The two panels in Figure 3.1 demonstrate the effect of the scale and shape parameters on the gamma density function.

##### R Code for Gamma Density Plots

When $\alpha = 1$ the gamma reduces to an and when $\alpha = \frac{n}{2}$ and $\theta = 2$ the gamma reduces to a with $n$ degrees of freedom. As we will see in Section 15.4, the chi-square distribution is used extensively in statistical hypothesis testing.

The distribution function of the gamma model is the incomplete gamma function, denoted by $\Gamma\left(\alpha; \frac{x}{\theta} \right)$, and defined as $F_{X}\left( x \right) = \Gamma\left( \alpha; \frac{x}{\theta} \right) = \frac{1}{\Gamma\left( \alpha \right)}\int_{0}^{x /\theta}t^{\alpha - 1}e^{- t}~dt$ $\alpha > 0,\ \theta > 0$. For an integer $\alpha$, it can be written as $\Gamma\left( \alpha; \frac{x}{\theta} \right) = 1 - e^{-x/\theta}\sum_{k = 0}^{\alpha-1}\frac{(x/\theta)^k}{k!}$.

The $k$-th moment of the gamma distributed random variable for any positive $k$ is given by $\mathrm{E}\left( X^{k} \right) = \theta^{k} \frac{\Gamma\left( \alpha + k \right)}{\Gamma\left( \alpha \right)} .$ The mean and variance are given by $\mathrm{E}\left( X \right) = \alpha\theta$ and $\mathrm{Var}\left( X \right) = \alpha\theta^{2}$, respectively.

Since all moments exist for any positive $k$, the gamma distribution is considered a , which may not be suitable for modeling risky assets as it will not provide a realistic assessment of the likelihood of severe losses.

### 3.2.2 Pareto Distribution

The , named after the Italian economist Vilfredo Pareto (1843-1923), has many economic and financial applications. It is a positively skewed and heavy-tailed distribution which makes it suitable for modeling income, high-risk insurance claims and severity of large casualty losses. The survival function of the Pareto distribution which decays slowly to zero was first used to describe the distribution of income where a small percentage of the population holds a large proportion of the total wealth. For extreme insurance claims, the tail of the severity distribution (losses in excess of a threshold) can be modeled using a Generalized Pareto distribution.

The continuous variable $X$ is said to have the Pareto distribution with shape parameter $\alpha$ and scale parameter $\theta$ if its is given by $f_{X}\left( x \right) = \frac{\alpha\theta^{\alpha}}{\left( x + \theta \right)^{\alpha + 1}} \ \ \ x > 0,\ \alpha > 0,\ \theta > 0.$ The two panels in Figure 3.2 demonstrate the effect of the scale and shape parameters on the Pareto density function.

##### R Code for Pareto Density Plots

The distribution function of the Pareto distribution is given by $F_{X}\left( x \right) = 1 - \left( \frac{\theta}{x + \theta} \right)^{\alpha} \ \ \ x > 0,\ \alpha > 0,\ \theta > 0.$ It can be easily seen that the of the Pareto distribution is a decreasing function in $x$, another indication that the distribution is heavy tailed. When the hazard function decreases over time the population dies off at a decreasing rate resulting in a heavier tail for the distribution. The hazard function reveals information about the tail distribution and is often used to model data distributions in survival analysis. The hazard function is defined as the instantaneous potential that the event of interest occurs within a very narrow time frame.

The $k$-th moment of the Pareto distributed random variable exists, if and only if, $\alpha > k$. If $k$ is a positive integer then $\mathrm{E}\left( X^{k} \right) = \frac{\theta^{k}~ k!}{\left( \alpha - 1 \right)\cdots\left( \alpha - k \right)} \ \ \ \alpha > k.$ The mean and variance are given by $\mathrm{E}\left( X \right) = \frac{\theta}{\alpha - 1} \ \ \ \text{for } \alpha > 1$ and $\mathrm{Var}\left( X \right) = \frac{\alpha\theta^{2}}{\left( \alpha - 1 \right)^{2}\left( \alpha - 2 \right)} \ \ \ \text{for } \alpha > 2,$respectively.

Example 3.2.1. The claim size of an insurance portfolio follows the Pareto distribution with mean and variance of 40 and 1800 respectively. Find
1. The shape and scale parameters.
2. The 95-th percentile of this distribution.

### 3.2.3 Weibull Distribution

The , named after the Swedish physicist Waloddi Weibull (1887-1979) is widely used in reliability, life data analysis, weather forecasts and general insurance claims. Truncated data arise frequently in insurance studies. The Weibull distribution has been used to model excess of loss treaty over automobile insurance as well as earthquake inter-arrival times.

The continuous variable $X$ is said to have the Weibull distribution with shape parameter $\alpha$ and scale parameter $\theta$ if its probability density function is given by $f_{X}\left( x \right) = \frac{\alpha}{\theta}\left( \frac{x}{\theta} \right)^{\alpha - 1} \exp \left(- \left( \frac{x}{\theta} \right)^{\alpha}\right) \ \ \ x > 0,\ \alpha > 0,\ \theta > 0.$ The two panels Figure 3.3 demonstrate the effects of the scale and shape parameters on the Weibull density function.

##### R Code for Weibull Density Plots

The distribution function of the Weibull distribution is given by $F_{X}\left( x \right) = 1 - e^{- \left( x / \theta \right)^{\alpha}} \ \ \ x > 0,\ \alpha > 0,\ \theta > 0.$

It can be easily seen that the shape parameter $\alpha$ describes the shape of the hazard function of the Weibull distribution. The hazard function is a decreasing function when $\alpha < 1$ (heavy tailed distribution), constant when $\alpha = 1$ and increasing when $\alpha > 1$ (light tailed distribution). This behavior of the hazard function makes the Weibull distribution a suitable model for a wide variety of phenomena such as weather forecasting, electrical and industrial engineering, insurance modeling, and financial risk analysis.

The $k$-th moment of the Weibull distributed random variable is given by $\mathrm{E}\left( X^{k} \right) = \theta^{k}~\Gamma\left( 1 + \frac{k}{\alpha} \right) .$

The mean and variance are given by $\mathrm{E}\left( X \right) = \theta~\Gamma\left( 1 + \frac{1}{\alpha} \right)$ and $\mathrm{Var}(X)= \theta^{2}\left( \Gamma\left( 1 + \frac{2}{\alpha} \right) - \left\lbrack \Gamma\left( 1 + \frac{1}{\alpha} \right) \right\rbrack ^{2}\right),$ respectively.

Example 3.2.2. Suppose that the probability distribution of the lifetime of AIDS patients (in months) from the time of diagnosis is described by the Weibull distribution with shape parameter 1.2 and scale parameter 33.33.

1. Find the probability that a randomly selected person from this population survives at least 12 months,
2. A random sample of 10 patients will be selected from this population. What is the probability that at most two will die within one year of diagnosis.
3. Find the 99-th percentile of the distribution of lifetimes.

### 3.2.4 The Generalized Beta Distribution of the Second Kind

The (GB2) was introduced by Venter (1983) in the context of insurance loss modeling and by McDonald (1984) as an income and wealth distribution. It is a four-parameter very flexible distribution that can model positively as well as negatively skewed distributions.

The continuous variable $X$ is said to have the GB2 distribution with parameters $\sigma$, $\theta$, $\alpha_1$ and $\alpha_2$ if its probability density function is given by

$$$f_{X}\left( x \right) = \frac{(x/\theta)^{\alpha_2/\sigma}}{x \sigma~\mathrm{B}\left( \alpha_1,\alpha_2\right)\left\lbrack 1 + \left( x/\theta \right)^{1/\sigma} \right\rbrack^{\alpha_1 + \alpha_2}} \ \ \ \text{for } x > 0, \tag{3.1}$$$

$\sigma,\theta,\alpha_1,\alpha_2 > 0$, and where the beta function $\mathrm{B}\left( \alpha_1,\alpha_2 \right)$ is defined as $\mathrm{B}\left( \alpha_1,\alpha_2\right) = \int_{0}^{1}{t^{\alpha_1 - 1}\left( 1 - t \right)^{\alpha_2 - 1}}~ dt.$

The GB2 provides a model for heavy as well as light tailed data. It includes the exponential, gamma, Weibull, Burr, Lomax, F, chi-square, Rayleigh, lognormal and log-logistic as special or limiting cases. For example, by setting the parameters $\sigma = \alpha_1 = \alpha_2 = 1$, then the GB2 reduces to the log-logistic distribution. When $\sigma = 1$ and $\alpha_2 \rightarrow \infty$, it reduces to the gamma distribution and when $\alpha = 1$ and $\alpha_2 \rightarrow \infty$, it reduces to the Weibull distribution.

A GB2 random variable can be defined as follows. Suppose that $G_1$ and $G_2$ are independent random variables where $G_i$ has a gamma distribution with parameters $\alpha_i$ and scale parameter 1. Then, one can show that the random variable $X = \theta \left(\frac{G_1}{G_2}\right)^{\sigma}$ has a GB2 distribution with pdf summarized in equation (3.1). This theoretical result has several implications. For example, when the moments exist, one can show that the $k$-th moment of the GB2 distributed random variable is given by $\mathrm{E}\left( X^{k} \right) = \frac{\theta^{k}~\mathrm{B}\left( \alpha_1 +k \sigma,\alpha_2 - k \sigma \right)}{\mathrm{B}\left( \alpha_1,\alpha_2 \right)}, \ \ \ k > 0.$

Earlier applications of the GB2 were on income data and more recently have been used to model long-tailed claims data (Section 10.2 describes different interpretations of the descriptor “long-tail”). GB2 was used to model different types of automobile insurance claims, severity of fire losses as well as medical insurance claim data.

## 3.3 Methods of Creating New Distributions

In this section, you learn how to:

• Understand connections among the distributions
• Give insights into when a distribution is preferred when compared to alternatives
• Provide foundations for creating new distributions

### 3.3.1 Functions of Random Variables and their Distributions

In Section 3.2 we discussed some elementary known distributions. In this section we discuss means of creating new parametric probability distributions from existing ones. Specifically, let $X$ be a continuous random variable with a known probability density function $f_{X}(x)$ and distribution function $F_{X}(x)$. We are interested in the distribution of $Y = g\left( X \right)$, where $g(X)$ is a one-to-one defining a new random variable $Y$. In this section we apply the following techniques for creating new families of distributions: (a) multiplication by a constant (b) raising to a power, (c) exponentiation and (d) mixing.

### 3.3.2 Multiplication by a Constant

If claim data show change over time then such transformation can be useful to adjust for inflation. If the level of inflation is positive then claim costs are rising, and if it is negative then costs are falling. To adjust for inflation we multiply the cost $X$ by 1+ inflation rate (negative inflation is deflation). To account for currency impact on claim costs we also use a transformation to apply currency conversion from a base to a counter currency.

Consider the transformation $Y = cX$, where $c > 0$, then the distribution function of $Y$ is given by $F_{Y}\left( y \right) = \Pr\left( Y \leq y \right) = \Pr\left( cX \leq y \right) = \Pr\left( X \leq \frac{y}{c} \right) = F_{X}\left( \frac{y}{c} \right).$ Hence, the probability density function of interest $f_{Y}(y)$ can be written as $f_{Y}\left( y \right) = \frac{1}{c}f_{X}\left( \frac{y}{c} \right).$ Suppose that $X$ belongs to a certain set of and define a rescaled version $Y\ = \ cX$, $c\ > \ 0$. If $Y$ is in the same set of distributions then the distribution is said to be a . When a member of a scale distribution is multiplied by a constant $c$ ($c > 0$), the scale parameter for this scale distribution meets two conditions:
1. The parameter is changed by multiplying by $c$;
2. All other parameters remain unchanged.

Example 3.3.1. Actuarial Exam Question. The aggregate losses of Eiffel Auto Insurance are denoted in Euro currency and follow a lognormal distribution with $\mu = 8$ and $\sigma = 2$. Given that 1 euro $=$ 1.3 dollars, find the set of lognormal parameters which describe the distribution of Eiffel’s losses in dollars.

##### Show Example Solution

Example 3.3.2. Actuarial Exam Question. Demonstrate that the gamma distribution is a scale distribution.

### 3.3.3 Raising to a Power

In Section 3.2.3 we talked about the flexibility of the Weibull distribution in fitting . Looking to the origins of the Weibull distribution, we recognize that the Weibull is a of the exponential distribution. This is an application of another type of transformation which involves raising the random variable to a power.

Consider the transformation $Y = X^{\tau}$, where $\tau > 0$, then the distribution function of $Y$ is given by $F_{Y}\left( y \right) = \Pr\left( Y \leq y \right) = \Pr\left( X^{\tau} \leq y \right) = \Pr\left( X \leq y^{1/ \tau} \right) = F_{X}\left( y^{1/ \tau} \right).$

Hence, the probability density function of interest $f_{Y}(y)$ can be written as $f_{Y}(y) = \frac{1}{\tau} y^{1/ \tau - 1} f_{X}\left( y^{1/ \tau} \right).$ On the other hand, if $\tau < 0$, then the distribution function of $Y$ is given by $F_{Y}\left( y \right) = \Pr\left( Y \leq y \right) = \Pr\left( X^{\tau} \leq y \right) = \Pr\left( X \geq y^{1/ \tau} \right) = 1 - F_{X}\left( y^{1/ \tau} \right),$ and $f_{Y}(y) = \left| \frac{1}{\tau} \right|{y^{1/ \tau - 1}f}_{X}\left( y^{1/ \tau} \right).$

Example 3.3.3. We assume that $X$ follows the exponential distribution with mean $\theta$ and consider the transformed variable $Y = X^{\tau}$. Show that $Y$ follows the Weibull distribution when $\tau$ is positive and determine the parameters of the Weibull distribution.

### 3.3.4 Exponentiation

The normal distribution is a very popular model for a wide number of applications and when the sample size is large, it can serve as an approximate distribution for other models. If the random variable $X$ has a normal distribution with mean $\mu$ and variance $\sigma^{2}$, then $Y = e^{X}$ has with parameters $\mu$ and $\sigma^{2}$. The lognormal random variable has a lower bound of zero, is positively skewed and has a long right tail. A lognormal distribution is commonly used to describe distributions of financial assets such as stock prices. It is also used in fitting claim amounts for automobile as well as health insurance. This is an example of another type of transformation which involves exponentiation.

In general, consider the transformation $Y = e^{X}$. Then, the distribution function of $Y$ is given by $F_{Y}\left( y \right) = \Pr\left( Y \leq y \right) = \Pr\left( e^{X} \leq y \right) = \Pr\left( X \leq \ln y \right) = F_{X}\left( \ln y \right).$ Taking derivatives, we see that the probability density function of interest $f_{Y}(y)$ can be written as $f_{Y}(y) = \frac{1}{y}f_{X}\left( \ln y \right).$ As an important special case, suppose that $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Then, the distribution of $Y = e^X$ is

$f_{Y}(y) = \frac{1}{y}f_{X}\left( \ln y \right) = \frac{1}{y\sqrt{2 \pi}} \exp \left\{-\frac{1}{2}\left(\frac{ \ln y - \mu}{\sigma}\right)^2\right\}.$ This is known as a lognormal distribution.

Example 3.3.4. Actuarial Exam Question. Assume that $X$ has a uniform distribution on the interval $(0,\ c)$ and define $Y = e^{X}$. Find the distribution of $Y$.

### 3.3.5 Finite Mixtures

Mixture distributions represent a useful way of modelling data that are drawn from a . This parent population can be thought to be divided into multiple subpopulations with distinct distributions.

#### 3.3.5.1 Two-point Mixture

If the underlying phenomenon is diverse and can actually be described as two phenomena representing two subpopulations with different modes, we can construct the two point mixture random variable $X$. Given random variables $X_{1}$ and $X_{2}$, with probability density functions $f_{X_{1}}\left( x \right)$ and $f_{X_{2}}\left( x \right)$ respectively, the probability density function of $X$ is the weighted average of the component probability density function $f_{X_{1}}\left( x \right)$ and $f_{X_{2}}\left( x \right)$. The probability density function and distribution function of $X$ are given by $f_{X}\left( x \right) = af_{X_{1}}\left( x \right) + \left( 1 - a \right)f_{X_{2}}\left( x \right),$ and $F_{X}\left( x \right) = aF_{X_{1}}\left( x \right) + \left( 1 - a \right)F_{X_{2}}\left( x \right),$

for $0 < a <1$, where the $a$ and $(1 - a)$ represent the proportions of data points that fall under each of the two subpopulations respectively. This weighted average can be applied to a number of other distribution related quantities. The k-th moment and moment generating function of $X$ are given by $\mathrm{E}\left( X^{k} \right) = a\mathrm{E}\left( X_{1}^{K} \right) + \left( 1 - a \right)\mathrm{E}\left( X_{2}^{k} \right)$, and $M_{X}(t) = aM_{X_{1}}(t) + \left( 1 - a \right)M_{X_{2}}(t),$ respectively.

Example 3.3.5. Actuarial Exam Question. A collection of insurance policies consists of two types. 25% of policies are Type 1 and 75% of policies are Type 2. For a policy of Type 1, the loss amount per year follows an exponential distribution with mean 200, and for a policy of Type 2, the loss amount per year follows a Pareto distribution with parameters $\alpha=3$ and $\theta=200$. For a policy chosen at random from the entire collection of both types of policies, find the probability that the annual loss will be less than 100, and find the average loss.

#### 3.3.5.2k-point Mixture

In case of finite mixture distributions, the random variable of interest $X$ has a probability $p_{i}$ of being drawn from homogeneous subpopulation $i$, where $i = 1,2,\ldots,k$ and $k$ is the initially specified number of subpopulations in our mixture. The mixing parameter $p_{i}$ represents the proportion of observations from subpopulation $i$. Consider the random variable $X$ generated from $k$ distinct subpopulations, where subpopulation $i$ is modeled by the continuous distribution $f_{X_{i}}\left( x \right)$. The probability distribution of $X$ is given by $f_{X}\left( x \right) = \sum_{i = 1}^{k}{p_{i}f_{X_{i}}\left( x \right)},$ where $0 < p_{i} < 1$ and $\sum_{i = 1}^{k} p_{i} = 1$.

This model is often referred to as a or a $k$-point mixture. The distribution function, $r$-th moment and moment generating functions of the $k$-th point mixture are given as

$F_{X}\left( x \right) = \sum_{i = 1}^{k}{p_{i}F_{X_{i}}\left( x \right)},$ $\mathrm{E}\left( X^{r} \right) = \sum_{i = 1}^{k}{p_{i}\mathrm{E}\left( X_{i}^{r} \right)}, \text{and}$ $M_{X}(t) = \sum_{i = 1}^{k}{p_{i}M_{X_{i}}(t)},$ respectively.

Example 3.3.6. Actuarial Exam Question. $Y_{1}$ is a mixture of $X_{1}$ and $X_{2}$ with mixing weights $a$ and $(1 - a)$. $Y_{2}$ is a mixture of $X_{3}$ and $X_{4}$ with mixing weights $b$ and $(1 - b)$. $Z$ is a mixture of $Y_{1}$ and $Y_{2}$ with mixing weights $c$ and $(1 - c)$.

Show that $Z$ is a mixture of $X_{1}$, $X_{2}$, $X_{3}$ and $X_{4}$, and find the mixing weights.

### 3.3.6 Continuous Mixtures

A mixture with a very large number of subpopulations ($k$ goes to infinity) is often referred to as a . In a continuous mixture, subpopulations are not distinguished by a discrete mixing parameter but by a continuous variable $\Theta$, where $\Theta$ plays the role of $p_{i}$ in the finite mixture. Consider the random variable $X$ with a distribution depending on a parameter $\Theta$, where $\Theta$ itself is a continuous random variable. This description yields the following model for $X$ $f_{X}\left( x \right) = \int_{-\infty}^{\infty}{f_{X}\left(x \left| \theta \right. \right)g_{\Theta}( \theta )} d \theta ,$ where $f_{X}\left( x | \theta \right)$ is the of $X$ at a particular value of $\Theta=\theta$ and $g_{\Theta}\left( \theta \right)$ is the probability statement made about the unknown parameter $\theta$. In a Bayesian context (described in Section 4.4), this is known as the of $\Theta$ (the prior information or expert opinion to be used in the analysis).

The distribution function, $k$-th moment and moment generating functions of the continuous mixture are given as $F_{X}\left( x \right) = \int_{-\infty}^{\infty}{F_{X}\left(x \left| \theta \right. \right) g_{\Theta}(\theta)} d \theta,$ $\mathrm{E}\left( X^{k} \right) = \int_{-\infty}^{\infty}{\mathrm{E}\left( X^{k}\left| \theta \right. \right)g_{\Theta}(\theta)}d \theta,$ $M_{X}(t) = \mathrm{E}\left( e^{t X} \right) = \int_{-\infty}^{\infty}{\mathrm{E}\left( e^{ tx}\left| \theta \right. \right)g_{\Theta}(\theta)}d \theta,$ respectively.

The $k$-th moment of the mixture distribution can be rewritten as $\mathrm{E}\left( X^{k} \right) = \int_{-\infty}^{\infty}{\mathrm{E}\left( X^{k}\left| \theta \right. \right)g_{\Theta}(\theta)}d\theta ~=~ \mathrm{E}\left\lbrack \mathrm{E}\left( X^{k}\left| \Theta \right. \right) \right\rbrack .$

Using the law of iterated expectations (see Appendix Chapter 16), we can define the mean and variance of $X$ as $\mathrm{E}\left( X \right) = \mathrm{E}\left\lbrack \mathrm{E}\left( X\left| \Theta \right. \right) \right\rbrack$ and $\mathrm{Var}\left( X \right) = \mathrm{E}\left\lbrack \mathrm{Var}\left( X\left| \Theta \right. \right) \right\rbrack + \mathrm{Var}\left\lbrack \mathrm{E}\left( X\left| \Theta \right. \right) \right\rbrack .$

Example 3.3.7. Actuarial Exam Question. $X$ has a normal distribution with a mean of $\Lambda$ and variance of 1. $\Lambda$ has a normal distribution with a mean of 1 and variance of 1. Find the mean and variance of $X$.

##### Show Example Solution

Example 3.3.8. Actuarial Exam Question. Claim sizes, $X$, are uniform on the interval $\left(\Theta,\Theta+10\right)$ for each policyholder. $\Theta$ varies by policyholder according to an exponential distribution with mean 5. Find the , mean and variance of $X$.

## 3.4 Coverage Modifications

In this section we evaluate the impacts of coverage modifications: a) deductibles, b) policy limit, c) coinsurance and inflation on insurer’s costs.

### 3.4.1 Policy Deductibles

Under an ordinary deductible policy, the insured (policyholder) agrees to cover a fixed amount of an insurance claim before the insurer starts to pay. This fixed expense paid out of pocket is called the deductible and often denoted by $d$. If the loss exceeds $d$ then the insurer is responsible for covering the loss X less the deductible $d$. Depending on the agreement, the deductible may apply to each covered loss or to the total losses during a defined benefit period (month, year, etc.)

Deductibles eliminate a large number of small claims, reduce costs of handling and processing these claims, reduce premiums for the policyholders and reduce . Moral hazard occurs when the insured takes more risks, increasing the chances of loss due to perils insured against, knowing that the insurer will incur the cost (e.g. a policyholder with collision insurance may be encouraged to drive recklessly). The larger the deductible, the less the insured pays in premiums for an insurance policy.

Let $X$ denote the loss incurred to the insured and $Y$ denote the amount of paid claim by the insurer. Speaking of the benefit paid to the policyholder, we differentiate between two variables: The payment per loss and the payment per payment. The variable, denoted by $Y^{L}$ or $(X-d)_+$ is because values of $X$ that are less than $d$ are not ignored but are set equal to zero. This variable includes losses for which a payment is made as well as losses less than the deductible and hence is defined as $Y^{L} = \left( X - d \right)_{+} = \left\{ \begin{array}{cc} 0 & X < d, \\ X - d & X > d \end{array} \right. .$ $Y^{L}$ is often referred to as left censored and shifted variable because the values below $d$ are not ignored and all losses are shifted by a value $d$.

On the other hand, the variable, denoted by $Y^{P}$, is defined only when there is a payment. Specifically, $Y^P$ equals $X-d$ on the event $\{X >d\}$, denoted as $Y^P = X-d ||X>d$. Another way of expressing this that is commonly used is $Y^{P} = \left\{ \begin{matrix} \text{Undefined} & X \le d \\ X - d & X > d \end{matrix} . \right.$ Here, $Y^{P}$ is often referred to as and shifted variable or excess loss variable because the claims smaller than $d$ are not reported and values above $d$ are shifted by $d$.

Even when the distribution of $X$ is continuous, the distribution of $Y^{L}$ is a hybrid combination of discrete and continuous components. The discrete part of the distribution is concentrated at $Y = 0$ (when $X \leq d$) and the continuous part is spread over the interval $Y > 0$ (when $X > d$). For the discrete part, the probability that no payment is made is the probability that losses fall below the deductible; that is, $\Pr\left( Y^{L} = 0 \right) = \Pr\left( X \leq d \right) = F_{X}\left( d \right).$ Using the transformation $Y^{L} = X - d$ for the continuous part of the distribution, we can find the probability density function of $Y^{L}$ given by $f_{Y^{L}}\left( y \right) = \left\{ \begin{matrix} F_{X}\left( d \right) & y = 0, \\ f_{X}\left( y + d \right) & y > 0 \end{matrix} \right.$

We can see that the payment per payment variable is the payment per loss variable conditioned on the loss exceeding the deductible; that is, $Y^{P} = \left. \ Y^{L} \right|X > d$. Hence, the probability density function of $Y^{P}$ is given by $f_{Y^{P}}\left( y \right) = \frac{f_{X}\left( y + d \right)}{1 - F_{X}\left( d \right)},$ for $y > 0$. Accordingly, the distribution functions of $Y^{L}$and $Y^{P}$ are given by $F_{Y^{L}}\left( y \right) = \left\{ \begin{matrix} F_{X}\left( d \right) & y = 0, \\ F_{X}\left( y + d \right) & y > 0. \\ \end{matrix} \right.\$ and $F_{Y^{P}}\left( y \right) = \frac{F_{X}\left( y + d \right) - F_{X}\left( d \right)}{1 - F_{X}\left( d \right)},$ for $y > 0$, respectively.

The raw moments of $Y^{L}$ and $Y^{P}$ can be found directly using the probability density function of $X$ as follows $\mathrm{E}\left\lbrack \left( Y^{L} \right)^{k} \right\rbrack = \int_{d}^{\infty}\left( x - d \right)^{k}f_{X}\left( x \right)dx ,$ and $\mathrm{E}\left\lbrack \left( Y^{P} \right)^{k} \right\rbrack = \frac{\int_{d}^{\infty}\left( x - d \right)^{k}f_{X}\left( x \right) dx }{{1 - F}_{X}\left( d \right)} = \frac{\mathrm{E}\left\lbrack \left( Y^{L} \right)^{k} \right\rbrack}{{1 - F}_{X}\left( d \right)},$ respectively. For $k=1$, we can use the survival function to calculate $\mathrm{E}(Y^L)$ as $\mathrm{E}(Y^L) = \int_d^{\infty} [1-F_X(x)] ~dx .$ This could be easily proved if we start with the initial definition of $\mathrm{E}(Y^L)$ and use integration by parts.

We have seen that the deductible $d$ imposed on an insurance policy is the amount of loss that has to be paid out of pocket before the insurer makes any payment. The deductible $d$ imposed on an insurance policy reduces the insurer’s payment. The is the percentage decrease in the expected payment of the insurer as a result of imposing the deductible. LER is defined as $LER = \frac{\mathrm{E}\left( X \right) - \mathrm{E}\left( Y^{L} \right)}{\mathrm{E}\left( X \right)}.$

A little less common type of policy deductible is the franchise deductible. The will apply to the policy in the same way as ordinary deductible except that when the loss exceeds the deductible $d$, the full loss is covered by the insurer. The payment per loss and payment per payment variables are defined as $Y^{L} = \left\{ \begin{matrix} 0 & X \leq d, \\ X & X > d, \\ \end{matrix} \right.\$ and $Y^{P} = \left\{ \begin{matrix} \text{Undefined} & X \leq d, \\ X & X > d, \\ \end{matrix} \right.\$ respectively.

Example 3.4.1. Actuarial Exam Question. A claim severity distribution is exponential with mean 1000. An insurance company will pay the amount of each claim in excess of a deductible of 100. Calculate the variance of the amount paid by the insurance company for one claim, including the possibility that the amount paid is 0.

##### Show Example Solution

Example 3.4.2. Actuarial Exam Question. For an insurance:
• Losses have a density function $f_{X}\left( x \right) = \left\{ \begin{matrix} 0.02x & 0 < x < 10, \\ 0 & \text{elsewhere.} \\ \end{matrix} \right.$
• The insurance has an ordinary deductible of 4 per loss.
• $Y^{P}$ is the claim payment per payment random variable.

Calculate $\mathrm{E}\left( Y^{P} \right)$.

##### Show Example Solution

Example 3.4.3. Actuarial Exam Question. You are given:
1. Losses follow an exponential distribution with the same mean in all years.
2. The loss elimination ratio this year is 70%.
3. The ordinary deductible for the coming year is 4/3 of the current deductible.

Compute the loss elimination ratio for the coming year.

### 3.4.2 Policy Limits

Under a limited policy, the insurer is responsible for covering the actual loss $X$ up to the limit of its coverage. This fixed is called the policy limit and often denoted by $u$. If the loss exceeds the policy limit, the difference $X - u$ has to be paid by the policyholder. While a higher policy limit means a higher payout to the insured, it is associated with a higher premium.

Let $X$ denote the loss incurred to the insured and $Y$ denote the amount of paid claim by the insurer. The variable $Y$ known as the limited loss variable and denoted by $X \land u$. It is a because values above $u$ are set equal to $u$. The limited loss random variable $Y$ is defined as $Y = X \land u = \left\{ \begin{matrix} X & X \leq u, \\ u & X > u. \\ \end{matrix} \right.\$ It can be seen that the distinction between $Y^{L}$ and $Y^{P}$ is not needed under limited policy as the insurer will always make a payment.

Using the definitions of $\left(X-d\right)_+ \text{ and } \left(X\land d\right)$, it can be easily seen that the expected payment without any coverage modification, $X$, is equal to the sum of the expected payments with deductible $d$ and limit $d$. That is, ${X=\left(X-d\right)}_++ \left(X\land d\right)$.

When a loss is subject to a deductible $d$ and a limit $u$, the per-loss variable $Y^L$ is defined as $Y^{L} = \left\{ \begin{matrix} 0 & X \leq d, \\ X - d & d < X \leq u, \\ u - d & X > u. \\ \end{matrix} \right.\$ Hence, $Y^L$ can be expressed as $Y^L=\left(X\land u\right)-\left(X\land d\right)$.

Even when the distribution of $X$ is continuous, the distribution of $Y$ is a hybrid combination of discrete and continuous components. The discrete part of the distribution is concentrated at $Y = u$ (when $X > u$), while the continuous part is spread over the interval $Y < u$ (when $X \leq u$). For the discrete part, the probability that the benefit paid is $u$, is the probability that the loss exceeds the policy limit $u$; that is, $\Pr \left( Y = u \right) = \Pr \left( X > u \right) = {1 - F}_{X}\left( u \right).$ For the continuous part of the distribution $Y = X$, hence the probability density function of $Y$ is given by $f_{Y}\left( y \right) = \left\{ \begin{matrix} f_{X}\left( y \right) & 0 < y < u, \\ 1 - F_{X}\left( u \right) & y = u. \\ \end{matrix} \right.\$ Accordingly, the distribution function of $Y$ is given by $F_{Y}\left( y \right) = \left\{ \begin{matrix} F_{X}\left( x \right) & 0 < y < u, \\ 1 & y \geq u. \\ \end{matrix} \right.\$ The raw moments of $Y$ can be found directly using the probability density function of $X$ as follows $\mathrm{E}\left( Y^{k} \right) = \mathrm{E}\left\lbrack \left( X \land u \right)^{k} \right\rbrack = \int_{0}^{u}x^{k}f_{X}\left( x \right)dx + \int_{u}^{\infty}{u^{k}f_{X}\left( x \right)} dx \\ = \int_{0}^{u}x^{k}f_{X}\left( x \right)dx + u^{k}\left\lbrack {1 - F}_{X}\left( u \right) \right\rbrack .$ For $k=1$, we can use the survival function to calculate $\mathrm{E}\left( Y \right)$ as follows $\mathrm{E}\left( Y \right) = \mathrm{E}\left( X \land u \right) = \int_{0}^{u} [1-F_{X}(x) ]dx .$ This could be easily proved if we start with the initial definition of $\mathrm{E}\left( Y \right)$ and use integration by parts.

Example 3.4.4. Actuarial Exam Question. Under a policy, an insurer agrees to pay 100% of the medical bills incurred during the year by employees of a small company, up to a maximum total of one million dollars. The total amount of bills incurred, $X$, has probability density function $f_{X}\left( x \right) = \left\{ \begin{matrix} \frac{x\left( 4 - x \right)}{9} & 0 < x < 3, \\ 0 & \text{elsewhere.} \\ \end{matrix} \right.\$ where $x$ is measured in millions. Calculate the total amount, in millions of dollars, the insurer would expect to pay under this policy.

### 3.4.3 Coinsurance and Inflation

As we have seen in Section 3.4.1, the amount of loss retained by the policyholder can be losses up to the deductible $d$. The retained loss can also be a percentage of the claim. The percentage $\alpha$, often referred to as the coinsurance factor, is the percentage of claim the insurance company is required to cover. If the policy is subject to an ordinary deductible and policy limit, coinsurance refers to the percentage of claim the insurer is required to cover, after imposing the ordinary deductible and policy limit. The payment per loss variable, $Y^{L}$, is defined as $Y^{L} = \left\{ \begin{matrix} 0 & X \leq d, \\ \alpha\left( X - d \right) & d < X \leq u, \\ \alpha\left( u - d \right) & X > u. \\ \end{matrix} \right.\$ The policy limit (the maximum amount paid by the insurer) in this case is $\alpha\left( u - d \right)$, while $u$ is the maximum covered loss.

We have seen in Section 3.4.2 that when a loss is subject to both a deductible $d$ and a limit $u$ the per-loss variable $Y^L$ can be expressed as $Y^L=\left(X\land u\right)-\left(X\land d\right)$. With coinsurance, this becomes $Y^L$ can be expressed as $Y^L=\alpha\left[(X\land u)-(X\land d)\right]$.

The $k$-th moment of $Y^{L}$ is given by $\mathrm{E}\left\lbrack \left( Y^{L} \right)^{k} \right\rbrack = \int_{d}^{u}\left\lbrack \alpha\left( x - d \right) \right\rbrack^{k}f_{X}\left( x \right)dx + \left\lbrack \alpha\left( u - d \right) \right\rbrack^{k} [1-F_{X}\left( u \right)] .$

A $\left( 1 + r \right)$ may be applied to $X$ resulting in an inflated loss random variable $\left( 1 + r \right)X$ (the prespecified d and u remain unchanged). The resulting per loss variable can be written as $Y^{L} = \left\{ \begin{matrix} 0 & X \leq \frac{d}{1 + r}, \\ \alpha\left\lbrack \left( 1 + r \right)X - d \right\rbrack & \frac{d}{1 + r} < X \leq \frac{u}{1 + r}, \\ \alpha\left( u - d \right) & X > \frac{u}{1 + r}. \\ \end{matrix} \right.\$ The first and second moments of $Y^{L}$ can be expressed as $\mathrm{E}\left( Y^{L} \right) = \alpha\left( 1 + r \right)\left\lbrack \mathrm{E}\left( X \land \frac{u}{1 + r} \right) - \mathrm{E}\left( X \land \frac{d}{1 + r} \right) \right\rbrack,$ and $\mathrm{E}\left\lbrack \left( Y^{L} \right)^{2} \right\rbrack = \alpha^{2}\left( 1 + r \right)^{2} \left\{ \mathrm{E}\left\lbrack \left( X \land \frac{u}{1 + r} \right)^{2} \right\rbrack - \mathrm{E}\left\lbrack \left( X \land \frac{d}{1 + r} \right)^{2} \right\rbrack \right. \\ \left. \ \ \ \ \ - 2\left( \frac{d}{1 + r} \right)\left\lbrack \mathrm{E}\left( X \land \frac{u}{1 + r} \right) - \mathrm{E}\left( X \land \frac{d}{1 + r} \right) \right\rbrack \right\} ,$ respectively.

The formulas given for the first and second moments of $Y^{L}$ are general. Under full coverage, $\alpha = 1$, $r = 0$, $u = \infty$, $d = 0$ and $\mathrm{E}\left( Y^{L} \right)$ reduces to $\mathrm{E}\left( X \right)$. If only an ordinary deductible is imposed, $\alpha = 1$, $r = 0$, $u = \infty$ and $\mathrm{E}\left( Y^{L} \right)$ reduces to $\mathrm{E}\left( X \right) - \mathrm{E}\left( X \land d \right)$. If only a policy limit is imposed $\alpha = 1$, $r = 0$, $d = 0$ and $\mathrm{E}\left( Y^{L} \right)$ reduces to $\mathrm{E}\left( X \land u \right)$.

Example 3.4.5. Actuarial Exam Question. The ground up loss random variable for a health insurance policy in 2006 is modeled with X, an exponential distribution with mean 1000. An insurance policy pays the loss above an ordinary deductible of 100, with a maximum annual payment of 500. The ground up loss random variable is expected to be 5% larger in 2007, but the insurance in 2007 has the same deductible and maximum payment as in 2006. Find the percentage increase in the expected cost per payment from 2006 to 2007.

### 3.4.4 Reinsurance

In Section 3.4.1 we introduced the policy deductible, which is a contractual arrangement under which an insured transfers part of the risk by securing coverage from an insurer in return for an insurance premium. Under that policy, the insured must pay all losses up to the deductible, and the insurer only pays the amount (if any) above the deductible. We now introduce , a mechanism of insurance for insurance companies. Reinsurance is a contractual arrangement under which an insurer transfers part of the underlying insured risk by securing coverage from another insurer (referred to as a reinsurer) in return for a reinsurance premium. Although reinsurance involves a relationship between three parties: the original insured, the insurer (often referred to as or cedant) and the reinsurer, the parties of the reinsurance agreement are only the primary insurer and the reinsurer. There is no contractual agreement between the original insured and the reinsurer. Though many different types of reinsurance contracts exist, a common form is . In such contracts, the primary insurer must make all required payments to the insured until the primary insurer’s total payments reach a fixed reinsurance deducible. The reinsurer is then only responsible for paying losses above the reinsurance deductible. The maximum amount retained by the primary insurer in the reinsurance agreement (the reinsurance deductible) is called .

Reinsurance arrangements allow insurers with limited financial resources to increase the capacity to write insurance and meet client requests for larger insurance coverage while reducing the impact of potential losses and protecting the insurance company against catastrophic losses. Reinsurance also allows the primary insurer to benefit from underwriting skills, expertise and proficient complex claim file handling of the larger reinsurance companies.

Example 3.4.6. Actuarial Exam Question. Losses arising in a certain portfolio have a two-parameter Pareto distribution with $\alpha=5$ and $\theta=3,600$. A reinsurance arrangement has been made, under which (a) the reinsurer accepts 15% of losses up to $u=5,000$ and all amounts in excess of 5,000 and (b) the insurer pays for the remaining losses.

1. Express the random variables for the reinsurer’s and the insurer’s payments as a function of $X$, the portfolio losses.
2. Calculate the mean amount paid on a single claim by the insurer.
3. By assuming that the upper limit is $u = \infty$, calculate an upper bound on the standard deviation of the amount paid on a single claim by the insurer (retaining the 15% copayment).
##### Show Example Solution

Further discussions of reinsurance will be provided in Section 10.4.

## 3.5 Maximum Likelihood Estimation

In this section, you learn how to:

• Define a likelihood for a sample of observations from a continuous distribution
• Define the maximum likelihood estimator for a random sample of observations from a continuous distribution
• Estimate parametric distributions based on grouped, censored, and truncated data

### 3.5.1 Maximum Likelihood Estimators for Complete Data

Up to this point, the chapter has focused on parametric distributions that are commonly used in insurance applications. However, to be useful in applied work, these distributions must use “realistic” values for the parameters and for this we turn to data. At a foundational level, we assume that the analyst has available a random sample $X_1, \ldots, X_n$ from a distribution with distribution function $F_X$ (for brevity, we sometimes drop the subscript $X$). As is common, we use the vector $\boldsymbol \theta$ to denote the set of parameters for $F$.This basic sample scheme is reviewed in Appendix Section 15.1. Although basic, this sampling scheme provides the foundations for understanding more complex schemes that are regularly used in practice, and so it is important to master the basics.

Before a draw from a distribution, we consider potential outcomes summarized by the random variable $X_i$ (here, $i$ is 1, 2, …, $n$). After the draw, we observe $x_i$. Notationally, we use upper case roman letters for random variables and lower case ones for realizations. We have seen this set-up already in Section 2.4, where we used $\Pr(X_1 =x_1, \ldots, X_n=x_n)$ to quantify the “likelihood” of drawing a sample $\{x_1, \ldots, x_n\}$. With continuous data, we use the joint probability density function (pdf) instead of joint probabilities. With the independence assumption, the joint pdf may be written as the product of pdfs. Thus, we define the likelihood to be

$$$L(\boldsymbol \theta) = \prod_{i=1}^n f(x_i) . \tag{3.2}$$$

From the notation, note that we consider this to be a function of the parameters in $\boldsymbol \theta$, with the data $\{x_1, \ldots, x_n\}$ held fixed. The maximum likelihood estimator is that value of the parameters in $\boldsymbol \theta$ that maximize $L(\boldsymbol \theta)$.

From calculus, we know that maximizing a function produces the same results as maximizing the logarithm of a function (this is because the logarithm is a monotone, convex function). Because we get the same results, to ease computation considerations, it is common to consider the logarithmic likelihood, denoted as

$$$l(\boldsymbol \theta) = \ln L(\boldsymbol \theta) = \sum_{i=1}^n \ln f(x_i) . \tag{3.3}$$$

Example 3.5.1. Actuarial Exam Question. You are given the following five observations: 521, 658, 702, 819, 1217. You use the single-parameter Pareto with cumulative distribution function: $F(x) = 1- \left(\frac{500}{x}\right)^{\alpha}, ~~~~ x>500 .$

With $n=5$, the log-likelihood function is $l(\alpha|\mathbf{x} ) = \sum_{i=1}^5 \ln f(x_i;\alpha ) = 5 \alpha \ln 500 + 5 \ln \alpha -(\alpha+1) \sum_{i=1}^5 \ln x_i.$ Figure 3.4 shows the logarithmic likelihood as a function of the parameter $\alpha$.

We can determine the maximum value of the logarithmic likelihood by taking derivatives and setting it equal to zero. This yields $\frac{ \partial}{\partial \alpha } l(\alpha |\mathbf{x}) = 5 \ln 500 + 5 / \alpha - \sum_{i=1}^5 \ln x_i =_{set} 0 \Rightarrow \hat{\alpha}_{MLE} = \frac{5}{\sum_{i=1}^5 \ln x_i - 5 \ln 500 } = 2.453 .$

Naturally, there are many problems where it is not practical to use hand calculations for optimization. Fortunately there are many statistical routines available such as the R function optim.

##### R Code for Optimization

This code confirms our hand calculation result where the maximum likelihood estimator is $\alpha_{MLE} =$ 2.453125.

We present a few additional examples to illustrate how actuaries fit a parametric distribution model to a set of claim data using maximum likelihood.

Example 3.5.2. Actuarial Exam Question. Consider a random sample of claim amounts: 8,000 10,000 12,000 15,000. You assume that claim amounts follow an inverse exponential distribution, with parameter $\theta$. Calculate the maximum likelihood estimator for $\theta$.
##### Show Example Solution

Example 3.5.3. Actuarial Exam Question. A random sample of size 6 is from a lognormal distribution with parameters $\mu$ and $\sigma$. The sample values are 200, 3,000, 8,000, 60,000, 60,000, 160,000. Calculate the maximum likelihood estimator for $\mu$ and $\sigma$.
##### Show Example Solution

Two follow-up questions rely on large sample properties that you may have seen in an earlier course. Appendix Chapter 17 reviews the definition of the likelihood function, introduces its properties, reviews the maximum likelihood estimators, extends their large-sample properties to the case where there are multiple parameters in the model, and reviews statistical inference based on maximum likelihood estimators. In the solutions of these examples we derive the asymptotic variance of maximum-likelihood estimators of the model parameters. We use the delta method to derive the asymptotic variances of functions of these parameters.

Example 3.5.2 - Follow - Up. Refer to Example 3.5.2.
1. Approximate the variance of the maximum likelihood estimator.
2. Determine an approximate 95% confidence interval for $\theta$.
3. Determine an approximate 95% confidence interval for $\Pr \left( X \leq 9,000 \right).$
##### Show Example Solution

Example 3.5.3 - Follow - Up. Refer to Example 3.5.3.
1. Estimate the of the maximum likelihood estimator.
2. Determine approximate 95% confidence intervals for $\mu$ and $\sigma$.
3. Determine an approximate 95% confidence interval for the mean of the lognormal distribution.
##### Show Example Solution

Example 3.5.4. Wisconsin Property Fund. To see how maximum likelihood estimators work with real data, we return to the 2010 claims data introduced in Section 1.3.

The following snippet of code shows how to fit the exponential, gamma, Pareto, lognormal, and GB2 models. For consistency, the code employs the R package VGAM. The acronym stands for Vector Generalized Linear and Additive Models; as suggested by the name, this package can do far more than fit these models although it suffices for our purposes. The one exception is the GB2 density which is not widely used outside of insurance applications; however, we can code this density and compute maximum likelihood estimators using the optim general purpose optimizer.

##### Show Example Solution

Results from the fitting exercise are summarized in Figure 3.5. Here, the black “longdash” curve is a smoothed histogram of the actual data (that we will introduce in Section 4.1); the other curves are parametric curves where the parameters are computed via maximum likelihood. We see poor fits in the red dashed line from the exponential distribution fit and the blue dotted line from the gamma distribution fit. Fits of the other curves, Pareto, lognormal, and GB2, all seem to provide reasonably good fits to the actual data. Chapter 4 describes in more detail the principles of model selection.

### 3.5.2 Maximum Likelihood Estimators using Modified Data

In many applications, actuaries and other analysts wish to estimate model parameters based on individual data that are not limited. However, there are also important applications when only limited, or modified, data are available. This section introduces maximum likelihood estimation for grouped, censored, and truncated data. Later, we will follow up with additional details in Section 4.3.

#### 3.5.2.1 Maximum Likelihood Estimators for Grouped Data

In the previous section we considered the maximum likelihood estimation of continuous models from complete (individual) data. Each individual observation is recorded, and its contribution to the likelihood function is the density at that value. In this section we consider the problem of obtaining maximum likelihood estimates of parameters from . The observations are only available in grouped form, and the contribution of each observation to the likelihood function is the probability of falling in a specific group (interval). Let $n_{j}$ represent the number of observations in the interval $\left( \left. \ c_{j - 1},c_{j} \right\rbrack \right.\$ The grouped data likelihood function is thus given by $L\left( \theta \right) = \prod_{j = 1}^{k}\left\lbrack F_X\left( \left. \ c_{j} \right|\theta \right) - F_X\left( \left. \ c_{j - 1} \right|\theta \right) \right\rbrack^{n_{j}},$ where $c_{0}$ is the smallest possible observation (often set to zero) and $c_{k}$ is the largest possible observation (often set to infinity).

Example 3.5.5. Actuarial Exam Question. For a group of policies, you are given that losses follow the distribution function $F_X\left( x \right) = 1 - \frac{\theta}{x}$, for $\theta < x < \infty.$ Further, a sample of 20 losses resulted in the following:

${\small \begin{matrix}\hline \text{Interval} & \text{Number of Losses} \\ \hline (\theta, 10] & 9 \\ (10, 25] & 6 \\ (25, \infty) & 5 \\ \hline \end{matrix} }$

Calculate the maximum likelihood estimate of $\theta$.

#### 3.5.2.2 Maximum Likelihood Estimators for Censored Data

Another possible distinguishing feature of a data gathering mechanism is censoring. While for some events of interest (losses, claims, lifetimes, etc.) the maybe available, for others only partial information is available; all that may be known is that the observation exceeds a specific value. The limited policy introduced in Section 3.4.2 is an example of right censoring. Any loss greater than or equal to the policy limit is recorded at the limit. The contribution of the censored observation to the likelihood function is the probability of the random variable exceeding this specific limit. Note that contributions of both complete and censored data share the survivor function, for a complete point this survivor function is multiplied by the hazard function, but for a censored observation it is not. The likelihood function for censored data is then given by

$L(\theta) = \left[ \prod_{i=1}^r f_X(x_i) \right] \left[ S_X(u) \right]^m ,$ where $r$ is the number of known loss amounts below the limit $u$ and $m$ is the number of loss amounts larger than the limit $u$.

Example 3.5.6. Actuarial Exam Question. The random variable $X$ has survival function: $S_{X}\left( x \right) = \frac{\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{2}}.$ Two values of $X$ are observed to be 2 and 4. One other value exceeds 4. Calculate the maximum likelihood estimate of $\theta$.

#### 3.5.2.3 Maximum Likelihood Estimators for Truncated Data

This section is concerned with the maximum likelihood estimation of the continuous distribution of the random variable $X$ when the data is incomplete due to truncation. If the values of $X$ are truncated at $d$, then it should be noted that we would not have been aware of the existence of these values had they not exceeded $d$. The policy deductible introduced in Section 3.4.1 is an example of left truncation. Any loss less than or equal to the deductible is not recorded. The contribution to the likelihood function of an observation $x$ truncated at $d$ will be a conditional probability and the $f_{X}\left( x \right)$ will be replaced by $\frac{f_{X}\left( x \right)}{S_{X}\left( d \right)}$. The likelihood function for truncated data is then given by

$L(\theta) = \prod_{i=1}^k \frac{f_X(x_i)}{S_X(d)} ,$ where $k$ is the number of loss amounts larger than the deductible $d$.

Example 3.5.7. Actuarial Exam Question. For the single parameter Pareto distribution with $\theta = 2$, maximum likelihood estimation is applied to estimate the parameter $\alpha$. Find the estimated mean of the ground up loss distribution based on the maximum likelihood estimate of $\alpha$ for the following data set:
• Ordinary policy deductible of 5, maximum covered loss of 25 (policy limit 20)
• 8 insurance payment amounts: 2, 4, 5, 5, 8, 10, 12, 15
• 2 limit payments: 20, 20.

## 3.6 Further Resources and Contributors

#### Contributors

• Zeinab Amin, The American University in Cairo, is the principal author of this chapter. Email: zeinabha@aucegypt.edu for chapter comments and suggested improvements.
• Many helpful comments have been provided by Hirokazu (Iwahiro) Iwasawa, iwahiro@bb.mbn.or.jp .
• Other chapter reviewers include: Rob Erhardt, Jorge Yslas, Tatjana Miljkovic, and Samuel Kolins.

#### Exercises

Here are a set of exercises that guide the viewer through some of the theoretical foundations of Loss Data Analytics. Each tutorial is based on one or more questions from the professional actuarial examinations – typically the Society of Actuaries Exam C.

Notable contributions include: Cummins and Derrig (2012), Frees and Valdez (2008), Klugman, Panjer, and Willmot (2012), Kreer et al. (2015), McDonald (1984), McDonald and Xu (1995), Tevet (2016), and Venter (1983).

### Bibliography

Cummins, J. David, and Richard A. Derrig. 2012. Managing the Insolvency Risk of Insurance Companies: Proceedings of the Second International Conference on Insurance Solvency. Vol. 12. Springer Science & Business Media.

Frees, Edward W., and Emiliano A. Valdez. 1998. “Understanding Relationships Using Copulas.” North American Actuarial Journal 2 (01): 1–25.

2008. “Hierarchical Insurance Claims Modeling.” Journal of the American Statistical Association 103 (484). Taylor & Francis: 1457–69.

Klugman, Stuart A., Harry H. Panjer, and Gordon E. Willmot. 2012. Loss Models: From Data to Decisions. John Wiley & Sons.

Kreer, Markus, Ayşe Kizilersü, Anthony W Thomas, and Alfredo D Egídio dos Reis. 2015. “Goodness-of-Fit Tests and Applications for Left-Truncated Weibull Distributions to Non-Life Insurance.” European Actuarial Journal 5 (1). Springer: 139–63.

McDonald, James B. 1984. “Some Generalized Functions for the Size Distribution of Income.” Econometrica: Journal of the Econometric Society. JSTOR, 647–63.

McDonald, James B, and Yexiao J Xu. 1995. “A Generalization of the Beta Distribution with Applications.” Journal of Econometrics 66 (1-2). Elsevier: 133–52.

Tevet, Dan. 2016. “Applying Generalized Linear Models to Insurance Data.” Predictive Modeling Applications in Actuarial Science: Volume 2, Case Studies in Insurance. Cambridge University Press, 39.

Venter, Gary. 1983. “Transformed Beta and Gamma Distributions and Aggregate Losses.” In Proceedings of the Casualty Actuarial Society, 70:289–308. 133 & 134.