Chapter 9 Experience Rating Using Credibility Theory

Chapter Preview. This chapter introduces credibility theory as an important actuarial tool for estimating pure premiums, frequencies, and severities for individual risks or classes of risks. Credibility theory provides a convenient framework for combining the experience for an individual risk or class with other data to produce more stable and accurate estimates. Several models for calculating credibility estimates will be discussed including limited fluctuation, Bühlmann, Bühlmann-Straub, and nonparametric and semiparametric credibility methods. The chapter will also show a connection between credibility theory and Bayesian estimation which was introduced in Chapter 4.

9.1 Introduction to Applications of Credibility Theory

What premium should be charged to provide insurance? The answer depends upon the exposure to the risk of loss. A common method to compute an insurance premium is to rate an insured using a classification rating plan, that is, a rating plan that uses an insured’s risk characteristics to determine premium. A classification plan is used to select an insurance rate based on an insured’s rating characteristics such as geographic territory, age, etc. All classification rating plans use a limited set of criteria to group insureds into a “class” and there will be variation in the risk of loss among insureds within the class.

An experience rating plan attempts to capture some of the variation in the risk of loss among insureds within a rating class by using the insured’s own loss experience to complement the rate from the classification rating plan. One way to do this is to use a credibility weight \(Z\) with \(0\leq Z \leq 1\), the weight assigned to an insured’s historical loss experience for the purposes of determining their premium in an experience rating plan, to compute

\[ \hat{R}=Z\bar{X}+(1-Z)M, \]

where

\[\begin{eqnarray*} \hat{R}&=&\textrm{credibility weighted rate for risk,}\\ \bar{X}&=&\textrm{average loss for the risk over a specified time period,}\\ M&=&\textrm{the rate for the classification group, often called the manual rate.}\\ \end{eqnarray*}\]

For a risk whose loss experience is stable from year to year, \(Z\) might be close to 1. For a risk whose losses vary widely from year to year, \(Z\) may be close to 0.
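As a quick illustration of the weighting formula, the following Python sketch computes \(\hat{R}\) for given inputs. The function name and the numeric inputs (an average loss of 800, a manual rate of 1,000, and \(Z=0.25\)) are hypothetical and chosen only for illustration.

```python
def credibility_weighted_rate(z, x_bar, manual_rate):
    """Return Z * X-bar + (1 - Z) * M for a credibility weight 0 <= Z <= 1."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility weight Z must lie in [0, 1]")
    return z * x_bar + (1.0 - z) * manual_rate


# Hypothetical inputs: average loss 800, manual rate 1,000, Z = 0.25.
rate = credibility_weighted_rate(0.25, 800.0, 1000.0)
print(rate)  # 0.25 * 800 + 0.75 * 1000 = 950.0
```

With \(Z=1\) the function returns the risk's own average loss, and with \(Z=0\) it returns the manual rate.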

Credibility theory is also used for computing rates for individual classes within a classification rating plan. When classification plan rates are being determined, some or many of the groups may not have sufficient data to produce stable and reliable rates. The actual loss experience for a group will be assigned a credibility weight \(Z\) and the complement of credibility \(1-Z\), the remainder of the weight not assigned to the observed loss experience, may be given to the average experience for risks across all classes. Or, if a class rating plan is being updated, the complement of credibility may be assigned to the current class rate, the average rate per exposure for an insured in a particular classification group. Credibility theory can also be applied to the calculation of expected frequencies and severities.

Computing numeric values for \(Z\) requires analysis and understanding of the data. What are the variances in the number of losses and sizes of losses for risks? What is the variance between expected values across risks?


9.2 Limited Fluctuation Credibility


In this section, you learn how to:

  • Calculate full credibility standards for number of claims, average size of claims, and aggregate losses.
  • Learn how the relationship between means and variances of underlying distributions affects full credibility standards.
  • Determine credibility-weight \(Z\) using the square-root partial credibility formula.

Limited fluctuation credibility, also called “classical credibility” and “American credibility,” was given this name because the method explicitly attempts to limit fluctuations in estimates for claim frequencies, severities, or losses. For example, suppose that you want to estimate the expected number of claims for a group of risks in an insurance rating class. How many risks are needed in the class to ensure that a specified level of accuracy is attained in the estimate? First the question will be considered from the perspective of how many claims are needed.

9.2.1 Full Credibility for Claim Frequency

Let \(N\) be a random variable representing the number of claims for a group of risks, for example, risks within a particular rating classification. The observed number of claims will be used to estimate \(\mu_N=\mathrm{E}[N]\), the expected number of claims. How big does \(\mu_N\) need to be to get a good estimate? One way to quantify the accuracy of the estimate would be with a statement like: “The observed value of \(N\) should be within 5\(\%\) of \(\mu_N\) at least 90\(\%\) of the time.” Writing this as a mathematical expression would give \(\Pr[0.95 \mu_N \leq N \leq 1.05 \mu_N] \geq 0.90\). Generalizing this statement by letting the range parameter \(k\) replace 5\(\%\) and probability level \(p\) replace 0.90 gives the equation

\[\begin{equation} \Pr[(1-k) \mu_N \leq N \leq (1+k) \mu_N] \geq p . \tag{9.1} \end{equation}\]

The expected number of claims required for the probability on the left-hand side of (9.1) to equal \(p\) is called the full credibility standard, the threshold of experience necessary to assign 100% credibility to the insured’s own experience.

If the expected number of claims is greater than or equal to the full credibility standard then full credibility can be assigned to the data so \(Z=1\). Usually the expected value \(\mu_N\) is not known so full credibility will be assigned to the data if the actual observed number of claims \(n\) is greater than or equal to the full credibility standard. The \(k\) and \(p\) values must be selected and the actuary may rely on experience, judgment, and other factors in making the choices.

Subtracting \(\mu_N\) from each term in (9.1) and dividing by the standard deviation \(\sigma_N\) of \(N\) gives

\[\begin{equation} \Pr\left[\frac{-k\mu_N}{\sigma_N}\leq \frac{N-\mu_N}{\sigma_N} \leq \frac{k\mu_N}{\sigma_N}\right] \geq p. \tag{9.2} \end{equation}\]

In limited fluctuation credibility the standard normal distribution is used to approximate the distribution of \((N-\mu_N)/\sigma_N\). If \(N\) is the sum of many claims from a large group of similar risks and the claims are independent, then the approximation may be reasonable.

Let \(y_p\) be the value such that

\[ \Pr[-y_p\leq \frac{N-\mu_N}{\sigma_N} \leq y_p]=\Phi(y_p)-\Phi(-y_p)=p \]

where \(\Phi( )\) is the cumulative distribution function of the standard normal distribution, that is, the normal distribution with mean 0 and standard deviation 1. Because \(\Phi(-y_p)=1-\Phi(y_p)\), the equality can be rewritten as \(2\Phi(y_p)-1=p\). Solving for \(y_p\) gives \(y_p=\Phi^{-1}((p+1)/2)\) where \(\Phi^{-1}( )\) is the inverse of \(\Phi( )\).

Equation (9.2) will be satisfied if \(k\mu_N/\sigma_N \geq y_p\) assuming the normal approximation. First we will consider this inequality for the case when \(N\) has a Poisson distribution: \(\Pr[N=n] = \lambda^n\textrm{e}^{-\lambda}/n!\). Because \(\lambda=\mu_N=\sigma_N^2\) for the Poisson, taking square roots yields \(\mu_N^{1/2}=\sigma_N\). So, \(k\mu_N/\mu_N^{1/2} \geq y_p\) which is equivalent to \(\mu_N \geq (y_p/k)^2\). Let’s define \(\lambda_{kp}\) to be the value of \(\mu_N\) for which equality holds. Then the full credibility standard for the Poisson distribution is

\[\begin{equation} \lambda_{kp} = \left(\frac{y_p}{k}\right)^2 \quad \textrm{with } y_p=\Phi^{-1}((p+1)/2). \tag{9.3} \end{equation}\]

If the expected number of claims \(\mu_N\) is greater than or equal to \(\lambda_{kp}\) then equation (9.1) is assumed to hold and full credibility can be assigned to the data. As noted previously, because \(\mu_N\) is usually unknown, full credibility is given if the observed number of claims \(n\) satisfies \(n \geq \lambda_{kp}.\)
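Equation (9.3) is straightforward to evaluate numerically. The sketch below is one way to do so in Python, using the standard library's `statistics.NormalDist` for \(\Phi^{-1}\). The function name `poisson_full_credibility` is an illustrative choice, and the values \(k=0.05\) and \(p=0.90\) are the ones used in the accuracy statement earlier in this section.

```python
from statistics import NormalDist


def poisson_full_credibility(k, p):
    """Full credibility standard lambda_kp = (y_p / k)^2 from equation (9.3)."""
    # y_p = Phi^{-1}((p + 1) / 2) under the normal approximation.
    y_p = NormalDist().inv_cdf((p + 1.0) / 2.0)
    return (y_p / k) ** 2


# Within 5% of the mean at least 90% of the time.
standard = poisson_full_credibility(0.05, 0.90)
print(round(standard, 1))  # roughly 1,082 expected claims
```

Tightening either requirement (a smaller \(k\) or a larger \(p\)) raises the standard, because \(y_p\) grows with \(p\) and \(k\) appears in the denominator.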

Example 9.2.1. The full credibility standard is set so that the observed number of claims is to be within 5% of the expected value with probability \(p=0.95\). If the number of claims has a Poisson distribution find the number of claims needed for full credibility.


If claims are not Poisson distributed then equation (9.2) does not imply (9.3). Setting the upper bound of \((N-\mu_N)/\sigma_N\) in (9.2) equal to \(y_p\) gives \(k\mu_N/\sigma_N=y_p\). Squaring both sides and moving everything to the right side except for one of the \(\mu_N\)’s gives \(\mu_N=(y_p/k)^2(\sigma_N^2/\mu_N)\). This is the full credibility standard for frequency and will be denoted by \(n_f\),

\[\begin{equation} n_f=\left(\frac{y_p}{k}\right)^2\left(\frac{\sigma_N^2}{\mu_N}\right)=\lambda_{kp}\left(\frac{\sigma_N^2}{\mu_N}\right). \tag{9.4} \end{equation}\]

This is the same equation as the Poisson full credibility standard except for the \((\sigma_N^2/\mu_N)\) multiplier. When the claims distribution is Poisson this extra term is one because the variance equals the mean.

Example 9.2.2. The full credibility standard is set so that the observed number of claims is to be within 5\(\%\) of the expected value with probability \(p=0.95\). The number of claims has a negative binomial distribution,

\[ \Pr(N=x)={x+r-1\choose x} \left(\frac{1}{1+\beta}\right)^r \left(\frac{\beta}{1+\beta}\right)^x , \]

with \(\beta=1\). Calculate the full credibility standard.


We see that the negative binomial distribution with \((\sigma_N^2/\mu_N)>1\) requires more claims for full credibility than a Poisson distribution for the same \(k\) and \(p\) values. The next example shows that a binomial distribution which has \((\sigma_N^2/\mu_N)<1\) will need fewer claims for full credibility.

Example 9.2.3. The full credibility standard is set so that the observed number of claims is to be within 5\(\%\) of the expected value with probability \(p=0.95\). The number of claims has a binomial distribution

\[ \Pr(N=x)={m\choose x}q^x(1-q)^{m-x}. \]

Calculate the full credibility standard for \(q=1/4\).
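The effect of the \((\sigma_N^2/\mu_N)\) multiplier in equation (9.4) can be compared across the three frequency distributions discussed above. The following Python sketch relies on the standard moment results for the parameterizations shown: the negative binomial has mean \(r\beta\) and variance \(r\beta(1+\beta)\), giving a ratio of \(1+\beta\), and the binomial has mean \(mq\) and variance \(mq(1-q)\), giving a ratio of \(1-q\). The function and variable names are illustrative.

```python
from statistics import NormalDist


def frequency_full_credibility(k, p, variance_to_mean):
    """Equation (9.4): n_f = (y_p / k)^2 * (sigma_N^2 / mu_N)."""
    y_p = NormalDist().inv_cdf((p + 1.0) / 2.0)
    return (y_p / k) ** 2 * variance_to_mean


k, p = 0.05, 0.95
beta = 1.0   # negative binomial parameter from Example 9.2.2
q = 0.25     # binomial parameter from Example 9.2.3

standards = {
    "Poisson": frequency_full_credibility(k, p, 1.0),
    "negative binomial": frequency_full_credibility(k, p, 1.0 + beta),
    "binomial": frequency_full_credibility(k, p, 1.0 - q),
}
for name, n_f in standards.items():
    print(f"{name}: {n_f:.1f}")
```

Under these assumptions the negative binomial standard is twice the Poisson standard and the binomial standard is three quarters of it.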


Rather than using expected number of claims to define the full credibility standard, the number of exposures can be used for the full credibility standard. An exposure is a measure of risk. For example, one car insured for a full year would be one car-year. Two cars each insured for exactly one-half year would also result in one car-year. Car-years attempt to quantify exposure to loss. Two car-years would be expected to generate twice as many claims as one car-year if the vehicles have the same risk of loss. To translate a full credibility standard denominated in terms of number of claims to a full credibility standard denominated in exposures one needs a reasonable estimate of the expected number of claims per exposure.

Example 9.2.4. The full credibility standard should be selected so that the observed number of claims will be within 5\(\%\) of the expected value with probability \(p=0.95\). The number of claims has a Poisson distribution. If one exposure is expected to have about 0.20 claims per year, find the number of exposures needed for full credibility.
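The conversion from a claims-based standard to an exposure-based standard is a single division, as the following Python sketch suggests. It uses the example's inputs (\(k=0.05\), \(p=0.95\), and 0.20 expected claims per exposure); the variable names are illustrative.

```python
from statistics import NormalDist

k, p = 0.05, 0.95
claims_per_exposure = 0.20  # expected annual claims for one exposure

# Claims-based Poisson standard from equation (9.3).
y_p = NormalDist().inv_cdf((p + 1.0) / 2.0)
claims_standard = (y_p / k) ** 2

# Exposures needed so that the expected claim count reaches the standard.
exposure_standard = claims_standard / claims_per_exposure
print(round(claims_standard, 1), round(exposure_standard, 1))
```

Because each exposure is expected to produce only one fifth of a claim, about five times as many exposures as claims are needed.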


Frequency can be defined as the number of claims per exposure. Let \(m\) denote the number of exposures. If observed claim frequency \(N/m\) is used to estimate \(\mathrm{E}(N/m)\), the accuracy requirement is

\[ \Pr[(1-k)\mathrm{E}(N/m)\leq N/m \leq(1+k)\mathrm{E}(N/m)] \geq p. \]

Because the number of exposures is not a random variable, \(\mathrm{E}(N/m)=\mathrm{E}(N)/m=\mu_N/m\) and the prior equation becomes

\[ \Pr\left[(1-k)\frac{\mu_N}{m}\leq \frac{N}{m} \leq(1+k)\frac{\mu_N}{m}\right] \geq p. \]

Multiplying through by \(m\) results in equation (9.1) at the beginning of the section. The full credibility standards that were developed for estimating expected number of claims also apply to frequency.

9.2.2 Full Credibility for Aggregate Losses and Pure Premium

Aggregate losses are the total of all loss amounts for a risk or group of risks. Letting \(S\) represent aggregate losses

\[ S=X_1+X_2+\cdots+X_N. \]

The random variable \(N\) represents the number of losses and random variables \(X_1, X_2,\ldots,X_N\) are the individual loss amounts. In this section it is assumed that \(N\) is independent of the loss amounts and that \(X_1, X_2,\ldots,X_N\) are iid (independent and identically distributed).

The mean and variance of \(S\) are

\[ \mu_S=\mathrm{E}(S)=\mathrm{E}(N)\mathrm{E}(X)=\mu_N\mu_X \]

and

\[ \sigma^{2}_S=\mathrm{Var}(S)=\mathrm{E}(N)\mathrm{Var}(X)+[\mathrm{E}(X)]^{2}\mathrm{Var}(N)=\mu_N\sigma^{2}_X+\mu^{2}_X\sigma^{2}_N , \]

where \(X\) is the amount of a single loss. See Section 5.3 on collective risk models for more discussion of this framework.

Observed losses \(S\) will be used to estimate expected losses \(\mu_S=\mathrm{E}(S)\). As with the frequency model in the previous section, the observed losses must be close to the expected losses as quantified in the equation

\[ \Pr[(1-k)\mu_S\leq S \leq(1+k)\mu_S] \geq p. \]

After subtracting the mean and dividing by the standard deviation,

\[ \Pr\left[\frac{-k\mu_S}{\sigma_S}\leq (S-\mu_S)/\sigma_S \leq \frac{k\mu_S}{\sigma_S}\right] \geq p . \]

As done in the previous section the distribution for \((S-\mu_S)/\sigma_S\) is assumed to be standard normal and \(k\mu_S/\sigma_S=y_p=\Phi^{-1}((p+1)/2)\). This equation can be rewritten as \(\mu_S^2=(y_p/k)^2\sigma_S^2\). Using the prior formulas for \(\mu_S\) and \(\sigma_{S}^2\) gives \((\mu_N\mu_X)^2=(y_p/k)^2(\mu_N\sigma^{2}_X+\mu^{2}_X\sigma^{2}_N)\). Dividing both sides by \(\mu_N\mu_X^2\) and reordering terms on the right side results in a full credibility standard \(n_S\) for aggregate losses

\[\begin{equation} n_S=\left(\frac{y_p}{k}\right)^2\left[\left(\frac{\sigma_N^2}{\mu_N}\right)+\left(\frac{\sigma_X}{\mu_X}\right)^2\right]=\lambda_{kp}\left[\left(\frac{\sigma_N^2}{\mu_N}\right)+\left(\frac{\sigma_X}{\mu_X}\right)^2\right]. \tag{9.5} \end{equation}\]

Example 9.2.5. The number of claims has a Poisson distribution. Individual loss amounts are independently and identically distributed with a Pareto distribution \(F(x)=1-[\theta/(x+\theta)]^{\alpha}\). The number of claims and loss amounts are independent. If observed aggregate losses should be within 5\(\%\) of the expected value with probability \(p=0.95\), how many losses are required for full credibility?


When the number of claims is Poisson distributed then equation (9.5) can be simplified using \((\sigma_N^2/\mu_N)=1\). It follows that

\[ [(\sigma_N^2/\mu_N)+(\sigma_X/\mu_X)^2]=[1+(\sigma_X/\mu_X)^2]=[(\mu_X^2+\sigma_X^2)/\mu_X^2]=\mathrm{E}(X^2)/\mathrm{E}(X)^2 \]

using the relationship \(\mu_X^2+\sigma_X^2=\mathrm{E}(X^2)\). The full credibility standard is \(n_S=\lambda_{kp}~\mathrm{E}(X^2)/\mathrm{E}(X)^2\).
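The general formula (9.5) and the Poisson simplification can be checked against each other numerically. The Python sketch below does this for a hypothetical severity distribution with mean 1,000 and standard deviation 2,000; these moments, and the function name, are assumptions made only for illustration.

```python
from math import isclose
from statistics import NormalDist


def aggregate_full_credibility(k, p, freq_var_to_mean, severity_cv):
    """Equation (9.5): lambda_kp * [sigma_N^2 / mu_N + (sigma_X / mu_X)^2]."""
    y_p = NormalDist().inv_cdf((p + 1.0) / 2.0)
    return (y_p / k) ** 2 * (freq_var_to_mean + severity_cv ** 2)


# Hypothetical severity moments, for illustration only.
mu_x, sigma_x = 1000.0, 2000.0

# General form with Poisson frequency (variance-to-mean ratio of 1).
n_s = aggregate_full_credibility(0.05, 0.95, 1.0, sigma_x / mu_x)

# Poisson shortcut: lambda_kp * E(X^2) / E(X)^2, with E(X^2) = mu^2 + sigma^2.
lambda_kp = (NormalDist().inv_cdf(0.975) / 0.05) ** 2
n_s_shortcut = lambda_kp * (mu_x ** 2 + sigma_x ** 2) / mu_x ** 2

print(round(n_s, 1), isclose(n_s, n_s_shortcut))
```

With a severity coefficient of variation of 2, the bracketed term equals 5, so the aggregate loss standard is five times the Poisson frequency standard.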

The pure premium \(PP\) is equal to aggregate losses \(S\) divided by exposures \(m\): \(PP=S/m\). The full credibility standard for pure premium will require

\[ \Pr\left[(1-k)\mu_{PP}\leq PP \leq(1+k)\mu_{PP}\right] \geq p. \]

The number of exposures \(m\) is assumed fixed and not a random variable so \(\mu_{PP}=\mathrm{E}(S/m)=\mathrm{E}(S)/m=\mu_S/m\), and the requirement becomes

\[ \Pr\left[(1-k)\left(\frac{\mu_S}{m}\right)\leq \left(\frac{S}{m}\right) \leq(1+k)\left(\frac{\mu_S}{m}\right)\right] \geq p. \]

Multiplying through by \(m\) returns the bounds for losses

\[ \Pr[(1-k)\mu_S\leq S \leq(1+k)\mu_S] \geq p. \]

This means that the full credibility standard \(n_{PP}\) for the pure premium is the same as that for aggregate losses

\[ n_{PP}=n_S=\lambda_{kp}\left[\left(\frac{\sigma_N^2}{\mu_N}\right)+\left(\frac{\sigma_X}{\mu_X}\right)^2\right]. \]

9.2.3 Full Credibility for Severity

Let \(X\) be a random variable representing the size of one claim. Claim severity is \(\mu_X=\mathrm{E}(X)\). Suppose that \(\{X_1,X_2, \ldots, X_n\}\) is a random sample of \(n\) claims that will be used to estimate claim severity \(\mu_X\). The claims are assumed to be iid. The average value of the sample is

\[ \bar{X}=\frac{1}{n}\left(X_1+X_2+\cdots+X_n\right). \]

How big does \(n\) need to be to get a good estimate? Note that here \(n\) is a fixed sample size, whereas the number of claims \(N\) is a random variable in the aggregate loss model.

In Section 9.2.1 the accuracy of an estimator for frequency was defined by requiring that the number of claims lie within a specified interval about the mean number of claims with a specified probability. For severity this requirement is

\[ \Pr[(1-k)\mu_X\leq \bar{X} \leq(1+k)\mu_X ]\geq p , \]

where \(k\) and \(p\) need to be specified. Following the steps in Section 9.2.1, the mean claim severity \(\mu_X\) is subtracted from each term and the standard deviation of the claim severity estimator \(\sigma_{\bar{X}}\) is divided into each term yielding

\[ \Pr\left[\frac{-k~\mu_X}{\sigma_{\bar{X}}}\leq (\bar{X}-\mu_X)/\sigma_{\bar{X}} \leq \frac{k~\mu_X}{\sigma_{\bar{X}}}\right] \geq p . \]

As in prior sections, it is assumed that \((\bar{X}-\mu_X)/\sigma_{\bar{X}}\) is approximately normally distributed and the prior equation is satisfied if \(k\mu_X/\sigma_{\bar{X}}\geq y_p\) with \(y_p=\Phi^{-1}((p+1)/2)\). Because \(\bar{X}\) is the average of individual claims \(X_1, X_2,\dots, X_n\), its standard deviation is equal to the standard deviation of an individual claim divided by \(\sqrt{n}\): \(\sigma_{\bar{X}}=\sigma_X/\sqrt{n}\). So, \(k\mu_X/(\sigma_X/\sqrt{n})\geq y_p\) and with a little algebra this can be rewritten as \(n \geq (y_p/k)^2(\sigma_X/\mu_X)^2\). The full credibility standard for severity is

\[\begin{equation} n_X=\left(\frac{y_p}{k}\right)^2\left(\frac{\sigma_X}{\mu_X}\right)^2=\lambda_{kp}\left(\frac{\sigma_X}{\mu_X}\right)^2. \tag{9.6} \end{equation}\]

Note that the term \(\sigma_X/\mu_X\) is the coefficient of variation for an individual claim, that is, the standard deviation divided by the mean, which measures variability in units of the mean. Even though \(\lambda_{kp}\) is the full credibility standard for frequency given a Poisson distribution, there is no assumption about the distribution for the number of claims.
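To make equation (9.6) concrete, the Python sketch below evaluates the severity standard for a Type II Pareto severity. It uses the standard result that the squared coefficient of variation of this distribution is \(\alpha/(\alpha-2)\) when \(\alpha>2\), which does not depend on \(\theta\). The value \(\alpha=3\) and the function names are assumptions chosen for illustration.

```python
from statistics import NormalDist


def pareto_cv_squared(alpha):
    """Squared coefficient of variation of a Type II Pareto; needs alpha > 2."""
    if alpha <= 2.0:
        raise ValueError("the variance is finite only when alpha > 2")
    return alpha / (alpha - 2.0)


def severity_full_credibility(k, p, cv_squared):
    """Equation (9.6): n_X = (y_p / k)^2 * (sigma_X / mu_X)^2."""
    y_p = NormalDist().inv_cdf((p + 1.0) / 2.0)
    return (y_p / k) ** 2 * cv_squared


alpha = 3.0  # assumed shape parameter, for illustration only
n_x = severity_full_credibility(0.05, 0.95, pareto_cv_squared(alpha))
print(round(n_x, 1))  # roughly 4,610 claims
```

As \(\alpha\) decreases toward 2 the tail becomes heavier, the coefficient of variation grows without bound, and so does the number of claims required.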

Example 9.2.6. Individual loss amounts are independently and identically distributed with a Type II Pareto distribution \(F(x)=1-[\theta/(x+\theta)]^{\alpha}\). How many claims are required for the average severity of observed claims to be within 5\(\%\) of the expected severity with probability \(p=0.95\)?


9.2.4 Partial Credibility

In prior sections full credibility standards were calculated for estimating frequency (\(n_f\)), pure premium (\(n_{PP}\)), and severity (\(n_X\)). In this section these full credibility standards will be denoted by \(n_{0}\). In each case the full credibility standard was the expected number of claims required to achieve a defined level of accuracy when using empirical data to estimate an expected value. If the observed number of claims is greater than or equal to the full credibility standard then a full credibility weight \(Z=1\) is given to the data.

In limited fluctuation credibility, credibility weights \(Z\) assigned to data are

\[ Z= \left\{ \begin{array}{ll} \sqrt{n /n_{0}} &\textrm{if } n < n_{0} \\ 1 & \textrm{if } n \ge n_{0} , \end{array} \right. \]

where \(n_0\) is the full credibility standard. The quantity \(n\) is the number of claims for the data that is used to estimate the expected frequency, severity, or pure premium.

Example 9.2.7. The number of claims has a Poisson distribution. Individual loss amounts are independently and identically distributed with a Type II Pareto distribution \(F(x)=1-[\theta/(x+\theta)]^{\alpha}\). Assume that \(\alpha=3\). The number of claims and loss amounts are independent. The full credibility standard is that the observed pure premium should be within 5\(\%\) of the expected value with probability \(p=0.95\). What credibility \(Z\) is assigned to a pure premium computed from 1,000 claims?
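The square-root rule can be sketched in a few lines of Python. The example below uses the inputs of Example 9.2.7 (Poisson frequency, Type II Pareto severity with \(\alpha=3\), \(k=0.05\), \(p=0.95\), and 1,000 observed claims) together with the standard result that the squared coefficient of variation of this Pareto is \(\alpha/(\alpha-2)\). The function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def partial_credibility(n, n_full):
    """Square-root rule: Z = sqrt(n / n_0), capped at 1 once n >= n_0."""
    return min(1.0, sqrt(n / n_full))


k, p, alpha = 0.05, 0.95, 3.0
y_p = NormalDist().inv_cdf((p + 1.0) / 2.0)
lambda_kp = (y_p / k) ** 2

# Pure premium standard with Poisson frequency: lambda_kp * (1 + CV^2).
severity_cv_squared = alpha / (alpha - 2.0)
n_full = lambda_kp * (1.0 + severity_cv_squared)

z = partial_credibility(1000, n_full)
print(round(n_full, 1), round(z, 3))  # standard near 6,146 claims, Z near 0.40
```

Quadrupling the number of observed claims doubles \(Z\) until the cap of 1 is reached at the full credibility standard.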


Limited fluctuation credibility uses the formula \(Z=\sqrt{n/n_0}\) to limit the fluctuation in the credibility-weighted estimate to match the fluctuation allowed for data with expected claims at the full credibility standard. Variance or standard deviation is used as the measure of fluctuation. Next we show an example to explain why the square-root formula is used.

Suppose that average claim severity is being estimated from a sample of size \(n\) that is less than the full credibility standard \(n_0=n_X\). Applying credibility theory, the estimate \(\hat{\mu}_X\) would be

\[ \hat{\mu}_X=Z\bar{X}+(1-Z)M_X , \]

with \(\bar{X}=(X_1+X_2+\cdots+X_n)/n\) and iid random variables \(X_i\) representing the sizes of individual claims. The complement of credibility is applied to \(M_X\) which could be last year’s estimated average severity adjusted for inflation, the average severity for a much larger pool of risks, or some other relevant quantity selected by the actuary. It is assumed that the variance of \(M_X\) is zero or negligible. With this assumption

\[ \mathrm{Var}(\hat{\mu}_X)=\mathrm{Var}(Z\bar{X})=Z^2\mathrm{Var}(\bar{X})=\frac{n}{n_0}\mathrm{Var}(\bar{X}). \]

Because \(\bar{X}=(X_1+X_2+\cdots+X_n)/n\) it follows that \(\mathrm{Var}(\bar{X})=\mathrm{Var}(X_i)/n\) where random variable \(X_i\) is one claim. So,

\[ \mathrm{Var}(\hat{\mu}_X)=\frac{n}{n_0}\mathrm{Var}(\bar{X})=\frac{n}{n_0}\frac{\mathrm{Var}(X_i)}{n}=\frac{\mathrm{Var}(X_i)}{n_0}. \]

The last term is exactly the variance of a sample mean \(\bar{X}\) when the sample size is equal to the full credibility standard \(n_0=n_X\).


9.3 Bühlmann Credibility


In this section, you learn how to:

  • Compute a credibility-weighted estimate for the expected loss for a risk or group of risks.
  • Determine the credibility \(Z\) assigned to observations.
  • Calculate the values required in Bühlmann credibility including the Expected Value of the Process Variance (\(EPV\)), Variance of the Hypothetical Means (\(VHM\)) and collective mean \(\mu\).
  • Recognize situations when the Bühlmann model is appropriate.

A classification rating plan groups policyholders together into classes based on risk characteristics. Although policyholders within a class have similarities, they are not identical and their expected losses will not be exactly the same. An experience rating plan can supplement a class rating plan by credibility weighting an individual policyholder’s loss experience with the class rate to produce a more accurate rate for the policyholder.

In the presentation of Bühlmann credibility, a credibility method that uses the amount of experience, the expected value of the process variance, and the variance of the hypothetical means to determine the credibility weight, it is convenient to assign a risk parameter \(\theta\) to each policyholder. The risk parameter is a parameter in a distribution whose value reflects the risk categorization. Losses \(X\) for the policyholder will have a common distribution function \(F_{\theta}(x)\) with mean \(\mu(\theta)=\mathrm{E}(X|\theta)\) and variance \(\sigma^2(\theta)=\mathrm{Var}(X|\theta)\). Losses \(X\) can represent pure premiums, aggregate losses, number of claims, claim severities, or some other measure of loss for a period of time, often one year. Risk parameter \(\theta\) may be continuous or discrete and may be multivariate depending on the model.

If a policyholder with risk parameter \(\theta\) had losses \(X_1, \ldots, X_n\) during \(n\) time periods then the goal is to find \(\mathrm{E}(\mu(\theta)|X_1,\ldots, X_n)\), the conditional expectation of \(\mu(\theta)\) given \(X_1,\ldots, X_n\). The Bühlmann credibility-weighted estimate for \(\mathrm{E}(\mu(\theta)|X_1,\ldots, X_n)\) for the policyholder is

\[\begin{equation} \hat{\mu}(\theta)=Z\bar{X}+(1-Z)\mu \tag{9.7} \end{equation}\]

with

\[\begin{eqnarray*} \theta&=&\textrm{a risk parameter that identifies a policyholder's risk level}\\ \hat{\mu}(\theta)&=&\textrm{estimated expected loss for a policyholder with parameter }\theta\\ & & \textrm{and loss experience } \bar{X}\\ \bar{X}&=&(X_1+\cdots+X_n)/n \textrm{ is the average of $n$ observations of the policyholder } \\ Z&=&\textrm{credibility assigned to $n$ observations } \\ \mu&=&\textrm{the expected loss for a randomly chosen policyholder in the class.}\\ \end{eqnarray*}\]

For a selected policyholder, random variables \(X_j\) are assumed to be iid for \(j=1,\ldots,n\) because it is assumed that the policyholder’s exposure to loss is not changing through time. The quantity \(\bar{X}\) is the average of \(n\) observations and \(\mathrm{E}(\bar{X}|\theta)=\mathrm{E}(X_j|\theta)=\mu(\theta)\).

If a policyholder is randomly chosen from the class and there is no loss information about the risk then the expected loss is \(\mu=\mathrm{E}(\mu(\theta))\) where the expectation is taken over all \(\theta\)’s in the class. In this situation \(Z=0\) and the expected loss is \(\hat\mu(\theta)=\mu\) for the risk. The quantity \(\mu\) can also be written as \(\mu=\mathrm{E}(X_j)\) or \(\mu=\mathrm{E}(\bar{X})\) and is often called the overall mean or collective mean, the mean estimate of a risk when no loss information about the risk is known. Note that \(\mathrm{E}(X_j)\) is evaluated with the law of total expectation, which states that the expected value of the conditional expected value of \(X\) given \(Y\) is the same as the expected value of \(X\): \(\mathrm{E}(X_j)=\mathrm{E}(\mathrm{E}[X_j|\theta])\).

Example 9.3.1. The number of claims \(X\) for an insured in a class has a Poisson distribution with mean \(\theta>0\). The risk parameter \(\theta\) is exponentially distributed within the class with pdf (probability density function) \(f(\theta)=e^{-\theta}\) for \(\theta>0\). What is the expected number of claims for an insured chosen at random from the class?


In the prior example the risk parameter \(\theta\) is a random variable with an exponential distribution. In the next example there are three types of risks and the risk parameter has a discrete distribution.

Example 9.3.2. For any risk (policyholder) in a population the number of losses \(N\) in a year has a Poisson distribution with parameter \(\lambda\). Individual loss amounts \(X_i\) for a risk are independent of \(N\) and are iid with Type II Pareto distribution \(F(x)=1-[\theta/(x+\theta)]^{\alpha}\). There are three types of risks in the population as follows:

\[ \small{ \begin{array}{|c|c|c|c|} \hline \text{Risk } & \text{Percentage} & \text{Poisson} & \text{Pareto} \\ \text{Type} & \text{of Population} & \text{Parameter} & \text{Parameters} \\ \hline A & 50\% & \lambda=0.5 & \theta=1000, \alpha=2.0 \\ B & 30\% & \lambda=1.0 & \theta=1500, \alpha=2.0 \\ C & 20\% & \lambda=2.0 & \theta=2000, \alpha=2.0 \\ \hline \end{array} } \]

If a risk is selected at random from the population, what is the expected aggregate loss in a year?
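The law of total expectation turns this question into a probability-weighted average of the conditional means. The Python sketch below applies it to the table above, using \(\mathrm{E}(S)=\mathrm{E}(N)\mathrm{E}(X)\) from Section 9.2.2 and the standard Type II Pareto mean \(\theta/(\alpha-1)\) for \(\alpha>1\). The data layout and names are illustrative.

```python
from math import isclose

# (probability, Poisson lambda, Pareto theta, Pareto alpha) for each risk type.
risk_types = {
    "A": (0.5, 0.5, 1000.0, 2.0),
    "B": (0.3, 1.0, 1500.0, 2.0),
    "C": (0.2, 2.0, 2000.0, 2.0),
}


def expected_aggregate_loss(lam, theta, alpha):
    """E(S | type) = E(N) * E(X) with Pareto mean theta / (alpha - 1)."""
    return lam * theta / (alpha - 1.0)


# Conditional means by type, then the collective mean over the population.
conditional_means = {
    name: expected_aggregate_loss(lam, theta, alpha)
    for name, (_, lam, theta, alpha) in risk_types.items()
}
collective_mean = sum(
    weight * conditional_means[name]
    for name, (weight, _, _, _) in risk_types.items()
)
print(conditional_means, round(collective_mean, 2))
```

The conditional means are 500, 1,500, and 4,000 for types A, B, and C, so a randomly selected risk has an expected annual aggregate loss of 1,500 under these assumptions.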


What is the risk parameter for a risk (policyholder) in the prior example? One could say that the risk parameter has three components \((\lambda,\theta,\alpha)\) with possible values (0.5,1000,2.0), (1.0,1500,2.0), and (2.0,2000,2.0) depending on the type of risk.

Note that in both of the examples the risk parameter is a random quantity with its own probability distribution. We do not know the value of the risk parameter for a randomly chosen risk.

Although formula (9.7) was introduced using experience rating as an example, the Bühlmann credibility model has wider application. Suppose that a rating plan has multiple classes. Credibility formula (9.7) can be used to determine individual class rates. The overall mean \(\mu\) would be the average loss for all classes combined, \(\bar{X}\) would be the experience for the individual class, and \(\hat{\mu}(\theta)\) would be the estimated loss for the class.

9.3.1 Credibility Z, EPV, and VHM

When computing the credibility estimate \(\hat{\mu}(\theta)=Z\bar{X}+(1-Z)\mu\), how much weight \(Z\) should go to experience \(\bar{X}\) and how much weight \((1-Z)\) to the overall mean \(\mu\)? In Bühlmann credibility there are three factors that need to be considered:

  1. How much variation is there in a single observation \(X_j\) for a selected risk? With \(\bar{X}=(X_1+\cdots+X_n)/n\) and assuming that the observations are iid conditional on \(\theta\), it follows that \(\mathrm{Var}(\bar{X}|\theta)=\mathrm{Var}(X_j|\theta)/n\). For larger \(\mathrm{Var}(\bar{X}|\theta)\) less credibility weight \(Z\) should be given to experience \(\bar{X}\). The Expected Value of the Process Variance, abbreviated \(EPV\), is the expected value of \(\mathrm{Var}(X_j|\theta)\) across all risks, that is, the average of the natural variability of observations from within each risk:

\[ EPV = \mathrm{E}(\mathrm{Var}(X_j|\theta)). \]

Because \(\mathrm{Var}(\bar{X}|\theta)\) = \(\mathrm{Var}(X_j|\theta)/n\) it follows that \(\mathrm{E}(\mathrm{Var}(\bar{X}|\theta))=EPV/n\).

  2. How homogeneous is the population of risks whose experience was combined to compute the overall mean \(\mu\)? If all the risks are similar in loss potential then more weight \((1-Z)\) would be given to the overall mean \(\mu\) because \(\mu\) is the average for a group of similar risks whose means \(\mu(\theta)\) are not far apart. The homogeneity or heterogeneity of the population is measured by the Variance of the Hypothetical Means, abbreviated \(VHM\), which measures how similar or different the risks are from one another:

\[ VHM=\mathrm{Var}(\mathrm{E}(X_j|\theta))=\mathrm{Var}(\mathrm{E}(\bar{X}|\theta)). \]

Note that we used \(\mathrm{E}(\bar{X}|\theta)=\mathrm{E}(X_j|\theta)\) for the second equality.

  3. How many observations \(n\) were used to compute \(\bar{X}\)? A larger sample would imply a larger \(Z\).

Example 9.3.3. The number of claims \(N\) in a year for a risk in a population has a Poisson distribution with mean \(\lambda>0\). The risk parameter \(\lambda\) is uniformly distributed over the interval \((0,2)\). Calculate the \(EPV\) and \(VHM\) for the population.
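For a Poisson risk both quantities reduce to moments of the risk parameter, because \(\mathrm{Var}(N|\lambda)=\mathrm{E}(N|\lambda)=\lambda\). The Python sketch below uses this together with the standard mean \((a+b)/2\) and variance \((b-a)^2/12\) of a uniform distribution on \((a,b)\). Exact fractions are used to avoid rounding, and the helper name is illustrative.

```python
from fractions import Fraction


def uniform_mean_and_variance(a, b):
    """Mean (a + b) / 2 and variance (b - a)^2 / 12 of a uniform on (a, b)."""
    a, b = Fraction(a), Fraction(b)
    return (a + b) / 2, (b - a) ** 2 / 12


mean_lambda, var_lambda = uniform_mean_and_variance(0, 2)

# Poisson: Var(N | lambda) = lambda, so EPV = E(lambda).
epv = mean_lambda
# Poisson: E(N | lambda) = lambda, so VHM = Var(lambda).
vhm = var_lambda

print(epv, vhm)  # 1 and 1/3
```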


The Bühlmann credibility formula includes values for \(n\), \(EPV\), and \(VHM\):

\[\begin{equation} Z=\frac{n}{n+K} \quad , \quad K =\frac{EPV}{VHM}. \tag{9.8} \end{equation}\]

If the \(VHM\) increases then \(Z\) increases. If the \(EPV\) increases then \(Z\) gets smaller. Unlike limited fluctuation credibility where \(Z=1\) when the expected number of claims is greater than or equal to the full credibility standard, \(Z\) can approach but not equal 1 as the number of observations \(n\) goes to infinity.

If you multiply the numerator and denominator of the \(Z\) formula by (\(VHM\)/\(n\)) then \(Z\) can be rewritten as

\[ Z=\frac{VHM}{VHM+(EPV/n)} . \]

The number of observations \(n\) is captured in the term (\(EPV/n\)). As shown in item (1) at the beginning of the section, \(\mathrm{E}(\mathrm{Var}(\bar{X}|\theta))\) = \(EPV/n\). As the number of observations gets larger, the expected variance of \(\bar{X}\) gets smaller and credibility \(Z\) increases so that more weight gets assigned to \(\bar{X}\) in the credibility-weighted estimate \(\hat{\mu}(\theta)\).
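The two expressions for \(Z\) can be compared numerically. The Python sketch below evaluates both for several values of \(n\), using \(EPV=1\) and \(VHM=1/3\) (the values that arise for a Poisson risk whose mean is uniform on \((0,2)\), as in Example 9.3.3), so that \(K=3\). The function names are illustrative.

```python
from math import isclose


def buhlmann_z(n, epv, vhm):
    """Equation (9.8): Z = n / (n + K) with K = EPV / VHM."""
    return n / (n + epv / vhm)


def buhlmann_z_variance_form(n, epv, vhm):
    """Equivalent form: Z = VHM / (VHM + EPV / n)."""
    return vhm / (vhm + epv / n)


epv, vhm = 1.0, 1.0 / 3.0
for n in (1, 3, 10, 100, 10_000):
    z = buhlmann_z(n, epv, vhm)
    assert isclose(z, buhlmann_z_variance_form(n, epv, vhm))
    print(n, round(z, 4))
```

The loop shows \(Z\) rising toward 1 as \(n\) grows without ever reaching it, in contrast to the cap at \(Z=1\) in limited fluctuation credibility.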

Example 9.3.4. Use the law of total variance to show that \(\mathrm{Var}(\bar{X})\) = \(VHM + (EPV/n)\) and derive a formula for \(Z\) in terms of \(\bar{X}\).


The following long example and solution demonstrate how to compute the credibility-weighted estimate with frequency and severity data.

Example 9.3.5. For any risk in a population the number of losses \(N\) in a year has a Poisson distribution with parameter \(\lambda\). Individual loss amounts \(X\) for a selected risk are independent of \(N\) and are iid with exponential distribution \(F(x)=1-e^{-x/\beta}\). There are three types of risks in the population as shown below. A risk was selected at random from the population and all losses were recorded over a five-year period. The total amount of losses over the five-year period was 5,000. Use Bühlmann credibility to estimate the annual expected aggregate loss for the risk.

\[ \small{ \begin{array}{|c|c|c|c|} \hline \text{Risk } & \text{Percentage} & \text{Poisson} & \text{Exponential} \\ \text{Type} & \text{of Population} & \text{Parameter} & \text{Parameter} \\ \hline A & 50\% & \lambda=0.5 & \beta=1000 \\ B & 30\% & \lambda=1.0 & \beta=1500 \\ C & 20\% & \lambda=2.0 & \beta=2000 \\ \hline \end{array} } \]
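
The arithmetic for Example 9.3.5 can be sketched in Python as follows. This is one way to organize the calculation, not necessarily the chapter's own solution; it assumes the standard compound Poisson results that the annual aggregate loss has mean \(\lambda\beta\) and variance \(\lambda \mathrm{E}(X^2)=2\lambda\beta^2\) for exponential severities. The variable names are illustrative.

```python
# Risk types from the table: (share of population, Poisson mean, exponential mean).
risk_types = {
    "A": (0.50, 0.5, 1000.0),
    "B": (0.30, 1.0, 1500.0),
    "C": (0.20, 2.0, 2000.0),
}

# Compound Poisson with exponential severities (assumed standard results):
#   hypothetical mean = lambda * beta
#   process variance  = lambda * E(X^2) = lambda * 2 * beta^2
mu = sum(w * lam * beta for w, lam, beta in risk_types.values())
epv = sum(w * lam * 2 * beta**2 for w, lam, beta in risk_types.values())
second_moment = sum(w * (lam * beta) ** 2 for w, lam, beta in risk_types.values())
vhm = second_moment - mu**2

n_years = 5
x_bar = 5000 / n_years            # observed average annual aggregate loss
k = epv / vhm
z = n_years / (n_years + k)
estimate = z * x_bar + (1 - z) * mu

print(f"mu={mu:.0f}  EPV={epv:.0f}  VHM={vhm:.0f}")
print(f"K={k:.4f}  Z={z:.4f}  estimate={estimate:.2f}")
```

Under these assumptions the collective mean is 1,500 and the credibility-weighted estimate lies between it and the observed annual average of 1,000.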


In real world applications of Bühlmann credibility the value of \(K=EPV/VHM\) must be estimated. Sometimes a value for \(K\) is selected using judgment. A smaller \(K\) makes the estimator \(\hat{\mu}(\theta)\) more responsive to actual experience \(\bar{X}\) whereas a larger \(K\) produces a more stable estimate by giving more weight to \(\mu\). Judgment may be used to balance responsiveness and stability. A later section in this chapter will discuss methods for determining \(K\) from data.

For a policyholder with risk parameter \(\theta\), Bühlmann credibility uses a linear approximation \(\hat{\mu}(\theta)=Z\bar{X}+(1-Z)\mu\) to estimate \(\mathrm{E}(\mu(\theta)|X_1,\ldots,X_n)\), the expected loss for the policyholder given prior losses \(X_1,\ldots, X_n\). We can rewrite this as \(\hat{\mu}(\theta)=a+b\bar{X}\) which makes it obvious that the credibility estimate is a linear function of \(\bar{X}\).

If \(\mathrm{E}(\mu(\theta)|X_1,\ldots,X_n)\) is approximated by the linear function \(a+b\bar{X}\) and constants \(a\) and \(b\) are chosen so that \(\mathrm{E}[(\mathrm{E}(\mu(\theta)|X_1,\ldots,X_n)-(a+b\bar{X}))^2]\) is minimized, what are \(a\) and \(b\)? The answer is \(b=n/(n+K)\) and \(a=(1-b)\mu\) with \(K=EPV/VHM\) and \(\mu=\mathrm{E}(\mu(\theta))\). More details can be found in references (Bühlmann 1967), (Bühlmann and Gisler 2005), (Klugman, Panjer, and Willmot 2012), and (Tse 2009).
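
A brief sketch of why these constants arise, offered as an outline rather than a full proof: setting the partial derivatives of the expected squared error with respect to \(a\) and \(b\) to zero, and using \(\mathrm{E}(\bar{X})=\mu\) together with \(\mathrm{Cov}(\mathrm{E}(\mu(\theta)|X_1,\ldots,X_n),\bar{X})=\mathrm{Cov}(\mu(\theta),\bar{X})=VHM\), gives

\[\begin{eqnarray*} a&=&\mu-b\,\mathrm{E}(\bar{X})=(1-b)\mu, \\ b&=&\frac{\mathrm{Cov}(\mu(\theta),\bar{X})}{\mathrm{Var}(\bar{X})}=\frac{VHM}{VHM+(EPV/n)}=\frac{n}{n+K}. \end{eqnarray*}\]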

Bühlmann credibility is also called least-squares credibility, greatest accuracy credibility, or Bayesian credibility.


9.4 Bühlmann-Straub Credibility


In this section, you learn how to:

  • Compute a credibility-weighted estimate for the expected loss for a risk or group of risks using the Bühlmann-Straub model.
  • Determine the credibility \(Z\) assigned to observations.
  • Calculate required values including the Expected Value of the Process Variance (\(EPV\)), Variance of the Hypothetical Means (\(VHM\)) and collective mean \(\mu\).
  • Recognize situations when the Bühlmann-Straub model is appropriate.

With standard Bühlmann or least-squares credibility as described in the prior section, losses \(X_1,\ldots,X_n\) arising from a selected policyholder are assumed to be iid. If the subscripts indicate year 1, year 2 and so on up to year \(n\), then the iid assumption means that the policyholder has the same exposure to loss every year. For commercial insurance this assumption is frequently violated.

Consider a commercial policyholder that uses a fleet of vehicles in its business. In year 1 there are \(m_1\) vehicles in the fleet, \(m_2\) vehicles in year 2, \(\ldots\), and \(m_n\) vehicles in year \(n\). The exposure to loss from ownership and use of this fleet is not constant from year to year. The annual losses for the fleet are not iid.

Define \(Y_{jk}\) to be the loss for the \(k^{th}\) vehicle in the fleet for year \(j\). Then, the total losses for the fleet in year \(j\) are \(Y_{j1}+\cdots+Y_{jm_j}\) where we are adding up the losses for each of the \(m_j\) vehicles. In the Bühlmann-Straub model it is assumed that random variables \(Y_{jk}\) are iid across all vehicles and years for the policyholder. With this assumption the means \(\mathrm{E}(Y_{jk}|\theta)=\mu(\theta)\) and variances \(\mathrm{Var}(Y_{jk}|\theta)=\sigma^2(\theta)\) are the same for all vehicles and years. The quantity \(\mu(\theta)\) is the expected loss and \(\sigma^2(\theta)\) is the variance in the loss for one year for one vehicle for a policyholder with risk parameter \(\theta\).

If \(X_j\) is the average loss per unit of exposure in year \(j\), \(X_j=(Y_{j1}+\cdots+Y_{jm_j})/m_j\), then \(\mathrm{E}(X_j|\theta)=\mu(\theta)\) and \(\mathrm{Var}(X_j|\theta)=\sigma^2(\theta)/m_j\) for a policyholder with risk parameter \(\theta\). Note that we used the fact that the \(Y_{jk}\) are iid for a given policyholder. The average loss per vehicle for the entire \(n\)-year period is

\[\begin{equation*} \bar{X}= \frac{1}{m} \sum_{j=1}^{n} m_j X_{j} \quad , \quad m=\sum_{j=1}^{n} m_j. \end{equation*}\]

It follows that E\((\bar{X}|\theta)=\mu(\theta)\) and \(\mathrm{Var}(\bar{X}|\theta)=\sigma^2(\theta)/m\) where \(\mu(\theta)\) and \(\sigma^2(\theta)\) are the mean and variance for a single vehicle for one year for the policyholder.

Example 9.4.1. Prove that \(\mathrm{Var}(\bar{X}|\theta)=\sigma^2(\theta)/m\) for a risk with risk parameter \(\theta\).
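
A sketch of one possible argument for Example 9.4.1, assuming (as the model does) that the yearly averages \(X_1,\ldots,X_n\) are independent given \(\theta\) with \(\mathrm{Var}(X_j|\theta)=\sigma^2(\theta)/m_j\):

\[\begin{eqnarray*} \mathrm{Var}(\bar{X}|\theta)&=&\mathrm{Var}\left(\frac{1}{m}\sum_{j=1}^{n} m_j X_j \,\Big|\, \theta\right)=\frac{1}{m^2}\sum_{j=1}^{n} m_j^2\,\mathrm{Var}(X_j|\theta) \\ &=&\frac{1}{m^2}\sum_{j=1}^{n} m_j^2\,\frac{\sigma^2(\theta)}{m_j}=\frac{\sigma^2(\theta)}{m^2}\sum_{j=1}^{n} m_j=\frac{\sigma^2(\theta)}{m}. \end{eqnarray*}\]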


The Bühlmann-Straub credibility model extends the Bühlmann credibility model to allow for varying exposure by year. The Bühlmann-Straub credibility estimate is:

\[\begin{equation}\hat{\mu}(\theta)=Z\bar{X}+(1-Z)\mu \tag{9.9} \end{equation}\]

with

\[\begin{eqnarray*} \theta&=&\textrm{a risk parameter that identifies a policyholder's risk level}\\ \hat{\mu}(\theta)&=&\textrm{estimated expected loss for one exposure for the policyholder}\\ & & \textrm{with loss experience } \bar{X}\\ \bar{X}&=& \frac{1}{m} \sum_{j=1}^{n} m_j X_j \textrm{ is the average loss per exposure for $m$ exposures.}\\ & & \textrm{$X_j$ is the average loss per exposure and $m_j$ is the number of exposures in year $j$.} \\ Z&=&\textrm{credibility assigned to $m$ exposures } \\ \mu&=&\textrm{expected loss for one exposure for randomly chosen}\\ & & \textrm{ policyholder from population.}\\ \end{eqnarray*}\]

Note that \(\hat{\mu}(\theta)\) is the estimator for the expected loss for one exposure. If the policyholder has \(m_j\) exposures then the expected loss is \(m_j\hat{\mu}(\theta)\).

In Example 9.3.4, it was shown that \(Z=\mathrm{Var}(\mathrm{E}(\bar{X}|\theta))/\mathrm{Var}(\bar{X})\) where \(\bar{X}\) is the average loss for \(n\) observations. In equation (9.9) the \(\bar{X}\) is the average loss for \(m\) exposures and the same \(Z\) formula can be used:

\[ Z=\frac{\mathrm{Var}(\mathrm{E}(\bar{X}|\theta))}{\mathrm{Var}(\bar{X})}= \frac{\mathrm{Var}(\mathrm{E}(\bar{X}|\theta))}{\mathrm{E}(\mathrm{Var}(\bar{X}|\theta))+\mathrm{Var}(\mathrm{E}(\bar{X}|\theta))}. \]

The denominator was expanded using the law of total variance, which decomposes the variance of a random variable into conditional components: for random variables \(X\) and \(Y\) on the same probability space, \(\mathrm{Var}(X)=\mathrm{E}(\mathrm{Var}(X|Y))+\mathrm{Var}(\mathrm{E}(X|Y))\). As noted above \(\mathrm{E}(\bar{X}|\theta)=\mu(\theta)\) so \(\mathrm{Var}(\mathrm{E}(\bar{X}|\theta))=\mathrm{Var}(\mu(\theta))=VHM\). Because \(\mathrm{Var}(\bar{X}|\theta)=\sigma^2(\theta)/m\) it follows that \(\mathrm{E}(\mathrm{Var}(\bar{X}|\theta))=\mathrm{E}(\sigma^2(\theta))/m\) = \(EPV/m\). Making these substitutions and using a little algebra gives

\[\begin{equation} Z=\frac{m}{m+K} \quad , \quad K =\frac{EPV}{VHM}. \tag{9.10} \end{equation}\]

This is the same \(Z\) as for Bühlmann credibility except that the number of exposures \(m\) replaces the number of years or observations \(n\).

Example 9.4.2. A commercial automobile policyholder had the following exposures and claims over a three-year period:

\[ \small{ \begin{array}{|c|c|c|} \hline \text{Year} & \text{Number of Vehicles} & \text{Number of Claims} \\ \hline 1 & 9 & 5 \\ 2 & 12 & 4 \\ 3 & 15 & 4 \\ \hline \end{array} } \]

  • The number of claims in a year for each vehicle in the policyholder’s fleet is Poisson distributed with the same mean (parameter) \(\lambda\).
  • Parameter \(\lambda\) is distributed among the policyholders in the population with pdf \(f(\lambda)=6\lambda(1-\lambda)\) with \(0<\lambda<1\).

The policyholder has 18 vehicles in its fleet in year 4. Use Bühlmann-Straub credibility to estimate the expected number of policyholder claims in year 4.
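
The calculation for Example 9.4.2 can be sketched in Python with exact fractions. This is a hedged illustration rather than the chapter's own solution. It uses the fact that \(f(\lambda)=6\lambda(1-\lambda)\) is a beta density with both parameters equal to 2, so \(\mathrm{E}(\lambda)=1/2\) and \(\mathrm{E}(\lambda^2)=6(1/4-1/5)=3/10\); the variable names are illustrative.

```python
from fractions import Fraction

vehicles = [9, 12, 15]   # exposures m_j by year
claims = [5, 4, 4]       # observed claim counts by year

# Moments of lambda under the pdf 6*lambda*(1 - lambda) on (0, 1).
mean_lambda = Fraction(1, 2)
second_moment = Fraction(3, 10)

mu = mean_lambda                        # collective mean per vehicle-year
epv = mean_lambda                       # Poisson: Var(N | lambda) = lambda
vhm = second_moment - mean_lambda**2    # Var(lambda) = 1/20

m = sum(vehicles)                       # total exposures
x_bar = Fraction(sum(claims), m)        # claims per vehicle-year
k = epv / vhm
z = Fraction(m) / (m + k)
per_vehicle = z * x_bar + (1 - z) * mu
year4 = 18 * per_vehicle                # 18 vehicles in year 4

print(f"K={k}  Z={z}  per vehicle={per_vehicle}  year 4={float(year4):.4f}")
```

Multiplying the per-vehicle estimate by the 18 vehicles in year 4 follows the note above that the expected loss for \(m_j\) exposures is \(m_j\hat{\mu}(\theta)\).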


9.5 Bayesian Inference and Bühlmann Credibility


In this section, you learn how to:

  • Use Bayes Theorem to determine a formula for the expected loss of a risk given a likelihood and prior distribution.
  • Determine the posterior distributions for the gamma-Poisson and beta-binomial Bayesian models and compute expected values.
  • Understand the connection between the Bühlmann and Bayesian estimates for the gamma-Poisson and beta-binomial models.

Section 4.4 reviews Bayesian inference, a branch of statistics that leverages Bayes' theorem to update a distribution as more experience becomes available, and it is assumed that the reader is familiar with that material. The reader is also advised to read the Bühlmann credibility Section 9.3 in this chapter. This section will compare Bayesian inference with Bühlmann credibility and show connections between the two models.

A risk with risk parameter \(\theta\) has expected loss \(\mu(\theta)=\mathrm{E}(X|\theta)\) with random variable \(X\) representing pure premium, aggregate loss, number of claims, claim severity, or some other measure of loss during a period of time. If the risk has \(n\) losses \(X_1,\ldots, X_n\) during \(n\) separate periods of time, then these losses are assumed to be iid for the policyholder and \(\mu(\theta)=\mathrm{E}(X_i|\theta)\) for \(i=1,\ldots,n\).

If the risk had \(n\) losses \(x_1,\ldots, x_n\) then \(\mathrm{E}(\mu(\theta)|x_1,\ldots, x_n)\) is the conditional expectation of \(\mu(\theta)\). The Bühlmann credibility formula \(\hat{\mu}(\theta)=Z\bar{X}+(1-Z)\mu\) is a linear function of \(\bar{X}=(x_1+\cdots+x_n)/n\) used to estimate \(\mathrm{E}(\mu(\theta)|x_1,\ldots,x_n)\).

The expectation \(\mathrm{E}(\mu(\theta)|x_1,\ldots,x_n)\) can be calculated from the conditional density function \(f(x|\theta)\) and the posterior distribution \(\pi(\theta|x_1,\ldots,x_n)\):

\[\begin{eqnarray*} \mathrm{E}(\mu(\theta)|x_1,\ldots,x_n)&=&\int \mu(\theta) \pi(\theta|x_1,\ldots,x_n) d\theta \\ \mu(\theta)&=&\mathrm{E}(X|\theta)=\int xf(x|\theta) dx .\\ \end{eqnarray*}\]

The posterior distribution comes from Bayes' theorem, a probability law that expresses the conditional probability of event \(A\) given event \(B\) in terms of the conditional probability of \(B\) given \(A\) and the unconditional probability of \(A\):

\[\begin{equation*} \pi(\theta|x_1,\ldots,x_n)=\frac{\prod_{j=1}^{n} f(x_j|\theta)}{f(x_1,\ldots,x_n)}\pi({\theta}). \end{equation*}\]

The conditional density function \(f(x|\theta)\) and the prior distribution \(\pi(\theta)\) must be specified. The numerator \(\prod_{j=1}^{n} f(x_j|\theta)\) on the right-hand side is called the likelihood. The denominator \(f(x_1,\ldots,x_n)\) is the joint density function for \(n\) losses \(x_1,\ldots,x_n\).

9.5.1 Gamma-Poisson Model

In the gamma-Poisson model, a statistical model in which the frequency of claims is Poisson and the Poisson mean has a gamma prior distribution, the number of claims \(X\) has a Poisson distribution \(\Pr(X=x|\lambda)=\lambda^xe^{-\lambda}/x!\) for a risk with risk parameter \(\lambda\). The prior distribution for \(\lambda\) is gamma with \(\pi(\lambda)=\beta^\alpha\lambda^{\alpha-1}e^{-\beta\lambda}/\Gamma(\alpha)\). (Note that a rate parameter \(\beta\) is being used in the gamma distribution rather than a scale parameter.) The mean of the gamma is \(\mathrm{E}(\lambda)=\alpha/\beta\) and the variance is \(\mathrm{Var}(\lambda)=\alpha/\beta^2\). In this section we will assume that \(\lambda\) is the expected number of claims per year though we could have chosen another time interval.

If a risk is selected at random from the population then the expected number of claims in a year is \(\mathrm{E}(X)=\mathrm{E}(\mathrm{E}[X|\lambda])\) = \(\mathrm{E}(\lambda)=\alpha/\beta\). If we have no observations for the selected risk then the expected number of claims for the risk is \(\alpha/\beta\).

During \(n\) years the following number of claims by year was observed for the randomly selected risk: \(x_1,\ldots,x_n\). From Bayes theorem the posterior distribution is

\[ \pi(\lambda|x_1,\ldots,x_n)=\frac{\prod_{j=1}^{n} (\lambda^{x_j}e^{-\lambda}/x_j!)}{\Pr(X_1=x_1,\ldots,X_n=x_n)}\beta^\alpha\lambda^{\alpha-1}e^{-\beta\lambda}/\Gamma(\alpha). \]

Combining terms that have a \(\lambda\) and putting all other terms into constant \(C\) gives

\[\begin{equation*} \pi(\lambda|x_1,\ldots,x_n)=C\lambda^{(\alpha+\sum_{j=1}^{n}x_j)-1}e^{-(\beta+n)\lambda}. \end{equation*}\]

This is a gamma distribution with parameters \(\alpha'=\alpha+\sum_{j=1}^{n}x_j\) and \(\beta'=\beta+n\). The constant must be \(C={\beta'}^{\alpha'}/\Gamma(\alpha')\) so that \(\int_{0}^{\infty}\pi(\lambda|x_1,\ldots,x_n) d\lambda=1\) though we do not need to know \(C\). As explained in Chapter 4 the gamma distribution is a conjugate prior for the Poisson distribution so the posterior distribution is also gamma. See also Appendix Section 16.3.2.

Because the posterior distribution is gamma the expected number of claims for the selected risk is

\[\begin{equation*} \mathrm{E}(\lambda|x_1,\ldots,x_n) = \frac{\alpha+\sum_{j=1}^{n}x_j}{\beta+n}=\frac{\alpha + \textrm{number of claims}}{\beta+\textrm{number of years}}. \end{equation*}\]

This formula is slightly different from Chapter 4 because parameter \(\beta\) is multiplied by \(\lambda\) in the exponential of the gamma pdf whereas in Chapter 4 \(\lambda\) is divided by parameter \(\theta\). We have chosen this form for the exponential to simplify the equation for the expected number of claims.

Now we will compute the Bühlmann credibility estimate for the gamma-Poisson model. The variance for a Poisson distribution with parameter \(\lambda\) is \(\lambda\) so \(EPV=\mathrm{E}(\mathrm{Var}(X|\lambda))=\mathrm{E}(\lambda)=\alpha/\beta\). The mean number of claims per year for the risk is \(\lambda\) so \(VHM=\mathrm{Var}(\mathrm{E}(X|\lambda))\) = \(\mathrm{Var}(\lambda)=\alpha/\beta^2\). The credibility parameter is \(K=EPV/VHM\) = \((\alpha/\beta)/(\alpha/\beta^2)=\beta\). The overall mean is \(\mathrm{E}(\mathrm{E}(X|\lambda))=\mathrm{E}(\lambda)=\alpha/\beta\). The sample mean is \(\bar{X}=(\sum_{j=1}^{n}x_j)/n\). The credibility-weighted estimate for the expected number of claims for the risk is

\[ \hat{\mu}=\frac{n}{n+\beta}\frac{\sum_{j=1}^{n}x_j}{n} +\left(1-\frac{n}{n+\beta}\right)\frac{\alpha}{\beta}=\frac{\alpha+\sum_{j=1}^{n}x_j}{\beta+n}. \]

For the gamma-Poisson model the Bühlmann credibility estimate matches the Bayesian analysis result.
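
The agreement can be verified numerically. The following minimal Python sketch compares the posterior mean with the Bühlmann estimate; the prior parameters and claim counts are hypothetical, and the function names are illustrative.

```python
def bayes_posterior_mean(alpha, beta, claims):
    """Mean of the gamma posterior: (alpha + total claims) / (beta + years)."""
    return (alpha + sum(claims)) / (beta + len(claims))


def buhlmann_estimate(alpha, beta, claims):
    """Buhlmann estimate with EPV = alpha/beta and VHM = alpha/beta^2."""
    n = len(claims)
    epv = alpha / beta
    vhm = alpha / beta**2
    k = epv / vhm                 # equals beta
    z = n / (n + k)
    x_bar = sum(claims) / n
    return z * x_bar + (1 - z) * (alpha / beta)


# Hypothetical prior (rate parameterization) and five years of claim counts.
alpha, beta = 3.0, 2.0
claims = [0, 2, 1, 0, 3]

bayes = bayes_posterior_mean(alpha, beta, claims)
buhlmann = buhlmann_estimate(alpha, beta, claims)
print(f"Bayesian: {bayes:.6f}  Buhlmann: {buhlmann:.6f}")
```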

9.5.2 Beta-Binomial Model

The beta-binomial model, in which a binomial distribution has a success probability with a beta prior distribution, is useful for modeling the probability of an event. Assume that random variable \(X\) is the number of successes in \(n\) trials and that \(X\) has a binomial distribution \(\Pr(X=x|p)=\binom{n}{x}p^x(1-p)^{n-x}\). In the beta-binomial model the prior distribution for probability \(p\) is a beta distribution with pdf

\[\begin{equation*} \pi(p)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}p^{\alpha-1}(1-p)^{\beta-1} , \quad 0<p<1, \alpha>0, \beta>0. \end{equation*}\]

The posterior distribution for \(p\) given an outcome of \(x\) successes in \(n\) trials is

\[\begin{equation*} \pi(p|x)=\frac{\binom{n}{x}p^x(1-p)^{n-x}}{\Pr(x)}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}p^{\alpha-1}(1-p)^{\beta-1}. \end{equation*}\]

Combining terms that have a \(p\) and putting everything else into the constant \(C\) yields

\[\begin{equation*} \pi(p| x)=Cp^{\alpha+x-1}(1-p)^{\beta+(n-x)-1}. \end{equation*}\]

This is a beta distribution with new parameters \(\alpha^\prime=\alpha+x\) and \(\beta^\prime=\beta+(n-x)\). The constant must be

\[\begin{equation*} C=\frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}. \end{equation*}\]

The mean for the beta distribution with parameters \(\alpha\) and \(\beta\) is \(\mathrm{E}(p)=\alpha/(\alpha+\beta)\). Given \(x\) successes in \(n\) trials in the beta-binomial model the mean of the posterior distribution is

\[\begin{equation*} \mathrm{E}(p|x)=\frac{\alpha+x}{\alpha+\beta+n}. \end{equation*}\]

As the number of trials \(n\) and successes \(x\) increase, the expected value of \(p\) approaches \(x/n\).

The Bühlmann credibility estimate for \(\mathrm{E}(p|x)\) is exactly the same as the Bayesian estimate, as demonstrated in the following example.

Example 9.5.1. The probability that a coin toss will yield heads is \(p\). The prior distribution for probability \(p\) is beta with parameters \(\alpha\) and \(\beta\). On \(n\) tosses of the coin there were exactly \(x\) heads. Use Bühlmann credibility to estimate the expected value of \(p\).
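
One way to set up Example 9.5.1 is to treat each toss as a Bernoulli observation, so that there are \(n\) observations with sample mean \(x/n\), \(EPV=\mathrm{E}(p(1-p))\), and \(VHM=\mathrm{Var}(p)\). The Python sketch below follows that setup with exact fractions; the numerical inputs are hypothetical and the function names are illustrative.

```python
from fractions import Fraction


def buhlmann_beta_binomial(alpha, beta, n, x):
    """Buhlmann estimate of p, treating each toss as one Bernoulli observation."""
    a, b = Fraction(alpha), Fraction(beta)
    mu = a / (a + b)                               # E(p)
    epv = a * b / ((a + b) * (a + b + 1))          # E(p(1 - p))
    vhm = a * b / ((a + b) ** 2 * (a + b + 1))     # Var(p)
    k = epv / vhm                                  # simplifies to alpha + beta
    z = Fraction(n) / (n + k)
    return z * Fraction(x, n) + (1 - z) * mu


def bayes_posterior_mean(alpha, beta, n, x):
    """Mean of the beta posterior: (alpha + x) / (alpha + beta + n)."""
    return Fraction(alpha + x, alpha + beta + n)


# Hypothetical integer inputs: beta(2, 3) prior, 7 heads in 10 tosses.
buhlmann = buhlmann_beta_binomial(2, 3, 10, 7)
bayes = bayes_posterior_mean(2, 3, 10, 7)
print(buhlmann, bayes)
```

With this setup \(K=\alpha+\beta\) and \(Z=n/(n+\alpha+\beta)\), which reproduces \((\alpha+x)/(\alpha+\beta+n)\).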


9.5.3 Exact Credibility

As demonstrated in the prior sections, the Bühlmann credibility estimates for the gamma-Poisson and beta-binomial models exactly match the Bayesian analysis results. The term exact credibility, meaning that the Bayesian estimate matches the Bühlmann credibility estimate, is applied in these situations. Exact credibility may occur if the probability distribution for \(X_j\) is in the linear exponential family and the prior distribution is a conjugate prior. Besides these two models, examples of exact credibility also include Gamma-Exponential and Normal-Normal models.

It is also noteworthy that if the conditional mean \(\mathrm{E}(\mu(\theta)|X_1,\ldots,X_n)\) is linear in the past observations, then the Bühlmann credibility estimate will coincide with the Bayesian estimate. More information about exact credibility can be found in (Bühlmann and Gisler 2005), (Klugman, Panjer, and Willmot 2012), and (Tse 2009).

9.6 Estimating Credibility Parameters


In this section, you learn how to:

  • Perform nonparametric estimation with the Bühlmann and Bühlmann-Straub credibility models.
  • Identify situations when semiparametric estimation is appropriate.
  • Use data to approximate the \(EPV\) and \(VHM\).
  • Balance credibility-weighted estimates.

The examples in this chapter have provided assumptions for calculating credibility parameters. In actual practice the actuary must use real world data and judgment to determine credibility parameters.

9.6.1 Full Credibility Standard for Limited Fluctuation Credibility

Limited-fluctuation credibility requires a full credibility standard. The general formula for aggregate losses or pure premium, as obtained in formula (9.5), is

\[ n_S=\left(\frac{y_p}{k}\right)^2\left[\left(\frac{\sigma_N^2}{\mu_N}\right)+\left(\frac{\sigma_X}{\mu_X}\right)^2\right] , \]

with \(N\) representing number of claims and \(X\) the size of claims. If one assumes \(\sigma_X=0\) then the full credibility standard for frequency results. If \(\sigma_N=0\) then the full credibility formula for severity follows. Probability \(p\) and \(k\) value are often selected using judgment and experience.

In practice it is often assumed that the number of claims is Poisson distributed so that \(\sigma_N^2/\mu_N=1\). In this case the formula can be simplified to

\[\begin{equation*} n_S=\left(\frac{y_p}{k}\right)^2\left[\frac{\mathrm{E}(X^2)}{(\mathrm{E}(X))^2}\right]. \end{equation*}\]

An empirical mean and second moment for the sizes of individual claim losses can be computed from past data, if available.
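
A minimal Python sketch of this calculation under the Poisson frequency assumption follows. It assumes that \(y_p\) is the standard normal quantile satisfying \(\Phi(y_p)=(1+p)/2\), which is the usual two-sided convention; that definition is taken as an assumption here. The claim sizes, the defaults \(p=0.90\) and \(k=0.05\), and the function name are hypothetical.

```python
from statistics import NormalDist


def full_credibility_standard(claim_sizes, p=0.90, k=0.05):
    """Expected claims needed for full credibility of aggregate losses,
    assuming Poisson frequency: n_S = (y_p / k)^2 * E(X^2) / E(X)^2."""
    # Assumed convention: y_p is the (1 + p) / 2 quantile of the standard normal.
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    n = len(claim_sizes)
    first_moment = sum(claim_sizes) / n
    second_moment = sum(x * x for x in claim_sizes) / n
    return (y_p / k) ** 2 * second_moment / first_moment**2


# Hypothetical past claim sizes.
past_claims = [500.0, 1200.0, 800.0, 3000.0, 1500.0]
n_s = full_credibility_standard(past_claims)
print(f"Full credibility standard: {n_s:.1f} expected claims")
```

If every claim had the same size, the moment ratio would equal 1 and the result would reduce to the frequency-only standard \((y_p/k)^2\).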

9.6.2 Nonparametric Estimation for Bühlmann and Bühlmann-Straub Models

Bayesian analysis as described previously requires assumptions about a prior distribution and likelihood. It is possible to produce estimates without these assumptions and these methods are often referred to as empirical Bayes methods: credibility methods that estimate the credibility weight from empirical data alone, without assumptions about prior distributions or likelihoods. Bühlmann and Bühlmann-Straub credibility with parameters estimated from the data are included in the category of empirical Bayes methods.

Bühlmann Model

First we will address the simpler Bühlmann model. Assume that there are \(r\) risks in a population. For risk \(i\) with risk parameter \(\theta_i\) the losses for \(n\) periods are \(X_{i1},\ldots, X_{in}\). The losses for a given risk are iid across periods as assumed in the Bühlmann model. For risk \(i\) the sample mean is \(\bar{X}_i=\sum_{j=1}^{n}X_{ij}/n\) and the unbiased sample process variance is \(s_i^2=\sum_{j=1}^{n}(X_{ij}-\bar{X}_i)^2/(n-1)\). An unbiased estimator for the \(EPV\) can be calculated by taking the average of \(s_i^2\) for the \(r\) risks in the population:

\[\begin{equation} \widehat{EPV}=\frac{1}{r}\sum_{i=1}^{r} s_i^2 = \frac{1}{r(n-1)} \sum_{i=1}^{r} \sum_{j=1}^{n}(X_{ij}-\bar{X}_i)^2 . \tag{9.11} \end{equation}\]

The individual risk means \(\bar{X}_i\) for \(i=1,\ldots, r\) can be used to estimate the \(VHM\). An unbiased estimator of Var(\(\bar{X}_i\)) is

\[\begin{equation*} \widehat{\mathrm{Var}}(\bar{X}_i)=\frac{1}{r-1} \sum_{i=1}^{r}(\bar{X}_i-\bar{X})^2 \textrm{ and } \bar{X}=\frac{1}{r}\sum_{i=1}^{r} \bar{X}_i, \end{equation*}\]

but Var(\(\bar{X}_i\)) is not the \(VHM\). Using equation (16.2), the total variance formula or unconditional variance formula is

\[ \mathrm{Var}(\bar{X}_i)=\textrm{E(Var}(\bar{X}_i|\Theta=\theta_i))+\textrm{Var(E}(\bar{X}_i|\Theta=\theta_i)). \]

The \(VHM\) is the second term on the right because \(\mu(\theta_i)=\mathrm{E}(\bar{X}_i|\Theta=\theta_i)\) is the hypothetical mean for risk \(i\). So,

\[\begin{equation*} VHM=\mathrm{Var}(\mu(\theta_i)) = \mathrm{Var}(\bar{X}_i) - \mathrm{E}(\mathrm{Var}(\bar{X}_i|\Theta=\theta_i)). \end{equation*}\]

As discussed previously in Section 9.3.1, \(EPV/n\) = \(\mathrm{E}(\mathrm{Var}[\bar{X}_i|\Theta=\theta_i])\) and using the above estimators gives an unbiased estimator for the \(VHM\):

\[\begin{equation} \widehat{VHM} = \frac{1}{r-1} \sum_{i=1}^{r}(\bar{X}_i-\bar{X})^2 - \frac{\widehat{EPV}}{n} . \tag{9.12} \end{equation}\]

Although the expected loss for a risk with parameter \(\theta_i\) is \(\mu(\theta_i)=\mathrm{E}(\bar{X}_i|\Theta=\theta_i)\), the variance of the sample mean \(\bar{X}_i\) is greater than or equal to the variance of the hypothetical means: \(\mathrm{Var}(\bar{X}_i)\geq\mathrm{Var}(\mu(\theta_i))\). The variance in the sample means \(\mathrm{Var}(\bar{X}_i)\) includes both the variance in the hypothetical means and a process variance term.

In some cases formula (9.12) can produce a negative value for \(\widehat{VHM}\) because of the subtraction of \(\widehat{EPV}/n\), but a variance cannot be negative. This happens when the process variance within risks is so large that it overwhelms the measurement of the variance in means between risks. In this case we cannot use this method to determine the values needed for Bühlmann credibility.

Example 9.6.1. Two policyholders had claims over a three-year period as shown in the table below. Estimate the expected number of claims for each policyholder using Bühlmann credibility and calculating necessary parameters from the data.

\[ \small{ \begin{array}{|c|c|c|} \hline \text{Year} & \text{Risk A} & \text{Risk B} \\ \hline 1 & 0 & 2 \\ 2 & 1 & 1 \\ 3 & 0 & 2 \\ \hline \end{array} } \]
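
Equations (9.11) and (9.12) can be applied to the data of Example 9.6.1 with a short Python sketch. The function name and data layout are illustrative choices, and the complement of credibility is applied to the overall sample mean \(\bar{X}\).

```python
def nonparametric_buhlmann(losses):
    """losses: one list per risk, each with the same number n of periods.
    Returns (risk means, overall mean, EPV estimate, VHM estimate)."""
    r, n = len(losses), len(losses[0])
    means = [sum(row) / n for row in losses]
    overall = sum(means) / r
    # Equation (9.11): average of the unbiased within-risk sample variances.
    epv = sum((x - m) ** 2 for row, m in zip(losses, means) for x in row) / (r * (n - 1))
    # Equation (9.12): sample variance of the risk means, less EPV / n.
    vhm = sum((m - overall) ** 2 for m in means) / (r - 1) - epv / n
    return means, overall, epv, vhm


data = [[0, 1, 0], [2, 1, 2]]     # Risk A, Risk B
means, overall, epv, vhm = nonparametric_buhlmann(data)

n = len(data[0])
z = n / (n + epv / vhm)
estimates = [z * m + (1 - z) * overall for m in means]
print(f"EPV={epv:.4f}  VHM={vhm:.4f}  Z={z:.4f}")
print("Estimates (A, B):", [round(e, 4) for e in estimates])
```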


Example 9.6.2. Two policyholders had claims over a three-year period as shown in the table below. Calculate the nonparametric estimate for the \(VHM\).

\[ \small{ \begin{array}{|c|c|c|} \hline \text{Year} & \text{Risk A} & \text{Risk B} \\ \hline 1 & 3 & 3 \\ 2 & 0 & 0 \\ 3 & 0 & 3 \\ \hline \end{array} } \]
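
Applying equations (9.11) and (9.12) to this data illustrates the negative-variance problem described above. The following Python sketch is a direct transcription of the two formulas; the variable names are illustrative.

```python
data = {"A": [3, 0, 0], "B": [3, 0, 3]}
n = 3                      # periods per risk
r = len(data)              # number of risks

means = {risk: sum(obs) / n for risk, obs in data.items()}
overall = sum(means.values()) / r

# Equation (9.11): average of the within-risk sample variances.
epv = sum((x - means[risk]) ** 2 for risk, obs in data.items() for x in obs) / (r * (n - 1))

# Equation (9.12): sample variance of the risk means, less EPV / n.
between = sum((m - overall) ** 2 for m in means.values()) / (r - 1)
vhm = between - epv / n

print(f"EPV={epv}  Var of means={between}  VHM estimate={vhm}")
```

The estimate is negative, so these data cannot supply a usable \(K\) by this method.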


Bühlmann-Straub Model

Empirical formulas for \(EPV\) and \(VHM\) in the Bühlmann-Straub model are more complicated because a risk’s number of exposures can change from one period to another. Also, the number of experience periods does not have to be constant across the population. First some definitions:

  • \(X_{ij}\) is the losses per exposure for risk \(i\) in period \(j\). Losses can refer to number of claims or amount of loss. There are \(r\) risks so \(i=1,\ldots,r\).
  • \(n_i\) is the number of observation periods for risk \(i\)
  • \(m_{ij}\) is the number of exposures for risk \(i\) in period \(j\) for \(j=1,\ldots,n_i\)

Risk \(i\) with risk parameter \(\theta_i\) has \(m_{ij}\) exposures in period \(j\) which means that the losses per exposure random variable can be written as \(X_{ij}=(Y_{i1}+\cdots+Y_{im_{ij}})/m_{ij}\). Random variable \(Y_{ik}\) is the loss for one exposure. For risk \(i\) losses \(Y_{ik}\) are iid with mean \(\mathrm{E}(Y_{ik})=\mu(\theta_i)\) and process variance \(\mathrm{Var}(Y_{ik})=\sigma^2(\theta_i)\). It follows that \(\mathrm{Var}(X_{ij})=\sigma^2(\theta_i)/m_{ij}\).

Two more important definitions are:

  • \(\bar{X}_i=\frac{1}{m_i}\sum_{j=1}^{n_i} m_{ij}X_{ij}\) with \(m_i = \sum_{j=1}^{n_i} m_{ij}\). \(\bar{X}_i\) is the average loss per exposure for risk \(i\) for all observation periods combined.
  • \(\bar{X}=\frac{1}{m}\sum_{i=1}^{r} m_i \bar{X}_i\) with \(m=\sum_{i=1}^r m_i\). \(\bar{X}\) is the average loss per exposure for all risks for all observation periods combined.

An unbiased estimator for the process variance \(\sigma^2(\theta_i)\) of one exposure for risk \(i\) is

\[\begin{equation*} {s_i}^2=\frac{\sum_{j=1}^{n_i} m_{ij}(X_{ij}-\bar{X}_i)^2}{n_i-1}. \end{equation*}\]

The weights \(m_{ij}\) are applied to the squared differences because the \(X_{ij}\) are the averages of \(m_{ij}\) exposures. The weighted average of the sample variances \({s_i}^2\) for each risk \(i\) in the population, with weights proportional to \((n_i-1)\), one less than the number of observation periods, will produce the expected value of the process variance (\(EPV\)) estimate

\[\begin{equation*} \widehat{EPV}=\frac{\sum_{i=1}^r (n_i-1){s_i}^2}{\sum_{i=1}^r (n_i-1)}=\frac{\sum_{i=1}^r \sum_{j=1}^{n_i} m_{ij}(X_{ij}-\bar{X}_i)^2}{\sum_{i=1}^r (n_i-1)}. \end{equation*}\]

The quantity \(\widehat{EPV}\) is an unbiased estimator for the expected value of the process variance of one exposure for a risk chosen at random from the population.

To calculate an estimator for the variance in the hypothetical means (\(VHM\)) the squared differences of the individual risk sample means \(\bar{X}_i\) and population mean \(\bar{X}\) are used. An unbiased estimator for the \(VHM\) is

\[\begin{equation*} \widehat{VHM}=\frac{\sum_{i=1}^r m_i(\bar{X}_i-\bar{X})^2 - (r-1)\widehat{EPV}}{m-\frac{1}{m}\sum_{i=1}^r m_i^2}. \end{equation*}\]

This complicated formula is necessary because of the varying number of exposures. Proofs that the \(EPV\) and \(VHM\) estimators shown above are unbiased can be found in several references mentioned at the end of this chapter including (Bühlmann and Gisler 2005), (Klugman, Panjer, and Willmot 2012), and (Tse 2009).

Example 9.6.3. Two policyholders had claims shown in the table below. Estimate the expected number of claims per vehicle for each policyholder using Bühlmann-Straub credibility and calculating parameters from the data.

\[ \small{ \begin{array}{|c|c|c|c|c|c|} \hline \text{Policyholder} & & \text{Year 1} & \text{Year 2} & \text{Year 3} & \text{Year 4} \\ \hline \text{A} & \text{Number of claims} & 0 & 2 & 2 & 3 \\ \hline \text{A} & \text{Insured vehicles} & 1 & 2 & 2 & 2\\ \hline & & & & & \\ \hline \text{B} & \text{Number of claims} & 0 & 0 & 1 & 2\\ \hline \text{B} & \text{Insured vehicles} & 0 & 2 & 3 & 4\\ \hline \end{array} } \]
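
The nonparametric Bühlmann-Straub estimators above can be applied to this data with the following Python sketch. It is an illustration under one stated assumption: policyholder B's first year, which shows zero insured vehicles, is treated as having no observation, so \(n_B=3\). The function name and data layout are hypothetical.

```python
def buhlmann_straub_nonparametric(exposures, totals):
    """exposures[i][j] and totals[i][j] give exposures and total losses for
    risk i in period j. Zero-exposure periods are dropped (an assumption)."""
    risks = []
    for m_row, t_row in zip(exposures, totals):
        pairs = [(m, t / m) for m, t in zip(m_row, t_row) if m > 0]
        m_i = sum(m for m, _ in pairs)
        xbar_i = sum(m * x for m, x in pairs) / m_i
        ss_i = sum(m * (x - xbar_i) ** 2 for m, x in pairs)
        risks.append((m_i, xbar_i, ss_i, len(pairs)))
    r = len(risks)
    m = sum(m_i for m_i, _, _, _ in risks)
    xbar = sum(m_i * xb for m_i, xb, _, _ in risks) / m
    epv = sum(ss for _, _, ss, _ in risks) / sum(n_i - 1 for _, _, _, n_i in risks)
    numerator = sum(m_i * (xb - xbar) ** 2 for m_i, xb, _, _ in risks) - (r - 1) * epv
    vhm = numerator / (m - sum(m_i**2 for m_i, _, _, _ in risks) / m)
    return risks, xbar, epv, vhm


vehicles = [[1, 2, 2, 2], [0, 2, 3, 4]]   # policyholders A and B
claims = [[0, 2, 2, 3], [0, 0, 1, 2]]
risks, xbar, epv, vhm = buhlmann_straub_nonparametric(vehicles, claims)

k = epv / vhm
estimates = []
for m_i, xbar_i, _, _ in risks:
    z_i = m_i / (m_i + k)
    estimates.append(z_i * xbar_i + (1 - z_i) * xbar)

print(f"EPV={epv:.4f}  VHM={vhm:.4f}  K={k:.4f}")
print("Claims per vehicle (A, B):", [round(e, 4) for e in estimates])
```

Here the complement of credibility is applied to the overall mean \(\bar{X}\).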


9.6.3 Semiparametric Estimation for Bühlmann and Bühlmann-Straub Models

In the prior section on nonparametric estimation, a statistical approach that fits the data without an assumed prior distribution, constraints, or parameters, there were no assumptions about the distribution of the losses per exposure \(X_{ij}\). Assuming that the \(X_{ij}\) have a particular distribution and using properties of the distribution along with the data to determine credibility parameters is referred to as semiparametric estimation: a distribution is assumed for the loss per exposure random variable and everything else comes from empirical data.

An example of semiparametric estimation would be the assumption of a Poisson distribution when estimating claim frequencies. The Poisson distribution has the property that the mean and variance are identical and this property can simplify calculations. The following simple example comes from the prior section but now includes a Poisson assumption about claim frequencies.

Example 9.6.4. Two policyholders had claims over a three-year period as shown in the table below. Assume that the number of claims for each risk has a Poisson distribution. Estimate the expected number of claims for each policyholder using Bühlmann credibility and calculating necessary parameters from the data.

\[ \small{ \begin{array}{|c|c|c|} \hline \text{Year} & \text{Risk A} & \text{Risk B} \\ \hline 1 & 0 & 2 \\ 2 & 1 & 1 \\ 3 & 0 & 2 \\ \hline \end{array} } \]
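
With the Poisson assumption, the process variance of each risk equals its mean, so one natural choice is to estimate the \(EPV\) by the overall sample mean and keep equation (9.12) for the \(VHM\). The Python sketch below follows that approach for Example 9.6.4; it is an illustration under that assumption, with illustrative variable names.

```python
data = {"A": [0, 1, 0], "B": [2, 1, 2]}
n = 3                      # periods per risk
r = len(data)              # number of risks

means = {risk: sum(obs) / n for risk, obs in data.items()}
overall = sum(means.values()) / r

# Poisson assumption: Var(X | theta) = E(X | theta), so EPV is estimated by the mean.
epv = overall

# Equation (9.12) with the semiparametric EPV estimate.
between = sum((m - overall) ** 2 for m in means.values()) / (r - 1)
vhm = between - epv / n

z = n / (n + epv / vhm)
estimates = {risk: z * m + (1 - z) * overall for risk, m in means.items()}
print(f"EPV={epv:.4f}  VHM={vhm:.4f}  Z={z:.4f}")
print({risk: round(e, 4) for risk, e in estimates.items()})
```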


Although we assumed that the number of claims for each risk was Poisson distributed in the prior example, we did not need this additional assumption because there was enough information to use nonparametric estimation. In fact, the Poisson assumption might not be appropriate because for risk B the sample mean is not equal to the sample variance: \(\bar{x}_B=\frac{5}{3}\neq s_B^2=\frac{1}{3}\).

The following example is commonly used to demonstrate a situation where semiparametric estimation is needed. There is insufficient information for nonparametric estimation but with the Poisson assumption, estimates can be calculated.

Example 9.6.5. A portfolio of 2,000 policyholders generated the following claims profile during a five-year period:

\[ \small{ \begin{array}{|c|c|} \hline \text{Number of Claims} & \\ \text{In 5 Years} & \text{Number of policies}\\ \hline 0 & 923 \\ 1 & 682 \\ 2 & 249 \\ 3 & 70 \\ 4 & 51 \\ 5 & 25 \\ \hline \end{array} } \]

In your model you assume that the number of claims for each policyholder has a Poisson distribution and that a policyholder’s expected number of claims is constant through time. Use Bühlmann credibility to estimate the annual expected number of claims for policyholders with 3 claims during the five-year period.
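
One way to carry out the calculation for Example 9.6.5 is sketched below in Python. It treats each policyholder's five-year claim count as a single Poisson observation (so \(n=1\) per risk), estimates the \(EPV\) by the sample mean, and obtains the \(VHM\) as the sample variance less the \(EPV\). This setup is an assumption of the sketch, and the variable names are illustrative.

```python
policies_by_claims = {0: 923, 1: 682, 2: 249, 3: 70, 4: 51, 5: 25}

r = sum(policies_by_claims.values())          # number of policyholders
mean5 = sum(c * cnt for c, cnt in policies_by_claims.items()) / r
var5 = sum(cnt * (c - mean5) ** 2 for c, cnt in policies_by_claims.items()) / (r - 1)

# Poisson assumption: the process variance equals the mean (five-year basis).
epv5 = mean5
# One five-year observation per risk: Var(X) = VHM + EPV / 1.
vhm5 = var5 - epv5

k = epv5 / vhm5
z = 1 / (1 + k)
five_year = z * 3 + (1 - z) * mean5           # policyholder with 3 claims
annual = five_year / 5

print(f"mean={mean5:.4f}  variance={var5:.4f}  Z={z:.4f}")
print(f"Estimated annual claims: {annual:.4f}")
```

Working with annual units instead (five observations per risk) leads to the same \(Z\) and the same annual estimate.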


9.6.4 Balancing Credibility Estimators

The credibility weighted model \(\hat{\mu}(\theta_i)=Z_i\bar{X}_i+(1-Z_i)\bar{X}\), where \(\bar{X}_i\) is the loss per exposure for risk \(i\) and \(\bar{X}\) is loss per exposure for the population, can be used to estimate the expected loss for risk \(i\). The overall mean is \(\bar{X}=\sum_{i=1}^r(m_i/m) \bar{X}_i\) where \(m_i\) and \(m\) are number of exposures for risk \(i\) and population, respectively.

For the credibility weighted estimators to be in balance we want

\[ \bar{X}=\sum_{i=1}^r(m_i/m) \bar{X}_i=\sum_{i=1}^r(m_i/m) \hat{\mu}(\theta_i). \]

If this equation is satisfied then the estimated losses for each risk will add up to the population total, an important goal in ratemaking, but this may not happen if the complement of credibility is applied to \(\bar{X}\).

To achieve balance, we will set \(\hat{M}_X\) as the amount that is applied to the complement of credibility and thus analyze the following equation:

\[ \sum_{i=1}^r(m_i/m) \bar{X}_i=\sum_{i=1}^r(m_i/m) \left\{Z_i\bar{X}_i+(1-Z_i) \cdot \hat{M}_X\right\} . \]

A little algebra gives

\[ \sum_{i=1}^r m_i \bar{X}_i=\sum_{i=1}^r m_i Z_i\bar{X}_i + \hat{M}_X\sum_{i=1}^r m_i(1-Z_i), \]

and

\[ \hat{M}_X=\frac{\sum_{i=1}^r m_i(1-Z_i)\bar{X}_i}{\sum_{i=1}^r m_i(1-Z_i)}. \]

Using this value for \(\hat{M}_X\) will bring the credibility weighted estimators into balance.

If credibilities \(Z_i\) were computed using the Bühlmann-Straub model, then \(Z_i=m_i/(m_i+K)\). The prior formula can be simplified using the following relationship

\[ m_i(1-Z_i)=m_i\left(1-\frac{m_i}{m_i+K}\right)=m_i\left(\frac{(m_i+K)-m_i}{m_i+K}\right)=KZ_i . \]

Therefore, an amount that, when applied to the complement of credibility, brings the credibility-weighted estimators into balance with the overall mean loss per exposure is

\[ \hat{M}_X=\frac{\sum_{i=1}^r Z_i \bar{X}_i}{\sum_{i=1}^r Z_i}. \]

Example 9.6.6. An example from the nonparametric Bühlmann-Straub section had the following data for two risks. Find the amount associated with the complement of credibility, \(\hat{M}_X\), that will produce credibility-weighted estimates that are in balance.

\[ \small{ \begin{array}{|c|c|c|c|c|c|} \hline \text{Policyholder} & & \text{Year 1} & \text{Year 2} & \text{Year 3} & \text{Year 4} \\ \hline \text{A} & \text{Number of claims} & 0 & 2 & 2 & 3 \\ \hline \text{A} & \text{Insured vehicles} & 1 & 2 & 2 & 2\\ \hline & & & & & \\ \hline \text{B} & \text{Number of claims} & 0 & 0 & 1 & 2\\ \hline \text{B} & \text{Insured vehicles} & 0 & 2 & 3 & 4\\ \hline \end{array} } \]
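
The following Python sketch works through Example 9.6.6. It recomputes \(K\) from the nonparametric Bühlmann-Straub estimators of Section 9.6.2, then applies \(\hat{M}_X=\sum Z_i\bar{X}_i/\sum Z_i\) and checks the balance condition. As an assumption of the sketch, policyholder B's first year with zero vehicles is dropped; the variable names are illustrative.

```python
vehicles = {"A": [1, 2, 2, 2], "B": [0, 2, 3, 4]}
claims = {"A": [0, 2, 2, 3], "B": [0, 0, 1, 2]}

m_i, xbar_i, ss_i, n_i = {}, {}, {}, {}
for risk in vehicles:
    # Keep only periods with exposure (an assumption for B's first year).
    pairs = [(m, c / m) for m, c in zip(vehicles[risk], claims[risk]) if m > 0]
    m_i[risk] = sum(m for m, _ in pairs)
    xbar_i[risk] = sum(m * x for m, x in pairs) / m_i[risk]
    ss_i[risk] = sum(m * (x - xbar_i[risk]) ** 2 for m, x in pairs)
    n_i[risk] = len(pairs)

m = sum(m_i.values())
r = len(m_i)
xbar = sum(m_i[risk] * xbar_i[risk] for risk in m_i) / m
epv = sum(ss_i.values()) / sum(n - 1 for n in n_i.values())
numerator = sum(m_i[risk] * (xbar_i[risk] - xbar) ** 2 for risk in m_i) - (r - 1) * epv
vhm = numerator / (m - sum(v**2 for v in m_i.values()) / m)
K = epv / vhm

z = {risk: m_i[risk] / (m_i[risk] + K) for risk in m_i}
m_hat = sum(z[risk] * xbar_i[risk] for risk in z) / sum(z.values())
balanced = {risk: z[risk] * xbar_i[risk] + (1 - z[risk]) * m_hat for risk in z}

# Balance check: estimated claims should add up to the observed total.
total_estimated = sum(m_i[risk] * balanced[risk] for risk in m_i)
total_observed = sum(sum(c) for c in claims.values())
print(f"M_X={m_hat:.4f}  estimated total={total_estimated:.4f}  observed total={total_observed}")
```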


9.7 Further Resources and Contributors

Exercises

Here is a set of exercises that guide the viewer through some of the theoretical foundations of Loss Data Analytics. Each tutorial is based on one or more questions from the professional actuarial examinations, typically the Society of Actuaries Exam C/STAM.

Credibility Guided Tutorials

Contributors

  • Gary Dean, Ball State University is the author of the initial version of this chapter. Email: for chapter comments and suggested improvements.
  • Chapter reviewers include: Liang (Jason) Hong, Ambrose Lo, Ranee Thiagarajah, Hongjuan Zhou.

Bibliography

Bühlmann, Hans. 1967. “Experience Rating and Credibility.” ASTIN Bulletin 4 (3): 199–207.

Bühlmann, Hans, and Alois Gisler. 2005. A Course in Credibility Theory and Its Applications. Springer.

Klugman, Stuart A., Harry H. Panjer, and Gordon E. Willmot. 2012. Loss Models: From Data to Decisions. John Wiley & Sons.

Tse, Yiu-Kuen. 2009. Nonlife Actuarial Models: Theory, Methods and Evaluation. Cambridge University Press.