1. Introduction
First, we congratulate Dr. Olivier Le Courtois for this interesting contribution to credibility theory. One of us (Liang) was fortunate enough to meet Dr. Le Courtois and learn about the key results of his paper during the session “Credibility Analysis: Theory, Practice, and Evolution” at the 2016 Joint Statistical Meeting in Chicago, sponsored by both the Society of Actuaries and the American Statistical Association. It is a pleasure to see that the paper is finally in print now.
The classical credibility theory circumvents the challenge of finding the bona fide Bayesian estimate (with respect to the squared error loss) by restricting attention to the class of linear estimators of the data. See, for example, Bühlmann and Gisler (2005) and Klugman, Panjer, and Willmot (2008) for a detailed treatment. Though it is simple to implement and easy to interpret, the classical credibility theory guarantees accurate estimation (i.e., exact credibility) only under fairly restrictive assumptions, such as exponential family models with conjugate priors (Diaconis and Ylvisaker 1979). Therefore, it is natural to seek alternative and more general methods for estimating the mean loss. One such approach is to consider the best quadratic estimator of the data, as Le Courtois has done with his credibility proposal.
In this note we provide three comments. The first shows how Le Courtois’s Proposition 1.1 can be simultaneously extended and the proof simplified; the second discusses what actuaries can do beyond the classical credibility theory; and the third poses several open problems. To be clear, in no way are any of these comments meant to be critical. Instead, it is our hope that these remarks will stimulate further discussion and developments along this line and ultimately benefit the actuarial science community in general.
2. Extending and simplifying Proposition 1.1
The proof of Proposition 1.1 bears resemblance to its counterpart in the classical theory. That is, the paper applies differential calculus to obtain the solution of a constrained optimization problem. The author is already aware that a short proof exists, and mentions the Hilbert space approach (e.g., Shiu and Sing 2004) in his final remark. While the Hilbert space approach is arguably more elegant, it does place a heavier technical burden on the reader. But an alternative proof, which is both short and elementary, is available too. Indeed, set
\[\left\{ \begin{array}{ll} Y_i=X_i, & \hbox{$i=1, \ldots, n$,} \\ Y_i=X_i^2, & \hbox{$i=n+1, \ldots, 2n$,}\\ Y_{2n+1}=X_{n+1} \end{array} \right.\]
so that the proposed quadratic estimator of Le Courtois (2021) is transformed to a linear estimator in \(Y_1, \ldots, Y_{2n+1}\). Then Equations (30) and (32) in Le Courtois (2021) follow immediately from the following normal equations in the classical credibility theory (Klugman, Panjer, and Willmot 2008, 583):
\[\begin{align} E(X_{n+1}) &= \widehat{a}_0+\sum_{i=1}^n \widehat{a}_iE(X_i), \\ Cov(X_k, X_{n+1}) &= \sum_{i=1}^n\widehat{a}_iCov(X_k, X_i), \quad k=1, \ldots, n,\end{align}\]
where \(\widehat{a}_0+\sum_{i=1}^n \widehat{a}_iX_i\) is the classical credibility estimator of \(X_{n+1}\) and \(\widehat{a}_0, \ldots, \widehat{a}_n\) are the values of \(a_0, \ldots, a_n\) that minimize the corresponding mean squared error. For the remainder of the proof, one can proceed as in Le Courtois (2021).
This technique can also be applied to explore higher-order polynomial functions of the data as estimators. Even more generally, given an appropriately rich dictionary of basis functions \(\phi_1, \ldots, \phi_J\), the estimator
\[\gamma_0 + \sum_{j=1}^J \sum_{i=1}^n \gamma_{ij} \phi_j(X_i)\]
is linear in the transformed data \(\phi_j(X_i)\).
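As a quick numerical illustration of this reduction, the following sketch fits the quadratic credibility estimator by solving the classical normal equations in the transformed variables. Python is used here for convenience (the paper's own example is in R), and the hierarchical model and all parameter values below are illustrative assumptions, not those of Le Courtois (2021).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hierarchical model (an assumption for this sketch, not the
# paper's): a risk parameter theta ~ Gamma(2, 1), and given theta the
# claims X_1, ..., X_{n+1} are i.i.d. Exponential with mean theta.
n, n_risks = 5, 200_000
theta = rng.gamma(shape=2.0, scale=1.0, size=n_risks)
X = rng.exponential(scale=theta[:, None], size=(n_risks, n + 1))
past, future = X[:, :n], X[:, n]

def linear_credibility(design, target):
    """Best linear estimator of `target` given the columns of `design`,
    obtained by solving the normal equations Cov(Y) a = Cov(Y, target)."""
    S = np.cov(np.column_stack([design, target]), rowvar=False)
    a = np.linalg.solve(S[:-1, :-1], S[:-1, -1])
    a0 = target.mean() - design.mean(axis=0) @ a  # intercept for unbiasedness
    return a0 + design @ a

# The transformation in the text: Y = (X_1, ..., X_n, X_1^2, ..., X_n^2).
# A quadratic estimator in the X_i is a linear estimator in the Y_i.
pred_linear = linear_credibility(past, future)
pred_quadratic = linear_credibility(np.hstack([past, past**2]), future)

mse_linear = np.mean((future - pred_linear) ** 2)
mse_quadratic = np.mean((future - pred_quadratic) ** 2)
# The quadratic class contains the linear one, so the fitted quadratic
# estimator can only have a smaller or equal (in-sample) MSE.
```

Because the normal equations solved with empirical covariances coincide with a least-squares projection, the quadratic fit is guaranteed to do no worse than the linear one on the same data.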
In any case, such transformations are needed if one wants to employ the Hilbert space approach of Shiu and Sing (2004), since the results of their Section 2.2 rely heavily on the assumption that the estimator is linear in the data.
3. Beyond the credibility approximations
The preceding discussion shows that very complex functions of the data can be considered as linear estimators after a suitable transformation. In this sense, the classical credibility theory essentially covers any basis-type approximation, though the conditions under which exact credibility occurs remain unknown. When Hans Bühlmann first proposed the classical theory in the 1960s, neither the computer technology nor the field of computational statistics was ready for a full Bayesian analysis. With the rapid advances in both fields in the last two to three decades, actuaries are now equipped with the tools needed to build more powerful predictive models beyond what credibility theory has to offer. In particular, actuaries can now seek genuine Bayesian estimates, instead of linear approximations. For example, Hong and Martin (2017) propose a Dirichlet process mixture lognormal model for predicting future insurance claims. When the underlying loss distribution has an arbitrary absolutely continuous density function, this Bayesian nonparametric model is able to produce a full predictive distribution from which the Bayesian premium (i.e., predictive mean) and any other feature of interest can be read off.
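To give a flavor of this model class, the sketch below draws from the prior predictive of a truncated Dirichlet process mixture of lognormals via stick-breaking. It is a minimal illustration with made-up hyperparameters, not the fitting algorithm of Hong and Martin (2017), and Python stands in for R for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_lognormal_prior_predictive(n_draws=10_000, alpha=1.0, K=50,
                                  mu0=0.0, tau=1.0, sigma=0.5):
    """Prior-predictive draws from a truncated Dirichlet process mixture
    of lognormals. All hyperparameter values are illustrative assumptions."""
    # Stick-breaking weights: v_k ~ Beta(1, alpha) and
    # w_k = v_k * prod_{j<k} (1 - v_j), truncated at K components.
    v = rng.beta(1.0, alpha, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w /= w.sum()                          # renormalize after truncation
    mu = rng.normal(mu0, tau, size=K)     # atom locations from the base measure
    k = rng.choice(K, size=n_draws, p=w)  # component memberships
    return rng.lognormal(mean=mu[k], sigma=sigma)

draws = dp_lognormal_prior_predictive()
premium = draws.mean()  # the predictive mean plays the role of the premium
```

In a real analysis the mixture would be updated with the observed claims; the point here is only that the model yields a full distribution over positive claim amounts, from which any summary can be read off.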
Example 1. Here we consider a simulated data example similar to Example 20.29 of Klugman, Panjer, and Willmot (2008). That is, claim amounts are assumed to follow the inverse gamma distribution with a known shape parameter and an unknown scale parameter. The prior distribution for the scale parameter is taken to be the gamma distribution with shape parameter 0.5 and scale parameter 100. For our simulation setting, we consider five different values for the true scale parameter, corresponding to the 10th, 25th, 50th, 75th, and 90th percentiles of the gamma prior. For each of these values we simulate a sample from the stated inverse gamma distribution and calculate the Bayesian premium, the credibility premium, and the Dirichlet process mixture premium. Table 1 gives the estimates and shows that the Dirichlet process mixture premium outperforms the oracle credibility premium. In practice, actuaries will not know the underlying loss distribution, so the credibility premium is subject to further bias from potential model misspecification. However, as Hong and Martin (2018) argue, the Dirichlet process mixture model is more robust and less susceptible to model misspecification risk.
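To make the structure of this example concrete, here is a small sketch of the conjugate calculation behind the Bayesian premium. The gamma prior hyperparameters are those stated above, while the shape parameter, sample size, and true scale value are illustrative stand-ins (the example's exact values are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(3)

# Claims X | theta ~ InverseGamma(alpha, theta); prior theta ~ Gamma with
# shape 0.5 and scale 100 (i.e., rate 0.01), as stated in the text.
# alpha, n, and theta_true below are illustrative assumptions.
alpha, a0, rate0 = 3.0, 0.5, 1.0 / 100.0
theta_true, n = 100.0, 30

# InverseGamma(alpha, theta) draws via theta / Gamma(alpha, 1).
x = theta_true / rng.gamma(shape=alpha, scale=1.0, size=n)

# The gamma prior is conjugate for the inverse gamma scale:
# theta | x ~ Gamma(a0 + n * alpha, rate0 + sum(1 / x_i)).
a_post = a0 + n * alpha
rate_post = rate0 + np.sum(1.0 / x)

# Bayesian premium: E[X_{n+1} | x] = E[theta | x] / (alpha - 1) for alpha > 1.
bayes_premium = (a_post / rate_post) / (alpha - 1.0)

# Monte Carlo check of the closed form: average E[X | theta] over the posterior.
theta_post = rng.gamma(shape=a_post, scale=1.0 / rate_post, size=200_000)
mc_premium = (theta_post / (alpha - 1.0)).mean()
```

The closed-form and Monte Carlo premiums agree, which is exactly the conjugacy that makes this example tractable for the credibility comparison.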
In addition, the credibility approach provides only a point estimator and ignores other features of the loss distribution. The Bayesian nonparametric approach, by contrast, allows actuaries to obtain information about these features. For example, Figure 1 shows the plot of the predictive density function along with the histogram of the simulated data. Beyond that, actuaries can also obtain other quantities of interest such as the mean, variance, value-at-risk, and conditional tail expectation; see Hong and Martin (2017) for more details. The R code for this example is available upon request from the first author.
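For instance, given posterior predictive draws of a future claim, such risk measures are one-liners. In the sketch below, lognormal draws with arbitrary parameters merely stand in for the output of a fitted Dirichlet process mixture model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for posterior predictive draws of a future claim; in practice
# these would come from the fitted Dirichlet process mixture model. The
# lognormal parameters here are arbitrary illustrative choices.
draws = rng.lognormal(mean=7.0, sigma=1.2, size=100_000)

level = 0.95
pred_mean = draws.mean()              # predictive mean (the premium)
pred_var = draws.var()                # predictive variance
var_95 = np.quantile(draws, level)    # value-at-risk at the 95% level
cte_95 = draws[draws > var_95].mean() # conditional tail expectation
```

Every feature of the predictive distribution is available from the same set of draws, which is precisely the advantage over a point-estimator-only credibility premium.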
Our conversations with several industry actuaries convince us that it is crucial to have a relatively simple implementation procedure for the Dirichlet process mixture model so that insurance companies and regulators might be willing to entertain it. Toward that goal, Hong and Martin (2019) propose an easy-to-implement algorithm that does not require the user to know or use Markov chain Monte Carlo methods. It is our hope that practicing actuaries will use this powerful method in the future.
4. Several open problems in the credibility theory
Though powerful alternative methods are now available for obtaining genuine Bayesian estimates, the classical credibility theory remains a cornerstone of actuarial science. There still exist many interesting open problems. Here are a few:

Goel (1982) conjectures that if the (posterior) predictive mean is a linear function of the data, then the marginal distribution of the data must belong to the exponential family. He imposed the restriction that the distribution function of the data be of the form \[\mathcal{F}=\{F_{\theta}(\cdot)\mid \theta\in \Theta\}, \quad \Theta\subset \mathbb{R},\nonumber\] which excludes the Dirichlet process prior as a potential negative answer. Note that Theorem 1 in Landsman and Makov (1998) does not really give a negative answer to this conjecture, because they assume the dispersion parameter is known. Hence, their loss distribution is essentially a member of the one-parameter exponential family (also called the linear exponential family).

Goel (1982) also asks whether a linear form of the predictive mean implies that the sample mean is a sufficient statistic.

Regardless of whether the aforementioned conjecture is true, it remains unknown what class of distributions is implied by a linear predictive mean and how big this family is.

It is well known that the exponential family with conjugate priors implies exact credibility. Under which conditions will the quadratic credibility be exact too? How about a polynomial approximation of general degree to the Bayesian estimate?
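As a benchmark for these questions, recall the form exact credibility takes in the classical setting (Diaconis and Ylvisaker 1979): for a one-parameter exponential family \(f(x\mid\theta)=h(x)\exp\{\theta x - A(\theta)\}\) with conjugate prior \(\pi(\theta)\propto \exp\{k\mu_0\theta - kA(\theta)\}\), the posterior mean of \(\mu(\theta)=A'(\theta)\) is exactly linear in the data:
\[E[A'(\theta)\mid X_1, \ldots, X_n] = \frac{k\mu_0 + n\bar{X}}{k+n} = Z\bar{X} + (1-Z)\mu_0, \quad Z=\frac{n}{n+k}.\]
The open problem asks for an analogue of this identity for quadratic, and more generally polynomial, credibility.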
A surprising fact in mathematics is that the class of differentiable functions is "dust" in the "universe" of all functions (e.g., Theorem 25.2 in Willard 1970). Can we say the same for the exponential family with conjugate priors relative to arbitrary loss distributions with arbitrary priors?
5. Conclusion
We congratulate Dr. Le Courtois again on this interesting extension of the classical credibility theory. Though the classical credibility theory has been investigated for several decades, many interesting open problems remain unsettled. In addition, Bayesian nonparametric models are now available to actuaries for more accurate and efficient prediction. We hope that this discussion stimulates more interest in both directions.
Acknowledgments
We thank Timothy Wheeler, FCAS, for a fruitful discussion on the challenges academic actuaries must face to make their models more accessible to practicing actuaries and regulators.