Novel approach to incorporate information about recessive lethal genes increases the accuracy of genomic prediction for mortality traits

ABSTRACT The genetic underpinnings of calf mortality can be partly polygenic and partly due to deleterious effects of recessive lethal alleles. Prediction of the genetic merits of selection

candidates should thus take into account both genetic components contributing to calf mortality. However, simultaneously modeling polygenic risk and recessive lethal allele effects in

genomic prediction is challenging because the two kinds of effects behave differently. In this study, we present a novel approach where mortality risk probabilities from polygenic and lethal allele

components are predicted separately to compute the total risk probability of an individual for its future offspring as a basis for selection. We present methods for transforming genomic

estimated breeding values of polygenic effect into risk probabilities using normal density and cumulative distribution functions and show computations of risk probability from recessive

lethal alleles given sire genotypes and population recessive allele frequencies. Simulated data were used to test the novel approach as implemented in probit, logit, and linear models. In

the simulation study, the accuracy of predicted risk probabilities was computed as the correlation between predicted mortality probabilities and observed calf mortality for validation sires.

The results indicate that our novel approach can greatly increase the accuracy of selection for mortality traits compared with the accuracy of predictions obtained without distinguishing

polygenic and lethal gene effects.

INTRODUCTION Important fractions of deleterious mutations segregating in diploid organisms are recessive alleles that cause fatal effects when present in

a homozygous state. In dairy cattle breeds, intensive use of a limited number of elite breeding sires through artificial insemination has led to the spread of recessive lethal alleles in

populations (e.g., Shuster et al. 1992; Agerholm et al. 2001). Recessive lethal alleles are of considerable economic consequence in the dairy industry because they cause calf and young stock

mortality as well as reproductive inefficiency (e.g., Cole et al. 2016). Calf mortality represents a major economic loss for farmers, poses great animal welfare issues, and threatens public

perceptions of the dairy industry. Therefore, efficient strategies for limiting the impact of recessive lethal alleles on calf mortality are critical. Several strategies have been proposed

that involve the use of genotype information to limit the harmful effects of recessive lethal alleles (e.g., Charlier et al. 2008; Pryce et al. 2012; Cole 2015), including the possibility of

their removal through genome editing (Johnsson et al. 2019). Currently, the most practiced approach in dairy breeding is to limit carrier-to-carrier mating (Charlier et al. 2008). While

such an approach might be feasible with few lethal alleles segregating in the population, it is becoming considerably more difficult with the increasing number of detected recessive lethal

alleles (VanRaden et al. 2011; Fritz et al. 2013; Sahana et al. 2013, 2016; Kadri et al. 2014; Hoff et al. 2017). Preselection based on carrier status reduces selection intensity,

consequently affecting genetic gains in economically important traits. In addition, the genetic causes of calf mortality are also partly polygenic. Therefore, genomic prediction of breeding

values for calf mortality traits should take into account both polygenic and recessive lethal components. Several approaches have been proposed for incorporating genotypic information of

major genes into genomic prediction for traits controlled by both “major” genes and polygenic inheritance (e.g., Hoeschele 1988; Fernando and Grossman 1989). However, the proposed models

traditionally rely on the assumption of additive effects for major quantitative trait loci (QTLs), as for the remaining genome-wide markers (Cole 2015), and assign them different weights under the a

priori assumption of different effect sizes. Adopting such approaches to model polygenic effects and recessive lethal alleles simultaneously is problematic, as not only the effect sizes but

also the modes of gene action are different. The effect of the heterozygous genotype for a recessive lethal locus on an individual is the same as that of the homozygous genotype of the wild

type, but the effects on the future mortality risk of offspring differ. Additional sources of complexity in modeling the two effects simultaneously are that different recessive lethal

alleles might have different penetrance levels and the fact that the mortality risk transferred to offspring depends on both penetrance and recessive allele frequency. Therefore, fitting

polygenic and recessive lethal effects simultaneously in genomic prediction models has been a challenge. In this study, we hypothesize that modeling risk probabilities from polygenic QTL

effects and recessive lethal alleles separately can improve the accuracy of selection for mortality traits. The objective of this study is therefore to present a strategy where a breeding

animal’s risk with regard to the mortality of its future offspring can be predicted in terms of probabilities for the recessive lethal allele and polygenic risk components. In this approach,

risk probabilities from recessive lethal loci are computed as a function of carrier status and the recessive allele frequency in the population. We develop methodologies for transforming

genomic estimated breeding values (GEBVs) into polygenic risk probabilities using normal density and cumulative distribution functions. Simulated data are used to test the advantages of the

presented approach by comparing its prediction accuracies with those of an alternative approach where risk probabilities are predicted without distinguishing polygenic and recessive lethal

allele effects. MATERIALS AND METHODS In this section, we present an approach that allows the prediction of genetic risk posed to the survival of future offspring by both the recessive

lethal locus and polygenic components of a breeding animal as the basis for selection to improve mortality traits. RISK PROBABILITIES FROM RECESSIVE LETHAL ALLELES The computation of risk

probabilities from recessive lethal alleles is straightforward when using the carrier status of an animal as inferred from its genotype and the recessive allele frequencies in the

population. Given the genotype of a breeding animal, e.g., a sire, for a particular recessive lethal locus, an offspring’s risk probability of succumbing to a lethal gene is the probability

that it receives two copies of the recessive lethal allele. Assuming a locus _i_ with recessive lethal allele a and wild-type allele A, the probability of the offspring’s mortality due to

the lethal allele (hereafter referred to as $p(\mathrm{leth}_i)$) is the probability of the offspring having an aa genotype. This can be computed given the sire’s genotype and the frequency of the recessive allele in the population ($p_a$), which accounts for the probability of receiving a copy of the lethal allele from a dam under the assumption of random mating:

$$p(\mathrm{leth}_i) = p(\mathrm{a\ from\ sire} \mid \mathrm{sire\_geno}) \times p_a.$$ (1)

Assuming complete penetrance and given three possible outcomes of the sire’s genotype, i.e., AA, Aa, or aa, the lethal risk probability at locus _i_ for an offspring can be given as:

$$\mathrm{Sire\ AA}:\ p(\mathrm{leth}_i) = 0,$$

$$\mathrm{Sire\ Aa}:\ p(\mathrm{leth}_i) = 0.5 \times p_a,$$

$$\mathrm{Sire\ aa}:\ p(\mathrm{leth}_i) = p_a$$

(however, the affected individual may not be a breeding animal under complete penetrance). In cases of recessive lethal alleles with a penetrance other than 100%, the risk probability in Eq. (1) can be calculated by multiplying by the penetrance level of a given lethal allele:

$$p(\mathrm{leth}_i) = p(\mathrm{a\ from\ sire} \mid \mathrm{sire\_geno}) \times p_a \times \mathrm{penetrance}.$$ (2)

An individual might carry more than one recessive lethal gene. Thus, the risk probability across all _n_ loci can be computed as:

$$p(\mathrm{leth}) = 1 - \prod_{i = 1}^{n} \left( 1 - p(\mathrm{leth}_i) \right).$$ (3)

PREDICTION OF POLYGENIC RISK PROBABILITIES Obtaining risk probabilities from the polygenic component requires a transformation from GEBVs,

which are predicted using different genomic prediction models. Mortality-related traits are often recorded as categorical outcomes, which are usually binary and hence non-normally

distributed. Commonly used approaches for modeling categorical traits include threshold-liability (probit) and logistic regression (logit) models. Despite their violations of normality

assumptions, linear models (LM) have also been employed in many studies for genetic analysis of categorical traits (e.g., Rao and Xia 2000; Peñagaricano et al. 2011). To accommodate the

specific computations in the prediction of GEBVs in these different models, we present strategies for transforming GEBVs into polygenic risk probabilities as implemented in the probit model,

logit model, and LM. LIABILITY THRESHOLD (PROBIT) MODEL In genetic modeling of categorical traits, one of the most commonly used approaches is the threshold model (Wright 1934). In the

threshold model, the observed categorical responses are assumed to be the outcome of an underlying, normally distributed latent variable, often termed the liability (_l_), in relation to a

fixed threshold (_τ_). In the context of mortality traits, the categories of response are usually “survival,” denoted as event 0, or “death,” denoted as event 1, i.e., a binary response,

during a given monitored period. Accordingly, the observed categorical outcomes (_y_) are linked to the underlying liability (_l_) such that:

$$y = \begin{cases} 1, & \mathrm{if}\ l > \tau \\ 0, & \mathrm{if}\ l < \tau \end{cases}.$$ (4)

Thus, the expected liability (_η_) is assumed to be a function of the predictors such that:

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a},$$ (5)

where $\boldsymbol{\eta}$ is the vector of all expectations $\eta_i$, $\boldsymbol{\beta}$ is the vector of fixed effects, $\mathbf{a}$ is the vector of random additive genetic effects, and $\mathbf{X}$ and $\mathbf{Z}$ are the design matrices for the fixed and random effects, respectively. Thus, the true underlying liability (_l_) will be the expectation plus the residuals such that:

$$\boldsymbol{l} = \boldsymbol{\eta} + \boldsymbol{e},$$ (6)

where $\boldsymbol{l}$ is the vector of all $l_i$ and $\boldsymbol{e}$ is the vector of random residuals. The underlying liability for each individual as a linear function of the linear predictors can thus be rewritten as:

$$l_i = \eta_i + e_i.$$

Combining Eqs. (4) and (6):

$$y_i = \begin{cases} 1, & \mathrm{if}\ \eta_i + e_i > \tau \\ 0, & \mathrm{if}\ \eta_i + e_i < \tau \end{cases}.$$ (7)

Given $\eta_i$ estimated from the data and an assumed value for the fixed threshold _τ_, the observed outcomes are conditional on the residuals as follows:

$$y_i = \begin{cases} 1, & \mathrm{if}\ e_i > \tau - \eta_i \\ 0, & \mathrm{if}\ e_i < \tau - \eta_i \end{cases}.$$ (8)

The probability of observing event 1 (mortality) given $\eta_i$ and _τ_ can then be estimated as:

$$p(y_i = 1 \mid \eta_i, \tau) = p(e_i > \tau - \eta_i) = 1 - \Phi_e(\tau - \eta_i),$$ (9)

where $\Phi_e(\cdot)$ is the cumulative distribution function of the residuals, with $e_i \sim N(0, \sigma_e^2)$, where $\sigma_e^2$ is the residual variance. During implementation of the probit model, the threshold _τ_ is commonly set to 0 as a convenient origin. Since the liability cannot be observed, the variation in the liability is scaled to be $\sigma_e^2 = 1$. Thus:

$$p(y_i = 1 \mid \eta_i, \tau) = 1 - \Phi_e(-\eta_i),$$ (10)

with $e_i \sim N(0, 1)$. Considering the simplest case, where the population mean is the only fixed effect in the threshold model, i.e., $\mathbf{X}\boldsymbol{\beta} = \mathbf{1}\mu$, the probability of observing event 1 can be estimated as:

$$p(y_i = 1 \mid \widehat{\mu}, \widehat{a}_i) = \widehat{\pi}_i = 1 - \Phi_e\left( -(\widehat{\mu} + \widehat{a}_i) \right),$$ (11)

where $\widehat{\mu}$ is the predicted population mean and $\widehat{a}_i$ is the EBV of individual _i_.

LOGIT MODEL Alternatively, a logistic distribution can be assigned to the residuals, resulting in a model

known as the logit model. In this model, it is assumed that the logit of an underlying probability ($\pi_i$) is a function of the linear predictors:

$$\mathrm{logit}(\pi_i) = \log\left( \frac{\pi_i}{1 - \pi_i} \right) = \mathbf{x}_i\boldsymbol{\beta} + \mathbf{z}_i\mathbf{a}.$$ (12)

Again, assuming a simple scenario, where the only fixed effect in the model is the population mean, Eq. (12) can be rewritten as:

$$\mathrm{logit}(\pi_i) = \log\left( \frac{\pi_i}{1 - \pi_i} \right) = \widehat{\mu} + \widehat{a}_i.$$

The underlying probability ($\pi_i$) can then be estimated by the inverse logit transformation:

$$\pi_i = \mathrm{logit}^{-1}(\widehat{\mu} + \widehat{a}_i) = \frac{\exp(\widehat{\mu} + \widehat{a}_i)}{1 + \exp(\widehat{\mu} + \widehat{a}_i)}.$$ (13)

LINEAR MODELS In addition to the probit and logistic regression models, it is a common practice in routine genetic evaluation to fit categorical trait data using LM,

treating the traits as normally distributed. It has been shown that the loss of power when using a linear Gaussian model for categorical traits is negligible compared with implementing logit

or probit regression, despite the violations of assumptions of normality (e.g., Meijering and Gianola 1985). Here, we present an approach for transforming EBVs obtained from an LM into risk

probabilities in a way comparable to the probit approach. An advantage of the probit approach is that the threshold is usually set to 0. However, in an LM, the threshold (_τ_) must be

estimated. Here, we estimate an approximate _τ_ based on the cumulative distribution function of the normal distribution. Given a particular threshold (_τ_), the proportion of an event

($y_i = 1$) (hereafter _π_) is the proportion of the underlying liability (_l_) above the threshold, i.e.:

$$1 - \pi = \Phi(\tau),$$ (14)

where $\Phi(\tau)$ is the cumulative probability of the normal distribution $N(\mu, \sigma^2)$. _τ_ is unknown, but _π_ can be estimated as the proportion of observed $y_i = 1$ events in the data. Thus, the unknown threshold (_τ_) can be estimated by inverse probability transformation as:

$$\tau = \Phi^{-1}(1 - \pi).$$ (15)

As mentioned above, the LM treats the binary observations as variables with a normal distribution. To be consistent with the mean and variance of the distribution for binary observations, the normal distribution used to derive _τ_ is assumed to be:

$$N\left( \pi,\ \pi \times (1 - \pi) \right).$$

Furthermore, risk probabilities can then be estimated by Eqs. (9)–(11), but with an approximated threshold _τ_, which is not set to 0 as in the liability threshold model. Thus:

$$p(y_i = 1 \mid \widehat{\mu}, \widehat{a}_i, \tau) = \widehat{\pi}_i = 1 - \Phi_e\left( \tau - (\widehat{\mu} + \widehat{a}_i) \right).$$ (16)

Finally, since we are interested in the polygenic risk transmitted from a breeding animal, e.g., a sire, which on average passes half of its breeding value to its future offspring, the polygenic risk probability transmitted is half the probability of the sire estimated using the models presented above. Thus:

$$p_{\mathrm{poly\_offspring}} = 0.5 \times p_{\mathrm{poly\_sire}}.$$ (17)

COMBINING RISK PROBABILITIES FROM THE LETHAL ALLELE AND POLYGENIC COMPONENTS The risk probabilities computed separately from

the recessive lethal alleles and the polygenic effect can finally be combined to give the total risk probability of an individual’s future offspring. Assuming that the probabilities of

survival from the lethal allele and polygenic components are independent, the total risk probability ($p_{\mathrm{total}}$) can be given as:

$$p_{\mathrm{total}} = 1 - \left( (1 - p_{\mathrm{leth}}) \times (1 - p_{\mathrm{poly}}) \right),$$ (18)

where $p_{\mathrm{leth}}$ is the risk probability from the lethal component and $p_{\mathrm{poly}}$ is the risk probability from

the polygenic component. ASSESSING THE ACCURACY OF PREDICTED RISK PROBABILITIES It is possible to assess the accuracy of predicted total risk probabilities with a validation procedure,

where EBVs for the polygenic effect of validation individuals are predicted without using observations from their offspring. The predicted EBVs are subsequently transformed into risk

probabilities, which together with the risk from the lethal genes are used to compute the total risk probability. The accuracy of the predicted risk probabilities can then be computed as the

correlation between the predicted total risk probabilities of the test individual’s offspring and the observed proportion of offspring mortality. SIMULATION EXPERIMENTS We tested the

proposed approach with a dataset of 15 replicates simulated using the stochastic simulation program ADAM (Pedersen et al. 2009). In each replicate, a population of animals was simulated for

4 years with overlapping generations and assuming no selection. In each year, 50 males were randomly selected to mate with 10,000 females of different parities, with each male mated to 200

females. Offspring’s sex was assigned randomly with a probability of 50% for males and females. The simulations resulted in a total of 40,000 individuals in four generations with

approximately equal proportions of males and females. Animals born in generations 1–3 were used as the reference population, while animals in generation 4 (G4) were used as the test

population. Genotype data were simulated mimicking the real linkage disequilibrium profile in the Danish Holstein as described in detail by Thomasen et al. (2019). The simulated genotype

data included 40K markers, 1980 QTLs with polygenic effects, and 20 lethal genes with recessive allele frequencies between 0.04 and 0.05. A large number of QTLs with polygenic effects were

assumed in order to mimic a trait of mixed polygenic-major gene inheritance, where a large number of QTLs with small effects each and a few lethal genes composed the genetic architecture.

Other simulation studies mimicking the bovine genome for genomic prediction have assumed a similar number of QTLs (Lourenco et al. 2013), or slightly more QTLs (Hayes et al. 2009; Thomasen et

al. 2019), underlying various polygenic traits. The assumed recessive allele frequencies were chosen based on the range reported for recessive lethal haplotypes detected in the Danish and

Nordic cattle breeds (Wu et al. 2020). SNPs were distributed across 30 chromosomes, each 100 cM in length. The QTLs were assumed to be evenly distributed across the genome, such that on each

chromosome, 66 SNPs were randomly sampled to be QTLs. The effect of each QTL was sampled from a normal distribution, and the effects collectively explained all the variation in the

simulated polygenic true breeding values (TBV). Thus, the TBV for each animal _i_ was defined as the sum of all QTL genotypic values:

$$\mathrm{TBV}_i = \sum_j g_j\,Q_{ij},$$

where $g_j$ is the allele substitution effect of the _j_th QTL and $Q_{ij}$ is the QTL genotype at locus _j_ in individual _i_, coded as 0, 1, or 2 to represent the number of copies of a particular allele in the genotype. The TBVs were finally scaled to have a variance of 1 in the base population by dividing each TBV by the standard deviation of TBVs in the base population,

i.e., setting the additive genetic variance as: $\sigma _a^2 = 1$. Simulation of the TBVs was performed at the liability scale, with a target heritability of 0.02 at the observed scale,

according to heritability estimates reported in the literature for the Holstein breed (e.g., Hansen et al. 2003; Fuerst-Waltl and Sørensen 2010; Henderson et al. 2011). The target

observed-scale heritability was transformed to the underlying scale using the formula proposed by Dempster and Lerner (1950):

$$h_l^2 = \frac{h_x^2 \times \pi (1 - \pi)}{z^2},$$ (19)

where $h_l^2$ is the heritability at the underlying scale, _z_ is the height of the normal distribution curve at the threshold, $h_x^2$ is the heritability at the observed scale, which is 0.02 in this study, and _π_ is the proportion for _y_ = 1 (_π_ = 0.068 in this study). Thus, heritability at the underlying scale was 0.075. Simulated QTLs were

not included in the construction of the genomic relationship matrices (GRMs) used for prediction. Recessive lethal loci were assigned by randomly sampling from SNPs with minor allele

frequencies (MAFs) between 0.04 and 0.05 on 20 randomly selected chromosomes. Each lethal locus was located on a different chromosome, and the loci were thus assumed to be independent of one

another. Four different scenarios were simulated with regard to the penetrance of recessive lethal alleles. These included three scenarios where all 20 lethal alleles were assumed to have

an equal penetrance of 60, 80, or 100%. The fourth scenario considered a mixture of four penetrance groups (with an equal number of lethal alleles): 60, 70, 80, and 100% penetrance.
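As an illustration of how these penetrance scenarios enter Eqs. (1)–(3), the following minimal Python sketch computes a sire's transmitted lethal risk. The function names, the genotype strings, and the example sire are hypothetical and are not taken from the study's software; the loci are assumed to be independent, as in the simulation.

```python
from math import prod

# Probability that the sire transmits allele a, given its genotype (Mendelian).
P_TRANSMIT = {"AA": 0.0, "Aa": 0.5, "aa": 1.0}


def lethal_risk_locus(sire_geno, p_a, penetrance=1.0):
    """Eq. (2): P(a from sire | genotype) * p_a * penetrance.

    With penetrance = 1.0 this reduces to Eq. (1).
    """
    return P_TRANSMIT[sire_geno] * p_a * penetrance


def lethal_risk_total(genotypes, freqs, penetrances):
    """Eq. (3): 1 - product over loci of (1 - p(leth_i)), assuming independent loci."""
    return 1.0 - prod(
        1.0 - lethal_risk_locus(g, p, k)
        for g, p, k in zip(genotypes, freqs, penetrances)
    )


# Hypothetical sire: carrier at two loci, non-carrier at a third.
risk = lethal_risk_total(
    genotypes=["Aa", "Aa", "AA"],
    freqs=[0.05, 0.05, 0.04],
    penetrances=[0.6, 1.0, 1.0],
)
print(f"transmitted lethal risk: {risk:.6f}")
```

Because the per-locus risks are small, the total is close to, but slightly below, their simple sum, which is why the product form of Eq. (3) is used rather than addition.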

Liability for death was generated by adding a residual effect to TBV. The residual effect was sampled from $e \sim N(0, \sigma_e^2)$, where the residual variance was

$\sigma_e^2 = \frac{(1 - h_l^2)\,\sigma_a^2}{h_l^2} = (1 - 0.075)/0.075$. Simulated phenotypic values on the observed scale were either 0 if the animal survived or 1 if the animal died.
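The scale conversion in Eq. (19) and the residual variance above can be checked with a short Python sketch. It uses only the standard library; the function names are hypothetical, and _z_ is assumed to be the standard normal density evaluated at the threshold that leaves a proportion _π_ in the upper tail.

```python
from statistics import NormalDist


def liability_heritability(h2_obs, incidence):
    """Eq. (19): convert observed-scale heritability to the liability scale."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Threshold with a proportion `incidence` of the distribution above it.
    tau = std_normal.inv_cdf(1.0 - incidence)
    # Height of the standard normal curve at the threshold.
    z = std_normal.pdf(tau)
    return h2_obs * incidence * (1.0 - incidence) / z**2


def residual_variance(h2_liab, var_a=1.0):
    """Residual variance implied by a liability-scale heritability and sigma_a^2."""
    return (1.0 - h2_liab) * var_a / h2_liab


h2_l = liability_heritability(h2_obs=0.02, incidence=0.068)
print(f"liability-scale heritability: {h2_l:.3f}")  # about 0.07, near the 0.075 in the text
print(f"residual variance at h2 = 0.075: {residual_variance(0.075):.2f}")
```

With $\sigma_a^2 = 1$, a liability-scale heritability of 0.075 implies a residual variance of roughly 12.3, so nearly all of the variation in liability is residual.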

Individuals were assigned phenotypic values in a stepwise manner, considering both polygenic and recessive lethal allele components. First, a threshold was calculated as the inverse

cumulative distribution function of a target mortality incidence (_y_ = 1) of 6.8% from a normal distribution with mean 0 and variance $\sigma _a^2 + \sigma _e^2$. Individuals with a

liability greater than the threshold were subsequently assigned a phenotypic value of 1. Second, an individual’s phenotype was assigned as 1, regardless of the assigned phenotype due to

the polygenic component, if its genotype for at least one of the recessive lethal alleles was in a homozygous state. When recessive lethal alleles were assigned a penetrance other than 100%, a

proportion of the homozygous individuals for each allele was assigned a phenotype of 1 in accordance with the penetrance value assumed (60, 80%, or a mixture with an average of 75%). The

average observed mortality over 15 replicates was 9.43%. On average, 29.73% of the total mortality was caused by recessive lethal allele effects, while 68.21% was due to polygenic risk and

2.06% was due to both. STATISTICAL ANALYSIS Breeding values and risk probabilities were estimated using the above approach. To predict risks from polygenic and recessive lethal allele

effects separately, the data used to predict polygenic effects excluded records of death caused by recessive lethal alleles (Data_poly). Similarly, the GRM used for the prediction of GEBVs

was constructed without recessive lethal alleles. All GRMs used for the different scenarios were calculated using the first method presented by VanRaden (2008), and SNP allele frequencies

for building GRMs were calculated directly from the SNP data. Risk probabilities from the recessive lethal alleles were computed separately and then used to compute the total risk using Eq.

(18). To test the proposed approach’s superiority over the conventional approach, breeding values and risk probabilities were also estimated using a conventional approach that did not

distinguish the polygenic and lethal allele effects. Thus, the conventional approach predicted total breeding values using phenotypic data without excluding observations of death due to

recessive lethal alleles (Data_all) and GRMs that did not include recessive lethal genotypes. Subsequently, the predicted GEBVs were transformed into total risk probabilities. In addition,

we also assessed the accuracy of the polygenic GEBVs predicted without distinguishing the two effects, i.e., with the data that included death due to recessive lethal alleles, but taking

into account the effects of the recessive lethal alleles, either by including recessive lethal alleles in the construction of the GRM or using models that included fixed regression on

recessive allelic genotype code. Below, we present the methods used to predict the GEBVs that were subsequently transformed into risk probabilities for each of the models (probit, logit, and

linear). These approaches were implemented using three statistical models, namely, generalized linear mixed models with logit and probit link functions and a linear mixed model, using DMU

software (Madsen and Jensen 2013). THE PROBIT MODELS FOR ANALYSIS OF THE SIMULATED DATA Three probit models were used to estimate breeding values and risk probabilities due to polygenic

effects. The first probit model was:

$$\mathrm{Probit1:}\ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g},$$ (20)

where element _i_ of $\boldsymbol{\eta}$ is $\eta_i = \Phi^{-1}(\pi_i)$, _μ_ is the overall mean, $\mathbf{g}$ is the vector of random additive genetic effects with distribution $\mathbf{g} \sim N(0,\ \mathbf{G}\sigma_a^2)$, and $\mathbf{G}$ is the GRM constructed using only markers, i.e., excluding the recessive lethal loci. Both Data_poly and Data_all were analyzed using this model. The second probit model included fixed regression on lethal genotype:

$$\mathrm{Probit2:}\ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{x}d + \mathbf{Z}\mathbf{g},$$ (21)

where _d_ is the fixed regression coefficient for lethal genotype score and $\mathbf{x}$ is the vector of recessive lethal statuses. Since a homozygous recessive genotype at a given locus can cause mortality regardless of the genotype at another recessive lethal locus, the element of $\mathbf{x}$ is 1 as long as the recessive lethal allele is in a homozygous state at any locus and 0 otherwise. The $\mathbf{G}$ matrix in this model also excluded genotypes of recessive lethal loci, and this model was used to analyze Data_all. The third probit model was implemented to test the impact of including recessive lethal alleles in the GRM on genomic prediction:

$$\mathrm{Probit3:}\ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g}^{\ast},$$ (22)

where $\mathbf{g}^{\ast}$ is the vector of random additive genetic effects with distribution $N(0,\ \mathbf{G}^{\ast}\sigma_{a^{\ast}}^2)$, where $\mathbf{G}^{\ast}$ is the GRM based on the markers including lethal alleles and $\sigma_{a^{\ast}}^2$ is the corresponding

genetic variance. This model was used to analyze Data_all. THE LOGIT MODELS FOR ANALYSIS OF THE SIMULATED DATA The logit models used in the analysis were:

$$\mathrm{Logit1:}\ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g},$$ (23)

$$\mathrm{Logit2:}\ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{x}d + \mathbf{Z}\mathbf{g},$$ (24)

$$\mathrm{Logit3:}\ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g}^{\ast},$$ (25)

where element _i_ of $\boldsymbol{\eta}$ is $\eta_i = \log\left( \frac{\pi_i}{1 - \pi_i} \right)$ and the rest of the model components are the same as those in Models (20)–(22).

THE LINEAR MODELS FOR ANALYSIS OF THE SIMULATED DATA The implemented LM included three genomic best linear unbiased prediction models:

$$\mathrm{LM1:}\ \mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e},$$ (26)

$$\mathrm{LM2:}\ \mathbf{y} = \mathbf{1}\mu + \mathbf{x}d + \mathbf{Z}\mathbf{g} + \mathbf{e},$$ (27)

$$\mathrm{LM3:}\ \mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g}^{\ast} + \mathbf{e},$$ (28)

where $\mathbf{y}$ is the vector of observations (0, 1) and $\mathbf{e}$ is the vector of residuals with distribution $\mathbf{e} \sim N(0,\ \mathbf{I}\sigma_e^2)$, where $\mathbf{I}$ is an identity matrix and $\sigma_e^2$ is the residual variance. The rest of the model components in (26)–(28) are as described in Models (20)–(22).

COMPUTATION OF PREDICTION

ACCURACY AND BIAS To assess the accuracy of predicted risk probabilities, the predicted total transmitted risk probabilities of the sires of the G4 animals were compared with the observed

proportion of deaths among their G4 offspring, whose phenotypes were masked during the prediction. This is to be consistent with the situation where the candidate animals do not have

offspring at the time of selection. For the approach that distinguished polygenic and recessive lethal allele effects, the predicted total risk probability was calculated using Eq. (18) to

combine the risk probabilities from the recessive lethal allele and polygenic components obtained from GEBVs that were predicted using LM1, Logit1, or Probit1 and Data_poly. For the analysis

that did not consider lethal genes, the predicted total risk probabilities were calculated from the GEBVs predicted using LM1, Logit1, or Probit1 but based on Data_all. Prediction bias was

measured as the coefficient of regression of the observed rate of calf mortality against the predicted probabilities for the validation sires. In addition, the advantages of incorporating

recessive lethal alleles, by either including them in the GRM (LM3, Logit3, and Probit3) or considering lethal genotype as a fixed effect in regression (LM2, Logit2, and Probit2) to predict

polygenic breeding values, were assessed by the accuracy of GEBVs, which was measured as the correlation between GEBVs and simulated polygenic TBVs. The statistical significance of

differences in prediction accuracies between scenarios (approaches) and models, i.e., probit, logit, and linear, was tested using a pairwise _t_-test across the replicates. RESULTS ACCURACY

AND BIAS OF PREDICTED RISK PROBABILITIES Figure 1 presents the accuracy of the total risk probability predicted with the two approaches in comparison: (1) the novel approach in which risk

probabilities from the polygenic and recessive lethal components were estimated separately, with the polygenic component predicted using LM1, Logit1, or Probit1 based on Data_poly, and (2)

the conventional approach where risk probability was estimated without distinguishing the polygenic and recessive lethal effects and obtained from GEBVs predicted using LM1, Logit1, or

Probit1 but based on Data_all. Across all penetrance scenarios, the accuracies obtained with the novel approach were significantly higher (_P_ < 0.001) than those obtained with the

conventional approach. The difference in prediction accuracy between the two approaches ranged between 20 and 29.1 percentage points, depending on the penetrance scenario assumed. In all

three statistical models, i.e., LM1, Probit1, and Logit1, the highest accuracy was observed when all lethal alleles had 100% penetrance (Pen100), while the lowest accuracy was observed when

all alleles had the lowest penetrance (Pen60). Among the three models, Probit1 resulted in the highest prediction accuracies, followed by Logit1, in both approaches. However, these

differences in prediction accuracy between the three models were not statistically significant. Table 1 presents the regression coefficients for observed calf mortality against predicted

total risk probability for the novel approach, which distinguished recessive lethal allele and polygenic effects and used Data_poly, and the conventional approach, which did not distinguish

the two effects and used Data_all. For the analysis distinguishing the two effects, the regression coefficients were close to 1 for all three models and penetrance scenarios. For the

analysis with the conventional approach, however, the regression coefficients deviated from 1 for all models and penetrance scenarios, with the largest deviation observed for the LM.

ACCURACY OF PREDICTION OF GEBVS

Figure 2 shows the accuracy of GEBVs predicted with the different approaches: (1) using Data_poly and a model with a GRM that did not include recessive lethal loci (LM1,

Probit1, and Logit1), (2) using Data_all and a model with regression on lethal genotype (LM2, Probit2, and Logit2), and (3) using Data_all and a model with a GRM including genotypes of

recessive lethal loci (LM3, Probit3, and Logit3). In general, approach (1) resulted in the highest accuracy of predicted GEBVs, ranging from 0.319 to 0.323 according to penetrance class. Approach (2) resulted in

slightly lower accuracies compared with those in approach (1), ranging from 0.307 to 0.322. However, the differences in GEBV prediction accuracies between the two approaches were only

statistically significant in scenarios Pen60 (_P_ < 0.01) and PenGRP (_P_ < 0.05). Among the three approaches, approach (3) produced the lowest GEBV prediction accuracies. The

prediction accuracies obtained using this approach were significantly lower (_P_ < 0.001) than those obtained using approaches (1) and (2).

DISCUSSION

PREDICTION OF RISK PROBABILITIES

Genomic prediction for traits with mixed “major” genes and polygenic inheritance has been shown to benefit from models that account for differences in marker effects compared with models

with “infinitesimal” assumptions (e.g., Cole et al. 2009; Hayes et al. 2010; Legarra et al. 2011). Recessive lethal loci might be considered special cases of “major” genes. While a single

recessive lethal locus might have a large effect on an observed phenotype, the effect on the individual itself might be different from the effect on its future offspring. The carrier status

of a single recessive lethal allele alone does not affect the carrier’s mortality but can determine the categorical outcome (death or survival) of an offspring. In this study, we present an

efficient approach for predicting the total risk probabilities for future offspring of selection candidates by predicting risk probabilities from polygenic and recessive lethal components

separately. By using simulated data, the prediction accuracies of this approach were compared with those of a conventional approach that did not distinguish polygenic and recessive lethal

allele effects (Data_all vs. Data_poly in Fig. 1). The results show that the prediction of risk probabilities with the proposed approach leads to high accuracy in predicting mortality for

the future offspring of selection candidates, with a gain in accuracy up to 29.1 percentage points. By blending the risk probabilities estimated separately for the two components, the novel

approach allows efficient utilization of information from the recessive lethal component, which otherwise tends to be difficult to untangle with simultaneous modeling due to differing modes

of gene action. The gain in accuracy achieved by distinguishing the polygenic and recessive lethal allele effects, compared with the approach that does not distinguish the two effects, is

dependent on the rate of mortality caused by the two effects. In our analysis, the gain in accuracy (29.1 percentage points) was comparable to the rate of mortality caused by recessive lethal alleles in the

simulation (29.73%). A potential challenge for the separate prediction of risk probabilities caused by polygenic and lethal allele effects in real data scenarios might be the difficulty of

conclusively distinguishing mortality caused by recessive lethal alleles and that caused by polygenic components. This is because genotypic information may not be available for dead animals.
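
As a rough illustration of how the two components can be brought to a common probability scale, the following Python sketch computes a sire's risk from recessive lethal alleles and blends it with a polygenic risk. It is not a reproduction of Eq. (18). The function names `lethal_risk` and `total_risk` are hypothetical, and the sketch assumes independent lethal loci, dams drawn at random from the population, risk arising only in homozygous recessive offspring, and independence between the polygenic and lethal components. The allele frequencies and the polygenic risk of 0.05 are illustrative values only.

```python
from typing import Optional, Sequence


def lethal_risk(
    sire_copies: Sequence[int],
    allele_freqs: Sequence[float],
    penetrances: Optional[Sequence[float]] = None,
) -> float:
    """Risk that a random offspring of the sire dies from any lethal locus.

    sire_copies: number of lethal alleles the sire carries per locus (0 or 1).
    allele_freqs: population frequency of the lethal allele per locus.
    penetrances: probability that a homozygote dies (defaults to 1.0).
    Assumption: loci are independent and dams are a random sample.
    """
    if penetrances is None:
        penetrances = [1.0] * len(sire_copies)
    if not (len(sire_copies) == len(allele_freqs) == len(penetrances)):
        raise ValueError("all inputs must have one entry per lethal locus")

    p_survive_all = 1.0
    for copies, q, pen in zip(sire_copies, allele_freqs, penetrances):
        p_from_sire = copies / 2.0  # a carrier transmits the allele half the time
        p_from_dam = q              # a random dam transmits it with probability q
        p_survive_all *= 1.0 - pen * p_from_sire * p_from_dam
    return 1.0 - p_survive_all


def total_risk(p_poly: float, p_lethal: float) -> float:
    """Blend the two risks, assuming they act independently (an assumption)."""
    return 1.0 - (1.0 - p_poly) * (1.0 - p_lethal)


# Hypothetical sire: carrier at two of three lethal loci.
p_lethal = lethal_risk([1, 0, 1], [0.05, 0.045, 0.04])
p_total = total_risk(0.05, p_lethal)
print(f"lethal risk = {p_lethal:.4f}, total risk = {p_total:.4f}")
```

Under these assumptions a non-carrier sire has a lethal-allele risk of zero, so its total risk reduces to the polygenic risk alone. Passing penetrances below 1.0 (for example 0.6, as in the Pen60 scenario) scales down the contribution of each locus.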

In this study, we further investigated whether taking into account the recessive lethal genotypes, either by including regression on lethal genotypes in the models or by accounting for

genotypes of the recessive lethal loci in the GRM, could improve the accuracy of predicted GEBVs when using data that included mortality due to lethal alleles. The results showed that when

the data included records of mortality caused by recessive lethal alleles, including the recessive lethal alleles in the GRM did not improve GEBV prediction accuracy. In contrast, when using

data that included records of mortality due to lethal alleles, models with regression on lethal genotype resulted in prediction accuracy comparable to that of the approach that distinguished

the two effects using Data_poly. Compared with including genotypes of recessive lethal loci in the GRM, using models with fixed regression on lethal genotypes improved the accuracy of

polygenic GEBVs by 4–9.3 percentage points, based on data that included records of mortality due to lethal alleles. These results demonstrate that in situations where excluding mortality

caused by recessive lethal alleles is difficult, using models with regression on lethal genotypes can improve prediction accuracies. Potential challenges in the approach considering fixed

regressions on lethal genotype are the definition of a lethal covariable and the uncertain relationship between a lethal covariable and observations in the case of incomplete penetrance and

unequal penetrances among lethal loci. Consequently, prediction accuracies might be affected by penetrance. This was demonstrated in our simulation, where the prediction accuracy of the

model considering fixed regression on lethal genotype was significantly lower than that of the approach using Data_poly for scenarios having a lower penetrance (Pen60) and a mixture of

penetrance levels (PenGRP). In our simulation study, the accuracy of the predicted GEBVs was generally low across the compared models. This is expected given the low heritability considered

in the simulation. In dairy cattle breeding, definitions of different calf and young stock mortality traits are dependent on monitoring period. In general, studies across several dairy

cattle breeds have shown very low heritability estimates for calf and young stock mortality traits (e.g., Hansen et al. 2003; Fuerst-Waltl and Sørensen 2010; Henderson et al. 2011), which

limit the expected genomic prediction accuracy. However, given the major deleterious effects of individual recessive lethal genes, the prediction accuracy for mortality traits can be

improved with the efficient incorporation of genotypic information on such genes. The results of this study indicate that the presented novel approach is quite advantageous in integrating

information on recessive lethal and polygenic components for the prediction of mortality traits. By bringing the two components to a comparable scale, i.e., risk probability, the approach

allows utilizing information from both effects to predict the mortality status of future offspring of a breeding animal. A somewhat comparable approach to the prediction of risk

probabilities presented in this study is the polygenic risk score (PRS) approach, which is commonly used in human genetics to predict an individual’s risk of succumbing to a particular

disease (Wray et al. 2007, 2019; Evans et al. 2009). However, the PRS used in human genetics is predicted based on SNP effects estimated from genome-wide association studies that are often

based on fitting one SNP at a time, thus ignoring all other SNPs (Wray et al. 2007). Moreover, PRSs in human genetics are used to predict the future phenotypes of an individual, while the

primary objective in our approach, and in animal breeding more generally, is to predict a selection candidate’s transmission ability to its future offspring. There are several assumptions in

our simulation study that might not be fully consistent with the features of real data and thus might affect prediction accuracies to some extent. We have shown that the gain in accuracy

achieved by distinguishing polygenic and recessive lethal allele effects is dependent on the rate of mortality caused by recessive lethal alleles. This rate, in turn, depends on the number

of recessive lethal loci and recessive allele frequencies. In the simulation, 20 loci with lethal allele frequencies between 0.04 and 0.05 were assumed. These might be considered

high-frequency lethal alleles compared with what one would expect for a lethal allele under mutation-selection balance or drift. Therefore, in cases with smaller numbers of recessive lethal

loci with lower MAFs, the mortality caused by lethal alleles will be lower, subsequently resulting in a smaller gain achieved by distinguishing the two effects. However, several recessive

lethal mutations have been identified in cattle breeds, and the numbers continue to increase (Cole 2015), with some reaching high recessive allele frequency (e.g., Kadri et al. 2014; Sahana

et al. 2016; Hoff et al. 2017). An additional assumption potentially prone to violation in real scenarios is the independence of recessive lethal loci and the independence of recessive

lethal loci and nonlethal loci across the genome. In reality, recessive lethal alleles might be in LD with each other as well as with other loci. However, this is expected to have negligible

consequences when using the novel approach that predicts risk probability due to lethal allele and polygenic effects separately, where polygenic GEBVs are estimated using Data_poly, but may

cause confounding between the two effects when using the approach that does not distinguish the two effects, based on Data_all. An additional issue that was not taken into account in our

simulation is the possibility of synergistic epistasis between the recessive lethal loci and other loci with polygenic effects. Under such interaction, the lethality, or penetrance, of

recessive lethal loci may depend on polygenic effects, thus risking double counting of lethal effects when GEBVs are estimated in the presence of the lethal alleles. Such epistatic

interactions were not considered in this study due to the complexity and lack of prior information for the simulation.

COMPARISON OF MODELS

Categorical traits are not normally distributed,

and thus linear mixed models are believed to behave poorly in modeling such traits (Portnoy 1982). Despite such violations of normality assumptions, the use of linear mixed models in the

genetic analysis of categorical traits is gaining popularity due to their straightforward implementation. Meijering and Gianola (1985) demonstrated that LM can be applied without much loss

of statistical power. In our study, slight differences in prediction accuracy were observed between the three models implemented, i.e., logit model, probit model, and LM, but the differences

were not statistically significant. These results indicate that our approach can be implemented in LM with negligible loss of accuracy. The regression coefficients of observed proportions

of calf mortality against predicted risk probabilities were different from 1 when using the models that did not distinguish polygenic and lethal allele effects. The deviation from 1 was much

larger for the LM than for the logit and probit models. This could partly be because in the LM, the threshold is approximated by direct calculation of mortality from the data, as opposed to

the probit model, where the threshold is set to 0 for convenience and the underlying liability moves the origin accordingly. For the LM, the deviation from 1 was even larger for the

approach based on Data_all that did not distinguish recessive lethal allele and polygenic effects. This could be explained by the fact that the threshold was approximated from observed

mortality in the data, including mortality due to recessive lethal alleles, and hence the approximate threshold could be far from the threshold for the polygenic model. Moreover, the

relationship between sire risk probability due to polygenic effects and offspring mortality is not necessarily linear. Consequently, the regression coefficient of observed mortality against

the predicted risk probability might not necessarily be 1.

MANAGEMENT OF RECESSIVE LETHAL ALLELES IN BREEDING PROGRAMS

To date, commonly proposed methods for managing recessive lethal

alleles have focused on the optimization of mate selection to avoid carrier-to-carrier matings. Van Eenennaam and Kinghorn (2014) proposed methods and programs that allow selection against

the total number of lethal alleles and recessive lethal genotypes. Cole (2015) extended the parent-average penalizing method for controlling inbreeding proposed by Pryce et al. (2012),

allowing it to consider information on recessive lethal alleles. Some studies also suggested the complete removal of carriers from the breeding population to eradicate recessive lethal

mutations (e.g., Thompson et al. 2006). Managing recessive lethal alleles requires a trade-off between controlling recessive lethal alleles in the long run and maintaining genetic gains in

production and functional traits (Segelke et al. 2014). Previously proposed methods aimed at optimizing mate selection as well as culling carriers might allow the control of recessive lethal

allele frequencies and avoid lethal homozygous genotypes. However, these methods represent classic tandem selection, where breeding animals are excluded from mating due to low merit for one

trait (recessive lethal alleles in this case), regardless of their superiority in other traits. The consequence of this approach is a reduction in selection intensity and a subsequent

reduction in genetic gain. The slightly different approach proposed by Segelke et al. (2014) recommends a selection index that weights the carrier status of recessive lethal haplotypes based

on economic consequences and population allele frequencies when selecting females for mating. A drawback of this approach, and many other mate-allocation-based approaches, is the inability

to handle many recessive lethal alleles. For instance, Cole (2015) pointed out the difficulty of assigning proper weights and costs for each recessive lethal allele as the number of

identified alleles increases. The approach proposed in this study enables blending polygenic breeding values for a given trait with risk probabilities from recessive lethal alleles. Thus,

the method helps to balance control of recessive lethal allele frequencies in the population against maintenance of genetic gains in economically important traits. In contrast to the

methods where carrier status for each recessive lethal allele is a selection criterion, the proposed method integrates the effect of each recessive lethal allele into the breeding value for

a particular trait (mortality or survival), which can be used for selection decisions. Therefore, an overall weight for the trait of interest can be used to integrate the breeding values,

which account for both the polygenic and recessive lethal allele components, into a selection index with no need to assign weights for each recessive lethal allele.

CONCLUSIONS

This study

proposed an approach for predicting the probability of mortality of future offspring by predicting the risk probabilities from polygenic and recessive lethal components separately. The

approach was tested using simulated data and found to be superior to approaches that do not distinguish polygenic and lethal allele effects. No statistically significant differences in

prediction accuracy were observed between the probit model, logit model, and LM, suggesting that the novel approach can be implemented using different models, with comparable power.

DATA AVAILABILITY

Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.xd2547ddv.

REFERENCES

* Agerholm JS, Bendixen C, Andersen O, Arnbjerg J (2001) Complex vertebral malformation in Holstein calves. J Vet Diagn Invest 13(4):283–289

* Charlier C, Coppieters W, Rollin F, Desmecht D, Agerholm JS, Cambisano N et al. (2008) Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat Genet 40(4):449–454. https://doi.org/10.1038/ng.96

* Cole JB (2015) A simple strategy for managing many recessive disorders in a dairy cattle breeding program. Genet Sel Evol 30(47):94. https://doi.org/10.1186/s12711-015-0174-9

* Cole JB, Null DJ, VanRaden PM (2016) Phenotypic and genetic effects of recessive haplotypes on yield, longevity, and fertility. J Dairy Sci 99(9):7274–7288. https://doi.org/10.3168/jds.2015-10777

* Cole JB, VanRaden PM, O’Connell JR, Van Tassell CP, Sonstegard TS, Schnabel RD et al. (2009) Distribution and location of genetic effects for dairy traits. J Dairy Sci 92(6):2931–2946. https://doi.org/10.3168/jds.2008-1762

* Dempster ER, Lerner IM (1950) Heritability of threshold characters. Genetics 35:212–235

* Evans DM, Visscher PM, Wray NR (2009) Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum Mol Genet 18:3525–3531. https://doi.org/10.1093/hmg/ddp295

* Fernando RL, Grossman M (1989) Marker assisted selection using best linear unbiased prediction. Genet Sel Evol 21:467–477

* Fritz S, Capitan A, Djari A, Rodriguez SC, Barbat A, Baur A et al. (2013) Detection of haplotypes associated with prenatal death in dairy cattle and identification of deleterious mutations in GART, SHBG and SLC37A2. PLoS One 8(6):e65550

* Fuerst-Waltl B, Sørensen MK (2010) Genetic analysis of calf and heifer losses in Danish Holstein. J Dairy Sci 93(11):5436–5442. https://doi.org/10.3168/jds.2010-3227

* Hansen M, Madsen P, Jensen J, Pedersen J, Christensen LG (2003) Genetic parameters of postnatal mortality in Danish Holstein calves. J Dairy Sci 86(5):1807–1817

* Hayes BJ, Visscher PM, Goddard ME (2009) Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res 91(1):47–60. https://doi.org/10.1017/S0016672308009981

* Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME (2010) Genetic architecture of complex traits and accuracy of genomic prediction: coat colour, milk-fat percentage, and type in Holstein cattle as contrasting model traits. PLoS Genet 6:e1001139

* Henderson L, Miglior F, Sewalem A, Kelton D, Robinson A, Leslie KE (2011) Estimation of genetic parameters for measures of calf survival in a population of Holstein heifer calves from a heifer-raising facility in New York State. J Dairy Sci 94(1):461–470. https://doi.org/10.3168/jds.2010-3243

* Hoeschele I (1988) Genetic evaluation with data presenting evidence of mixed major gene and polygenic inheritance. Theor Appl Genet 76(1):81–92. https://doi.org/10.1007/BF00288836

* Hoff JL, Decker JE, Schnabel RD, Taylor JF (2017) Candidate lethal haplotypes and causal mutations in Angus cattle. BMC Genomics 18(1):799

* Johnsson M, Gaynor RC, Jenko J, Gorjanc G, de Koning DJ, Hickey JM (2019) Removal of alleles by genome editing (RAGE) against deleterious load. Genet Sel Evol 51(1):14. https://doi.org/10.1186/s12711-019-0456-8

* Kadri NK, Sahana G, Charlier C, Iso-Touru T, Guldbrandtsen B, Karim L et al. (2014) A 660-Kb deletion with antagonistic effects on fertility and milk production segregates at high frequency in Nordic Red cattle: additional evidence for the common occurrence of balancing selection in livestock. PLoS Genet 10(1):e1004049

* Legarra A, Robert-Granié C, Croiseau P, Guillaume F, Fritz S (2011) Improved Lasso for genomic selection. Genet Res 93:77–87

* Lourenco DA, Misztal I, Wang H, Aguilar I, Tsuruta S, Bertrand JK (2013) Prediction accuracy for a simulated maternally affected trait of beef cattle using different genomic evaluation models. J Anim Sci 91(9):4090–4098. https://doi.org/10.2527/jas.2012-5826

* Madsen P, Jensen J (2013) A user’s guide to DMU. version 6, release 5.2. Aarhus University Foulum, Denmark

* Meijering A, Gianola D (1985) Linear versus nonlinear methods of sire evaluation for categorical traits: a simulation study. Genet Sel Evol 17(1):115–132. https://doi.org/10.1186/1297-9686-17-1-115

* Pedersen LD, Sørensen AC, Henryon M, Ansari-Mahyari S, Berg P (2009) ADAM: a computer program to simulate selective breeding schemes for animals. Livest Sci 121:343–344. https://doi.org/10.1016/j.livsci.2008.06.028

* Peñagaricano F, Urioste JI, Naya H, de los Campos G, Gianola D (2011) Assessment of Poisson, probit and linear models for genetic analysis of presence and number of black spots in Corriedale sheep. J Anim Breed Genet 128(2):105–113. https://doi.org/10.1111/j.1439-0388.2010.00893.x

* Portnoy S (1982) Maximizing the probability of correctly ordering random variables using linear predictors. J Mult Anal 12:256–269

* Pryce JE, Hayes BJ, Goddard ME (2012) Novel strategies to minimize progeny inbreeding while maximizing genetic gain using genomic information. J Dairy Sci 95:377–388

* Rao S, Xia L (2000) Strategies for genetic mapping of categorical traits. Genetica 109(3):183–197

* Sahana G, Nielsen US, Aamand GP, Lund MS, Guldbrandtsen B (2013) Novel harmful recessive haplotypes identified for fertility traits in Nordic Holstein cattle. PLoS One 20(12):e82909

* Sahana G, Iso-Touru T, Wu X, Nielsen US, de Koning DJ, Lund MS et al. (2016) A 0.5-Mbp deletion on bovine chromosome 23 is a strong candidate for stillbirth in Nordic Red cattle. Genet Sel Evol 48:35

* Segelke D, Täubert H, Jansen S, Pausch H, Reinhardt F, Thaller G (2014) Management of genetic characteristics. Interbull Bull 48:85–88

* Shuster DE, Kehrli Jr ME, Ackermann MR, Gilbert RO (1992) Identification and prevalence of a genetic defect that causes leukocyte adhesion deficiency in Holstein cattle. Proc Nat Acad Sci USA 89(19):9225–9229

* Thomasen JR, Liu H, Sørensen AC (2019) Genotyping more cows increases genetic gain and reduces rate of true inbreeding in a dairy cattle breeding scheme using female reproductive technologies. J Dairy Sci 13. https://doi.org/10.3168/jds.2019-16974

* Thompson PN, Heesterbeek JA, van Arendonk JA (2006) Changes in disease gene frequency over time with differential genotypic fitness and various control strategies. J Anim Sci 84(10):2629–2635

* Van Eenennaam AL, Kinghorn BP (2014) Use of mate selection software to manage lethal recessive conditions in livestock populations. In: Proceedings of the 10th World Congress on Genetics Applied to Livestock Production, Vancouver, 17–22 Aug 2014

* VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423

* VanRaden PM, Olson KM, Null DJ, Hutchison JL (2011) Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J Dairy Sci 94:6153–6161

* Wray NR, Goddard ME, Visscher PM (2007) Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res 17:1520–1528. https://doi.org/10.1101/gr.6665407

* Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM (2019) Complex trait prediction from genome data: contrasting EBV in livestock to PRS in humans: genomic prediction. Genetics 211(4):1131–1141. https://doi.org/10.1534/genetics.119.301859

* Wright S (1934) An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19:506

* Wu X, Mesbah-Uddin M, Guldbrandtsen B, Lund MS, Sahana G (2020) Novel haplotypes responsible for prenatal death in Nordic Red and Danish Jersey cattle. J Dairy Sci 103(5):4570–4578. https://doi.org/10.3168/jds.2019-17831

ACKNOWLEDGEMENTS

This work was supported by the GUDP project “LiveCalf”

(No. 34009-16-1101) from the Ministry of Environment and Food of Denmark (Copenhagen).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Alle, 8830, Tjele, Denmark: Grum Gebreyesus, Goutam Sahana, A. Christian Sørensen, Mogens S. Lund & Guosheng Su

CORRESPONDING AUTHOR

Correspondence to Guosheng Su.

ETHICS DECLARATIONS

CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

ADDITIONAL INFORMATION

PUBLISHER’S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Associate editor: Sara Knott

RIGHTS AND PERMISSIONS

OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Gebreyesus, G., Sahana, G., Christian Sørensen, A. _et al._ Novel approach to incorporate information about recessive lethal genes increases the accuracy of genomic prediction for mortality traits. _Heredity_ 125, 155–166 (2020). https://doi.org/10.1038/s41437-020-0329-5

* Received: 21 January 2020
* Revised: 02 June 2020
* Accepted: 02 June 2020
* Published: 12 June 2020
* Issue Date: September 2020
* DOI: https://doi.org/10.1038/s41437-020-0329-5