Improving national-scale breeding bird surveys with integrated distance sampling


ABSTRACT

Bird population estimation over broad spatial and temporal scales is a key objective in ornithology. To date, bird ecologists have mainly relied on standard point counts, where the number of detected individuals is interpreted either as the true abundance or as proportional to it. However, providing accurate estimates of species abundance requires modelling the observation process with temporally replicated data, which is not always possible with the increasing use of ever-bigger datasets from citizen science programs. Data integration methods allow temporally replicated sampling at coarser spatial grains to be combined with data collected over larger spatial extents. Here, we developed an integrated distance sampling (IDS) model that combines national structured and semi-structured citizen-based bird surveys in France. The model estimates species abundances from observation distances while accounting for availability, i.e. the probability of individuals being detectable during a given sampling visit. Our simulation study showed an overall increase in the accuracy of estimated parameters for both the ecological and observation processes, without significant biases. However, our case study suggests that such model improvements depend on specific sampling scenarios. Integrated models represent a promising
tool for ecological science, permitting the joint use of large unstructured datasets with scale-restricted structured surveys.

INTRODUCTION

To estimate bird species abundance, ornithologists mainly rely upon
standardised point-count methods, which consist of recording individual birds detected visually or aurally over a given time period1. Standard models such as GLMs (generalised linear models) allow these observed counts to be extrapolated to novel, unsampled conditions through covariates. However, they also assume that the number of detected individuals represents an accurate estimate of true abundance, or a constant proportion of the sampled population across space and time2,3. Multiple studies have shown that this assumption is not always tenable2,4,5. Variations in species detectability, arising from observation errors3 or changes in species phenology6, can affect the actual proportion of detected individuals.

For a fixed sampling effort, data collection faces a trade-off between (i) sampling a large quantity of unstructured data across a broad spatial scale and (ii) sampling highly standardised data at a smaller scale7. Given the nature and volume of data collected by standard protocols, ecologists increasingly address this issue by relying on more or less opportunistic, or semi-structured, Citizen Science (CS) programs8. However, reliable abundance estimates require additional information to enable joint modelling of the distinct ecological and observation processes9 (see Box 1). This information can take the form of repeated visits, detection distances or data collected by multiple observers. The ecological process corresponds to the species' response to variations in environmental covariates through space and/or time. The observation process, in contrast, is a probabilistic representation of the mechanisms underlying data
collection10.

Nichols et al.11 describe the observation process as four components:
(i) the probability that individuals' home ranges overlap the sampling units, \({p}_{s}\);
(ii) given \({p}_{s}\), the probability that individuals are present on the sampling units during observer visits, \({p}_{p}\);
(iii) the probability that individuals are available for detection (for instance, birds vocalising during observer visits), \({p}_{a}\);
(iv) the probability of detection given that individuals are present and available, \({p}_{d}\).

While \({p}_{s}\) is addressed through the sampling design and \({p}_{d}\) can be inferred from specific data such as detection distances, \({p}_{a}\) and \({p}_{p}\)
probabilities require temporal replicates to be estimated12.

Ecological inferences that explicitly account for both the ecological and observation processes require flexible statistical tools. Hierarchical models are one such tool: they handle overall model complexity through a succession of simpler submodels13. These models vary depending on the ecological process studied14:
- species presence/absence (occupancy models; Ref.15);
- species abundance (hierarchical distance sampling16; N-mixture models17);
- demographic parameters (Cormack-Jolly-Seber models; Ref.18).

In the last two decades, Citizen Science has grown exponentially19, driven by online databases such as eBird (www.ebird.org), iNaturalist (www.inaturalist.org) and GBIF (www.gbif.org). These databases handle observation data collected by volunteers20 over increasingly long temporal and large spatial scales21. They rely mostly on opportunistic data, i.e. information gathered without a sampling design or focal taxa21. Metadata and ad hoc filters can increase the value of the collected data22. Nonetheless, Citizen Science data tend to lack the specificities of structured surveys, including intra- and inter-year repeated visits23,24.

Data integration, the simultaneous joint analysis of an ecological process using multiple datasets25, has attracted growing interest in recent years26,27. It is used for complex ecological inferences requiring different data sources. Integrated population models (IPM; Ref.25), for instance, rely on count data together with nest monitoring and/or banding to infer spatiotemporal population variations and population growth parameters28,29. Data integration is also used to combine data collected at different spatial and/or temporal resolutions30.

Here, we focus on data collected for breeding bird atlases, which depict species distributions and population size estimates from data collected over a short timeframe. In France, the previous breeding bird atlas31 used a semi-quantitative method to estimate national population sizes32. This approach extrapolated bird densities estimated over a few local areas without accounting for the detection process. It produced biased estimates of French breeding bird populations compared with estimates inferred from the structured CS scheme EPOC-ODF (Structured Estimation of Common Bird Population Size, see33).
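A small sketch illustrates the bias introduced by extrapolating raw counts. Assume the four observation components of Nichols et al.11 act as successive conditional probabilities. The expected count is then the true abundance multiplied by \({p}_{s}{p}_{p}{p}_{a}{p}_{d}\), and a density computed from raw counts underestimates true density by that same factor. All values and function names below are hypothetical and chosen only for illustration. This is not the atlas method itself.

```python
def overall_detection(p_s: float, p_p: float, p_a: float, p_d: float) -> float:
    """Probability that an individual is counted, ASSUMING the four observation
    components are chained conditional probabilities (each given the previous)."""
    for p in (p_s, p_p, p_a, p_d):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_s * p_p * p_a * p_d


def naive_density(count: float, area_km2: float) -> float:
    """Density obtained by treating the raw count as the true abundance."""
    return count / area_km2


def corrected_density(count: float, area_km2: float, p: float) -> float:
    """Density obtained by dividing the raw count by the detection probability."""
    return count / (p * area_km2)


# Hypothetical values for a single surveyed area.
true_abundance = 200      # individuals actually using the area
area_km2 = 4.0            # surveyed area
p = overall_detection(p_s=1.0, p_p=0.8, p_a=0.6, p_d=0.5)
expected_count = true_abundance * p   # average number of birds actually counted

print(f"overall detection probability: {p:.2f}")
print(f"naive density:     {naive_density(expected_count, area_km2):.1f} birds/km2")
print(f"corrected density: {corrected_density(expected_count, area_km2, p):.1f} birds/km2")
```

The naive density is too low by exactly the factor \(p\). If \(p\) also varies across sites or years, extrapolated densities are biased in ways that a constant correction cannot fix.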


Structured schemes involve intensive fieldwork to collect high-quality data, but they tend to be conducted over a rather limited spatial extent. In contrast, semi-structured schemes aim to overcome this issue by gathering interpretable data while still enlisting the largest possible number of observers and associated field data34. In this study, we used datasets from two schemes: the structured CS scheme EPOC-ODF and the semi-structured CS scheme EPOC (Estimation of Common Bird Population Size). The first allows inference of the detection process through repeated visits. The second focuses on collecting environmental data without repeated visits, akin to a double-sampling design35.

Recent studies have shown the potential of data integration for ecological inference. Combining multiple data sources has benefited occupancy modelling36,37 and species abundance estimation38. Here, we relied on a joint likelihood approach39 based on the integrated distance sampling (IDS) formulation of Kéry et al.38. Their IDS model integrated unreplicated point-count distance sampling data with detection/non-detection data, with species availability assessed through list duration. We instead aim to calibrate an IDS model that accounts for species availability through temporal replicates.

Availability, or temporary emigration11,12,17, can represent different biological processes:
(i) random temporary emigration, when availability depends on individual behaviour during the survey, such as vocalising40, burrowing or diving41,42;
(ii) spatial temporary emigration, where individuals remain undetected because they are physically outside the sampled sites during the survey period;
(iii) availability resulting from variation in population-level processes, such as recruitment, survival, emigration or immigration13,43.

Survey duration, as used in Ref.38, primarily accounts for random temporary emigration. In that case, individuals may be present on site but remain undetected because they emit no vocal or visual cues. In contrast, the temporal replicates used in this study span broader time scales and mainly account for spatial temporary emigration.

In this manuscript, we applied the IDS model to a structured and a semi-structured dataset, EPOC-ODF and EPOC respectively. These data were collected over three French regions under distinct data collection schemes. We compared ecological and observation parameter estimates from the IDS model with those from an HDS model calibrated using only EPOC-ODF data. This comparison tested whether data integration could improve the accuracy of estimated parameters, i.e. reduce their uncertainty. In addition, we conducted a simulation study aiming (i) to assess model identifiability, i.e. the model's ability to accurately estimate parameters; and (ii) to test potential improvements in estimation accuracy over multiple ranges of variation of
simulated species availability, detectability and sampling scenarios.

MATERIAL AND METHODS

HIERARCHICAL DISTANCE SAMPLING

Hierarchical distance sampling (HDS) models aim to estimate species abundance while accounting for the observation process13. Conventional distance sampling assumes perfect detection44 at zero distance from the observer (i.e. \(f(x=0) = 1\), see below). HDS can relax this assumption by estimating the probability that an individual is present and available for detection during survey occasions17, using list duration or multiple visits to the same site. We assume that the local population size \({M}_{i}\) at each site \(i = 1, 2, \dots, I\) follows a Poisson distribution with mean \({\lambda }_{i}\):

$${M}_{i}\sim Poisson({\lambda }_{i})$$

Given multiple visits \(j = 1, 2, \dots, J\) to site \(i\), the number of individuals available for detection, \({N}_{i,j}\), follows a binomial distribution. Its number of trials is the local population \({M}_{i}\), and its probability is the probability of being exposed to sampling, i.e. available for detection, \({\varphi }_{i,j}\):

$${N}_{i,j}\sim Binomial({M}_{i},{\varphi }_{i,j})$$

For each site \(i\) and visit \(j\), observers measure the distance between themselves and each detected individual. A vector of cell probabilities \({\pi }_{i,j}\), derived from a detection function \(f\)44, assigns probabilities to distinct distance bins. The observations \({y}_{i,j}\) can then be described as a multinomial outcome given the number of individuals available for detection and their distances \({x}_{i,j}\):

$${y}_{i,j}\sim Multinomial\left({N}_{i,j},{\pi }_{i,j}\right), \quad with\ {\pi }_{i,j}=f({x}_{i,j},\sigma )$$

In our study, we relied on point count data with observation distances between observers and detected individuals. We used a half-normal detection function with scale parameter \(\sigma\).

SIMULATION STUDY 1: MODEL IDENTIFIABILITY

For simulation study 1, we generated 1000 cases (Fig. 1). Each case consisted of a structured dataset with 9 temporal replicates collected over 200 sites, and a semi-structured dataset containing 1000 sites with single visits over one season. For each case, we randomly generated a set of parameters related to the ecological and observation processes (see Box 1 and Eq. (1)):
- \({\beta }_{0}\), mean species abundance;
- \(\beta\), effect of covariate \({X}_{i}\) on species abundance;
- \({\varphi }_{0}^{DSopen}\) and \({\varphi }_{0}^{DS}\), mean species availability estimated by the structured and semi-structured datasets, respectively;
- \(\gamma\), effect of covariates \({U}_{i,j}\) and \({V}_{i}\) on species availability;
- \({\sigma }_{0}\), mean species detectability;
- \(\alpha\), effect of covariate \({Z}_{i,j}\) on species detectability.

We also included residual errors on species abundance and species detectability, \({\varepsilon }_{i}^{abund}\) and \({\varepsilon }_{i,j}^{det}\) respectively. These were generated from normal distributions with mean 0 and standard deviations \({\sigma }_{{\varepsilon }^{abund}}\) and \({\sigma }_{{\varepsilon }^{det}}\).
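The per-case parameter draw described above can be sketched as follows. The uniform bounds in `RANGES` are placeholders, because the ranges actually used are not reported in this section. The class and function names are also hypothetical. The sketch only shows the structure: one value per parameter per case, plus site-level abundance residuals and site-by-visit detectability residuals drawn from zero-mean normal distributions, as in Eq. (1).

```python
import random
from dataclasses import dataclass


@dataclass
class CaseParameters:
    beta0: float        # intercept of log(lambda_i): mean abundance
    beta: float         # effect of covariate X_i on abundance
    phi0_dsopen: float  # logit-scale mean availability, structured data
    phi0_ds: float      # logit-scale mean availability, semi-structured data
    gamma: float        # effect of U_ij / V_i on availability
    sigma0: float       # intercept of log(sigma_ij): mean detectability
    alpha: float        # effect of Z_ij on detectability
    sd_abund: float     # SD of the abundance residual eps_i^abund
    sd_det: float       # SD of the detectability residual eps_ij^det


# HYPOTHETICAL (low, high) bounds for uniform draws; not the study's values.
RANGES = {
    "beta0": (0.0, 1.5), "beta": (-1.0, 1.0),
    "phi0_dsopen": (-1.0, 2.0), "phi0_ds": (-1.0, 2.0), "gamma": (-1.0, 1.0),
    "sigma0": (-0.5, 0.5), "alpha": (-0.5, 0.5),
    "sd_abund": (0.1, 0.5), "sd_det": (0.1, 0.5),
}


def draw_case(rng: random.Random) -> CaseParameters:
    """Draw one simulated case's parameter set from the bounds above."""
    return CaseParameters(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def draw_residuals(rng: random.Random, params: CaseParameters, n_sites: int, n_visits: int):
    """Site-level abundance residuals and site-by-visit detectability residuals."""
    eps_abund = [rng.gauss(0.0, params.sd_abund) for _ in range(n_sites)]
    eps_det = [[rng.gauss(0.0, params.sd_det) for _ in range(n_visits)]
               for _ in range(n_sites)]
    return eps_abund, eps_det


rng = random.Random(2024)
cases = [draw_case(rng) for _ in range(1000)]  # 1000 cases, as in simulation study 1
eps_abund, eps_det = draw_residuals(rng, cases[0], n_sites=200, n_visits=9)
print(cases[0])
```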


$$\left\{\begin{array}{l}log\left({\lambda }_{i}\right)={\beta }_{0}+\beta *{X}_{i}+{\varepsilon }_{i}^{abund}\\ logit\left({\varphi }_{i,j}^{DSopen}\right)={\varphi }_{0}^{DSopen}+\gamma *{U}_{i,j}\\ logit\left({\varphi }_{i}^{DS}\right)={\varphi }_{0}^{DS}+\gamma *{V}_{i}\\ log\left({\sigma }_{i,j}\right)={\sigma }_{0}+\alpha *{Z}_{i,j}+{\varepsilon }_{i,j}^{det}\end{array}\right. \tag{1}$$

We used an altered version of the function simHDSopen from the _AHMbook_45 R package to simulate the datasets. All models were fitted in JAGS (version 4.3.1)46 through the _jagsUI_47 R package, and MCMC samples were processed using _mcmcOutput_48. See appendix S1 for the MCMC settings and priors used in the simulation and case studies.
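The data-generating process of the HDS model and Eq. (1) can be sketched with the Python standard library alone. This is an independent sketch, not the modified simHDSopen function used in the study. It assumes:
- individuals are uniformly distributed over a disc whose radius \(B\) is the last distance break;
- detection follows a half-normal function;
- distance bins are hypothetical, in 100-m units;
- covariates are standard normal;
- parameter values are arbitrary.

Under these assumptions, the probability that an available individual is detected in bin \([b_{k}, b_{k+1})\) has the closed form \(\frac{2\sigma^{2}}{B^{2}}\left(e^{-b_{k}^{2}/2\sigma^{2}}-e^{-b_{k+1}^{2}/2\sigma^{2}}\right)\). The remaining probability mass forms an extra "undetected" cell of the multinomial.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def halfnormal_bin_probs(sigma: float, breaks: list) -> list:
    """P(detected and in bin k) for an individual uniform over the disc of radius
    breaks[-1], with half-normal detection g(r) = exp(-r^2 / (2 sigma^2))."""
    s2, b2 = 2.0 * sigma ** 2, breaks[-1] ** 2
    return [s2 / b2 * (math.exp(-lo ** 2 / s2) - math.exp(-hi ** 2 / s2))
            for lo, hi in zip(breaks[:-1], breaks[1:])]


def multinomial(rng: random.Random, n: int, probs: list) -> list:
    """Allocate n individuals to cells; the last cell absorbs rounding error."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for k, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[k] += 1
                break
        else:
            counts[-1] += 1
    return counts


def simulate_counts(rng, n_sites, n_visits, breaks, beta0, beta, phi0, gamma,
                    sigma0, alpha, sd_abund, sd_det):
    """y[i][j][k]: detections at site i, visit j, distance bin k."""
    y = []
    for _ in range(n_sites):
        x_i = rng.gauss(0.0, 1.0)                                  # covariate X_i
        lam = math.exp(beta0 + beta * x_i + rng.gauss(0.0, sd_abund))
        m_i = poisson(rng, lam)                                    # M_i ~ Poisson(lambda_i)
        visits = []
        for _ in range(n_visits):
            phi = expit(phi0 + gamma * rng.gauss(0.0, 1.0))         # availability covariate
            n_ij = sum(rng.random() < phi for _ in range(m_i))     # N_ij ~ Binomial(M_i, phi_ij)
            sigma = math.exp(sigma0 + alpha * rng.gauss(0.0, 1.0)  # covariate Z_ij
                             + rng.gauss(0.0, sd_det))             # residual eps_ij^det
            pi = halfnormal_bin_probs(sigma, breaks)
            cells = multinomial(rng, n_ij, pi + [1.0 - sum(pi)])   # extra cell = undetected
            visits.append(cells[:-1])                              # keep observed bins only
        y.append(visits)
    return y


# Hypothetical settings: bins in 100-m units (0-200 m), arbitrary parameter values.
rng = random.Random(1)
breaks = [0.0, 0.5, 1.0, 1.5, 2.0]
shared = dict(beta0=1.0, beta=0.5, gamma=0.5, sigma0=0.0, alpha=-0.3,
              sd_abund=0.2, sd_det=0.1)
structured = simulate_counts(rng, 200, 9, breaks, phi0=0.5, **shared)  # 200 sites x 9 visits
semi = simulate_counts(rng, 1000, 1, breaks, phi0=0.2, **shared)       # 1000 single visits
print(sum(sum(map(sum, site)) for site in structured), "detections in the structured data")
```

The semi-structured call reuses the same function with a single visit and its own availability intercept, mirroring the shared \(\gamma\) of Eq. (1). With one visit, the per-visit availability covariate plays the role of \({V}_{i}\).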


SIMULATION STUDY 2: ACCURACY OF ESTIMATES ACROSS SAMPLING SCENARIOS

In simulation study 2, we assessed how much data integration improved the accuracy of estimated parameters across multiple sampling scenarios. We used the same 1000 cases generated in simulation study 1, but varied the number of structured sites (from 50 to 300) and of semi-structured lists.


The number of semi-structured lists was obtained by multiplying the number of structured sites by a ratio ranging from 1 to 6. We based these ranges on the proportions of the two sampling schemes in our case study (see Fig. 2 and appendix S2).

We measured inference improvement as a reduction of uncertainty, i.e. a narrower posterior distribution of the estimated parameters. Uncertainty was quantified by the width of the 95% credible interval (CRI). We fitted a linear model of the log-transformed CRI width to assess whether it was affected by the model formulation (HDS or IDS) or by the estimated parameter. We expected the benefit of each formulation to depend on the amount of input data. The model therefore included an interaction between model formulation and the simulated sampling design, i.e. the number of simulated structured sites and the ratio of added semi-structured sites.

Because the response variable of this model is derived from simulation results, we used a bootstrap to assess the variation in CRI reduction across resamples of simulated cases and their associated parameters. Confidence intervals were estimated from 100 linear models, each fitted to a resample of 250 cases with converging IDS and HDS models.

CASE STUDY

We relied on EPOC-ODF
(Structured Estimation of Common Bird Population Size) and EPOC (Estimation of Common Bird Population Size) citizen science data collected over the 2021–2023 breeding seasons. Both schemes consist of complete 5-min point-count checklists. During each checklist, observers record the locations of detected individuals using the mobile app NaturaList49. Observation distances between observers and detected individuals are then measured in a GIS (Geographic Information System) from the observers' GPS locations.

We used data on 31 bird species collected during their breeding season over 2021–2023 across three French regions: Bourgogne-Franche-Comté, Nouvelle-Aquitaine and Normandie. These regional datasets differ in data quantity, providing diverse distributions of structured and semi-structured data (Fig. 2). The EPOC scheme neither constrains observers to pre-selected sites nor requires repeated visits. For EPOC-ODF, by contrast, survey locations are randomly selected from a systematic grid and must be visited three times during the breeding season. Each session consists of three successive 5-min point counts.

We filtered the data as follows:
- Spatial filter: we retained only EPOC lists collected at least two kilometres away from sites with temporal replicates (EPOC-ODF) and from other EPOC lists (see appendix S2).
- Species selection: species had to be among those targeted by the two schemes33 and to be detected at no fewer than 20 distinct EPOC-ODF sites in each region.
- Temporal filter: based on both observed bird activity during the breeding season and expert knowledge, we defined the breeding phenology of each targeted species to exclude potential early or late migrants.

For each species, we calibrated an HDS model using only data collected by the EPOC-ODF scheme, and an IDS model using data from both schemes. We also applied a 5% right truncation of observation distances to remove extreme distance
values for model robustness50.

We modelled the population size at site \(i\), \({M}_{i}\), using a zero-inflated Poisson distribution with parameter \({\mu }_{i}\) (Fig. 3):

$${M}_{i}\sim Poisson\left({\mu }_{i}\right), \quad with\ {\mu }_{i}={\lambda }_{i}*(1-{\omega }_{i})$$

The expected species abundance (\({\lambda }_{i}\)) was modelled using reduced habitat51 and bioclimatic52 covariates obtained through PCA33, together with a site random effect for abundance (\({\varepsilon }_{i}^{abund}\)):

$$log\left({\lambda }_{i}\right)={\beta }_{0}+{\sum }_{a=1}^{3}{\beta }_{a}*{Habitat\ PCA}_{a,i}+{\sum }_{a=4}^{6}{\beta }_{a}*{Bioclimatic\ PCA}_{a-3,i}+{\varepsilon }_{i}^{abund}$$

The zero-inflation term (\({\omega }_{i}\)) represents site suitability. It follows a Bernoulli process with probability \({\rho }_{i}\) that a site is unsuitable. We modelled \({\rho }_{i}\) as a function of the site's ecoregion, treated as a categorical variable53, and of its spatial continuity54:

$${\omega }_{i}\sim Bernoulli({\rho }_{i})$$

$$logit\left({\rho }_{i}\right)={\rho }_{0}+{\sum }_{a=1}^{e}{\delta }_{a}^{cat}*{Ecoregion}_{a,i}+\delta *Spatial\ continuit{y}_{i}$$

From the sampling scheme and the temporal intervals between EPOC-ODF sessions, we considered that species availability primarily
reflected spatial temporary emigration. Migratory arrivals and departures during the breeding season can affect the number of individuals present on sites during surveys. Consequently, we modelled the probability of an individual being available for detection (\({\varphi }_{i,j}\)) using hour from sunrise and Julian date as covariates, each with a quadratic effect, to represent bird phenology across the breeding season. In the IDS model, we additionally included a categorical covariate (\({\gamma }^{cat}\)). It accounts for variation in species availability caused by the different temporal sampling of the two schemes over the breeding season.

$$logit\left({\varphi }_{i,j}\right)={\varphi }_{0}+{\gamma }_{1}*{Day}_{j}+{\gamma }_{2}*{Day}_{j}^{2}+{\gamma }_{3}*{Hr.sun}_{j}+{\gamma }_{4}*{Hr.sun}_{j}^{2}+{\varepsilon }_{i,j}^{avail}+{\eta }_{i}^{avail}$$

For species detectability, we used a half-normal detection function with scale parameter \({\sigma }_{i,j}\). We modelled \({\sigma }_{i,j}\) as a function of two covariates:
- a categorical variable describing the habitat around the observer, in four categories (Agricultural, Forest, Open and Urban; Ref.33);
- the distance between the observer's GPS location and the nearest road55.

For the IDS model, we used two distinct intercepts, allowing a separate detection function to be calibrated for each dataset.

$$log\left({\sigma }_{i,j}\right)={\sigma }_{0}+\alpha *{Dist.Road}_{i}+{\sum }_{a=2}^{4}{\alpha }_{a}^{cat}*{Near\ habitat}_{a,i}+{\varepsilon }_{i,j}^{det}+{\eta }_{i}^{det}$$

For both species availability and detectability, we accounted for the design of the structured dataset with session-level random effects (\({\varepsilon }_{i,j}^{avail}\) and \({\varepsilon }_{i,j}^{det}\)). We also added observer random effects over surveyed sites or lists (\({\eta }_{i}^{avail}\) and \({\eta }_{i}^{det}\)), as one observer can take part in both CS schemes (see appendix S1 for the priors used).

We fitted two linear mixed-effects models to assess
CRI reduction and the shift in means of estimated parameters between the IDS and HDS models. Both linear models included a fixed effect of the estimated parameter and an interaction between model formulation and region. We also added nested random effects for species and regions to account for species-specific responses in each region. Models were fitted using _lme4_56. We used _emmeans_57 to estimate marginal means from the linear models and to perform pairwise post hoc multiple comparisons. For the case study analysis, we excluded the \({\gamma }^{cat}\) and \({\delta }^{cat}\) parameters from the comparison: \({\gamma }^{cat}\) is not estimated in the HDS formulation, and the \({\delta }^{cat}\) parameters varied across regions.

RESULTS

SIMULATION STUDY

SIMULATION STUDY 1

For simulation study 1, 861 out of 1000 simulated datasets resulted in converging models. Overall, the IDS model demonstrated its ability
to accurately estimate the parameters of both the ecological and observation processes. The \({\varphi }_{0}^{DSopen}\), \({\varphi }_{0}^{DS}\), \({\sigma }_{{\varepsilon }^{abund}}\) and \({\sigma }_{{\varepsilon }^{det}}\) parameters were estimated with lower precision. Even so, all parameters had an R2 above 0.85 between their simulated and estimated values (Fig. 4; S4.1). Estimates of \({\beta }_{0}\) were also centred on the generated values across all simulations. See appendix S3 for an analysis of model convergence in the simulation studies.

SIMULATION STUDY 2

For simulation study 2, out of 1000 simulated datasets, 892 models converged using the HDS formulation and 930 using the IDS formulation. There were no signs of major bias between the simulated values and the means of the parameter estimates (Fig. 5). Bootstrap resamples were based on the 844 simulated cases for which both the HDS and IDS formulations converged.

The IDS model yielded a considerable reduction of CRI width across all estimated parameters (Fig. 6a). Both models produced narrower CRI for more available and more easily detectable species. However, for equivalent simulated availability-detectability profiles, the IDS model produced narrower CRI than the HDS model (Fig. 6b). The number of structured sites, i.e. sites with temporal replicates, was correlated with a reduction of CRI width for both models (Fig. 6c). Increasing the proportion of semi-structured sites added to the calibration dataset had no substantial effect on the HDS model. For the IDS model, however, it was strongly correlated with CRI reduction (Fig. 6c). See Appendix S6 for a comparison of CRI
reduction in simulation study 2 with three temporal replicates instead of nine.

CASE STUDY

Marginal effect plots from the linear model (Fig. 7) compared CRI widths across the three regions (Fig. 2):
- Normandie (Nor), with fewer structured and semi-structured sites, had slightly wider credible intervals (CRI) than the other two regions;
- Bourgogne-Franche-Comté (BFC) had few structured sites but a large number of semi-structured sites;
- Nouvelle-Aquitaine (NvA) had a larger number of semi-structured sites.

For NvA and Nor, there were no considerable differences between the HDS and IDS CRI, as indicated by overlapping confidence intervals of the marginal responses. In BFC, however, CRI from the IDS model were considerably narrower than those from the HDS model for all estimated parameters (Fig. 7 and appendix S5). Pairwise comparisons of marginal means showed no significant differences (p-values > 0.05) between the HDS and IDS mean parameter estimates, across all monitored parameters and regions. Squared GVIFs (Generalised Variance-Inflation Factors; Ref.58), computed with the _car_ R package59, were below 4. This shows no sign of multicollinearity among the terms used in each model.

DISCUSSION

The present work brings new evidence that Integrated Distance Sampling (IDS)
models can accurately identify the parameters of a complex ecological process, and extends their application to account for species availability estimated from repeated visits. It also shows that data integration improves ecological inference by reducing the width of credible intervals (CRI) for all parameters of the studied ecological process. This holds across multiple sampling design scenarios and along the species availability-detectability continuum. The case study complements the simulation results: it shows that this reduction of CRI width, obtained without significant changes in mean parameter estimates, depends on the ratio of structured to semi-structured data available in each region.

In recent years, interest in integrated models has increased26,27. Integrated models can reduce potential biases inherent to a single dataset60, and they can rely on automated and non-invasive data collection methods61,62. Data integration through joint likelihood63 still has potential drawbacks when temporal and/or spatial mismatches, i.e. discontinuities between dataset timeframes and spatial heterogeneity, are unaccounted for. Such mismatches can bias inferences when the sampled timeframes and/or regions do not correctly represent the ecological process of interest63,64. Our simulation studies did not include spatial bias in data collection, and therefore did not represent the spatial sampling bias typical of citizen science data65. We accommodated this mismatch in the case study through a spatial filter on the semi-structured dataset, based upon the 2 × 2 km systematic grid resolution of the structured scheme.
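A minimal sketch of such a minimum-distance filter is shown below. It assumes projected coordinates in metres and a greedy pass over the lists in their given order. A semi-structured list is kept only if it lies at least 2 km from every structured site and from every semi-structured list already kept. The greedy ordering and the function name are assumptions for illustration; appendix S2 describes the filter actually used.

```python
import math


def filter_lists(odf_sites, epoc_lists, min_dist=2000.0):
    """Return indices of EPOC lists kept by a greedy minimum-distance filter.

    odf_sites, epoc_lists: sequences of (x, y) projected coordinates in metres.
    A list is kept if it is >= min_dist from every EPOC-ODF site and from every
    previously kept EPOC list.
    """
    kept = []
    for idx, (x, y) in enumerate(epoc_lists):
        near_site = any(math.hypot(x - sx, y - sy) < min_dist for sx, sy in odf_sites)
        near_kept = any(math.hypot(x - epoc_lists[k][0], y - epoc_lists[k][1]) < min_dist
                        for k in kept)
        if not (near_site or near_kept):
            kept.append(idx)
    return kept


odf = [(0.0, 0.0), (10_000.0, 0.0)]
epoc = [(500.0, 500.0), (5_000.0, 0.0), (6_000.0, 0.0), (5_000.0, 3_000.0)]
print(filter_lists(odf, epoc))  # [1, 3]
```

The result depends on the order in which lists are visited, and the pairwise search is quadratic in the number of lists. For national-scale datasets, a spatial index would be the natural replacement for the inner loops.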


This filter substantially reduced the amount of data retained from the semi-structured dataset (see appendix S2). Using random effects in the model could avoid this loss27. In our case, distinct spatial subsets of the semi-structured dataset could be defined and included in a random-effect structure spanning the modelled sub-processes of the integrated model.
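To illustrate that alternative, the sketch below groups semi-structured lists into spatial subsets instead of discarding nearby lists. It assumes hypothetical 2 × 2 km cells, matching the structured scheme's grid resolution, and projected coordinates in metres. It returns a consecutive integer index per occupied cell, of the kind a random-effect term would use as its grouping variable. The cell-based grouping and the function name are assumptions, not part of the fitted models.

```python
import math


def spatial_block_index(coords, cell_size=2000.0):
    """Map (x, y) coordinates in metres to consecutive integer block IDs,
    one ID per occupied cell_size x cell_size grid cell."""
    ids, lookup = [], {}
    for x, y in coords:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in lookup:
            lookup[cell] = len(lookup)  # next unused block ID
        ids.append(lookup[cell])
    return ids


lists = [(100.0, 100.0), (1900.0, 50.0), (2100.0, 50.0), (-10.0, 10.0)]
print(spatial_block_index(lists))  # [0, 0, 1, 2]
```

In a BUGS/JAGS model, such IDs could index a block-level random effect in any of the sub-models. Whether this structure adequately captures spatial dependence would need to be checked, for example by simulation.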


Hierarchical models offer a viable way to disentangle variation due to the observation process from variation originating from the ecological process of interest9. However, the trade-off between data specificity and data quantity can limit their application. Data integration is a valuable way to increase the amount of data available to calibrate such models. It must also account for the specificities of each sampling scheme and their potential effects on estimated parameters. For instance, species availability can differ between standardised schemes and non-standardised schemes with varying list durations38. Integrated hierarchical models can be calibrated in a frequentist framework38, allowing fast computation. However, depending on the types of available data and the ecological processes of interest, data integration often relies on Bayesian frameworks.
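All comparisons in this study rest on posterior draws from MCMC. The uncertainty metric of simulation study 2, the width of the 95% credible interval (CRI), can therefore be sketched directly from such draws. This sketch makes three simplifying assumptions:
- the quantile rule is linear interpolation between order statistics;
- the summary is the log ratio of IDS to HDS widths;
- uncertainty in that summary comes from a percentile bootstrap of its mean.

The study itself fitted a linear model of log CRI width on each of 100 resamples of 250 cases. The posterior draws below are synthetic stand-ins.

```python
import math
import random
import statistics


def quantile(sorted_x, p):
    """Sample quantile with linear interpolation between order statistics
    (the same rule as R's default type 7); sorted_x must be sorted."""
    pos = p * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def cri_width(draws, level=0.95):
    """Width of the equal-tailed credible interval of posterior draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(s, 1.0 - tail) - quantile(s, tail)


def log_width_ratio(ids_draws, hds_draws):
    """log(CRI_IDS) - log(CRI_HDS); negative values mean a narrower IDS interval."""
    return math.log(cri_width(ids_draws)) - math.log(cri_width(hds_draws))


def bootstrap_mean_interval(values, n_boot=100, size=250, seed=1):
    """95% percentile interval of the mean over n_boot resamples of `size`
    values drawn with replacement (100 x 250, mirroring the study's counts)."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(values, k=size)) for _ in range(n_boot))
    return quantile(means, 0.025), quantile(means, 0.975)


# Synthetic stand-in posteriors for 200 hypothetical cases: the "IDS" draws are
# simply generated with a smaller spread than the "HDS" draws.
rng = random.Random(42)
ratios = []
for _ in range(200):
    hds = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    ids = [rng.gauss(0.0, 0.7) for _ in range(1000)]
    ratios.append(log_width_ratio(ids, hds))
low, high = bootstrap_mean_interval(ratios)
print(f"mean log CRI ratio (IDS vs HDS), bootstrap 95% interval: [{low:.3f}, {high:.3f}]")
```

Working on the log scale makes CRI reductions multiplicative and comparable across parameters with very different magnitudes. This is presumably also why the study modelled log-transformed CRI widths.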


Bayesian computation relies on Markov chain Monte Carlo (MCMC) techniques, which are computationally intensive66. Novel approaches such as Integrated Nested Laplace Approximation (INLA) or Bayesian emulation67,68 allow efficient computation and facilitate the implementation of spatial components69.

Integrated models could be an important tool for macro-ecological studies spanning large spatial scales27 or requiring multiple institutions to coordinate data collection70,71. They could be used, for instance, to study bird populations across Europe with data from the Pan-European Common Bird Monitoring Scheme (PECBMS). PECBMS gathers point count, line transect and territory mapping data from 28 countries with varying numbers of fieldworkers72. An integrated model could account for between-country discrepancies in sampling design, sampling effort or starting period, which could otherwise alter estimates of long-term trends73. The joint analysis of multiple data sources, notably data collected by schemes lacking a design-based methodology74, could substantially increase the quantity of data available for the study of cryptic species75,76. It could also improve the assessment of migratory patterns over large spatial scales77. It represents an influx of data for the estimation
of ecological processes of interest27, potentially reducing the sampling effort required by robust designs. For instance, the number of temporal replicates considered in our simulation studies and case study exceeds that of most commonly used schemes. To assess the usefulness of data integration beyond our specific case, we conducted an additional simulation study with a structured scheme of three temporal replicates instead of nine (see appendix S6). Comparing the HDS and IDS formulations for both numbers of temporal replicates showed that data integration had a greater effect on parameter accuracy when applied to the less demanding structured survey. However, these estimates remained less accurate than those derived from the structured scheme alone with nine temporal replicates (Figures S6.1–3). Lighter structured sampling designs may be developed in a data integration context. Before implementing them, we strongly recommend assessing whether they can still estimate the targeted ecological parameters, fully or only partially. Power analysis78 and simulation-based assessment of integrated model identifiability79 can serve this purpose.

Our results highlight the benefits of statistical frameworks, such as integrated models, that improve estimate accuracy by expanding the usable data collected from structured and semi-structured surveys. Our simulation results showed a consistent reduction of estimate uncertainty. Field surveys in three French regions with distinct ratios of structured to semi-structured data, however, showed that this improvement is case-dependent. Uncertainty was significantly reduced where little structured data and a large quantity of semi-structured data were available. We advocate thorough planning before sampling. Nevertheless, these results suggest that integrated models could be a conceivable alternative when structured surveys collect insufficient data, and that they could greatly benefit from data collected by citizen science schemes.

DATA AVAILABILITY

Scripts,
BUGS model files and data for simulation and case studies replications are available online: https://doi.org/10.5281/zenodo.11452853 REFERENCES * Blondel, J., Ferry, C. & Frochot, B.


Point counts with unlimited distance. Stud. Avian Biol. (1981). * Yoccoz, N. G., Nichols, J. D. & Boulinier, T. Monitoring of biological diversity in space and time. _Trends Ecol. Evol._


16, 446–453 (2001). Article  Google Scholar  * Kellner, K. F. & Swihart, R. K. Accounting for imperfect detection in ecology: A quantitative review. _PLoS One_ 9, e111436 (2014).


Article  ADS  PubMed  PubMed Central  Google Scholar  * Burnham, K. P. Summarizing remarks: Environmental influences. _Stud. Avian Biol._ 6, 324–325 (1981). Google Scholar  * Thompson, W. L.


Towards reliable bird surveys: Accounting for individuals present but not detected. _Auk_ 119, 18–25 (2002). Article  Google Scholar  * Lehikoinen, A. Climate change, phenology and species


detectability in a monitoring scheme. _Popul. Ecol._ 55, 315–323 (2013). Article  Google Scholar  * Devictor, V., Whittaker, R. J. & Beltrame, C. Beyond scarcity: citizen science


programmes as useful tools for conservation biogeography. _Divers. Distrib._ 16, 354–362 (2010). Article  Google Scholar  * Castagneyrol, B. et al. Can school children support ecological


research? Lessons from the Oak bodyguard citizen science project. Citiz. Sci. Theory Pract. 5 (2020). * Guillera-Arroita, G. Modelling of species distributions, range dynamics and


communities under imperfect detection: Advances, challenges and opportunities. _Ecography_ 40, 281–295 (2017). Article  ADS  Google Scholar  * Royle, J. A. & Dorazio, R. M. Hierarchical


modeling and inference in ecology: The analysis of data from populations, metapopulations and communities. https://pubs.usgs.gov/publication/5200344 (2008). * Nichols, J. D., Thomas, L.


& Conn, P. B. In Model. Demogr. Process. Mark. Popul. (eds. Thomson, D. L., Cooch, E. G. & Conroy, M. J.), 201–235 (Springer US, 2009). https://doi.org/10.1007/978-0-387-78151-8_9 *


12. Mizel, J. D., Schmidt, J. H. & Lindberg, M. S. Accommodating temporary emigration in spatial distance sampling models. _J. Appl. Ecol._ 55, 1456–1464 (2018).
13. Kéry, M. & Royle, J. A. _Applied Hierarchical Modeling in Ecology: Analysis of Distribution, Abundance and Species Richness in R and BUGS: Volume 1: Prelude and Static Models_ (Academic Press, 2016).
14. King, R. Statistical ecology. _Annu. Rev. Stat. Appl._ 1, 401–426 (2014).
15. MacKenzie, D. I. et al. Estimating site occupancy rates when detection probabilities are less than one. _Ecology_ 83, 2248–2255 (2002).
16. Sollmann, R., Gardner, B., Williams, K. A., Gilbert, A. T. & Veit, R. R. A hierarchical distance sampling model to estimate abundance and covariate associations of species and communities. _Methods Ecol. Evol._ 7, 529–537 (2016).
17. Chandler, R. B., Royle, J. A. & King, D. I. Inference about density and temporary emigration in unmarked populations. _Ecology_ 92, 1429–1435 (2011).
18. Gimenez, O. et al. State-space modelling of data on marked individuals. _Ecol. Model._ 206, 431–438 (2007).
19. Sullivan, B. L. et al. eBird: A citizen-based bird observation network in the biological sciences. _Biol. Conserv._ 142, 2282–2292 (2009).
20. Bonney, R. et al. Citizen science: A developing tool for expanding science knowledge and scientific literacy. _Bioscience_ 59, 977–984 (2009).
21. Hochachka, W. M. et al. Data-intensive science applied to broad-scale citizen science. _Trends Ecol. Evol._ 27, 130–137 (2012).
22. Johnston, A. et al. Analytical guidelines to increase the value of community science data: An example using eBird data to estimate species distributions. _Divers. Distrib._ 27, 1265–1277 (2021).
23. Bayraktarov, E. et al. Do big unstructured biodiversity data mean more knowledge? _Front. Ecol. Evol._ 6 (2019).
24. Johnston, A., Matechou, E. & Dennis, E. B. Outstanding challenges and future directions for biodiversity monitoring using citizen science data. _Methods Ecol. Evol._ (2022).
25. Schaub, M. & Abadi, F. Integrated population models: A novel analysis framework for deeper insights into population dynamics. _J. Ornithol._ 152, 227–237 (2011).
26. Fletcher, R. J. et al. A practical guide for combining data to model species distributions. _Ecology_ 100, e02710 (2019).
27. Zipkin, E. F. et al. Addressing data integration challenges to link ecological processes across scales. _Front. Ecol. Environ._ 19, 30–38 (2021).
28. Besbeas, P., Freeman, S. N., Morgan, B. J. T. & Catchpole, E. A. Integrating mark-recapture-recovery and census data to estimate animal abundance and demographic parameters. _Biometrics_ 58, 540–547 (2002).
29. Schaub, M. _Popul. Ecol. Pract._ 215–236 (Wiley-Blackwell, 2020).
30. Keil, P., Wilson, A. M. & Jetz, W. Uncertainty, priors, autocorrelation and disparate data in downscaling of species distributions. _Divers. Distrib._ 20, 797–812 (2014).
31. Issa, N. & Muller, Y. _Atlas des oiseaux de France métropolitaine: Nidification et présence hivernale_ (Delachaux, 2015).
32. Roché, J.-E., Muller, Y. & Siblet, J.-P. Une méthode simple pour estimer les populations d’oiseaux communs nicheurs en France. _Alauda_ 81, 241–268 (2013).
33. Nabias, J. et al. Reassessment of French breeding bird population sizes using citizen science and accounting for species detectability. _PeerJ_ 12, e17889 (2024).
34. Kelling, S. et al. Using semistructured surveys to improve citizen science data for monitoring biodiversity. _Bioscience_ 69, 170–179 (2019).
35. Mackenzie, D. I. & Royle, J. A. Designing occupancy studies: General advice and allocating survey effort. _J. Appl. Ecol._ 42, 1105–1114 (2005).
36. Lauret, V., Labach, H., Authier, M. & Gimenez, O. Using single visits into integrated occupancy models to make the most of existing monitoring programs. _Ecology_ 102, e03535 (2021).
37. von Hirschheydt, G., Stofer, S. & Kéry, M. “Mixed” occupancy designs: When do additional single-visit data improve the inferences from standard multi-visit models? _Basic Appl. Ecol._ 67, 61–69 (2023).
38. Kéry, M. et al. Integrated distance sampling models for simple point counts. _Ecology_ 105, e4292 (2024).
39. Miller, D. A. W., Pacifici, K., Sanderlin, J. S. & Reich, B. J. The recent past and promising future for data integration methods to estimate species’ distributions. _Methods Ecol. Evol._ 10, 22–37 (2019).
40. Emlen, J. T. Estimating breeding season bird densities from transect counts. _Auk_ 94, 455–468 (1977).
41. Andriolo, A. et al. The first aerial survey to estimate abundance of humpback whales (_Megaptera novaeangliae_) in the breeding ground off Brazil (Breeding Stock A). _J. Cetacean Res. Manag._ 8, 307–311 (2006).
42. Manning, J. A. Factors affecting detection probability of burrowing owls in southwest agroecosystem environments. _J. Wildl. Manag._ 75, 1558–1567 (2011).
43. Kéry, M. & Royle, J. A. _Applied Hierarchical Modeling in Ecology: Analysis of Distribution, Abundance and Species Richness in R and BUGS: Volume 2: Dynamic and Advanced Models_ (Academic Press, 2020).
44. Buckland, S. T., Rexstad, E. A., Marques, T. A. & Oedekoven, C. S. _Distance Sampling: Methods and Applications_, Methods in Statistical Ecology (Springer International Publishing, 2015). https://doi.org/10.1007/978-3-319-19219-2
45. Kéry, M., Royle, A. & Meredith, M. AHMbook: Functions and data for the book ‘Applied Hierarchical Modeling in Ecology’ Vols 1 and 2. https://cran.r-project.org/web/packages/AHMbook/index.html (2023).
46. Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In _Proc. 3rd Int. Workshop Distrib. Stat. Comput._ (2003).
47. Kellner, K. & Meredith, M. jagsUI: A wrapper around ‘rjags’ to streamline ‘JAGS’ analyses. https://cran.r-project.org/web/packages/jagsUI/index.html (2021).
48. Juat, N., Meredith, M. & Kruschke, J. mcmcOutput: Functions to store, manipulate and display Markov Chain Monte Carlo (MCMC) output. https://cran.r-project.org/web/packages/mcmcOutput/index.html (2022).
49. LPO France. Oiseaux de France: Fiche tutoriel. https://oiseauxdefrance.org/get-involved/Tuto-EPOC-ODF.pdf
50. Buckland, S. T. et al. _Introduction to Distance Sampling: Estimating Abundance of Biological Populations_ (Oxford University Press, 2001).
51. Thierion, V., Vincent, A. & Valero, S. Theia OSO Land Cover Map 2020. https://doi.org/10.5281/zenodo.6538861 (2022).
52. Fick, S. E. & Hijmans, R. J. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. _Int. J. Climatol._ 37, 4302–4315 (2017).
53. IGN. Sylvoécorégions - Cartographie des sylvoécorégions. https://geo.data.gouv.fr/fr/datasets/a40c533b984bdcd33d8a38f2430a117672395bc0 (2011).
54. Guetté, A., Carruthers-Jones, J. & Carver, S. J. Projet CARTNAT Cartographie de la Naturalité (2021).
55. Cote, C., Troncon, C., Troncon, C. & Troncon, C. ROUTE 500® Version 3.0 - Descriptif de contenu. 27 (2021).
56. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. _J. Stat. Softw._ 67, 1–48 (2015).
57. Lenth, R. V. et al. emmeans: Estimated Marginal Means, aka Least-Squares Means. https://cran.r-project.org/web/packages/emmeans/index.html (2024).
58. Fox, J. & Monette, G. Generalized collinearity diagnostics. _J. Am. Stat. Assoc._ 87, 178–183 (1992).
59. Fox, J. et al. car: Companion to applied regression. https://cran.r-project.org/web/packages/car/index.html (2023).
60. Zipkin, E. F., Inouye, B. D. & Beissinger, S. R. Innovations in data integration for modeling populations. _Ecology_ 100, 1–3 (2019).
61. Doser, J. W., Finley, A. O., Weed, A. S. & Zipkin, E. F. Integrating automated acoustic vocalization data and point count surveys for estimation of bird abundance. _Methods Ecol. Evol._ 12, 1040–1049 (2021).
62. Pavanato Julião, H. Development of integrated distance sampling models. https://ourarchive.otago.ac.nz/handle/10523/1249 (2021).
63. Isaac, N. J. B. et al. Data integration for large-scale models of species distributions. _Trends Ecol. Evol._ 35, 56–67 (2020).
64. Powney, G. D., Preston, C. D., Purvis, A., Van Landuyt, W. & Roy, D. B. Can trait-based analyses of changes in species distribution be transferred to new geographic areas? _Glob. Ecol. Biogeogr._ 23, 1009–1018 (2014).
65. Johnston, A., Moran, N., Musgrove, A., Fink, D. & Baillie, S. R. Estimating species distributions from spatially biased citizen science data. _Ecol. Model._ 422, 108927 (2020).
66. Dorazio, R. M. Bayesian data analysis in population ecology: Motivations, methods, and benefits. _Popul. Ecol._ 58, 31–44 (2016).
67. Rue, H. et al. Bayesian computing with INLA: A review. _Annu. Rev. Stat. Its Appl._ 4, 395–421 (2017).
68. Fer, I. et al. Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation. _Biogeosciences_ 15, 5801–5830 (2018).
69. Blangiardo, M., Cameletti, M., Baio, G. & Rue, H. Spatial and spatio-temporal models with R-INLA. _Spat. Spatio-Temporal Epidemiol._ 7, 39–55 (2013).
70. Navarro, L. M. et al. Monitoring biodiversity change through effective global coordination. _Curr. Opin. Environ. Sustain._ 29, 158–169 (2017).
71. Silva del Pozo, M., Body, G., Rerig, G. & Basille, M. Guide on harmonising biodiversity monitoring protocols across scales. 60 (Biodiversa+, 2023). https://www.biodiversa.eu/wp-content/uploads/2023/10/Biodiversa_Best-practices_2023_v5_WEB.pdf
72. Brlík, V. et al. Long-term and large-scale multispecies dataset tracking population changes of common European breeding birds. _Sci. Data_ 8, 21 (2021).
73. Duchenne, F. et al. Controversy over the decline of arthropods: A matter of temporal baseline? _Peer Community J._ https://doi.org/10.24072/pcjournal.131 (2022).
74. Farr, M. T., Zylstra, E. R., Ries, L. & Zipkin, E. F. Overcoming data gaps using integrated models to estimate migratory species’ dynamics during cryptic periods of the annual cycle. _Methods Ecol. Evol._ 15, 413–426 (2024).
75. Martin, M. E. et al. An integrated spatial capture–recapture approach reveals the distribution of a cryptic carnivore in a protected area. _Ecosphere_ 14, e4634 (2023).
76. Twining, J. P. et al. Integrating presence-only and detection/non-detection data to estimate distributions and expected abundance of difficult-to-monitor species on a landscape-scale. _J. Appl. Ecol._ https://doi.org/10.1111/1365-2664.14633 (2024).
77. Meehan, T. D. et al. Integrating data types to estimate spatial patterns of avian migration across the Western Hemisphere. _Ecol. Appl._ 32, e2679 (2022).
78. Guillera-Arroita, G. & Lahoz-Monfort, J. J. Designing studies to detect differences in species occupancy: power analysis under imperfect detection. _Methods Ecol. Evol._ 3, 860–869 (2012).
79. Ogle, K. & Barber, J. J. Ensuring identifiability in hierarchical mixed effects Bayesian models. _Ecol. Appl._ 30, e02159 (2020).


Download references ACKNOWLEDGEMENTS EPOC and EPOC-ODF schemes are supervised by LPO (Ligue pour la Protection des Oiseaux), the representative of Birdlife in France. We thank all


participants and local coordinators from regional instances for contributing to data collection through faune-france.org, a collaborative online database. We would like to express our thanks


to Benoit Fontaine for his comments on prior versions of the manuscript, we also would like to thank Marc Kéry, Niggel Yoccoz and an anonymous reviewer for their comments and suggestions


that greatly improved this work. The IDS and HDS models were calibrated on the SACADO MCMeSU platform at Sorbonne Université in Paris – France. FUNDING Funding for this work was provided


through OFB (Office Français de la Biodiversité), LPO and ANRT (Association Nationale Recherche Technologie; CIFRE grant, number: 2021/0305). The French Ministry of the Environment and OFB


support the LPO through multi-year objectives agreements, in particular, to consolidate several EBV relating to birds, based on citizen science monitoring. AUTHOR INFORMATION AUTHORS AND


AFFILIATIONS * LPO-BirdLife France, Fonderies Royales, Rochefort Cedex, France Jean Nabias, Jérémy Dupuy & Laurent Couzi * CESCO, Muséum National d’Histoire Naturelle, CNRS,


Sorbonne-University, Paris, France Jean Nabias, Romain Lorrillière & Luc Barbaro * Centre de Recherches sur la Biologie des Populations d’Oiseaux (CRBPO), MNHN-CNRS-OFB, Paris, France


Romain Lorrillière * Dynafor, INRA-INPT, University of Toulouse, Auzeville, France Luc Barbaro Authors * Jean Nabias View author publications You can also search for this author inPubMed 


Google Scholar * Romain Lorrillière View author publications You can also search for this author inPubMed Google Scholar * Jérémy Dupuy View author publications You can also search for this


author inPubMed Google Scholar * Laurent Couzi View author publications You can also search for this author inPubMed Google Scholar * Luc Barbaro View author publications You can also search


for this author inPubMed Google Scholar CONTRIBUTIONS All authors contributed to the current work. Study conceptualization was led by JN, RL, LB. Data collection management was performed by


JD and LC. Analysis was carried out by JN and RL. Project administration was supervised by LB and LC. Writing was led by JN. Writing and reviews were conducted by JN and LB. CORRESPONDING


AUTHOR Correspondence to Jean Nabias. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains


neutral with regard to jurisdictional claims in published maps and institutional affiliations. SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION. RIGHTS AND PERMISSIONS OPEN ACCESS This


article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as


you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party


material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s


Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE Nabias, J., Lorrillière, R., Dupuy, J. _et


al._ Improving national-scale breeding bird surveys with integrated distance sampling. _Sci Rep_ 15, 18312 (2025). https://doi.org/10.1038/s41598-025-96787-w Download citation * Received: 27


June 2024 * Accepted: 28 March 2025 * Published: 26 May 2025 * DOI: https://doi.org/10.1038/s41598-025-96787-w SHARE THIS ARTICLE Anyone you share the following link with will be able to


read this content: Get shareable link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing


initiative KEYWORDS * Bird monitoring * Citizen science * Distance sampling * Data integration * Hierarchical modelling * Observation process