# Mapping brucellosis risk in Kenya and its implications for control strategies in sub-Saharan Africa

### Sample size determination

We conducted this national cross-sectional survey in Kenya between November 2020 and August 2021. The study covered all the five agro-ecological zones in Kenya, including agro-alpine, high and medium potential, semi-arid, arid, and very arid zones. Since the study aimed to estimate the *Brucella* seroprevalence at the national level, we estimated the sample size using the standard formula for determining a population proportion^{15}. In the absence of reference data, we assumed an a priori prevalence of 50%. Based on this assumption, the initial sample size generated was 384 animals. However, two adjustments were made to ensure robustness of the study population size. The first adjustment accounted for potential clustering of the outcome at the herd level, a characteristic expected of *Brucella* based on previous studies^{10,16,17,18}. We estimated a design effect of 8.2 based on an assumed intra-cluster correlation of 0.3 and a maximum of 25 cattle sampled per herd to meet the expectations of the Central Limit Theorem. This adjustment increased the sample size estimate from 384 to 3,150 animals. The second adjustment aimed to amplify the sample size to account for potential confounding variables. In this step, we assumed that the prediction model for *Brucella* would have at least two continuous predictors, each with a significant correlation of 0.5 with the outcome. Meeting the preceding assumptions required sampling 6,700 animals from 268 herds or sampling points.

### Sampling

A two-stage sampling technique was employed. The first stage involved selecting households across the country within the five Agro-Ecological Zones (AEZs) using random geographic coordinates (RGC) distributed proportionately to the geographical coverage and livestock population in each AEZ (Fig. 1). In addition, the second stage selected 25 healthy cattle per herd for sampling. After the RGCs were transferred to a handheld GPS to locate their physical locations within Kenya, the closest herds within a 5 km radius of each geolocation were identified for sampling. In the second stage, 25 healthy cattle were selected per herd to be included in the study. For cattle herds with more than 25, the team selected 25 animals randomly. If a chosen herd had fewer than 25 animals, additional nearby herds were included until the desired sample size of 25 animals per geolocation was achieved.

The team then administered a structured questionnaire to the household heads at each sampling point to capture herd-level data, including herd size, herd composition, and the presence or absence of reproductive syndromes within the past two years. For each randomly selected cattle, the questionnaire also captured the demographic characteristics such as age, sex and history of reproduction syndromes before the collection of blood samples.

### Blood sample collection and analysis

Animals were restrained appropriately and about 6 ml of blood was collected from the jugular vein using a barcoded plain vacutainer tube. The blood samples were initially kept for 15 min to allow clotting. They were then transported in a cool box to the field laboratory where serum was extracted after centrifugation at 1300 g for 10 min. The serum samples were aliquoted into barcoded cryovials and kept in a motorized freezer (− 20 ◦C), which was also used to transport the samples to a biosecurity level 2 laboratory at the International Livestock Research Institute (ILRI) in Nairobi. To detect circulating antibodies against *Brucella* spp., samples were analyzed in duplicates using an indirect enzyme-linked immunosorbent assay (ID screen Brucellosis serum Indirect ELISA Multispecies from IDvet innovative diagnostics, France), as per the manufacturer instructions, to detect antibodies against *Brucella abortus*, *Brucella melitensis* and *Brucella suis*. Testing and interpretation of results were done according to the guidelines provided by the manufacturer, with the mean optical density (OD) being used to determine a positive or negative status. The samples were analyzed in a biosecurity level 2 laboratory, as recommended for *Brucella*^{19,20,21}, while following the set biosafety and biosecurity measures for biosafety level 2 laboratory.

### Descriptive and risk factor analysis

We used the R statistical software (version 4.2.3)^{22} for data cleaning and analysis. Descriptive summaries were obtained via cross-classification tables using the *CrossTable* function in *gmodels* package^{23}, while risk factor analyses were implemented using Bayesian spatial logistic regression with a logit link and a binomial error distribution fitted using the integrated nested Laplace approximation (INLA) method^{24}. The model uses approximate Bayesian inference to estimate the posterior distribution of model parameters. The model was fitted to the data using the R-INLA function in the R-INLA package^{25}. INLA was preferred for these analyses because of its computational efficiency and accurate approximation of parameters of a wide class of generalized linear mixed models (GLMM), generalized additive models (GAM), spatial and spatio-temporal models using stochastic partial differential equations (SPDE)^{26}. The data were initially analysed using a GLMM, with herd ID as the random effect, but this model was later replaced by a GLMM model with a spatial random effect accounting for spatial autocorrelation.

The spatial logistic regression model is given in Eq. 1. The logit link function \({\eta }_{i}=log\left({\pi }_{i}/\left(1-{\pi }_{i}\right)\right)\) links the probability of animal-level *Brucella* seropositivity \({\pi }_{i}\) to the linear predictor.

$${\eta }_{i}={\beta }_{0}+\sum_{j=1}^{m}{\beta }_{j}{x}_{ji}+f({z}_{i})$$

In the linear predictor, \(\beta_{0}\) is the intercept, *m* is the number of covariates, \({\beta }_{j}\) is the regression slope for covariate \({x}_{j}\) and \(f(z_{i} )\) is a generic function accounting for spatial random effects. For the GLMM models \(f(z_{i} )\) was assumed to be structured according to independent and identically distributed Gaussian random effects (iid) represented by the herd ID. For the spatial model \(f(z_{i} )\) was approximated by an SPDE model.

Fitting the spatial Bayesian models and computing their posterior distributions and posterior predictive distributions requires computationally intensive numerical quadrature methods such as Bayesian quadrature or probabilistic numerics. The Integrated Nested Laplace Approximation (INLA) method is particularly well-suited for this purpose because it performs approximate Bayesian inference by directly calculating the posterior densities for Bayesian hierarchical models. A noteworthy and distinct advantage of R-INLA is its high computational efficiency and reliable approximation relative to the widely used Markov Chain Monte Carlo (MCMC) simulation method, even for spatial data with many observations. This enables rapidly fitting and exploring different contending models, gaining deep insights into the data, performing efficient cross-validation and improving transparency through quick code checking and running models by peers^{25}. Consequently, several recent studies have applied the SPDE approach implemented in R-INLA, including to neglected tropical diseases, such as visceral leishmaniasis in Ethiopia, onchocerciasis in Africa and Yemen and Ebola in the DRC^{26,27,28}.

### Univariable and multivariable analyses with the GLMM model

The univariable and multivariable models were fitted to the animal- and herd-level data in turns. The predictors considered for the analyses using the animal-level data included an animal’s age, sex and history of abortion, birth of weak calves, or retained placenta. The models that used herd-level data considered the effects of herd size, ecology and occurrence of abortion, birth of weak calves and retained placenta in the index herd. Parameter estimates for variables whose 95% credible intervals excluded zero were considered significant.

We also checked the 95% credible intervals for the estimated variance component to establish if it excluded zero. We built the multivariate model as follows. First, we ranked the univariate models using the Watanabe Information Criterion (WAIC). Second, starting with the best supported univariate model, we added the covariate from the second best supported model and the first-order interaction terms between the variables in the joint model. The second covariate was retained in the model if the WAIC for the joint model was smaller than for the best univariate model. Otherwise, the second covariate was dropeped from the joint model. Next, the covariate from the third best supported model was similarly added to the joint model and the process continued until all the covariates had been considered. Interaction terms higher than the first-order were omitted from the multivatiate model to limit or control the number of estimated parameters relative to the sample size or number of spatial sampling points.

### Spatial analyses

Because it used multiple spatial datasets, we commenced the spatial analysis by developing a causal web diagram to guide the selection of predictors for modelling (Figure S1). Previous studies indicate that *Brucella* is endemic in pastoral and agro-pastoral areas where livestock are raised in large herds, graze communally, and are likely to interact with wildlife^{9}. Climatic and environmental factors that influence types of livestock production systems and relevant socioeconomic practices in an area, were therefore considered as antecedent predictors. At the host level, individual characteristics such as species, age, physiological status, and production levels may also influence exposure patterns (Figure S1).

The spatial model was developed through five successive stages. The first involved the development of a mesh over the spatial domain and the second involved the specification of a projector matrix to connect the observed data with the nodes of the mesh (Supplementary text S2). The mesh was constructed using the Kenya shape file to define the location of the domain. We downloaded the shapefile from https://www.diva-gis.org/gdata. The third step involved defining the SPDE model; non-informative priors were specified in this case. The fourth and the fifth stages involved setting up an index for the spatial field and constructing a data stack that was required for fitting the model, respectively.

A total of 49 predictor variables were tested for their association with *Brucella* seropositivity. These included: (a) the host characteristics—sex and age—recorded during sampling; (b) spatial distribution of cattle, camels, sheep and goats based on census data from the Department of Veterinary Services of Kenya, and predictions from the gridded livestock of the world project^{27}; (c) environmental variables such as the aridity index, digital elevation indices, slope of the land surface, soil types, and (d) bioclimatic variables. A list of these datasets with a description of their resolutions and sources is provided in Table S3. A variable was considered significant if its 95% credible intervals excluded zero.

### Generating predictions from fitted models

The final model with ecological variables only was used to predict the probability of *Brucella* seroprevalence across the country. A full model with the animal-level factors could not be used for this purpose because there were no data on these characteristics for unsampled locations. We first generated a 5 km grid and centroids from the grid used to extract the predictor variables from relevant raster files to generate this prediction. We then plotted a map of the expected *Brucella* seroprevalence derived from the model predictions. Finally, we extracted and plotted the posterior marginal densities for the range and variance of the random effects. Estimates of these two parameters are relevant for designing future surveys and risk-based surveillance for *Brucella*.

### Ethics approval and consent to participate

The study obtained research approval from the National Commission for Science, Technology, and Innovation (NACOSTI) under reference number NACOSTI REF: 218346. The ILRI’s Institutional Animal Care and Use Committee (IACUC) also reviewed our protocols and granted an approval REF: ILRI-IACUC2021-18. The guidelines and regulations established by NACOSTI and IACUC were strictly adhered to during implementation of this study, while only including cattle from herds where informed consent was issued by the household head.