Seasonality and other risk factors for flea infestations in domestic dogs and cats

Abstract Fleas in the genus Ctenocephalides are the most clinically important parasitic arthropods of dogs and cats worldwide, yet the factors that might increase the risk of infestation in small animals remain unclear. Here we developed a supervised text mining approach to analyse key aspects of flea epidemiology using electronic health records from domestic cats and dogs seen at a sentinel network of 191 voluntary veterinary practices across Great Britain between March 2014 and July 2020. Our methods identified fleas as likely to have been present during 22,276 of 1,902,016 cat consultations (1.17%) and 12,168 of 4,844,850 dog consultations (0.25%). Multivariable logistic regression modelling found that animals originating from areas of least deprivation were associated with 50% reductions in odds of veterinary‐recorded flea infestation compared to the most deprived regions in England. Age of the animal was significantly associated with flea presentation in both cats and dogs, with cases peaking before animals reached 12 months. Cases were recorded throughout each study year, peaking between July and October, with fluctuations between years. Our findings can be used towards healthcare messaging for veterinary practitioners and owners.


INTRODUCTION
There are 2500 species and sub-species of flea (Siphonaptera) across 18 families and 220 genera worldwide (Lewis, 1998), but within the sphere of companion animal veterinary medicine just two, the cat flea Ctenocephalides felis (Bouche, 1835) (Siphonaptera: Pulicidae) and, to a lesser extent, the dog flea Ctenocephalides canis (Curtis, 1826) (Siphonaptera: Pulicidae), are of major health significance. Fleas are the most commonly seen ectoparasite in companion animal practice (Farkas et al., 2009), infesting homes and posing a considerable nuisance to owners. They are often associated with intense animal discomfort and flea allergy dermatitis and can act as vectors for several important zoonotic diseases (Florin et al., 2008; Lam & Yu, 2009; Pérez-Osorio et al., 2008).
Cat fleas cause severe irritation and, in susceptible sensitized hosts, allergic dermatitis, one of the most important dermatological conditions seen in small animal veterinary practices (Lam & Yu, 2009). Besides this, fleas bite people and can be vectors of zoonotic pathogens, such as Bartonella spp. and Rickettsia felis (Bai et al., 2017; Leulmi et al., 2014). Ctenocephalides felis readily completes its life cycle feeding on pets within homes or in peri-domestic settings where temperature and humidity conditions are suitable to support the immature flea stages in the environment (Cooper et al., 2020; Halos et al., 2014; Rust, 2017). This species does not appear to be particularly host-specific; it is suggested that populations may be maintained in wildlife (Clark et al., 2018), which may in part explain its persistence in domestic infestations. There is a growing discussion about the role of flea treatments with regard to their environmental impact, following the recent demonstration of environmentally damaging levels of imidacloprid and fipronil residues in British rivers (Perkins et al., 2021).
Domestic animal flea biology and control have been studied extensively over the past 20 years. A 2005 study across 31 UK veterinary practices found flea infestations in 21.09% of cats and 6.82% of dogs, with increased infestation where a cat was present within a household with other pets (Bond et al., 2007). The predominant flea species on both cats and dogs was C. felis, accounting for 98.83% in cats and 93.15% in dogs. A similar study in Hungary examining 13 veterinary practices found that 22.9% of cats and 14.1% of dogs were flea infested, and another study conducted in Italy across four veterinary practices reported 17.9% of dogs to be flea infested, also finding correlations between infestation and living with other dogs and cats (Farkas et al., 2009; Rinaldi et al., 2007). Recently, a national practice-level survey completed in the UK involving 326 premises examined 812 cats and 662 dogs between April and June 2018, of which 28.1% and 14.4% were infested with fleas, respectively; 90% of recovered fleas were C. felis (Abdullah et al., 2019).
A survey-based study showed statistically significant geographical variation, including a significant decline in prevalence from south to north, but none of the animal factors investigated (breed, sex, neutered status, or whether the pet had been abroad) showed any relationship with the underlying geographical distribution (Cooper et al., 2020). Veterinary practice-level surveys generate important baseline information towards our understanding of the epidemiology of flea-associated disease, but interpretation of data is limited by the relatively small samples taken within a short period of time.
Many variables that may lead to infestation and reinfestation of domestic pets remain undefined, in part because research aimed at exploring epidemiological risk factors often necessitates large-scale sampling and extensive work recruiting veterinarians, pets and owners. Veterinary electronic health records (EHRs) capture recorded events in near real time, providing an opportunity to explore conditions whose epidemiology is not entirely understood.
The aim of this study was to use EHR data collected over a six-year period from a sentinel network of veterinary practices across Great Britain (GB) to develop a text mining technique that identifies, at scale, flea infestations recorded by veterinary professionals, and to assess the association of season, geographical distribution, pet breed, age, sex and neutered status with recorded flea cases.

Data extraction and inclusion criteria
Veterinary EHRs were collected between March 2014 and July 2020 from a sentinel network of 191 volunteer veterinary practices across GB; a full description of the Small Animal Veterinary Surveillance Network (SAVSNET) has been presented elsewhere (Sánchez-Vizcaíno et al., 2017). Briefly, veterinary practices using practice management software previously made compatible with SAVSNET data exchange were recruited based on convenience. In participating practices, data is collected from each booked consultation (where an owner has made an appointment to see a veterinary surgeon or nurse). Owners attending participating practices are given the option to opt out at the time of their consultation, thereby excluding their data. For those that participate, data is collected on a consultation-by-consultation basis and can include information about the animal (e.g., species, breed, sex, neuter status, age, owner's postcode, insurance, and microchipping status), as well as a free-text clinical narrative, treatments dispensed, and the vaccination history. SAVSNET has ethical approval from the University of Liverpool Research Ethics Committee (RETH000964).
For the exploration of data used in this study, a case was first defined as an animal that presented with fleas or flea dirt at the time of consultation, as observed and recorded by the attending practitioner within the free-text clinical narrative, where no other conflicting diagnosis was made. Putative cases were provisionally identified by screening all clinical narratives with a Python regular expression (regex) designed to identify consultations in which fleas or flea dirt were recorded in a way that matched our case definition.
The regex was developed iteratively, each time using a new random sample of 10,000 consultations selected from the entire SAVSNET database to provide phrases and spelling variations that improved the accuracy of the regex in finding cases matching our case definition. For each iteration, up to 100 randomly selected identified records were manually read by one of the authors (SF) to identify new terms requiring inclusion or exclusion. Common negations included references only to flea treatments, general flea-related advice, owner diagnosis of fleas, or previous infestations, all of which were independent of the presence of fleas or flea dirt at the time of the consultation. This process was repeated until true positive cases consistently dominated these 100 random records and new negations were rare. The regex was then applied to the entire SAVSNET dataset; any identified EHRs were at this point regarded as true cases. This process is outlined in Figure 1, with the final regex accessible in the supplementary information.
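The screening logic described above can be illustrated with a minimal Python sketch. The patterns below are hypothetical stand-ins written for illustration only; they are not the final regex (which is given in the supplementary information), and the strategy of blanking out negated phrases before searching for a flea mention is an assumption about one way such a screen could be built.

```python
import re

# Hypothetical inclusion pattern: a mention of fleas or flea dirt,
# allowing one common misspelling. NOT the study's final regex.
FLEA_PATTERN = re.compile(r"\bfl(?:ea|ae)s?\b", re.IGNORECASE)

# Hypothetical negation patterns, loosely following the negation types
# named in the text: absence, treatment/advice, owner diagnosis, history.
NEGATION_PATTERNS = [
    re.compile(r"\bno\s+(?:evidence\s+of\s+|signs?\s+of\s+)?fl(?:ea|ae)s?\b",
               re.IGNORECASE),
    re.compile(r"\bfl(?:ea|ae)s?\s+(?:treatment|tx|control|prevention|advice|product)s?\b",
               re.IGNORECASE),
    re.compile(r"\b(?:o|owner)\s+(?:thinks|reports|suspects)\b[^.]*\bfl(?:ea|ae)s?\b",
               re.IGNORECASE),
    re.compile(r"\b(?:previous(?:ly)?|history\s+of|had)\b[^.]*\bfl(?:ea|ae)s?\b",
               re.IGNORECASE),
]


def is_putative_case(narrative: str) -> bool:
    """Return True if a flea mention remains after negated phrases are removed."""
    text = narrative
    for pattern in NEGATION_PATTERNS:
        # Blank out each negated phrase so it cannot trigger the inclusion pattern.
        text = pattern.sub(" ", text)
    return FLEA_PATTERN.search(text) is not None


if __name__ == "__main__":
    for note in ("Live fleas seen on coat, flea dirt ++",
                 "Advised re flea treatment, no fleas seen today"):
        print(is_putative_case(note), "-", note)
```

In an iterative workflow of the kind described, each round of manual reading would add new terms to one of the two pattern sets until few new negations appear.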

Animal data
To determine risk factors associated with recorded flea infestation, we conducted a retrospective case-control study. Control consultations (at a ratio of six controls to every case) were randomly chosen from consultations in the database that did not match the regex produced to capture flea cases. For each case and control, date of birth, date of consultation, owner's postcode, sex, neuter status, species, breed, and the clinical narrative were collected. Cases or controls missing any data (most frequently geographical or age data) were removed from the study. When an animal was seen multiple times, either within the case or control population or between them, duplicate consultation records were removed at random, leaving one consultation record per animal. Once the final dataset had been attained, 1000 consultations were randomly extracted and given to a separate author (DS) to manually verify the success of the extraction process. Whether a consultation was a case or control was hidden from the reviewer and only revealed once the manual reading was complete, so that statistical metrics could be determined.
To allow for investigation of breed as a risk factor, breeds were categorized according to genetic markers, as identified by vonHoldt et al. and Lipinski et al. in separate breed-based genetic papers (Lipinski et al., 2008; Vonholdt et al., 2010). This classification system organizes canine breeds into one of 11 genetic types (such as retriever, spaniel, and small terriers), with an additional unclassified category. We also included a 'crossbreed' breed type. For cats, breeds were grouped into four types based on region of origin (Asian, Mediterranean, West European and crossbreed), with an additional unclassified group.
Owner postcodes were used to calculate Nomenclature of Territorial Units for Statistics (NUTS) 1 codes. Data on the degree of urbanization for each postcode were also obtained using a 10-fold score for England and Wales and a 7-fold score for Scotland (Bibby, 2013; Scottish Government, 2016).
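The sampling and de-duplication steps above can be sketched as follows. This is an illustrative sketch only: the record field names (`animal_id`, `postcode`, and so on), the `is_case` callback and the order in which the steps are applied (completeness filter first, then control sampling, then one record per animal) are assumptions, not a description of the pipeline actually used.

```python
import random

# Hypothetical field names for the data items listed in the text.
REQUIRED_FIELDS = ("animal_id", "date_of_birth", "postcode",
                   "sex", "neutered", "species", "breed")


def build_case_control(consultations, is_case, controls_per_case=6, seed=0):
    """Return a list of (record, is_case_flag) pairs, one per animal."""
    rng = random.Random(seed)

    # Drop any consultation with a missing data item.
    complete = [c for c in consultations
                if all(c.get(field) is not None for field in REQUIRED_FIELDS)]

    cases = [c for c in complete if is_case(c)]
    pool = [c for c in complete if not is_case(c)]

    # Randomly sample six controls per case (or as many as are available).
    n_controls = min(len(pool), controls_per_case * len(cases))
    controls = rng.sample(pool, n_controls)

    # Keep one randomly chosen consultation per animal, regardless of
    # whether its records sit in the case or the control group.
    combined = [(c, True) for c in cases] + [(c, False) for c in controls]
    rng.shuffle(combined)
    one_per_animal = {}
    for record, flag in combined:
        one_per_animal.setdefault(record["animal_id"], (record, flag))
    return list(one_per_animal.values())
```

Fixing the seed makes the random selection reproducible, which is convenient when the same extraction has to be re-run for verification.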

Statistical analysis
Initial quantitative analysis was used to identify broader features of the dataset; for example, Kendall's τ coefficient was used to test for correlation between the number of cat cases and the number of dog cases per 100 consultations in each surveyed practice. Univariate mixed effect logistic regression was conducted using case-control status as a binary dependent variable. Every explanatory variable (Tables 1 and 2) was explored, with a likelihood ratio test (LRT chi-squared test) used to assess fit compared to a null model, using practice as a random effect.
Continuous explanatory variables were assessed for the presence (or absence) of a clear linear relationship with the dependent variable. Those displaying non-linear relationships were fitted with polynomials, up to cubic fits, depending on which provided the best fit via an LRT, and visualized with sjPlot (Lüdecke, 2019). Explanatory variables with an LRT of p ≤ 0.2 compared to the null model were included in an initial multivariable logistic regression model. A backwards selection process was used to produce a model fit with the lowest Akaike information criterion (AIC) possible.
Multicollinearity in the final multivariable model, assessed via the variance inflation factor (VIF), was not found to be present. Due to challenges in comparing deprivation across England, Wales, and Scotland, a separate multivariable model was produced for England only, applying the same methodology with the addition of Index of Multiple Deprivation (IMD) decile. All analyses were carried out using R version 4.0.2 (R Core Team, 2020).
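As an illustration of the correlation measure mentioned above, the following Python sketch computes Kendall's τ from paired practice-level rates by counting concordant and discordant pairs. It assumes the tie-corrected τ-b variant; the text does not state which variant was used, and the practice-level rates in the usage example are invented.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied on both variables: ignored
            elif dx == 0:
                ties_x += 1         # tied on x only
            elif dy == 0:
                ties_y += 1         # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denominator = sqrt((concordant + discordant + ties_x)
                       * (concordant + discordant + ties_y))
    if denominator == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Invented cases per 100 consultations for five practices.
    cat_rate = [0.8, 1.1, 1.5, 2.0, 0.4]
    dog_rate = [0.2, 0.3, 0.25, 0.5, 0.1]
    print(kendall_tau_b(cat_rate, dog_rate))
```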
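The two model comparison quantities used above, the likelihood ratio test and the AIC, can be written out in a few lines. The sketch below is in Python for illustration (the analyses themselves were run in R), the log-likelihood values in the usage example are invented, and the chi-squared tail probability is implemented only for 1 or 2 degrees of freedom, where simple closed forms exist.

```python
from math import erfc, exp, sqrt


def lrt_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_upper_tail(statistic, df):
    """P(X > statistic) for a chi-squared variable with df = 1 or 2."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return erfc(sqrt(statistic / 2.0))
    if df == 2:
        return exp(-statistic / 2.0)
    raise ValueError("this sketch only supports df = 1 or 2")


def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    # Invented log-likelihoods: null model vs. model with one extra term.
    stat = lrt_statistic(loglik_null=-1000.0, loglik_full=-998.0)
    p_value = chi2_upper_tail(stat, df=1)
    # Screening rule from the text: carry the variable forward if p <= 0.2.
    print(stat, p_value, p_value <= 0.2)
    print(aic(-1000.0, 2), aic(-998.0, 3))
```

Under backwards selection, a term would be dropped whenever removing it lowers the AIC, and the process stops when no removal improves it.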

Data management
When applied to the full SAVSNET database of 6,914,563 consultations, the final regex identified 51,793 veterinary professional-recorded flea cases (Figure 1). To verify that the case definition (fleas or flea dirt present, as verified by the attending veterinary practitioner) was satisfied, 1000 random records were read manually, giving a positive predictive value of 98.6%. A different sample of 1000 records was passed to a separate author for manual reading to verify the success of the extraction method and to ensure that the subsequent results are applicable. Of the 1000 records reviewed, the regex correctly identified 98 true positives and 896 true negatives, with three false positives and three false negatives. This gives the regular expression a sensitivity of 0.97 and a specificity of 0.99, with precision, recall and F1 also at 0.97. Cohen's Kappa was 0.97, suggesting 'almost perfect' agreement (Landis & Koch, 1977). The combined results of both reviews were deemed sufficiently accurate for all retrieved cases to be included as probable cases in further descriptive analysis and modelling. Cases were excluded if any data entry point was missing, such as age (N = 3253), location (N = 6964), breed/species (N = 5137) or sex (N = 1191). In the event that an animal was seen on multiple occasions (regardless of its presence in the case or control cohorts), a record was selected at random and all others were excluded (N = 593). Controls were scrutinized to the same degree, and only used if the data was complete.
FIGURE 1 Data mining pathway used to extract flea consultations from the SAVSNET electronic health record dataset, with inclusion of data cleaning stages before splitting on species.
TABLE 1 Risk of veterinary-recorded flea infestation in dogs (N = 12,168) from multivariable logistic regression model throughout Great Britain.   Tables S1 and S2.
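The validation metrics quoted above follow directly from the four cell counts of the blinded review. The short Python sketch below recomputes them from those counts using the standard formulas; the function name is illustrative and the code is not taken from the study.

```python
def validation_metrics(tp, tn, fp, fn):
    """Standard 2x2 agreement metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Counts reported for the 1000 blinded records.
    for name, value in validation_metrics(tp=98, tn=896, fp=3, fn=3).items():
        print(f"{name}: {value:.4f}")
```

With these counts, sensitivity, precision, F1 and kappa all round to 0.97 at two decimal places, and specificity evaluates to roughly 0.997.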
Recorded flea case prevalence in individual practices varied considerably in both cats and dogs; a positive correlation between practice-level recorded prevalence in dogs and cats was found, suggesting that practices that more commonly recorded flea presence in cats were also likely to record flea presence in dogs (Figure 3; τ = 0.52, z = 11.97, p < 0.001).

Risk factor modelling
Multivariable risk factor analysis was performed separately on cats and dogs (Tables 1 and 2; Figure 4).

Spatial and temporal data
When NUTS1 regions were considered within the multivariable model, there was a general trend of reduced recorded cases in more northern NUTS1 regions (Figure 5). For cats, when compared to UKC  (Tables S3 and S4).

Other species
Rabbits (N = 210) and other species (N = 34) consisting of 23 ferrets, five Guinea pigs, two hamsters, two rats, one hedgehog and one cockatiel also were identified as being associated with recorded flea

DISCUSSION
Here we have capitalized on the availability of EHRs by analysing data from a six-year period to give a novel insight into the phenology of fleas associated with companion animals seen at veterinary practices.
We believe this provides a pragmatic solution to monitoring flea epidemiology at scale; however, it should be noted that SAVSNET recruits practices based on a convenience sample and as such may not necessarily be representative of the GB pet population as a whole.
This might have greatest bearing on risk factors that may be associated with the decision to present the animal to a veterinary practice in the first place, such as IMD.
This considered, consistent with previous practice-level studies, we have shown that the prevalence of veterinary professional-recorded flea infestations in cats was approximately twice that of dogs. Although flea species data was not collected, we can speculate that similarities within this study suggest that the cat flea, The prevalence of infestation reported here for both cats and dogs is much lower than that obtained when animals are purposefully scrutinized for evidence of fleas using a standardized body screening/adult flea sampling protocol in practices recruited for that purpose (Abdullah et al., 2019; Bond et al., 2007). Indeed, these previous UK-based studies showed that 21.09%-28.00% of cats and 6.82%-14.00% of dogs carried fleas. The prevalence of infestation in this study is therefore not directly comparable due to differences in assessment methodologies; however, our findings demonstrate the utility of continual flea monitoring and suggest that purposeful examination for fleas in UK veterinary practices rarely occurs.
The effect of temperature and humidity on the cat flea life cycle is well documented (Rothschild, 1975), but how seasonal variation in these parameters affects current phenology in the natural setting is poorly described, owing to the challenges of studying populations longitudinally. Where they are performed, such longitudinal studies are often geographically limited and/or carried out over short time periods.
Exploiting the continual annual availability of EHR data, we have shown that flea infestation seasonality is similar for both cats and dogs, and that the greatest number of cases are recorded between July and October in dogs and between July and September in cats, consistent with a previous study in Germany (Beck et al., 2006). Studies carried out on hedgehogs and dogs within Dublin recorded increases in fleas during the summer period of June to August, before dropping off through winter (Baker & Mulcahy, 1986). Interestingly, the peak of  (Kunkle et al., 2014). This observation requires further research and monitoring in future years since understanding such patterns and how they are affected by prophylactic treatments will help inform future therapeutic interventions (Dryden & Rust, 1994). It should be noted that Consistent with the only other previous study on geographical distribution in the UK, cases were seen to fluctuate across the country, with an overall trend of decreasing numbers as latitude increased, such northerly latitudes being generally associated with cooler temperatures (Cooper et al., 2020). However, flea cases were recorded in every month and in every year, regardless of region; this may be explained by flea survival in the environment during low temperatures, with induced quiescence in colder weather but periods of activity during warmer spells. Continuous development of the immature flea stages in warm houses during colder weather is also known to occur (Carlotti & Jacobs, 2000).
There was an observed association between breed and increased risk of fleas, with small terriers and toy dogs being at greater risk  (Reeves et al., 2011; Roul et al., 2011). In addition, older dogs visiting the veterinary practice may be more likely to be attending for a pre-existing medical condition, leading the attending veterinary professional to focus on this issue in preference to looking for, or recording, fleas.
Interestingly, a study of the flea Xenopsylla ramesis (Rothschild, 1904) (Siphonaptera: Pulicidae) feeding on the rodent Meriones crassus (Sundevall, 1842) (Rodentia: Muridae) showed that the size of a flea blood meal was largest when fleas fed on juveniles and, by extension, that lifespan was longest in these fleas, possibly alluding to a longer persistence on and preference for juvenile animals (Roul et al., 2011). Although the models constructed here broadly capture age-based variability in flea infestation risk, rapid variation in case probability was noted, particularly between ages 0-1 and 1-2 years in both cats and dogs. For these ages our modelling approach may have under- and overestimated case probability, respectively. In future work, we aim to develop methods by which this interesting finding may be modelled more precisely.
In both cats and dogs, neutered animals were associated with reduced risk compared to those that were non-neutered (entire).
Although biological and behavioural factors might explain this finding, owners residing within more deprived areas have previously been shown to be less engaged with preventive veterinary care (Sánchez-Vizcaíno et al., 2018). Thus our finding here might be more suggestive of differences in owner engagement with preventive veterinary care (i.e., the decision to provide antiparasitic treatment less frequently to their animal) than there being a direct biological association between neuter status and flea infestation risk. Interestingly, we also observed that animals whose owners originated from more deprived areas of England were also associated with increased odds of recorded flea infestation. This does suggest that the ability to pay for flea control is having an impact on the odds of recorded flea infestations. Although flea treatment is readily and comparatively cheaply available in most supermarkets, those generally recommended by veterinary professionals as being more effective are only available via a prescription, and are likely to be associated with greater costs for the owner. It is possible that we are observing this inequity in our findings here, though we would like to remind the reader of the potential limitations associated with the convenience sampling used in our study. This considered, our findings suggest a potential avenue for targeted healthcare measures towards higher deprivation areas as a means to reduce flea prevalence, whether this be in reducing flea treatment costs or increasing availability of information.
Data from EHRs provides a rich source of practical information but does come with certain limitations. Although significant efforts were made to capture as many cases as possible here, it is inevitable that some were missed, either because our regular expression methodology did not identify the specific expression used in participating veterinary practices, or simply because the presence of fleas or flea dirt seen during a given consultation was not recorded. In addition, since our regex is not 100% accurate, a small proportion of false positive cases are included in the analysis. Another limitation is that veterinary practitioners rarely attempt to identify the infesting flea species. A further potential criticism is that the data is restricted to animals presenting at a SAVSNET participating practice, where recruitment is largely based on convenience, such that the findings of this study cannot be considered representative of the entire population of vet-visiting companion animals in GB. National coverage of SAVSNET data is variable, such that the accuracy of estimates will vary; as SAVSNET grows, it is hoped that some of these coverage issues will be addressed.
To conclude, this study has used an extensive novel dataset and text mining methodology to analyse large volumes of EHR data assimilated over a six-year period. We described the seasonality, geographical distribution and risk factors associated with recorded flea cases across GB. These newly identified factors, such as variation in breed type, age and neuter status, could form the basis of more targeted health messages for veterinary practitioners and companion animal owners, allowing prophylactic and therapeutic treatments to be used on those animals most at risk, and at the right time.
This might ultimately reduce empirical flea treatment use, which has recently been shown to contaminate rivers, with considerable environmental impacts on aquatic life (Perkins et al., 2021). The method we describe here can provide a real-time, cost-effective contribution to monitoring strategy as a foundation for ongoing flea surveillance in small animals.

ACKNOWLEDGMENTS
We are grateful for the support and major funding from BBSRC (BB/N019547/1) and BSAVA. We wish to thank data providers in VetSolutions, Teleos, CVS, and other practitioners, without whose support and participation this research would not be possible. Finally, we are especially grateful for the help and support provided by SAVSNET team members Bethaney Brant, Susan Bolan and Steven Smyth.

DATA AVAILABILITY STATEMENT
Data is available by reasonable request to SF.