Introduction
The COVID-19 pandemic represents a global threat and poses challenges for health, economy, and well-being. This highlights the importance of analysing robust and timely data to support decisions regarding the implementation of public health measures at the national, regional, and municipal levels. It has thus been recommended that epidemiological studies should consider multi-level investigations of reliable and representative environmental, societal, and population determinants (1). Behavioural, socioeconomic, and community factors, control measures, the effects of population mixing, and the use of appropriate spatial and temporal resolution and time frames all need to be carefully investigated (1).
The evidence from previous pandemics indicates that disadvantaged groups have been disproportionately affected (2, 3). The determinants of COVID-19 transmission are still uncertain, but previous studies suggest that population density, overcrowding, mobility, and socio-economic status are potentially relevant (2, 4, 5). These seem to vary, however, with context-specific factors and over time. A small number of studies have attempted to identify municipality-level determinants of transmission using methods such as multiple linear regression (MLR) and neural network analysis (6, 7). One study mapped county-level (i.e., municipality-level) determinants of COVID-19 transmission in nursing homes in the USA and found that per-capita income, average household size, population density, and minority composition were significant predictors of cases (6).
The social determinants of health are interrelated and likely to play a major role in the COVID-19 pandemic. Education level influences occupation, which determines economic stability and income level, which can, in turn, affect the type of health care received and health-seeking behaviour. At the same time, education might influence the neighbourhood in which an individual lives, i.e., their social and community context (7). This intricate network makes causality difficult to study, and any causal interpretation must be made with caution. Nevertheless, it is relevant to assess the abovementioned factors and to generate hypotheses on how they influence the spread of COVID-19.
We thus aimed to identify the municipality-level determinants of COVID-19 cases in Portugal at 4 moments of the first epidemic wave.
Materials and methods
We conducted an ecological study to analyse the association between 65 municipality-level variables and the number of daily cases per municipality at 4 pre-defined moments. The variables came from official statistics and covered 5 dimensions, i.e., population and settlement, disease, economy, social context, and mobility; case numbers were taken from the official surveillance system. The dates for this analysis were selected according to the public health measures in place, i.e., the date of publication of guidelines/legal documents and the maximum number of cases (determined by the 3-day moving average) occurring in the following 2 weeks. Four periods were selected, starting on March 23 (lockdown phase; the 1st day with information available per municipality), May 28 (the 1st phase of the gradual resumption of activities), June 8 (to evaluate the effects of the 2nd phase of the gradual resumption of activities), and June 27 (the gradual resumption of activities after the 3rd phase).
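The date-selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the trailing (rather than centred) 3-day window, the inclusive 14-day horizon, and the daily counts are all hypothetical and are not taken from the study.

```python
from datetime import date, timedelta

def moving_average(values, window=3):
    """Trailing moving average; early entries average the values available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def peak_day(start, daily_cases, reference, horizon_days=14, window=3):
    """Return (date, smoothed value) of the highest moving average found from
    `reference` up to `horizon_days` later (assumed inclusive window)."""
    smoothed = moving_average(daily_cases, window)
    first = (reference - start).days
    last = min(first + horizon_days, len(daily_cases) - 1)
    best = max(range(first, last + 1), key=lambda i: smoothed[i])
    return start + timedelta(days=best), smoothed[best]

# Hypothetical daily counts for one municipality, starting on a made-up date.
start = date(2020, 5, 18)
cases = [3, 4, 6, 5, 9, 12, 10, 8, 7, 5, 4, 4, 3, 2, 2]
peak_date, peak_value = peak_day(start, cases, reference=start)
print(peak_date, round(peak_value, 2))
```

Smoothing before taking the maximum keeps a single-day reporting spike from deciding the moment of analysis on its own.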
For each moment of analysis, we used a linear model, i.e., multiple linear regression (MLR), and a non-linear model, i.e., artificial neural networks (ANN). MLR was used to estimate the strength of association between each independent variable and the outcome (number of cases). The variables in the final MLR model were selected by backward elimination until all remaining variables had a p value < 0.05. Results were summarized for each moment, showing the variables included in the final models.
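Backward elimination can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the variable names and toy data are hypothetical, and the p values use a normal approximation to the usual t-test (an assumption made here to avoid external libraries; it is close to the t-test only when there are many more observations than predictors).

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_p_values(X, y):
    """Least-squares fit with intercept; two-sided p value for each slope."""
    A = [[1.0] + list(row) for row in X]
    n, k = len(A), len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(A, y))
    sigma2 = rss / (n - k)
    p_values = []
    for j in range(1, k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se = math.sqrt(sigma2 * solve(xtx, unit)[j])  # j-th diagonal of (X'X)^-1
        z = abs(beta[j]) / se
        p_values.append(math.erfc(z / math.sqrt(2)))  # normal approximation (assumption)
    return p_values

def backward_elimination(X, y, names, alpha=0.05):
    """Drop the least significant variable until all p values are below alpha."""
    keep = list(range(len(names)))
    while keep:
        p = ols_p_values([[row[j] for j in keep] for row in X], y)
        worst = max(range(len(keep)), key=lambda i: p[i])
        if p[worst] < alpha:
            break
        del keep[worst]
    return [names[j] for j in keep]

# Toy data (hypothetical): two informative predictors and one irrelevant one.
def sign(i, bit):
    return 1.0 if i & bit else -1.0

names = ["population_density", "share_commuting_out", "irrelevant_variable"]
X = [[sign(i, 1), sign(i, 2), sign(i, 4)] for i in range(16)]
y = [10 + 2 * r[0] + 3 * r[1] + 0.5 * sign(i, 8) for i, r in enumerate(X)]
selected = backward_elimination(X, y, names)
print(selected)
```

Refitting the model after each removal matters, because the p values of the remaining variables change once a correlated variable is dropped.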
ANN constitute a non-linear parametric model, with the advantage of implicitly detecting non-linear relationships between the outcome and the explanatory variables. ANN have been used to identify risk factors for different health outcomes, including reported incidences of COVID-19 at the county/municipality level (8-10). Because ANN do not require the variables to be independent or normally distributed, they are attractive for the analysis of epidemiological data. In addition, neural processing can extract relationships directly from input variables in high-dimensional spaces, which makes it a valuable tool in complex pattern recognition problems. The non-linear approximation implemented is depicted in Figure 1. For details, please refer to the project website (11).
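As a rough illustration of how an ANN picks up a non-linear relationship without it being specified in advance, the following Python sketch trains a tiny network on a quadratic toy relationship. Everything in it is an assumption made for illustration (one hidden layer of 4 tanh units, full-batch gradient descent, fixed starting weights, synthetic data); it does not reproduce the architecture shown in Figure 1.

```python
import math

def predict(x, w1, b1, w2, b2):
    """One hidden tanh layer followed by a linear output unit."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return hidden, sum(v * h for v, h in zip(w2, hidden)) + b2

def mse(data, params):
    """Mean squared error of the network over all (x, y) pairs."""
    return sum((predict(x, *params)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, params, lr=0.05, epochs=2000):
    """Full-batch gradient descent on the mean squared error."""
    w1, b1, w2, b2 = list(params[0]), list(params[1]), list(params[2]), params[3]
    n, k = len(data), len(w1)
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * k, [0.0] * k, [0.0] * k, 0.0
        for x, y in data:
            hidden, out = predict(x, w1, b1, w2, b2)
            err = 2.0 * (out - y) / n                   # d(MSE)/d(output)
            g_b2 += err
            for j, h in enumerate(hidden):
                g_w2[j] += err * h
                back = err * w2[j] * (1.0 - h * h)      # chain rule through tanh
                g_w1[j] += back * x
                g_b1[j] += back
        w1 = [w - lr * g for w, g in zip(w1, g_w1)]
        b1 = [b - lr * g for b, g in zip(b1, g_b1)]
        w2 = [w - lr * g for w, g in zip(w2, g_w2)]
        b2 -= lr * g_b2
    return w1, b1, w2, b2

# Synthetic data (hypothetical): the outcome depends on the square of the input.
data = [(x, x * x) for x in (i / 10 - 1 for i in range(21))]
start = ([0.5, -0.4, 0.3, -0.2], [0.1, 0.2, -0.1, -0.2], [0.1, -0.1, 0.1, -0.1], 0.0)
initial_loss = mse(data, start)
final_loss = mse(data, train(data, start))
print(initial_loss, final_loss)
```

A straight-line fit cannot describe this symmetric relationship beyond its mean, whereas the hidden units can combine to approximate the curve; that is the sense in which non-linear relationships are detected implicitly.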
Results
For MLR, some of the identified variables (Table 1) were: resident population and population density, exports, overnight stays in touristic facilities, the location quotient of employment in accommodation, catering and similar activities, education, restaurants and lodging, some industries and building construction, the share of the population working outside the municipality, the net migration rate, income, and renting. For ANN, some of the identified variables (Table 2) were: population density and resident population, urbanization, students in higher education, income, exports, social housing buildings, production services employment, and the share of the population working outside their municipality of residence. Some factors were identified by both methods at different epidemic moments, whereas others emerged only at specific moments.
Discussion
We attempted to identify municipality-level determinants of COVID-19 transmission using complementary approaches. The variables associated with the number of reported cases changed over time, emphasizing the dynamic nature of this communicable disease.
Initially, the more affected areas were characterized by international connections through tourism or exports (in both MLR and ANN) and by the socio-economic conditions of the population (more evident in ANN). Later, during the lifting of the lockdown, the epidemic surged in suburban areas with lower incomes and a higher number of immigrants, thus emphasizing the role of the socio-economic and cultural determinants of transmission (e.g., crowded housing conditions and a high concentration of employment in specific economic sectors, namely building construction, beverages, and storage). Finally, at moment 4, higher-education students, 1st-cycle (of basic education) students, and urbanization became relevant. Population density and the share of people working outside their municipality of residence were identified as factors at all 4 moments and by both methods.
It has been stated that responding to COVID-19 requires continuous monitoring of environmental and societal determinants to implement adequate prevention strategies (1). Only a few studies have attempted to relate transmission levels to community-level determinants (6, 7, 12). One study found that per-capita income, average household size, population density, and minority population composition were significant predictors of COVID-19 cases in nursing homes (6). Another identified age, disability, language, race, occupation, and urban status as predictors (12). Areas with more deprived populations and greater social vulnerability have been reported to have worse outcomes in terms of COVID-19 transmission (13), including at the county level (14). Reports of deaths disproportionately affecting specific groups, e.g., those with a non-white ethnic background, have also been published (15). Some reports are calling COVID-19 a "syndemic" due to the concurrence of social, economic, and health vulnerabilities and the exponential increase of the pandemic (3, 16). The European Centre for Disease Prevention and Control (ECDC) also identified clusters and outbreaks linked to occupational and economic activities in health care, food packaging and processing, factories/manufacturing, building and construction, and educational facilities (17). Our findings are in line with these other studies.
This is a preliminary approach to the study of municipality-level determinants in Portugal and some study limitations need to be acknowledged. First, the ecological design limited the ability to determine causal relationships (18). Second, the definition of the outcome as the daily number of cases might not have fully captured the spread of the disease; alternative definitions, e.g., changes in cases over time, could be considered in future analyses. Third, the number of COVID-19 cases identified is also influenced by surveillance system sensitivity and testing strategies (19, 20). Accounting for these was not feasible but could be investigated in future studies, to ensure comparability over time. Finally, the definition of initially selected variables might have been too broad, as these were official statistics readily available for analysis.
There is still considerable uncertainty regarding the actual significance of these findings and the exact role of each variable in the causal network (21). Nonetheless, the consistency of our results with those of previous studies is reassuring. Further studies should consider a more extensive analysis covering several waves of the pandemic and both spatial and temporal patterns. Individual-based studies would be important to shed further light on both the determinants of transmission and the underlying mechanisms at work.
In conclusion, several factors were identified as possible determinants of COVID-19 transmission at the municipality level. Aspects regarding the socio-economic characteristics of the population showed varying relationships with COVID-19 cases, while population density and mobility-related aspects were consistently associated at all 4 moments analysed. Despite some study limitations, we believe that these preliminary results should be considered to support decisions regarding COVID-19 prevention and control measures. More studies are required to enhance the robustness of this methodological approach and its results.
Statement of ethics
This was an observational, ecological study using data from a secondary data source. Ethics approval and consent to participate are not applicable to this study.
Conflict of interest statement
There are no conflicts of interest.
Author contributions
P.S., E.M.C., A.C.F., and R.G.: conception and design of the work. P.S., V.R.P., and A.L.: introduction. N.M.C., J.R., P.A., and P.S.: collecting data, selection of variables, methodological implementation, and results. P.S., E.M.C., A.C.F., R.G., F.D.R., N.M.C., J.R., V.R.P., and A.L.: discussion of results. P.S., V.R.P., A.L., and A.C.F.: conclusions. All authors read and approved the final manuscript.