High-resolution spatial modelling of population welfare in El Salvador, a coding tutorial
Section 1 Introduction
Among the United Nations Sustainable Development Goals (SDGs), Goal 1 is to end poverty in all its forms everywhere by 2030, and Goal 4 and its Target 4.6 aim to ensure that all youth and a substantial proportion of adults, both men and women, achieve literacy and numeracy. The specification of achieving Goal 1 “everywhere” means that no one should be left behind. Therefore, national statistics need to include subnational-level populations and local heterogeneities to ensure representative monitoring and optimized intervention planning. Mapping the geographic distribution of populations in great detail—including aligning their characteristics with the goals’ indicators—is accordingly a central tool for meeting the SDGs.
National household surveys are typically representative at the regional level (administrative level 1), but household conditions may vary at a much finer scale. For instance, poverty levels in a town and in the rural areas of the same region can differ markedly. Similarly, poverty levels in rural towns and in neighboring hamlets can differ widely, and local economic activity (e.g., the presence or absence of a manufacturer) can strongly influence SDG outcomes. More detailed, high-resolution information is therefore needed to support the efficient allocation of resources across territories and the monitoring of SDG indicators.
Conventional approaches to producing high-resolution development indicators rely on small area estimation (SAE) methods that integrate household-survey and census data to estimate the proportion of households in poverty. Household surveys, conducted every one to five years, have been improved through the introduction of GPS-geolocated survey clusters, which provide more fine-grained spatial data. In El Salvador, the national household survey Encuesta de Hogares de Propositos Multiples (EHPM) is conducted annually. Censuses, on the other hand, are typically undertaken irregularly, and in many low-income countries only every 10 years or less often. In El Salvador, the last census was conducted in 2007. Reliance on outdated census data may weaken the reliability of SAE estimates and hinder the ongoing monitoring of SDG indicators.
Georeferenced national household survey data provide an opportunity to achieve more spatially detailed, accurate, and regular estimates of poverty distribution and other SDG indicators. To further improve these estimates, novel sources of spatial data are increasingly used. For example, remote sensing (RS) and geographic information system (GIS) data provide continually collected information on variables such as rainfall, temperature, and vegetation, which are related to agricultural productivity, and on night-time lights and distance to roads and cities, which are related to access to markets and information and to local economic dynamism. Combined with household surveys, these additional data can support the production of poverty maps at high spatial resolution.
Spatial interpolation approaches overlay such data on more traditional sources, such as survey-based data, to produce regularly updatable high-resolution maps of development indicators. In this report, we apply spatial interpolation methods that integrate household-survey cluster data with geospatial covariates to produce high-resolution poverty, income, and literacy maps for El Salvador.
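To make the idea of combining cluster-level survey outcomes with geospatial covariates concrete, the following is a minimal sketch in base R. It is a toy illustration under stated assumptions, not the Bayesian model fitted later in this tutorial: all data are simulated, the covariate `lights` is a hypothetical stand-in for a night-time lights index, the bounding box is only a rough approximation, and an ordinary linear regression on the logit scale replaces any spatial model.

```r
# Toy illustration only: simulated data, not the EHPM survey or real RS layers.
set.seed(1)

# Hypothetical survey clusters with coordinates (rough bounding box, assumed)
n_clusters <- 200
clusters <- data.frame(
  lon = runif(n_clusters, -90.1, -87.7),
  lat = runif(n_clusters, 13.2, 14.4)
)

# Simulated covariate observed at each cluster (stand-in for a lights index)
clusters$lights <- runif(n_clusters, 0, 1)

# Simulated cluster-level literacy rate, generated on the logit scale
clusters$literacy <- plogis(
  -0.5 + 2 * clusters$lights + rnorm(n_clusters, sd = 0.3)
)

# Step 1: relate the survey outcome to the covariate at cluster locations
fit <- lm(qlogis(literacy) ~ lights, data = clusters)

# Step 2: build a regular prediction grid where the covariate is known
grid <- expand.grid(
  lon = seq(-90.1, -87.7, length.out = 50),
  lat = seq(13.2, 14.4, length.out = 25)
)
grid$lights <- runif(nrow(grid), 0, 1)  # simulated covariate surface

# Step 3: predict the outcome in every grid cell, back on the 0-1 scale
grid$literacy_hat <- plogis(predict(fit, newdata = grid))

head(grid)
```

The sketch ignores spatial correlation between neighboring locations and does not quantify prediction uncertainty. It only shows the basic workflow: fit at the survey clusters, then predict wherever the covariates are available.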
This report is a coding tutorial for creating high-resolution maps of SDG indicators. It outlines the relevant data and how they can be processed and analyzed. The emphasis is on providing reproducible code and examples, and all the RS data come from open sources.
The code needed to go from the raw data to the high-resolution maps is presented in detail. This report is written with the markdown (Allaire et al. 2019) and bookdown (Xie 2016) packages. The tutorial assumes that the reader is familiar with the open-source statistical computing environment R. For an introductory book on using R, refer to https://cengel.github.io/R-intro/. For an introductory book on using spatial data with R, refer to https://cengel.github.io/R-spatial/.
The tutorial is structured as follows. Chapter 2 provides an overview of the analysis. Chapter 3 introduces the reader to data exploration and unsupervised predictor selection. In Chapter 4, a simple Bayesian model is fitted. Chapter 5 explains how to perform covariate selection using the jackknife approach, shows the results of the selected models, and produces a map of a socioeconomic variable. Chapter 6 concludes. Appendix 1 provides a short primer on INLA, an R package. Appendix 2 provides the code required to preprocess the RS data, starting from the raw raster and vector layers and ending with the data in the row-and-column format used to fit the models.
Allaire, J. J., J. Horner, Y. Xie, et al. 2019. Markdown: Render Markdown with the C Library 'Sundown'. https://CRAN.R-project.org/package=markdown.
Xie, Y. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.