Urban pollution: A global perspective

We use worldwide satellite data to analyse how population size and density affect urban pollution. We find that density significantly increases pollution exposure. Looking only at urban areas, we find that population size affects exposure more than density. Moreover, the effect is driven mostly by population commuting to core cities rather than the core city population itself. We analyse heterogeneity by geography and income levels. By and large, the influence of population on pollution is greatest in Asia and middle-income countries. A counterfactual simulation shows that PM2.5 exposure would fall by up to 36% and NO2 exposure up to 53% if within countries population size were equalized across all cities.


Introduction
Pollution is an important determinant of urban quality of life. Households have flocked to cities over the last centuries and decades, attracted by various agglomeration economies, such as higher productivity and wages. However, city life has been and still is, to different extents, plagued by agglomeration costs stemming from crime, congestion, and pollution. Besides, urbanization and environmental degradation are not evenly spread throughout the world. While developed countries were already more than 50% urbanized by 1950, less developed countries reached this threshold only in 2020. Accordingly, the bulk of the recent and imminent increase in world urbanization will occur in developing regions. This is also where urban air pollution is most severe. For example, taking the average PM2.5 concentration value from the WHO air quality database from 2022, 20 of the 25 dirtiest cities were located in India, China, Bangladesh or Pakistan, with the remainder in Cameroon, Iran, Mongolia, Madagascar and Afghanistan. Therefore, the relationship between agglomeration and pollution is also a question of socio-economic development. Reining in pollution, especially in large cities, will be important as developing countries strive to improve their citizens' well-being. Yet, while there is an extensive literature on the benefits stemming from agglomeration economies, there is much less research on the costs of agglomeration (Ahlfeldt and Pietrostefani, 2019).
In this paper, we contribute to filling this gap. We use global gridded data on air pollution and population to analyse how agglomeration, in the form of large and densely populated cities, affects exposure to PM 2.5 and NO 2 pollution.
In theory, population density might increase or decrease pollution concentration in cities. Borck and Schrauth (2021) present a model where residents of a monocentric city pollute due to commuting and residential energy use for heating, electricity, etc. They show that population density increases pollution concentration. The reason is that larger and more densely populated cities have more aggregate commuting and that residential energy use increases as well, even though residents live in smaller dwellings on average. 3 However, there are some countervailing forces. For instance, public transit is more viable in large and densely populated cities due to economies of density, and denser housing is more energy efficient. Therefore, the relation between density and pollution is theoretically ambiguous.
Similar opposing forces determine whether cities with larger total population (as opposed to density) are more polluted (see Borck and Pflüger, 2019;Borck and Tabuchi, 2019). Further, the relation between population density and pollution is likely to depend on many factors that vary between regions, such as geography, institutions (environmental policies) etc. Therefore, an interesting question that we look at is how pollution and its relation with density varies between regions with different characteristics.
We use 11-16 years (depending on the pollutant) of gridded satellite data to document the distribution of pollution over space and time. There are several main findings. First, we show that about 3/4 of the world population and about 79 percent of city dwellers live in places with particulate pollution above thresholds as recommended by the WHO. Thus it seems that pollution is especially severe in cities. We go on to estimate the elasticity of pollution with respect to population density for PM 2.5 and NO 2 . Using OLS regressions with country fixed effects, we find elasticities of 0.15-0.16 for NO 2 and 0.02-0.03 for PM 2.5 . To tackle concerns of reverse causality and omitted variables, we also instrument population density using historical populations from different periods in time. Doing so has only a very small effect on the estimated elasticities.
We present our results using both grid cells and cities (Functional Urban Areas, FUA) as units of observation. Examining cities allows us to explicitly differentiate between the different effects of agglomeration size versus population density on exposure. For cities, we find that population size seems to be more important than density. Furthermore, using the definition of FUAs allows us to differentiate between the core city and its surrounding commuting zone. In fact, it turns out that pollution exposure is not significantly affected by core city population, but does rise significantly with population living in FUAs' commuting zone.
Moreover, we study how the pollution-density relationship varies over continents and by income. For the rasterized global data, we find that the pollution-density relation is strongest in middle income countries and in Asia. For the city data, population/density affect pollution most in upper middle and high income countries as well as in Europe and North America.
We also present outcomes on a more local level by estimating the effect of within-city variations in density. Again, we find positive effects of density on exposure, but the effects are mostly smaller in size. Additionally, we estimate spatial first difference regressions, where the estimated elasticities are based on changes between neighbouring grid cells (Druckenmiller and Hsiang, 2018). The corresponding coefficient estimates turn out to be positive, but again smaller in size.
Lastly, we perform a simple counterfactual simulation. Using the exposure-population elasticity from our city analysis, estimated separately for each country, we ask how each country's total exposure would be affected by an equal redistribution of population across cities. We find that for PM 2.5 , exposure falls by 36.5% for the country with the largest drop (Indonesia), which has a large estimated elasticity. Conversely, there are some countries with negative elasticities, so exposure would rise in this counterfactual by a maximum of 22.5% in Senegal.
The study contributes to a small but growing economic literature on urban pollution generally, and on the relation between agglomeration and pollution in particular. Empirical papers in fields other than economics have largely been confined to cross-sectional studies. 4 However, omitted variables and reverse causality are difficult to tackle in these settings. Among the few serious efforts to identify the causal effect of population density on pollution are Borck and Schrauth (2021) and Carozzi and Roth (2022). Borck and Schrauth (2021) use panel data from German districts, while Carozzi and Roth (2022) use cross-sectional data from US metropolitan areas. Both papers instrument density with a variety of historical and geological instruments. Castells-Quintana et al. (2021) and Aldeco et al. (2019) also study global pollution. Aldeco et al. (2019) focus on studying the effect of various policies using a spatial equilibrium model. Castells-Quintana et al. (2021) is also closely related to our paper, but there are several differences. They study emissions in a global panel of cities, while we analyse exposure in both cities and raster cells, which allows for a truly global analysis and lets us study the urban-rural pollution gradient in addition to cross-city differences. Moreover, we do a variety of heterogeneity analyses, and instead of emissions, we look at pollution exposure which is more tightly linked to local welfare.
The paper is organized as follows. The next section presents our data, descriptive analyses and empirical approach. Section 3 shows the results. In Section 4, we simulate how total exposure would change if, within countries, we were to redistribute population equally among all cities. The last section concludes the paper.

Data and estimation

Data
Most data sets we use are derived from satellites and are provided as a grid of raster cells covering the entire world. These rasterized grid maps come in different resolutions, mostly between 0.01 and 0.25 decimal degrees. We transform all data to 0.25 degree raster cells, which is a compromise between the different levels of aggregation of the native data and moreover alleviates concerns about autocorrelation at finer scales. At the equator, a quarter-degree grid cell corresponds to 27.8 kilometres in one direction or roughly 775 square kilometres overall. 5 For our analyses, we use the years 2000, 2010 and 2015. In the following, the different data sets are described in more detail. 6
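The cell dimensions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a spherical Earth with an equatorial circumference of 40,075 km; the function name `cell_width_km` is hypothetical and not part of the paper's code.

```python
import math

# Assumption: spherical Earth, equatorial circumference of 40,075 km,
# i.e. roughly 111.32 km per degree of longitude at the equator.
KM_PER_DEGREE_AT_EQUATOR = 40075.0 / 360.0


def cell_width_km(cell_size_deg, latitude_deg):
    """East-west extent (km) of a grid cell of the given size at a latitude."""
    return (cell_size_deg * KM_PER_DEGREE_AT_EQUATOR
            * math.cos(math.radians(latitude_deg)))


# Quarter-degree cell at the equator: side length and approximate area.
side_equator = cell_width_km(0.25, 0.0)
area_equator = side_equator ** 2  # treats the cell as a square

# A 0.01-degree cell at the 45th parallel, expressed in metres.
width_45_m = cell_width_km(0.01, 45.0) * 1000.0

print(f"0.25 dd at the equator: {side_equator:.1f} km per side")
print(f"approximate area: {area_equator:.0f} km^2")
print(f"0.01 dd at 45 degrees latitude: {width_45_m:.1f} m")
```

Under these assumptions the results agree with the figures given in the text and its footnote: about 27.8 km per side at the equator and about 787 m for a 0.01 degree cell at the 45th parallel.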

Units of observation
In the analyses we use two types of observational units. The first are raw grid cells from the raster data. The second are cities or "functional urban areas" (FUA) as defined by Moreno-Monroy et al. (2020). These FUAs are cities and their surrounding areas with strong internal commuting links. We view these two data sets as providing complementary results. On the one hand, defining cities gets us closer to measuring activity in economically meaningful areas. On the other hand, using all grid cells (even very thinly populated ones) allows us to measure an urban-rural gradient of pollution. Thus, our paper differs from and complements other papers that have mostly studied cities only (e.g. Carozzi and Roth, 2022; Castells-Quintana et al., 2021). 7
Raster data. The first units of observation in our analysis are grid cells of a rasterized world map. The majority of the data we use is provided as raster maps, which can then be matched to each other geographically. The advantage of looking at grid cells is that we abstract from defining cities or urban areas and that there is a growing body of worldwide data covering a wide range of topics. In addition, it allows us to measure an urban-rural pollution gradient, since observations take into account any type of inhabited land and do not depend on city definitions. We use grids of 0.25 decimal degrees and aggregate all the other raster maps to this size. This leaves us with more than 240,000 cells that fall on land, to which we make some minor adjustments. 8 The chosen grid size is a compromise between data that is available at a relatively fine grid scale and data that is available at coarser levels only. It also mitigates concerns about spatial autocorrelation. 9
5 Moving away from the equator means that equally sized grids cover smaller areas due to the curvature of the earth. At the 45th degree of latitude for example, which crosses South Dakota, Mongolia, France and Italy, 0.01 decimal degrees are equal to 787.1 meters in one direction. The value approaches zero at the poles. Most of human activity takes place between the 50th parallel south and the 60th parallel north.
6 The NO2 data is only available from the years 2000 to 2012, of which we use the years 2000 and 2010. The PM2.5 data is available until 2015.
7 Castells-Quintana et al. (2021) also look at the effect of density and polycentricity on pollution at the country level.
8 We drop grid cells which cannot be assigned unambiguously to one single country. As a consequence, about 21,400 grid cells that lie at country borders are dropped from the sample. Furthermore, we harmonize the country composition of our city and grid cell samples. Thus, all countries which do not contain at least one Functional Urban Area are dropped.
9 For variables available at higher grid resolutions, e.g. 0.1 decimal degrees, we re-project the data to 0.25 degrees using an appropriate function: For continuous variables, we calculate either the mean of all smaller grids within the respective quarter degree grid (pollution exposure for example is mean exposure within a quarter degree grid) or sum the values of these finely scaled grids, as appropriate. For categorical variables we take the modal value within a quarter degree grid.

Cities.
There are several reasons why we want to define city delineations. First, defining cities allows us to distinguish between city size and density. In a grid with equally sized grid cells, density would be strictly proportional to population. While basic urban economic theory also predicts a positive relation between population size and density (e.g. Brueckner, 1987), in practice the two vary independently, for instance, due to differences in zoning policies across cities. Since different agglomeration economies and diseconomies may operate at different spatial scales, population size and density might then affect pollution differently (Ahlfeldt and Pietrostefani, 2019; Cheshire and Magrini, 2008). Second, some of the PM2.5 pollution stems from sources not directly attributable to daily human activities, such as volcanoes or wildfires (see e.g. NASA Earth Observatory, 2015). While this may be interesting in its own right (if it leads to rural areas being dirtier than they would otherwise be), abstracting from these types of events by focusing on urban areas allows us to concentrate on the effect of human activity in cities on pollution. Third, we can conduct between-city analyses to supplement the rural-urban gradient. This type of city size effect helps us connect the empirical analysis with theoretical considerations about optimal city size (see e.g. Borck and Tabuchi, 2019). Fourth, we can detect within-city differences. Thus, we can study whether there is a core-periphery gradient of pollution exposure within cities and we can compare it to between-city effects or the urban-rural gradient of pollution exposure. Lastly, our historical population instruments consist of geo-coded city locations. Directly instrumenting urban areas rather than grid cells thus seems more adequate.
We define cities as Functional Urban Areas (FUA) following Moreno-Monroy et al. (2020). They use population and travel time data in 2015 to define unique urban centres including their commuting zones. 10 Thus, our city definitions do not vary over time. An FUA consists of an urban core with at least 50,000 inhabitants and the surrounding commuting zone, which is constructed using travel times. FUAs were originally defined by the OECD for OECD countries and Colombia. Moreno-Monroy et al. (2020) use those OECD-defined FUAs to estimate city boundaries for the rest of the world. Figure 1 shows our two main units of analysis and the population distribution as provided by LandScan for the north-eastern USA. More precisely, the figure visualizes the New York, Philadelphia, Baltimore and Washington, D.C. area and the area's FUAs. Grid-cell analyses consider all the non-white grid cells that fall on land. Densely populated city cores are shown in red, and less dense suburbs and rural areas are shown in yellow and blue. The greyish transparent polygons overlaying the population grids depict the resulting FUAs.
Overall, there are 9,031 FUAs in 188 countries and about 245,000 grid cells in 185 countries. Since most of our analyses estimate within-country effects, we mostly restrict the samples such that very small countries with very few raster cells or cities are dropped.
Figure 1. Note: The graph depicts a small section of the entire sample. White areas are water surfaces (lakes and oceans) that are excluded from estimations. The overlying grid corresponds to the raster units, while the greyish transparent polygons show single FUAs. Within both observation units, LandScan population data are shown in different colour gradations. Blue indicates very low population density and red indicates very high population density.

Air pollution data
The most direct measure of ground-level pollution concentration would be measurements from in-situ monitors. However, these are not widely available, especially in many lower income countries. Even among high income countries, only selected areas contain monitoring stations. To get worldwide coverage of pollution we use satellite data, which capture air pollution concentration as vertical column densities in the troposphere. For ground-level observations, we resort to data sets that use chemical transport models to translate satellite measures into ground-level pollution. 11 The data products are annual means of dust and sea-salt removed PM2.5 from 2000 until 2015 at 0.01x0.01 decimal degree (dd) resolution and annual means of NO2 without corrections from 2000 until 2010 at 0.1x0.1 dd resolution. 12 For most of our analysis, we use the data that are weighted using geographically weighted regression (GWR), but outcomes are not sensitive to using the non-weighted data. 13 This native data serves to construct our pollution measure of interest, which is pollution exposure (see e.g. Carozzi and Roth (2022) or Aldeco et al. (2019)). Population-weighted pollution exposure $E_G$ in grid cell $G$ is then given by:
$$E_G = \frac{\sum_{i} P_{G,i} \, N_{G,i}}{\sum_{i} N_{G,i}}$$
where $i$ indexes small grid cells within a large grid cell $G$. 14 Average pollution concentration in cell $i$ is given by $P_{G,i}$ and population in $i$ is represented by $N_{G,i}$. Average pollution exposure is therefore the sum of grid-specific pollution exposure divided by overall population in a large unit $G$. We also repeat the analysis with our FUA dataset, where $G$ represents a city instead of a raster cell.
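The exposure measure described above can be illustrated with a short sketch. The function name and the three fine-cell values are hypothetical and chosen only to show how population weighting differs from a simple mean; they are not taken from the data.

```python
def population_weighted_exposure(concentrations, populations):
    """Return sum_i(P_i * N_i) / sum_i(N_i) over the fine cells i of one unit G."""
    if len(concentrations) != len(populations):
        raise ValueError("need one population value per concentration value")
    total_population = sum(populations)
    if total_population == 0:
        return None  # exposure is undefined for an uninhabited unit
    weighted_sum = sum(p * n for p, n in zip(concentrations, populations))
    return weighted_sum / total_population


# Hypothetical fine cells inside one quarter-degree cell (or one FUA).
pm25 = [8.0, 12.0, 30.0]   # concentration in each fine cell
people = [100, 300, 600]   # population in each fine cell

exposure = population_weighted_exposure(pm25, people)
simple_mean = sum(pm25) / len(pm25)
print(exposure, simple_mean)
```

In this made-up example most residents live in the dirtiest fine cell, so exposure exceeds the unweighted mean concentration, which is the distinction the exposure measure is meant to capture.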

Population density
Present population and density. For measures of population and population density, we use LandScan data (LandScan, 2018). Population in this data set is provided on a very fine spatial scale (30 arc seconds), 15 obtained from censuses and other sources worldwide. The data aims to show where people are located on average over the course of 24 hours. Thus, it includes place of residence and of work in its estimations. 16 However, there is no information about how exactly this information is implemented in the population grid estimates. Henderson et al. (2021) provide a "ground-truthing" exercise and conclude that LandScan data perform well and are suitable for analyses on a global scale. 17
11 See Hammer et al. (2020) and Van Donkelaar et al. (2016) for PM2.5 and Lamsal et al. (2008) for NO2. This data is available online (Atmospheric Composition Analysis Group, 2018).
12 Satellite measurements of pollution are captured between 8:30 a.m. and 11:00 a.m. local time, depending on the satellite. The cell size means that a PM2.5 raster cell contains about one square kilometre at the equator, compared to about 120 km² for the NO2 raster cells.
13 GWR uses in-situ (on the ground) monitors to detect regional biases of satellite optical depth measurements. This bias is estimated and then corrected using different predictors like land cover or elevation difference (Van Donkelaar et al., 2015). The advantage of GWR is the more accurate representation of ground-level PM2.5.
14 For example, a PM2.5 observation in a cell of 0.01x0.01 dd within a larger grid cell that spans 0.25x0.25 dd.
15 30 arc seconds correspond to a little less than 0.01x0.01 dd, which is about 1 km at the equator.
16 By contrast, other data sets distribute population obtained from administrative sources equally in space or use buildings as a proxy for where people live without distinguishing whether those buildings are commercial or residential ones.
Historical population measures. We also instrument population or density using historical population data, following a large literature in urban and regional economics since Ciccone and Hall (1996).
The data comes from Reba et al. (2016), who provide geo-referenced population data worldwide ranging from 3700 B.C. to 2000 A.D. using historical, archaeological, and census-based estimates. This data set comprises about 1500 settlements worldwide. We use it to construct two different instruments. The main instrument will be the population in 1900. This is the year with most observations in the historic population sample prior to 1914 and represents the population in industrialized times before the two world wars. The second instrument is population in the last year of observation before 1750, and thus in pre-industrialized times. In Section 2.3 below, we will come back to the issue of instrument relevance and exogeneity.
In the raster analysis, we assign historical population to single grid cells. Thus, each grid cell is instrumented with the historical population data of the settlement that lies within this grid cell. Some grid cells contain several data points. In this case, we sum up the population over these data points. The drawback of instrumenting with historical population counts is that the estimation sample is drastically reduced and that there is a concentration of settlements in economically developed countries. Since using those instruments therefore deprives us of much valuable information, we mainly present the IV results in order to compare them with OLS outcomes on a harmonized sample. For most of this study, we will concentrate on within-country OLS estimates.
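The assignment of settlements to grid cells can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the indexing scheme in `cell_index`, the function names, and the three example settlements are all hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.25  # quarter-degree grid, as used in the raster analysis


def cell_index(lon, lat, cell_size=CELL_SIZE_DEG):
    """Map a coordinate to the (column, row) index of the cell containing it."""
    column = math.floor((lon + 180.0) / cell_size)
    row = math.floor((lat + 90.0) / cell_size)
    return (column, row)


def historical_population_by_cell(settlements):
    """Sum historical population over all settlements falling in the same cell."""
    totals = defaultdict(float)
    for lon, lat, population in settlements:
        totals[cell_index(lon, lat)] += population
    return dict(totals)


# Hypothetical settlements as (longitude, latitude, historical population).
settlements = [
    (13.40, 52.52, 1000.0),
    (13.45, 52.55, 500.0),   # falls in the same quarter-degree cell as the first
    (2.35, 48.86, 2000.0),
]

instrument = historical_population_by_cell(settlements)
print(instrument)
```

Cells without any settlement simply do not appear in the result, which mirrors the sharp reduction of the estimation sample noted above.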

Controls
We use a number of variables in order to control for potential observable factors that may be correlated with population density and pollution. Income is an obvious candidate variable that correlates with population density (Combes and Gobillon, 2015) and affects pollution. To control for economic development, we therefore use GDP from the dataset provided by Kummu et al. (2018). It is based on subnational accounts, like states in the U.S. or districts in Germany. In some specifications we control for the presence of coal-fired and other highly polluting power plants in a grid cell. 18 Since power plants may be close to dense areas, this may be one channel through which density affects pollution.
We also control for a variety of topographical and climatological variables that may be correlated with density and pollution, such as ruggedness, temperature, wind speed, and precipitation. We compute ruggedness following Nunn and Puga (2012), which is roughly the grid-cell average difference in elevation between a point and the terrain surrounding it. Our ruggedness measure is calculated over land surface only, leaving out water. 19 Temperature and precipitation are taken as long-run averages over the 30-year period between 1960 and 1990 (FAO/IIASA, 2012). Wind data is retrieved from the Global Wind Atlas (Davis et al., 2019).
In addition to weather, we control for variables that might influence pollution through their suitability for trade on the one hand and through their climatological impact on the other. These controls consist of three dummy variables (whether a major river, a large lake, or a coastline is within 25 kilometres of a grid centroid) and a continuous variable (distance of grid centroid to coast). In our city-level analysis, we measure the distance of a city border to the respective first-nature characteristic. Data for coastlines, rivers and lakes come from Natural Earth (2018). 20 Moreover, we control for the agricultural suitability of a raster cell or city (compare Henderson et al., 2018). Locations that are densely populated due to their fertile soil may also suffer from high pollution, since agriculture is a strong producer of particulate matter pollution. We thus take into account land suitability (as a continuous variable) and a set of biome indicators. Land suitability for agriculture is based on measures of climate and soil and predicts the probability of land being cultivated (Ramankutty et al., 2002). Biomes describe the ecological system of an area and its dominant natural vegetation. These categories include for instance "tundras", "tropical and subtropical dry broadleaf forests", or "deserts and xeric shrublands". The 14 biome indicators we use are taken from Olson et al. (2001). 21
We want to assess a variety of country features that may influence the relation between population and pollution. In order to do so, we make use of a large set of national indicators from the World Bank that covers economic performance, energy use and demographic characteristics (World Bank, 2019). This data set ranges from 1960 to 2018, which allows us to calculate for instance means of urbanization rates and their growth, residents of large agglomerations or energy use over different time periods. In our analysis below, we will correlate some of these measures with the density-elasticity of pollution.
19 In order to calculate the fraction of water surface we use the world map gridded at 1 km resolution provided by Lloyd et al. (2017), which is based on a global water mask dummy variable gridded at very high resolution (Feng et al., 2016).
20 We take the "high" resolution datasets from Natural Earth (2018). Rivers are categorized into 10 size ranks, where 1 are the largest and 10 the smallest rivers. We only consider rivers between the ranks 1-6. Large lakes are those with a surface area greater than 5,000 square kilometres, excluding unnatural dams. This leaves us with 29 major lakes.
21 Just like Henderson et al. (2018) we combine the categories "tropical and subtropical dry broadleaf forests" with "tropical and subtropical coniferous forests" as well as "tropical and subtropical grasslands and savannas and shrublands" with "flooded grasslands and savannas". Furthermore, we drop areas historically covered by ice or rocks from the analysis. Doing so does not, however, change our results.

Descriptive analysis
Human health is vulnerable to pollution. The WHO has set short-and long-run thresholds to indicate very high pollutant concentration levels that presumably pose major threats to human health. Short-run refers to the average value over 24 hours for PM 10 and one hour for NO 2 , while long-run means the annual mean. In our sample, the longterm threshold of 40µg/m 3 for NO 2 is not exceeded in a single raster or city, which would seem to suggest that this pollutant does not constitute a major health threat. 22 However, Borck and Schrauth (2021) show that NO 2 levels for both the short-and the long-run thresholds are transgressed in Germany on a more local level. The respective data is taken from in-situ monitors and therefore provides a more accurate local measure of air pollution. Indeed, the raster size we have available for NO 2 does not allow for a very local consideration of this pollutant. Furthermore, we only have available annual means, so we cannot analyse short-run threshold transgressions.
Things look different for PM2.5 pollution. Table 1 shows the share of population that lives in raster cells or cities where annual mean PM2.5 pollution exceeds the short-run (25 µg/m³) or long-run (10 µg/m³) WHO thresholds. In 2015, out of the 7.26 billion people in the sample, about 5.52 billion lived in raster cells with mean PM2.5 levels beyond the long-term threshold of 10 µg/m³. This corresponds to around 76 percent of the overall world population. Approximately 39 percent were even permanently exposed to concentrations beyond 25 µg/m³, which is the WHO 24-hour mean threshold, a level that is recommended to be avoided over periods longer than a day. Note, however, that within a raster cell there is variation in pollution concentrations such that not everybody is actually exposed to those pollution concentrations. Therefore, the actual share of people permanently exposed to such high concentrations may lie below 75 percent.
If we only consider FUAs, there are about 4 billion people living in urban areas (this is in line with the UN's estimate of worldwide city population), of whom about 79% live in cities with long-term mean PM2.5 pollution beyond 10 µg/m³. About 44% are in cities where average annual urban pollution even exceeds the short-term threshold of 25 µg/m³.
There are marked differences between continents. In Asia, 92 percent of the population face an annual average PM 2.5 -pollution beyond 10µg/m 3 . In Europe the corresponding number is 67 percent, in Africa 62 percent, in North-America 37 percent, and in South America 32 percent. 23 Figure 2 shows the geographical distribution of PM 2.5 in 2015 and its change from 2000 to 2015. Dark grey/black areas in panel (a) are highly polluted with values close to or larger than 25 µg/m 3 while light grey/white ones may have values close to or below the annual WHO threshold of 10 µg/m 3 . In Panel (b), dark grey/black areas have seen an increase in PM 2.5 -concentrations between 2000 and 2015, while light grey/white areas experienced little change or even a decline. Apparently, many of the highly polluted areas in 2015 either developed into highly polluted areas or became even more polluted over the course of 15 years. The pollution problem has become much more serious especially in India and China, but also in Africa south of the equator (net of dust and sea salt). Many areas in the U.S., especially in the east, have improved their air quality over time, which is also the case in parts of Western Europe.
NO2 pollution is much more concentrated in a few areas, as shown in Figure 3a. The highest exposure is found in north-eastern China, central Europe, parts of the United States and parts of Russia. The range of values is much smaller for NO2 than for PM2.5: the maximum value reached is 30 µg/m³ in 2010. Figure 3b shows the change in NO2 concentration levels. Again, dark grey/black areas are those where pollution increased most strongly from 2000 until 2010. Some parts of the U.S. and Europe have experienced air quality improvements. In Africa, Australia, and South America, NO2 concentrations barely changed. Predominantly densely populated metropolitan areas such as Santiago de Chile, Cairo, or São Paulo seem to have higher pollution levels in 2010 compared to 2000.
Figure 3. Note: Figure 3a shows the worldwide distribution of NO2 concentrations in 2010. White/light grey depicts low and dark grey/black high concentration levels. Figure 3b shows changes of NO2 levels between 2000 and 2010. White/light grey areas saw negative, little or no change in pollution concentration; dark grey/black areas have experienced large increases in NO2 pollution.
We now turn to our regression framework for estimating the effect of population on pollution exposure.

Estimation
In a first step, we run simple Ordinary Least Squares (OLS) regressions of air pollution exposure on population (density) and control variables. To mitigate concerns about spatial autocorrelation, we analyse pollution within relatively large 0.25 decimal degree grid cells and we cluster standard errors within three-by-three squares of grid cells times year of analysis (following Henderson et al., 2018). This clustering approach accounts for the potential correlation of pollution in space, since particulates, for instance, disperse spatially with the wind. In a second step, we restrict the sample to cities as defined above.
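One way to build such cluster identifiers is to divide the column and row index of each grid cell by three and combine the result with the year. The sketch below assumes integer column and row indices for the quarter-degree grid; the function name `cluster_id` is hypothetical.

```python
def cluster_id(column, row, year, block=3):
    """Identify the block-by-block square of grid cells (times year) a cell belongs to."""
    return (column // block, row // block, year)


# Cells (0, 0) and (2, 2) share a three-by-three square; (3, 0) starts the next one.
same_block = cluster_id(0, 0, 2015) == cluster_id(2, 2, 2015)
next_block = cluster_id(3, 0, 2015) == cluster_id(0, 0, 2015)
other_year = cluster_id(0, 0, 2010) == cluster_id(0, 0, 2015)
print(same_block, next_block, other_year)
```

The resulting tuples can be passed as the grouping variable to any routine that computes cluster-robust standard errors.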
The OLS regression equation is: 24
$$\ln E_{GtS} = \alpha + \rho \ln D_{GtS} + X_{GtS}'\beta + \theta_S + \varepsilon_{GtS} \qquad (1)$$
where $E_{GtS}$ is exposure to NO2 or PM2.5 in grid cell/FUA $G$ and year $t$ in country $S$ (in our baseline regressions we only include the last year of observation, i.e. 2010 for NO2 and 2015 for PM2.5), and $\varepsilon_{GtS}$ is the error term. Our parameter of interest, $\rho$, measures the elasticity of pollution exposure with respect to population density $D_{GtS}$ (or population $N_{GtS}$, depending on the specification). $X_{GtS}$ is a vector of control variables. As explained above, these contain the log of GDP, several variables about the suitability for trade (whether the raster/city lies on a river, on a lake or on the coast) and agriculture (land suitability and biome indicators), temperature, precipitation (both as 1960-1990 long-term means), wind speed, the presence of dirty power plants, ruggedness and latitude. We will compare OLS results to within-country estimates, which include the country dummy $\theta_S$. These within-country regressions compare raster cells/cities within a country to each other. The rationale is to control for any countrywide unobserved geographic or political features that may be correlated with both pollution and population (density).
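To make the within-country comparison concrete, the sketch below estimates the elasticity from a log-log regression with country fixed effects by demeaning both variables within each country, which gives the same slope as including country dummies. It is a simplified illustration that omits the control variables; the function name and the synthetic data (constructed with a true elasticity of 0.15) are hypothetical.

```python
import math
from collections import defaultdict


def within_country_elasticity(rows):
    """Slope of ln(exposure) on ln(density) after demeaning by country.

    rows: iterable of (country, exposure, density) tuples.
    """
    by_country = defaultdict(list)
    for country, exposure, density in rows:
        by_country[country].append((math.log(density), math.log(exposure)))

    sum_xy = 0.0
    sum_xx = 0.0
    for observations in by_country.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sum_xy += (x - mean_x) * (y - mean_y)
            sum_xx += (x - mean_x) ** 2
    return sum_xy / sum_xx


# Synthetic data: exposure = country level * density ** 0.15, so the
# country-specific level plays the role of the fixed effect.
rows = [("A", 5.0 * d ** 0.15, d) for d in (10.0, 100.0, 1000.0)]
rows += [("B", 20.0 * d ** 0.15, d) for d in (50.0, 500.0)]

rho = within_country_elasticity(rows)
print(round(rho, 3))
```

Because country B is both denser on average and more polluted at any given density in this made-up sample, a pooled regression without fixed effects would overstate the elasticity, which is the concern the country dummies address.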
Even though country fixed effects already account for a large portion of unobservables, OLS regressions may still be biased due to reverse causality or omitted variables. Economic theory and empirical evidence suggest that households would want to move to cleaner areas (Chen et al., 2022). Hence, population would be endogenous to pollution exposure. Moreover, within countries, there may be unobservable differences in policies, attitudes, and the like that are correlated with population measures and pollution. We therefore follow the urban economics literature in instrumenting population with historical population levels. 25 The main assumptions for using historical population data have been widely discussed in the urban economics literature. First, the distribution of population and economic activity tends to be persistent over time (see for instance Davis and Weinstein, 2002). This is intuitive, since infrastructure and buildings are durable and thus population changes are sluggish. Therefore, historical population is a good predictor of current agglomerations. Second, the exclusion restriction states that historical population should affect pollution only through its effect on current population. The argument is that, if we go back in time far enough, structural change will have reshaped local economies such that historical population levels should be exogenous to current pollution levels. Suppose, for instance, that a city formed close to a river in pre-industrial times in order to benefit from the trade advantage conferred by the river. It may have grown into a densely populated and highly polluted place nowadays due to industrial and traffic pollution. Then, the exogeneity assumption would be satisfied, since the source of today's agglomeration pattern and of its effect on pollution (namely, motorized traffic and industrial production) differs from the historical one (trade).
In constructing historical instruments, there is a trade-off: on the one hand, the exogeneity argument forces us to go back sufficiently far in time. On the other hand, availability constraints force us to use more recent data in order to have a sufficient number of observations. 26 With respect to exogeneity, we believe that population counts prior to 1750 are more credible instruments than more recent population counts. Before the industrial revolution, which started in the second half of the 18th century, air pollution was probably not a decisive factor for migration decisions, whereas during industrialization, there already seems to be some evidence of sorting with respect to pollution (Heblich et al., 2021). 27 Using population in 1900, however, provides us with at least twice as many observations as earlier population counts. We will use population in 1900 as our main instrument, but outcomes do not differ much when using population from pre-industrial times. 28 We instrument population (density) in a first-stage regression in which the instrument $Z$ is historical population. The predicted values for population, $\widehat{\ln D_{GtS}}$, are then used in the second stage instead of actual measures of population in equation (1).

We also present results from long-difference estimations. These regress the changes in exposure between the last and first year of observation on changes in population/density. The idea here is that there may be some unobserved differences between units that simultaneously affect population density and pollution. For instance, sorting of "green" individuals into large cities might lead to a negative correlation between density and exposure. This kind of heterogeneity is, given that it is time invariant, differenced out in the long-difference estimation.

26 We also experimented with soil quality and other natural causes as instruments, like Borck and Schrauth (2021) and Combes et al. (2010). However, we were not able to find strong instruments that could explain agglomerations all around the world.
27 Heblich et al. (2021) argue that more polluted parts of cities in England were poorer as the rich sorted into less polluted areas. The authors find that those sorting patterns have persisted until today.
28 Borck and Schrauth (2021) analyse German data and show that historically dense places have no more industrial employment than less dense ones. Since industry was a prime polluter following the industrial revolution, this lends some credibility to the exclusion restriction.
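The instrumental-variable logic can be sketched for the simplest case of one endogenous regressor and one instrument, where the IV estimate is the ratio of two covariances. The sketch below omits controls and fixed effects, and the function names are hypothetical; it only illustrates why an instrument that is uncorrelated with the unobserved confounder recovers the coefficient while OLS does not.

```python
def cov(a, b):
    """Population covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(y, x, z):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


# Stylized data: z plays the role of (log) historical population,
# u is an unobserved confounder that is uncorrelated with z.
z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]              # current (log) population
y = [0.5 * xi + 2.0 * ui for xi, ui in zip(x, u)]  # (log) exposure, true slope 0.5
```

In this constructed example `iv_slope(y, x, z)` returns the true slope, while `ols_slope(y, x)` is pushed upwards by the confounder.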
We also estimate city-level regressions, using FUAs as units of observation. This approach compares cities within countries (between-city estimates), but in addition allows us to look at within-city effects. In the within-city grid-level regressions, we estimate the counterpart of equation (1) for grid cells within cities, where $C$ is the city index. We drop all the city-specific control variables since we now control for city fixed effects $\theta_C$ and only look at within-city differences.
In addition, we estimate spatial first difference (SFD) models, following the approach proposed by Druckenmiller and Hsiang (2018). This transfers the idea of first differences in time into physical space. In short, SFD regresses the differences in outcomes between neighbouring grid cells on the differences in controls between these same cells. Since the estimation differences out any unobserved factors that are common to neighbouring cells (such as possibly geographic and institutional factors that may be correlated with population density and pollution), this mitigates omitted variable bias. On the downside, some interesting variation is lost by only considering variation between neighbouring cells. Further details are in Appendix C.
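The idea of spatial first differences can be sketched as follows: difference outcome and regressor between horizontally adjacent cells, then regress the differenced outcome on the differenced regressor. This is a stripped-down illustration with one regressor, differences taken along rows only, and no intercept; the function name is hypothetical and the details of the paper's implementation are in its Appendix C.

```python
def spatial_first_difference_slope(y_grid, x_grid):
    """Slope from regressing neighbour-to-neighbour differences in y on those in x.

    y_grid and x_grid are lists of rows; cells adjacent in a row are neighbours.
    """
    num = den = 0.0
    for y_row, x_row in zip(y_grid, x_grid):
        for j in range(1, len(y_row)):
            dy = y_row[j] - y_row[j - 1]  # difference in log exposure between neighbours
            dx = x_row[j] - x_row[j - 1]  # difference in log density between neighbours
            num += dx * dy
            den += dx * dx
    return num / den
```

Any unobserved factor shared by two neighbouring cells cancels in `dy`, which is why the estimate is robust to such factors but uses only local variation.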

Results
Before presenting our main outcomes using within-country OLS, we briefly compare OLS and IV coefficients. We estimate IV regressions in order to gauge the magnitude and direction of potential biases. The reason for not using IV results as our favoured outcomes, as described above, is that we are only able to instrument a small subsample of all observed units, which moreover is primarily restricted to a developed world sample. 29 To compare OLS and IV results, we harmonize the sample to those cities or grid cells for which the corresponding instrument is available. In all regressions we control for the full set of trade, agricultural, and weather variables as well as logged GDP, ruggedness, latitude and an indicator for the presence of a dirty power plant. We show results with population in 1900 as instrument in Table 2. The first insight is that the sample size is drastically reduced when considering instrumental variables. With raster cells as units of observation, only slightly more than 1000 grid cells of the roughly 200,000 total cells in the whole sample remain. Regarding FUAs, we have historical population for about 10% of all cities. The comparison of OLS and IV results shows only very small and insignificant differences as soon as we include country fixed effects (columns 3, 4, 7, and 8). Using population before industrialization as instrument yields similar results (see Appendix Table A.1). The IV regressions are exactly identified. Table 3 shows the first-stage regressions. The F-statistic indicates that the instruments are strong. Hence, it seems that omitted variable bias or reverse causality does not cause large biases in the estimates. 30 In the remainder of the paper we focus on within-country regressions using the whole sample available for both grid cells and cities.

Raster-level outcomes
We first consider raster-level outcomes. Table 4 compares outcomes between simple OLS and within-country regressions using the entire sample for both pollutants, NO2 and PM2.5. As this will become important in our city-level results, we differentiate between the sum of population within a grid cell and the population density of a cell. All specifications include our baseline covariates, i.e. weather, GDP, geographical characteristics, suitability for trade and agriculture, and a dummy for whether there is at least one highly polluting power plant within a grid cell. The results show that the coefficients for both total population and density are reduced in magnitude when we add country fixed effects to the PM2.5 exposure estimations, while the difference between the estimates is smaller for NO2. In other words, for PM2.5 the within-country effect of density on pollution exposure is much smaller than the overall effect. This suggests that, for PM2.5, the effect of density is partly driven by certain highly polluted countries with densely populated grid cells. It might be, for instance, that some countries have policies that limit both migration to large cities and pollution. Taking the within-country estimates, our main results show an elasticity of pollution exposure with respect to density of 0.02 in the PM2.5 regressions and 0.15 in the NO2 regressions. This implies that doubling population density would result in a 1.3 percent increase in PM2.5 exposure and a 10.7 percent increase in NO2 exposure.
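The step from an elasticity to the effect of doubling density follows from the log-log form: if exposure is proportional to density raised to the power of the elasticity, doubling density multiplies exposure by 2 to that power. The short sketch below applies this to the rounded elasticities quoted above; the function name is hypothetical. With the rounded values it gives roughly 1.4 and 11.0 percent, slightly above the 1.3 and 10.7 percent reported in the text, a gap that would be consistent with the reported figures resting on unrounded coefficients (an assumption made here, not something the text states).

```python
def pct_change_from_doubling(elasticity):
    """Percentage change in exposure when density doubles, given a log-log elasticity."""
    return (2.0 ** elasticity - 1.0) * 100.0


pm25_effect = pct_change_from_doubling(0.02)  # rounded PM2.5 elasticity from the text
no2_effect = pct_change_from_doubling(0.15)   # rounded NO2 elasticity from the text
```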
Looking at the other coefficients, we find that once we control for country fixed effects, grid cells with higher GDP are more polluted. 31 Pollution exposure rises with temperature. By contrast, precipitation and wind speed are negatively correlated with pollution exposure. Exposure is also strongly affected by dirty power plants.
Using the log of pollution exposure leads to the treatment of all zero observations as missing. To avoid this, we repeat the estimation after replacing all zero values with the minimum non-zero value observed in the data. 32 Interestingly, the pollution-density elasticity for both pollutants becomes significantly higher (0.2 for PM2.5 and 0.24 for NO2 in the regressions with country fixed effects, see Tab. A.2).
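The zero-replacement step can be sketched in a few lines: find the smallest strictly positive value and use it in place of every zero before taking logs. The function name `log_with_zero_floor` is hypothetical, and the sketch assumes the input contains at least one positive value and no negative ones.

```python
import math


def log_with_zero_floor(values):
    """Take logs after replacing zeros with the smallest non-zero value in the data."""
    floor = min(v for v in values if v > 0)  # minimum non-zero observation
    return [math.log(v if v > 0 else floor) for v in values]
```

Cells with zero exposure thus stay in the sample instead of being dropped, at the cost of assigning them an arbitrary small value.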
That the elasticity is so much lower when we exclude grid cells with zero outcome points to a significant non-linearity in the effect of (log) density on (log) exposure. To address this question from a slightly different angle, we now present non-linear regressions, where we include categorical variables for large and densely populated areas instead of continuous variables. We thus attempt to more directly measure an urban-rural gap. In order to do so, we categorize grid cells with fewer than 50,000 inhabitants and those with density below 100 persons per sq. km as 'rural'. 33 The results are shown in Table A.3. The urban-rural gap is clearly evident: going from rural to urban raster cells significantly increases pollution exposure. We redo the exercise with 4 instead of only 2 categories (see Table A.4). There is some variation in the effects; still, we find that the effect of going from what we call 'rural' to 'urban' is larger than the effect of going from one urban category to the next (e.g. from low to moderate density). In summary, there is an urban-rural gap in pollution exposure which trumps the effect of increasing density within urban areas.

31 Interestingly, the coefficient on GDP is negative in the PM2.5 regression without country fixed effects. This suggests that around the world, grid cells in higher income regions tend to be less exposed to pollution, but this effect is driven by the fact that these grid cells are predominantly located in less polluted high income countries.
32 This follows Henderson et al. (2018), who use the approach of setting observations with a zero for night lights to the minimum value in the sample in their estimates presented as main results.
In the remainder of the raster-level results, we will mainly report within-country effects and focus on population density, which makes results more comparable to previous literature. The main estimates in Table 4 assume a homogeneous relationship between pollution and population in the entire world. In order to check whether this relationship changes with geography, country income, and the like, we now consider various interactions to analyse the heterogeneity of this effect.

Heterogeneity and robustness.
We now turn to analysing heterogeneities in the pollution-density gradient across continents and countries at different stages of development. Figure 4 shows the results of running within-country regressions by country income groups and by continents, where income groups follow the World Bank classification into low, lower middle, upper middle, and high income. Population density is a significant determinant of pollution over all income groups and continents, but to a different extent. Figure 4a shows that the strongest within-country effects of density are found in lower middle income countries for PM2.5 exposure and in upper middle income countries for NO2. Hence, it seems that the density effect is to some extent nonlinear in income, and middle-income countries tend to have a stronger effect of density on pollution than both low and high-income countries. A potential reason is that density is not "dirty" (Carozzi and Roth, 2022) in low income countries, because there is little dirty activity such as driving and heating, whereas in high income cities, cleaner transport modes (e.g. public transport) and residential energy use (heating and cooking with electricity or "modern fuels") may mitigate the effects of density. 34 In contrast, middle income country agglomerations may be dirtier than low income ones because there is more driving and residential energy use, but technologies for these activities are not as clean as in high income country cities. Fig. 4b shows differences by continent. For NO2, the density effect is smallest in Africa and largest in Asia.

Note: Income group definitions are taken from the World Bank.

In Fig. 5 and 6, we further show the density coefficients for each country from individual within-country regressions on world maps. The maps show some interesting patterns. The NO2 density elasticities in Fig. 5 suggest that China and India, where most of the biggest and most polluted cities in the world are located, seem to have the strongest effect of density on NO2 pollution exposure in Asia. In North America, Mexico and the US have larger effects than Canada, while within Europe countries from the south seem to have higher elasticities than countries in the north. Fig. 6 shows the country-specific density elasticities for PM2.5. In Asia and Europe, most of the countries that have large density elasticities for NO2 also have large elasticities for PM2.5, while the density effect in the Americas seems to be smaller for PM2.5 than for NO2.

The Landscan data distribute population to grid cells using certain grid characteristics, but the exact algorithm is not known. While Henderson et al. (2021) provide a ground-truthing exercise, it might still be the case that the data generation biases the results. As a robustness check, we therefore run the regressions using administrative areas as units of observation. 35 In these data, population is smoothly distributed among all grid cells within an administrative unit. 36 We present the results in Table A.5. Again, they hardly differ from our previous results. 37

Lastly, we run long-difference regressions of pollution changes between the last and first year in our sample on population or density changes in the same period. Thus, we control for any time-invariant unobserved heterogeneity between administrative units that might affect both population and pollution. For instance, it might be that within countries, population sorting leads to residents of dense cities being 'greener' on average, which would bias our estimates (downwards in this case). Analysing long differences within rasters differences out these time-invariant unobserved heterogeneities. Results are shown in Table A.6. 38 In the long-difference estimates, all the variables that are time-constant drop out, so the only explanatory variable left besides the population data is GDP. As the table shows, the long-difference estimates again show a positive and significant effect of both population and density on both pollutant-exposure measures. The magnitudes are now reversed, however: it seems that population changes now affect PM2.5 exposure more strongly than NO2 exposure. A potential explanation is that the variation in the cross section as well as over time is much lower for NO2 than for PM2.5. This implies that a given change in population over time affects changes in NO2 less than changes in PM2.5.
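The long-difference estimator can be sketched as follows: for each unit, take the change in log exposure and in log population between the first and last year, then regress one change on the other. The sketch uses a single regressor and an intercept and leaves out GDP, so it is a simplification of the regressions described above; the function name and the data layout (dictionaries mapping a unit to a pair of values) are assumptions made for illustration.

```python
def long_difference_slope(first, last):
    """Slope from regressing changes in y on changes in x across units.

    first and last map each unit to a (log exposure, log population) pair
    for the first and last year of observation.
    """
    units = list(first)
    dy = [last[u][0] - first[u][0] for u in units]  # change in log exposure
    dx = [last[u][1] - first[u][1] for u in units]  # change in log population
    mean_dy = sum(dy) / len(dy)
    mean_dx = sum(dx) / len(dx)
    num = sum((a - mean_dx) * (b - mean_dy) for a, b in zip(dx, dy))
    den = sum((a - mean_dx) ** 2 for a in dx)
    return num / den
```

A unit-specific constant appears in both years and drops out of `dy`, which is the sense in which time-invariant heterogeneity is differenced out.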

Channels.
An interesting question in interpreting the findings is what mechanisms could be responsible for the observed relationship between density and pollution exposure (see also Borck and Schrauth, 2021; Carozzi and Roth, 2022). While a complete investigation is made difficult by the scarcity of available data at a worldwide scale, we nonetheless try to shed some light on these channels here. We follow the analysis in Borck and Schrauth (2021) and leave out some sets of explanatory variables. We then compare the density coefficient with and without these variables. The direction of change of the coefficient then allows us to determine how these variables affect pollution directly and indirectly through their correlation with density.
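The logic of this comparison can be written down with the standard omitted-variable formula. As a sketch, let $W$ stand for one group of controls (say, the weather variables), $\beta_W$ for its coefficients in the full regression, and $\delta_W$ for the coefficients on log density from auxiliary regressions of each variable in $W$ on log density and the remaining controls. These symbols are introduced here for illustration and are not the paper's notation.

```latex
\rho^{\text{short}} \;=\; \rho^{\text{full}} \;+\; \beta_W' \, \delta_W
```

If the density coefficient rises when $W$ is left out, then $\beta_W'\delta_W > 0$: on balance, the omitted variables either raise exposure and are positively correlated with density, or lower exposure and are negatively correlated with it.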
We report the results of leaving out, one by one, different groups of our explanatory variables in Tab. A.7. Column (1) shows the baseline regression results, col. (2) leaves out GDP, (3) the weather variables, (4) the trade variables (closeness to river, lake or coast) and (5) the agricultural variables (land suitability and biomes).

35 We use gridded population of the world (GPWv4) data on administrative (GADM) level, see https://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev11.
36 This obviously introduces other biases. Nonetheless, it is reassuring to find that the results do not seem to be driven by biases in the computation of grid specific density.
37 Using the gridded GPW data without aggregating it to GADM level does not change the results either (results not shown here).
38 These estimates are also based on the GPWv4 data on the GADM (administrative unit) level. The reason is that the quality of LandScan data significantly improved over time and therefore comparisons over time should not be made, as stated by the data provider itself (see https://gistlandscan01.ornl.gov/frequently-asked-questions).
As can be seen in the table, the density coefficient rises in all columns compared to the baseline. For both pollutants, we find the largest increase when we leave out weather and agricultural suitability. This seems to imply that the density effect is driven most by the fact that dense areas are located, on average, in areas that have weather that is conducive to high pollution exposure (such as hot, dry areas with little wind). Moreover, dense areas seem, on average, to be located in places with a first nature that is advantageous to agriculture, which tends to increase pollution. Higher income and suitability for trade apparently drive the density effect to a lesser extent.

Note: The graphs show the distribution of the population density / pollution exposure gradient, when running estimations for each country in the world separately. The coefficients are obtained from regressions for each country separately, controlling for trade variables (river, lake, coastline within 25km and continuous distance to coast measure), agricultural ones (biome indicators, land suitability for agriculture), weather (wind speed, temperature, precipitation) as well as ruggedness, latitude, log(GDP), and an indicator for a dirty power plant nearby. The green hollow diamond is the coefficient for Germany (DEU), and the red hollow triangle the one for the United States (USA). These are highlighted in order to compare them with findings by Borck and Schrauth (2021) and Carozzi and Roth (2022) respectively. Outliers (2.5% highest and lowest coefficients) are excluded from the graphical representation.

Raster-level outcomes and country characteristics.
We now look at how the pollution-density relation changes with country characteristics to get additional insights into its determinants. Figure 7 ranks all country-specific coefficients and plots them by their size. 39 All coefficients plotted come from regressions including control variables as presented in Table 4. The green hollow diamond and the red hollow triangle show the coefficients for Germany (DEU) and the US, in order to compare them to prior papers in the field (Carozzi and Roth, 2022; Borck and Schrauth, 2021). Both coefficients are somewhat smaller than the ones in those papers. Both Carozzi and Roth (2022) and Borck and Schrauth (2021) run extensive tests to more credibly estimate a causal effect within one country; however, the samples differ from the one used here. In particular, Borck and Schrauth (2021) study German counties (Kreise) and Carozzi and Roth (2022) US CBSAs. This difference notwithstanding, we think it is reassuring to see that the magnitude of the coefficients is roughly in line with previously estimated ones from studies that are better able to address causality issues than we are. 40

About 70% of PM2.5 coefficients and 75% of NO2 coefficients lie in a range between 0 and 0.3. About 82% of PM2.5 coefficients and about 85% of NO2 coefficients are positive, and about half of the countries with a negative coefficient have negative coefficients for both pollutants. It is interesting to briefly look at the outliers, i.e. countries with the 2.5 percent highest and lowest (negative) coefficients. In most instances, these turn out to be small island states such as Jamaica, Malta, East Timor (downward) or Bahamas, Barbados, Cape Verde (upward).

Figure 8 shows simple scatter plots between the country-specific density elasticity estimates of our within-country regressions and urbanization patterns (where, again, the density coefficients stem from regressions with all basic controls described above).
In general, most of the correlations seem insignificant for PM 2.5 , while we do find some interesting correlations for NO 2 . As the figure shows, the more people live in urban areas, the stronger is the density effect on pollution, while the effect of the urbanization rate is also positive but somewhat weaker. This suggests that density is more likely to increase NO 2 pollution when many people live in cities, which underlines the non-linear effects described above. It also links the paper's results to the theory of city systems; indeed it seems like total exposure will be reduced by shifting individuals from denser to less dense regions (Borck and Tabuchi, 2019). Moreover and interestingly, the density coefficients are negatively correlated with renewable energy use (figure not shown). Intuitively, when energy use is relatively clean, packing residents densely together does not produce as much pollution as when countries rely largely on fossil fuels.
In the next subsection, we examine regression results when we explicitly consider cities defined as functional urban areas.

City-level outcomes
We now present results from regressions on the FUA sample. We use different variables to analyse the effect of population on pollution: (i) logged mean population density within a city polygon, and (ii) the logged total population. In addition, we can differentiate between the core city population (log(Pop urban centre)) and that of the surrounding commuting area (log(Pop commuting)). This allows us to move in the direction of considering mechanisms for the relation we study. Sprawling cities with many commuters and single family homes might have different pollution levels compared to dense cities with high-rise buildings and without much long-distance commuting.

Table 5 compares the effects of population density with those of total city population and additionally differentiates between population in the urban centre and in commuting zones. Interestingly, for both pollutants, the effect of population density is not significant, while total population positively affects pollution. The coefficients indicate that a 1% increase in total population increases PM2.5 exposure by 0.08 percent and NO2 exposure by 0.95 percent. Compared to raster-level results, the population coefficient is larger for PM2.5 and smaller for NO2. Hence, it seems that density per se does not drive higher pollution exposure; rather, the relation seems to be driven by the way that large populations are organized spatially within cities.

Note: Scatter plots of the country-specific population density effect on pollution exposure correlated with different World Bank indicators as specified by each subtitle. The coefficients are obtained from regressions for each country separately, controlling for trade variables (river, lake, coastline within 25km and continuous distance to coast measure), agricultural ones (biome indicators, land suitability for agriculture), weather (wind speed, temperature, precipitation) as well as ruggedness, latitude, log(GDP), and an indicator for a dirty power plant nearby. Outliers (2.5% highest and lowest coefficients) are excluded from the graphical representation.

Note: The table presents coefficients of OLS regressions with FUA as units of observations. All estimations include the following control variables: Trade controls (river, lake, coastline within 25km), land suitability for agriculture, temperature, precipitation as well as ruggedness, latitude, log(GDP), and country-fixed effects (Country FE). Standard errors are robust. t statistics are in parentheses. Statistical significance indicators: * p < 0.05, ** p < 0.01, *** p < 0.001.
To elaborate on this theme, in columns (3) and (6), we distinguish between core city and commuting population. For PM 2.5 , we find that the coefficient of core city population is insignificant, while that on commuting population is positive and significant. For NO 2 , both are significant, but the coefficient on commuting population is about three times as large. Consequently, it seems that large cities per se are not more polluted than smaller ones. Rather, pollution seems to be significantly higher in cities with a large fraction of people commuting into the city from satellite cities. These findings thus shed more light on the link between density, population distribution and pollution exposure. It seems that, when we only look at cities, large and dense development does not need to be bad for the environment. This may be due to the fact that these urban features promote the use of clean public transport and energy efficient buildings, which may partly offset the increased pollution exposure stemming from a high concentration of polluting activities. 41 Figures 9 and 10 again show the distribution of coefficients by continents and income groups. For PM 2.5 , there is no clear trend by income. For NO 2 , however, the population effect is strongest among upper middle and high income countries. The upper panel of Fig. 10 shows that for PM 2.5 , there are no pronounced differences in the density effect between continents. For NO 2 , the population and density effects are lowest in Africa and Asia. The effect of commuting population is also lowest in Africa.

City level outcomes and country characteristics.
Just as for the raster-level results, we repeat the exercise of relating the country-specific population-pollution coefficients to country characteristics, such as urbanization rates and income. We again present scatter plots, where each point shows the country-specific coefficient of population on pollution exposure. Figure 11 shows the plots. The correlations for NO 2 again seem to be stronger than for PM 2.5 . We find that agglomeration seems to affect pollution more the higher the urbanization rate and the share of population living in large agglomerations of more than 1 million people.

Within city regressions.
Tab. 6 shows the results of within-city regressions, that is, we compare raster cells that lie within the same FUA. Since we control for city fixed effects, all FUA-level variables are absorbed by them, so there are no additional control variables. The table shows that the population effect for PM2.5 is about the same as in the baseline raster results. For NO2, however, the coefficient is much lower. This may be due to the smaller variation within cities, or the fact that densely populated raster cells tend to lie in polluted cities. Beyond that, the smaller variation within cities for NO2 is somewhat mechanically caused by the larger size of the grid cells. Figure 12 shows differences of results by income groups and by continents. Within cities, the pollution-population gradient steadily increases with income. Hence, different from what we observed before, higher income countries seem to have especially strong pollution in densely populated areas within cities. Looking at differences by continents, American cities exhibit the highest and African cities the lowest pollution-population gradient.

Note: Coefficients of different population measures on pollution exposure by subgroups. The coefficients are obtained from regressions for each subgroup separately, controlling for trade variables (river, lake, coastline within 25km and continuous distance to coast measure), agricultural ones (biome indicators, land suitability for agriculture), weather (wind speed, temperature, precipitation) as well as ruggedness, latitude, log(GDP), and an indicator for a dirty power plant nearby.

Note: Scatter plots of the country-specific population density effect on pollution exposure correlated with different World Bank indicators as specified by each subtitle. The coefficients are obtained from regressions for each country separately, controlling for trade variables (river, lake, coastline within 25km and continuous distance to coast measure), agricultural ones (biome indicators, land suitability for agriculture), weather (wind speed, temperature, precipitation) as well as ruggedness, latitude, log(GDP), and an indicator for a dirty power plant nearby. Outliers (2.5% highest and lowest coefficients) are excluded from the graphical representation.

Spatial First Differences.
Tab. A.8 in the Appendix shows the results from spatial first differences (SFD) regressions. Again, we find that for both pollutants, the results remain positive and significant. This result is reassuring. The SFD estimate differences out any unobserved heterogeneity that is common to neighbouring cells. The result thus lends further credence to the relationship between density and pollution exposure at a local level.
However, the effect of density on pollution is much smaller when we consider differences between neighbouring cells than when we compare the entire sample, especially for NO 2 . Intuitively, the variation in pollution exposure and density is much smaller between neighbouring cells than in the entire sample, which likely explains the smaller effects.

Counterfactual simulation
In order to quantify the effects of the population exposure relation by country, we present results from a counterfactual simulation in this section, where we compute the countryspecific change in exposure from equalizing population across all cities in a country. We do this counterfactual with the FUA sample. We thus assume that a country's population is given by its population living in FUAs. The counterfactual then answers the question: what would be the effect on total exposure if all cities had the same population? Let country S have M S cities with population N i and total population N S = M S j=1 n i . We want to compute the effect of redistributing population equally among cities within a country. Consequently, all cities have identical counterfactual population N ′ i = N S /M S .
From our estimates, we predict current total exposure for city $i$ as $\tilde{E}_i = N_i^{1+\rho}$, where $\rho$ is the estimated exposure-population elasticity. Total exposure is then $E = \sum_j \tilde{E}_j$. 42 Now consider a counterfactual where city $i$'s population is changed to $N'_i = N_S / M_S$, so all cities are equally large. The counterfactual exposure in city $i$ is $E'_i = \tilde{E}_i \hat{N}_i^{1+\rho}$, where $\hat{N}_i = N'_i / N_i$ is the proportional change in population. Total counterfactual exposure in the country is $E' = \sum_j E'_j$. We can then compute the percentage change in exposure, $\Delta E = \hat{E} - 1 = E'/E - 1$.
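The calculation can be illustrated with a minimal sketch. It assumes that predicted city exposure follows the power law N^(1+ρ) at both the observed and the counterfactual population, so that each equalized city contributes (N_S/M_S)^(1+ρ). The function name, the city populations and the values of ρ are hypothetical and are not taken from our estimates.

```python
import numpy as np


def counterfactual_exposure_change(populations, rho):
    """Return Delta E = E'/E - 1 when population is equalized across cities.

    populations : city populations N_i of one country (any positive units)
    rho         : exposure-population elasticity (illustrative value, not an estimate)
    """
    n = np.asarray(populations, dtype=float)
    baseline = np.sum(n ** (1.0 + rho))                  # E  = sum_i N_i^(1+rho)
    equal_pop = n.sum() / n.size                         # N'_i = N_S / M_S
    counterfactual = n.size * equal_pop ** (1.0 + rho)   # E' = M_S * (N_S/M_S)^(1+rho)
    return counterfactual / baseline - 1.0


# Hypothetical country with one large and two small cities (made-up populations).
cities = [8.0, 1.0, 1.0]
for rho in (0.2, 0.0, -0.2):
    change = counterfactual_exposure_change(cities, rho)
    print(f"rho = {rho:+.1f}: Delta E = {change:+.4f}")
```

In this sketch the change is negative for a positive elasticity, zero (up to rounding) for ρ = 0, and positive for a negative elasticity. The change is also zero whenever cities are already equally large.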
We estimate $\rho$ for all countries with at least 15 FUAs in our sample. Tab. 7 (PM2.5) and Tab. 8 (NO2) report the estimates of $\rho$, total population, the number of cities and the counterfactual change in exposure, $\Delta E$, for the 10 countries with the smallest and largest change.
Since total exposure is convex in population if and only if $\rho > 0$, population smoothing reduces total exposure in all countries with a positive estimated population elasticity. It is apparent from Tab. 7 and 8 that the countries with the largest percentage drop in total exposure are those with the largest population elasticity. There are, however, some differences in the composition of countries. For PM2.5, we see the largest drops in total exposure in countries in East Asia/Pacific and Sub-Saharan Africa, plus two in Latin America/Caribbean. The countries with the largest increase in exposure (where the population elasticity is negative) tend to be lower-income countries in Latin America/Caribbean and Sub-Saharan Africa, plus two in Middle East/North Africa.
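The convexity argument can be spelled out with Jensen's inequality. The following derivation is a sketch under the assumptions that city exposure is $N_i^{1+\rho}$ with $N_i > 0$ and that $\rho > -1$; the function $f$ is introduced here only for illustration.

```latex
% Assumption: city exposure is f(N_i) = N_i^{1+\rho}, with N_i > 0 and \rho > -1.
\[
  f(N) = N^{1+\rho}, \qquad
  f''(N) = \rho\,(1+\rho)\,N^{\rho-1} > 0 \iff \rho > 0 .
\]
% Jensen's inequality for convex f (\rho > 0):
\[
  \frac{1}{M_S}\sum_{i=1}^{M_S} N_i^{1+\rho} \;\ge\; \left(\frac{N_S}{M_S}\right)^{1+\rho}
  \quad\Longrightarrow\quad
  E' = M_S \left(\frac{N_S}{M_S}\right)^{1+\rho} \;\le\; \sum_{i=1}^{M_S} N_i^{1+\rho} = E .
\]
% Hence \Delta E = E'/E - 1 \le 0, with equality only if all N_i are equal.
% For -1 < \rho < 0, f is concave, the inequality reverses, and \Delta E \ge 0.
```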
For NO2, the countries with the largest percentage drop in total exposure tend to be in East Asia/Pacific and Latin America/Caribbean, while the ones with increases in exposure are almost all in Sub-Saharan Africa, with the exception of North Korea.
Note: $\rho$ is the estimated within-country population elasticity of exposure. $\Delta E$ is the percentage change in exposure in the counterfactual relative to the baseline.

Conclusion
This paper has studied the effect of population and population density on pollution exposure using worldwide gridded data. We find that population density increases exposure. Using city-level data, we find that population size, rather than density, increases exposure. Further, the reason seems not to be a large core city population, but rather a large population commuting into the core city. Lastly, we find positive but smaller effects of density on pollution exposure at more local levels.
We also document heterogeneities of the density effects across countries. Using the entire rasterized data as observational units, the influence of population seems largest in Asia and in middle-income countries. In the FUA sample, population affects pollution most in upper middle and high income countries as well as in Europe and North America.
Finally, we study how reallocating population among cities within countries affects exposure. For most countries, exposure would fall if population were equalized across cities, since total city exposure is convex in city population. This allows us to connect the paper to the literature on optimal city size (Borck and Tabuchi, 2019). Using country-specific exposure-population elasticities, we could in principle study how the distribution of optimal city size is determined by the trade-off between agglomeration benefits and the agglomeration costs stemming from the increase in exposure.

C Spatial First Differences
The SFD estimation considers only differences between neighbouring cells. The crucial assumption in this context is that pollution $y$ in adjacent grid cells $i$ and $i-1$ would be equal if both had the same population density $x_{i-1}$. This is known as the "Local Conditional Independence Assumption" (LCIA). We regress the difference in pollution, $\Delta y_i$, on the difference in population density, $\Delta x_i$, where $\Delta$ is the difference operator between neighbouring cells. To make sure that the LCIA holds, we add country fixed effects $\theta_c$ to our regressions. In a nutshell, we then compare neighbouring cells within countries.
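The differencing logic can be illustrated with a minimal sketch on synthetic data. It is not our estimation code: it omits country fixed effects, control variables and clustered standard errors, and it fits a slope through the origin for simplicity. The function name, grid size, confounder and true slope of 0.5 are all made up for the illustration.

```python
import numpy as np


def sfd_slope(y, x, axis):
    """OLS slope (through the origin) of differenced y on differenced x.

    y, x : 2-D arrays on the same grid (rows run north-south, columns east-west)
    axis : 1 for horizontal (east-west) neighbours, 0 for vertical (north-south)
    """
    dy = np.diff(y, axis=axis).ravel()   # y_i - y_{i-1} along the chosen direction
    dx = np.diff(x, axis=axis).ravel()   # x_i - x_{i-1} along the chosen direction
    return float(dx @ dy / (dx @ dx))


# Synthetic grid: an unobserved confounder that is constant within each row,
# hence exactly common to east-west neighbours (an idealized assumption).
rng = np.random.default_rng(0)
rows, cols = 20, 30
confounder = rng.normal(size=(rows, 1)) * np.ones((1, cols))
density = rng.normal(size=(rows, cols)) + confounder   # density correlated with confounder
pollution = 0.5 * density + 3.0 * confounder           # true slope 0.5 (made up)

beta_ew = sfd_slope(pollution, density, axis=1)  # confounder differences out
beta_ns = sfd_slope(pollution, density, axis=0)  # confounder varies across rows, so it remains
print(beta_ew, beta_ns)
```

In this idealized example the east-west estimate recovers the true slope because the confounder is identical for horizontal neighbours, while the north-south estimate remains contaminated. In real data, unobservables are only approximately common to neighbouring cells.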
Since cells have neighbours in both the North-South and the East-West direction, we always run two separate regressions: one for horizontal neighbours (East-West) and one for vertical neighbours (North-South). We consider the SFD results a lower bound on the density effects, since they do not account for many potentially important differences between non-neighbouring cells.
Note: The table presents coefficients of spatial first differences regressions including different sets of control variables. Coefficients for all control variables are provided in the table. Standard errors (in parentheses) are clustered within three-by-three squares of grid cells times year. Statistical significance indicators: * p < 0.05, ** p < 0.01, *** p < 0.001.