Duranton and Overman (2005) Replication

Table of Contents
1. Overview
2. Objective
3. Data and Simulation Design
4. Methodology
5. Results and Visualization
6. Formal Algorithmic Test
7. Results and Conclusion
8. Reference
9. Author Notes
10. Dependencies

1. Overview

This project replicates and illustrates the methodology of Duranton and Overman (2005), “Testing for Localization Using Micro-Geographic Data” (Review of Economic Studies, 72(4): 1077–1106).
Their paper introduces a distance-based test for localization, which determines whether firms in a specific industry are more spatially clustered or dispersed than would be expected under random placement.

In this replication, we simulate both unbiased (random) and biased (clustered) firm location data to test whether the Duranton–Overman methodology can correctly detect agglomeration when it is present.
By comparing pairwise distance distributions of randomly distributed firms against intentionally clustered industries, the project demonstrates how the method identifies significant deviations from spatial randomness.

To do so, the replication builds a Monte Carlo counterfactual benchmark representing the expected distribution of inter-firm distances under complete spatial randomness. Observed industries are then compared against this benchmark to test whether their spatial patterns reflect genuine clustering rather than random variation.

2. Objective

The replication demonstrates how to:

Simulate unbiased (uniformly distributed) firm locations as a control sample.
Simulate biased (spatially clustered) firm locations to represent agglomeration.
Construct a Monte Carlo counterfactual for random spatial distributions.
Bootstrap density estimates to generate 95% confidence bands for expected randomness.
Test whether Duranton & Overman’s localization method successfully detects clustering in biased data while not flagging unbiased data.

3. Data and Simulation Design

Parameter	Description
Population	1,000 firms uniformly distributed across a 2D grid ([0, 1] × [0, 1])
Industry A	100 firms drawn from the uniform grid (expected to behave randomly)
Industry B	100 firms drawn from $$X \sim \mathcal{N}(\mu = 0.2,\ \sigma^2 = 0.25)$$ to represent biased data (clustering)
Monte Carlo Simulations	10,000 random draws of 100 firms each to form the counterfactual

All data are generated synthetically; no external datasets are required.
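
A minimal sketch of how the samples in the table could be generated in base R. The seed, the object names, and the choice to draw x and y independently are assumptions of this sketch, not details confirmed by the source. The table's σ² = 0.25 is taken at face value, which corresponds to a standard deviation of 0.5 in `rnorm()`; if the original analysis passed 0.25 as the standard deviation, Industry B would be correspondingly tighter.

```r
set.seed(2005)  # arbitrary seed, used only to make the sketch reproducible

n_population <- 1000
n_industry   <- 100

# Population: 1,000 firm locations drawn uniformly on the unit square
population <- data.frame(
  x = runif(n_population),
  y = runif(n_population)
)

# Industry A: 100 firms sampled without replacement from the population
industry_a <- population[sample(n_population, n_industry), ]

# Industry B: 100 firms with coordinates drawn from N(0.2, 0.25).
# rnorm() expects a standard deviation, so sd = sqrt(0.25) = 0.5.
# Assumption: x and y are independent and are not clipped to [0, 1].
industry_b <- data.frame(
  x = rnorm(n_industry, mean = 0.2, sd = sqrt(0.25)),
  y = rnorm(n_industry, mean = 0.2, sd = sqrt(0.25))
)
```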

4. Methodology

4.1 Concept

The method replicates Duranton and Overman’s approach by comparing pairwise inter-firm distance distributions:

Randomly draw sets of firm locations to simulate spatial randomness.
Compute the density of inter-firm distances for each draw.
Aggregate simulations to construct an expected “random” distribution.
Calculate 95% confidence intervals (bands) for the random case.
Compare observed industry distance densities to these bands.
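
The second step above, computing the density of inter-firm distances for one draw, might look like the sketch below. The helper name `distance_density`, the grid limits, and the use of R's default Gaussian kernel and bandwidth are assumptions made for illustration; the source does not state which kernel settings it uses.

```r
set.seed(1)

# Hypothetical helper: kernel density of all pairwise Euclidean distances,
# evaluated on a fixed grid so that curves from different samples line up.
distance_density <- function(points, grid_from = 0, grid_to = sqrt(2),
                             n_grid = 512) {
  d <- as.numeric(dist(points))  # n * (n - 1) / 2 unique pairs
  density(d, from = grid_from, to = grid_to, n = n_grid)
}

firms <- data.frame(x = runif(100), y = runif(100))
dens  <- distance_density(firms)

length(as.numeric(dist(firms)))  # 4950 pairs for 100 firms
range(dens$x)                    # grid runs from 0 to sqrt(2), the unit-square diagonal
```

Evaluating every sample on the same grid is what later allows a pointwise comparison between an observed curve and the simulated band.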

4.2 Monte Carlo Simulation Framework

In this replication, the Monte Carlo simulation serves as the random baseline for comparison against the observed industries.
Each iteration randomly selects 100 firms out of 1,000 possible locations within a uniform 2D grid.
This process is repeated 10,000 times, and in each simulation all pairwise distances between the sampled firms are computed.

This approach models the inherent randomness of spatial spacing — what inter-firm distances would look like if there were no clustering forces at all.
By aggregating these simulated distance distributions, we obtain a smooth estimate of the expected distance density under spatial randomness.

The resulting distribution forms the Monte Carlo confidence band, which represents the natural variability of random spatial configurations.
Industries whose observed distance distributions fall outside this band are interpreted as exhibiting non-random spatial behavior — specifically,

Higher density at short distances → spatial clustering or agglomeration, and
Lower density at short distances → spatial dispersion.

This setup provides a clear benchmark for testing whether Duranton and Overman’s localization method can correctly identify biased (clustered) data while not flagging unbiased (random) data as significant.
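
The framework described in this section can be sketched in base R as follows. This is an illustration under stated assumptions, not the replication's actual code: the number of simulations is reduced from 10,000 to 500 so the example runs quickly, a single bandwidth is fixed for all draws, and the band is taken as the pointwise 2.5% and 97.5% quantiles across simulations. The data frame is named `CI_df` with `lower` and `upper` columns to mirror the names used in the test code of this README.

```r
set.seed(42)

n_sims <- 500  # the text uses 10,000; reduced here for speed
grid   <- seq(0, sqrt(2), length.out = 256)

population <- cbind(x = runif(1000), y = runif(1000))

# Assumption: one bandwidth shared by every draw, so all curves are
# smoothed identically. The source does not state its bandwidth.
bw <- bw.nrd0(as.numeric(dist(population[sample(1000, 100), ])))

# One row per simulation, one column per grid distance
sim_dens <- t(replicate(n_sims, {
  draw <- population[sample(nrow(population), 100), ]
  density(as.numeric(dist(draw)), bw = bw,
          from = min(grid), to = max(grid), n = length(grid))$y
}))

# Pointwise 95% band and mean curve across simulations
CI_df <- data.frame(
  distance = grid,
  lower    = apply(sim_dens, 2, quantile, probs = 0.025),
  mean     = colMeans(sim_dens),
  upper    = apply(sim_dens, 2, quantile, probs = 0.975)
)
```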

5. Results and Visualization

Unbiased (Random) Industry Comparison

The first visualization compares Industry A (uniformly distributed) with the Monte Carlo baseline.
The gray ribbon shows the 95% confidence interval, the blue line represents the random benchmark, and the red line represents the observed industry.

Interpretation:
Industry A’s curve remains within the confidence band, suggesting no spatial localization beyond inherent randomness.
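
A plot with the layout described above (gray ribbon, blue benchmark, red observed curve) could be produced with ggplot2 roughly as follows. The data here are regenerated on the spot with a small number of simulations, and the fixed bandwidth of 0.05, the helper `dens_on_grid`, and the column names are assumptions of this sketch rather than the settings behind the original figure.

```r
library(ggplot2)
set.seed(7)

grid <- seq(0, sqrt(2), length.out = 200)

# Hypothetical helper: pairwise-distance density on the shared grid
dens_on_grid <- function(pts, bw = 0.05) {
  density(as.numeric(dist(pts)), bw = bw,
          from = min(grid), to = max(grid), n = length(grid))$y
}

population <- cbind(runif(1000), runif(1000))

# Columns are simulations, rows are grid distances
sims <- replicate(300, dens_on_grid(population[sample(1000, 100), ]))

plot_df <- data.frame(
  distance  = grid,
  lower     = apply(sims, 1, quantile, probs = 0.025),
  upper     = apply(sims, 1, quantile, probs = 0.975),
  benchmark = rowMeans(sims),
  # Stand-in for Industry A: one more uniform draw from the population
  observed  = dens_on_grid(population[sample(1000, 100), ])
)

p <- ggplot(plot_df, aes(x = distance)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey70", alpha = 0.6) +
  geom_line(aes(y = benchmark), colour = "blue") +
  geom_line(aes(y = observed), colour = "red") +
  labs(x = "Inter-firm distance", y = "Density")

print(p)
```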

Biased (Clustered) Industry Comparison

The second visualization introduces Industry B, drawn from a clustered normal distribution, and compares both industries to the Monte Carlo benchmark.

Interpretation:
The curve for Industry B exceeds the upper confidence band at short distances, revealing higher-than-expected density among nearby firms.
The formal algorithmic test confirms this as a statistically significant deviation from randomness, consistent with spatial agglomeration.

6. Formal Algorithmic Test

To complement the visual comparison of distance densities, this replication creates a formal algorithmic test function.

For each distance value, the observed industry’s kernel density estimate is compared to the upper and lower bounds of the 95% Monte Carlo confidence band.
If any part of the observed curve exceeds the upper bound or falls below the lower bound, this indicates a significant deviation from spatial randomness.
If the entire curve remains within the bounds, we fail to reject the null hypothesis of spatial randomness.

By combining this logical test with the visualization, the replication verifies that the Duranton–Overman method correctly flags biased (clustered) data while recognizing unbiased (random) distributions as spatially neutral.

# Reject the null of spatial randomness if the tested industry's density curve
# leaves the 95% confidence band of the random benchmark at any distance.
# Assumes density_industry$y, CI_df$upper and CI_df$lower are evaluated on the
# same distance grid (both objects are created earlier in the analysis).

# Points where the observed density lies above the upper CI bound
outside_upper_a <- density_industry$y > CI_df$upper
# Points where the observed density lies below the lower CI bound
outside_lower_a <- density_industry$y < CI_df$lower

# Print a message depending on whether any point lies outside the band
if (any(outside_upper_a | outside_lower_a)) {
  print("Statistical deviation from inherent randomness")
} else {
  print("No statistical difference from randomness")
}
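
The same check can be wrapped in a reusable function. The sketch below is one possible shape for such a helper; the name `localization_test` and its return fields are hypothetical, and the inputs are toy numbers chosen only to show both outcomes, not results from the replication.

```r
# Hypothetical helper: compares an observed density curve with band limits
# evaluated at the same distances and reports where it leaves the band.
localization_test <- function(observed, lower, upper) {
  stopifnot(length(observed) == length(lower),
            length(observed) == length(upper))
  above <- observed > upper
  below <- observed < lower
  list(
    reject  = any(above | below),  # TRUE if the curve leaves the band anywhere
    n_above = sum(above),          # grid points above the upper bound
    n_below = sum(below)           # grid points below the lower bound
  )
}

# Toy band on a five-point grid (illustrative numbers only)
lower <- c(0.1, 0.4, 0.8, 0.4, 0.1)
upper <- c(0.3, 0.6, 1.0, 0.6, 0.3)

localization_test(c(0.2, 0.5, 0.9, 0.5, 0.2), lower, upper)$reject  # FALSE
localization_test(c(0.5, 0.5, 0.9, 0.5, 0.2), lower, upper)$reject  # TRUE
```

Returning the counts above and below the band separately makes it possible to tell apparent clustering from apparent dispersion instead of reporting a single flag.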

7. Results and Conclusion

The replication demonstrates that Duranton and Overman’s (2005) distance-based localization method performs as intended when applied to simulated data.
Using the Monte Carlo counterfactual, the test effectively distinguishes between industries that are spatially random and those that are intentionally clustered.

By simulating both unbiased and biased data, the project shows that the method can distinguish random spacing from genuine clustering, at least in a simplified, synthetic environment.

However, one limitation of this approach emerges at the extreme tails of the distance distribution.
Because the confidence bands are extremely narrow at these edges, the algorithmic test can occasionally misclassify random noise as statistical deviation, even when visual inspection clearly suggests that the pattern remains random. This sensitivity in the tails highlights the importance of interpreting the logical test in conjunction with the density plots, rather than relying solely on automated detection.
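
One way to act on this caveat, sketched here as an assumption rather than as part of the replication, is to apply the band check only within an interior window of distances. The function name `outside_band_trimmed` and the `keep_range` argument are hypothetical, and the numbers are toy values constructed so that the curve leaves the band only at the far tail.

```r
# Hypothetical variant of the band check that ignores the extreme tails.
# `keep_range` is an illustrative choice, not a value from the replication.
outside_band_trimmed <- function(distance, observed, lower, upper, keep_range) {
  keep <- distance >= keep_range[1] & distance <= keep_range[2]
  any(observed[keep] > upper[keep] | observed[keep] < lower[keep])
}

distance <- c(0.0, 0.3, 0.6, 0.9, 1.2, 1.4)
lower    <- c(0.00, 0.50, 0.90, 0.50, 0.05, 0.000)
upper    <- c(0.02, 0.80, 1.30, 0.80, 0.15, 0.001)
observed <- c(0.01, 0.60, 1.00, 0.60, 0.10, 0.002)  # outside only at 1.4

outside_band_trimmed(distance, observed, lower, upper, c(0, 1.4))  # TRUE
outside_band_trimmed(distance, observed, lower, upper, c(0, 1.2))  # FALSE
```

Any such window should be chosen before looking at the results, and the plots remain the reference for interpreting borderline cases.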

This work serves as a pedagogical replication, illustrating how spatial econometric methods can be validated through simulation.
It highlights the usefulness of Monte Carlo counterfactuals in modeling expected spatial randomness.

8. Reference

Duranton, G., & Overman, H. G. (2005). Testing for Localization Using Micro-Geographic Data. Review of Economic Studies, 72(4), 1077–1106. doi:10.1111/0034-6527.00362

9. Author Notes

This simplified replication captures the statistical and computational core of the Duranton–Overman test using fully synthetic data.
It is intended for methodological and pedagogical purposes rather than empirical analysis.

10. Dependencies

This replication was conducted in R (≥ 4.3) and uses the following packages:

dplyr
spatstat
tibble
ggplot2

All other operations use base R functions. No additional packages or external data sources are required.
