Point Pattern Analysis

Introduction:

Point pattern analysis is the evaluation of the pattern, or distribution, of a set of points on a surface. It can consider only the spatial or temporal locations of these points, or it can also include data attached to the points. It is one of the most fundamental concepts in geography and spatial analysis. Point pattern analysis is particularly appropriate for population science, as human statistics often come in point or single-event form. As in many spatial analysis techniques, distribution across space can also be conceived of and analyzed as distribution across time: ‘study area’ can be replaced with ‘study time’ and distance from a point in space can be replaced with distance from a point in time. The methods that can be applied to point pattern analysis range from the simple to the complex. There are many similarities between the statistics used for point pattern analysis and those used for ordinary numerical data, although point pattern analysis also includes some specialty techniques. In this article I will discuss some of the more common methods and how they can be applied to problems in population science.

Descriptive Statistics:

In its most basic form, we can think of point pattern analysis as an attempt to analyze the occurrence of points in a particular space. Often the first question asked is simply, how many points are there? For example, how many hospitals are there in a neighborhood of a city? How does that compare to a different neighborhood, or to the city as a whole? This is called studying the frequency, intensity, or abundance of the points. To answer this and some other basic questions we can use the simple descriptive statistics that we would use for a numerical dataset: Count, Mean, Median, and Standard Deviation. Applying these, we can describe how dense a pattern is, where the center of a set of points is, and how dispersed these points are.

Frequency and Density: Frequency is the most basic way of evaluating a spatial pattern and is simply a count of the number of points in your study area. To get density, you divide this count by the size of the study area or study period, in whatever units are most appropriate: hospitals per square mile, violent crimes per month, etc. Frequency and density should almost always be determined at the start of your analysis.
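
As a minimal sketch of this calculation in Python, the function below divides a count by an area (or a time span). The function name and the hospital figures are hypothetical and only serve as an illustration.

```python
def point_density(n_points, extent):
    """Return points per unit of area (or per unit of time)."""
    if extent <= 0:
        raise ValueError("extent must be positive")
    return n_points / extent

# Hypothetical example: 12 hospitals in a 4-square-mile neighborhood
hospitals_per_sq_mile = point_density(12, 4.0)
print(hospitals_per_sq_mile)  # 3.0
```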

Descriptive Spatial Statistics: Many of the standard descriptive statistics can be applied, or slightly modified, to describe spatial data. Mean center is simply the mean of the X coordinates and the mean of the Y coordinates for your set of points, giving you the middle of your point pattern. Median center is a slightly different way to calculate a middle (and is actually quite complicated to compute): it is the location that minimizes the total distance between itself and all the points in the pattern. This can be seen in Figure 1.

Figure 1
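
The two centers described above can be sketched in Python as follows. The mean center is a direct average. For the median center I use Weiszfeld's iterative method, which is one common way to approximate it; that choice, the function names, the tolerance, and the sample coordinates are my assumptions rather than part of any particular GIS package.

```python
import math

def mean_center(points):
    """Mean of the X coordinates and mean of the Y coordinates."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def total_distance(center, points):
    """Sum of straight-line distances from center to every point."""
    cx, cy = center
    return sum(math.hypot(px - cx, py - cy) for px, py in points)

def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the location minimizing total distance to all points."""
    x, y = mean_center(points)  # start the search at the mean center
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                continue  # simplification: skip a point we are sitting on
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y

# Hypothetical pattern: four points in a square plus one distant outlier
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10)]
print(mean_center(pts))    # pulled toward the outlier
print(median_center(pts))  # stays closer to the main group
```

In this made-up pattern the outlier drags the mean center away from the square, while the median center moves much less, which is the usual reason for reporting both.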

Standard Distance Deviation: This is the spatial equivalent of standard deviation. It summarizes how far the points lie from the mean center, and likewise gives you an idea of the spread of the dataset as a whole, or of how far a particular point deviates from the rest. Standard Deviational Ellipse is a modified version of standard distance that captures the shape of the distribution by showing any directional bias in the pattern, as seen in Figure 2.

Figure 2
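
A minimal Python sketch of standard distance is given below. It uses the commonly cited form, the square root of the mean squared distance of the points from the mean center; dividing by n rather than n - 1 is an assumption on my part, and some texts and software differ on this detail. The function name and sample coordinates are hypothetical.

```python
import math

def standard_distance(points):
    """Root of the mean squared distance from the mean center."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / n)

# Hypothetical pattern: the four corners of a 2 x 2 square
corners = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(standard_distance(corners))
```

Every corner lies the same distance from the center (1, 1), so the result is that distance, the square root of 2.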

Other Statistical Operations: These are some of the many spatial statistics that are analogous to statistics for numerical data sets. Just as one can examine the median or the distribution of a numerical data set, one can perform the same operations on spatial data. For example, a research question might be: how does the frequency of one pattern compare to that of another pattern, or of another area? If Detroit averages one supermarket per square mile, how does that compare to New York’s fifteen supermarkets per square mile? Once this measure of frequency is reduced to a number, the entire set of non-spatial statistical tests can be applied, such as a t-test comparing the frequencies of two study areas, or Kolmogorov-Smirnov (K-S) tests for comparing frequency distributions.

Dispersion & Arrangement:

Another set of questions in point pattern analysis concerns the relative pattern or arrangement of the points. Point patterns can be categorized as random, uniform, clustered or dispersed along the following two continuums:

•    Random vs. Uniform (stratified, regular)
•    Clustered vs. Dispersed

These two continuums are independent of each other: where a pattern falls on one tells you nothing about where it falls on the other. A pattern of points could therefore be randomly distributed in a clustered way (far right image, Figure 3), or stratified and dispersed (middle image, Figure 3). The difference between these types of distributions can be easily seen in our example, but there are also techniques to quantify the amount of stratification or clustering. This allows one to determine exactly how clustered something is, or to compare two sets of points. One can see applications to population science, for example being able to determine whether the incidence of a disease is randomly distributed across a city or clustered around a particular pollution source.

Figure 3

Complete Spatial Randomness: There are a number of techniques specifically designed for pattern analysis of point data. A concept that is common to these techniques is complete spatial randomness (CSR). Under CSR, points are placed independently and with equal probability anywhere in the study area (a homogeneous Poisson process), the spatial version of a random or Poisson distribution of values in a numerical data set. CSR serves as a baseline for analysis: actual data are often contrasted with the pattern that CSR would produce.
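
To make the baseline concrete, here is a small Python sketch that simulates a CSR pattern in a rectangular study area. It fixes the number of points in advance and places each one uniformly at random, which is a common simplification when comparing against an observed pattern of the same size; a full Poisson process would also let the count itself vary. The function name and parameters are hypothetical.

```python
import random

def simulate_csr(n, width, height, seed=None):
    """Place n points independently and uniformly in a width x height rectangle."""
    rng = random.Random(seed)  # a seed makes the simulation repeatable
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

# Hypothetical example: 100 random points in a 10 x 10 study area
baseline = simulate_csr(100, 10.0, 10.0, seed=42)
print(len(baseline))  # 100
```

Statistics computed on many such simulated patterns give a sense of the values one would expect if the observed points had no spatial structure at all.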

Quadrat Analysis: In quadrat analysis you divide your study area into subsections of equal size, count the number of points in each subsection, and then calculate the density of points in each subsection. See Figure 4 for an example of this method.

Figure 4

This can be performed for an entire data set, or can be conducted as a sampling technique to give one an idea about the total frequency or distribution of a larger area, as in Figure 5.

Figure 5

The subsections used are normally square, but can be of any shape depending on need. This method provides a study-specific, numerical measure of frequency. The counts for the individual cells also give a mean frequency of points per cell across the study area, as well as the ability to calculate the variance between subsections. These variance and mean frequency statistics are commonly combined to create a further statistic called the variance-to-mean ratio (VMR). This is a unit-less statistic describing the spatial arrangement of your points. In general, stratified (uniform) distributions have a VMR below 1, approaching 0 for a perfectly even pattern; random distributions have a VMR of about 1; and clustered distributions have a VMR above 1.
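
The quadrat count and the VMR can be sketched in Python as below. This is an illustration under several assumptions of mine: a rectangular study area with its origin at (0, 0), a regular grid of equal rectangular cells, and the sample variance (dividing by the number of cells minus one). The function names and test coordinates are hypothetical.

```python
from statistics import mean, variance

def quadrat_counts(points, width, height, nx, ny):
    """Count points in each cell of an nx-by-ny grid over the study area."""
    counts = [0] * (nx * ny)
    for x, y in points:
        # Points on the far edge are assigned to the last row or column
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def variance_mean_ratio(counts):
    """Variance of the cell counts divided by their mean."""
    m = mean(counts)
    if m == 0:
        raise ValueError("no points in any quadrat")
    return variance(counts) / m

# Hypothetical patterns in a 2 x 2 area split into four quadrats
even = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
clumped = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.4, 0.4)]
print(variance_mean_ratio(quadrat_counts(even, 2, 2, 2, 2)))     # 0.0
print(variance_mean_ratio(quadrat_counts(clumped, 2, 2, 2, 2)))  # 4.0
```

The evenly spread pattern puts one point in every cell, so the counts do not vary and the VMR is 0; the clumped pattern puts all four points in one cell, giving a VMR well above 1.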

Quadrat analysis is a simple, customizable, and valuable analysis tool. However, there are some criticisms of the method. Primarily, these arise from the ability to vary the size of the quadrats for each study: quadrat analysis can give different results for the same dataset when different quadrat sizes are used (a version of the Modifiable Areal Unit Problem). Also, the result of quadrat analysis is a single measure for the entire study area, so variations within the quadrats are disguised. This method must be applied in an appropriate manner for each study, and an understanding of the dataset is needed before performing this technique.

Nearest Neighbor: In this analysis the distance from each point to its nearest neighbor is measured, and the average nearest neighbor distance for all points is determined (Figure 6).

Figure 6

The results can be compared between two actual areas, or your study area can be compared to CSR. An index statistic called the Nearest Neighbor Index (NNI) can be created by dividing the observed average distance by the average distance expected under CSR, giving a unit-less measure of dispersion. Generally, random patterns have an NNI of about 1, clustered patterns have values below 1 (approaching 0 for very tight clusters), and dispersed patterns have values above 1, up to a maximum of about 2.15 for a perfectly regular pattern. An application of this technique might be studying how far apart hospital emergency rooms are from each other.
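
A rough Python sketch of the NNI follows. For the expected distance under CSR it uses the standard formula 0.5 / sqrt(n / A), where n is the number of points and A is the size of the study area. It applies no edge correction and compares every pair of points, so treat it as an illustration for small datasets; the function name and coordinates are hypothetical.

```python
import math

def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance divided by the CSR expectation."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        # Distance from point i to its closest other point
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)  # mean distance expected under CSR
    return observed / expected

# Hypothetical patterns in a study area of 4 square units
spread = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
tight = [(1.0, 1.0), (1.01, 1.0), (1.0, 1.01), (1.01, 1.01)]
print(nearest_neighbor_index(spread, 4.0))  # well above 1: dispersed
print(nearest_neighbor_index(tight, 4.0))   # close to 0: clustered
```

Note that the result depends on the area you supply, so the choice of study area boundary affects the index just as it affects density.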

Software Tools:

Most GIS software offers tools to conduct point pattern analysis. Two software tools built specifically for this purpose were created by Dr. Jared Aldstadt and others. A DOS/Linux software package called PPA is available for free download at this webpage: www.acsu.buffalo.edu/~geojared/tools.htm. The same team has also created a point pattern analysis web tool, hosted here: www.nku.edu/~longa/cgi-bin/cgi-tcl-examples/generic/ppa/ppa.cgi.

Conclusion:

These are some of the more common spatial analysis techniques used for point patterns. There are a number of refinements and extensions of these techniques, including Ripley’s K function, the G function, the F function, and Moran’s I. Point pattern analysis, interpolation, and visualization are powerful techniques for analyzing population data, and are easy to add to your arsenal. View the Additional Reading section for more advanced work on this topic.

Related Concepts:
  • Spatial Dependence: Defined by Michael Goodchild as “the propensity for nearby locations to influence each other and to possess similar attributes”, spatial dependence occurs when spatial data are not independent of each other, violating an underlying assumption of many statistics. Statistical measures of the similarity of nearby points are called measures of spatial autocorrelation. For more on this topic, see: Spatial Analysis and GIS: Spatial Dependence at CSSIS.org.
  • Edge Effects: Edge effects are a type of boundary problem, an issue particular to spatial analysis. They are artifacts produced by the necessity of drawing a limit around your chosen study area. For more on this topic, see: Boundary Problems at Wikipedia.org.
  • First-order and second-order effects in spatial analysis: First-order effects concern the way in which the expected value of the process varies across space, while second-order effects describe the correlation between values of the process at different regions in space. Region-wide trends are first-order effects while spatial dependence is a second-order effect.
  • Stationary: An adjective in spatial analysis which means there are no first-order effects for the particular object being studied.

Additional Reading:

Boots, B. N., and Arthur Getis. 1988. Point pattern analysis. Newbury Park, Calif: Sage Publications. “Boots and Getis provide a concise explanation of point pattern analysis – a series of techniques for identifying patterns of clustering or regularity in a set of geographical locations. They discuss quadrat and distance methods of measurement, and consider the problems associated with these methods. The authors also outline and compare other measures of arrangement, suggesting when these techniques should be used.” (Publishers Review)

Gatrell, Anthony C., Trevor C. Bailey, Peter J. Diggle, and Barry S. Rowlingson. 1996. Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British Geographers 21: 256-274.
An article length discussion of spatial point pattern analysis and how it can be applied to population science problems. A recommended first read for those looking for a more advanced understanding.

Fortin, Marie-Josée, Mark R. T. Dale, and Jay ver Hoef. 2002. Spatial analysis in ecology, in Encyclopedia of Environmetrics, Volume 4, edited by A. H. El-Shaarawi. pp 2051–2058, John Wiley and Sons.
An advanced article that discusses a number of spatial statistics, including nearest neighbor analysis, Ripley’s K, Moran’s I, and variograms. It provides equations for all of the above and more.

Fotheringham, A. Stewart, Chris Brunsdon, and Martin Charlton. 2000. Quantitative geography: perspectives on spatial data analysis. London: Sage Publications.
An excellent book on spatial data analysis, which includes a full chapter on point pattern analysis in which all of the above concepts are discussed in a more thorough manner.

Written By:
Daniel Ervin
PhD Student, Geography
University of California Santa Barbara
dervin at umail.ucsb.edu