Geostatistics

A very brief introduction to geostatistics

Geostatistics was initially used by geologists and geological surveyors who were interested in mapping geological phenomena across landscapes. It is now used across a broad range of disciplines and for a wide variety of data types. Essentially, geostatistics is a subfield of statistics that deals with spatial data and their distributions. (Spatial data are data that include spatial coordinates such as longitude and latitude. For example, a demographer might record numbers of deaths at specific locations on a map.)

Probably the earliest use of geostatistics was in mining (Krige, 1951). For example, a geological surveyor could have information about the quality or amount of an ore that had been sampled at different locations and might want to estimate the quality or amount in between survey sites.

Take Figure 1 as an example from a fictional dataset in which demographers might be interested. Circles indicate villages; the size of the circle corresponds to the fertility rate in that village. You can see that the fertility rates in villages A and B are quite high and that those villages are close together. We might therefore expect that any unsampled villages in between villages A and B have high fertility rates as well. Conversely, villages C and D are far apart. While both appear to have low fertility rates, we should probably be less confident in making fertility estimates in between villages C and D. Geostatistics provides a formal way of estimating what is going on in between these villages. The same basic techniques can be applied to events or factors such as crimes, deaths, births, disease contractions, exposures, or even subgroups of a population.

Geospatial data are frequently expressed as point data; however, unlike point pattern analysis, the points and their clusters are not necessarily the primary focus. Typically, the goal is to estimate the occurrence or prevalence of some phenomenon (event of interest) at a point where data have not actually been collected or to estimate an average occurrence across the landscape, sampling at arbitrarily drawn sites. While the field of geostatistics includes a wide and growing variety of methods, uses, and intentions, it can be divided into several overarching themes.

Kriging
Kriging includes methods that deal with interpolation between spatial data points. Interpolation is a method for estimating data values in between points with known data values. Going back to Figure 1, we might want to interpolate data in between villages A and B or villages C and D. Kriging is a frequently used technique for interpolation in geostatistics (Matheron, 1963). Basically, kriging is a linear, least-squares method of interpolation; it is closely related to linear regression analyses. It estimates the value at an unsampled location as a weighted combination of the values at sampled points, choosing the weights to minimize the expected error in the estimate. Such methods always include assumptions; perhaps the most common is stationarity, meaning that the statistical behavior of the process being measured (such as its mean and the way similarity falls off with distance) does not change across the landscape. For example, we might assume that whatever is making the fertility rate high in villages A and B is also present in the villages (not shown) between villages A and B.
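To make the idea of weighted interpolation concrete, here is a minimal sketch of ordinary kriging, one common variant, written in Python with NumPy. Everything specific in it is an assumption made for illustration: the village coordinates and fertility values are invented to mimic Figure 1, the function names are hypothetical, and the exponential semivariogram with its sill and range is simply assumed rather than fitted to data.

```python
import numpy as np

def exponential_semivariogram(h, sill=1.0, range_param=3.0):
    """Assumed model: dissimilarity is 0 at distance 0 and levels off near `sill`."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_param))

def ordinary_kriging(coords, values, target, semivariogram=exponential_semivariogram):
    """Estimate the value at `target` from sampled `coords` and `values`.

    Returns (estimate, kriging variance, weights).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Distances between every pair of sampled sites.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Kriging system: semivariances between sites, bordered by ones so that
    # the weights are forced to sum to 1 (the extra unknown is a Lagrange multiplier).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = semivariogram(dists)
    A[n, n] = 0.0

    # Right-hand side: semivariances between each site and the target.
    b = np.ones(n + 1)
    b[:n] = semivariogram(np.linalg.norm(coords - target, axis=1))

    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]

    estimate = weights @ values
    variance = weights @ b[:n] + lagrange  # expected squared error of the estimate
    return estimate, variance, weights

# Invented data: A and B are close together with high fertility,
# C and D are far apart with low fertility.
village_coords = [[0.0, 0.0], [1.0, 0.0], [8.0, 0.0], [8.0, 9.0]]
fertility = [5.8, 6.1, 2.4, 2.2]

between_ab = ordinary_kriging(village_coords, fertility, [0.5, 0.0])
between_cd = ordinary_kriging(village_coords, fertility, [8.0, 4.5])
print("between A and B: estimate %.2f, variance %.3f" % between_ab[:2])
print("between C and D: estimate %.2f, variance %.3f" % between_cd[:2])
```

Under these assumptions, the kriging variance comes out smaller between A and B than between C and D, which is the formal counterpart of being less confident about estimates between distant villages. In practice, the semivariogram would be estimated from the data rather than assumed.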

Variograms (or Semivariograms)
Another major theme in geostatistics deals with spatial correlations and/or spatial continuity. The variogram is a quantitative measure of how dissimilar data values become as the distance between the locations at which they were measured increases. Sometimes variograms are referred to as semivariograms. Although technically there is a difference between the two, the terms are frequently used interchangeably. Strictly speaking, the variogram at a given distance is the expected squared difference between values measured that far apart, and the semivariogram is half of that quantity. Essentially, both measure how the relationship between data values changes with the geographical proximity of the spatial points at which the data are measured. If you had a dataset that includes many spatial data points, you could estimate either one by taking all possible pairs of points, grouping the pairs by the distance separating them, and averaging the squared differences within each group. While such measurements are sometimes of interest by themselves, variograms and semivariograms are also used to calculate weighting schemes in kriging.
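As a rough illustration of how such a measure might be computed from data, the sketch below estimates an empirical semivariogram in Python with NumPy: it takes every pair of points, groups the pairs into distance bins, and reports half the average squared difference in each bin. The function name, the bin edges, and the simulated dataset are all assumptions made for this example; they do not come from any real survey.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Half the mean squared difference between paired values, per distance bin.

    Returns (bin centers, semivariance per bin, number of pairs per bin).
    Bins with no pairs get NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)

    # Index every unordered pair of points exactly once.
    i, j = np.triu_indices(len(values), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2

    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (dists >= bin_edges[k]) & (dists < bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diffs[in_bin].mean()

    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, gamma, counts

# Simulated example: 50 random locations whose values drift from west to east,
# so that points farther apart tend to differ more.
rng = np.random.default_rng(0)
sim_coords = rng.uniform(0.0, 10.0, size=(50, 2))
sim_values = 0.5 * sim_coords[:, 0] + rng.normal(0.0, 0.3, size=50)

centers, gamma, counts = empirical_semivariogram(
    sim_coords, sim_values, bin_edges=np.linspace(0.0, 10.0, 6)
)
for c, g, n in zip(centers, gamma, counts):
    print(f"distance ~{c:4.1f}: semivariance {g:6.3f} from {n} pairs")
```

Doubling each semivariance gives the corresponding variogram value. The choice of bin width is a judgment call: narrow bins give finer detail but fewer pairs per bin, so the estimates become noisier.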

Following Tobler’s first law of geography, things that are closer together tend to be more alike (Tobler, 1970). Conversely, we might expect that things that are farther apart are more likely to be different. For example, fertility rates in nearby villages may be more alike than fertility rates in villages that are very distant from each other. Using a variogram, we can see how quickly things begin to differ as we look at pairs of data points that are at greater distances from each other. Figure 2 is a hypothetical variogram.

 

The x-axis shows distance between two points and the y-axis shows a measure of variance or degree of difference between two points. You can see in this variogram that things that are at the exact same place (point 0 on the x-axis) have zero variance; they are exactly the same. On the other hand, things that are at a distance of 10 units from each other have a high degree of difference, reaching about 3.5 in our measurement. An interesting aspect of this variogram is that variation increases quickly over the first few units of distance but begins to level off as distance increases further. Another situation might be one where things become increasingly different with distance in a linear relationship. For example, if there were a 1-unit increase in variance for every 1-unit increase in distance between points, the line in the variogram would be straight instead of curved. Therefore, by using a variogram, we can see whether relationships between clustered phenomena change very quickly or very slowly as we zoom out from the original unit of observation.
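The two shapes described above can be compared numerically with a short Python sketch. The exponential curve is only one of several functions that rise quickly and then level off, and its parameters here (a plateau of 3.5 and a range parameter of 2.5) are assumptions chosen so that the curve reaches roughly 3.5 at a distance of 10 units, as in the hypothetical variogram; the function names are likewise made up for illustration.

```python
import numpy as np

def exponential_model(h, sill=3.5, range_param=2.5):
    """Curved shape: rises quickly at short distances, then levels off near `sill`."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_param))

def linear_model(h, slope=1.0):
    """Straight-line shape: variance grows by `slope` for every unit of distance."""
    return slope * np.asarray(h, dtype=float)

distances = np.arange(0, 11)
curved = exponential_model(distances)
straight = linear_model(distances)

for h, c, s in zip(distances, curved, straight):
    print(f"distance {int(h):2d}: levelling-off model {c:5.2f}, linear model {s:5.2f}")
```

In the curved model each additional unit of distance adds less variance than the one before, whereas in the linear model every unit of distance adds the same amount.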

Conclusion
Geostatistics offers a standardized means of understanding the distribution of various phenomena, from mineral deposits to crimes to demographic variables, across landscapes. Because it was born from geology, much of its focus has been on modeling and understanding things that are important to geologists. However, there has been a move in several other disciplines to use the powerful methods offered by geostatistics.

Other Resources:
Keep in mind that much of the literature on geostatistics, especially older literature, will be in the field of geology. This certainly doesn’t mean that the methods won’t apply to some demographic or other social phenomena! The following should get you started:

Diggle, P.J. and P.J. Ribeiro, Jr. 2007. Model-based Geostatistics. Springer.

Stein, M.L. 1999. Interpolation of Spatial Data: Some Theory for Kriging. Springer.

The following book has a section devoted to geostatistics, including several nice examples using the software R:

Bivand, R.S., E.J. Pebesma, and V. Gómez-Rubio. 2008. Applied Spatial Data Analysis with R. Springer.

 

Written By:

Daniel Parker
Ph.D. Student, Anthropology and Demography
The Pennsylvania State University