Spatial Analysis in R

A very brief introduction to R

R, sometimes called GNU S, is a powerful statistical programming language, software environment, and graphics tool that is available free of charge. R is increasingly used for a broad range of analyses across many disciplines and substantive topics. Here we briefly introduce spatial analysis using R. The intention is not to offer an in-depth presentation, but to help the reader become familiar with the look and feel of the R environment with regard to spatial analysis. Arguably, one of the best aspects of using R is the widespread availability of help literature, both online and in print. If there is something you would like to know more about, you can search the internet for the type of analysis you are interested in plus the term “R cran.”

I will address three main components of spatial analysis:

(1) plotting maps

(2) exploratory spatial data analysis

(3) formal spatial modeling

First, a little about using the R console:
R is a command-line-driven statistical program. For example, after opening R, I could simply type the following into the console:

1 + 1

and after hitting the return key, R would return “2.” It is also possible to store the result of this addition under a name. For example, if I enter:

Add1 <- 1 + 1

R will store the result and will print it whenever I enter the name “Add1.”
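A short console session along these lines might look as follows. The comments show what R prints, and the name Add2 is introduced here purely for illustration.

```r
1 + 1              # R prints: [1] 2
Add1 <- 1 + 1      # store the result under a name; nothing is printed
Add1               # typing the name prints the stored value: [1] 2
Add2 <- Add1 * 10  # stored objects can be reused in later commands
Add2               # R prints: [1] 20
```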


Spatial data come in several different formats (lines, polygons, points, etc.). R is well equipped to handle many of them. To work with spatial data in R, the user will most likely need to install add-on packages that have been written for particular types of data. For example, the package “sp” provides classes for spatial data along with methods for viewing and plotting them. Here I am plotting data from a soil survey conducted along the Meuse River in the Netherlands (the “meuse” data set that ships with sp, which records heavy-metal concentrations in topsoil). This data set is commonly used for illustrating spatial analysis in R. First I plot the point data (these are the points from which samples were taken).
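A minimal sketch of how such a point plot might be produced, assuming the sp package is installed (install.packages("sp")). The plotting symbol and title are illustrative choices, not necessarily those used for the figure here.

```r
library(sp)                        # classes and plot methods for spatial data

data(meuse)                        # loads a plain data frame with x and y columns
coordinates(meuse) <- c("x", "y")  # promote it to a SpatialPointsDataFrame

plot(meuse, pch = 1)               # one open circle per sampling location
title("Meuse sample points")
```

Once the coordinates are set, plot() dispatches to the spatial method and draws the points with an equal-scale aspect ratio.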

[Figure: Meuse sample points]

These data points are clustered on the bank of the Meuse River, which I have plotted below in the form of polygon data:

We could also plot the survey area using grid data:

[Figure: Meuse survey area as grid data]

And perhaps the most useful thing to do is to visualize all of it together:

[Figure: Meuse grid, river polygon, and sample points combined]
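The polygon, grid, and combined plots might be produced along the following lines. This is a sketch that assumes the meuse, meuse.riv, and meuse.grid data sets bundled with sp; the object names river and grid_px and the colors are hypothetical choices made for this example.

```r
library(sp)

# Sample points
data(meuse)
coordinates(meuse) <- c("x", "y")

# River outline: a two-column coordinate matrix wrapped as a single polygon
data(meuse.riv)
river <- SpatialPolygons(list(Polygons(list(Polygon(meuse.riv)), ID = "meuse.riv")))

# Survey area: regularly spaced points promoted to a pixel grid
data(meuse.grid)
coordinates(meuse.grid) <- c("x", "y")
grid_px <- as(meuse.grid, "SpatialPixels")

# Draw the layers from back to front
image(grid_px, col = "lightgrey")
plot(river, col = "grey", add = TRUE)
plot(meuse, pch = 1, add = TRUE)
title("Grid, river, and sample points")
```

Each layer can also be drawn on its own by calling plot(river) or image(grid_px) without add = TRUE.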

A next step in spatial analysis might be to conduct some exploratory spatial data analysis. For example, I might want to check for spatial autocorrelation using a Moran’s I scatter plot, which plots each observation against the average value of its neighbors. Here I am using GeoXp, an interactive spatial analysis package, and a data set listing real estate prices from Baltimore, MD in 1978.

 

GeoXp allows the user to select data points and view where they are on the map interactively. Here it looks like we do have some positive spatial autocorrelation: high prices are clustered with high prices and low prices are clustered with low prices. Moran’s I scatter plots and statistics can be calculated in several other R packages as well.
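As one non-interactive alternative to GeoXp, the scatter plot and the Moran’s I statistic might be obtained with the spdep package. The sketch below assumes the copy of the Baltimore data that ships with the spData package, and it defines neighbors as the four nearest houses. That neighbor rule is an illustrative assumption and not necessarily the one behind the GeoXp plot described above.

```r
library(spdep)   # neighbors, spatial weights, Moran's I

data(baltimore, package = "spData")             # 1978 Baltimore house sales
coords <- as.matrix(baltimore[, c("X", "Y")])   # house locations

# Assumption: each house's neighbors are its 4 nearest houses
nb <- knn2nb(knearneigh(coords, k = 4))
lw <- nb2listw(nb, style = "W")                 # row-standardized weights

# Scatter plot of price against the average price of neighboring houses
moran.plot(baltimore$PRICE, lw,
           xlab = "Price", ylab = "Spatially lagged price")

# Global Moran's I, tested against the null of no spatial autocorrelation
mt <- moran.test(baltimore$PRICE, lw)
print(mt)
```

An upward-sloping cloud of points in the scatter plot corresponds to the clustering of similar prices described above.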

For many spatial analyses, the last step is to fit a model to the data. There are many types of spatial models that can be implemented in R. For the purpose of this introduction, I use the packages “sp,” “rgdal,” and “spdep,” as well as some leukemia data from New York State (http://rss.acs.unt.edu/Rdoc/library/spdep/html/NY_data.html). Trichloroethylene (TCE) is an industrial solvent that persists in the environment and has been linked to disease. Here I am using TCE industrial sites as spatial points and the proximity of populations to those sites as potential factors in leukemia prevalence.

First I map the study area (the eight-county region of central New York State covered by the NY8 data), showing the connectivity between neighboring census tracts that I use to create spatial weights.

 

Then I fit a simultaneous autoregressive (SAR) model to the data:

 

Next I decide to add a set of weights for population density within census tracts, obtaining the following results. Note that p-values have changed and that the AIC is smaller in the second model, indicating a better fit.
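The mapping and modeling steps might be sketched as follows. Several things here are assumptions rather than a record of the analysis above: the rgdal package has since been retired from CRAN, so the sketch reads the data with sf instead; spautolm() now lives in the spatialreg package rather than spdep; the shapefile path inside spData, the model formula, and the POP8 weights column follow the worked example in Bivand et al. (2008) and may differ from what produced the results reported here.

```r
library(sf)          # reads the shapefile (the role rgdal played in older code)
library(spdep)       # neighbors and spatial weights
library(spatialreg)  # spautolm()

# Assumption: spData ships the NY8 census tracts at this path
shp <- system.file("shapes/NY8_utm18.shp", package = "spData")
stopifnot(nzchar(shp))                     # fail early if the file is missing
NY8 <- st_read(shp, quiet = TRUE)

# Tracts that share a boundary are neighbors; binary spatial weights
NY_nb <- poly2nb(NY8)
NY_lw <- nb2listw(NY_nb, style = "B")

# Map the tracts with the neighbor links drawn between tract centroids
centroids <- st_coordinates(st_centroid(st_geometry(NY8)))
plot(st_geometry(NY8), border = "grey")
plot(NY_nb, centroids, add = TRUE, col = "blue")

# Simultaneous autoregressive model (family = "SAR" is the default)
sar <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
                data = NY8, listw = NY_lw)

# Same model with tract-level weights (assumed column: POP8)
sar_w <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME,
                  data = NY8, listw = NY_lw, weights = POP8)

summary(sar)
summary(sar_w)
AIC(sar)
AIC(sar_w)   # compare the two fits; a smaller AIC suggests a better fit
```

PEXPOSURE is the variable that captures proximity to the TCE sites, so its coefficient and p-value are the ones to compare between the two summaries.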

 

Other resources and references:

As I have alluded to previously, there are many packages and resources for doing spatial analysis in R. A comprehensive list can be found here: http://cran.r-project.org/web/views/Spatial.html. Also, see: http://spatialanalysis.co.uk/r/, http://genetics.agrsci.dk/statistics/courses/Rcourse-DJF2006/day3/spatial-slides.pdf, and http://www.itc.nl/~rossiter/teach/R/RSpatialIntro_ov.pdf.

A nice introduction to spatial analysis in R (with many good citations for theoretical background) can be found in the book Applied Spatial Data Analysis with R by Bivand et al. (2008).

Finally, when you have a package loaded in R (using the “library(package name)” command), you can request help from within R using the “help(function name)” command. The help command will take you to an HTML page that lists the arguments for the function and will usually provide some examples. Otherwise, a simple Google search asking “how do I _____ in R” will generally lead you in the right direction.
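For instance, the help commands might be used as follows. The stats package and its lm() function are chosen only because they ship with every R installation; any loaded package and function name can be substituted.

```r
library(stats)                   # load a package (stats is loaded by default)

help(lm)                         # open the help page for the lm() function
?lm                              # shorthand for help(lm)
example(lm)                      # run the examples listed on that help page
help(package = "stats")          # list everything documented in a package
help.search("autocorrelation")   # search installed help pages by keyword
```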

 

Written By:

Daniel Parker
Ph.D. Student, Anthropology and Demography
The Pennsylvania State University