Introduction to Using R
Notes by Duncan Murdoch
12 September 2000; revised 3 Jan 2002

0. Introduction

R is a complicated statistics package. In these notes, we will learn the basics of using it:

  1. How to get it.
  2. How to start and stop it.
  3. Getting help.
  4. R objects: what they are, how to work with them.
  5. How to enter and edit data.
  6. How to calculate with it.
  7. How to make graphs, and to print them.
  8. Simple scripts and functions.

These instructions are written for R version 1.4 for Windows, but much of what we do will work in all versions on all platforms.

R is free software. You can download it from the web and install it on your own computer (see the instructions below if you want to do this). It is also available on all of the PCs in the Statistics Computer Lab (WSC 256), with TAs to help on weekdays from 10 AM to 6 PM, starting September 18.

A similar commercial package called S-PLUS is also available. Student versions of S-PLUS cost around $100. If you have one of those, it should be sufficient for this course; if not, I'd suggest sticking with R.

1. Getting R

You can download R from the web; start at http://cran.r-project.org. Choose your operating system from the list, e.g. R for Windows. Keep following the links until you get to the base directory, where you will see a list of files. Download SetupR.exe (approximately 19 megabytes). Run it, and follow the instructions to install R in the directory of your choice.

Later you may want to download contributed packages for R, which contain statistical methods that aren't in the base package; you don't need any of them now. To download one, run R and click on Package|Install package from CRAN.

2. Starting and Stopping

To start R:

  1. First, sign on to the computer. Your user id is your student number; your initial password is your last name in all capital letters. Change it right away to something secure. Do not give out your password to your friends: sharing your account will result in losing it, which will make assignments extremely difficult to complete!

  2. Start R by clicking on Start|Programs|Statistics software|R-GUI.

  3. After it has started, you'll see a window something like this:
    R : Copyright 2001, The R Development Core Team
    Version 1.4.0 (2001-12-19).
    
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type `license()' or `licence()' for distribution details.
    
    R is a collaborative project with many contributors.
    Type `contributors()' for more information.
    
    Type `demo()' for some demos, `help()' for on-line help, or
    `help.start()' for a HTML browser interface to help.
    Type `q()' to quit R.
    
    > 
    
    This is the R console window; everything you do in R will be typed in this window.

There are several ways to shut it down:

  1. The usual Windows methods work: Alt-F4, the File|Exit menu, or clicking on the × in the top right corner.
  2. In the command window, execute the q() function call.

R can remember objects between sessions: if you answer Yes when R asks whether to save the workspace image as you quit, any dataset you created will still be there the next time you start R.

3. Getting help

If you know the name of an R function, you can get help on it through the Help menu or by typing ?name in the console window. Be warned: it takes a while to get used to the help files; they often give more detail than you want, without answering your question!
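For example, either of the first two lines below should display the help page for the mean() function (mean is only an illustration; any function name works), and example() runs the sample code found at the end of a help page:

```r
?mean            # help page for the mean() function
help("mean")     # the same page, requested with an ordinary function call
example(mean)    # run the examples at the end of that help page
```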

4. R objects

Everything that you work with in R is an object of some sort. The most common kinds of object are:

Scalars and Vectors:
The simplest objects in R are vectors of numbers. Scalars are vectors of length 1.

To create a scalar, just assign a numerical value to a variable. For example,

x <- 5
y <- x + 3
Note that assignment is done with <-. In R 1.4 you may also use the underscore, e.g. x _ 5, to do the same thing: the underscore is sometimes easier to type, but it can be confusing, and later versions of R no longer accept it, so <- is the better habit.

To create a vector, assign a vector value to a variable. The c(...) function is the simplest way to create a vector value:

x <- c(1,2,3,4,5)
y <- x + 3       # Add 3 to every element of x
y[3] <- 1       # Assign a value to the third element
Almost all arithmetic operations can be done on vectors as well as scalars; they simply operate on one element at a time. You can mix vectors and scalars in expressions; R acts just as though the scalar were repeated for every element of the vector.
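A few lines you could type to see element-by-element arithmetic at work (the values are arbitrary):

```r
x <- c(1, 2, 3, 4, 5)
x * 2          # each element doubled: 2 4 6 8 10
x + x          # added element by element: 2 4 6 8 10
sqrt(x)        # functions such as sqrt() also work element by element
y <- x - 3     # the scalar 3 acts like c(3, 3, 3, 3, 3)
y              # -2 -1 0 1 2
```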

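Watch out if you mix vectors of different lengths: R reuses ("recycles") the elements of the shorter vector to fill out the longer one, which is hardly ever the result you want! R only warns you when the longer length is not a multiple of the shorter one.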
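Two small cases show the recycling rule; only the second one produces a warning:

```r
c(1, 2, 3, 4) + c(10, 20)   # shorter vector reused: 11 22 13 24
c(1, 2, 3) + c(10, 20)      # 11 22 13, plus a warning that the lengths don't fit
```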

To see the contents of an object, just type its name on a line by itself, and the current value will be printed.

Character data:
Scalars and vectors can be made up of strings of characters instead of numbers. All the elements of a vector must be of the same type, so if you mix numbers and characters in one vector, R converts the numbers to character strings. For example,
x <- c('red','green','blue')
y <- c(x,'yellow','magenta','cyan')
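Continuing the example, this sketch checks the length of the combined vector and shows what happens when a number is mixed in (z is just an illustrative name):

```r
x <- c('red', 'green', 'blue')
y <- c(x, 'yellow', 'magenta', 'cyan')
length(y)              # 6
z <- c(1, 2, 'three')  # the numbers are converted to strings
z                      # "1" "2" "three"
is.character(z)        # TRUE
```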

Dataframes:
Most datasets are stored as dataframes. These are like matrices, but each column has its own name, and the columns can be of different types from each other.

Use the data.frame() function to construct dataframes from vectors:

x <- c('red','green','blue')
y <- c(1,2,3)
d <- data.frame(x,y,z=c(4,5,6))
Note that you use = to set the value of the argument of a function, rather than <- which assigns a value to a variable.
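Once d exists, a few standard ways to look inside it are sketched below; the square-bracket form takes a row choice before the comma and a column choice after it, and leaving one side empty means "all":

```r
x <- c('red','green','blue')
y <- c(1,2,3)
d <- data.frame(x, y, z=c(4,5,6))
d              # prints 3 rows with columns x, y and z
dim(d)         # 3 rows, 3 columns
d$z            # one column, as a vector: 4 5 6
d[2, ]         # the second row, all columns
d[d$y > 1, ]   # only the rows where y is greater than 1
```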

Lists:
Dataframes are actually a special kind of list, or structure. Lists in R can contain any other objects. You usually don't construct these yourself, but many functions return complicated results as lists. You can see the names of the objects in a list using the names() function, and extract parts of it:
names(d)  # Print the names of the objects in the d dataframe.
d$x       # Print the x component of d
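As a sketch, here is a small hand-built list (info and its component names are made up, with distances taken from the car data later in these notes), followed by a real function, t.test(), whose result is a list you can take apart in the same way:

```r
# A hand-built list holding objects of different kinds
info <- list(name='My car', km=c(530.5, 838, 1288), city=FALSE)
names(info)       # "name" "km" "city"
info$km           # extract one component with $
info[['name']]    # or with double square brackets

# Many functions return their results as lists, e.g. t.test()
set.seed(1)                 # make the random values repeatable
result <- t.test(rnorm(20))
names(result)               # statistic, parameter, p.value, conf.int, ...
result$p.value
```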

Functions:
Functions in R are objects too. It's possible to assign them and work with them; to call one, put its arguments in parentheses after its name. You can name each argument you supply, or have it understood by its position in the call. In this example, x and y are matched by position and main by name:
x <- c(1,2,3)
y <- c(4,5,6)
plot(x,y,main='My title')
R also allows argument names to be abbreviated (this is called partial matching), and the rules for it are confusing; avoid it! Type each argument name in full, or risk getting very, very confused.
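To see positional and named arguments side by side without drawing anything, the sketch below uses the standard seq() function, whose first three arguments are from, to and by. All three calls give the same result, and the last two lines show a function being assigned like any other object:

```r
seq(1, 10, 3)              # matched by position: 1 4 7 10
seq(from=1, to=10, by=3)   # the same call with every argument named
seq(by=3, from=1, to=10)   # named arguments can come in any order
f <- seq                   # functions are objects: f is now another name for seq
f(1, 10, 3)                # 1 4 7 10
```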

5. Entering and editing data

You can use assignments like

x <- c(1,2,3)
to create small datasets. However, this quickly gets cumbersome. The best way to enter a larger dataset is to use an external editor (like Notepad) or a spreadsheet. What you want to create is a "comma-separated values" (CSV) file, with the column names in the first row.

For example, here are a few lines from a dataset on fuel consumption of my car:

Km,      Day, Month,   Year,    Fill,    Cost,    Litres,  City
530.5,   6,   7,       1997,    1,       18.5,    32.5,    0
838,     19,  7,       1997,    1,       13.25,   23.7,    0
1288,    19,  7,       1997,    1,       19,      32.3,    0
1800,    24,  7,       1997,    1,       19.5,    32.8,    0
If I had entered these into a file called C:\TEMP\CAR.CSV, I could load them into a variable called car like this:
car <- read.csv('C:\\TEMP\\CAR.CSV')
The read.table function can handle other formats; see ?read.table for the details.
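If you want to try this without creating C:\TEMP\CAR.CSV by hand, the sketch below writes the sample lines above to a temporary file and reads them back (carfile is a made-up name). Because the sample columns are padded with spaces, it passes strip.white=TRUE, a standard read.csv/read.table argument, to remove the padding around each value:

```r
# Write the sample lines to a temporary file, standing in for C:\TEMP\CAR.CSV
carfile <- tempfile(fileext='.csv')
writeLines(c(
  'Km,      Day, Month,   Year,    Fill,    Cost,    Litres,  City',
  '530.5,   6,   7,       1997,    1,       18.5,    32.5,    0',
  '838,     19,  7,       1997,    1,       13.25,   23.7,    0',
  '1288,    19,  7,       1997,    1,       19,      32.3,    0',
  '1800,    24,  7,       1997,    1,       19.5,    32.8,    0'),
  carfile)

# strip.white=TRUE removes the spaces around each value
car <- read.csv(carfile, strip.white=TRUE)
names(car)        # Km Day Month Year Fill Cost Litres City
nrow(car)         # 4
sum(car$Litres)   # total litres bought in these four fill-ups
```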

6. Calculating

R is often used as a sophisticated desk calculator. Remember that most operations can be done on whole vectors at once. There are also functions to calculate statistics from vectors; some useful ones are shown below:

x <- 1:10     # The numbers 1 to 10
mean(x)       # The sample mean
var(x)        # The sample variance
sd(x)         # The sample standard deviation
y <- 11:20    # The numbers 11 to 20
var(x,y)      # The sample covariance
cor(x,y)      # The sample correlation
median(x)     # The sample median
summary(x)    # Several useful statistics
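To see exactly what var() and sd() compute, the sketch below recomputes them from their definitions. The sample versions divide the sum of squared deviations from the mean by n - 1, not by n, and the sample covariance does the same with products of deviations:

```r
x <- 1:10
y <- 11:20
n <- length(x)
# Sample variance: squared deviations from the mean, summed, divided by n - 1
v <- sum((x - mean(x))^2) / (n - 1)
v          # 9.166667, the same as var(x)
sqrt(v)    # the same as sd(x)
# Sample covariance: products of deviations, summed, divided by n - 1
sum((x - mean(x)) * (y - mean(y))) / (n - 1)   # the same as var(x, y)
```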

There are quite a few built-in functions for generating random numbers and working with their distributions:

x <- rnorm(100)    # Generate a vector of 100 Normal random values
plot(x,dnorm(x))   # Plot them against their density function
y <- runif(100)    # ... and a vector of 100 Uniform random values
plot(y,dunif(y))   # plotted against their density function.
Other built-in distributions include Student's t, Chi-square, gamma, F, lognormal, Poisson, binomial, ...
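The distribution functions follow a naming pattern: a prefix d (density or probability), p (cumulative probability), q (quantile) or r (random values) is attached to a short name for the distribution, such as norm, unif, t, chisq, gamma, f, lnorm, pois or binom. A few examples:

```r
dnorm(0)                      # Normal density at 0: about 0.3989
pnorm(1.96)                   # P(Z <= 1.96): about 0.975
qnorm(0.975)                  # the inverse of pnorm: about 1.96
dbinom(2, size=5, prob=0.5)   # P(2 heads in 5 fair coin tosses) = 0.3125
set.seed(42)                  # fix the random number stream so results repeat
rpois(5, lambda=3)            # five Poisson(3) random values
```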

7. Graphing

One place in which R excels is in graphing. It is very flexible, but at the same time, simple plots are fairly easy to do.

For analysis or presentation graphics, some commonly used functions are:

plot(x,y)            # A scatterplot
plot(x,y,type='l')   # As above, with the points joined by lines
hist(x)              # A histogram
stem(x)              # A couple of plots that
pie(x)               #  you should probably never use!

There is also a whole family of functions to add to a graph:

abline(a=3,b=4)      # Add the line y=a + b x to the plot
lines(x,y)           # Add lines joining the data points
points(x,y)          # Add points to the plot
text(x,y,labels=y)   # Add text to a plot 
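As a sketch of how these fit together, the lines below draw a scatterplot of made-up data and then add to it; the adding functions need a plot to already be open. The pch, lty and pos arguments are standard graphics settings for the plotting symbol, the line type, and the side of the point where the label goes:

```r
set.seed(1)                          # make the made-up data repeatable
x <- 1:10
y <- 3 + 4*x + rnorm(10)             # made-up data scattered around y = 3 + 4x
plot(x, y, main='Adding to a plot')  # start a new scatterplot
abline(a=3, b=4)                     # add the line y = 3 + 4x
lines(x, y, lty=2)                   # join the points with a dashed line
points(5, 23, pch=19)                # add one filled point at (5, 23)
text(x, y, labels=x, pos=3)          # label each point with its x value, just above it
```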

To print a graph, select it on the screen, then use the File|Print menu selection. You can choose the dot matrix printers or the laser printer. Ask the TA in the Lab for instructions on how to use each. You can also save graphs in various file formats to be printed later, or to put on a web page.
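To save a graph from code rather than from the menus, one approach is to open a file-based graphics device such as pdf(), draw the graph, and then close the device with dev.off(), which finishes writing the file. The file name below is made up; other devices such as png() work the same way where your installation supports them:

```r
pdffile <- file.path(tempdir(), 'myplot.pdf')   # hypothetical file name
pdf(pdffile, width=6, height=6)   # open a PDF device; sizes are in inches
hist(rnorm(100), main='100 Normal values')
dev.off()                         # close the device, which writes the file
file.exists(pdffile)              # TRUE once the file has been written
```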

8. Simple Scripts and Functions

R maintains a "history" of past commands that you have executed. You can retrieve these by hitting the up arrow on the keyboard in the command window. Once you've retrieved a previous command, you can edit it and hit Enter again to execute the changed version.

When you get to more than a few lines of code, it's a good idea to use a separate editor window to edit your code. For example, open Notepad beside your R window, and type your commands there. When you've got them right and want to execute them, there are two ways to proceed.

  1. The simplest way to execute a few lines is to use the Windows cut and paste facility. Just use your mouse to highlight the lines you want to execute, click on Edit | Copy, then move to the R command window, and click on Edit | Paste there. (There are also keyboard shortcuts, as shown in the Edit menus.)

  2. Cut and paste gets tedious when you have a lot of lines to execute. In that case, it's often easiest to save the lines to a file from Notepad (e.g. C:\temp\script.r) and then execute
    source('C:\\temp\\script.r',echo=TRUE)
    
    in your R command window.
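Here is the whole round trip as a sketch: it writes a two-line script to a temporary file (standing in for C:\temp\script.r; the contents are made up) and then runs it with source():

```r
# Write a two-line script to a temporary file
script <- tempfile(fileext='.r')
writeLines(c('z <- c(2, 4, 6)',
             'mean(z)'), script)

# Run it; echo=TRUE prints each command before its output,
# just as if it had been typed at the prompt
source(script, echo=TRUE)
z    # objects the script created are now in the workspace
```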

If your script draws more than one plot, each new plot replaces the previous one, so the earlier ones are displayed and lost right away. To avoid this, go to the graph window and click on History|Recording. Then all plots will be saved, and you can use the PageUp and PageDown keys to switch between them.

Eventually, you'll find that you are repeating the same code. At that point it's best to write a function, so that you only need to type the complicated procedure once, but can use it many times.

Functions have three parts: a header, a body, and a return value. The header tells R how your function expects to be called. The body defines what the function will do. The return value (the value calculated by the last line of the body to be executed, or whatever you pass to return()) is what the caller sees after your function has executed.
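The sketch below labels the three parts in two small made-up functions (halfrange and halfrange2 are hypothetical names). The first returns the value of its last line; the second uses return() to leave early when given an empty vector:

```r
halfrange <- function(x)          # header: one argument, x
{                                 # body starts here
  r <- max(x) - min(x)
  r / 2                           # last value computed: this is returned
}

halfrange2 <- function(x)
{
  if (length(x) == 0) return(NA)  # return() exits immediately
  (max(x) - min(x)) / 2
}

halfrange(c(1, 5, 9))    # 4
halfrange2(numeric(0))   # NA
```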

For example, to calculate the mean of all values except the biggest and smallest in a vector (a "trimmed mean"), you could use the following function:

trimmedmean <- function(x, trim=1)
{
  x <- sort(x)          # sort into increasing order
  x <- x[-(1:trim)]     # delete the smallest values
  x <- x[-(length(x)+1-(1:trim))] # delete the largest values
  mean(x)
}
You can call this and see the results below:
> x <- rt(10,1)
> x
 [1]  0.12159140  0.37272247 -0.02500885 -1.56863046
 [5] -0.15048060  0.11176227  0.15118459 -1.54924705
 [9]  0.05186935 -0.89555572
> mean(x)
[1] -0.3379793
> trimmedmean(x)
[1] -0.2729856
> trimmedmean(x,2)
[1] -0.1309704
> x
 [1]  0.12159140  0.37272247 -0.02500885 -1.56863046
 [5] -0.15048060  0.11176227  0.15118459 -1.54924705
 [9]  0.05186935 -0.89555572
> 
Note that the parameter trim has a default value of 1, but can be specified to be 2 instead. Also note that x wasn't changed by the trimmedmean function: when you pass an argument to a function, it only gets a copy, so any changes it makes don't affect the original object.
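The copying rule can be checked directly with a tiny made-up function (zerofirst is a hypothetical name) that changes its argument:

```r
zerofirst <- function(v)
{
  v[1] <- 0    # changes the function's own copy of v
  v
}
x <- c(5, 6, 7)
y <- zerofirst(x)
y    # 0 6 7
x    # still 5 6 7: the original was not changed
```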

