R Data Frame

In this article, you will learn about data frames in R: how to create them, access their elements, and modify them in your program.

A data frame is a two-dimensional data structure in R. It is a special case of a list in which every component has the same length.

Each component forms a column, and the contents of the components form the rows.
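
The list-like nature described above can be checked directly. The following is a minimal sketch; the data frame df and its columns are made up for illustration.

```r
# hypothetical data frame used only for illustration
df <- data.frame(id = 1:3, score = c(90, 85, 70))

# a data frame is a list underneath, with one component per column
print(is.list(df))
print(length(df))          # number of components (columns)
print(sapply(df, length))  # every component has the same length

# components of incompatible lengths (3 and 2) are rejected with an error
res <- tryCatch(
  data.frame(id = 1:3, score = c(90, 85)),
  error = function(e) "error"
)
print(res)
```

Here tryCatch() is used only so that the failed call does not stop the script.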


Check if a variable is a data frame or not

We can check if a variable is a data frame or not using the class() function.

x <- data.frame(SN = c(1, 2), Age = c(21, 15), Name = c("John", "Dora"))

# print the data frame
print(x)

# check the type of x
print(typeof(x))

# check the class of x
print(class(x))

Output

  SN Age Name
1  1  21 John
2  2  15 Dora
[1] "list"
[1] "data.frame"

In this example, x can be considered a list of 3 components, each of which is a two-element vector. Functions such as names(), nrow(), ncol() and dim() tell us more about a data frame.
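
As a rough illustration, a few base R functions that report on a data frame can be tried on the same x. This is a sketch of common inspection calls, not a complete list.

```r
x <- data.frame(SN = c(1, 2), Age = c(21, 15), Name = c("John", "Dora"))

print(is.data.frame(x))  # TRUE when x is a data frame
print(names(x))          # column names
print(nrow(x))           # number of rows
print(ncol(x))           # number of columns
print(dim(x))            # rows and columns together
```

is.data.frame() is an alternative to comparing the result of class() by hand.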


How to create a Data Frame in R?

We can create a data frame using the data.frame() function.

For example, the data frame shown above can be created as follows.

# create a dataframe
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"))

# print the structure of x
str(x)

Output

'data.frame':	2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"

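Notice above that the third column, Name, is a character vector (chr).

In versions of R before 4.0.0, data.frame() converted character vectors into factors by default, so Name would have been a factor. Since R 4.0.0, the default is stringsAsFactors = FALSE.

To get the same result in every version of R, we can pass the argument stringsAsFactors = FALSE explicitly.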

x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)

# print the structure of x
str(x)

Output

'data.frame':	2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"
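
For comparison, factor conversion can be requested explicitly with stringsAsFactors = TRUE. The sketch below uses a hypothetical variable y so that x is left unchanged.

```r
# ask data.frame() to convert character vectors into factors
y <- data.frame(SN = 1:2, Name = c("John", "Dora"), stringsAsFactors = TRUE)

print(class(y$Name))   # the column is now a factor
print(levels(y$Name))  # levels are the distinct values, sorted
```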

Many of R's data input functions, such as read.table(), read.csv(), read.delim() and read.fwf(), also read data into a data frame.
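
As a small sketch of this, read.csv() can parse inline text through its text argument, which avoids the need for a file on disk. The CSV content here is made up to match the earlier example.

```r
# inline CSV text stands in for a file (made-up data)
csv_text <- "SN,Age,Name
1,21,John
2,15,Dora"

df <- read.csv(text = csv_text)

# the result is an ordinary data frame
print(class(df))
str(df)
```

With a real file, we would pass its path as the first argument instead of using text.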


How to Access Components of a Data Frame?

Components of the data frame can be accessed like a list or like a matrix. Let's discuss some of the ways.

Accessing like a list

We can use the [, [[ or $ operator to access the columns of a data frame.

x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)

# access the "Name" column using different methods
print(x["Name"])
print(x$Name)
print(x[["Name"]])
print(x[[3]])

Output

  Name
1 John
2 Dora
[1] "John" "Dora"
[1] "John" "Dora"
[1] "John" "Dora"

Accessing a column with [[ or $ gives the same result: both reduce the column to a vector. Indexing with [ is different: it returns a data frame.
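
One way to see the difference is to check the class of each result. The following is a minimal sketch using the same x as above.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"), stringsAsFactors = FALSE)

print(class(x["Name"]))    # [ keeps the data frame structure
print(class(x[["Name"]]))  # [[ extracts the column itself
print(class(x$Name))       # $ behaves like [[

# [ can also select several columns at once
print(x[c("SN", "Name")])
```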


Accessing like a matrix

Data frames can be accessed like a matrix by providing indexes for row and column.

To illustrate this, we use datasets already available in R. Datasets that are available can be listed with the command library(help = "datasets").

We will use the trees dataset, which contains Girth, Height and Volume for Black Cherry Trees. To keep the examples self-contained, the code below recreates its first 10 rows.

A data frame can be examined using functions like str() and head().

trees <- data.frame(
  Girth = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1, 11.2),
  Height = c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9)
)

# print the structure of trees
str(trees)

# display the first 3 rows of trees
head(trees, n = 3)

Output

'data.frame':	10 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2
 $ Height: num  70 65 63 72 81 83 66 75 80 75
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2

We can see that trees is a data frame with 10 rows and 3 columns; the code above recreates only the first 10 of the 31 rows in the built-in dataset. We also display the first 3 rows of the data frame.

Now we proceed to access the data frame like a matrix.

trees <- data.frame(
  Girth = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1, 11.2),
  Height = c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75),
  Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9)
)

# select rows 2 and 3 of trees
trees[2:3, ]

# select rows with Height greater than 82
trees[trees$Height > 82, ]

# select the Height column of rows 8 to 10
trees[8:10, "Height"]

Output

  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
  Girth Height Volume
6  10.8     83   19.7
[1] 75 80 75

We can see in the last case that the returned type is a vector since we extracted data from a single column.

This behavior can be avoided by passing the argument drop=FALSE.
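
A short sketch of the effect of drop = FALSE follows. The data frame df is a small made-up example, not the trees data.

```r
# small made-up data frame for illustration
df <- data.frame(Height = c(70, 65, 63), Volume = c(10.3, 10.3, 10.2))

v <- df[1:2, "Height"]                # simplified to a numeric vector
d <- df[1:2, "Height", drop = FALSE]  # stays a data frame with one column

print(class(v))
print(class(d))
print(d)
```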


How to modify a Data Frame in R?

Data frames can be modified through reassignment, in the same way as matrices.

x <- data.frame(
  SN = c(1, 2),
  Age = c(21, 15),
  Name = c("John", "Dora")
)

# print the initial data frame
print(x)

# update the Age value in the first row to 20
x[1, "Age"] <- 20

# print the updated data frame
print(x)

Output

  SN Age Name
1  1  21 John
2  2  15 Dora
  SN Age Name
1  1  20 John
2  2  15 Dora
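
Reassignment is not limited to a single cell. As a sketch, the same idea can update rows chosen by a condition or replace a whole column; the new values below are made up.

```r
x <- data.frame(SN = c(1, 2), Age = c(21, 15), Name = c("John", "Dora"))

# set Age to 16 for every row whose Name is "Dora"
x[x$Name == "Dora", "Age"] <- 16

# replace the entire SN column
x$SN <- c(10, 20)

print(x)
```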

Adding Components to a Data Frame

Rows can be added to a data frame using the rbind() function.

x <- data.frame(
  SN = c(1, 2),
  Age = c(20, 15),
  Name = c("John", "Dora")
)

# print the initial data frame
print(x)

# create a new row and bind it to the data frame
new_row <- list(SN = 1, Age = 16, Name = "Paul")
x <- rbind(x, new_row)

# print the updated data frame
print(x)

Output

  SN Age Name
1  1  20 John
2  2  15 Dora
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul

Similarly, we can add columns using cbind().

x <- data.frame(
  SN = c(1, 2),
  Age = c(20, 15),
  Name = c("John", "Dora")
)

# print the initial data frame
print(x)

# add a new column "State" to the data frame using cbind()
x <- cbind(x, State = c("NY", "FL"))

# print the updated data frame
print(x)

Output

  SN Age Name
1  1  20 John
2  2  15 Dora
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL

Since data frames are implemented as lists, we can also add new columns through simple list-like assignments.
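
A minimal sketch of such list-like assignments is shown below. The column names Zip and Member, and their values, are made up for illustration.

```r
x <- data.frame(SN = c(1, 2), Age = c(20, 15), Name = c("John", "Dora"))

# each assignment adds one new column
x$State <- c("NY", "FL")
x[["Zip"]] <- c("10001", "33101")  # made-up values
x["Member"] <- c(TRUE, FALSE)      # made-up values

print(names(x))
print(x)
```

Each new column must have as many elements as the data frame has rows, or a length that can be recycled to that number.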


Deleting Components of a Data Frame

Data frame columns can be deleted by assigning NULL to them.

x <- data.frame(
  SN = c(1, 2),
  Age = c(20, 15),
  Name = c("John", "Dora"),
  State = c("NY", "FL")
)

# print the initial data frame
print(x)

# remove the "State" column from the data frame
x$State <- NULL

# print the updated data frame
print(x)

Output

  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
  SN Age Name
1  1  20 John
2  2  15 Dora
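
Rows can be removed in a similar spirit through reassignment. The following sketch assumes we want to drop rows by position with a negative index and drop a column by name; the three-row x is made up for illustration.

```r
x <- data.frame(SN = c(1, 2, 3), Age = c(20, 15, 16), Name = c("John", "Dora", "Paul"))

# drop the second row and assign the result back to x
x <- x[-2, ]

# keep every column except "Age"
x <- x[, names(x) != "Age"]

print(x)
```

Unlike assigning NULL, indexing does not change x in place, so the result has to be assigned back.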