R Factors

In this article, you will learn to work with factors in R programming with the help of examples.

A factor is a data structure used for fields that take only a predefined, finite number of values (categorical data).

For example, a data field such as marital status may contain only the values single, married, separated, divorced, or widowed.

In such a case, we know the possible values beforehand, and these predefined, distinct values are called levels.


How to create a factor in R?

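We can create a factor using the function factor(). If the levels are not provided, they are inferred from the data and stored in sorted order, which is why married is listed before single in the first output below.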

x <- factor(c("single", "married", "married", "single"))
print(x)

x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"))
print(x)

Output

[1] single  married married single 
Levels: married single
[1] single  married married single 
Levels: single married divorced

The example above shows that a level can be predefined even if no element currently uses it.
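To inspect the levels of a factor directly, a minimal sketch along the following lines may help. It uses the base R functions levels(), nlevels(), and table(); the variable names lv, n, and counts are only illustrative.

```r
# Same data as above, with an unused "divorced" level declared up front.
x <- factor(c("single", "married", "married", "single"),
            levels = c("single", "married", "divorced"))

lv     <- levels(x)   # character vector of levels, in the declared order
n      <- nlevels(x)  # number of levels, including unused ones
counts <- table(x)    # frequency per level; an unused level is counted as 0

print(lv)
print(n)
print(counts)
```

Note that table() reports divorced with a count of 0, which is one practical reason to declare levels that the data does not yet contain.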

Factors are closely related to vectors. In fact, a factor is stored as an integer vector, which we can see by looking at its structure with str().

x <- factor(c("single", "married", "married", "single"))
print(x)

str(x)

Output

[1] single  married married single 
Levels: married single
 Factor w/ 2 levels "married","single": 2 1 1 2

We see that the levels are stored in a character vector and the individual elements are stored as integer indices into that vector.
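The relationship between the integer codes and the levels can be sketched as follows. This is only an illustration using as.integer() and levels() from base R; the names codes, lv, and labels are made up for the example.

```r
x <- factor(c("single", "married", "married", "single"))

codes  <- as.integer(x)  # underlying integer codes, one per element
lv     <- levels(x)      # lookup table of level labels
labels <- lv[codes]      # indexing the levels by the codes rebuilds the labels

print(codes)
print(lv)
print(identical(labels, as.character(x)))
```

Indexing the levels by the codes gives back the same labels that as.character(x) returns.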

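Factors can also be created when non-numeric columns are read into a data frame.

In R versions before 4.0.0, the data.frame() function converted character vectors into factors by default, and we had to pass the argument stringsAsFactors = FALSE to suppress this behavior. Since R 4.0.0, the default is stringsAsFactors = FALSE, so character columns stay as they are unless we pass stringsAsFactors = TRUE.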
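Passing stringsAsFactors explicitly makes the result independent of the default in the installed R version. The sketch below assumes a small, made-up status vector; df_chr and df_fct are illustrative names.

```r
status <- c("single", "married", "married", "single")

# Keep the column as plain character data.
df_chr <- data.frame(status = status, stringsAsFactors = FALSE)

# Convert the character column into a factor.
df_fct <- data.frame(status = status, stringsAsFactors = TRUE)

print(class(df_chr$status))
print(class(df_fct$status))
print(levels(df_fct$status))
```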


How to access components of a factor?

Accessing components of a factor works much like accessing components of a vector.

x <- factor(c("single", "married", "married", "single"))
print(x)

print(x[3])
print(x[c(2, 4)])
print(x[-1])
print(x[c(TRUE, FALSE, FALSE, TRUE)])

Output

[1] single  married married single 
Levels: married single
[1] married
Levels: married single
[1] married single 
Levels: married single
[1] married married single 
Levels: married single
[1] single single
Levels: married single
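In the output above, every subset keeps the full set of levels, even when only one of them occurs. If that is not wanted, one possible approach is to drop the unused levels with droplevels() or with the drop argument of the subsetting operator, as in this sketch (the variable names are illustrative):

```r
x <- factor(c("single", "married", "married", "single"))

sub          <- x[3]               # one element, but still both levels
sub_dropped  <- droplevels(sub)    # keeps only the levels that occur
sub_drop_arg <- x[3, drop = TRUE]  # same effect while subsetting

print(levels(sub))
print(levels(sub_dropped))
print(levels(sub_drop_arg))
```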

How to modify a factor?

Components of a factor can be modified using simple assignments. However, we cannot assign a value outside of its predefined levels; doing so produces a warning and stores NA instead.

x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"))
print(x)

x[2] <- "divorced"
print(x)

x[3] <- "widowed"
print(x)

Output

[1] single  married married single 
Levels: single married divorced
[1] single   divorced married  single  
Levels: single married divorced
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "widowed") :
  invalid factor level, NA generated
[1] single   divorced <NA>     single  
Levels: single married divorced
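To avoid introducing NA by accident, we can check whether a value is a valid level before assigning it. The following is a minimal sketch using the %in% operator; new_value is a hypothetical variable holding the value we want to store.

```r
x <- factor(c("single", "married", "married", "single"),
            levels = c("single", "married", "divorced"))

new_value <- "widowed"  # hypothetical value to assign

if (new_value %in% levels(x)) {
  x[3] <- new_value
} else {
  # Not a valid level: report it and leave the factor untouched.
  message("'", new_value, "' is not a level of x; leaving x unchanged")
}

print(x)
```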

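A workaround is to add the value to the levels first. In the example below, the third element starts out as NA because "widowed" is not among the declared levels; once the level has been added, the assignment succeeds.

x <- factor(c("single", "divorced", "widowed", "single"), levels = c("single", "married", "divorced"))
print(x)

levels(x) <- c(levels(x), "widowed")
x[3] <- "widowed"
print(x)

Output

[1] single   divorced <NA>     single  
Levels: single married divorced
[1] single   divorced widowed  single  
Levels: single married divorced widowed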
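One detail worth keeping in mind is that assigning to levels(x) relabels the existing codes by position. Appending a new level at the end is therefore safe, but inserting it anywhere else renames existing values. The sketch below contrasts the two and shows factor() with an explicit levels argument as one possible alternative when the new level should come first; the names appended, renamed, and rebuilt are illustrative.

```r
x <- factor(c("single", "married", "married", "single"),
            levels = c("single", "married", "divorced"))

# Appending: every existing label stays attached to the same code.
appended <- x
levels(appended) <- c(levels(appended), "widowed")

# Prepending through levels<- relabels by position, so values change.
renamed <- x
levels(renamed) <- c("widowed", levels(renamed))

# Rebuilding with factor() matches values by label, so they are kept.
rebuilt <- factor(x, levels = c("widowed", levels(x)))

print(as.character(appended))
print(as.character(renamed))
print(as.character(rebuilt))
```

Here renamed no longer holds the original values, while appended and rebuilt do.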