This lesson is in the early stages of development (Alpha version)

Data-frame manipulation

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What are data-frames, and how do we manage them?

Objectives
  • Understand what a data-frame is and how to manipulate it.

Data-frames: The power of interdisciplinarity

Let’s begin by creating a mock data set:

> musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                         pieces = c(722, 187, 68),
                         likes = c(0, 1, 1))
> musician

The content of our new object:

    people pieces likes
1  Medtner    722     0
2 Radwimps    187     1
3  Shakira     68     1

We have just created our first data-frame. We can confirm this with the class() function:

> class(musician)
[1] "data.frame"

A data-frame is a list of vectors of equal length, one vector per column. All the values within a single vector must be of the same data type, but the data-frame as a whole can hold vectors of different data types:

Figure 3. Structure of the created data-frame.
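The structure described above can be checked directly in R. The following is a minimal sketch that re-creates the mock data set so it runs on its own. The helper names column_classes and column_lengths are only illustrative, and the class reported for the people column assumes R 4.0 or later, where text columns are kept as character vectors by default.

```r
# Re-create the mock data set so that this sketch runs on its own
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# A data-frame is stored internally as a list of columns
is.list(musician)

# Each column is a vector with a single data type
# (column_classes is an illustrative name, not part of the lesson)
column_classes <- sapply(musician, class)
column_classes

# Every column has the same length: the number of rows
column_lengths <- sapply(musician, length)
column_lengths

# str() summarises the names, types and first values of all columns
str(musician)
```

Here is.list() confirms the "list of vectors" view, while sapply() applies the same function to every column in turn.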

We can begin to explore our new object by pulling out columns with the $ operator. To use it, write the name of your data-frame, followed by the $ operator and the name of the column you want to extract:

> musician$people
[1] "Medtner"  "Radwimps" "Shakira" 
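The $ operator is not the only way to pull out a column. As a small sketch, assuming the same mock data set, the double-bracket operator [[ ]] extracts the same vector by name or by position. The variable names by_dollar, by_name and by_position are only illustrative.

```r
# Re-create the mock data set so that this sketch runs on its own
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# List the column names that can be used after the $ operator
names(musician)

# Three ways to extract the same column
by_dollar   <- musician$people        # by name, with $
by_name     <- musician[["people"]]   # by name, with [[ ]]
by_position <- musician[[1]]          # by position: people is the first column

# All three give the same vector
identical(by_dollar, by_name)
identical(by_dollar, by_position)
```

The [[ ]] form is handy when the column name is stored in a variable, because $ only accepts a name that is typed out literally.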

We can perform operations on our columns:

> musician$pieces + 20
[1] 742 207  88
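Note that an operation like the one above returns a new vector and leaves the data-frame itself unchanged. The sketch below shows one way to keep the result by assigning it to a new column, and how summary functions apply to a column. The column name pieces_plus_20 and the variables mean_pieces and total_pieces are hypothetical names chosen for illustration.

```r
# Re-create the mock data set so that this sketch runs on its own
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# Arithmetic on a column is applied to every element
musician$pieces + 20

# The original column has not changed
musician$pieces

# Assign the result to a new column to store it in the data-frame
# (pieces_plus_20 is a hypothetical column name)
musician$pieces_plus_20 <- musician$pieces + 20

# Summary functions work on a column as on any numeric vector
mean_pieces  <- mean(musician$pieces)
total_pieces <- sum(musician$pieces)

musician
```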

Moreover, we can change the data type of a column. The following code converts the likes column from numbers to logical values and then reports whether each musician is popular:

> typeof(musician$likes)
[1] "double"
> musician$likes <- as.logical(musician$likes)
> paste("Is",musician$people, "popular? :", musician$likes, sep = " ")
[1] "Is Medtner popular? : FALSE" "Is Radwimps popular? : TRUE" "Is Shakira popular? : TRUE"
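The same family of as.*() functions covers other conversions as well. The following sketch, which assumes the same mock data set, stores each converted vector in a separate variable (the variable names are illustrative) so that the original data-frame stays untouched.

```r
# Re-create the mock data set so that this sketch runs on its own
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# Numeric to logical: 0 becomes FALSE, any other number becomes TRUE
likes_logical <- as.logical(musician$likes)
typeof(likes_logical)

# Logical back to integer: FALSE becomes 0L, TRUE becomes 1L
likes_integer <- as.integer(likes_logical)

# Numeric to character: the numbers become text
pieces_text <- as.character(musician$pieces)
typeof(pieces_text)

# Logical values count as 0 and 1 in arithmetic,
# so sum() gives the number of popular musicians
n_popular <- sum(likes_logical)
n_popular
```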

Finally, we can extract information from a specific position in our data by using the “matrix” notation [row, column], where the first number inside the brackets specifies the row and the second specifies the column:

Figure 4. Extraction of specific data in a data-frame and a matrix.

> musician[1,2]  # The number of pieces that Nikolai Medtner composed
[1] 722
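The bracket notation also accepts an empty position, a column name, or a logical condition. The sketch below, assuming the same mock data set, shows a few common variations. The variable names first_row, pieces_col, pieces_by_name and liked are only illustrative.

```r
# Re-create the mock data set so that this sketch runs on its own
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# Leave the column position empty to get a whole row (a one-row data-frame)
first_row <- musician[1, ]

# Leave the row position empty to get a whole column (a vector)
pieces_col <- musician[, 2]

# Columns can also be selected by name instead of by number
pieces_by_name <- musician[, "pieces"]

# A logical condition in the row position keeps only the matching rows
liked <- musician[musician$likes == 1, ]
liked
```

Selecting by name is usually more robust than selecting by number, because it keeps working if the column order changes.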

Key Points

  • Data-frames can hold multiple columns, each with its own data type.