Chapter 3 Data types

R supports several “atomic” object classes, which can be thought of as its basic data types:

  • character
  • numeric
  • integer
  • complex
  • logical

Note: numbers are numeric by default, but they can be forced to the integer type by appending an L to the number. Interestingly, the integer type is part of the numeric one. This can be checked with is.numeric(), a member of the is.x() family of functions, which test whether an object belongs to a particular class.

## [1] "numeric"
## [1] "integer"
## [1] TRUE
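A minimal sketch of checks that produce output like the above, using class() and is.numeric(); the literal value 5 is arbitrary:

```r
x <- 5      # a plain number literal is numeric (double) by default
y <- 5L     # the L suffix makes it an integer

class(x)        # "numeric"
class(y)        # "integer"
is.numeric(y)   # TRUE: integers also count as numeric
typeof(x)       # "double": the underlying storage type of a numeric value
```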

3.1 Exotic data types

There are also some less common data types, such as dates, rasters, raw, and others. A raw vector, for example, stores binary data as bytes, which R prints as hexadecimal pairs.

## [1] TRUE
## [1] "raw"
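A short sketch of creating raw vectors; the byte values and the string "hi" are arbitrary examples:

```r
bytes <- as.raw(c(1, 16, 255))   # convert numbers in 0..255 to bytes
bytes                            # printed as hex pairs: 01 10 ff
is.raw(bytes)                    # TRUE
class(bytes)                     # "raw"

charToRaw("hi")                  # the bytes of a string: 68 69
```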

3.2 Data conversion

Data types follow a hierarchy. From most specific to most general they are: logical, integer, numeric, complex, character. A value can always be converted from a more specific type to a more general one without losing information. R performs this conversion implicitly (coercion) to keep types consistent. For example, all the elements of a matrix must be of the same type, so when a matrix is built from mixed types R automatically converts every element to the most general type present. Conversion can also be done explicitly with the as.x() family of functions (as.character(), as.numeric(), as.logical(), and so on); converting toward a more specific type is possible this way, but values that cannot be converted become NA.

## [1] "5"
## [1] 1
## [1] TRUE
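The output above is consistent with explicit conversions like the following; the matrix lines illustrate implicit coercion, and all the values are arbitrary:

```r
as.character(5)    # "5"
as.numeric(TRUE)   # 1
as.logical(1)      # TRUE

# Converting toward a more specific type can fail: the result is NA (with a warning)
suppressWarnings(as.numeric("hello"))   # NA

# Implicit coercion: a matrix holds a single type, so mixed input becomes character
m <- matrix(c(1, TRUE, "a", 2), nrow = 2)
typeof(m)          # "character"
```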

On top of the “atomic” types, R provides the following basic data structures: vectors, lists, matrices, data frames, and factors.

3.2.1 Vector

A vector is a sequence of elements that all share the same type. There are several ways to create one: an empty vector with the vector() function, a vector of given values with the combine function c(), or a sequence with the : operator.
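A sketch of the three creation methods just described; the variable names and values are only illustrative:

```r
v_empty <- vector("numeric", length = 0)  # an empty numeric vector: numeric(0)
v_zeros <- vector("numeric", length = 3)  # pre-allocated and filled with 0: 0 0 0
v_c     <- c(2, 4, 6)                     # combine values with c()
v_seq   <- 1:5                            # integer sequence: 1 2 3 4 5

length(v_empty)   # 0
```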

As Peng (2019) describes, when elements of different classes are mixed in a vector, coercion rules ensure that all the elements end up with the same class.

## [1] "1"     "TRUE"  "hello"

Note that all the values in the vector have been converted to character.
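A sketch of the coercion shown above, plus a second case (an assumption added for contrast) where no character value is present, so the result is numeric instead:

```r
x <- c(1, TRUE, "hello")
x          # "1" "TRUE" "hello": everything became character
class(x)   # "character"

y <- c(1, TRUE)   # no character present, so the logical is coerced to numeric
y          # 1 1
class(y)   # "numeric"
```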

3.2.2 Vector operations

When programming, you often need to perform operations on your variables. Operations on vectors are carried out element by element. One thing to be careful about is that when an operation involves vectors of different lengths, R recycles (repeats) the shorter vector until the lengths match. The longer length should be a multiple of the shorter one; otherwise R still returns a result but issues a warning. This recycling behavior extends R's expressiveness, but it can also create hard-to-spot unintended behavior if not handled carefully.
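A minimal sketch of element-wise operations and recycling; the vectors are arbitrary examples:

```r
x <- 1:6
x * 2            # element by element: 2 4 6 8 10 12
x + c(10, 20)    # c(10, 20) is recycled: 11 22 13 24 15 26

# Lengths 3 and 2 are not multiples of each other: R still computes a result,
# but warns "longer object length is not a multiple of shorter object length"
z <- 1:3 + 1:2
z                # 2 4 4
```

Because the mismatched case only warns, a silent bug can slip through if warnings are ignored.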

3.2.3 Vector indexing

Vectors support numerical indexing, where elements are retrieved by their position, as well as logical indexing, where an element is retrieved if it satisfies a given condition.

## [1] 11
## [1] 11 12 13 14 15
## [1] 12 15 18
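One way to obtain output like the above; the vector 10:20 and the divisible-by-3 condition are assumptions chosen to match it:

```r
x <- 10:20
x[2]             # numerical indexing by position: 11
x[2:6]           # several positions at once: 11 12 13 14 15
x[x %% 3 == 0]   # logical indexing, elements divisible by 3: 12 15 18
```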

3.2.4 List

Lists are a special type of vector that can contain elements of different types.

## [[1]]
## [1] 1
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] "hello"
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
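A sketch of two ways to create lists that give output like the above: list() with mixed values, and vector("list", ...) for a list of empty (NULL) slots:

```r
mixed <- list(1, TRUE, "hello")       # each element keeps its own type
mixed
sapply(mixed, class)                  # "numeric" "logical" "character"

empty <- vector("list", length = 3)   # a list of three NULL elements
empty
```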

The elements of a list can be named. For example, a list could represent the dimensions of an object, with one element each for its height, width, and depth.

## [1] "height" "width"  "depth"
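A sketch of the dimensions example; the numeric values are made up for illustration:

```r
# Hypothetical dimensions of a box (values are made up)
dims <- list(height = 2, width = 3, depth = 1)
names(dims)   # "height" "width" "depth"
dims$width    # access an element by its name: 3
```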

3.3 Matrices

Matrices are filled column-wise by default, but they can be filled row-wise by setting the parameter byrow = TRUE.
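A sketch comparing the two filling orders on the same values (1:6 is an arbitrary choice):

```r
by_col <- matrix(1:6, nrow = 2)                # filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row
by_col[1, ]   # first row: 1 3 5
by_row[1, ]   # first row: 1 2 3
```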

Descriptive names make the data easier to understand, which is why R allows matrices to have both row names and column names (rownames and colnames).

##   c d
## a 1 3
## b 2 4
##      col1 col2
## row1    1    3
## row2    2    4
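A sketch that produces matrices like the two above, first setting both sets of names at once with dimnames(), then replacing them with rownames() and colnames():

```r
m <- matrix(1:4, nrow = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))  # row names, then column names
m

rownames(m) <- c("row1", "row2")
colnames(m) <- c("col1", "col2")
m
m["row2", "col1"]   # elements can also be indexed by name: 2
```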

3.4 Factor

Factors are variables that make it easy to describe categories. For example, imagine a poll that measures people's satisfaction with some policy. Answers to this type of question are usually not plain numeric ratings but more descriptive options, such as “really bad”, “bad”, “good”, and “really good”, when asking how well people feel the policy has been carried out. Factors make it easy to store these descriptive answers. If some analysis needs to be carried out, for example assigning a value to each answer to find their mean, the answers can easily be ranked from lowest to highest using levels.

## x
##  no yes 
##   4   4
## [1] 2 1 2 1 2 2 1 1
## attr(,"levels")
## [1] "no"  "yes"
## [1] good        bad         really good really bad  bad         really bad 
## [7] good       
## Levels: really bad bad good really good
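A sketch that reproduces output like the above. The yes/no answers and ratings are chosen to match it, and treating the level positions as a 1 to 4 score is an assumption made to illustrate computing a mean:

```r
x <- factor(c("yes", "no", "yes", "no", "yes", "yes", "no", "no"))
table(x)     # counts per level: no 4, yes 4
unclass(x)   # stored as integer codes plus a "levels" attribute

# Explicit levels fix the order from lowest to highest rating
ratings <- c("really bad", "bad", "good", "really good")
answers <- factor(c("good", "bad", "really good", "really bad",
                    "bad", "really bad", "good"),
                  levels = ratings)
answers
as.integer(answers)         # position of each answer's level: 3 2 4 1 2 1 3
mean(as.integer(answers))   # average score on the assumed 1-4 scale, about 2.29
```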

3.5 Data frames

Data frames are structures designed to store tabular data. Most of the time they hold datasets read from files, but they can also be created explicitly with the data.frame() function. For example, if you run a poll and want to store each participant's data for further analysis, a data frame can be used to organize this information.

##     name age  opinion
## 1  karen  27    agree
## 2 brayan  36 disagree
## 3   Tony  38    agree
## [1] "name"    "age"     "opinion"
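A sketch of a data.frame() call that gives output like the above; the column names and values are taken from that output:

```r
poll <- data.frame(
  name    = c("karen", "brayan", "Tony"),
  age     = c(27, 36, 38),
  opinion = c("agree", "disagree", "agree")
)
poll
names(poll)   # "name" "age" "opinion"
nrow(poll)    # one row per participant: 3
```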

Names can always be removed by setting them to NULL. Row names can be set as well, with row.names().
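A small sketch of both operations. Name removal is shown on a named vector, and the row labels "p1" to "p3" are hypothetical:

```r
# Removing names by setting them to NULL (shown on a named vector)
v <- c(height = 2, width = 3)
names(v) <- NULL
names(v)           # NULL

# Setting row names on a data frame (labels are hypothetical)
poll <- data.frame(age = c(27, 36, 38))
row.names(poll) <- c("p1", "p2", "p3")
poll["p2", "age"]  # rows can now be selected by name: 36
```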

3.5.1 Dollar sign and double brackets

Single-bracket notation returns a subset with the same structure as the original object: subsetting a data frame returns a data frame, and subsetting a list returns a list. However, when only the data is needed without that structure, as some functions require in order to work properly, double brackets strip the structure and return just the data. The dollar-sign notation behaves like double brackets, but accesses values by name, as explained by Grolemund (2014).

##   age
## 1  27
## 2  36
## 3  38
## [1] 27 36 38
## [1] 33.66667
## [1] 33.66667
## [1] 36
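A sketch of calls consistent with the output above, using the poll data frame from earlier. The last line is an added illustration of why the structure matters: mean() cannot work on a data frame:

```r
poll <- data.frame(
  name    = c("karen", "brayan", "Tony"),
  age     = c(27, 36, 38),
  opinion = c("agree", "disagree", "agree")
)

poll["age"]           # single bracket: still a data frame
poll[["age"]]         # double brackets: just the vector 27 36 38
mean(poll[["age"]])   # 33.66667
mean(poll$age)        # same data as [[ ]], accessed by name: 33.66667
poll$age[2]           # 36

# mean() on the single-bracket result returns NA (with a warning),
# because a data frame is not a numeric vector
suppressWarnings(mean(poll["age"]))
```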

References

Grolemund, Garrett. 2014. Hands-on Programming with R. O’Reilly. https://rstudio-education.github.io/hopr/index.html.

Peng, Roger D. 2019. R Programming for Data Science. O’Reilly Media. https://leanpub.com/rprogramming.