Chapter 5 Scripting
Now that we are acquainted with the R language, some of its built-in functions, and its data types and structures, it is time to learn about the building blocks of programming: control structures, loops, and functions.
Like many other programming languages, R implements the if statement as a control structure and the for and while statements to produce loops. There are also the expressions repeat, break, and next, which specify behavior that differs from the rest of the iteration; break and next are usually used inside an if statement to handle exceptional cases inside a loop.
number <- 1
if (number > 0) {
  print("the number is positive")
} else if (number < 0) {
  print("the number is negative")
} else {
  print("the number is 0")
}
## [1] "the number is positive"
## [1] "a"
## [1] "b"
## [1] "c"
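The loop constructs mentioned above can be sketched as follows. This is a minimal illustration, not the chapter's original code: the vector of letters is an assumption chosen to mirror the output shown above, and the names count, odds, and i are hypothetical.

```r
# for: iterate over the elements of a vector
for (letter in c("a", "b", "c")) {
  print(letter)
}

# while: repeat as long as the condition holds
count <- 3
while (count > 0) {
  count <- count - 1
}

# repeat with break and next: collect the odd numbers below 8
odds <- c()
i <- 0
repeat {
  i <- i + 1
  if (i >= 8) break        # leave the loop entirely
  if (i %% 2 == 0) next    # skip even numbers and start the next iteration
  odds <- c(odds, i)
}
print(odds)
```

Note that repeat has no condition of its own, so it needs a break somewhere in its body to terminate.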
5.1 Functions
Functions are structures that encapsulate an action so that it can be reused. They let programmers apply the same process under different conditions through their inputs. A function is declared as follows.
countdown <- function(n) {
  for (timeleft in n:1) {
    cat("T minus: ", timeleft, "\n")
    Sys.sleep(1)
  }
  cat("BOOM!! \n")
}
countdown(3)
## T minus: 3
## T minus: 2
## T minus: 1
## BOOM!!
countdown(5)
## T minus: 5
## T minus: 4
## T minus: 3
## T minus: 2
## T minus: 1
## BOOM!!
As you can see, we define the countdown function by assigning the reserved word function, together with its parameters, to the function’s name, followed by the body of the function enclosed in curly brackets. Finally, we call the function to run it.
As seen, we can use cat to display values, which prompts the question: what is the difference between cat and print?
cat, on the one hand, “converts its arguments to character vectors, concatenates them to a single character vector” community (n.d.). This imposes some limitations: it can only be applied to atomic data types, and it does not add a new line, so “\n” has to be specified explicitly. The function returns an invisible NULL.
print, on the other hand, “is a generic function which means that new printing methods can be easily added for new classes” community (n.d.). Unlike cat, print returns its argument invisibly, which makes it useful for piping.
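The difference between the two functions can be checked with a small sketch. The variable names (cat_result, print_result, outcome) are hypothetical and only serve the illustration.

```r
# cat() writes its arguments as text and returns NULL invisibly
cat_result <- cat("T minus:", 3, "\n")
is.null(cat_result)

# print() returns its argument invisibly, so the value can be reused
print_result <- print(c(1, 2, 3))
identical(print_result, c(1, 2, 3))

# print() can display any object, while cat() fails on non-atomic
# objects such as functions
print(sum)
outcome <- tryCatch(
  {
    cat(sum)
    "cat worked"
  },
  error = function(e) "cat failed"
)
outcome
```

Here tryCatch() is used only so that the failing cat() call does not stop the script.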
5.1.1 Defining arguments
R allows great flexibility when dealing with the arguments of a function. Some of these behaviors are:
Lazy evaluation: arguments are only evaluated when they are needed. This lets a function declare arguments that are never supplied without a problem, as long as they are not used.
Default values: arguments can be given default values, which are used when the argument is not specified in the call. This is useful when a function has many parameters that are intended to have the same value most of the time.
Partial matching: arguments can be assigned by position, fun(value1, value2, value3), or by name, fun(arg2=value2, arg3=value3, arg1=value1). Naming the arguments allows them to be given in any order. Because names can be long, a partial argument name can be used if it uniquely identifies the argument.
Unspecified arguments: sometimes you cannot predict what other arguments might be passed to the function. The three-dot argument “…” can be used to capture unspecified arguments. This is commonly used when extending an existing function. One thing to be aware of is that any arguments that come after “…” must be named in full, or else they will be taken as part of “…”; partial matching does not work for them either.
This function example was taken from swirl’s functions module.
Unpacking arguments: when working with “…”, one might need to extract a specific value. These values can be accessed by converting “…” into a list.
## [1] "hola"
All of these behaviors improve the user experience by giving functions high-level capabilities.
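The argument behaviors listed above can be sketched in one place. The functions greet and join are hypothetical and written only for this illustration; they are not the swirl example mentioned earlier.

```r
# 'greet' is a hypothetical function used only for illustration
greet <- function(name, greeting = "hello", unused, ...) {
  extras <- list(...)                  # unpack "..." into a list
  if (!is.null(extras$punctuation)) {
    mark <- extras$punctuation
  } else {
    mark <- "."
  }
  paste0(greeting, " ", name, mark)    # 'unused' is never evaluated
}

greet("Ana")                            # default greeting; 'unused' is missing but unused
greet(greeting = "hola", name = "Ana")  # named arguments in any order
greet("Ana", gre = "hola")              # partial match of 'greeting'
greet("Ana", punctuation = "!")         # extra argument captured by "..."

# arguments placed after "..." must be named in full
join <- function(..., sep = "-") paste(..., sep = sep)
join("a", "b", sep = "+")               # 'sep' is matched exactly
join("a", "b", se = "+")                # 'se' is swallowed by "..."
```

In the last call the partial name se is not matched to sep, so "+" is treated as a third string to paste and the default separator is used.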
5.2 Scoping rules
R holds each definition in memory as a symbol-value pair. However, the same symbol might be defined differently on two separate occasions, so how does R determine which one to use?
This issue is resolved by scoping rules, which determine the value of a symbol by following a hierarchy of environments that can be displayed with the command search(). The symbol is first looked up in the global environment, then in the loaded libraries, and finally in the rest of the built-in packages. R keeps functions and non-functions apart during this lookup, so you can have one of each with the same name.
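A minimal sketch of both points, assuming the code is run at the top level of an R session. The variable names search_path and total are hypothetical, and the contents of the search path beyond its first entry depend on which packages are loaded.

```r
# search() lists the environments R walks through, starting at the global one
search_path <- search()
search_path[1]

# a non-function object named 'sum' does not hide the function sum():
# in a call, R skips objects that are not functions
sum <- 10
total <- sum(1, 2, sum)
total

rm(sum)  # remove our variable so that only the base function remains
```

Reusing the name of an existing function for a variable works, but it makes code harder to read, so it is shown here only to illustrate the lookup rule.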
external <- "I am from the outside world"
f <- function() {
  print(external)
}
f2 <- function() {
  external <- " I am inside function 2"
  f()
}
f()
f2()
## [1] "I am from the outside world"
## [1] "I am from the outside world"
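external <- "I am from the outside world"
f <- function() {
  print(external)
}
f2 <- function() {
  # this inner f is defined inside f2, so its free variables
  # are looked up in f2's environment first
  f <- function() {
    print(external)
  }
  external <- " I am inside function 2"
  f()
}
f()
f2()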
## [1] "I am from the outside world"
## [1] " I am inside function 2"
In the previous example we saw that a function can use a variable that was not declared inside it; such variables are known as free variables. The scoping rules state that a free variable is searched for in the environment where the function was defined and, if it is not found there, in the parent environment, and so on. The example is intended to reflect this behavior.
The value of a free variable is tied to the environment it is in. The ls() function shows what is inside an environment, and the get() function returns the value bound to a name.
## [1] "age" "alternative" "answers" "binary"
## [5] "classes" "clean_data" "clean_df" "countdown"
## [9] "data" "data2" "df" "dimensions"
## [13] "external" "external_deck" "f" "f2"
## [17] "good" "i" "initial" "L"
## [21] "M" "missing" "newnames" "number"
## [25] "opinion_poll" "satisfaction" "simon_says" "students"
## [29] "text" "x" "y" "z"
## [1] "I am from the outside world"
## [1] "I am from the outside world"
Due to lexical scoping, all variables must be stored in memory.
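The link between a function and its defining environment can be inspected directly. This is a small sketch; make_greeter, greeter, and greeting are hypothetical names introduced only for the illustration.

```r
# 'make_greeter' and 'greeting' are hypothetical names for illustration
make_greeter <- function() {
  greeting <- "I live in the defining environment"
  function() greeting          # 'greeting' is a free variable here
}
greeter <- make_greeter()
greeting <- "I live in the global environment"

result <- greeter()            # uses the value from where greeter was defined
result

env <- environment(greeter)    # the environment attached to the function
ls(env)                        # names defined in that environment
get("greeting", envir = env)   # value bound to the name there
```

Because the returned function keeps a reference to the environment in which it was created, that environment and its variables stay in memory for as long as the function exists.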
5.3 What happened on 1970-01-01?
The first of January 1970 is an important date that marks the start of time itself, at least as far as computers are concerned. This date was set as the start of the Unix epoch, a system that measures time as the number of seconds elapsed since this date. This system has been widely adopted, and Unix time (POSIX) is now the primary way computers deal with time.
R has two ways of dealing with time:
- Dates: represented by the Date class, which counts the number of days since the Unix epoch.
## [1] "Date"
## [1] 3256
- Time: represented by the POSIX classes, which come in two forms. POSIXct works deep down as a numeric counter of seconds and is convenient to store in data frames; POSIXlt stores the time in a list structure, with much more accompanying information that is easier to access.
## [1] "2021-04-16 20:59:04 -05"
## [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday"
## [9] "isdst" "zone" "gmtoff"
## [1] 121
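As the example shows, the time is measured relative to 1970-01-01. Note that the year component of POSIXlt counts years since 1900, which is where a value such as 121 for 2021 comes from.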
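The three representations can be compared with fixed dates, which keeps the results reproducible. This is a minimal sketch; the dates and the variable names d, ct, and lt are chosen only for illustration.

```r
# Date: days since 1970-01-01
d <- as.Date("1970-01-10")
class(d)
unclass(d)                     # number of days after the epoch

# POSIXct: seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(ct)                 # number of seconds after the epoch

# POSIXlt: a list of components that can be accessed by name
lt <- as.POSIXlt("2021-04-16 20:59:04", tz = "UTC")
lt$hour
lt$mon + 1                     # months are counted from 0
lt$year + 1900                 # years are counted from 1900
```

Setting tz explicitly avoids results that depend on the time zone of the machine running the code.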
So far we have shown how the system interprets dates and times, but imagine you want to enter dates that are written as strings. strptime() takes care of this; it only needs the string and the format. It also allows the time zone to be specified if needed.
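# ?strptime
# %B matches the full month name in the current locale,
# so "julio" is only recognized under a Spanish locale
str_date <- "4 de julio de 1991"
date <- strptime(str_date, format = "%d de %B de %Y", tz = "GMT")
class(date)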
## [1] "POSIXlt" "POSIXt"
## [1] "1991-07-04 00:06:05 GMT"
## [1] 6
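Since month names depend on the locale, a purely numeric format is a safer way to try strptime(). The sketch below uses a hypothetical input string and variable names (parsed, formatted, bad) chosen for illustration.

```r
# parse a string using a locale-independent, numeric format
parsed <- strptime("1991-07-04 13:30:00",
                   format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
class(parsed)
parsed$mon                     # months are counted from 0, so July is 6

# format() goes the other way, from a date-time to a string
formatted <- format(parsed, "%d/%m/%Y %H:%M")
formatted

# input that does not match the format yields NA rather than an error
bad <- strptime("not a date", format = "%Y-%m-%d", tz = "GMT")
is.na(bad)
```

Because a failed parse returns NA silently, it is worth checking the result with is.na() after converting a whole column of strings.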
5.4 SWIRL practices
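As part of this week’s assignment, one needed to complete the swirl practices on logic, functions, and dates and times.
5.4.1 Logical operations
The single forms of the operators (& and |) are evaluated for each element of a vector, while the double forms (&& and ||) evaluate a single value. In older versions of R the double forms silently used only the first element of a longer vector, as in the output below; since R 4.3.0 this is an error.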
## [1] TRUE FALSE FALSE
## [1] TRUE
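The contrast between the two forms can be sketched as follows. To stay valid in current versions of R, the first element is selected explicitly before using &&; the variable names are hypothetical.

```r
values <- c(TRUE, FALSE, FALSE)

# & compares element by element and returns a vector
elementwise <- TRUE & values
elementwise

# && works on single values, so the first element is picked explicitly
first_only <- TRUE && values[1]
first_only

# && and || stop as soon as the result is known:
# the right-hand side is never evaluated here
short <- FALSE && stop("never evaluated")
short
```

The short-circuit behavior is what makes && and || the usual choice inside if conditions, while & and | are meant for vectors.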
5.4.3 Dates and times
If needed, the lubridate package, originally developed by Garrett Grolemund and Hadley Wickham, extends the possibilities of the basic Date class. Dates might seem easy at first glance, but as Wickham and Grolemund (2017) put it:
Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and DST.
So it is important to appreciate the work others have put in to make it easier to deal with the messy way we measure time, so that you can focus on whatever you are working on rather than getting distracted trying to format date information.
References
community, R. n.d. “Rdocumentation.” https://www.rdocumentation.org/.
Wickham, H., and G. Grolemund. 2017. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.