Chapter 1 Getting Started

In this module (The Data Scientist's Toolbox), one is introduced to the basics of data science. Topics include the types of data, the job of the data scientist, and the process needed to turn data and information into usable knowledge. The module also has a practical component that introduces an important tool at the data scientist's disposal: the R language and its IDE, RStudio. Version control is also addressed, and the basics of Git and GitHub are explored. The lectures were made in the spirit of reproducible work and of R's potential for automation. The material consisted of written lectures plus videos that were generated automatically from the written material. This piqued my curiosity: the resulting videos deliver a far-from-perfect experience, but they show a practical application of what can be done with R.

One of the passages that struck me as meaningful was in the introduction, which answers the question of why we need data science:

One of the reasons for the rise of data science in recent years is the vast amount of data currently available and being generated. Not only are massive amounts of data being collected about many aspects of the world and our lives, but we simultaneously have the rise of inexpensive computing. This has created the perfect storm in which we have rich data and the tools to analyze it, rising computer memory capabilities, better processors, more software and now, more data scientists with the skills to put this to use and answer questions using this data.

1.1 Final assessment

As part of the final assessment, one needs to demonstrate a successful R installation with a working RStudio. For me, both installations went flawlessly, and RStudio detected R without any configuration issues.

RStudio screenshot

One also needs to demonstrate the creation of a GitHub account (mine can be found as jsduenass), create a Markdown file, and fork jtleek's repository How to share data with a statistician.

I found the peer-review methodology used in this final assignment interesting, as it encourages students to participate and interact.