Chapter 10 On reproducible research
Reproducible reporting is about validating the data analysis.
Best practices should be adopted to promote and encourage reproducibility, particularly in what’s called ‘omics-based research, such as genomics, proteomics, and other similar areas involving high-throughput biological measurements.
Peng (2016)
For exploratory figures, you need to understand where the data came from.
Divide the data into two categories, training data and validation data. This can be done with draws from the binomial distribution.
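# Read the semicolon-separated student file
mydata <- read.table("../data/student_info.csv", sep = ";", header = TRUE)
n <- nrow(mydata)
set.seed(3435)  # fix the seed so the split is reproducible
# One Bernoulli draw per row: TRUE = training set, FALSE = validation set
trainIndicator <- rbinom(n, size = 1, prob = 0.5) == 1
train_data <- mydata[trainIndicator, ]
val_data <- mydata[!trainIndicator, ]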
val_data
##    Name.and.Lastname          ID                email.adress
## 2      Robert Downey  4525111559         rdownironman@mail.c
## 3      Douglas Adams 14884674721   zaphodbeeblebrox@mail.com
## 4    Stephen Wolfram 74682868914       wolframalpha@mail.com
## 5        Cleve Moler 17463589721 chiefmathematician@mail.com
## 6        Matt Parker 18457956247       parkersquare@mail.com
## 8      Emily Graslie 17973595287         brainscoop@mial.com
## 10    Destin Sandlin 17895782879            smarter@mail.com
## 11       Freddy Vega 17795795697            fredier@mail.com
## 13      Isaac Asimov 13589844557             robots@mail.com
## 15   R Daneel Olivaw 10001110101            dolivaw@mail.com
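The split above can be checked for reproducibility with a small sketch. The data frame `toy` and the helper `split_data` are hypothetical names introduced only for illustration; they are not part of the original analysis. The 0/1 draws are compared with 1 to get a logical mask, because a numeric 0/1 vector used directly as a row index would select row 1 repeatedly instead of filtering rows.

```r
# Hypothetical stand-in for the student data (assumption: any data frame works)
toy <- data.frame(id = 1:20, score = seq(50, 88, by = 2))

# Hypothetical helper: random train/validation split with a fixed seed
split_data <- function(df, prob = 0.5, seed = 3435) {
  set.seed(seed)  # same seed, same random draws
  is_train <- rbinom(nrow(df), size = 1, prob = prob) == 1  # logical mask
  list(train = df[is_train, ], validation = df[!is_train, ])
}

first  <- split_data(toy)
second <- split_data(toy)

# Running the split twice with the same seed gives the same partition
same_split <- identical(first, second)

# Every row lands in exactly one of the two sets
covers_all <- nrow(first$train) + nrow(first$validation) == nrow(toy)
no_overlap <- length(intersect(first$train$id, first$validation$id)) == 0

print(c(same_split, covers_all, no_overlap))
```

All three checks should come out `TRUE`. Note that the set sizes are random with this approach: each row is assigned independently, so the two sets are only roughly equal in size.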
References
Peng, Roger D. 2016. “Report Writing for Data Science in R.” http://leanpub.com/reportwriting.