Importing a CSV File¶
Some practices when we carry out analyses simply do not feel reproducible or they lend themselves too easily to human error. Copy and paste for example. Taking two columns from a spreadsheet and making a new one is fairly easy, but would you make a mistake after 100 times? People have lost their jobs and worse because of carelessness in spreadsheet manipulation.
So let the computer do it for you.
Download: ExampleInDataFile.csv
import-csv-example
# (1) read in a csv file where one field is a date
# (2) write a new csv file with a new field added as a column
# NOTE: The new col will consist of days from some date
library(stats)
## read in the automated results
inFileName <- "ExampleInDataFile.csv"
inFilePath <- paste(getwd(),inFileName,sep="/")
inData <- read.table(inFilePath,header=T,sep=",")
header <- names(inData)
## prepare the outfile
outfileName <- "ExampleOutDataFile.csv"
newHeader <- paste('group','value','date','time','daysFrom',sep=",")
cat(newHeader,file=outfileName,append=F,fill=T)
## create a function to play with the columns
dayOne <- as.Date("1979-11-19")
process_file <- function(x, outfile) {
group <- x[1]
value <- x[2]
date <- x[3]
time <- x[4]
daysFrom <- as.Date(date) - dayOne
rowToOutput <- paste(group, value, date, time, daysFrom, sep=",")
print(paste(group, value, date, time, sep=","))
cat(rowToOutput,file=outfile,append=T,fill=T)
}
## write the data to file
apply(inData, 1, process_file,outfile=outfileName)
[1] "..."
[1] "a,15.05,2012-01-15,12:01"
[1] "a,21.11,2012-01-14,12:15"
[1] "a,21.85,2012-01-19,12:29"
[1] "a,16.01,2012-01-11,12:56"
[1] "b,19.99,2012-01-13,12:01"
[1] "b,29.23,2012-01-10,12:15"
[1] "b,27.44,2012-01-11,12:29"
[1] "b,25.06,2012-01-10,12:56"
NULL
[1] "..."
[1] "group" "value" "date" "time"
[1] 21.9675
Exercises¶
- Create a Sweave document that breaks this code up at each of the comments
So now lets read in the file and play with it:
import-csv-example
x <- read.csv("ExampleInDataFile.csv")
attach(x)
print(names(x))
print(mean(x$value))
[1] "..."
[1] "a,15.05,2012-01-15,12:01"
[1] "a,21.11,2012-01-14,12:15"
[1] "a,21.85,2012-01-19,12:29"
[1] "a,16.01,2012-01-11,12:56"
[1] "b,19.99,2012-01-13,12:01"
[1] "b,29.23,2012-01-10,12:15"
[1] "b,27.44,2012-01-11,12:29"
[1] "b,25.06,2012-01-10,12:56"
NULL
[1] "..."
[1] "group" "value" "date" "time"
[1] 21.9675