Data Cleaning in R

So I ran across a bit of a strange problem this week, and I thought I’d share my code in case others needed it.

I was trying to download a data set that has some 16-digit ID #s as one of the variables and transfer this data to another system.  The “easiest” way would be to just open the files and clean them up in Excel.  However, Excel only keeps 15 significant digits for a number, so it turns the 16th digit of every ID into a zero, which renders these IDs effectively useless.
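The way around this in R is to read the IDs as text rather than as numbers. Here is a minimal sketch of that idea; the IDs and names are made up for illustration, and `as_text` and `as_guess` are just names I picked for the two results.

```r
# Made-up 16-digit IDs, for illustration only
csv_text <- "id,name\n1234567890123456,Ann\n0000111122223333,Bob"

# colClasses = "character" keeps every column as text, digit for digit
as_text <- read.csv(text = csv_text, colClasses = "character")

# Default type guessing turns the ID column into numbers instead,
# which is where leading zeros and exact digits can get lost
as_guess <- read.csv(text = csv_text)

print(as_text$id)
print(class(as_guess$id))
```

Once a column is stored as character, nothing downstream will round or reformat it unless you convert it yourself.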

So, as a workaround, I wrote a little R script to clean up those files.

Now, it’s not a perfect script, and you do need to change the file numbers in the loop to match your own files, but it does what I want it to: it takes each file from csv to txt and leaves the name of the activity and the date on the first line, with the ID #s on the following lines.

If you want to use it, submit $25 to my Paypal.  Just kidding.  Here you go.

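for (i in 9:10) {
  # Input file name, e.g. "AttendanceByEvent (9).csv"
  filemade <- paste("AttendanceByEvent ", "(", i, ")", ".csv", sep = "")
  # Skip the 6 header lines, keep only column 2 (the IDs), and read it as text
  event <- read.csv(file = filemade, skip = 6, colClasses = c("NULL", "character", rep("NULL", 11)), skipNul = TRUE)
  # Assumption: details is the header block at the top of the file (the line defining it did not survive in this post)
  details <- read.csv(file = filemade, header = FALSE, nrows = 6, colClasses = "character", skipNul = TRUE)
  name <- paste(strtrim(details[1, 1], 25), details[4, 1])
  filename <- paste("event", i, ".txt", sep = "")
  names(event) <- name
  write.table(event, filename, row.names = FALSE, quote = FALSE)
}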
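If you want to try the idea without touching real files, here is a rough sketch that wraps it in a function and runs it on a throwaway file. The function name `clean_event_file` and its arguments `id_col`, `n_cols` and `n_skip` are hypothetical, and the file layout (6 header lines with the event name on line 1 and the date on line 4, then 13 columns with the IDs in column 2) is my assumption based on the numbers in the loop above. It also uses `writeLines` instead of `write.table`, which is a swap on my part.

```r
# Hypothetical helper: copy the ID column of one export into a text file.
clean_event_file <- function(infile, outfile, id_col = 2, n_cols = 13, n_skip = 6) {
  # First field of each header line (assumed: line 1 = event name, line 4 = date)
  hdr <- readLines(infile, n = n_skip, warn = FALSE)
  first_field <- sub(",.*$", "", hdr)
  title <- paste(strtrim(first_field[1], 25), first_field[4])

  # Drop every column except the IDs, and read those as text
  classes <- rep("NULL", n_cols)
  classes[id_col] <- "character"
  ids <- read.csv(infile, skip = n_skip, colClasses = classes, skipNul = TRUE)

  # Title on the first line, one ID per line after it
  writeLines(c(title, ids[[1]]), outfile)
  invisible(outfile)
}

# Made-up file in the assumed layout
infile <- tempfile(fileext = ".csv")
writeLines(c(
  "Spring Picnic", "Report", "Attendance", "2015-04-01", "Exported", "Notes",
  paste(c("Row", "ID", paste0("X", 3:13)), collapse = ","),
  paste(c("1", "1234567890123456", rep("a", 11)), collapse = ","),
  paste(c("2", "9876543210987654", rep("b", 11)), collapse = ",")
), infile)

outfile <- tempfile(fileext = ".txt")
clean_event_file(infile, outfile)
result <- readLines(outfile)
print(result)
```

If your export has a different number of header lines or columns, change `n_skip`, `n_cols` and `id_col` to match before trusting the output.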

