Data Munging and R, A (continuously updating) Personal Reference

I am quick to Google and find answers when it comes to doing tasks. I’ve read studies suggesting that the generation that grew up with the internet and all this knowledge at our fingertips has become worse at memorizing instructions outright but significantly better at efficiently finding instructions and remembering where to look. As I often find myself knowing where to look for an answer but unable to recall it off the top of my head, I figure this would be a great place to start writing out my personal comprehensive reference guide to data munging in R!

R has a good help function, but I feel that this is a good exercise for remembering all these commands as well as for building a broad reference dictionary.

SETTING UP WD:

getwd()
setwd("directory/directory")

Shows current working directory and sets a working directory
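
A small sketch of how I would use these two together so that a script does not leave the session in a different directory. Using tempdir() as the target is only an assumption for illustration (it always exists); the variable names are made up.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Switch to R's per-session temporary directory (illustrative target only)
setwd(tempdir())
new_wd <- getwd()

# Go back to where we started
setwd(old_wd)
restored_wd <- getwd()
```

setwd() also returns the previous directory invisibly, so old_wd <- setwd(tempdir()) would work as a one-liner.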

NAMESPACES:

attach(dataframe)
detach(dataframe)

Attaches the dataframe’s columns to R’s search path so they can be referenced by name alone, or removes them from the search path again
*Not sure how useful this is; I think I will use it sparingly to avoid name conflicts.
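
To see what attaching actually changes, here is a minimal sketch on a made-up dataframe (scores and score_value are hypothetical names, not from any real dataset).

```r
# Hypothetical two-row dataframe
scores <- data.frame(name = c("John", "Jane"),
                     score_value = c(4, 5),
                     stringsAsFactors = FALSE)

# After attach(), columns can be referenced without the scores$ prefix
attach(scores)
mean_score <- mean(score_value)
detach(scores)

# Once detached, the bare column name is no longer found
still_visible <- exists("score_value")
```

One thing to keep in mind: attach() works on a copy, so changes made to scores after attaching are not reflected in the attached columns.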

READING CSV OR TABLE:

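read.csv(file = "filename", header = T/F)
read.table(file = "filename", sep = "separator", header = T/F,
     stringsAsFactors = T/F, col.names = c("name1", "name2"),
     row.names = NULL, strip.white = T/F)

Reads in a CSV or table with different settings. Note that col.names takes a vector of column names and row.names takes a vector of row names (or the number or name of the column that holds them), not T/F.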
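
A quick round trip to check the settings, assuming a tiny made-up file written to a temporary path (the file contents and variable names are only for illustration).

```r
# Write a small CSV to a temporary file; note the stray spaces around the names
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,treatment_a,treatment_b",
             "John ,4,1",
             " Jane,5,2"), csv_path)

# read.csv: comma separator is the default
people <- read.csv(file = csv_path, header = TRUE,
                   stringsAsFactors = FALSE, strip.white = TRUE)

# read.table: same file, separator given explicitly
people_tab <- read.table(file = csv_path, sep = ",", header = TRUE,
                         stringsAsFactors = FALSE, strip.white = TRUE)

# Clean up the temporary file
unlink(csv_path)
```

With strip.white = TRUE the stray spaces around the names are dropped, and stringsAsFactors = FALSE keeps the name column as plain character strings.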

SUMMARY OPTIONS:

head(dataframe, numofrows)
dim(dataframe)
names(dataframe)
summary(dataframe)
quantile(dataframe$column)
class(dataframe)
sapply(dataframe[1,],class)
unique(dataframe$column)
length(dataframe$column)
table(dataframe$column, useNA="ifany")

head: see first numofrows of dataframe
dim: see dimensions of dataframe
names: see col names
summary: get counts for qualitative variables or numerical summary of a quantitative variable
quantile: get quantiles of a quantitative variable
class: get class of dataframe or column
sapply: apply class function to first row of dataframe to get classes of all variables
unique: get unique values of a column
length: get length of column
table: create a table of unique values and their counts, mainly for qualitative variables; useNA="ifany" also shows NA value counts
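
Here is a short sketch of a few of these on a made-up dataframe (trial and its columns are hypothetical). I pass the whole dataframe to sapply here rather than just the first row; as far as I can tell both give the same classes.

```r
# Hypothetical data with one missing group label
trial <- data.frame(
  name  = c("John", "Jane", "Joan", "Jake"),
  group = c("A", "B", "A", NA),
  score = c(4, 5, 1, 2),
  stringsAsFactors = FALSE
)

dims            <- dim(trial)                          # rows, then columns
col_classes     <- sapply(trial, class)                # class of every column
unique_groups   <- unique(trial$group)                 # includes NA as a value
group_counts    <- table(trial$group, useNA = "ifany") # counts, with an NA bucket
score_quartiles <- quantile(trial$score)               # 0%, 25%, 50%, 75%, 100%
```

Without useNA = "ifany", table() silently drops the NA row, which is easy to miss when checking for missing data.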

TESTING DATA:

any(dataframe$column [condition])
all(dataframe$column [condition])

example:
any(data$column > 40) # true/false
all(data$column > 0 & data$column < 40)

any: tests condition for any matches
all: tests condition for all values

SUBSETTING DATA:

subset(dataframe, conditions, select = c(column, column))

subset: subset dataframe by certain conditions and only return selected columns
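
A minimal sketch of subset with both a row condition and a column selection, on made-up data (the names trial, age and score are assumptions for illustration).

```r
# Hypothetical data
trial <- data.frame(
  name  = c("John", "Jane", "Joan", "Jake"),
  age   = c(34, 45, 29, 52),
  score = c(4, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Keep rows where both conditions hold, and only the name and score columns
older_high <- subset(trial, age > 30 & score >= 4, select = c(name, score))
```

Column names are written bare (no quotes, no trial$ prefix) in both the condition and select.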

MERGE:

merge(dataframe1, dataframe2, by.x = "column1",
      by.y = "column2", all = T/F)

by.x and by.y: specify a merge on columns that do not share the same name; give the column names as strings (by.x from dataframe1, by.y from dataframe2)
all: specifies an outer join versus an inner join: T includes all records and inserts NA for all missing information
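
To compare the inner and outer join side by side, here is a sketch with two made-up dataframes whose key columns have different names (people, scores, person_id and id are all hypothetical).

```r
# Hypothetical tables: the key is person_id in one and id in the other
people <- data.frame(person_id = c(1, 2, 3),
                     name = c("John", "Jane", "Joan"),
                     stringsAsFactors = FALSE)
scores <- data.frame(id = c(2, 3, 4),
                     score = c(5, 1, 2))

# Inner join: only keys present in both tables (2 and 3)
inner <- merge(people, scores, by.x = "person_id", by.y = "id")

# Outer join: every key from either table, with NA where data is missing
outer <- merge(people, scores, by.x = "person_id", by.y = "id", all = TRUE)
```

The merged key column keeps the name from by.x (person_id here). There are also all.x and all.y arguments for left and right joins.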

ORDER:

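order(dataframe1$column)
dataframe1$column[order(dataframe1$column)]

order returns a vector of row numbers in sorted order of the specified column; indexing the column with that vector returns the sorted values themselves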
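
The difference between the positions that order() gives back and the sorted values is easy to mix up, so here is a tiny sketch on a made-up vector.

```r
scores <- c(30, 10, 20)

# Positions of the elements, smallest value first
idx <- order(scores)

# Using those positions as an index gives the values in sorted order
sorted_scores <- scores[idx]

# decreasing = TRUE reverses the direction
idx_desc <- order(scores, decreasing = TRUE)
```

Here idx is 2 3 1 (the smallest value sits in position 2), while sorted_scores is 10 20 30.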

SORTING USING ORDER:

sorted <- dataframe1[order(dataframe1$column, dataframe1$column2),]

Stores dataframe1 into "sorted", with rows ordered by column and then by column2. Additional levels of sorting can be added in the order function.
*Be careful to add the "," after the order function: without it R treats the indices as column numbers, and you will usually get an "undefined columns selected" error.
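
A minimal two-level sort on made-up data (trial, group and score are hypothetical names), to show what the rows look like afterwards.

```r
# Hypothetical data
trial <- data.frame(group = c("B", "A", "B", "A"),
                    score = c(2, 5, 1, 4),
                    stringsAsFactors = FALSE)

# Sort by group first, then by score within each group.
# The trailing comma means "these are row indices, keep all columns".
sorted <- trial[order(trial$group, trial$score), ]
```

The original row names travel with the rows (4, 2, 3, 1 here); rownames(sorted) <- NULL renumbers them if that matters.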

MELT:

melt(dataframe, id.vars="idcolumn", variable.name="varnames",value.name="values")

Example output:

Input Matrix:
Name    TreatmentA    TreatmentB
John    4             1
Jane    5             2 

Result:
Name    Treatment   Value
John    TreatmentA  4
John    TreatmentB  1
Jane    TreatmentA  5
Jane    TreatmentB  2

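This takes a wide, matrix-style table and reshapes it to have one observation per row.
*Requires install.packages("reshape2") and library(reshape2); the variable.name and value.name arguments belong to melt in reshape2, not the older reshape package.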
