sapply(dataframe$col, FUN = mean) #or sum, etc.
sapply: Applies a function to each element of the column (or to each column, when given the whole data frame) and returns the results simplified to a vector.
lapply(dataframe$col, FUN = mean) #or sum, etc.
lapply: Applies a function to each element of the column (or to each column, when given the whole data frame) and returns a list with the results.
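A small sketch of the difference between the two, assuming the built-in mtcars data set stands in for "dataframe" (the column choice is only for illustration):

```r
# Assumption: the built-in mtcars data set stands in for "dataframe".
cols <- mtcars[, c("mpg", "hp", "wt")]

# sapply simplifies the per-column results to a named numeric vector
means_vec <- sapply(cols, FUN = mean)

# lapply returns the same results as a named list
means_list <- lapply(cols, FUN = mean)

# Applied to a single column, the function runs once per element
rounded <- sapply(mtcars$mpg, FUN = round)
```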
tapply(dataframe$col1, dataframe$col2, FUN = mean) #or sum, etc.
tapply: Takes the data in the first column, splits it into groups by the factor in the second column, and applies the function to each group.
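For instance, a minimal sketch assuming mtcars stands in for "dataframe", with mpg as the data column and cyl as the grouping column:

```r
# Assumption: mtcars stands in for "dataframe"; cyl is the grouping column.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, FUN = mean)

# The result has one named entry per group (here "4", "6" and "8")
print(mpg_by_cyl)
```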
aggregate(cbind(col1, col2) ~ col3 + col4, data = dataframe, FUN = mean) #or sum, etc.
aggregate: Creates a pivot table of the quantitative data in col1 and col2, grouped by every combination of col3 and col4, with the function applied to each group. Because data = dataframe is given, the columns are named directly in the formula.
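A short sketch of such a pivot, assuming mtcars stands in for "dataframe" (mpg and hp as the quantitative columns, cyl and am as the grouping columns):

```r
# Assumption: mtcars stands in for "dataframe".
# Mean of mpg and hp for every combination of cyl and am
pivot <- aggregate(cbind(mpg, hp) ~ cyl + am, data = mtcars, FUN = mean)
print(pivot)
```

The result is an ordinary data frame with one row per combination of the grouping columns.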
Cut a continuous variable into a factor with g groups.
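The note above does not name the function it has in mind. Hmisc::cut2(x, g = ) takes a g argument of this kind, but since that is a guess, the sketch below uses base R cut() with quantile breaks instead. The data (mtcars$mpg) and the value of g are assumptions for illustration.

```r
# Assumption: base R cut() with quantile breaks stands in for the unnamed function.
g <- 4                                    # number of groups
x <- mtcars$mpg                           # a continuous variable

# g + 1 quantile break points give g roughly equal-sized groups
breaks <- quantile(x, probs = seq(0, 1, length.out = g + 1))
groups <- cut(x, breaks = breaks, include.lowest = TRUE)

table(groups)                             # count of observations per group
```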
sample(1:rows, size = number, replace = FALSE) #or replace = TRUE
sample: Returns a vector of row numbers of the given size, drawn with or without replacement. Useful for generating smaller random subsets.
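As a sketch, assuming mtcars stands in for the data frame and a subset of 10 rows is wanted (the seed and the size are arbitrary choices):

```r
set.seed(1)                               # assumption: any seed, for reproducibility
n_rows <- nrow(mtcars)

# 10 distinct row numbers, drawn without replacement
idx <- sample(1:n_rows, size = 10, replace = FALSE)

# Use the row numbers to pull out the random subset
subset_rows <- mtcars[idx, ]
```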
RANDOMLY SUBSETTING TRAIN AND TEST DATA:
set.seed(numeric) #set a random seed
i <- rbinom(rownum, size=1, prob=.5) #flip coins to assign rows
train <- dataframe[i==1,] #subset for train
test <- dataframe[i==0,] #subset for test
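A runnable sketch of the coin-flip split, assuming iris stands in for "dataframe" and an arbitrary seed. With rbinom the group sizes vary from seed to seed, so the second half shows an alternative (not from the note above) that uses sample() to get an exact 70/30 split.

```r
set.seed(42)                              # assumption: arbitrary seed
n <- nrow(iris)

# Coin-flip split, as in the note: each row lands in train with probability 0.5
i <- rbinom(n, size = 1, prob = 0.5)
train <- iris[i == 1, ]
test  <- iris[i == 0, ]

# Alternative (not from the note): sample() gives an exact 70/30 split
train_idx   <- sample(seq_len(n), size = round(0.7 * n))
train_fixed <- iris[train_idx, ]
test_fixed  <- iris[-train_idx, ]
```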
plot(dataframe$col1, dataframe$col2, pch=bullettype, col=color or a descriptive variable, cex=size) #single plot
plot(dataframe[,1:4]) #plot first 4 columns against each other
col can be given a factor so that each category is drawn in a different color. In the same way, cex can be given a numeric variable or an expression computed from the data to get different sized dots.
plotting multiple columns creates a matrix of plots
pch has integer values to represent different types of bullets
cex scales the size of the plotted points relative to the default of 1
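A sketch of both plot calls, assuming iris stands in for "dataframe". The pdf(NULL) device is only there so the code also runs without a display; drop it to see the plots on screen.

```r
# Assumption: iris stands in for "dataframe"; pdf(NULL) discards the drawing.
pdf(NULL)

# Single scatter plot: color by the Species factor, size by petal width
plot(iris$Sepal.Length, iris$Sepal.Width,
     pch = 19,                            # filled circle
     col = iris$Species,                  # factor codes 1, 2, 3 pick the colors
     cex = iris$Petal.Width)              # one size per point

# Matrix of pairwise scatter plots for the first four columns
plot(iris[, 1:4])

invisible(dev.off())
```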
OTHER SCATTER PLOTS:
smoothScatter(x,y)
plot(hexbin(x,y)) #requires library(hexbin)
qqplot(x,y)
smoothScatter shades the plot by point density, and hexbin provides a legend mapping cell colors to counts. qqplot plots the quantiles of x against the quantiles of y (samples from the same distribution lie on a 45 degree line).
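A sketch of the quantile comparison on simulated data (the seed, sample size and normal distribution are assumptions). smoothScatter relies on the KernSmooth package, so it is only called when that package is available.

```r
set.seed(7)                               # assumption: arbitrary seed, simulated data
x <- rnorm(500)
y <- rnorm(500)                           # drawn from the same distribution as x

pdf(NULL)                                 # discard the drawing; no display needed

# Density shading instead of individual points (needs KernSmooth)
if (requireNamespace("KernSmooth", quietly = TRUE)) {
  smoothScatter(x, y)
}

qq <- qqplot(x, y)                        # returns the sorted x and y it plotted
abline(0, 1)                              # 45 degree reference line
invisible(dev.off())
```

Because x and y come from the same distribution here, the points of the QQ plot fall close to the reference line.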
REGRESSION ON FACTORS:
lm(dataframe$quantitative ~ as.factor(dataframe$factor))
lm(dataframe$quantitative ~ relevel(as.factor(dataframe$factor), ref="reference level")) #sets a reference level for the lm
Fits a linear regression of a quantitative variable on a factor variable. The first level of the factor is the reference level. Use the second example to define a different reference level.
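A sketch of what the reference level means for the coefficients, assuming iris stands in for "dataframe" (Species is already a factor there, so as.factor is not needed):

```r
# Assumption: iris stands in for "dataframe"; Species is already a factor.
fit_default <- lm(Sepal.Length ~ Species, data = iris)

# Intercept = mean of the reference level (setosa, the first level);
# the other coefficients are differences from that mean
coef(fit_default)

# Make virginica the reference level instead
fit_releveled <- lm(Sepal.Length ~ relevel(Species, ref = "virginica"), data = iris)
coef(fit_releveled)
```

Changing the reference level does not change the fitted values, only which group the other coefficients are compared against.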