There are times when you just have too much data, and a random sample is a nice way to test assumptions and algorithms first.
So in R you can create a function that returns a random sample of rows from a data frame for such emergencies.
randomSample = function(df,n) { return (df[sample(nrow(df), n),]) }
And to use:
smallerDF<-randomSample(bigDF, 40)
(40 being the number of rows you want in your sample).
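To try this without a big data frame at hand, here is a minimal sketch that uses R's built-in mtcars data set (32 rows) as a stand-in for bigDF. The seed value is an arbitrary assumption, only there so the same rows come back on every run; the helper is repeated so the snippet runs on its own.

```r
# Same helper as in the post, repeated so this snippet is self-contained.
randomSample <- function(df, n) {
  return(df[sample(nrow(df), n), ])
}

# Assumption: any fixed seed will do; it just makes the sample repeatable.
set.seed(42)

# mtcars stands in for bigDF here; 10 is the number of rows we want back.
smallerDF <- randomSample(mtcars, 10)

nrow(smallerDF)   # 10
```

One thing to watch: if the data frame has only a single column, df[rows, ] returns a plain vector rather than a data frame. Adding drop = FALSE inside the brackets avoids that.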
3 responses to “Random samples from R data frames.”
Thanks for the post! I was able to use this function. Strange that I hadn’t previously encountered the need to randomly sample observations from a data frame?!
To other readers: make sure you use the appropriate value for the replace = argument of sample(). For sampling a data frame, I think the default of replace = FALSE is most often going to be what you want.
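A rough sketch of the difference, again using the built-in mtcars data set as a stand-in (the variable names and the seed are only illustrative). Without replacement every row appears at most once; with replace = TRUE the same row can come back several times, which also lets you ask for more rows than the data frame has.

```r
set.seed(1)  # assumption: arbitrary seed, only for repeatability

# Default behaviour (replace = FALSE): each row index is drawn at most once.
idxWithout <- sample(nrow(mtcars), 20)
anyDuplicated(idxWithout)   # 0, i.e. no row was picked twice

# With replace = TRUE, rows may repeat, so n can exceed nrow(df).
bootstrapDF <- mtcars[sample(nrow(mtcars), 50, replace = TRUE), ]
nrow(bootstrapDF)           # 50

# Asking for 50 of 32 rows without replacement is an error in R.
tooMany <- tryCatch(sample(nrow(mtcars), 50), error = function(e) e)
inherits(tooMany, "error")  # TRUE
```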
Hi Philip,
Thanks for taking the time to comment. I don’t work with small data sets, so I probably need a small random sample to test an algorithm more often than most. And you’re right about replace = FALSE, nice catch!
Regards
Jase
Note that the above method does not preserve the original ordering of the rows. Use sort(sample(..)) on the row indices if this is desired.
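A minimal sketch of that suggestion follows. The name randomSampleOrdered is made up for illustration and is not from the post; mtcars again stands in for a real data frame. Sorting the sampled row indices before subsetting keeps the rows in the order they had in the original data frame.

```r
# Hypothetical variant of randomSample that keeps the original row order.
randomSampleOrdered <- function(df, n) {
  rows <- sort(sample(nrow(df), n))   # sample row indices, then sort them
  df[rows, , drop = FALSE]            # drop = FALSE keeps a data frame even with one column
}

set.seed(7)  # assumption: arbitrary seed, only for repeatability
orderedDF <- randomSampleOrdered(mtcars, 5)

# Positions of the sampled rows in the original data frame, in increasing order.
match(rownames(orderedDF), rownames(mtcars))
```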