Ever since I bought Hadley Wickham’s lovely book “ggplot2: Elegant Graphics for Data Analysis (Use R!)” a few weeks back, I’ve been meaning to write up a simple end-to-end example of data collection and plotting using ggplot2.
So, without further delay, let’s make a pretty picture of the rate at which I’ve been writing here (and thus of the rate at which my rather naive site search implementation’s dataset is growing).
Here’s what we’ll do:
Install R and ggplot2:
sudo aptitude install r-base r-cran-ggplot2
Collect the data:
echo date bytes post > data.txt
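(for f in $(find posts -name 'index.txt'); do
  # line 3 of each post holds its date, prefixed with "% "
  DATE=$(head -n 3 "$f" | tail -n 1 | sed -e 's/^% //');
  # emit: epoch seconds, size in bytes, post name
  echo $(date -d "$DATE" +%s) \
       $(stat -c '%s' "$f") \
       $(echo "$f" | sed -e 's,posts/,,' -e 's,/index.txt,,');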
done) | sort -n -k1 >> data.txt
Sanity-check the resulting data:
$ head data.txt
date bytes post
1232600400 2038 joy_of_tex
1234674000 3947 irrefutability
1275796800 2076 openkey
1300248000 5958 afd_discussions
1300593600 1358 safe_phones
1301371200 1126 convergence
1302235200 1404 secrets
1302408000 2916 comment_systems
1307160000 833 scheduling
Make the plot:
$ R
library("ggplot2") # load ggplot2
df <- read.table("data.txt", header = TRUE) # load the data
ndf <- df[order(df$date),] # sort the data
ndf$date2 <- as.POSIXct(ndf$date, origin="1970-01-01") # convert timestamps to dates
ndf$total_bytes <- cumsum(ndf$bytes) # count total bytes over time
svg(filename="data.svg", width=6, height=4) # open an SVG graphics device
qplot(x = date2, y = total_bytes, data=ndf, xlab="date", ylab="total bytes") # make the plot
dev.off() # close the device, writing data.svg
Enjoy:
(P.S. - Care to guess when I joined Iron Blogger? :-)
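
If you'd like to cross-check the cumulative totals without starting R, the running sum that cumsum computes can be sketched in the shell with awk. This is only a sketch: running_total is a made-up helper name, not part of the steps above, and it assumes the three-column layout of data.txt with a header row. The sample rows are the first three from the head output.

```shell
# Hypothetical helper (not from the original steps): print each timestamp
# followed by the running total of the "bytes" column, skipping the header.
# Reads the file named as its argument, or stdin if none is given.
running_total() {
    awk 'NR > 1 { total += $2; print $1, total }' "$@"
}

# Try it on the first three rows shown in the sanity check.
running_total <<'EOF'
date bytes post
1232600400 2038 joy_of_tex
1234674000 3947 irrefutability
1275796800 2076 openkey
EOF
# prints:
# 1232600400 2038
# 1234674000 5985
# 1275796800 8061
```

Run as `running_total data.txt`, the last line should match the final total_bytes value in the R session, since both are just the sum of the bytes column.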