Ever since I bought Hadley Wickham’s lovely book “ggplot2: Elegant Graphics for Data Analysis (Use R!)” a few weeks back, I’ve been meaning to write up a simple end-to-end example of data collection and plotting using ggplot2.
So, without further delay, let’s make a pretty picture of the rate at which I’ve been writing here (and, by extension, of the rate at which the dataset behind my rather naive site search implementation is growing).
Here’s what we’ll do:
Install R and ggplot2:
sudo aptitude install r-base r-cran-ggplot2
Collect the data:
echo date bytes post > data.txt
(for f in $(find posts -name 'index.txt'); do
    # the post date lives on the third line, prefixed with "% "
    DATE=$(head -n 3 "$f" | tail -n 1 | sed -e 's/^% //')
    echo $(date -d "$DATE" +%s) \
         $(stat -c '%s' "$f") \
         $(echo "$f" | sed -e 's,posts/,,' -e 's,/index.txt,,')
done) | sort -n -k1 >> data.txt
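To see what the loop does to a single post, here is a minimal sketch that builds one row from a throwaway file. The post name `example_post`, its contents, and the temporary directory are all made up for illustration. I also pin `TZ=UTC` so the timestamp is reproducible, whereas the loop above uses the local timezone. Like the loop, it assumes GNU `date` and GNU `stat`.

```shell
# Build one row of data.txt from a throwaway sample post.
tmp=$(mktemp -d)
mkdir -p "$tmp/posts/example_post"      # hypothetical post name
printf '%% Example title\n%% Example author\n%% January 22, 2009\n\nBody text.\n' \
    > "$tmp/posts/example_post/index.txt"

f="$tmp/posts/example_post/index.txt"
DATE=$(head -n 3 "$f" | tail -n 1 | sed -e 's/^% //')   # third line, minus "% "
stamp=$(TZ=UTC date -d "$DATE" +%s)                     # seconds since the epoch (UTC is my assumption)
bytes=$(stat -c '%s' "$f")                              # file size in bytes
post=$(echo "$f" | sed -e "s,$tmp/posts/,," -e 's,/index.txt,,')  # directory name only
row="$stamp $bytes $post"
echo "$row"
rm -r "$tmp"
```

Each row is therefore "timestamp, size, post name", which is exactly what the header line written by the first `echo` promises.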
Sanity-check the resulting data:
$ head data.txt
date bytes post
1232600400 2038 joy_of_tex
1234674000 3947 irrefutability
1275796800 2076 openkey
1300248000 5958 afd_discussions
1300593600 1358 safe_phones
1301371200 1126 convergence
1302235200 1404 secrets
1302408000 2916 comment_systems
1307160000 833 scheduling
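Eyeballing `head` is fine for a handful of posts, but the same check can be done mechanically. The following is only a sketch: it copies the first three rows from the output above into a temporary file (in practice you would point `awk` at `data.txt` itself) and confirms that every row has three fields, that the first two are numeric, and that the timestamps never go backwards. A post whose date failed to parse would show up here as a row with too few fields.

```shell
# Validate data.txt-style input with awk.
sample=$(mktemp)
cat > "$sample" <<'EOF'
date bytes post
1232600400 2038 joy_of_tex
1234674000 3947 irrefutability
1275796800 2076 openkey
EOF

check=$(awk '
    NR == 1 { next }                                    # skip the header line
    NF != 3 { print "line " NR ": expected 3 fields"; bad = 1 }
    $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ { print "line " NR ": non-numeric value"; bad = 1 }
    $1 + 0 < prev { print "line " NR ": out of order"; bad = 1 }
    { prev = $1 + 0 }                                   # remember the last timestamp
    END { if (!bad) print "ok"; exit bad + 0 }
' "$sample")
echo "$check"
rm "$sample"
```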
Make the plot:
$ R
library("ggplot2") # load ggplot2
df <- read.table("data.txt", header = TRUE) # load the data
df
ndf <- df[order(df$date),] # sort the data
ndf$date2 <- as.POSIXct(ndf$date, origin="1970-01-01") # convert timestamps to dates
ndf$total_bytes <- cumsum(ndf$bytes) # count total bytes over time
ndf
svg(filename="data.svg", width=6, height=4) # make the plot
qplot(x = date2, y = total_bytes, data=ndf, xlab="date", ylab="total bytes")
dev.off()
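If you want to cross-check the `total_bytes` column without opening R, the running total is easy to reproduce with `awk`. This is a rough sketch rather than part of the plotting recipe: it uses the first three rows from the `head` output shown earlier, written to a temporary file, and relies on the data already being sorted by date.

```shell
# Running total of the bytes column, mirroring cumsum(ndf$bytes).
sample=$(mktemp)
cat > "$sample" <<'EOF'
date bytes post
1232600400 2038 joy_of_tex
1234674000 3947 irrefutability
1275796800 2076 openkey
EOF

# Skip the header, accumulate column 2, print "timestamp total" per row.
totals=$(awk 'NR > 1 { total += $2; print $1, total }' "$sample")
echo "$totals"
rm "$sample"
```

The last value printed is the same number the final point of the plot sits at on the y axis.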
Enjoy:
(P.S. - Care to guess when I joined Iron Blogger? :-)