Musing about git's object store efficiency
I'm currently looking at git to see what it can and cannot do,
and one thing I've looked at today is how efficient the backing store
mechanism is. To recall: CVS stores a list of patches between versions
in a single file,
and git stores each new revision in full in a separate file in the
so-called object store.
Is that an issue for NetBSD?
One of the more frequently updated files is the i386 port's GENERIC
kernel config file, which is at revision 1.963 right now. This means
that since its import into CVS, 963 different revisions have been
made. In CVS, all those revisions are kept in a single GENERIC,v file.
In git, this puts 963 files on the file system.
A bit of a difference.
Looking at the space requirements for storing the repository data
itself, the GENERIC,v file is 883,233 bytes[1]. Extracting all 963
versions from revision 1.1 to revision 1.963 results in disk space
usage of 32,805,828 bytes[2,3]. And that's not counting the overhead of 962
additional inodes and the related directory bookkeeping.
In other words, the
git model requires about 37 times the space that CVS does.
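To double-check the factor, here is a small sketch of the arithmetic in plain sh. The variable names are mine; the two byte counts are the ones quoted above.

```shell
#!/bin/sh
# Sizes in bytes, taken from the text above.
cvs_bytes=883233        # size of the single GENERIC,v file
full_bytes=32805828     # all 963 revisions stored in full

# Integer ratio via POSIX shell arithmetic (truncates toward zero).
ratio=$((full_bytes / cvs_bytes))
echo "full copies need about ${ratio}x the space"

# Same ratio with two decimals; awk is used because sh has no floating point.
ratio_exact=$(awk -v a="$full_bytes" -v b="$cvs_bytes" 'BEGIN { printf "%.2f", a / b }')
echo "more precisely: ${ratio_exact}x"
```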
Sure, the example file is not exactly one with an average
number of revisions, and I know that git offers some
more efficient storage methods via "pack" files,
but investigating those is left as an exercise to
the reader. :-)
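For readers who want to take up that exercise, the following is a minimal sketch of one way to start, not a measurement of the NetBSD tree. It builds a scratch repository with 20 made-up revisions of a file (the contents, the revision count and the demo identity are all assumptions for illustration), then uses "git count-objects -v" before and after "git gc" to see how many objects sit in separate files and how many end up in a pack.

```shell
#!/bin/sh
# Sketch: compare loose-object and packed storage in a scratch repository.
# File contents and the number of revisions are invented for illustration.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit 20 revisions of one file, each adding a single line.
i=1
while [ "$i" -le 20 ]; do
    echo "options EXAMPLE_$i" >> GENERIC
    git add GENERIC
    git -c user.name=demo -c user.email=demo@example.org \
        commit -q -m "revision $i"
    i=$((i + 1))
done

# Before packing: every blob, tree and commit is a separate file.
loose_before=$(git count-objects -v | awk '$1 == "count:" { print $2 }')

# Pack the repository, then count again.
git gc --quiet
loose_after=$(git count-objects -v | awk '$1 == "count:" { print $2 }')
packed=$(git count-objects -v | awk '$1 == "in-pack:" { print $2 }')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after"
echo "objects in pack: $packed"
# The scratch repository is left in $repo; remove it when done.
```

The same "git count-objects -v" output also has "size" and "size-pack" fields, which would be the numbers to compare against the size of a ,v file.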
[1] Obtained via rsync from cvs.netbsd.org:
% ls -la GENERIC,v
-r--r--r-- 1 feyrer wheel 883233 Feb 12 16:57 GENERIC,v
[2] % mkdir extracted
% chdir extracted
% sh -c 'for i in `jot 963`; do echo $i ; co -p -r1.$i ../GENERIC >GENERIC-`printf %04d $i` ; done'
[3] % cat extracted/* | wc -c
[Tags: cvs, git]