hubertf's NetBSD Blog
Send interesting links to hubert at feyrer dot de!
 
[20100212] Musing about git's object store efficiency
I'm currently looking at git to see what it can and cannot do, and one thing I looked at today is how efficient its backing store is. To recall: CVS stores all versions of a file as a list of patches in a single file, while git stores each new revision in full as a separate file in the so-called object store. Is that an issue for NetBSD? Let's see:

One of the more frequently updated files is the i386 port's GENERIC kernel config file, which is at revision 1.963 right now. This means that since its import into CVS, 963 different revisions have been made. In CVS, all those revisions are kept in a single GENERIC,v file. In git, this puts 963 files on the file system. A bit of a difference.

Looking at the space requirements for storing the repository data itself, the GENERIC,v file is 883,233 bytes[1]. Extracting all 963 versions, from revision 1.1 to revision 1.963, results in disk space usage of 32,805,828 bytes[2,3]. And that's not counting the overhead of 962 additional inodes and the related directory bookkeeping.

In other words, the git model requires about 37 times the space that CVS does.
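
For the record, that factor follows directly from the two byte counts above. A minimal sketch in plain sh integer arithmetic (the variable names are mine, for illustration only):

```shell
# Byte counts taken from footnotes [1] and [3].
cvs_bytes=883233          # size of GENERIC,v
extracted_bytes=32805828  # all extracted revisions, concatenated

# Integer arithmetic only: scale by 10 to keep one decimal place.
tenths=$(( extracted_bytes * 10 / cvs_bytes ))
ratio="$(( tenths / 10 )).$(( tenths % 10 ))"

echo "full revisions need ${ratio}x the space of the ,v file"
```

The result rounds to the "about 37 times" quoted above.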

Sure, the example file is not exactly one with an average number of revisions, and I know that git offers more efficient storage via "pack" files, but investigating those is left as an exercise to the reader. :-)
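
For readers who want to try that exercise, here is a rough sketch of how one might compare loose objects against a pack file. It is not a measurement of the NetBSD repository: it builds a throwaway repository in a temporary directory, and the file contents, the number of revisions and the variable names are all made up for illustration. It assumes nothing beyond a stock git install, and uses `git count-objects -v` and `git gc` to report object counts and sizes. Keep in mind that git also zlib-compresses each loose object, so the on-disk size of loose objects will typically be below the raw byte count in [3].

```shell
#!/bin/sh
# Sketch: compare loose-object storage with pack-file storage in a
# throwaway repository. File contents and revision count are invented.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# A stand-in for a large config file, followed by a few small changes.
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "options EXAMPLE_BASE_" i }' > GENERIC
git add GENERIC
git commit -q -m "revision 1"
for i in 2 3 4 5; do
    echo "options EXAMPLE_$i" >> GENERIC
    git commit -q -a -m "revision $i"
done

# Before packing: every blob, tree and commit is its own loose object file.
echo "== loose objects =="
git count-objects -v
loose_before=$(git count-objects -v | awk '$1 == "count:" { print $2 }')

# git gc repacks reachable objects into a delta-compressed pack file.
git gc --quiet

echo "== after git gc =="
git count-objects -v
loose_after=$(git count-objects -v | awk '$1 == "count:" { print $2 }')
packs=$(git count-objects -v | awk '$1 == "packs:" { print $2 }')

echo "repository left in $repo"
```

The `size` and `size-pack` lines in the two reports (both in KiB) are the numbers to compare. Doing the same with the real GENERIC history would require converting it from CVS first, which this sketch does not attempt.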


[1] Obtained via rsync from cvs.netbsd.org:
% ls -la GENERIC,v 
-r--r--r--  1 feyrer  wheel  883233 Feb 12 16:57 GENERIC,v 

[2]

% mkdir extracted
% chdir extracted
% sh -c 'for i in `jot 963`; do echo $i ; co -p -r1.$i ../GENERIC >GENERIC-`printf %04d $i` ; done'

[3]

% cat extracted/* | wc -c
 32805828 




Disclaimer: All opinion expressed here is purely my own. No responsibility is taken for anything.

Copyright (c) Hubert Feyrer