Musing about git's object store efficiency
I'm currently looking at git to see what it can and cannot do,
and one thing I've looked at today is how efficient its backing store
mechanism is. To recap: CVS stores a list of patches between versions
in a single file,
and git stores each new revision in full in a separate file in the
so-called object store.
Is that an issue for NetBSD?
One of the more frequently updated files is the i386 port's GENERIC
kernel config file, which is at revision 1.963 right now. This means
that since its import into CVS, 963 different revisions have been
made. In CVS, all those files are kept in a single GENERIC,v file.
In git, this puts 963 files on the file system.
A bit of a difference.
Looking at the space requirements for storing the repository data
itself, the GENERIC,v file is 883,233 bytes[1]. Extracting all 963
versions from revision 1.1 to revision 1.963 results in disk space
usage of 32,805,828 bytes[2,3]. And that's not counting the overhead of 962
additional inodes and the related directory bookkeeping.
In other words, the
git model requires about 37 times the space that CVS does.
Sure, the example file is not exactly one with an average
number of revisions, and I know that git offers some
more efficient storage methods via "pack" files,
but investigating those is left as an exercise to
the reader. :-)
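As a quick check of that factor, here is a minimal sketch that redoes the division with the two byte counts quoted above. The variable names are only illustrative.

```shell
# Byte counts taken from the text above.
cvs_bytes=883233      # size of GENERIC,v
git_bytes=32805828    # all 963 revisions stored in full

# awk does the floating-point division; plain sh only has integers.
ratio=$(awk -v full="$git_bytes" -v rcs="$cvs_bytes" 'BEGIN { printf "%.1f", full / rcs }')
echo "full copies take about ${ratio} times the space of GENERIC,v"
```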
[1] Obtained via rsync from cvs.netbsd.org:
% ls -la GENERIC,v
-r--r--r-- 1 feyrer wheel 883233 Feb 12 16:57 GENERIC,v
[2] Extracting each revision with co(1):
% mkdir extracted
% chdir extracted
% sh -c 'for i in `jot 963`; do echo $i ; co -p -r1.$i ../GENERIC >GENERIC-`printf %04d $i` ; done'
[3] Summing up the sizes, run from the parent directory:
% cat extracted/* | wc -c
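One caveat on the raw byte count: git deflates its loose objects with zlib, so the size of the uncompressed files is an upper bound on what the object store would need. The sketch below estimates the compressed total by running each extracted file through gzip(1), which uses the same deflate algorithm but adds a slightly larger header, so treat the result as an approximation. sum_deflated is a hypothetical helper name, and the directory is assumed to be the "extracted" one created above.

```shell
# sum_deflated DIR: print the total size in bytes of all regular files
# in DIR after compressing each one individually with gzip.
# (hypothetical helper, only an approximation of git's loose objects)
sum_deflated() {
    total=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # tr strips the padding some wc implementations print
        n=$(gzip -c "$f" | wc -c | tr -d ' ')
        total=$((total + n))
    done
    echo "$total"
}
```

Running sum_deflated extracted and comparing the result with the output of the wc command above shows how much of the difference compression alone recovers.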
[Tags: cvs, git]
Mondo catch-up on source-changes (~Aug '07 'till Feb '08)
In the context of Mark Kirby
stopping his NetBSD CVS Digest,
I've felt an urge to catch up on
source-changes, and put
up some of the items here that I haven't found mentioned
or announced elsewhere (or that I've plainly missed)
after digging through some 7,000 mails. All of these are
available in NetBSD-current today
and will be in NetBSD 5.0:
- Support for C99 complex arithmetic was added by importing the
"cephes" math library
- POSIX message queues were added
- bozohttpd was added as httpd.
- the x86 bootloader now reads /boot.cfg to configure banner
text, console device, timeout etc. - see boot.cfg(5)
- ifconfig(8) now has a "list scan" command to scan for access points
- SMP (multiprocessor) support is now enabled in i386 and amd64 GENERIC kernels
- Processor-sets, affinity and POSIX real-time extensions were added,
along with the schedctl(8) program to control scheduling of processes
- systrace was removed, due to security concerns
- the refuse-based Internet Access Node file system was committed, which
provides a filesystem interface to FTP and HTTP, similar to the old
alex file system
- LKMs no longer care about options MULTIPROCESSOR and LOCKDEBUG, i.e.
it's easier to reuse LKMs between debugging/SMP and non-debugging/non-SMP kernels
- PCC, the Portable C Compiler that originates in the very beginnings of
Unix, was added to NetBSD. The idea is that it can be used as an alternative
to the GNU C Compiler in the long run.
- In addition to the iSCSI target (server) code that is already in
NetBSD 4.0, there's also a refuse-based iSCSI initiator (client)
now, see http://mail-index.netbsd.org/source-changes/2007/11/08/0038.html
The above is a mixed list of items. There are a number of
areas where there is very active development going on in NetBSD.
Andrew Doran is further working on SMP, fine-grained locking
inside the kernel and interrupt priority handling. Antti Kantee
has done more work on his filesystems (rump, puffs,
refuse/fuse), and Jared McNeill and Jörg Sonnenberger have
continued their work on NetBSD's power management framework.
Those changes are large and far-reaching, and I've yet to look
at them before I can report more here.
- Many driver updates and new drivers, see your nearest GENERIC kernel config file
- Many security updates, see list of security advisories
- Many 3rd-party software packages that NetBSD ships with were updated:
ipsec-tools (racoon), GCC 4.1, Automated Testing Framework 0.4,
OpenSSH 4.7, wpa_supplicant and hostapd 0.6.2, OpenPAM Hydrangea
So much on this subject for now. If someone's willing to help out
with continuing Mark Kirby's
NetBSD CVS Digest
either using his software-setup or by simply reading the list
and writing a monthly/weekly digest of the "interesting" changes,
I'd appreciate this very much. Put me on CC: for your postings! :)
[Tags: alex, bozohttpd, c99, cephes, cvs, cvs-digest, digest, ian, iscsi, lkm, pcc, refuse, smp, systrace]
CVS and stickiness
For the past few weeks, I've tried to build NetBSD-current on my
slow old PC, and it always bombed out in src/distrib/i386/cdroms,
complaining that my bootxx_cd9660 is busted:
/home/cvs/src-current/obj.i386/tooldir/bin/nbinstallboot -t raw \
-mi386 bootxx /home/cvs/src-current/obj.i386/destdir/usr/\
nbinstallboot: Invalid magic in stage1 bootstrap 0 != 7886b6d1
nbinstallboot: Set bootstrap operation failed
This worked fine a few weeks ago, and the only major change
that happened in NetBSD since then was the switch from gcc3 to gcc4.
Suspecting some breakage there, I started building everything
without any optimisation today ("nbinstallboot" needs HOST_CFLAGS="",
also "bootxx" and "bootxx_cd9660"), but that didn't change anything.
I've verified that daily releng builds work, so this was probably a problem
on my side, but where? I didn't want to blindly rebuild the whole toolchain
on this slow PC, so I tried investigating. Comparing /usr/mdec/bootxx_cd9660
from my own and the releng build showed that there *was* some difference,
so I continued looking in src/sys/arch/i386/stand/bootxx/bootxx_cd9660
to see what the matter was. Using hexdump -C showed that there was a difference
between my bootxx_cd9660 and the releng one, and after getting the intermediate
files of the build (bootxx_cd9660.tmp, cdboot.o) from a helpful being on
#NetBSD, nm(1) showed that my version of cdboot.o lacked several symbols, e.g. "start1".
As the cdboot.o file is made directly from a cdboot.S file,
there's probably not much chance for the compiler to break
things, and I didn't really believe that the assembler
would add symbols on its own. Asking other people, they confirmed
that they had "start1" in their cdboot.S files, while my copy
of the same file lacked such a symbol. From there it
was just a quick look at src/sys/arch/i386/stand/cdboot/CVS/Entries
to find the problem:
miyu% cat CVS/Entries
/Makefile/1.6/Wed Jun 28 20:23:05 2006//
/cdboot.S/1.2/Mon Aug 7 23:24:18 2006//T1.2
Apparently I used "cvs update -r1.2 cdboot.S" some time ago
to get that specific version, and forgot to tell CVS to remove
that sticky tag to get the latest version on later 'cvs update' runs.
Also, 'cvs update' doesn't report that a file is sticky, and
so this was never detected until it exploded.
Now if 'cvs update' printed something for sticky
files as it does for modified files, that would have saved
me some time this evening. Doh!
Next thing to do: cd src ; cvs up -A,
just to be on the safe side.
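Short of CVS printing such a warning itself, sticky entries can be spotted by looking at the last field of each line in the CVS/Entries files, which holds T plus the tag (or D plus a date), as in the T1.2 shown above. What follows is a minimal sketch under that assumption; list_sticky is a hypothetical helper, not a CVS command, and it assumes a find(1) that supports -path.

```shell
# list_sticky TOP: print "path<TAB>tag-or-date" for every file below TOP
# whose CVS/Entries line carries a sticky tag or date.
# (hypothetical helper name)
list_sticky() {
    find "$1" -type f -path '*/CVS/Entries' | while read -r entries; do
        dir=${entries%/CVS/Entries}
        awk -F/ -v dir="$dir" '
            # file entries look like /name/revision/timestamp/options/tagdate;
            # directory entries start with "D" and are skipped by $1 == ""
            $1 == "" && NF >= 6 && $6 ~ /^[TD]./ {
                print dir "/" $2 "\t" substr($6, 2)
            }' "$entries"
    done
}
```

Running list_sticky src before a build would have listed cdboot.S with its 1.2 tag. 'cvs status' also reports a "Sticky Tag:" line per file, but it has to contact the server to do so.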
[Tags: cvs, hubertf, rants]
Comparison of Version Control Systems
Whether to pick Subversion over CVS, and if not, what else, is an ongoing debate.
Personally, I think SVN may be nice from a user PoV, but when I have to set up
Apache and WebDAV and whatnot, then I prefer staying with CVS, thanks.
To get some idea of what other VCSs are out there and how they
compare among each other, there is a nice
Version Control System Comparison
I found on Bluephod today.
From a quick glance, I think I should look at Monotone a bit more...
[Tags: cvs, svn]
Switching CVS servers easily
After copying a few old trees checked out from CVS today, and updating
them afterwards, I got the dreaded list of conflicts in files that
I've never touched:
cvs update: move away distrib/utils/sysinst/Makefile; it is in the way
cvs update: move away distrib/utils/sysinst/Makefile.inc; it is in the way
cvs update: move away distrib/utils/sysinst/SPELLING.en; it is in the way
This is usually caused by having different values
in the CVS/Root files, and this was the case for me too - I had
a generic address of a CVS server in most of the files, but the
files under src/distrib had an IPv4-only address in there, which
caused the problem.
To end the problem once and for all, I remembered a nice trick
I saw recently (I think mentioned by someone from the NetBSD admins
team) to use one file for all CVS/Root files, and hardlink that into
every CVS directory:
% cd .../src
% cat CVS/Root >r
% find . -name Root | grep CVS/Root \
? | sh -c 'while read r ; do echo rm $r ; echo ln r $r ; done' \
? | sh -v
% rm r
This command first copies the contents of .../src/CVS/Root to
a temporary file "r" - make sure it contains the value to be used
everywhere! After that, all CVS/Root files are searched, removed
and (hard)linked to the temporary file, which is then removed.
Now whenever the CVS repository needs to be switched, updating
a single file is enough, due to all the .../CVS/Root files being
hardlinks to the same file (watch the link count of 6494):
% ls -la CVS/Root
-rw-r--r-- 6494 feyrer wheel 32 Mar 20 20:39 CVS/Root
% cat bin/ls/CVS/Root
% echo firstname.lastname@example.org:/cvsroot >CVS/Root
% cat bin/ls/CVS/Root
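The same trick can be sketched as a small sh function. This is a hedged variant, not the original one-liner: link_cvs_roots is a made-up name, and it assumes that find(1) supports -path and that the whole tree lives on one file system, since hard links cannot cross file systems.

```shell
# link_cvs_roots TOP: make every CVS/Root below TOP a hard link to one file
# holding the contents of TOP/CVS/Root.
# (hypothetical helper name, not part of CVS)
link_cvs_roots() {
    top=$1
    tmp="$top/r.$$"                       # temporary master copy, like "r" above
    cp "$top/CVS/Root" "$tmp" || return 1
    # ln -f replaces each existing CVS/Root with a link to the master copy
    find "$top" -type f -path '*/CVS/Root' -exec ln -f "$tmp" {} \;
    rm -f "$tmp"                          # the remaining links keep the data alive
}
```

After link_cvs_roots . at the top of the checkout, overwriting CVS/Root in place with a shell redirection changes it for every directory. Editors that save by writing a new file and renaming it would break the link, so the redirection shown above is the safer way to update it.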
Disclaimer: All opinion expressed here is purely my own.
No responsibility is taken for anything.