Looking at the new kernel modules in NetBSD-current
In contrast to the current and previous NetBSD releases,
NetBSD-current and the next major release (6.0) use a new
system for kernel modules. Unlike the "old" loadable kernel
modules (LKMs), the new module framework supports dependencies
between modules and loading kernel modules on demand.
Today I found time to install NetBSD-current/i386 and configure the
things that I use here: /kern, /proc, and some NFS mounts, in addition to
a local disk. Looking at the list of loaded kernel modules now reveals:
NAME         CLASS  SOURCE   REFS  SIZE    REQUIRES
compat       misc   builtin  0     -       -
coredump     misc   filesys  1     3067    -
exec_elf32   misc   filesys  0     7225    coredump
exec_script  misc   filesys  0     1187    -
ffs          vfs    boot     0     166292  -
kernfs       vfs    filesys  0     11131   -
nfs          vfs    filesys  0     145345  -
procfs       vfs    filesys  0     28068   -
ptyfs        vfs    filesys  0     8975    -
What's interesting here is that nfs, kernfs and procfs are just listed
in /etc/fstab, and the related filesystem modules
are loaded automatically, without any need to worry whether they are
needed or not. In fact, I had just assumed NFS was in the GENERIC kernel.
Seems it's loaded as a module! ;)
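As a rough illustration of which file system modules an fstab(5) will pull in, the POSIX sh sketch below lists the distinct file system types it mentions. The function name fstab_fs_modules is hypothetical, and it assumes that each vfs module carries the same name as the fstab type field, which holds for ffs, kernfs, procfs and nfs in the listing above. It only predicts module names; the kernel does the actual loading when the file system is mounted.

```sh
# fstab_fs_modules (hypothetical helper): print the distinct file system
# types used in an fstab(5) file, one per line, sorted.
# Assumption: the vfs module name equals the type field (3rd column),
# as it does for ffs, kernfs, procfs and nfs. Swap entries need no module,
# and comment lines are skipped.
fstab_fs_modules() {
    awk '$1 !~ /^#/ && NF >= 3 && $3 != "swap" { print $3 }' \
        "${1:-/etc/fstab}" | sort -u
}
```

Run from sh(1) as `fstab_fs_modules /etc/fstab`; for a setup like the one described here it would list ffs, kernfs, nfs and procfs.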
Another interesting module is "coredump", which is loaded by
exec_elf32, the module to execute 32-bit ELF programs. This is an example
of module dependencies, and again no manual intervention was needed.
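The REQUIRES column is all that is needed to derive a load order. The sketch below is a hypothetical helper, modload_order, not part of NetBSD: it reads modstat-style output like the table above on stdin, assuming the column layout shown there and assuming that several requirements would be separated by commas, and feeds "dependency dependent" pairs to the standard tsort(1) utility, which prints each module after the modules it requires. The in-kernel loader resolves dependencies by itself; this merely mirrors the idea in userland.

```sh
# modload_order (hypothetical helper): read modstat-style output on stdin
# (header line first, NAME in column 1, REQUIRES in the last column) and
# print the modules in an order where every requirement comes first.
# Assumption: multiple requirements are comma-separated; "-" means none.
modload_order() {
    awk 'NR > 1 && NF > 0 {
        print $1, $1                      # a pair "x x" just registers x
        if ($NF != "-") {
            n = split($NF, deps, ",")
            for (i = 1; i <= n; i++)
                print deps[i], $1         # dependency must precede dependent
        }
    }' | tsort
}
```

Fed the table above, it prints coredump somewhere before exec_elf32; the relative order of modules without dependencies is left to tsort(1).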
So what modules are there? First, let's remember that kernel modules
are object code that implements facilities for the running kernel
and interfaces closely with it. As such, they should ideally
match the kernel version. When one of the kernel's
API or ABI interfaces changes, it's best to rebuild all modules.
For NetBSD, the kernel version is bumped for such an interface change,
e.g. from 5.99.15 to 5.99.16, which helps with tracking those changes.
Back to the question of what modules are there. Now that we know
kernel modules are closely tied to the version of the kernel
(which is still in the file /netbsd, btw), the associated modules
for the example of NetBSD/i386 5.99.15 can be found in:
% cd /stand/i386/5.99.15/modules
% ls -F
accf_dataready/   drm/          lfs/          ptyfs/
accf_httpready/   efs/          mfs/          puffs/
adosfs/           exec_aout/    miniroot/     putter/
aio/              exec_elf32/   mqueue/       radeondrm/
azalia/           exec_script/  msdos/        smbfs/
cd9660/           ext2fs/       nfs/          sysvbfs/
coda/             fdesc/        nfsserver/    tmpfs/
coda5/            ffs/          nilfs/        tprof/
compat/           filecore/     ntfs/         tprof_pmi/
compat_freebsd/   fss/          null/         udf/
compat_ibcs2/     hfs/          overlay/      umap/
compat_linux/     i915drm/      portal/       union/
compat_ossaudio/  kernfs/       ppp_bsdcomp/  vnd/
compat_svr4/      ksem/         ppp_deflate/
coredump/         layerfs/      procfs/
% ls */*.kmod
% find . -type f -print | wc -l
The named directory contains one subdirectory per kernel module,
each containing the module as a file with the ".kmod" extension.
The modules cover kernel accept filters, various file systems,
compatibility modules, execution modules for various binary formats,
and many others. Currently there are 58 kernel modules, and I guess
we can expect more in the future.
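Since the module directory is keyed by machine and kernel release, a small sketch can compute it for the running system and count the modules it holds. Both kmod_dir and kmod_count are hypothetical helpers; taking the machine part from `uname -m` is an assumption that matches i386, but the name used for the module directory may differ on other ports.

```sh
# kmod_dir (hypothetical helper): print the module directory for a given
# machine and kernel release, following the /stand/<machine>/<release>/modules
# layout shown above. Without arguments it uses the running system; using
# `uname -m` for the machine part is an assumption that matches i386.
kmod_dir() {
    printf '/stand/%s/%s/modules\n' "${1:-$(uname -m)}" "${2:-$(uname -r)}"
}

# kmod_count (hypothetical helper): count the *.kmod files below a directory
# (default: the current one). tr strips the padding some wc(1) versions add.
kmod_count() {
    find "${1:-.}" -type f -name '*.kmod' | wc -l | tr -d ' '
}
```

For example, `kmod_dir i386 5.99.15` prints /stand/i386/5.99.15/modules, and `kmod_count "$(kmod_dir)"` counts the modules for the running kernel. If `kmod_dir` points to a directory that doesn't exist, the installed modules probably don't match the running kernel's version.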
P.S.: One confusion I've seen with systems that use kernel modules to
whatever extent, as modules shrink the size of the actual kernel
binary: even with kernel modules, an operating system still has
a monolithic kernel. Once loaded, the modules are tied closely
into the system, resulting in a monolithic system.
In contrast, a "microkernel" is something else entirely,
and it doesn't have anything to do with kernel modules. :-)
[Tags: kernel, lkm, modules]
Another source-changes catch-up (late May until second week of July 2008)
The following list gives changes to NetBSD-current
between the end of May and the second week of July. Note that
NetBSD is currently in a feature-freeze to prepare the
5.0 release, so there are more stability improvements
going in than new features being added:
- Work on the wrstuden-revivesa branch is ongoing. The old Scheduler
Activations (SA) based threading code that was removed from NetBSD
after 4.0 is being adapted for NetBSD-current, so any applications that
depend on SAs can continue to run. This is important for binary
compatibility.
- More changes towards the new kernel modules (kmod) framework:
  - file systems' sysctl init code is now run in a way that lets
    the modules either be linked statically into the kernel
    or be loaded as a module at runtime, without recompiling the code
    (this used to be done via some #defines, which either
    expanded to code for the LKM, or to code for static inclusion).
  - the uaudio driver can now be compiled as a kmod. More work is
    needed, though, to actually attach audio to newly found devices.
- Wasabi's journaling filesystem support was added on the
simonb-wapbl branch. There are still a number of issues to be
resolved before this is ready for use under real-life conditions.
- Support for LVM, as part of this year's Google Summer of Code,
was added on the haad-dm branch. Currently it is possible to
create a logical volume with the Linux lvm2tools lvcreate utility,
then newfs and mount it - the NetBSD driver is API-compatible
- After TNF changed its copyright license from 4-clause to 2-clause,
other holders of material in NetBSD's code base have made similar
changes.
- The yamt-pf42 branch was merged, bringing in a newer PF packet
filter from OpenBSD 4.2.
- Management of processor sets and thread affinity was added, see the
cpuset(3), affinity(3), pthread_setaffinity_np(3) and
pthread_getaffinity_np(3) manpages as well as the cpuctl(8) and
- The red-black tree code was optimized further and moved to a place
where the same code can be used both from userland (libc) and the kernel.
- ifconfig(8) was changed to allow easy adding/removal of features
such as address families (inet, inet6, iso, atalk) and protocols
(802.11, 802.3ad, CARP) via the Makefile.
- SSH was extended with the HPN-SSH patch, which aims at improving
performance of SCP and the underlying SSH2 protocol by dynamically
allocating buffers. See
the HPN-SSH homepage
for more information.
[Tags: kernel, source]
More kernel works: audio, benchmarks, modules
In the past few weeks, Andrew Doran has made another bunch of
changes to NetBSD's kernel area, including interrupts in NetBSD's
audio framework, benchmarks of the system, and the handling of
SMP & audio:
One area that hadn't yet been changed
in the move towards fine-grained kernel locking was NetBSD's audio
subsystem. As audio recording and playback are mostly done via
interrupts, and as latency in those is critical, the audio
subsystem was moved to the new interrupt handling system.
The work can be found on the ad-audiomp branch, more information
is available in Andrew's posting about the MP safe
audio framework and drivers.
Changing a system from the inside out is a huge technical task.
On the way, performance measurements and tuning are needed to
make sure that the previous performance is still achieved while
getting better performance in the desired development area.
As a result,
benchmark results from Sun's
posted, which allow comparison not only against Linux and FreeBSD, but
also between NetBSD-current and NetBSD 4.0, in order to identify whether any
regressions were introduced. All performance tests were made on a machine with 8
CPUs, and the areas tested cover "small" (micro) areas like various system calls. Of course this doesn't translate 1:1 into how
the systems will perform in a real-life scenario such as a
database performance test, but it still helps identify
problems and gives better hints where tuning can be done.
Another benchmark in that regard comes
from Gregory McGarry, who has published
performance measurements before.
This time, Gregory
has run the lmbench 3.0 benchmark on
recent NetBSD and FreeBSD systems as well as a number of previous
NetBSD releases - useful for identifying performance degradation, too!
One other benchmark run, on dispatch latency,
was made by Andrew Doran: on a machine
whose CPUs were loaded by some compile jobs, he started
two threads on a CPU that wasn't distracted by device interrupts,
and measured how fast the scheduler reacted when one thread
woke up the other one. The resulting graph
shows that the scheduler handles the majority of requests in less than
10 µs - good enough for some realtime applications?
Kernel modules are another area that's under heavy change right now, and after
recent changes to load modules from the bootloader and the kernel,
the kernel build process has now been changed
so that pre-built kernel
modules can be linked into a new kernel binary, resulting in a
non-modular kernel. Eventually, this could mean that
src/sys is built into separate modules, and that the (many) existing
kernels that are present for each individual platform (GENERIC,
ALL, etc.; INSTALL is already gone) can simply be linked from
pre-compiled modules, without recompiling things over and over for each
kernel. Of course, the overall goal here is to speed up the system (and
kernel!) build time while maintaining maximum flexibility between
modules and non-modular kernels.
With the progress in kernel modules, it is only a question of time until
the new kernel module handling supersedes the existing
loadable kernel modules to such an extent that the latter will
be completely removed from the system: at least the
was already proposed, but I'd prefer to
see some documentation of the new system first. We'll see
what comes first! (Documentation writers are always welcome! :-)
[Tags: benchmark, kernel, smp]
The great source-changes catch-up for late March, April, and May 2008
Ok, after more weeks of slacking, here are some gems that I found noteworthy,
i.e. changes that have some "end-user" effect, where I also count developers and
programmers in that group. That is, no purely cosmetic or internal changes,
but "fun stuff", not the hard labor that's still needed and much appreciated!
Here we go:
Changes related to SMP:
- Yamamoto Takashi has started the yamt-nfs-mp branch to make
the NFS client MP-safe
- After the merge of the yamt-lazymbuf branch, the send(2) and recv(2)
system calls are MP-safe
- Other system calls that have been made MP-safe are those for NTP,
PMC, reboot, sysarch and time. With the exception of the Darwin and
Irix emulations, all system calls are now MP-safe!
- Progress on the wrstuden-revivesa branch to bring back support for
Scheduler Activations. Much of the code that was removed when
Andrew's 1:1 threading was added is being put back in a way that
lets both threading mechanisms co-exist. Affected areas are
the interface to the generic scheduler and locking.
Many other changes, related to networking:
- In the networking code, stats for ICMP, ICMP6, UDP, TCP, IP and IPv6
were changed from a structure to an array of uint64_t values by
Jason Thorpe. This removes a few structs from the kernel header
files. The change is ABI-compatible with the old structures,
so tools like netstat(1) will continue to work.
- Also, as part of the move towards a multi-threaded network stack, stats
for protocols like UDP6, IP, PIM6, ARP, IGMP, IPSEC, IPSEC_FAST,
PF_KEY, Appletalk DDP,
and CARP are now accounted on a per-CPU basis, and routines were added
to collate the per-CPU network statistics.
- ifconfig(8) got a major overhaul towards improved modularity and
extensibility. The internal parser is cleaner, and it should be easier
to add new commands.
- In the search for a smaller replacement for the ISC DHCP client dhclient(8),
Roy Marples' DHCP Client Daemon dhcpcd(8)
was imported. It is 1/6 of the size, yet has nearly all the features,
and adds support for more modern RFCs like IPv4LL (RFC 3927),
Classless Static Routes (RFC 3442) and Node-specific Client Identifiers.
- Kernel support was added for adding and removing link-layer (i.e. MAC/ethernet)
addresses using SIOCALIFADDR and SIOCDLIFADDR, respectively.
Corresponding ifconfig(8) changes were announced to come soon.
[Tags: kernel, source]
More kernel works: preemption and realtime, devfs, modules, testing
The following kernel-related projects were raised in the
past few weeks:
- Kernel Preemption:
Andrew Doran has continued his work towards fine-grained
locking, and he has proposed a
patch to implement kernel preemption,
i.e. that in a realtime environment, high-priority processes can
interrupt system calls running inside the kernel.
Handling of the Floating Point Unit (FPU) was
added later on:
the FPU needs special attention, as saving and restoring its state is
expensive and isn't needed in many cases. But if a
program uses the FPU, care must be taken to save and restore its state correctly.
The exact handling is
explained by Christoph Egger.
Christoph also outlined the
roadmap for getting realtime support
in NetBSD - there are still a number of bits missing, but being
able to preempt the kernel is a good first step!
- Fine-grained socket locking:
In order to allow fine-grained locking (instead of blocking
all other processes from entering the kernel, as is done in the
"biglock" SMP approach), many kernel subsystems need to be changed.
The socket system is the core part of interprocess communication,
and Andrew Doran has changed it to
use fine-grained locking.
In that context, the question of
what code still runs with the biglock held came up, and
Andrew gave an overview of where
more work is needed: some file systems (lfs, ext2fs, nfs),
most of the drivers, protocols like TCP/IP, Veriexec, and
some machine-dependent parts.
Veriexec hacker Brett Lymn
added details on the status of Veriexec
with respect to its transition towards fine-grained locking.
- Kernel modules and ramdisk:
A change in kernel modules was proposed
some time ago, and
Andrew Doran has now used this scheme to unify the way
many ports handle the install media: there, the kernel that is loaded
contains a ramdisk (miniroot) image, which is then used
as the root file system, containing the install tools.
In order to split things and eventually use a stock GENERIC kernel
for both running and installing, Andy has changed the
x86 boot process to load the miniroot as a kernel module.
When booting, it may be useful to select one of several ramdisks:
one for installing, and one for rescuing the system.
For this, the recently introduced boot.cfg file was
extended to handle kernel modules in the boot menu.
Izumi Tsutsui has
made an ISO with all changes available for testing.
- Device File System (devfs):
Another area of the kernel where a lot of work is currently being
done by Matt Fleming is NetBSD's device driver infrastructure,
especially with regard to dynamic attaching, detaching, and suspending
(power management!). To talk to the various drivers, device nodes in
the /dev directory are used right now, but those are static and
need to be updated when a new driver is added.
Matt is working on a Device Filesystem (devfs) that dynamically
creates /dev from the list of devices inside the kernel. The
filesystem will also handle dynamic creation and deletion of
nodes, and, importantly, it will also keep permissions
across reboots if someone changes them manually.
The work is at a very mature point right now and needs some
testing - see
Matt's mail to the tech-kern list
for more information!
- Testing driver attachment:
Speaking of testing device drivers,
David Young has
reminded driver developers to test
individual drivers' detachment and re-attachment,
suspension and resumption after changes.
He has also
posted a how-to for those tests.
(The manpage needs some updating, sorry --
[Tags: devfs, initrd, kernel, kmod, lkm, preemption, realtime]
Recent development related to puffs, ReFUSE, rump, and more (Updated)
NetBSD's kernel is under very active development these days, and
while many changes are related to improving SMP, that's not the only area.
An area where very interesting and unique work is being done is the
filesystem interfaces that Antti Kantee is working on. Things started
out as a Google "userfs" SoC project from a previous year to implement an interface
for running filesystem code in userland. The project was imported
into NetBSD some time ago. On top of that, there is a library that mimics the
Linux interface for filesystems in userland. Following the Linux
name FUSE, the re-implementation is called ReFUSE (pun intended :).
See the webpage about puffs, refuse, FUSE
on the NetBSD website for more information.
Another project that was started by Antti after his work to
run filesystem code in userland is "rump". The project takes
"ordinary" filesystems that usually run inside the kernel,
mimics an environment similar to what's available inside the kernel,
and moves the whole filesystem into userland verbatim, with
no code changes! This makes it possible to develop filesystem code in userland
and later move it into the kernel with no further changes: a
big step forward for filesystem development!
This all sounds rather easy, but as filesystems need to move data between
storage and memory, a big issue in filesystems is interfacing with the
virtual memory subsystem, and adding interfaces like puffs and ReFUSE
also needs to consider VM for efficient transfers and caching.
Work in this area is still ongoing, and I've asked Antti about his
recent achievements in this area. While the only user-visible
change is caching and performance improvements in the Secure
Shell filesystem's handler "mount_psshfs", most of the changes
are on the inside. Antti wrote me:
``The interesting ones from a programmer's perspective are probably:
Finally, while not really useful for anything except puffs development,
I think the following is cool from the perspective of completeness:
With puffs and rump, there are two very interesting and active projects
doing research in filesystems on NetBSD, which may lead to changes in
the way filesystems are understood in the Unix world. While at it,
a third project that may be worth watching in this regard is
this year's Google
Summer of Code project by Marek Dopiera, which aims at
implementing Hurd translators for NetBSD.
Antti dropped me a note that another project related to
filesystems is this year's "fs-utils"
SoC project. The goal is to create a userland tool to manipulate filesystem
images, and the idea is to reuse kernel code with the
ukfs library. That way, no redundancy is created between kernel sources
and userland sources, and both areas benefit from mutual
testing and code maturity.
[Tags: filesystem, fuse, google-soc, kernel, puffs, refuse]
Another stab at kernel modules
Currently, NetBSD supports loadable kernel modules via the LKM
interface. The interface supports a few types of kernel modules, e.g.
for file systems, system calls and executable file formats. Support
for loadable device drivers is currently limited, and source code for LKMs
needs to be adjusted to the interface, so the same code cannot be
used inside the kernel and as a module. Andrew Doran has been working
on improved support for kernel modules. His improvements include:
- an in-kernel loader/linker, so there's no need to run
ld(1) and rely on having a working userland, thus
- module dependencies, so that one module can request to
load other modules automatically
- support for loading modules from
the boot loader and providing them to the kernel
- using the same code for kernel modules and in-kernel
use, so things that are currently used inside the kernel can
be moved to a module easily, without changes in code.
The current state of the work is that this is a first version of the
code that still needs quite some work. For more information, see
which also includes examples for testing, and the future steps
needed to replace LKMs. More thoughts on what
else needs to be done are outlined in
Andrew's second mail on the subject.
[Tags: kernel, lkm]
Kernel-tuning without recompiling
NetBSD's i386 GENERIC kernel has ACPI enabled nowadays.
Given that there's more than enough (i386ish) hardware out there that
plainly doesn't work with ACPI, it's sort of inconvenient to
have the default kernel not work. Possible workarounds for this
situation are offering a kernel that has ACPI disabled (like
the GENERIC_NOACPI kernel that I've added just in time for
NetBSD 4.0), or using userconf to disable ACPI. The drawback
is that you either need a special kernel, or that it's not permanent.
A possible solution is available in OpenBSD's config(8) command:
By running "config -e /kernel", userconf commands can be "saved"
into the kernel binary, preventing the need to re-run userconf
on every boot.
Jared McNeill has
another approach for NetBSD now:
Instead of modifying the kernel binary, have the bootloader read
a list of (userconf) commands, and have the kernel execute them.
Instead of introducing yet another config file format,
Jared has opted for (re)using the
proplib API functions
to load the config file from disk and pass it on to the kernel.
Those crying "YEEK, XML!" now can rest assured: there's a policy in
NetBSD that XML is not used for config files that the user needs to
edit, and the idea is to use userconf as usual, then
dump the settings to the config file and use
that on the next boot. See
Jared's second patch
for the most recent code version.
With this scheme, there's a common file where boot-time information
can be stored, and the eventual idea is not only to have all ports'
bootloaders read that file, but also to store further information in
the file to make settings other than those available via userconf
persistent, e.g. bootloader settings (timeout, serial console speed, ...)
and kernel tunables like PCI_*_FIXUPs. I guess we can stay tuned
to see what will happen on this front!
What is userconf? Make sure you have "options USERCONF" in your
kernel, then interrupt the bootloader and type "boot -c". You can
then type "disable acpi" to, well, disable ACPI. It works for other
drivers as well, but it won't be persistent and has to be done
on every boot.
[Tags: kernel, proplib, userconf]
Merging newlock2: consequences on in-kernel locking, SMP and threading
Andrew Doran has made substantial progress on the newlock2 branch,
and is now ready to merge the branch into NetBSD-current.
See Andrew's mail for all the details, and especially on how to update your
system after the merge if you run -current!
Some of the changes this will bring are (citing mostly from Andrew's mail):
- A new set of synchronization primitives in the kernel designed
to make programming for multiprocessor systems easier and more efficient:
mutexes, reader / writer locks, condition variables, sleep queues and MI
memory barrier operations.
- A number of underlying kernel facilities have
been made 'multiprocessor safe' including the scheduler, ktrace and the
general purpose method of kernel synchronisation: sleep & wakeup
- Some application facilities have been made MP safe and can now run without
the "big lock" on multiprocessor systems, including signalling, SysV
messaging, and system calls that inspect process state, for example:
- The number of system calls that will run without
the big lock went from 1 up to 56, with more in the pipeline. For workloads
that are fork intensive and make heavy use of signals this will show a
small yet quantifiable benefit on multi-way systems.
- The branch introduces a new 1:1 threading model that allows multithreaded
applications to take advantage of all available CPUs in a multi-way
system. The scheduler activations implementation used from NetBSD 2.0
through NetBSD 4.0 provides excellent performance on single CPU systems,
but restricts any instance of a threaded application to a single CPU in
the system. Given that multicore and multi-CPU systems are increasingly
commonplace and that single threaded CPUs are rapidly disappearing from
the market, we made the decision to move to a new threading model, on the
basis that providing increased concurrency is now the most important
factor in ensuring good performance for threaded workloads.
- Those following
already know what that new 1:1 threading model means for the
scheduler-activations based m:n threading model:
[Tags: kernel, smp, threads]
Driver development hints
There is a
OpenBSD driver development hints
over at the
I guess much of this applies to NetBSD as well, and
it's a nice starting point. More data is available
in the NetBSD Internals Guide,
Jochen Kunz's Writing Device Drivers
and of course
all section 9 manpages.
(If someone wants to include Jochen's text into the NetBSD Internals
Guide, that'd be great... just like any other work in that area.
Any takers? Send your patches to netbsd-docs@, feel free to CC:
[Tags: Docs, kernel, openbsd]
Disclaimer: All opinion expressed here is purely my own.
No responsibility is taken for anything.