hubertf's NetBSD Blog
Send interesting links to hubert at feyrer dot de!
 
[20161222] Bringing the scheduler saga to the finishing line
After my last blog postings on the NetBSD scheduler, some time went by. In the meantime, the code that handles process migration was rewritten to provide more knobs for tuning, and some testing was done. The initial problem stated in PR kern/51615 is solved by the code. To reach a wider audience and get more testing, the code was committed to NetBSD-current today.

Now, two things remain to be seen:

  1. More testing. This is best done in situations that compare the system's behaviour with and without the patch. Situations to test include
    • pure computation jobs that involve multiple parallel processes
    • a mix of CPU-crunching and input/output, again on a number of concurrent processes
    • full build.sh examples
    If you have time and an interesting set of numbers, please feel free to let us know on tech-kern@..

  2. Documentation. There is already a number of undocumented sysctls under "kern.sched", which has now been extended by one more, "average_weight". While adding the knob was the obvious step given the formula, testing it under various real-life conditions and seeing how things change is left to a PhD thesis or two. Be sure to drop us your patches for src/share/man/man7/sysctl.7 if you can come up with a comprehensible description of all the scheduler sysctls!
So just when you thought there was no more research to be done in scheduling algorithms, here is your chance at fame and glory! :-)



[20161124] Apple Releases macOS 10.12 Sierra Open Source Darwin Code
Interesting news came in via Slashdot: Apple Releases macOS 10.12 Sierra Open Source Darwin Code: ``Apple has released the open source Darwin code for macOS 10.12 Sierra. The code, located on Apple's open source website, can be accessed via direct link now, although it doesn't yet appear on the site's home page. The release builds on a long-standing library of open source code that dates all the way back to OS X 10.0. There, you'll also find the Open Source Reference Library, developer tools, along with iOS and OS X Server resources. The lowest layers of macOS, including the kernel, BSD portions, and drivers are based mainly on open source technologies, collectively called Darwin. As such, Apple provides download links to the latest versions of these technologies for the open source community to learn and to use.''

This may be of interest not only to the OpenDarwin folks (or rather their successors in PureDarwin). More investigation, both of the code itself and of the license it is released under, is necessary to learn whether anything can be gained back for NetBSD.

Why "back"? As you may or may not remember, macOS includes some parts of NetBSD (besides lots of FreeBSD, probably some OpenBSD, much other Open Source software and surely a lot of Apple's own code).



[20161124] BSD now 169: Scheduling your NetBSD, plus a comment
BSD Now 169 is out, entitled "Scheduling your NetBSD". Yay, exciting content in this BSD-centric video podcast!

As it turns out, Allan Jude and Kris Moore actually read from some guy's NetBSD blog starting at 0:22:50, going over the mumblings on the NetBSD scheduler there. Exciting - I think I want to blog about this to get more NetBSD content on the 'net. ;-)

Now, seriously: to avoid getting into a recursive content loop, I'd like to add one thing that may have caused a bit of confusion at the end:

The problem mentioned at the end, which led to the statement that the patch wasn't perfect, was not to blame on the patch but on my testing environment. Using all CPU cores in VMware left none for my normal operating system, so testing was no fun. That was the reason why I aborted the build test and went from 4 to 2 CPU cores. Nothing related to the patch itself. Sorry to Allan and Kris if that didn't come across clearly. Feel free to add that in BSD Now 170! :-)



[20161123] EuroBSDCon 2016 Talks and NetBSD
This year's EuroBSDCon took place in Belgrade, and the slides are now available. Have a look at the full lot, or pick the ones that are relevant to NetBSD.

Reminder: Presentations about NetBSD itself and its internals, but also about how to use NetBSD to do something cool, neat, useful or just utterly obscure, are always welcome. Let me know, or even better: submit your (Euro)BSDCon talk! :)



[20161123] In-kernel audio mixing ahead
NetBSD's sound device is currently only available for exclusive use. If one program uses it, another cannot. So if you want to play some music (mp3, audio stream) that's fine, but if you want to also have your web browser or mail client make some noise, this is not possible. Until now.

The solution is to mix multiple audio sources together, in effect making access to /dev/sound (etc.) non-exclusive, so that several processes can use it instead of a single one. To make this happen, audio from those sources needs to be mixed to come out of the same speaker, and since data written to /dev/sound ends up inside the kernel, that is a good place to do the mixing.

Challenges come into play when audio sources are of different quality (bitrate, stereo/mono), so some adjusting may be needed. All this is addressed by the latest patch by Nathanial Sloss; see his posting to tech-kern for more information.

Also, note his request for review and testing! :-)



[20161113] Learning more about the NetBSD scheduler (... than I wanted to know)
I've had another chat with Michael on the scheduler issue, and we agreed that someone should review his proposed patch. Some interesting things came out of that:
  1. I learned a bit more about the scheduler from Michael. With multiple CPUs, each CPU has a queue of processes that are either "on the CPU" (running) or waiting to be serviced (run) on that CPU. Those processes count as "migratable" in runqueue_t. Every now and then, the system checks all its run queues to see if a CPU is idle, and can thus "steal" (migrate) processes from a busy CPU. This is done in sched_balance().

    Such "stealing" (migration) has the positive effect that the process doesn't have to wait to get serviced on the CPU it's currently queued on. On the other hand, migrating the process affects the CPUs' data and instruction caches, so switching CPUs shouldn't be done too lightly.

    If migration happens, it should take processes from the CPU with the most processes waiting for CPU time. In this calculation, not only the current number is counted: a bit of the CPU's history is also taken into account, so that processes that just started on a CPU are not immediately taken away again. This is done with the help of the number of currently migratable processes (r_mcount) and a "historic" average, taken from the previous round in r_avgcount. More or less weight can be given to this, and it seems that overall, the current number of migratable processes had too little weight to be considered.

    In effect, a process is not taken from its CPU but left waiting there, while another CPU spins idle. Which is exactly what I saw in the first place.

  2. What I also learned from Michael was that there are a number of sysctl variables that can be used to influence the scheduler. Those are available under the "kern.sched" sysctl-tree:
    % sysctl -d kern.sched
    kern.sched.cacheht_time: Cache hotness time (in ticks)
    kern.sched.balance_period: Balance period (in ticks)
    kern.sched.min_catch: Minimal count of threads for catching
    kern.sched.timesoftints: Track CPU time for soft interrupts
    kern.sched.kpreempt_pri: Minimum priority to trigger kernel preemption
    kern.sched.upreempt_pri: Minimum priority to trigger user preemption
    kern.sched.rtts: Round-robin time quantum (in milliseconds)
    kern.sched.pri_min: Minimal POSIX real-time priority
    kern.sched.pri_max: Maximal POSIX real-time priority 
    The descriptions above show that much more could be written about the scheduler and its tunables, but this remains to be done by someone else (volunteers welcome!).

  3. Now, while digging into this, I also learned that I'm not the first to discover this issue, and there is already another PR on it: I have opened PR kern/51615, but there is also kern/43561. Funnily enough, the solution proposed there is about the same, though with a slightly different implementation. Still, *2 and <<1 are the same, as are /2 and >>1, so no change there. And renaming variables for fun doesn't count anyway. ;) Last but not least, it's worth noting that this whole issue is not Xen-specific.
So, with this in mind, I went to do a bit of testing. I had already tested running concurrent, long-running processes that used up all the CPU they got, and the results were good.

To test a different load on the system, I started a "build.sh -j8" on a (VMware Fusion) VM with 4 CPUs on a MacBook Pro, and it nearly brought the machine to a halt, though what I saw was lots of idle time on all CPUs. I aborted the exercise to get some CPU cycles back for myself. I blame the VM handling here, not the guest operating system.

I restarted the exercise with 2 CPUs in the same VM, and there I saw load distributed on both CPUs (no wonder with -j8), but there was also quite some idle time in the 'make clean / install' phases that I'm not sure is normal. During the actual build phases I didn't see any idle time, though the system spent quite some time in the kernel (system). Example top(1) output:

    load averages:  9.01,  8.60,  7.15;               up 0+01:24:11      01:19:33
    67 processes: 7 runnable, 58 sleeping, 2 on CPU
    CPU0 states:  0.0% user, 55.4% nice, 44.6% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user, 69.3% nice, 30.7% system,  0.0% interrupt,  0.0% idle
    Memory: 311M Act, 99M Inact, 6736K Wired, 23M Exec, 322M File, 395M Free
    Swap: 1536M Total, 21M Used, 1516M Free
    
    PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
    27028 feyrer    20    5    62M   27M CPU/1      0:00  9.74%  0.93% cc1
      728 feyrer    85    0    78M 3808K select/1   1:03  0.73%  0.73% sshd
    23274 feyrer    21    5    36M   14M RUN/0      0:00 10.00%  0.49% cc1
    21634 feyrer    20    5    44M   20M RUN/0      0:00  7.00%  0.34% cc1
    24697 feyrer    77    5  7988K 2480K select/1   0:00  0.31%  0.15% nbmake
    24964 feyrer    74    5    11M 5496K select/1   0:00  0.44%  0.15% nbmake
    18221 feyrer    21    5    49M   15M RUN/0      0:00  2.00%  0.10% cc1
    14513 feyrer    20    5    43M   16M RUN/0      0:00  2.00%  0.10% cc1
      518 feyrer    43    0    15M 1764K CPU/0      0:02  0.00%  0.00% top
    20842 feyrer    21    5  6992K  340K RUN/0      0:00  0.00%  0.00% x86_64--netb
    16215 feyrer    21    5    28M  172K RUN/0      0:00  0.00%  0.00% cc1
     8922 feyrer    20    5    51M   14M RUN/0      0:00  0.00%  0.00% cc1 
All in all, I'd say the patch is a good step forward from the current situation, which does not properly distribute pure CPU hogs at all.



[20161109] Looking at the scheduler issue again (Updated)
I encountered a funny scheduler behaviour the other day in a Xen environment: CPU load was not distributed evenly across all CPUs. In my case, on a 2-CPU system, two CPU-bound processes fought over the same CPU, leaving the other one idle.

I had another look at this today, and was able to reproduce the behaviour using VMware Fusion with two CPU cores on both NetBSD 7.0_STABLE and -current, both with sources as of today, 2016-11-08. I've also made a screenshot available that shows the issue on both systems, and I have filed a problem report to document the issue.

The one hint I got so far was from Michael van Elst: there may be a rounding error in sched_balance(). Looking at the code, there is not much room for a rounding error. But I am not familiar enough (at all) with the code, so I cannot judge if crucial bits are dropped here, or how that function fits into the whole puzzle.

Update: Pondering the "rounding error", I've set up both VMs with 4 CPUs, and the behaviour there is that load is distributed to about three and a half CPUs: three CPUs under full load, and one not reaching 100%. There's definitely something fishy in there. See screenshot.

Splitting up the four CPUs into different processor sets, with one process assigned to each set (using psrset(8)), leads to an even load distribution here, too. This leads me to think that NetBSD scheduling works well between different processor sets, but is busted within one set.



[20161105] NetBSD 7.0/xen scheduling mystery, and how to fix it with processor sets
Today I needed to do some number crunching using a home-brewed C program. In order to do some manual load balancing, I fired up some Amazon AWS instances (which run on Xen) with NetBSD 7.0. In this case, the system was assigned two CPUs, from dmesg:
    # dmesg | grep cpu
    vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
    vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
I started two instances of my program, with the intent of having each one use one CPU. That is not what happened! Here is what I observed, and how I fixed things for now.

I was looking at top(1) to see that everything was running fine, and noticed funny WCPU and CPU values:

      PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
      2791 root      25    0  8816K  964K RUN/0     16:10 54.20% 54.20% myprog
      2845 root      26    0  8816K  964K RUN/0     17:10 47.90% 47.90% myprog
I expected WCPU and CPU to be around 100% each, assuming that each process was bound to its own CPU. The values I actually saw (and listed above) suggested that both programs were fighting for the same CPU. Huh?!

top's CPU state shows:

    load averages:  2.15,  2.07,  1.82;               up 0+00:45:19        18:00:55
    27 processes: 2 runnable, 23 sleeping, 2 on CPU
    CPU states: 50.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 50.0% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
Which is not too useful. Typing "1" in top(1) lists the actual per-CPU usage instead:
    load averages:  2.14,  2.08,  1.83;               up 0+00:45:56        18:01:32
    27 processes: 4 runnable, 21 sleeping, 2 on CPU
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
This confirmed my suspicion that both processes were bound to one CPU, and that the other one was idling. Bad! But how to fix?

One option is to kick your operating system out of the window, but I still like NetBSD, so here's another solution: NetBSD allows creating "processor sets", assigning CPU(s) to them and then assigning processes to the processor sets. Let's have a look!

Processor sets are manipulated using the psrset(8) utility. By default all CPUs are in the same (system) processor set:

    # psrset
    system processor set 0: processor(s) 0 1
The first step is to create a new processor set:
    # psrset -c
    1
    # psrset
    system processor set 0: processor(s) 0 1
    user processor set 1: empty
Next, assign one CPU to the new set:
    # psrset -a 1 1
    # psrset
    system processor set 0: processor(s) 0
    user processor set 1: processor(s) 1
Finally, find out the process IDs of my two (running) processes, and assign them to the two processor sets:
    # ps -u 
    USER  PID %CPU %MEM   VSZ  RSS TTY     STAT STARTED     TIME COMMAND
    root 2791 52.0  0.0  8816  964 pts/4   R+    5:28PM 22:57.80 myprog
    root 2845 50.0  0.0  8816  964 pts/2   R+    5:26PM 23:33.97 myprog
    #
    # psrset -b 0 2791
    # psrset -b 1 2845
Note that this was done with the two processes running; there is no need to stop and restart them! The effect of the commands is immediate, as can be seen in top(1):
    load averages:  2.02,  2.05,  1.94;               up 0+00:59:32        18:15:08
    27 processes: 1 runnable, 24 sleeping, 2 on CPU
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
    Swap:

      PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
     2845 root      25    0  8816K  964K CPU/1     26:14   100%   100% myprog
     2791 root      25    0  8816K  964K RUN/0     25:40   100%   100% myprog
Things are as expected now, with each program being bound to its own CPU.

Now why this didn't happen by default is left as an exercise to the reader. Hints that may help:

    # uname -a
    NetBSD foo.eu-west-1.compute.internal 7.0 NetBSD 7.0 (XEN3_DOMU.201509250726Z) amd64
    # dmesg
    ...
    hypervisor0 at mainbus0: Xen version 4.2.amazon
    VIRQ_DEBUG interrupt using event channel 3
    vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
    vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4 
AWS Instance type: c3.large
AMI ID: NetBSD-x86_64-7.0-201511211930Z-20151121-1142 (ami-ac983ddf)



[20161030] NetBSD 7.0.2 released
Why 7.0.2? Following NetBSD's release scheme, there are major releases (e.g. 7.0) with subsequent updates (e.g. 7.1). Those "major" releases and their updates include new features as well as bug fixes, the latter with and without security relevance. New code means new risks: as a result of updates, existing interfaces may change and lead to incompatibilities. This may affect binary compatibility between programs and the shared libraries they require, and also, though rarely, lead to incompatible changes at the source code level.

NetBSD takes quite some effort to keep such incompatibilities low, yet they happen. The only real solution is: no updates. "Never change a running system" is nice for availability, but it poses security risks. The days when a long server uptime was considered a sign of good system administration are gone. Today, a long uptime means the system (probably) runs outdated and thus vulnerable code.

So to solve the problem, a compromise is needed: few updates, but crucial security updates do get done. This is where NetBSD's "minor" releases like NetBSD 7.0.2 come into play. With this set of changes, a number of external software packages got security-related updates (e.g. OpenSSL, NTP, BIND, X), and a smaller number of security-related fixes were also added, e.g. for a race condition in mail.local(8), crashes in the Network File System (NFS) and the native Fast File System (FFS), plus some platform-specific crashes on MIPS, PowerPC and SPARC64.

For more information on downloading and installation, see the release announcement as well as the platform-specific install documentation, e.g. the INSTALL.html file for NetBSD 7.0.2/amd64.



[20161007] Interview with spz@ on BSDnow
There is an interview with Petra "spz@" on BSDnow. She talks about how she got into Unix and NetBSD, and about all the different hats she wears in the NetBSD Project and The NetBSD Foundation (TNF). The interview starts at minute 26, have a look!



Disclaimer: All opinion expressed here is purely my own. No responsibility is taken for anything.

Copyright (c) Hubert Feyrer