Looking at the scheduler issue again (Updated)
I've encountered a funny scheduler behaviour the other day in a Xen environment. The behaviour was that CPU load was not distributed evenly over all CPUs: in my case, on a 2-CPU system, two CPU-bound processes fought over the same CPU, leaving the other one idle.
I had another look at this today and was able to reproduce the behaviour using VMware Fusion with two CPU cores, on both NetBSD 7.0_STABLE and -current, both with sources as of today, 2016-11-08. I've made a screenshot available that shows the issue on both systems. I have also filed a problem report to document the issue.
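For anyone who wants to check a system for this without watching top(1), here is a small test driver. It is a hypothetical sketch, assuming a POSIX system with Python 3.9+, and not what I used for the screenshot: it forks a number of CPU-bound processes and uses os.wait4() to report how much CPU time each one actually received, relative to its run time.

```python
import os
import time

# Hypothetical test driver, not part of NetBSD.


def burn(seconds: float) -> None:
    """Spin on the CPU until `seconds` of wall-clock time have passed."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def cpu_share_per_worker(workers: int, seconds: float) -> list[float]:
    """Fork `workers` CPU-bound children and return, for each child,
    the CPU time it received divided by its wall-clock run time.
    About 1.0 means the child had a CPU to itself; about 0.5 means it shared one."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: burn CPU, then exit without running the parent's cleanup.
            try:
                burn(seconds)
            finally:
                os._exit(0)
        pids.append(pid)

    shares = []
    for pid in pids:
        # os.wait4() returns the resource usage of exactly this child.
        _, _, usage = os.wait4(pid, 0)
        shares.append((usage.ru_utime + usage.ru_stime) / seconds)
    return shares
```

On an otherwise idle 2-CPU machine, `cpu_share_per_worker(2, 10.0)` should return two values close to 1.0 if the scheduler spreads the load; with the behaviour described above, I'd expect both values to hover around 0.5 instead.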
The one hint that I got so far was from Michael van Elst that
there may be a rounding error in sched_balance().
Looking at the code, there is not much room for a rounding error.
But I am not familiar enough (at all) with the code, so I cannot judge whether crucial bits are dropped here, or how that function fits into the scheduler as a whole.
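To illustrate how a rounding error in a load balancer could hide exactly this kind of imbalance, here is a purely hypothetical sketch. It is not the actual sched_balance() code; the update formula (halving a running average) and the 8 fractional bits are my assumptions. A per-CPU run-queue length is smoothed into a running average using integer arithmetic. With a constant queue length of 1, i.e. one surplus thread waiting, the truncating average never leaves 0, so that CPU looks idle to the balancer; keeping a few fractional bits avoids this.

```python
# Hypothetical illustration only, not the NetBSD sched_balance() code.

FRAC_BITS = 8  # assumed fixed-point precision: 8 fractional bits


def avg_int(avg: int, count: int) -> int:
    """Running average in plain integers: the fraction is truncated."""
    return (avg + count) >> 1


def avg_fixed(avg: int, count: int) -> int:
    """The same average kept as fixed point (count scaled by 2**FRAC_BITS)."""
    return (avg + (count << FRAC_BITS)) >> 1


def settle(update, count: int, steps: int = 32) -> int:
    """Feed a constant run-queue length `count` into `update`, starting from 0."""
    avg = 0
    for _ in range(steps):
        avg = update(avg, count)
    return avg


if __name__ == "__main__":
    for count in (1, 2):
        print(f"queue length {count}: "
              f"integer avg = {settle(avg_int, count)}, "
              f"fixed-point avg = {settle(avg_fixed, count) / (1 << FRAC_BITS):.2f}")
```

In the integer version, a queue length of 1 settles at 0 and a queue length of 2 settles at 1, so the average always underestimates by one thread, which is exactly the size of the imbalance when two processes share one CPU.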
Pondering on the "rounding error", I've set up both VMs with 4 CPUs, and the behaviour there is that the load is distributed over about three and a half CPUs: three CPUs under full load, and one not reaching 100%. There's definitely something fishy in there.
Splitting up the four CPUs into different processor sets, with one process assigned to each set (using psrset(8)), leads to an even load distribution here, too. This makes me think that the NetBSD scheduler works well between different processor sets, but is busted within one set.
[Tags: amd64, scheduler, xen]