hubertf's NetBSD Blog
Send interesting links to hubert at feyrer dot de!
 
[20161105] NetBSD 7.0/xen scheduling mystery, and how to fix it with processor sets
Today I had a need to do some number crunching using a home-brewn C program. In order to do some manual load balancing, I was firing up some Amazon AWS instances (which is Xen) with NetBSD 7.0. In this case, the system was assigned two CPUs, from dmesg:
    # dmesg | grep cpu
    vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
    vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
I started two instances of my program, with the intent to have each one use one CPU. Which is not what happened! Here is what I observed, and how I fixed things for now.

I was looking at top(1) to see that everything was running fine, and noticed funny WCPU and CPU values:

      PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
      2791 root      25    0  8816K  964K RUN/0     16:10 54.20% 54.20% myprog
      2845 root      26    0  8816K  964K RUN/0     17:10 47.90% 47.90% myprog
I expected something like WCPU and CPU being around 100%, assuming that each process was bound to its own CPU. The values I actually saw (and listed above) suggested that both programs were fighting for the same CPU. Huh?!

top's CPU state shows:

    load averages:  2.15,  2.07,  1.82;               up 0+00:45:19        18:00:55
    27 processes: 2 runnable, 23 sleeping, 2 on CPU
    CPU states: 50.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 50.0% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
Which is not too useful. Typing "1" in top(1) lists the actual per-CPU usage instead:
    load averages:  2.14,  2.08,  1.83;               up 0+00:45:56        18:01:32
    27 processes: 4 runnable, 21 sleeping, 2 on CPU
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
This confirmed my suspicion that both processes were bound to one CPU, and that the other one was idling. Bad! But how to fix?

One option is to kick your operating system out of the window, but I still like NetBSD, so here's another solution: NetBSD allows to create "processor sets", assign CPU(s) to them and then assign processes to the processor sets. Let's have a look!

Processor sets are manipulated using the psrset(8) utility. By default all CPUs are in the same (system) processor set:

    # psrset
    system processor set 0: processor(s) 0 1
First step is to create a new processor set:
    # psrset -c
    1
    # psrset
    system processor set 0: processor(s) 0 1
    user processor set 1: empty
Next, assign one CPU to the new set:
    # psrset -a 1 1
    # psrset
    system processor set 0: processor(s) 0
    user processor set 1: processor(s) 1
Last, find out what the process IDs of my two (running) processes are, and assign them to the two processor sets:
    # ps -u 
    USER  PID %CPU %MEM   VSZ  RSS TTY     STAT STARTED     TIME COMMAND
    root 2791 52.0  0.0  8816  964 pts/4   R+    5:28PM 22:57.80 myprog
    root 2845 50.0  0.0  8816  964 pts/2   R+    5:26PM 23:33.97 myprog
    #
    # psrset -b 0 2791
    # psrset -b 1 2845
Note that this was done with the two processes running, there is no need to stop and restart them! The effect of the commands is imediate, as can be seen in top(1):
    load averages:  2.02,  2.05,  1.94;               up 0+00:59:32        18:15:08
    27 processes: 1 runnable, 24 sleeping, 2 on CPU
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
    Swap:

      PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
     2845 root      25    0  8816K  964K CPU/1     26:14   100%   100% myprog
     2791 root      25    0  8816K  964K RUN/0     25:40   100%   100% myprog
Things are as expected now, with each program being bound to its own CPU.

Now why this didn't happen by default is left as an exercise to the reader. Hints that may help:

    # uname -a
    NetBSD foo.eu-west-1.compute.internal 7.0 NetBSD 7.0 (XEN3_DOMU.201509250726Z) amd64
    # dmesg
    ...
    hypervisor0 at mainbus0: Xen version 4.2.amazon
    VIRQ_DEBUG interrupt using event channel 3
    vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
    vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4 
AWS Instance type: c3.large
AMI ID: NetBSD-x86_64-7.0-201511211930Z-20151121-1142 (ami-ac983ddf)

[Tags: , , , , , ]


Tags: , 2bsd, 34c3, 3com, 501c3, 64bit, acl, acls, acm, acorn, acpi, acpitz, adobe, adsense, Advocacy, advocacy, advogato, aes, afs, aiglx, aio, airport, alereon, alex, alix, alpha, altq, am64t, amazon, amd64, anatomy, ansible, apache, apm, apple, arkeia, arla, arm, art, Article, Articles, ascii, asiabsdcon, aslr, asterisk, asus, atf, ath, atheros, atmel, audio, audiocodes, autoconf, avocent, avr32, aws, axigen, azure, backup, balloon, banners, basename, bash, bc, beaglebone, benchmark, bigip, bind, blackmouse, bldgblog, blog, blogs, blosxom, bluetooth, board, bonjour, books, boot, boot-z, bootprops, bozohttpd, bs2000, bsd, bsdca, bsdcan, bsdcertification, bsdcg, bsdforen, bsdfreak, bsdmac, bsdmagazine, bsdnexus, bsdnow, bsdstats, bsdtalk, bsdtracker, bug, build.sh, busybox, buttons, bzip, c-jump, c99, cafepress, calendar, callweaver, camera, can, candy, capabilities, card, carp, cars, cauldron, ccc, ccd, cd, cddl, cdrom, cdrtools, cebit, centrino, cephes, cert, certification, cfs, cgd, cgf, checkpointing, china, christos, cisco, cloud, clt, cobalt, coccinelle, codian, colossus, common-criteria, community, compat, compiz, compsci, concept04, config, console, contest, copyright, core, cortina, coverity, cpu, cradlepoint, cray, crosscompile, crunchgen, cryptography, csh, cu, cuneiform, curses, curtain, cuwin, cvs, cvs-digest, cvsup, cygwin, daemon, daemonforums, daimer, danger, darwin, data, date, dd, debian, debugging, dell, desktop, devd, devfs, devotionalia, df, dfd_keeper, dhcp, dhcpcd, dhcpd, dhs, diezeit, digest, digests, dilbert, dirhash, disklabel, distcc, dmesg, Docs, Documentation, donations, draco, dracopkg, dragonflybsd, dreamcast, dri, driver, drivers, drm, dsl, dst, dtrace, dvb, ec2, eclipse, eeepc, eeepca, ehci, ehsm, eifel, elf, em64t, embedded, Embedded, emips, emulate, encoding, envsys, eol, espresso, etcupdate, etherip, euca2ools, eucalyptus, eurobsdcon, eurosys, Events, exascale, ext3, f5, facebook, falken, fan, faq, fatbinary, features, fefe, ffs, filesystem, fileysstem, firefox, firewire, fireworks, flag, flash, flashsucks, flickr, flyer, fmslabs, force10, fortunes, fosdem, fpga, freebsd, freedarwin, freescale, freex, freshbsd, friendlyAam, friendlyarm, fritzbox, froscamp, fsck, fss, fstat, ftp, ftpd, fujitsu, fun, fundraising, funds, funny, fuse, fusion, g4u, g5, galaxy, games, gcc, gdb, gentoo, geode, getty, gimstix, git, gnome, google, google-soc, googlecomputeengine, gpio, gpl, gprs, gracetech, gre, groff, groupwise, growfs, grub, gumstix, guug, gzip, hackathon, hackbench, hal, hanoi, happabsd, hardware, Hardware, haze, hdaudio, heat, heimdal, hf6to4, hfblog, hfs, history, hosting, hotplug, hp, hp700, hpcarm, hpcsh, hpux, html, httpd, hubertf, hurd, i18n, i386, i386pkg, ia64, ian, ibm, ids, ieee, ifwatchd, igd, iij, image, images, imx233, imx7, information, init, initrd, install, intel, interix, internet2, interview, interviews, io, ioccc, iostat, ipbt, ipfilter, ipmi, ipplug, ipsec, ipv6, irbsd, irc, irix, iscsi, isdn, iso, isp, itojun, jail, jails, japanese, java, javascript, jetson, jibbed, jihbed, jobs, jokes, journaling, kame, kauth, kde, kerberos, kergis, kernel, keyboardcolemak, kirkwood, kitt, kmod, kolab, kvm, kylin, l10n, landisk, laptop, laptops, law, ld.so, ldap, lehmanns, lenovo, lfs, libc, license, licensing, linkedin, links, linksys, linux, linuxtag, live-cd, lkm, localtime, locate.updatedb, logfile, logging, logo, logos, lom, lte, lvm, m68k, macmini, macppc, macromedia, magicmouse, mahesha, mail, makefs, malo, mame, manpages, marvell, matlab, maus, max3232, mbr95, mbuf, mca, mdns, mediant, mediapack, meetbsd, mercedesbenz, mercurial, mesh, meshcube, mfs, mhonarc, microkernel, microsoft, midi, mini2440, miniroot, minix, mips, mirbsd, missile, mit, mixer, mobile-ip, modula3, modules, money, mouse, mp3, mpls, mprotect, mtftp, mult, multics, multilib, multimedia, music, mysql, named, nas, nasa, nat, ncode, ncq, ndis, nec, nemo, neo1973, netbook, netboot, netbsd, netbsd.se, nethack, nethence, netksb, netstat, netwalker, networking, neutrino, nforce, nfs, nis, npf, npwr, nroff, nslu2, nspluginwrapper, ntfs-3f, ntp, nullfs, numa, nvi, nvidia, nycbsdcon, office, ofppc, ohloh, olimex, olinuxino, olpc, onetbsd, openat, openbgpd, openblocks, openbsd, opencrypto, opendarwin, opengrok, openmoko, openoffice, openpam, openrisk, opensolaris, openssl, or1k, oracle, oreilly, oscon, osf1, osjb, paas, packages, pad, pae, pam, pan, panasonic, parallels, pascal, patch, patents, pax, paypal, pc532, pc98, pcc, pci, pdf, pegasos, penguin, performance, pexpect, pf, pfsync, pgx32, php, pie, pike, pinderkent, pkg_install, pkg_select, pkgin, pkglint, pkgmanager, pkgsrc, pkgsrc.se, pkgsrcCon, pkgsrccon, Platforms, plathome, pleiades, pocketsan, podcast, pofacs, politics, polls, polybsd, portability, posix, postinstall, power3, powernow, powerpc, powerpf, pppoe, precedence, preemption, prep, presentations, prezi, Products, products, proplib, protectdrive, proxy, ps, ps3, psp, psrset, pthread, ptp, ptyfs, Publications, puffs, puredarwin, pxe, qemu, qnx, qos, qt, quality-management, quine, quote, quotes, r-project, ra5370, radio, radiotap, raid, raidframe, rants, raptor, raq, raspberrypi, rc.d, readahead, realtime, record, refuse, reiserfs, Release, Releases, releases, releng, reports, resize, restore, ricoh, rijndael, rip, riscos, rng, roadmap, robopkg, robot, robots, roff, rootserver, rotfl, rox, rs323, rs6k, rss, ruby, rump, rzip, sa, safenet, san, sata, savin, sbsd, scampi, scheduler, scheduling, schmonz, sco, screen, script, sdf, sdtemp, secmodel, Security, security, sed, segvguard, seil, sendmail, serial, serveraptor, sfu, sge, sgi, sgimips, sh, sha2, shark, sharp, shisa, shutdown, sidekick, size, slackware, slashdot, slides, slit, smbus, smp, sockstat, soekris, softdep, softlayer, software, solaris, sony, sound, source, source-changes, spanish, sparc, sparc64, spider, spreadshirt, spz, squid, ssh, sshfs, ssp, statistics, stereostream, stickers, storage, stty, studybsd, subfile, sudbury, sudo, summit, sun, sun2, sun3, sunfire, sunpci, support, sus, suse, sushi, susv3, svn, swcrypto, symlinks, sysbench, sysctl, sysinst, sysjail, syslog, syspkg, systat, systrace, sysupdate, t-shirt, tabs, talks, tanenbaum, tape, tcp, tcp/ip, tcpdrop, tcpmux, tcsh, teamasa, tegra, teredo, termcap, terminfo, testdrive, testing, tetris, tex, TeXlive, thecus, theopengroup, thin-client, thinkgeek, thorpej, threads, time, time_t, timecounters, tip, tk1, tme, tmp, tmpfs, tnf, toaster, todo, toolchain, top, torvalds, toshiba, touchpanel, training, translation, tso, tty, ttyrec, tulip, tun, tuning, uboot, ucom, udf, ufs, ukfs, ums, unetbootin, unicos, unix, updating, upnp, uptime, usb, usenix, useradd, userconf, userfriendly, usermode, usl, utc, utf8, uucp, uvc, uvm, valgrind, vax, vcfe, vcr, veriexec, vesa, video, videos, virtex, virtualization, vm, vmware, vnd, vobb, voip, voltalinux, vpn, vpnc, vulab, w-zero3, wallpaper, wapbl, wargames, wasabi, webcam, webfwlog, wedges, wgt624v3, wiki, willcom, wimax, window, windows, winmodem, wireless, wizd, wlan, wordle, wpa, wscons, wstablet, X, x.org, x11, x2apic, xbox, xcast, Xen, xen, xfree, xfs, xgalaxy, xilinx, xkcd, xlockmore, xmms, xmp, xorg, xscale, youos, youtube, zaurus, zdump, zfs, zlib

'nuff. Grab the RSS-feed, index, or go back to my regular NetBSD page

Disclaimer: All opinion expressed here is purely my own. No responsibility is taken for anything.

Access count: 35847722
Copyright (c) Hubert Feyrer