All this information is stored inside the kernel. To access it, one has to know in which "variables" the kernel stores the information, then dig into kernel memory to get the values. This approach has several disadvantages. It is not portable, as different operating systems store the information in different formats and under different names. Access also cannot be restricted that way: a malicious process can read or write any information in the kernel, not only the part it is interested in.
Access to kernel memory is usually handled via /dev/kmem; on BSD-based systems, a user or process has to be a member of the group "kmem" to have access to this device file. If such a process does something it shouldn't, the system's security can be compromised. This is often used by programs exploiting so-called "buffer overruns", which write past buffer boundaries to place their own program code in the process, which then executes this malicious code. The results range from harmless core dumps through denial-of-service attacks to modifications of the system, usually the installation of backdoors.
The problems of /dev/kmem-based programs accessing kernel data structures have led to the design of some alternatives. One of them is the "sysctl" facility, usually found on BSD-based systems. With sysctl, one accesses information that is stored in a MIB structure like that of SNMP. For example, to access some data of the IP stack, one would specify "net.inet.tcp.keepidle" to access (only) that bit of information. MIB entries are either read-only or read-write, so read-only values such as the kernel's load average cannot be modified.
A problem with sysctl is that, to access the information, the MIB entry must be specified as a series of numbers instead of a string, which departs from the traditional "everything is a file" approach.
Following this concept more closely are several filesystems, which make certain information from the kernel visible to user space via a filesystem interface:
Basically, the exploit opens /proc/<pid>/mem, seeks to a stack address, and then uses this file descriptor as stderr. It then forks and execs two setuid programs and makes one of them write to stderr, which points into the stack of the other setuid process via /proc/<pid>/mem. That way, the second process can be manipulated in an arbitrary way, just as with any buffer overflow exploit.
Writing to the other process's memory is possible because the procfs descriptor is left open after the parent process exec()s. A possible fix is to have the kernel automatically mark the descriptor as close-on-exec, but the process could clear this flag again. A better fix is to invalidate the descriptor when the process it points to calls exec(2).
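The weakness of the close-on-exec approach can be seen from user space. The following is a minimal sketch, not code from the actual fix: it uses the standard fcntl(2) calls F_GETFD and F_SETFD, opens /dev/null as a harmless stand-in for a /proc/<pid>/mem descriptor, and the function names (cloexec_is_set, cloexec_set, demo_unset_cloexec) are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if FD_CLOEXEC is set on fd, 0 if it is clear, -1 on error. */
int cloexec_is_set(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Set (on != 0) or clear (on == 0) FD_CLOEXEC; 0 on success, -1 on error. */
int cloexec_set(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags);
}

/*
 * Pretend the kernel handed out a descriptor with close-on-exec set,
 * then undo that from user space. Returns 1 if the flag could be
 * cleared again, 0 otherwise.
 */
int demo_unset_cloexec(void)
{
    int fd = open("/dev/null", O_WRONLY);   /* stand-in for /proc/<pid>/mem */
    int cleared = 0;

    if (fd == -1)
        return 0;
    if (cloexec_set(fd, 1) == 0 && cloexec_is_set(fd) == 1 &&  /* "kernel" sets it */
        cloexec_set(fd, 0) == 0 && cloexec_is_set(fd) == 0)    /* process clears it */
        cleared = 1;
    close(fd);
    return cleared;
}
```

Because the flag is an ordinary per-descriptor attribute that the owning process controls, setting it in the kernel at open time does not by itself keep the descriptor from surviving an exec.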
Implementation of this invalidation can be done in the exec-module of the kernel, or in a more general fashion, using a generic "process-exec hook", that can be used for other purposes, should the need arise.
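To make the idea of a generic "process-exec hook" more concrete, here is a small user-space sketch of a hook list: callbacks are registered on a linked list and all of them are called when a process execs, with no return value to check. This is an illustration only and not the kernel code; struct proc_model and the names exec_hook_add, exec_hook_remove, exec_hooks_run and count_sugid_exec are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's per-process structure. */
struct proc_model {
    int pid;
    int is_sugid;       /* non-zero if the process runs set-uid/set-gid */
};

typedef void (*exec_hook_fn)(struct proc_model *, void *);

struct exec_hook {
    struct exec_hook *next;
    exec_hook_fn fn;
    void *arg;
};

static struct exec_hook *exec_hooks;    /* head of the hook list */

/* Register a hook; returns an opaque cookie for later removal, or NULL. */
void *exec_hook_add(exec_hook_fn fn, void *arg)
{
    struct exec_hook *h = malloc(sizeof(*h));

    if (h == NULL)
        return NULL;
    h->fn = fn;
    h->arg = arg;
    h->next = exec_hooks;
    exec_hooks = h;
    return h;
}

/* Remove a previously registered hook, identified by its cookie. */
void exec_hook_remove(void *cookie)
{
    struct exec_hook **pp;

    for (pp = &exec_hooks; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == cookie) {
            struct exec_hook *dead = *pp;

            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Call every registered hook for a process that is about to exec. */
void exec_hooks_run(struct proc_model *p)
{
    struct exec_hook *h;

    for (h = exec_hooks; h != NULL; h = h->next)
        h->fn(p, h->arg);       /* hooks return nothing to check */
}

/* Example hook: count execs performed by set-id processes. */
void count_sugid_exec(struct proc_model *p, void *arg)
{
    if (p->is_sugid)
        ++*(int *)arg;
}
```

In this model, an exec path would call exec_hooks_run() once per exec. With a single registered hook, the cost is one list step and one indirect function call.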
|<hubertf>||Frank, can you tell me about the exec-hook you added?|
|<fvdl>||It's a simple interface, the same as e.g. shutdown hooks.|
|<hubertf>||I can imagine what shutdown hooks do, but exec hooks? Are they called before any exec ?|
|<hubertf>||This sounds slow. What sort of hooks would one add there?|
|<fvdl>||Why would it be?|
|<hubertf>||Traversing a list of hooks, calling a function, checking the return value - sounds slow to me (but what do I know...)|
|<fvdl>||In this case, there is only one hook present, and only if a process is sugid, was accessed through procfs, and execs. The return value isn't checked. If you look at everything that's going on during an exec(), it's minor.|
|<hubertf>||So what does that hook then do - check if stderr is on a procfs mem file, and bomb out if so?|
|<fvdl>||The hook revokes all vnodes that reference the process, through procfs, if it's about to exec an suid binary.|
|<hubertf>||Why revoke all vnodes, not only the ones for stdin/out/err, i.e. for file descriptors 0, 1 and 2?|
|<fvdl>||The kernel should not have knowledge about their special status. Any potential problems with their special status should be solved in userspace.|
|<hubertf>||OK. So getting back to the exploit, that evil binary will not get a maliciously set up stderr, even though it tried to do so?|
|<fvdl>||It won't get a bad stderr, because the vnode for it was nuked.|
|<hubertf>||Like a closed stderr?|
|<fvdl>||The process trying to write it will get EIO, see revoke(2) (a low-level, in-kernel version of it of course)|
|<hubertf>||Ok. So, this exec-hook basically says "get your fingers off my /proc/.../mem" ?|
|<fvdl>||"get your fingers off my /proc/getpid()/anything"|
|<hubertf>||Thank you for your time! :)|
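The behaviour described in the interview can be modelled in a few lines of user-space C: every handle that references a process is revoked when that process is about to exec a set-id binary, and later writes through a revoked handle fail with EIO. This is a simplified model under stated assumptions, not the kernel implementation. The handle table stands in for procfs vnodes, and handle_open, handle_write and revoke_for_exec are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

#define MAX_HANDLES 8

/* Hypothetical model of an open procfs file that refers to a process. */
struct handle {
    int in_use;
    int target_pid;     /* process this handle refers to */
    int revoked;        /* set once the handle has been invalidated */
};

static struct handle handles[MAX_HANDLES];

/* Open a handle on target_pid; returns its index, or -1 if the table is full. */
int handle_open(int target_pid)
{
    int i;

    for (i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].in_use) {
            handles[i].in_use = 1;
            handles[i].target_pid = target_pid;
            handles[i].revoked = 0;
            return i;
        }
    }
    return -1;
}

/* Write through a handle: 0 on success, -1 with errno set on failure. */
int handle_write(int h)
{
    if (h < 0 || h >= MAX_HANDLES || !handles[h].in_use) {
        errno = EBADF;
        return -1;
    }
    if (handles[h].revoked) {
        errno = EIO;    /* what the interview says a writer will see */
        return -1;
    }
    return 0;
}

/*
 * Exec hook body: if pid is about to exec a set-id binary, revoke ALL
 * handles that reference it, not just descriptors 0, 1 and 2.
 * Returns the number of handles revoked.
 */
int revoke_for_exec(int pid, int execs_sugid)
{
    int i, n = 0;

    if (!execs_sugid)
        return 0;
    for (i = 0; i < MAX_HANDLES; i++) {
        if (handles[i].in_use && !handles[i].revoked &&
            handles[i].target_pid == pid) {
            handles[i].revoked = 1;
            n++;
        }
    }
    return n;
}
```

Handles that refer to other processes are left alone, and nothing is revoked when the target execs an ordinary, non-set-id binary.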
setenv CVSROOT email@example.com:/cvsroot
cvs rdiff -r1.106 -r1.107 syssrc/sys/kern/kern_exec.c
cvs rdiff -r1.52 -r1.53 syssrc/sys/kern/kern_subr.c
cvs rdiff -r1.100 -r1.101 syssrc/sys/sys/systm.h
cvs rdiff -r1.27 -r1.28 syssrc/sys/miscfs/procfs/procfs.h
cvs rdiff -r1.28 -r1.29 syssrc/sys/miscfs/procfs/procfs_subr.c
cvs rdiff -r1.31 -r1.32 syssrc/sys/miscfs/procfs/procfs_vfsops.c