Dumping a truss log while running a process
This is probably one of the first things you should do if you cannot figure out why a process is aborting, or simply not running. Run truss followed by the command that is misbehaving, e.g.:
# truss -vall -f -o /tmp/truss.log nslookup sarah
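In this case, we are capturing to a log file (/tmp/truss.log) everything that the command nslookup is doing. This includes the files it tries to open and its reads and writes. The -f flag also follows any child processes, -vall prints the contents of structures passed to system calls, and -o names the output file.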
If you examine the log file, you will see certain highlights like:
- open("/etc/resolv.conf", O_RDONLY) = 3
- open("/home/sextone/.nslookuprc", O_RDONLY) Err#2 ENOENT
- open("/dev/udp", O_RDWR) = 3
- ioctl(3, I_PUSH, "sockmod") = 0
- write(1, " 1 0 . 2 0 . 0 . 5 7".., 14) = 14
- _exit(0)
Explanation of individual lines:
- nslookup is trying to open /etc/resolv.conf as read only. This makes sense since this is the location of the name server addresses. In order to have DNS you have to have servers in this file. You know it successfully opened this file because of the = 3 part. This means it opened the file and assigned it a file handle (file descriptor) of 3.
- Hey, you learn something new every day. Who would have thought one could have a resource configuration file called .nslookuprc in your home directory! Since I did not make one, it can't find it.
It is returning an error: Err#2 ENOENT. We better check out what that abbreviation means.
# grep ENOENT /usr/include/sys/*
errno.h:#define ENOENT 2 /* No such file or directory */
Excellent! This makes sense. Remember the errno.h file.
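- Okay, nslookup is now trying to ride on /dev/udp. This must mean that the lookup goes over UDP rather than TCP.
- ioctl, I_PUSH, sockmod... hmmm. I_PUSH pushes a STREAMS module (here sockmod, the socket module) onto the stream open on file descriptor 3. Sounds like nslookup is setting up a socket connection to the name server. Not sure though.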
- Something is writing back the actual DNS lookup of sarah, which is 10.20.0.57
- _exit(0). The process exits with status 0, so it considers the run a success.
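The errno lookup described in the list above can be scripted when a log contains many different errors. This is only a rough sketch: the sample log path /tmp/truss.errs.sample.log is made up for illustration, and the log lines are copied from the nslookup trace quoted earlier so that the snippet runs without truss.

```shell
#!/bin/sh
# Hypothetical sample log, built from lines quoted in this article.
LOG=/tmp/truss.errs.sample.log
cat > "$LOG" <<'EOF'
open("/etc/resolv.conf", O_RDONLY) = 3
open("/home/sextone/.nslookuprc", O_RDONLY) Err#2 ENOENT
open("/dev/udp", O_RDWR) = 3
EOF

# Pull out the symbolic name that follows each Err#<number>, one per line,
# and de-duplicate so each name only needs to be looked up once.
err_names=`sed -n 's/.*Err#[0-9]* \([A-Z][A-Z0-9]*\).*/\1/p' "$LOG" | sort -u`

for name in $err_names; do
    echo "$name"
    # On the machine being debugged, the description can then be found with:
    #   grep "$name" /usr/include/sys/*
done
```

Point LOG at the real log (for example /tmp/truss.log) to get the list of error names that the traced command actually hit.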
Another useful tactic, especially on a very big truss log, is to grep for open. This way, you can see which files the process is trying to open, and which of those attempts failed.
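A minimal sketch of grepping a truss log for open calls. The sample log path is hypothetical and the log lines are copied from the nslookup trace above, so the snippet runs without truss; on a real system you would point LOG at /tmp/truss.log instead.

```shell
#!/bin/sh
# Hypothetical sample log, built from lines quoted in this article.
LOG=/tmp/truss.sample.log
cat > "$LOG" <<'EOF'
open("/etc/resolv.conf", O_RDONLY) = 3
open("/home/sextone/.nslookuprc", O_RDONLY) Err#2 ENOENT
open("/dev/udp", O_RDWR) = 3
ioctl(3, I_PUSH, "sockmod") = 0
_exit(0)
EOF

# Every open() call, successful or not. The pattern is not anchored to the
# start of the line because truss -f prefixes each line with a process ID.
all_opens=`grep 'open(' "$LOG"`

# Only the opens that failed: truss marks failures with Err#<errno>.
failed_opens=`grep 'open(' "$LOG" | grep 'Err#'`

echo "$all_opens"
echo "--- failed:"
echo "$failed_opens"
```

The second grep is usually the more interesting one: a missing or unreadable file shows up there right away.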
You can also attach truss to a process that is already running, such as
in.named. Run "ps -aef | grep in.named" and get the process ID number
for in.named. Then, run "truss -vall -f -o /tmp/outfile -p PID"
(where PID is the process ID number for in.named).
The output file (here stored in /tmp/outfile, but you can put it
anywhere you want), should contain a list of all system calls
made by the in.named process.
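One small trap in the ps step is that "grep in.named" can match the grep process itself. The sketch below shows one way around that by matching on the command name only. The helper pids_named and the sample ps listing are hypothetical, made up so the snippet runs anywhere. It assumes an awk that accepts -v (on Solaris that would be nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper: read "PID COMMAND" lines on stdin and print the PIDs
# whose command basename is exactly the given name.
pids_named() {
    awk -v name="$1" '{
        n = split($2, parts, "/")      # strip any leading directory
        if (parts[n] == name) print $1
    }'
}

# Made-up sample listing. On a live system, feed in real data instead:
#   ps -e -o pid= -o comm= | pids_named in.named
sample_ps='  213 /usr/sbin/inetd
  342 /usr/sbin/in.named
 1507 grep'

pid=`printf '%s\n' "$sample_ps" | pids_named in.named`

# Print the truss command rather than running it, so the sketch is harmless.
echo "truss -vall -f -o /tmp/outfile -p $pid"
```

With the sample listing, only the in.named line matches, so the grep entry never ends up as the PID handed to truss.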
When the process dies, truss should report the fault and the signal
that killed it at the end of its output. Something like the following:
Incurred fault #6, FLTBOUNDS %pc = 0xEF2617FC
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
Received signal #11, SIGSEGV [caught]
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
You may be able to determine from the truss output alone what's
wrong (it could show that the process was trying to access a
certain file, or directory, but did not have access, for example).
This would probably be listed a few lines above the error/fault
section of the truss output.
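To look at "a few lines above" the fault in a long log without paging through it, something like the following sketch may help. It uses awk rather than grep -B, on the assumption that the local grep may not support context options. The function name fault_context and the sample log path are hypothetical, and the sample log is stitched together from lines quoted in this article, not taken from a real in.named trace.

```shell
#!/bin/sh
# Hypothetical sample log, stitched together from lines in this article.
LOG=/tmp/truss.fault.sample.log
cat > "$LOG" <<'EOF'
open("/etc/resolv.conf", O_RDONLY) = 3
open("/home/sextone/.nslookuprc", O_RDONLY) Err#2 ENOENT
open("/dev/udp", O_RDWR) = 3
Incurred fault #6, FLTBOUNDS %pc = 0xEF2617FC
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
Received signal #11, SIGSEGV [caught]
siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
EOF

# fault_context LOGFILE N
# Print the N lines before the first "Incurred fault" line, then that line
# and everything after it.
fault_context() {
    awk -v n="$2" '
    found { print; next }
    /Incurred fault/ {
        for (i = NR - n; i < NR; i++)   # replay the buffered lines in order
            if (i > 0) print buf[i % n]
        print
        found = 1
        next
    }
    { buf[NR % n] = $0 }                # remember the most recent n lines
    ' "$1"
}

fault_context "$LOG" 2
```

Here the two lines printed before the fault include the failed open of .nslookuprc, which is the kind of clue worth checking first.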
If nothing is obvious from the truss output, you can use the "%pc"
entry, which is listed as part of the fault/failure code section of the
truss output, as input to adb to determine which instruction failed.
For example, you could run:
adb /usr/sbin/in.named core
0xEF2617FC/i
send_msg+0x44: save %sp, -0x68, %sp
You should get output showing the instruction which failed and the
function it belongs to (here the save instruction at send_msg+0x44).
You'll need to use $q to quit out of adb.
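If you do this often, the "%pc" value can be pulled out of the truss output instead of being copied by hand. A minimal sketch, using the fault line quoted above as sample input; in practice the line would come from grepping the truss output file.

```shell
#!/bin/sh
# Sample fault line, copied from the truss output shown earlier.
fault_line='Incurred fault #6, FLTBOUNDS %pc = 0xEF2617FC'

# Keep only the hexadecimal address that follows "%pc = ".
pc=`echo "$fault_line" | sed -n 's/.*%pc = \(0x[0-9A-Fa-f]*\).*/\1/p'`

# Print the line to type at the adb prompt to disassemble that address.
echo "$pc/i"
```

For the sample line this prints 0xEF2617FC/i, which is the same adb request used in the example above.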