The kernel must keep track of the following data for each process on the system:
A process has certain attributes that directly affect its execution. These include:
umbc9[8]# ps -l
 F S   UID   PID  PPID  C PRI NI P  SZ:RSS    WCHAN   TTY TIME COMD
30 S     0 11660   145  1  26 20 *   66:20 88249f10 ttyq6 0:00 rlogind
30 S 14066 11662 11661  0  26 36 *  129:43 88249f10 ttyq6 0:00 zwgc
30 S     0 11681 11663  0  39 36 *   85:27 88246890 ttyq6 0:00 csh
30 S 14066 11661 11660  0  29 36 *   86:33 8815012c ttyq6 0:00 login.kr
30 S 14066 11663 11661  0  39 36 *   86:27 88246890 ttyq6 0:00 csh
30 R     0 12539 11681 46  98 36 0 207:171          ttyq6 0:01 ps
The man page for ps describes all the fields displayed with the ps command as well as all the command options. Some important fields you must know are the following:
For many signals there is really nothing that can be done other than
printing an appropriate error message and terminating the process. The
signals that system administrators will use the most are the
A common problem system administrators will see is a user who has made a mistake and is continually forking new processes. Every user has some limit on the number of processes they can fork, but once that limit is reached the pending forks simply wait; if you kill one process, the system resumes creating new processes on the user's behalf. The best way to handle this is to first send the STOP signal to all of the offending processes, so that they are all suspended, and then send them the KILL signal. Because the processes were suspended first, they cannot create new processes while you kill them off.
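The stop-then-kill sequence described above can be sketched in Python with the standard os and signal modules. This is a minimal sketch rather than a finished tool: the helper name stop_then_kill is hypothetical, and it assumes you have already collected the offending process IDs (for example from ps output) and have permission to signal them.

```python
import os
import signal


def stop_then_kill(pids):
    """Suspend every process in `pids`, then kill them all.

    `pids` is assumed to be a list of process IDs gathered beforehand.
    Returns the PIDs that were sent SIGKILL.
    """
    stopped = []
    # Pass 1: suspend everything so nothing can fork while we work.
    for pid in pids:
        try:
            os.kill(pid, signal.SIGSTOP)
            stopped.append(pid)
        except ProcessLookupError:
            pass  # the process exited before we reached it

    killed = []
    # Pass 2: the suspended processes cannot react, so kill them off.
    for pid in stopped:
        try:
            os.kill(pid, signal.SIGKILL)
            killed.append(pid)
        except ProcessLookupError:
            pass
    return killed
```

Doing this in two passes matters: killing the processes one by one without suspending them first gives the survivors a chance to fork replacements.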
SIGHUP 01 hangup
SIGINT 02 interrupt
SIGQUIT 03[1] quit
SIGILL 04[1] illegal instruction (not reset when caught)
SIGTRAP 05[1][5] trace trap (not reset when caught)
SIGABRT 06[1] abort
SIGEMT 07[1][4] EMT instruction
SIGFPE 08[1] floating point exception
SIGKILL 09 kill (cannot be caught or ignored)
SIGBUS 10[1] bus error
SIGSEGV 11[1] segmentation violation
SIGSYS 12[1] bad argument to system call
SIGPIPE 13 write on a pipe with no one to read it
SIGALRM 14 alarm clock
SIGTERM 15 software termination signal
SIGUSR1 16 user-defined signal 1
SIGUSR2 17 user-defined signal 2
SIGCLD 18[2] death of a child
SIGPWR 19[2] power fail (not reset when caught)
SIGSTOP 20[6] stop (cannot be caught or ignored)
SIGTSTP 21[6] stop signal generated from keyboard
SIGPOLL 22[3] selectable event pending
SIGIO 23[2] input/output possible
SIGURG 24[2] urgent condition on IO channel
SIGWINCH 25[2] window size changes
SIGVTALRM 26 virtual time alarm
SIGPROF 27 profiling alarm
SIGCONT 28[6] continue after stop (cannot be ignored)
SIGTTIN 29[6] background read from control terminal
SIGTTOU 30[6] background write to control terminal
SIGXCPU 31 cpu time limit exceeded [see setrlimit(2)]
SIGXFSZ 32 file size limit exceeded [see setrlimit(2)]
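The numbers in this table come from one particular system. Several of them (for example SIGSTOP, SIGCONT, and SIGCLD) are assigned differently on other Unix variants, so it is worth checking the machine you are on. As a small sketch, Python's standard signal module can report the local numbering; the helper name signal_table is hypothetical.

```python
import signal


def signal_table():
    """Return {name: number} for the signals defined on this system."""
    return {sig.name: int(sig) for sig in signal.Signals}


if __name__ == "__main__":
    # Print the table in numeric order, in the same layout as above.
    for name, number in sorted(signal_table().items(), key=lambda kv: kv[1]):
        print(f"{name:<10} {number:02d}")
```

Using the signal name rather than the number (KILL instead of 9, STOP instead of 20) avoids the portability problem altogether.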
Once you understand the political implications of who should get priority, you are ready to manage the technical details. As root, you can change the priority of any process on the system. Before doing this, it is critical to understand how priority works and what makes sense. First, while CPU is the most watched resource on a system, it is not the only one. Memory usage, disk usage, I/O activity, and the number of processes all tie together in determining the throughput of the machine. For example, take two groups of processes, A and B. Both groups require large amounts of memory, more than is available when both are running simultaneously. Raising the priority of group A over group B may not help if group B does not fully relinquish the memory it is using. While the paging system will reclaim that memory over time, swapping a process out to disk can be intensive and can greatly reduce performance, especially if it becomes a recurring problem because group B keeps getting swapped back in. A possibly better alternative is to stop group B completely with a signal and then continue it later, when A has finished.
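Suspending one job until another finishes comes down to two signals from the table above: SIGSTOP to suspend and SIGCONT to resume. A minimal sketch using Python's os and signal modules follows. The helper names suspend and resume are hypothetical, and the PID is assumed to belong to a process you are allowed to signal.

```python
import os
import signal


def suspend(pid):
    """Suspend the process; it is no longer scheduled until resumed."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a previously suspended process continue where it left off."""
    os.kill(pid, signal.SIGCONT)
```

A typical use would be to call suspend on each process in the lower-priority group, wait for the other group to finish, and then call resume on the same PIDs.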
Unix does provide the nice [increment] command to lower the priority of a process. It should be used by users who are running large jobs; as system administrator you may need to explain this to them. There are two versions of the command, and their syntax is opposite. The version built into csh takes the increment with a plus sign (for example, nice +10 command); the larger the NICE value, the lower the priority. The Bourne shell version of nice takes the increment with a leading dash (for example, nice -10 command), which likewise lowers the priority. Remember to use the appropriate form for your shell.
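For users who launch jobs from a script rather than from a shell, the same effect is available through the nice(2) system call. The sketch below uses Python's subprocess and os modules; run_niced is a hypothetical helper name and the default increment of 10 is an arbitrary choice.

```python
import os
import subprocess


def run_niced(argv, increment=10):
    """Start `argv` with its nice value raised by `increment`.

    A larger nice value means a lower scheduling priority. Unprivileged
    users may only raise the value, never lower it.
    """
    # preexec_fn runs in the child just before the program is executed,
    # so only the child's priority changes, not the caller's.
    return subprocess.Popen(argv, preexec_fn=lambda: os.nice(increment))
```

Because the increment is passed as a plain number, this form sidesteps the plus-versus-dash difference between the two shell versions.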
As system administrator you can use the renice command to change the priority of a process that is already running:
/etc/renice priority [ [ -p ] pid ... ] [ [ -g ] pgrp ... ] [ [ -u ] user
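The operation underneath renice is the setpriority(2) call, which Python exposes as os.setpriority. The following is a rough sketch of the per-process case only, not a replacement for the command: renice_pid is a hypothetical name, and only root can lower a nice value (that is, raise a priority).

```python
import os


def renice_pid(pid, nice_value):
    """Set the nice value of a running process and return the new value."""
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

The process-group and user forms of renice correspond to passing os.PRIO_PGRP or os.PRIO_USER in place of os.PRIO_PROCESS.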
User education is always the key. One common problem is a user running three jobs simultaneously when it would be much better to run them chained back-to-back.
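Chaining simply means starting each job only after the previous one has finished. As an illustration, a small Python driver can do this with subprocess.run. The function name run_back_to_back and the choice to stop at the first failure are assumptions of this sketch.

```python
import subprocess


def run_back_to_back(jobs):
    """Run each command in `jobs` one at a time, in order.

    Stops at the first job that exits with a non-zero status and returns
    the list of exit codes collected so far.
    """
    codes = []
    for argv in jobs:
        # run() blocks until the job exits, so jobs never overlap.
        result = subprocess.run(argv)
        codes.append(result.returncode)
        if result.returncode != 0:
            break
    return codes
```

Each job then has the machine's memory to itself, which avoids the paging contention described earlier.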
Another useful command is lsof, which lists the files that processes have open. It is NOT part of standard Unix but is available over the net. The fuser command is also useful for showing which processes are using certain files.