A Perl subroutine for processing the output of the
Solaris ps -ef command
 

Introduction

Recently I created a series of Perl programs to help me weed through the output of the Solaris "ps -ef" command.  In particular, I was interested in understanding the parent/child relationship between processes running on the system, and understanding what processes were "hogging" the CPU on my system.

In this article we're going to analyze a subroutine from these programs that processes the output of the ps -ef command into a series of hashes that can be used in your programs.
 

Analyzing the output of the ps -ef command

As part of a recent project, I decided to create several Perl programs to help analyze the output of the Solaris ps -ef command.  As part of the debugging process of new software I was creating, I was constantly running ps -ef to track the parent/child relationships between processes.  I was concerned about processes being spawned and terminated properly, and running ps -ef was a good way to track what was really happening.  I created a program named "ancestors.pl" to track the parent/child relationship of processes.

As I proceeded further into my project, I also needed to determine which processes were consuming the most CPU time on the system.  I decided to again use Perl to run the ps -ef command every 10 seconds and analyze its output.  The Perl program was designed to report the most active processes during those 10-second intervals.  For instance, if a particular process consumed six seconds of CPU time over a 10-second interval, this Perl program (named "CPU hog") reported a CPU usage factor of 60% during that time period.
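
To make that calculation concrete, here is a rough sketch of the idea behind the "CPU hog" calculation.  This is not the actual program; the %time_before and %time_now hashes are hypothetical names for two snapshots of a process's cumulative CPU time taken 10 seconds apart.

   # A sketch only -- %time_before and %time_now stand in for two snapshots
   # of cumulative CPU time, taken 10 seconds apart, keyed by process-id.
   $interval = 10;                                       # seconds between samples
   $delta    = $time_now{$pid} - $time_before{$pid};     # CPU seconds consumed in that window
   $percent  = 100 * $delta / $interval;                 # e.g. 6 seconds / 10 seconds = 60%
   printf("PID %s used %.0f%% of the CPU\n", $pid, $percent);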
 

Output from ps -ef

Output from the Solaris ps -ef command consists of eight fields of information per record.  The first few lines of output from a ps -ef command might look like this on a Solaris system:
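
     UID   PID  PPID  C    STIME TTY      TIME CMD 
    root     0     0  0 09:58:54 ?        0:17 sched 
    root     1     0  0 09:58:54 ?        0:02 /etc/init - 
    root     2     0  0 09:58:54 ?        0:00 pageout 
    root     3     0  0 09:58:54 ?        2:29 fsflush 
    root   237     1  0 09:59:07 ?        0:00 /usr/lib/saf/sac -t 300 

(This sample is only representative; the processes, PIDs, and times on your system will of course be different.)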

As you can see, these are eight text fields.  Most fields are variable-width fields, which can make processing the data a challenge with some languages.  Fortunately it's fairly easy with Perl.

For those who are not familiar with Solaris or the ps command, Table 1 provides a brief description of these output fields.
 
 
Field Name    Field description 
UID           User ID of the current process (i.e., the user this process belongs to) 
PID           Process ID of the current process 
PPID          Process ID of the parent process 
C             Obsolete (processor utilization for scheduling) 
STIME         Starting time of the process 
TTY           Controlling terminal for the process 
TIME          Cumulative execution time for the process 
CMD           Unix command name 
 
Table 1:  This table defines the eight fields of output generated by the Solaris "ps -ef" command. 
 

The purpose of the GetPsefData subroutine

The purpose of the GetPsefData subroutine is to run the ps -ef command, and store the results of this command into a series of hashes.  The primary fields of interest are the process-id (pid), user-id (uid), parent process-id (ppid), time, and Unix command-name (ucmd) fields.  The remainder of the ps -ef output fields are ignored for the time being.

Each time a program wants new ps -ef data, it calls this subroutine to run the ps -ef command and put the data into the proper hashes.
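
For example, here is a minimal usage sketch; the print line simply demonstrates the kind of lookup the hashes make possible:

   GetPsefData();                                   # run ps -ef and fill the hashes 
   print "PID 1 belongs to $uid{1} ($ucmd{1})\n";   # PID 1 is the init process on Solaris 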

The GetPsefData subroutine is shown in Listing 1.
 
sub GetPsefData { 
   # Open a pipe to the ps -ef command so we can read its output line by line 
   open(PSEF_PIPE,"ps -ef|") or die "Can't run ps -ef: $!\n"; 
   $i=0; 
   while (<PSEF_PIPE>) { 
      chomp;                                        # strip the trailing newline 
      next if /^\s*UID\s+PID/;                      # skip the ps -ef header line 
      @psefField      = split(' ', $_, 8);          # break the record into (at most) eight fields 
      $pid[$i]        = $psefField[1]; 
      $uid{$pid[$i]}  = $psefField[0]; 
      $ppid{$pid[$i]} = $psefField[2]; 
      ($min,$sec)     = split(/:/,$psefField[6]);   # the TIME field looks like "MM:SS" 
      $time{$pid[$i]} = $min * 60 + $sec;           # convert TIME to total seconds 
      $ucmd{$pid[$i]} = $psefField[7]; 
      $i++; 
   } 
   close(PSEF_PIPE); 
} 
 
 
Listing 1:  The GetPsefData subroutine opens a pipe to the Unix "ps -ef" command, and processes the output of the command into various hashes. 
 

Analyzing the GetPsefData subroutine

The GetPsefData subroutine uses the open statement to execute the ps -ef command and create a pipe that we can read from.  It attaches this to a file handle named "PSEF_PIPE" that we read from inside the while loop.

The first thing we do inside the while loop is call the chomp function, which removes the trailing newline character from the input record so it doesn't end up in one of our variables during later processing.  The next statement simply skips past the header line that ps -ef prints before its first real record, so the header doesn't get stored in the hashes along with the real data.
 

Breaking a ps -ef record into fields with the split function

The split statement that comes next in the while loop creates an array named "@psefField" that contains eight elements.  In this statement, the split function uses the variable $_ as input to generate those eight array elements.

The first parameter in the split function call, ' ' (a blank space), indicates that the "fields" contained in the variable $_ are separated by "whitespace".  It also means that whitespace at the beginning of a line should be ignored, which is very important in this case, because each record begins with whitespace.  (If you're not familiar with it, the term whitespace is defined as any combination of blanks, tabs, or newline characters.)  The last two parameters in the split function identify that we want to split the variable $_ into a maximum of eight fields.
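
As a quick illustration, here is what that split statement does with a representative ps -ef record (the record shown is only a sample):

   $_ = "    root   237     1  0 09:59:07 ?        0:00 /usr/lib/saf/sac -t 300"; 
   @psefField = split(' ', $_, 8); 

   # @psefField now holds exactly eight elements: 
   #   ("root", "237", "1", "0", "09:59:07", "?", "0:00", "/usr/lib/saf/sac -t 300") 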

The next statement in the while loop assigns the value of $psefField[1] to an array variable named $pid[$i].  The PID is an important field in the output, because it is unique for each record.  Each process running on the system will have a different PID value.  We'll use this to our advantage in this program.
 

Putting the data into hashes

The next two lines in the code are:
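
   $uid{$pid[$i]}  = $psefField[0]; 
   $ppid{$pid[$i]} = $psefField[2]; 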

Here we're creating two hash variables, %uid and %ppid.  Within this subroutine four hash variables are created: %uid (the user ID), %ppid (the parent process of the current PID), %time (the cumulative run time of the PID), and %ucmd (the name of the Unix command corresponding to the PID).  The other two hashes, %time and %ucmd, are assigned a few lines further down in the loop.

Because uid, ppid, time, and ucmd correspond to the current process-id (pid), it makes life easier to create these as hash variables.  This works out great, because later on, when I want to refer to the uid, ppid, time, and ucmd of process id 100, I simply type:
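
   $uid{100} 
   $ppid{100} 
   $time{100} 
   $ucmd{100} 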

Hashes make this type of problem a breeze to work with.  From now on, any time I want to access information about a given PID, I just refer to that information using the PID as the subscript.
 

Splitting again

Returning to the code in Listing 1, the next line in the loop uses the split command again.  This time, instead of splitting the entire input line, I'm just splitting the contents of $psefField[6], the seventh element of the @psefField array.  Because it's a cool language, Perl lets us split this array element into two variables, $min and $sec, with one statement.

Notice that in this step we use the ":" character as the field delimiter.  Because the time field looks something like "10:15" (10 minutes, 15 seconds of CPU time consumed), we know that the $min information is to the left of the ":" character, and the $sec information is to the right of the ":" character.  In this example the variable $min is assigned the value 10, and the variable $sec is assigned the value 15.

The next line of the program calculates the time consumed by the current process in total seconds, and assigns this value to the hash variable $time{$pid[$i]}.  This hash lets us track time differences in programs like the "CPU hog" program mentioned earlier.
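
Using the "10:15" example again (with a throwaway $seconds variable standing in for the %time hash entry):

   ($min, $sec) = split(/:/, "10:15");   # $min is 10, $sec is 15 
   $seconds     = $min * 60 + $sec;      # 10 * 60 + 15 = 615 seconds of CPU time 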

The final line puts the name of the current Unix command in the hash %ucmd.  Notice that as a result of the split call used earlier, $ucmd{$pid[$i]} may contain a single word or several words with embedded blank spaces.  This is important (and desirable), because the command is a variable-length field that may contain the command name plus any arguments it was started with.
 

Summary of GetPsefData

In summary, the GetPsefData subroutine does several things for us:
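
- It runs the ps -ef command through a pipe, so the output can be read one record at a time. 
- It splits each record into a maximum of eight fields, ignoring the variable amounts of whitespace between them. 
- It converts the TIME field from "minutes:seconds" form into total seconds. 
- It stores the uid, ppid, time, and ucmd values for each process in hashes keyed by the PID, so any later code can look up this information using nothing more than a process-id. 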


Important notes

This subroutine was written to process the output of the "ps -ef" command on Sun's Solaris operating system.  This is one of those cases where there are differences between Unix operating systems, so the same function may not work properly on other versions of Unix, such as AIX, HP-UX, UnixWare, or FreeBSD.  The output fields on those systems may differ from the fields generated by Solaris, and we haven't tested those systems.  More than likely, small changes will be required for other Unix systems.
 

The future

In our next article, we'll examine how this subroutine is used within real Perl programs to generate useful information from the output of the ps -ef command.  Specifically, we'll show how to track the ancestry of processes using the ancestors.pl program, which relies heavily on this subroutine.