Load Average, to know how busy your server is

Submitted by Alexis Wilke on Sat, 02/06/2010 - 01:27

Whenever you run many applications on a server, such as many instances of your web server (e.g. Apache), you may want to know how much of the system is being used. This is important if you want to avoid overloading your system.

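The load is computed by the kernel. The kernel counts how many processes are running or waiting to run (on Linux, processes blocked in uninterruptible I/O are counted too) and averages that count over a period of time. The number is not divided by the number of processors, so you have to compare it with how much processing power your server has (i.e. how many processors). The difference tells you how many processes are being held back. The default periods are 1, 5 and 15 minutes (I'm sure you can recompile your kernel with different timings!)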

There are several ways to know about the current server load. For an instant load review, use uptime. You will see a label named load average followed by three numbers: the average number of processes running or waiting over the last minute, the last 5 minutes and the last 15 minutes.

When the load is increasing, the numbers go from larger to smaller (the 1 minute average is higher than the 15 minute average). When the load is decreasing, the numbers go from smaller to larger.
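To make that rule concrete, here is a minimal sketch that compares the 1 minute and 15 minute averages. The function name load_trend is my own invention for illustration, and the script assumes a Linux system where /proc/loadavg exists.

```shell
#!/bin/sh
# load_trend is a hypothetical helper, not a standard command.
# $1 = 1-minute average, $2 = 15-minute average
load_trend() {
    awk -v one="$1" -v fifteen="$2" 'BEGIN {
        if (one > fifteen)      print "increasing"
        else if (one < fifteen) print "decreasing"
        else                    print "steady"
    }'
}

# The first three fields of /proc/loadavg are the 1, 5 and 15 minute averages.
if [ -r /proc/loadavg ]; then
    read one five fifteen rest < /proc/loadavg
    echo "load is $(load_trend "$one" "$fifteen")"
fi
```

Comparing only the first and last numbers is a simplification. A short spike can make the 1 minute value jump around, so take the answer as a hint rather than a verdict.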

A server that doesn't do much will have a load under 1. A very busy server can have a load of 50. Note that at 50, you had better have a lot of memory to keep all the processes alive...

The other command to check your load is top (Note: I did not say gtop.) top shows the load average at the top right of your screen and refreshes your screen every few seconds (you can change the refresh rate.) top shows you another line of information that is very important: the CPU usage. Why is that important? Because it will show you whether your processors are being used efficiently. Look at the entry followed by "id" and the one followed by "wa".

"id" stands for Idle. That's when one or more of your processors is sitting around whistling... (i.e. doing nothing.)

"wa" stands for Wait. That's when the processor initiated a hardware command and is waiting for the results. In case of a server, those are, 99% of the time, waits on the hard drive.

If your Idle and Wait are high while your load average is high, your computer is not being used efficiently. In other words, it is waiting for data more than it is running code. There are two things you can do here. If you can afford it, get better hardware (better as in faster hard drives with larger caches.) Otherwise, look at your memory: if you still have memory available, you may want to increase the number of processes run in parallel. Those could then be running while other processes are waiting for data. This is especially true if you are using a database system.
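If you want those two numbers without opening top, here is a sketch that computes them from two samples of the "cpu" line in /proc/stat. It assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal), and cpu_idle_wait is a hypothetical helper name of mine. The result may differ slightly from what top reports.

```shell
#!/bin/sh
# cpu_idle_wait is a hypothetical helper: given two "cpu ..." lines taken
# from /proc/stat at different times, print the idle and iowait percentages
# for the interval between them.
cpu_idle_wait() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle[NR] = $5; iow[NR] = $6; sum[NR] = total
        }
        END {
            dt = sum[2] - sum[1]
            if (dt <= 0) { print "id=0.0 wa=0.0"; exit }
            printf "id=%.1f wa=%.1f\n",
                   100 * (idle[2] - idle[1]) / dt,
                   100 * (iow[2] - iow[1]) / dt
        }'
}

# Sample the aggregate CPU line twice, one second apart (Linux only).
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    cpu_idle_wait "$first" "$second"
fi
```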

Another way to find out about your load average is to look at the content of the /proc/loadavg file. You can do so with a simple cat command in your shell:

   cat /proc/loadavg

Note that the output will include some other numbers as shown here:

   3.03 2.84 2.88 6/249 16526

We recognize the first 3 entries as the load average we've seen with uptime and top.

Then comes 6/249 which means 6 processes (or threads) are currently runnable and 249 are instantiated (most are sleeping.)

Then, the last number, 16526, is the process identifier (pid) of the most recently created process.
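If you want to use those fields in your own scripts, here is a minimal sketch that splits such a line into named values. parse_loadavg and the labels it prints are names I made up for the example.

```shell
#!/bin/sh
# parse_loadavg is a hypothetical helper that labels the five fields of a
# /proc/loadavg line passed as its first argument.
parse_loadavg() {
    echo "$1" | {
        read one five fifteen entities pid
        running=${entities%/*}   # part before the slash
        total=${entities#*/}     # part after the slash
        echo "1min=$one 5min=$five 15min=$fifteen"
        echo "runnable=$running total=$total pid=$pid"
    }
}

# Using the sample line shown above:
parse_loadavg "3.03 2.84 2.88 6/249 16526"
```

On a live system you would call it as parse_loadavg "$(cat /proc/loadavg)".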

As we can see here, on a server with 2 processors, a load average of 3 is a little over what the server can handle in realtime. This means it will respond more slowly than you'd want... but it won't crash. Note that with 2 processors you can run 2 processes simultaneously.
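The comparison between load and processor count can be scripted. The sketch below divides the 1 minute load by the number of online processors. It assumes getconf _NPROCESSORS_ONLN is available (it is on most Linux systems, but that is an assumption), and load_per_cpu is a hypothetical helper name.

```shell
#!/bin/sh
# load_per_cpu is a hypothetical helper: $1 = load average, $2 = processor count.
# A result above 1.0 means more work is queued than the processors can run at once.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus + 0 < 1) cpus = 1   # fall back to 1 if the count is missing
        printf "%.1f\n", load / cpus
    }'
}

if [ -r /proc/loadavg ]; then
    read one rest < /proc/loadavg
    # Assumption: getconf knows _NPROCESSORS_ONLN on this system.
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "load per processor: $(load_per_cpu "$one" "$cpus")"
fi
```

With the numbers from this article, load_per_cpu 3.03 2 gives roughly one and a half processes per processor, which matches "a little over what the server can handle".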

Now, if we had a load of 50, then it becomes very slow. The system will have to run 2 processes at a time to slowly get rid of all 50 processes. And if you start swapping (using the swap space on disk to save some data that does not fit in memory,) it will be really slow... and at that point you may experience some crashes, mainly because the system cannot keep up with all the processes and the swap (and most programs do not handle out of memory errors properly.)
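To see whether the server has started swapping, you can look at the SwapTotal and SwapFree lines of /proc/meminfo. This is a rough sketch only. swap_used_kb is a hypothetical helper name, and a non-zero value does not prove that the machine is actively swapping right now, only that some data sits in swap.

```shell
#!/bin/sh
# swap_used_kb is a hypothetical helper: print the amount of swap in use (kB).
# It reads /proc/meminfo unless another file is given as $1.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { avail = $2 }
         END           { print total - avail }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "swap in use: $(swap_used_kb) kB"
fi
```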

If you record a set of load averages from /proc/loadavg, you will have a way to look at the load over time. This is quite useful to manage your server. I have a small shell script to do that:

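  #!/bin/sh
  # Append a time stamp and the current /proc/loadavg line to today's file.
  OUTPUT=loadavg-$(date +%Y%m%d).txt
  # printf is used because "echo ... \c" is not portable across shells
  printf '%s ' "$(date +%H:%M:%S)" >>"$OUTPUT"
  cat /proc/loadavg >>"$OUTPUT"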

The script saves the load average preceded by a time stamp in a file named loadavg-<year><month><day>.txt. Very simple. You should run that script at regular intervals using cron, probably every 1, 5 or 15 minutes. With more than 15 minutes you won't get very precise information, and with 1 minute you may create some extra problems on your server... so 5 or 10 is a good idea. The crontab could look something like this:

# Run once every 5 minutes
SHELL=/bin/sh
HOME=/path/to/your/loadavg/code
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

*/5 * * * * $HOME/run

Change the HOME path to your liking. It is a good idea to define that variable. The entry expects the script to be saved as an executable file named run in that directory. The */5 means run once every 5 minutes.
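Once a few entries have been recorded, a one-liner can tell you when the server was busiest. This sketch assumes each line looks like "HH:MM:SS" followed by the content of /proc/loadavg, which is what the script above writes. peak_load is a hypothetical helper name.

```shell
#!/bin/sh
# peak_load is a hypothetical helper: print the time stamp and value of the
# highest 1-minute load average found in the log file given as $1.
peak_load() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; stamp = $1; shown = $2 }
         END { if (NR > 0) print stamp, shown }' "$1"
}

# Look at today's file if it exists in the current directory.
today=loadavg-$(date +%Y%m%d).txt
if [ -r "$today" ]; then
    peak_load "$today"
fi
```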

Be careful: this script has no delete ability. It will add one new file per day and never remove any. Either stop it after a little while or add some other script to delete older files once a day or week.
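Such a cleanup script could be as small as the sketch below. It is not part of my original setup. The function name cleanup_old_logs and the 30 day retention in the usage line are assumptions, so adjust them before you let cron run it.

```shell
#!/bin/sh
# cleanup_old_logs is a hypothetical helper.
# $1 = directory holding the loadavg-*.txt files
# $2 = number of days to keep; files last modified more than that long ago
#      are removed
cleanup_old_logs() {
    find "$1" -type f -name 'loadavg-*.txt' -mtime +"$2" -exec rm -f {} +
}

# Example usage (commented out so that nothing is deleted by accident):
# cleanup_old_logs /path/to/your/loadavg/code 30
```

Note that find also looks in sub-directories, so point it at a directory that only holds these files.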
