ntp[123]: Soliciting pool server 10.10.0.1 -- failing as main NTP server is not running

Submitted by Alexis Wilke on Sun, 07/19/2020 - 14:22

Introduction

Today I noticed a problem with one of my computers: its clock was behind.

The NTP service on my main server had somehow shut down (or was never started?), so the time was not available to that slave.

In my ntp.conf, I replace the default time server addresses with my own, like so:

pool 10.10.0.1 iburst

This means this computer will make use of the time on server 10.10.0.1 and not some random (public) time servers.

Why would you do that, you ask?

This way the computers stay in close sync with each other, especially if you are on an Intranet at home with a relatively slow connection to the Internet. In a data center such as DigitalOcean's, it may be less of a concern, although there too, if you have a webapp which requires close synchronization, you will probably want a way to synchronize those computers with each other rather than each with a different source.

Fixing the Message Above

Today I noticed the clock was off and checked the NTP service on both servers. It was running on the slave but not on my master computer. The master must have had a problem at boot time, or something was updated in such a way that NTP did not stay up.

Solution

The cause was that ntp is not compatible with ntpdate. The ntpdate tool binds to the same port (123) and as a result it can prevent ntp from restarting (i.e. with a Port Already In Use error). The ntpdate tool is useful to reset the date of your computer once a day, but the ntp daemon already does that for you on an ongoing basis, so you really don't need both. The simplest way to fix this problem is to purge ntpdate:

apt-get purge ntpdate

The one reason for using ntpdate as well is to set the date just after an installation. Then again, if you set your date and time approximately and then run your NTP service, you'll get roughly the same result. So I strongly advise that you remove ntpdate when you want to run an NTP service.
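To see whether something is already holding the NTP port, you can try to bind to it yourself. The following is a minimal sketch, not part of my original fix: the function name udp_port_status is made up for this example, it only looks at IPv4, and it has to run as root to test port 123 because that is a privileged port.

```python
import errno
import socket


def udp_port_status(port=123, host="0.0.0.0"):
    """Try to bind a UDP socket to host:port and report what happened.

    Returns "in use" when another process already holds the port,
    "permission denied" when we are not allowed to bind it (ports
    below 1024 normally require root), and "free" otherwise.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return "in use"
        if exc.errno == errno.EACCES:
            return "permission denied"
        raise
    finally:
        # The socket is only a probe; release the port right away.
        sock.close()
    return "free"
```

Calling udp_port_status() as root while the NTP daemon is stopped should tell you whether another program is still sitting on port 123. Without root you will most likely only ever see "permission denied" for that port.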

Needless to say, when the master NTP service is not available, the slave will never be able to synchronize with anything.

You can tell that the connection cannot be established by looking at the syslog for entries such as:

ntp[123]: Soliciting pool server 10.10.0.1

This means the local NTP server is sending UDP packets to port 123 formatted for a remote NTP server and that remote server never replies with anything.
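To make that exchange concrete, here is a minimal sketch of a client that sends one such packet and waits for an answer. It is an illustration only: the function names are made up, it speaks IPv4 only, and it reads just the transmit timestamp from the reply instead of doing the full offset calculation a real NTP client performs.

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def build_request():
    """Return a 48-byte client request: LI=0, version=3, mode=3 (client)."""
    return bytes([0x1B]) + bytes(47)


def parse_transmit_time(reply):
    """Extract the server's transmit timestamp as Unix seconds."""
    if len(reply) < 48:
        raise ValueError("reply shorter than an NTP header")
    # Bytes 40..47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def probe(host, timeout=2.0):
    """Send one request; return the server time, or None if nothing replies."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, NTP_PORT))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers the timeout as well as network errors.
            return None
    return parse_transmit_time(reply)
```

With the master down, probe("10.10.0.1") returns None after the timeout, which is the same silence the daemon runs into when it keeps logging "Soliciting pool server".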

In my case, I just had to restart the NTP server on my master computer and after a little while the NTP on the slave detected the service and it started adjusting the clock.

To test whether your packets get a reply, you can use the ntpdate tool in debug mode (-d), which goes through all the steps without adjusting the local clock:

ntpdate -d 10.10.0.1

If you recently changed your firewall, you probably want to have a look at it. That could well prevent packets from getting through. Keep in mind that with UDP, packets need to be accepted in both directions (although firewall rules now have a concept of a "connected" state for UDP too!)

If you use a domain name rather than an IP address, you can verify that the name resolves using dig or a similar command:

dig my-master-computer

This will work with any domain name. If dig can't figure out what IP address that name represents, then you won't be able to access anything.
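If you would rather check from a script, the same lookup can be sketched in Python. The function name resolve is made up for this example. Note that this goes through the system resolver, so unlike dig it also honors /etc/hosts, which is closer to what the NTP daemon itself will see.

```python
import socket


def resolve(name):
    """Return the sorted, unique IP addresses the system resolver gives for name.

    An empty list means the name could not be resolved.
    """
    try:
        infos = socket.getaddrinfo(name, 123, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first item of sockaddr.
    return sorted({info[4][0] for info in infos})
```

An empty result for one of the names in your ntp.conf is a good hint that the entry needs fixing. An IP address such as 10.10.0.1 simply comes back as itself.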

For a main NTP setup, you generally have multiple server entries. You should check the domain name of each entry. It could be that one of these was shut down or renamed. You can comment it out or fix the name.

NTP uses UDP port 123 by default. It will also listen and send packets on both IPv4 and IPv6 unless you properly turn off one or the other. Your firewall has to let packets through on port 123 in both directions: IN and OUT.

Check out the peers

There is a way to look at the current status of the peers. This is easily achieved using the ntpq command like so:

$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.10.0.1       .POOL.          16 p    -   64    0    0.000    0.000   0.000
 agent007        129.250.35.251   3 u   38   64    1    0.181  -16.413  13.52

Note: the delay, offset, and jitter columns are in milliseconds by default. Newer versions of ntpq allow you to change the unit using the -u command line option.

If you do not see a line like the second one, with a real refid and a non-zero reach, then NTP is not able to connect.

The "reach" column goes from 0 to 377. It is an 8-bit value, displayed in octal, which records which of the last eight polls got a reply from that server. When it is 0, no replies are being received. When it is a number other than 377, some packets were lost. When you just started the server (like just after a reboot), it is normal to see a number other than 377. For a local network, it should otherwise become and stay 377 for the whole time your server is running.
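Since the reach value is octal, it can be easier to read once expanded into its eight bits. Here is a small sketch that does so; the function names are made up for this example. The most recent poll is the lowest bit, so it ends up last in the list.

```python
def decode_reach(reach):
    """Turn ntpq's octal reach value into eight booleans, oldest poll first."""
    value = int(reach, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits")
    # Bit 7 is the oldest of the last eight polls, bit 0 the most recent.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def describe_reach(reach):
    """Summarize a reach value in plain words."""
    history = decode_reach(reach)
    return "{}/8 recent polls answered, latest {}".format(
        sum(history), "answered" if history[-1] else "missed"
    )
```

For the agent007 line above, describe_reach("1") gives "1/8 recent polls answered, latest answered", which is what you expect right after a restart. A value of 376 would mean the seven polls before the latest were answered but the latest one was missed.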

Still doesn't work?

So you removed/purged the ntpdate package and you still have problems?

One important thing to check out in this case is your firewall. There isn't much else that would prevent the UDP traffic between your LAN computers.

You may also have a VPS and that computer may not have access to your host computer. This is the default for a VPS. To fix that, you have to select the proper Network Mode. I use the Bridged Adapter to make my VPS work in this way. Just be careful since that gives your VPS systems direct access to your host which may not be what you want...
