The Linux Page

BIND errors between masters and slaves

Using BIND as a Master and a Slave simultaneously

I run BIND (the named service) on two servers. Each server is at the same time a master and a slave, depending on the domain name. For instance, I host http://www.alexishouses.com on my server and http://www.m2osw.com on my company's server.

Up to here, easy.

The slaves are expected to update themselves whenever the master emits a change in a zone.

Somehow, one of my domain names would not work at all on the slave (company's server). I searched and searched and I just couldn't find out why it was not working. Finally, I searched the named.conf.local of the slave for the domain name as written on the master (i.e. copy & paste). That's how I discovered that I had misspelled the name on the slave!

Errors using the wrong spelling
Mar 18 17:21:40 web named[21982]: zone giovanna-lorruso.com/IN: refresh: unexpected rcode (REFUSED) from master 64.166.38.38#53 (source 0.0.0.0#0)
Mar 18 17:21:40 web named[21982]: zone giovanna-lorruso.com/IN: Transfer started
Mar 18 17:21:40 web named[21982]: transfer of 'giovanna-lorruso.com/IN' from 64.166.38.38#53: connected using 69.55.238.181#53954
Mar 18 17:21:40 web named[21982]: transfer of 'giovanna-lorruso.com/IN' from 64.166.38.38#53: failed while receiving responses: NOTAUTH
Mar 18 17:21:40 web named[21982]: transfer of 'giovanna-lorruso.com/IN' from 64.166.38.38#53: end of transfer

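As you can see, the error does not make the cause obvious. The problem was that lorusso is spelled with one R and two S's, so the slave was asking the master for a zone that the master did not have.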
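Since both servers are supposed to declare the same zone names (one as master, the other as slave), this kind of typo can be caught by comparing the two configuration files. The following is only a rough sketch: the function names are made up for illustration, and it assumes each zone is declared on its own line as zone "name" {, as in the examples on this page.

```shell
#!/bin/sh
# Sketch: find zone names that the slave declares but the master does not.
# Function names are illustrative only; they are not part of BIND.

# Print every zone name declared in a named.conf-style file, sorted.
zone_names() {
    sed -n 's/^[[:space:]]*zone[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | LC_ALL=C sort -u
}

# $1: configuration of the slave, $2: a copy of the master's configuration.
# Prints the zones that only the first file declares (likely typos).
zones_missing_on_master() {
    slave_list=$(mktemp)
    master_list=$(mktemp)
    zone_names "$1" > "$slave_list"
    zone_names "$2" > "$master_list"
    # comm -23 keeps the lines that appear only in the first file.
    LC_ALL=C comm -23 "$slave_list" "$master_list"
    rm -f "$slave_list" "$master_list"
}
```

With a copy of the other server's named.conf.local at hand, zones_missing_on_master named.conf.local other-named.conf.local would have printed giovanna-lorruso.com in my case.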

Now that was not all. I was getting another kind of error too:

Errors using BIND 9.3, notification refused
Mar 18 21:34:14 web named[16331]: zone giovanna-lorusso.com/IN: sending notifies (serial 46)
Mar 18 21:34:14 web named[16331]: client 69.55.238.181#45565: received notify for zone 'giovanna-lorusso.com'
Mar 18 21:34:14 web named[16331]: zone giovanna-lorusso.com/IN: refused notify from non-master: 69.55.238.181#45565

As we can see, BIND wants to propagate the change by sending a notification, but that notification is being refused. This has apparently been the case since BIND 9.3. The concept is simple but quite a gotcha! The server is actually sending a notification to itself, and chances are that if you are reading this you have the same problem:

You need to allow the server to notify itself!

I had a hard time seeing that because of the other error I had before. I found a site where someone else ran into that problem and had a simple solution:

https://benjamin.sonntag.fr/a37-named_zone_sonntag_eu_org_refused_notify_from_non-master.html

Add an allow-notify { <myself>; }; where <myself> is your slave server's IP address. If you are like me, you want to add that to both of your servers.
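To find out which address needs to be allowed, you can pull it out of the log lines shown above. This is a small sketch rather than an official tool: the function name is made up, and where named logs on your system (for example /var/log/syslog) is an assumption you will need to check.

```shell
#!/bin/sh
# Sketch: list the addresses whose notifies were refused.
# Reads log lines on standard input and prints each source IP once.
refused_notify_sources() {
    sed -n 's/.*refused notify from non-master: \([0-9A-Fa-f.:]*\)#[0-9]*.*/\1/p' |
        LC_ALL=C sort -u
}

# Example use (the log path is an assumption):
#   refused_notify_sources < /var/log/syslog
```

Each address it prints is a candidate for the list between the braces, provided it really is one of your own servers.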

More info about BIND (but really no solution to any of your everyday problems...):

http://www.bind9.net/manuals

Determining whether BIND is running as Master or Slave

By default, BIND will be a master.

It is possible to always run BIND as a master and simply duplicate your zones by hand (with a mechanism of your own such as an rsync) on all your servers.

However, the named service offers a way to copy zones between servers. In that case, one server is called the master and the others are slaves. The number of servers is not limited, although you probably want to keep it small to keep it manageable.

Slave servers are good if you have several offices, or when you own domain names, in which case you need at least two servers. (You do not need your own DNS, though; most domain name resellers will offer you free access to theirs.)

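What makes a zone a slave is the type slave; line in its zone block, together with a masters { ... }; list. This is decided zone by zone, so one named service can be a master for some domains and a slave for others. The allow-update option is a separate access control: it lists who is allowed to send dynamic updates for a zone, and it does not by itself turn a server into a slave, even when it is used in the global space of named.conf.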

If, like me, you want to run as a Master for some domains and as a Slave for others, then you want to be very careful about how you use the allow-update option.

I suggest that you use the option only in the zones marked as slave (i.e. type slave;). Yet, if you have just one or two masters, you can as well write one global allow-update and then cancel that declaration in your master entries as in:

zone "alexishouses.com" {
    type master;
    file "/etc/bind/alexis/alexishouses.com";
    allow-update { none; };
};

An example of allow-update with an IP address appears below. allow-update can include several IP addresses separated by semi-colons.

Errors about permissions denied
22-Aug-2010 15:40:27.246 general: error: dumping master file: /etc/bind/alexis/tmp-I2lHgTnElm: open: permission denied
22-Aug-2010 15:40:27.247 xfer-in: error: transfer of 'alexishouses.com/IN' from 64.166.38.38#53: failed while receiving responses: permission denied

Today I noticed that the bind server was working in only one direction. Wondering, I looked at the logs (after I turned on the feature) and noticed that the tool was telling me that it did not have permissions to write to the alexis folder (often called slave although in our case we have several people and used several folders to segregate the names). I verified all the folders and files, made sure they were owned by bind, just like on the other server, and it still did not work.

So I searched the Internet and found a post about Ubuntu that gave me the answer:

Although all the permissions look correct as per the old days,
they are not correct for the new days of apparmor.
Yes! apparmor will prevent the write whether you want it or not.

Look at the settings of apparmor and see that the /etc/bind folder is a read-only folder now (which is a GREAT thing, since that's how it should always have been!)

Instead, BIND is expected to write dynamic files in /var/cache/bind/*

However, if you are like me, you probably have something like this in your configuration file:

zone "alexishouses.com" {
    type slave;
    file "/etc/bind/alexis/alexishouses.com";
    masters { 64.166.38.38; };
    allow-update { 64.166.38.38; };
};

With such a zone configuration on your server, the zone is considered a slave. Here we also tell that slave that the master is at a specific IP address. That IP address is the only trusted source. Good.

Then, in between, we notice the file specification. This is a path and file name of the zone used to save the slave server version of the zone...

In your apparmor /etc/apparmor.d/usr.sbin.named configuration file you will see this:

/etc/bind/** r,
/var/lib/bind/** rw,
/var/lib/bind/ rw,
/var/cache/bind/** rw,
/var/cache/bind/ rw,

As you can see, anything under /etc/bind will be write protected from bind (DO NOT CHANGE IT!). Whether or not bind wants it that way does not matter here.

On the other hand, we have two read-write folders: /var/lib/bind and /var/cache/bind. The second one is where dynamic slave files are saved once you fix your configuration. The file "..." specification need not include a path at all; the system automatically puts the files under the right folder.

zone "alexishouses.com" {
    type slave;
    file "alexishouses.com";
    masters { 64.166.38.38; };
    allow-update { 64.166.38.38; };
};

As you can see, that is nearly the same as the previous configuration; we just removed the path in the file specification.
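If you have many zones, it helps to list the slave zones that still point into /etc/bind. Here is a rough sketch with a made-up function name. It assumes the layout used in the examples above: one statement per line, and each zone block closed by a line that starts with };.

```shell
#!/bin/sh
# Sketch: print the file paths of slave zones that live under /etc/bind,
# which the apparmor profile shown above makes read-only.
# $1: a named.conf-style file such as named.conf.local
slave_files_under_etc() {
    awk '
        # A new zone block starts: forget what we knew.
        /^[ \t]*zone[ \t]/ { slave = 0; path = "" }
        /type[ \t]+slave[ \t]*;/ { slave = 1 }
        /^[ \t]*file[ \t]/ {
            # The path is the text between the first pair of double quotes.
            split($0, parts, "\"")
            path = parts[2]
        }
        # End of the zone block: report slave files under /etc/bind.
        /^[ \t]*[}];/ {
            if (slave && path ~ /^\/etc\/bind\//) print path
            slave = 0; path = ""
        }
    ' "$1"
}
```

Master zones are left alone on purpose: named only needs to read them, so keeping them under /etc/bind is fine.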

Now everything works! Don't forget: if you want slave name servers to update immediately, you have to bump up the serial number of every zone file that you modify AND restart (or reload) your name servers. The serial number is how the slaves know that the zone changed.
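To verify that a slave caught up, you can compare the serial number that each server reports in its SOA record. The helpers below are a minimal sketch with made-up names. They assume the usual one-line output of dig +short SOA, where the serial is the third field, and they use a plain numeric comparison that ignores serial number wrap-around.

```shell
#!/bin/sh
# Sketch: compare the SOA serial of a master and a slave.

# Extract the serial (third field) from one line of "dig +short SOA" output.
soa_serial() {
    awk '{ print $3 }'
}

# $1: serial seen on the master, $2: serial seen on the slave.
compare_serials() {
    if [ "$2" -eq "$1" ]; then
        echo "in sync"
    elif [ "$2" -lt "$1" ]; then
        echo "slave is behind"
    else
        echo "slave is ahead"
    fi
}

# Example use, with the addresses seen in the logs above (adjust to yours):
#   master=$(dig +short SOA alexishouses.com @64.166.38.38 | soa_serial)
#   slave=$(dig +short SOA alexishouses.com @69.55.238.181 | soa_serial)
#   compare_serials "$master" "$slave"
```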

Allowing Recursion

You may notice that you get a warning about recursion:

WARNING: recursion requested but not available

You can allow recursion between your DNS servers, which is safe, but you must make sure to prevent anyone else from using your recursion mechanism.

In most cases, recursion is only necessary on an Intranet where a server on your LAN queries your DNS on the main server connected to the Internet. In that case, you want your DNS to accept such requests.

BIND still allows us to open up widely like so:

allow-recursion { any; };  # NEVER DO THAT!!!

I had that on one of my BIND servers and I got the massive surprise of a DNS Amplification Attack.

Instead, you want to limit the computers that can use recursion to only your trusted network:

allow-recursion { trusted-servers; };

where trusted-servers is an ACL listing the IP addresses of all of your computers.
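Because the open form is so dangerous, it is worth checking that none of your configuration files still contains it. This is a small sketch with a made-up function name. It only strips # and // comments (not /* ... */ blocks), and which files to check, for example /etc/bind/named.conf.options, is an assumption about your setup.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) when a file opens recursion to anyone.
# $1: a named.conf-style file
has_open_recursion() {
    # Drop "#" and "//" comments first so commented-out lines are ignored.
    sed -e 's/#.*//' -e 's/\/\/.*//' "$1" |
        grep -Eq 'allow-recursion[[:space:]]*\{[[:space:]]*any[[:space:]]*;'
}

# Example use (the path is an assumption):
#   if has_open_recursion /etc/bind/named.conf.options; then
#       echo "recursion is open to everyone"
#   fi
```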

No Error, but still not accessible?

I have had problems where a zone was perfectly valid, yet I got no proper reply when I queried the DNS server.

So for example, say I have a problem with m2osw.com and I check the following:

dig @ns1.m2osw.com www.m2osw.com

Then you would expect the usual answer giving us the IP address of the www.m2osw.com sub-domain, but instead we get:

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> @ns1.m2osw.com www.m2osw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13338
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.m2osw.com.            IN    A

;; Query time: 15 msec
;; SERVER: 138.197.205.139#53(138.197.205.139)
;; WHEN: Fri Jun 05 15:21:41 PDT 2020
;; MSG SIZE  rcvd: 42

which more or less says "I kind of understand what you're telling me, but I don't really get it."

The fact is that on startup, BIND reads the data of a zone from the corresponding journal and, depending on something or other (I would imagine a couple of dates somewhere?), it decides to use the journal rather than the new definitions. There is then some form of conflict, and in the end it uses neither.

To resolve this problem, you need to delete the BIND journal file. Journals are saved in the /var/lib/bind folder and are named after the zone file, with .jnl added as the extension.

sudo rm /var/lib/bind/m2osw.com.zone.jnl

Then restart the service so it works as expected (until you restart, it will use the old data it has in memory and still won't work):

sudo systemctl restart bind9

Try the dig command again and you shall see the correct response. Something like this:

$ dig @ns1.m2osw.com www.m2osw.com

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> @ns1.m2osw.com www.m2osw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31538
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.m2osw.com.            IN    A

;; ANSWER SECTION:
www.m2osw.com.        300    IN    A    138.197.205.139

;; AUTHORITY SECTION:
m2osw.com.        86400    IN    NS    ns1.m2osw.com.
m2osw.com.        86400    IN    NS    ns2.m2osw.com.

;; ADDITIONAL SECTION:
ns1.m2osw.com.        86400    IN    A    138.197.205.139
ns2.m2osw.com.        86400    IN    A    96.67.192.225

;; Query time: 19 msec
;; SERVER: 138.197.205.139#53(138.197.205.139)
;; WHEN: Fri Jun 05 15:30:26 PDT 2020
;; MSG SIZE  rcvd: 126
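
The line to look at in both outputs is the header status: SERVFAIL in the broken case and NOERROR once the journal is gone. If you want to check that from a script, here is a minimal sketch. The function name is made up, and it assumes the header line format shown in the two outputs above.

```shell
#!/bin/sh
# Sketch: print the status field (NOERROR, SERVFAIL, ...) of a dig reply.
# Reads the full dig output on standard input.
dig_status() {
    sed -n 's/.*status: \([A-Z0-9]*\),.*/\1/p'
}

# Example use:
#   dig @ns1.m2osw.com www.m2osw.com | dig_status
```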

Side Note:

Why do things go bad?

I'm not too sure. In particular, if I modify my zone manually and try again, BIND still uses the invalid journal. My take would be that there is some other process which updates the journal, somehow, and it's not 100% properly synchronized if we receive a restart at about the same time. Either that or the journal was given a date of validity which will make it last a long time.

Other errors?

Got another error and don't know how to fix it? Post a comment below with the info.