As we can see, BIND wants to update the zone by sending a notification, but that notification is being refused. Apparently this has been happening since BIND 9.3. The concept is simple but quite a gotcha! The server is actually sending a notification to itself, and chances are that if you are reading this, you have the same problem:
You need to allow the server to notify itself!
I had a hard time seeing that from the error message because of the other error I had before. I found a site where someone else ran into that problem and had a simple solution:
https://benjamin.sonntag.fr/a37-named_zone_sonntag_eu_org_refused_notify_from_non-master.html/
Add an allow-notify { <myself>; }; where <myself> is your slave server IP address. If you are like me, you want to add that to both your servers.
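Here is a minimal sketch of that fix, assuming you put the option in the slave zone declaration (it can also go in the global options). BIND spells the option allow-notify, and it adds to the addresses already listed in masters. The zone name and master address are the ones used elsewhere in this post; 192.0.2.10 is only a placeholder for the slave's own IP address.

```
zone "alexishouses.com" {
    type slave;
    file "alexishouses.com";
    masters { 64.166.38.38; };
    // Placeholder: replace 192.0.2.10 with this slave server's own IP address
    // so that the NOTIFY it sends to itself is no longer refused.
    allow-notify { 192.0.2.10; };
};
```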
More info about BIND (but really no solution to any of your everyday problems...):
http://www.bind9.net/manuals
Determining whether BIND is running as Master or Slave
By default, BIND will be a master.
It is possible to always run BIND as a master and simply duplicate your zones by hand (with a mechanism of your own such as an rsync) on all your servers.
However, the named service offers a way to copy zones between servers. In that case, one server is called the master and the others are slaves. The number of servers is not limited, although you probably want to keep it small to keep it manageable.
Slave servers are useful if you have several offices, or when you own domain names, in which case you need at least two name servers. (You do not need your own DNS, though; most domain name resellers will offer you free access to theirs.)
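What actually makes a zone a slave is its type slave; declaration together with a masters list. In my configuration I also use the allow-update option, which controls which hosts may send dynamic updates to a zone. When this option is used in the global space of named.conf, it applies to every zone that does not override it.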
If, like me, you want to run as a Master for some domains and as a Slave for others, then you want to be very careful about how you use the allow-update option.
I suggest that you use the option only in the zones marked as slave (i.e. type slave;). Yet, if you have just one or two masters, you can just as well write one global allow-update and then cancel that declaration in your master entries, as in:
zone "alexishouses.com" {
type master;
file "/etc/bind/alexis/alexishouses.com";
allow-update { none; };
};
An example of allow-update with an IP address appears below. allow-update can include several IP addresses separated by semi-colons.
Today I noticed that the bind server was working in only one direction. Wondering, I looked at the logs (after I turned on the feature) and noticed that the tool was telling me that it did not have permissions to write to the alexis folder (often called slave although in our case we have several people and used several folders to segregate the names). I verified all the folders and files, made sure they were owned by bind, just like on the other server, and it still did not work.
So I searched the Internet and found a post about Ubuntu that gave the answer:
Although all the permissions look correct as per the old days,
they are not correct for the new days of apparmor.
Yes! apparmor will prevent the write whether you want it or not.
Look at the settings of apparmor and see that the /etc/bind folder is a read-only folder now (which is a GREAT thing, since that's how it should always have been!)
Instead, BIND is expected to write dynamic files in /var/cache/bind/*
However, if you are like me, you probably have something like this in your configuration file:
zone "alexishouses.com" {
type slave;
file "/etc/bind/alexis/alexishouses.com";
masters { 64.166.38.38; };
allow-update { 64.166.38.38; };
};
With such a zone configuration on your server, the zone is considered a slave. Here we also tell that slave that the master is at a specific IP address. That IP address is the only trusted source. Good.
Then, in between, we notice the file specification. This is a path and file name of the zone used to save the slave server version of the zone...
In your apparmor /etc/apparmor.d/usr.sbin.named configuration file you will see this:
/etc/bind/** r,
/var/lib/bind/** rw,
/var/lib/bind/ rw,
/var/cache/bind/** rw,
/var/cache/bind/ rw,
As you can see, anything under /etc/bind will be write protected from bind (DO NOT CHANGE IT!). Whether or not bind wants it that way does not matter here.
On the other hand, we have two read-write folders: /var/lib/bind and /var/cache/bind. The second one is where dynamic slave files are saved once you fix your configuration. The file "..." specification need not include a path at all; the system then automatically puts the files under the right folder.
zone "alexishouses.com" {
type slave;
file "alexishouses.com";
masters { 64.166.38.38; };
allow-update { 64.166.38.38; };
};
As you can see, that is nearly the same as the previous configuration; we just removed the path in the file specification.
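The reason a bare file name ends up in the right place is, as far as I can tell, the directory option. On Debian and Ubuntu the stock named.conf.options sets it to /var/cache/bind, and relative file names in zone declarations are resolved against it. A sketch, assuming that default layout (check the options block you already have rather than adding a second one):

```
options {
    // Relative "file" names in zone declarations are resolved against this
    // directory, which the apparmor profile lets BIND write to.
    directory "/var/cache/bind";
};
```

With that setting, file "alexishouses.com"; is saved as /var/cache/bind/alexishouses.com.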
Now everything works! Don't forget: if you want the slave name servers to update immediately, you have to restart your name servers AND bump up the serial number of the zone files that you modify. The serial number is how the slaves know that a zone changed.
Allowing Recursion
You may notice that you get a warning about recursion:
WARNING: recursion requested but not available
You can allow recursion between your own DNS servers, which is safe, but you must make sure to prevent anyone else from using your recursion mechanism.
In most cases, recursion is only necessary on an Intranet where a server on your LAN queries your DNS on the main server connected to the Internet. In that case, you want your DNS to accept such requests.
BIND still allows us to open up widely like so:
allow-recursion { any; }; # NEVER DO THAT!!!
I had that on one of my BIND servers and I got the massive surprise of a DNS Amplification Attack.
Instead, you want to limit the computers that can use recursion to only your trusted network:
allow-recursion { trusted-servers; };
where trusted-servers is an ACL where you can enter all of your computer IP addresses.
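As a rough sketch, the ACL and the option could look like this. The addresses are placeholders (a loopback address and a typical private LAN range), not values from my servers. The acl statement has to appear before the place where it is used, and the allow-recursion line belongs in the options block you already have.

```
// Placeholder addresses: list only the machines you trust.
acl "trusted-servers" {
    127.0.0.1;        // the server itself
    192.168.1.0/24;   // hypothetical LAN range
};

options {
    // Only hosts matching the ACL above may send recursive queries.
    allow-recursion { trusted-servers; };
};
```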
No Error, but still not accessible?
I have had problems with perfectly valid zones that still did not return a proper reply when I queried the DNS server.
So for example, say I have a problem with m2osw.com and I check the following:
dig @ns1.m2osw.com www.m2osw.com
Then you would expect the usual answer giving us the IP address of the www.m2osw.com sub-domain, but instead we get:
; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> @ns1.m2osw.com www.m2osw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13338
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.m2osw.com. IN A
;; Query time: 15 msec
;; SERVER: 138.197.205.139#53(138.197.205.139)
;; WHEN: Fri Jun 05 15:21:41 PDT 2020
;; MSG SIZE rcvd: 42
which more or less says "I kind of understand what you're telling me, but I don't really get it."
The fact is that on startup, BIND reads the data of a zone from the corresponding journal and depending on something or other (I would imagine a couple of dates somewhere?), it decides to use the journal rather than the new definitions but there is a form of conflict and in the end it uses neither.
To resolve this problem, you need to delete the zone's journal file. Journals are saved in the /var/lib/bind folder and are named after the zone file, with .jnl appended as the extension.
sudo rm /var/lib/bind/m2osw.com.zone.jnl
Then restart the service so it works as expected (until you restart, it will use the old data it has in memory and still won't work):
sudo systemctl restart bind9
Try the dig command again and you should see the correct response. Something like this:
$ dig @ns1.m2osw.com www.m2osw.com
; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> @ns1.m2osw.com www.m2osw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31538
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.m2osw.com. IN A
;; ANSWER SECTION:
www.m2osw.com. 300 IN A 138.197.205.139
;; AUTHORITY SECTION:
m2osw.com. 86400 IN NS ns1.m2osw.com.
m2osw.com. 86400 IN NS ns2.m2osw.com.
;; ADDITIONAL SECTION:
ns1.m2osw.com. 86400 IN A 138.197.205.139
ns2.m2osw.com. 86400 IN A 96.67.192.225
;; Query time: 19 msec
;; SERVER: 138.197.205.139#53(138.197.205.139)
;; WHEN: Fri Jun 05 15:30:26 PDT 2020
;; MSG SIZE rcvd: 126
Side Note:
Why do things go bad?
I'm not too sure. In particular, if I modify my zone manually and try again, BIND still uses the invalid journal. My take would be that some other process updates the journal, somehow, and it is not 100% properly synchronized if we receive a restart at about the same time. Either that, or the journal was given a validity date which makes it last a long time.
Other errors?
Got another error and don't know how to fix it? Post a comment below with the info.