And here's another story of an update gone haywire.
From the beginning:
One of my hosts was awaiting a bunch of upgrades: network card firmware, a new Linux kernel and a number of other software updates. All in all nothing extraordinary; it's not like this had never happened before.
What did happen was that this "bunch of other software" included some internal changes which either were not documented clearly enough or which, equally probable, I didn't bother to look up, because these were just revision updates, with no minor or even major release number increments. Thus I didn't expect much.
I was dead wrong.
Once the upgrade was completed the host started to show unexpected behaviour. One symptom was that the hardware monitoring software kept complaining about a lost connection to one of my servers. The connection did come back up within a couple of minutes, but the messages kept piling up in my inbox every other hour. The other issue was that the machine was now unable to synchronize its distribution's package list with the master servers, because it could not access any of the distribution's keyservers anymore.
Two unrelated functions had gone out of service, coinciding with the last reboot (and subsequent firmware upgrade) of the host. On top of that, other services began to respond slowly.
My first suspect was thus the new Linux kernel on the host, so I booted into my backup kernel - nothing changed.
Next up: network card firmware. My internet connection runs over exactly that network card, and it wasn't the first time one of my cards didn't work too well with newer drivers. So I downgraded that as well. To no avail.
I began to believe I'd have to live with the new situation, as the website of my monitoring software wasn't too encouraging about a fix for that intermittent, recurring loss-of-connectivity problem. But I could live with that.
The other issue was way more serious. I needed to be able to synchronize the package repository. This was especially strange as my other hosts didn't suffer from that problem.
It turned out that all hosts without the issue had not yet updated their package management software. Those hosts which had the update did not even try to talk to the keyservers, which were clearly operational, as the other hosts showed. By chance I found that the updated hosts were instead trying to reach my local package proxy rather than the keyservers directly.
So the explanation was that, since the revision update, the package manager used its proxy setting not only for the packages themselves but for every HTTP connection it opened - which obviously created an issue, as my package proxy was unable to forward the keyserver requests.
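The tooling isn't named here, so the following is only a minimal Python sketch of the mechanism, with the proxy and keyserver addresses purely made up: once a package-only proxy is applied to every HTTP connection, keyserver requests fail even though the keyserver itself is perfectly reachable.

    import urllib.request

    # Hypothetical addresses; the real package proxy and keyserver differ per setup.
    PACKAGE_PROXY = "http://package-proxy.local:3142"
    KEYSERVER_URL = "http://keyserver.example.org/pks/lookup?op=stats"

    # Old behaviour (roughly): only package downloads went through the proxy,
    # everything else - keyserver lookups included - went out directly.
    direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    # New behaviour after the revision update: the configured proxy is applied
    # to *every* HTTP connection the package manager opens.
    proxied_opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": PACKAGE_PROXY})
    )

    # The keyserver request now has to pass through a proxy that only knows
    # how to serve package files, so it fails although the keyserver is up.
    for name, opener in (("direct", direct_opener), ("via package proxy", proxied_opener)):
        try:
            opener.open(KEYSERVER_URL, timeout=10)
            print(f"{name}: keyserver reachable")
        except OSError as exc:
            print(f"{name}: keyserver request failed ({exc})")

In practice the fix boils down to either scoping the proxy back to package downloads or letting the proxy forward (or bypass) the keyserver traffic; which of the two applies depends on the package manager in question.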
With that solved, the remaining issue was the lost-connection problem of the monitoring software.
Again, chance came to my help: I just happened to look at the affected host in the monitoring software and discovered that, instead of the one intended interface, it now had three of them, none of which could be deleted.
Now that was a pointer.
Manually sifting through the software's database, it became clear that each of the three interfaces, all pointing to the same target, had some of the monitor's items bound to it, which is why they couldn't be removed.
The solution proved to be as simple as with the proxy problem above: stop the monitoring, manually edit the database so that all items are bound to the first (and intended) interface again, remove the other two and restart the monitoring.
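Since the monitoring software and its schema aren't named here, this is only a hypothetical sketch of that manual edit, using Python with SQLite and made-up table, column and ID names: rebind the items to the intended interface first, then remove the now-unreferenced duplicates.

    import sqlite3

    # All names and IDs below are made up for illustration; the real monitoring
    # software, database engine and schema are not named in this post.
    KEEP_INTERFACE = 101               # the original, intended interface
    DUPLICATE_INTERFACES = (102, 103)  # the two that appeared after the update

    conn = sqlite3.connect("monitoring.db")
    with conn:  # wrap both statements in one transaction
        # Rebind every item that points at a duplicate interface back to the
        # intended one ...
        conn.execute(
            "UPDATE items SET interface_id = ? WHERE interface_id IN (?, ?)",
            (KEEP_INTERFACE, *DUPLICATE_INTERFACES),
        )
        # ... after which the duplicates have no items bound to them anymore
        # and can safely be removed.
        conn.execute(
            "DELETE FROM interfaces WHERE id IN (?, ?)",
            DUPLICATE_INTERFACES,
        )
    conn.close()

Doing the rebind before the delete is what makes the removal possible at all - exactly the constraint that kept the interfaces from being deletable in the first place.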
That problem has never shown up again since.
Again, it was just a revision update, no minor or even major release.
I still haven't bothered to read through all the ChangeLogs, but it was confirmed that both changes "just appeared" somewhere in the past.
Apparently nobody bothered to think about the wider implications of the changes implemented, not to speak of what would happen if they didn't go through cleanly, as with the apparent interface reorganization in the monitoring software.
In the end it turned out that neither the Linux kernel nor the firmware upgrade was at fault here; it was just the affected software itself.
As for the "other services" which were slow to respond, I can only guess it was correlation, not causation.
Long story short: even revision updates can cause unexpected issues, so keep a backup, keep a way back open and be prepared to dive deep when these issues occur.
Personally, I wish such changes would not show up in mere revision updates but would at least cause a jump in the minor version number. But I guess that is, as always, at the discretion of the maintainer of the project in question and can't be guaranteed at all.